Mostrando entradas con la etiqueta continuation. Mostrar todas las entradas
Mostrando entradas con la etiqueta continuation. Mostrar todas las entradas

miércoles, 18 de enero de 2012

Fastruby 0.0.17 released: performance improvements

Fastruby is a gem which allows to execute ruby code much faster than normal, currently in a state of transition between a spike and a usable gem, it is released when possible with incremental improvements.

The main improvemens on fastruby v0.0.17 are the optimizations:
  • Direct call to CFUNC methods on ruby1.9 (speed up for native methods like Fixnum#+, Fixnum#>, etc...)
  • Refactored Ruby to C translation to use inlined code instead of anonymous functions for each expression, this avoids unnecesary C functions
  • Improvement on non-local jumps return and next to avoid setjmp when possible using normal goto's. This affects performance on ruby1.9 and ruby1.8 as well
  • Implemented scopes on C stack instead of heap when possible
And bug fixes:
The overall result of the optimi
zations (specially the one about relocating scopes on native stack) are the great time improvement on benchmark1.rb for which fastruby takes about 20-30% less time than ruby1.9


Install

You can clone the repository at github:
git clone git://github.com/tario/fastruby.git
git checkout v0.0.17
Or install it using gem install:
gem install fastruby 

Placing Ruby scopes on native stack instead of heap

Initially, at the very beginning of the project, natural translation of ruby to C lead us to translate ruby local variables as C local variables, but this implementation is wrong since ruby stack can diverge in a tree while proc and continuation objects can potentially hold references to local variable scopes (e.g. lambdas can retain references to scopes, read and write while other scopes "at the same level" can be created but in another branch). Many releases ago I solved this problem by implement a structure to support this "branching" behavior of ruby scopes, the "stack chunk" (See Fastruby 0.0.8 released with support for lambdas and Callcc puzzle on fastruby IV: Execution stack as a DAG, lambda cases for more details)

This implementation, while allowing to create and use lambdas and continuation objects, imply a severe performance penalty forcing to all code to access local variables on the heap, access to the heap is slower than the stack for reasons not now be explained

The key issue here is: Why cannot local variables of the ruby scope live on the native stack? The reason for that is that references to the scope can be potentially created, example:
def foo(a)
proc do
a = a + 1
end
end

pr = foo(10)
p pr.call # 11
p pr.call # 12

On the example, a proc is created with a reference to the scope created for the method call. This scope must be allocated on the heap.

What about this?:

def foo(a)
ret = 0
a.each do |x|
ret = ret + 1
end

ret
end

p foo([1,2,3]) # 6

A innocent iteration on a array, ok?
NO, imagine that:
def foo(a)
ret = 0
a.each do |x|
ret = ret + x
end
ret
end

class X
def each(&blk)
$a = proc(&blk)

blk.call(1)
blk.call(1)
blk.call(1)
end
end
x = X.new
p foo(x) # 3
p $a.call(1) # 4

Any block passed to any method can be used to create a lambda. So any method making block calls can potentially allow creation of lambdas referencing scopes created for these methods.
Also, It's virtually impossible to create lambdas referencing scopes of methods which has no block calls. So, since these scopes can not be referenced by lambdas, methods without blocks can allocate their scopes on stack... if not for

Continuations

Continuations are the most macabre and wonderful feature of ruby. Continuations use setjmp/longjmp to work and also write directly to native stack to restore execution data stored on it.
C Local variables on native stack are overwritten each time a continuation is called with the values they had before at the moment that the continuation was created using callcc. This behavior is wrong for ruby variables, so, the use of ruby variables allocated on native stack will lead to unexpected behavior.
For example, this innocent loop would fail if the variables are allocated on stack:
require "continuation"

def foo
a = 10
callcc {|cont| $cont = cont}
a = a - 1
p a
$cont.call if a > 0
end

When the scope of local variables is allocated on heap, the method foo works as expected displaying a countdown on standard output. But, when the scope is allocated on native stack, the first call to callcc copy the entire stack including the variable a storing it; each time the continuation is called to iterate on the loop the value of a is restores to the value they had at the moment of create the continuation (10), the result is an inifinite loop displaying 9 all the time

The key problem here is reading a local variable after a continuation is created, after a continuation is created the value of all local variables allocated on stack will be the value they had at the moment of create the continuation. So, reading a local after creating a continuation may result on unexpected behavior. In summary. Any method with the possibility of following the sequence: continuation -> variable read can potentially work in unexpected way.

Converting Syntax trees to Graphs

The only way to detect potential continuation -> variable read sequences is to generate a graph from the syntax tree of the method being analyzed and then search the sequence on the graph

The graph is composed with the nodes of the AST as vertexes, and the edges are the possibility of execute a given destination node after another origin node. For example, the following source code:
def foo(a)
if a.to_i == 0
10
else
20
end
end
Corresponds to the following syntax tree:



And generates the following graph:



In that case, there is no read of local variables after call on any path, so the method of the example is able to allocate their local variables on native stack

In the other hand, the following method cannot use native stack:
def foo(a,b)
while (a > 0)
b.foo
a = a - 1
end
end
Syntax tree:


Graph:
The use of while makes possible the existence of a cyclic path on the graph. Beyond that, exists many tours that make native stack storage for locals prohibitive, they are marked with blue, green and orange:


Note that all prohibitive tours on graph goes from a call node to a lvar node, of course, the method of the while example must be implemented with the local variable scope allocated on heap

Viewed in this way, almost any method with normal complexity will fall on "heap" category since the only requirement for this to be so is having at least one possible prohibitive tour (from a call node to a lvar node) on the graph, and for now this is true. But this algorithm will have greater relavance when the analyzer know a priori what call can potentially create a continuation and what call will never create a continuation (this calls will be ignored). Today, this improvement gains 20-30% of timed compared with MRI1.9

For example, in the while code sample the method calls are "foo", ">" and "-", if the analyzer can afford assume that these calls never will create a continuation... > and - on numeric dont create continuation objects, but foo?. Of course this will be a little more complicated and probably the optimizations around this will be about type inferences, and qualifiers



martes, 1 de noviembre de 2011

Fastruby 0.0.15 released. Callcc and continuation objects!

Fastruby is a gem which allows to execute ruby code much faster than normal, currently in a state of transition between a spike and a usable gem, it is released when possible with incremental improvements.
The v0.0.15 release adds the support for continuation objects created using callcc function. To acchieve this, the implementation uses the non-lineal stack created on previous releases.
Also this release adds a few extra language support improvements such as method default arguments

Install

You can clone the repository at github:
git clone git://github.com/tario/fastruby.git
git checkout v0.0.15
Or install it using gem install:
gem install fastruby

New Features

Fixes

Examples of code being supported

Example of continuation calls, tail recursion


require "fastruby"

fastruby '
class X
def fact( n )
a = callcc { |k| [ n, 1, k ] }
n = a[0]
f = a[1]
k = a[2]
if ( n == 0 ) then return f
else
k.call n-1, n*f, k
end
end
end
'

p X.new.fact(6) # 720

Default Arguments
require "fastruby"

fastruby '
class X
def foo(a,b = a.reverse)
a + b
end
end
'

p X.new.foo("13") # 1331
p X.new.foo("xx", "yy") # xxyy

Passing proc objects as blocks
require "fastruby"

fastruby '
class X
def bar
yield(1)
yield(2)
yield(3)
end

def foo
pr = proc do |x|
p x
end

bar(&pr) # passing proc as a block
end
end
'

X.new.foo

Receiving blocks as proc objects
require "fastruby"

fastruby '
class X
def bar(&pr)
pr.call(1)
pr.call(2)
pr.call(3)
end

def foo
bar do |x|
p x
end
end
end
'

X.new.foo


lunes, 5 de septiembre de 2011

Callcc puzzle on fastruby

Callcc or "Call with Current Continuation" is the implementation of coroutines on ruby, inherited from another languages such scheme and lisp, callcc allows the creation of continuation objects which hold the context information (stack frame position, local variables of the entire stack, etc...) and can be called to make a non-local goto jump to that context like longjmp and setjmp functions in C (see this explanation of callcc in ruby for more info)
The difference between callcc/Continuation#call and setjmp/longjmp is that callcc allows goto to forward, backward in the stack and even to a side stack while setjmp only allows jump backward. This imply that while setjmp/longjmp works fine with linear stack, callcc will not work in that way and will need a non-linear stack, a complex tree dynamic struct instead a chunk of memory growing when needed (the native stack)

Maybe something easier: lambdas

For example, a code like this, on released version 0.0.7 shouldn't work:
require "fastruby"

fastruby '
class X
def foo
a = 32
lambda {|x|
a+x
}
end
end
'

p X.new.foo.call(16) # it will return 48, or it will fail?

But it works! why?... the current implementation allocs scopes for ruby local variables as a local variable of a struct type, in C, the local variables live in native stack in a fixed address which is passed as block parameter when calling lambda (using rb_funcall, etc..), this memory address remains associated to the block passed to lambda and then that address is used from inside the block like a normal block invocation. The difference here is that the scope struct used from the lambda block is deprecated and not longer valid since that stack space is below the stack pointer

To obtain the example of failure (in the grand tradition of TDD ;) ), we must call any another method after calling the method returning the lambda and before calling the lambda

require "fastruby"

fastruby '
class X
def foo
a = 32
lambda {|x|
a+x
}
end

def bar
end
end
'

lambda_object = X.new.foo # it will return 48, or it will fail?
X.new.bar # this should not affect the lambda object since it is not related
lambda_object.call(16) # but it does, and this does not return 48 as expected

The call to lambda does not return 48 as expected, in fact, it could return any unpredictable value even an invalid object causing a segmentation fault. The reason is that the call to X#bar overwrites the deprecated stack space used by the local scope associated to the lambda block; in this situation, the value of the "local variable" may be any unpredictable random garbage in the stack

Make it pass: alloc locals on the heap using malloc

And it pass, simply by moving the allocation of local variable scopes to heap instead of stack. Memory allocated on the heap never will be deprecated... but it must be de-allocated or the ruby process will die young of cancer. It is late at night and I will not write a test to assert ruby does not die with cancer when six billon of objects are allocated, but surely the de-allocation of local variable scopes will be performed as a refactor task.

And the scope deallocation is the real key issue here

But it is a topic for another post...

Links