Wednesday, September 28, 2011

Fastruby 0.0.9 released with support for proc

Fastruby is a gem that executes ruby code much faster than normal. It is currently in a state of transition between a spike and a usable gem, and is released whenever possible with incremental improvements.
Following version v0.0.8, which implemented several internal improvements including a new non-linear stack structure, the new release v0.0.9 adds support for procedure objects, built on that previous implementation.

Install

You can clone the repository at github:
git clone git://github.com/tario/fastruby.git
git checkout v0.0.9
Or install it using gem install:
gem install fastruby

New Features


Examples of code being supported

return method from proc

require "fastruby"

fastruby '
class X
  def foo
    pr = Proc.new do
      return "return foo from inside proc"
    end

    pr.call

    return "return foo"
  end
end
'

x = X.new
x.foo # this returns "return foo from inside proc"

The behavior of Proc objects (those created using Proc.new) is very similar to the behavior of lambdas, but with a few small differences:
  • It is possible to do a non-local return from a Proc.new block out of the method where the block was defined; this non-local return fails with LocalJumpError when the method scope is already closed at the moment the return is invoked (in lambdas, return only acts as next; i.e., it only ends the lambda itself)
  • break jumps are illegal in Proc.new blocks and always fail with LocalJumpError (in lambdas, break acts as next)

As an additional note, block objects created using the proc method are exactly the same as block objects created using the lambda method; just check the ruby 1.8 source code, where you can see something like this:

   ...
rb_define_global_function("proc", proc_lambda, 0);
rb_define_global_function("lambda", proc_lambda, 0);
...





Monday, September 19, 2011

Fastruby roadmap up to 0.5.0

Fastruby development (up to 1.0.0 release) will be divided into two main phases:
  • Incremental Proof of Concept. Transition from a base spike/PoC (version 0.0.1) to a usable gem (version 0.1.0).
  • Usable versions. Transition from a minimally usable gem (version 0.1.0) to a stable API release (version 1.0.0).
This is a rough roadmap and might change in the future as new ideas or optimizations are discovered.

PoC Incremental Versions


Version 0.0.8: lambdas (already released)
Version 0.0.9: proc
Version 0.0.10: Gem packaging fixes

Version 0.0.11: for, retry, redo
Version 0.0.12: Full exception support

Version 0.0.13: Methods with variable number of arguments

Version 0.0.14: Support for method replacement

Version 0.0.15: continuation objects


Etc...

Version 0.1.0: First usable release
  • Any ruby code executable by MRI can be executed by fastruby

Usable Versions (Following semantic versioning)

Zero as the major version number is used while the API changes every day without backward compatibility (although backward compatibility will be kept when possible). Also, on 0.X.X versions the API is completely undefined.
Bugs and spec non-compliance in the 0.1.0 release will be fixed in subsequent stabilization releases (0.1.1, 0.1.2, etc.).

Version 0.1.0: Execution of Ruby

Features:
  • Any executable code in ruby is executable via fastruby and gives the same results
  • Interoperation between ruby and fastruby is guaranteed

Version 0.2.0: Fast execution of ruby, internal optimizations

Features
Version 0.3.0: Fast execution of ruby, C extension API

  • C extension points
    • API to define methods in C, by class, method name and signature, allowing replacement of the ruby standard lib implementation in the future.

Version 0.4.0: Fast execution of ruby, C extension API 2

  • C extension points
    • Macro methods (keeping dynamism). Examples:
      1. Fixnum#+(x:Fixnum) => INT2FIX(FIX2INT(self)+FIX2INT(x))
      2. String#[](x:Fixnum) => rb_str_new(RSTRING(self)->ptr + FIX2INT(x), 1)
      3. String#[]=(x:Fixnum,y:Fixnum) =>
        RSTRING(self)->ptr[FIX2INT(x)]=FIX2INT(y)

Version 0.5.0: Faster standard lib

  • Ruby standard lib re-implemented in a separate gem optimized for fastruby
  • Fastruby works with and without optimized stdlib

Fastruby 0.0.8 released with support for lambdas

Fastruby is a gem that executes ruby code much faster than normal. It is currently in a state of transition between a spike and a usable gem, and is released whenever possible with incremental improvements.
Version v0.0.8 was released with the new feature of supporting lambdas. The functional improvement may not look important, but the underlying internal implementation will enable new features in the next releases, such as support for proc and continuation objects.

Install

You can clone the repository at github:
git clone git://github.com/tario/fastruby.git
Or install it using gem install:
gem install fastruby
New Features
Examples of code being supported

Scope retention (a lambda keeps a reference to the scope where it was created)

require "fastruby"

fastruby '
class X
  attr_accessor :lambda_set, :lambda_get

  def initialize
    a = nil

    @lambda_set = lambda{|x| a = x}
    @lambda_get = lambda{a}
  end
end
'

x = X.new

x.lambda_set.call(88)
p x.lambda_get.call # prints 88

Yield block of the scope

require "fastruby"

fastruby '
class X
  def bar
    lambda_obj = lambda{
      yield
    }
    lambda_obj
  end

  def foo
    a = 77
    bar do
      a
    end
  end
end
'

x = X.new

lambda_obj = x.foo
p lambda_obj.call # 77


Friday, September 16, 2011

Callcc puzzle on fastruby IV: Execution stack as a DAG, lambda cases

In order to implement lambdas and (in the near future) continuations, a new structure to represent the stack was implemented; it will be available in the 0.0.8 release of fastruby as the base of the lambda support. Call with current continuation will be implemented in 0.0.9.

The implementation will cover two main cases:
  • Normal linear stack (keeping performance when possible)
  • Non-linear stack (DAG), used when lambdas are created
The case of continuations remains uncovered (for now); the storage of the native stack is probably the most important outstanding issue to solve in order to implement continuations.

New Stack Structure


To represent a ruby stack, a new structure was created: StackChunk. A stack chunk is a representation of a linear stack; allocating and de-allocating memory for scopes from it is far simpler than dynamic memory allocations (e.g. malloc).

Since the ruby stack is non-linear, StackChunks are used for the longest linear sections and are chained following a DAG pattern. It is like the internal representation of frames used by MRI, with the difference that stack chunks may contain many scopes, even all of them when the stack has no divergences; in those cases a single dynamic memory allocation may be enough for all the scopes needed by the execution.

Three main cases

There are three main cases of interest where the new structure is relevant.

Case 1: normal linear stack, single stack chunk

The StackChunk (like the native stack) is an entity that grows linearly and dynamically. So, to represent a linear stack only one stack chunk is needed, and it grows dynamically as new scopes are allocated.

The stack chunk is created the first time a fastruby method creates a scope. At an arbitrary point, the stack looks like this:



When a ruby method (built with fastruby) is called, a scope to hold local variables (and other information, such as the passed block) is created; the memory where this scope lives is provided by the stack chunk. There is no need to allocate ruby objects, nor even dynamic memory in most cases. So, the linear stack chunk grows as the native stack grows:



While the empty circles represent ruby objects referenced by local variables, the native stack holds native C pointers to scopes allocated in the stack chunk.

Since local variables live on these stack chunks instead of the native stack, they can survive beyond the C frame where they were created.

Case 2: Lambdas, frozen and multiple stack chunks to represent DAG stack

Each time a lambda object is created, fastruby attaches a ruby reference to the StackChunk being used at the moment of creating the lambda; also, the state of that StackChunk and all its parents is changed to frozen. A frozen stack chunk cannot create new scopes, which keeps alive all the scopes that existed at the moment it was frozen, but the variables living on that stack chunk may still change their values.



For example, if a lambda is created at a given stack point, then the execution rolls back a few frames and creates new ones (e.g. by calling other methods), the state will be something like this:


Red arrows represent ruby references between ruby objects; these references are created only with the purpose of keeping the related stack chunks alive across garbage collection. Ruby lambda objects store a reference to the stack chunk in use at the moment the lambda was created. To handle the dynamic memory used by the stack, it is wrapped (gray arrows) by a ruby object of class StackChunk; so the memory is freed when the garbage collector determines that the object has no references and must be disposed.

Black arrows indicate native C pointers: the native stack holds native local variables pointing to scopes (representations of ruby scopes, including ruby local variables) living in the native heap allocated by the stack chunk.

Green arrows indicate native references (VALUE) to ruby objects (represented by empty circles); to avoid these objects being disposed by the GC, the stack chunk must implement the mark method and mark each object referenced in that way. These values can change both on frozen and active stack chunks.

Case 3: Continuation objects (NOT IMPLEMENTED YET)

The handling of dynamic and DAG stack chunks is solved the same way as in the lambda case: each time a continuation object is created (by using the callcc ruby method), a reference to the stack chunk in use at that moment is attached to it, to avoid that stack chunk being disposed by the GC.
The new problem to overcome is recording what would be the representation of ruby frames in fastruby. These representations are native local variables living in the native stack (including the most important one, the pointer to the scope); since the native stack is linear, it discards this data as it shrinks and overwrites the old data when it grows back (e.g. when another method is called).
Maybe the solution in this case is to save the native stack together with the stack chunk and restore it each time a continuation object is called, but that is a topic for another article.



Wednesday, September 7, 2011

Callcc puzzle on fastruby III: Almost solved?

First of all, the plan of migrating to C++ was canceled; the ruby garbage collector will be used instead to handle the resources used by the fastruby stack entities.

The cost of instantiating objects on ruby

Since the ruby garbage collector works only with ruby objects, I need to encapsulate as ruby objects whatever I want the GC to manage. The first option that comes to mind is to encapsulate every local variable scope into a ruby object: bad idea. This would instantiate a ruby object for each scope, and the cost of instantiating objects in ruby is of a higher order than a single memory allocation in C. The issue with this approach can be appreciated in this spike, which adds the instantiation of a ruby object (using rb_objnew) for each scope to see the impact on general performance as measured by the benchmark scripts; the result was a 3X increase in the execution time of the main benchmark, enough to reject the approach of one ruby object per locals scope.

Stack Chunk

To solve the issue of multiple ruby object allocations, multiple locals scopes will be grouped and encapsulated as a single ruby object, specifically those belonging to the same stack. Stack chunks will grow dynamically (using C malloc) as locals scopes are allocated, and all the dynamic memory used by a stack chunk will be de-allocated when it is removed by the ruby GC.
For the cases of lambdas, procs and continuations, StackChunk objects will be referenced from a ruby local variable in the ruby scope; so lambdas duplicating ruby stacks will create new references to stack chunks. Stack chunks will become immutable/frozen when lambdas or continuation objects are created, to prevent new scopes from overwriting local variables: local variables living on immutable/frozen stack chunks may still be changed, but frozen chunks reject the allocation of new scopes.

Stack Chunk properties
  • All local variables will be placed/allocated on the current stack chunk at the moment the function frame is initialized
  • Each stack chunk is wrapped as a ruby object of type StackChunk to provide the ruby garbage collector with the interface to release stack chunks from memory (Data_Wrap_Struct) when they have no references.
  • StackChunk will be referenced from the local variable scope of MRI (Matz Ruby Interpreter) to prevent the object from being removed while still in use (e.g. until leaving the scope where the stack chunk was created); in the internal implementation of MRI, lambdas, procs and continuation objects retain references to MRI stack frames, so the issue of lambdas and continuations will be solved (in part) by this.
  • Callcc, lambdas and procs will turn the current stack chunk into the frozen state; this implies that new scopes can't be created in that stack chunk. When new scopes are required, a new stack chunk will be created
Implementation

See https://github.com/tario/fastruby/commits/dynamic_locals

Tuesday, September 6, 2011

Callcc puzzle on fastruby II: The return of C++

In a previous post, I explained the problem of implementing callcc while keeping performance in fastruby; the focus then changed to a simpler issue: lambdas.

Approaching the lambda issue allowed us to recognize the non-linear nature of the ruby stack: while the native stack grows by allocating space for local variables and shrinks by deallocating that space, the non-linear stack of ruby allows keeping isolated local variable scopes associated with lambda blocks alive (while previous scopes in the stack are deallocated), and even allows retaining all the scopes of a stack for continuation objects created with callcc.

The solution for the lambda case is as simple as changing the place where the local variable scope lives: heap instead of stack.

Making use of dynamic memory is not a game, and if misused it may hurt or even kill (the ruby process, after a relatively slow death of cancer).


Release of allocated memory and C++

Each allocation must have its corresponding de-allocation. It's that simple. But not so simple when programming in C, because [...]
RAII, provided by C++, makes it easier to code scopes of resource initialization and finalization (dynamic memory, in this case) by using class destructors to release the resources; this covers function returns and C++ exceptions. It does not cover longjmp (this will be analyzed later in this post). See the wikipedia article about RAII for more info and code examples.

Reference Counting

Immediately after creating a lambda object, the local variable scope is referenced twice: one reference from the lambda object itself, and the other from the scope living on the stack (and many lambdas created in the same scope add more references to the count). In this situation, when the execution leaves the scope of the function, it must not de-allocate the local variables struct; the only thing it should do is decrement the reference count of the local variables by one, and the release should only happen when this count reaches zero. This algorithm is called reference counting.

Longjmp and C++ RAII

Using the C functions setjmp/longjmp for non-local gotos is incompatible with C++ RAII, since RAII depends on the destructors of objects instantiated on the stack being called. For example, a longjmp crossing a scope that allocates resources will prevent the release of those resources.
Apparently, the solution to this issue is to replace the current use of setjmp/longjmp with C++ exceptions, but after a spike I realized this might not be a good idea:

test.c using longjmp and setjmp
#include <setjmp.h>

jmp_buf jmp;

void function2() {
    longjmp(jmp, 1);
}

void function1() {
    setjmp(jmp);
}

int main() {
    int i;
    for (i = 0; i < 200000000; i++) {
        function1();
    }
    return 0;
}

It took 2.241 seconds

test.cpp using try/catch and throw
#include <exception>

void function2() {
    throw std::exception();
}

void function1() {
    try {
        function2();
    } catch (const std::exception &) {
    }
}

int main() {
    int i;
    for (i = 0; i < 200000000; i++) {
        function1();
    }
    return 0;
}

It never finished, even after hours

Conclusion

C++ exceptions will always be slower than setjmp/longjmp, because C++ exceptions use non-local gotos as part of their implementation, among other complexities. I must look for another workaround for the longjmp detail.

Links

Monday, September 5, 2011

Callcc puzzle on fastruby

Callcc, or "Call with Current Continuation", is the implementation of coroutines in ruby, inherited from other languages such as Scheme and Lisp. Callcc allows the creation of continuation objects which hold context information (stack frame position, local variables of the entire stack, etc.) and can be called to make a non-local goto jump to that context, like the longjmp and setjmp functions in C (see this explanation of callcc in ruby for more info).
The difference between callcc/Continuation#call and setjmp/longjmp is that callcc allows jumping forward, backward in the stack, and even to a side stack, while setjmp/longjmp only allows jumping backward. This implies that while setjmp/longjmp works fine with a linear stack, callcc does not and needs a non-linear stack: a complex dynamic tree structure instead of a chunk of memory growing when needed (the native stack).

Maybe something easier: lambdas

For example, on released version 0.0.7, code like this shouldn't work:
require "fastruby"

fastruby '
class X
  def foo
    a = 32
    lambda {|x|
      a+x
    }
  end
end
'

p X.new.foo.call(16) # will it return 48, or will it fail?

But it works! Why? The current implementation allocates scopes for ruby local variables as a local variable of a struct type in C; these local variables live in the native stack at a fixed address, which is passed as a block parameter when calling lambda (using rb_funcall, etc.). This memory address remains associated with the block passed to lambda, and that address is then used from inside the block as in a normal block invocation. The problem is that the scope struct used from the lambda block is stale and no longer valid, since that stack space is below the stack pointer.

To obtain a failing example (in the grand tradition of TDD ;) ), we must call any other method after calling the method that returns the lambda and before calling the lambda:

require "fastruby"

fastruby '
class X
  def foo
    a = 32
    lambda {|x|
      a+x
    }
  end

  def bar
  end
end
'

lambda_object = X.new.foo # returns the lambda object
X.new.bar # this should not affect the lambda object since it is not related
lambda_object.call(16) # but it does, and this does not return 48 as expected

The call to the lambda does not return 48 as expected; in fact, it could return any unpredictable value, even an invalid object causing a segmentation fault. The reason is that the call to X#bar overwrites the stale stack space used by the local scope associated with the lambda block; in this situation, the value of the "local variable" may be any unpredictable garbage from the stack.

Make it pass: alloc locals on the heap using malloc

And it passes, simply by moving the allocation of local variable scopes to the heap instead of the stack. Memory allocated on the heap never becomes stale... but it must be de-allocated, or the ruby process will die young of cancer. It is late at night and I will not write a test to assert that ruby does not die of cancer when six billion objects are allocated, but surely the de-allocation of local variable scopes will be performed as a refactor task.

And scope deallocation is the real key issue here

But it is a topic for another post...

Links