Showing posts with the label fastruby. Show all posts

Monday, April 23, 2012

Fastruby 0.0.21: Inline of corelib fixnum methods, restored ruby 1.8 compat

Fastruby is a gem that executes Ruby code much faster than normal. It is currently in transition from a spike to a usable gem, and it is released with incremental improvements whenever possible.

One of the main improvements in fastruby v0.0.20 and v0.0.21 is the implementation of a few corelib methods (e.g. Fixnum#+) in Ruby, which makes it possible to inline them. The other is restored compatibility with Ruby 1.8.

Install

You can clone the repository at github:
git clone git://github.com/tario/fastruby.git
git checkout v0.0.21

Or install it using gem install:
gem install fastruby


Speedup

Before (v0.0.19 and prior): (benchmark charts omitted)

Now: (benchmark charts omitted)

NOTE: Due to issues with 1.8 compatibility, inlining of corelib methods does not work on Ruby 1.8; this will be fixed in a future release (planned for v0.0.22).

Next

Sunday, February 5, 2012

Fastruby 0.0.18 released: more performance improvements

Fastruby is a gem that executes Ruby code much faster than normal. It is currently in transition from a spike to a usable gem, and it is released with incremental improvements whenever possible.

The main improvements in fastruby v0.0.18 are optimizations:
  • Method inlining, with the following limitations: no inlining of iter calls (blocks), no inlining of methods containing iter calls, and only one level of inlining for now
  • Refactored the cache so it no longer uses the code snippet as key; caching is instead applied at another level, using the syntax tree and other data as key
And bug fixes:
  • Fixed replacement of methods by redesigning the cache

Install

You can clone the repository at github:
git clone git://github.com/tario/fastruby.git
git checkout v0.0.18

Or install it using gem install:
gem install fastruby

Inlining in Ruby

Inlining is a very common and popular technique in statically compiled languages like C and C++, and even in managed static languages such as Java or .NET. It basically consists of replacing a method call with the method's implementation to save the overhead of calling a separate method.
In C programs this means avoiding call instructions, allocating and de-allocating function frames, etc. In the Ruby world, while the concept of inlining becomes a little more complex, inlining saves the creation of Ruby frames (surely much more expensive than C frames) and lookups in method tables, which cost as much as accessing an associative hash.

Many readers will realize that applying the inlining technique to dynamic languages such as JavaScript, Python or, in this case, Ruby is very complicated. If you do not see why, look at the following example:

Suppose that you have this ruby code:
class X
  def foo(a,b)
    a+b
  end
end

class Y
  def bar(x)
    i = 100000
    ret = 0
    while (i > 0)
      ret = x.foo(ret,i)
      i = i - 1
    end
    ret
  end
end

p Y.new.bar(X.new) # 5000050000

For now, don't pay attention to the loop code in bar. If we apply the inlining logic to X#foo, the code is converted to:
class Y
  def bar(x)
    i = 100000
    ret = 0
    while (i > 0)
      ret = (
        a = ret
        b = i
        a+b
      ) # x.foo(ret,i)
      i = i - 1
    end
    ret
  end
end

p Y.new.bar(X.new) # 5000050000

But there is a problem in this picture: what happens if we call the same bar method passing an object of a type other than X?
class X2
  def foo(a,b)
    a*b
  end
end
p Y.new.bar(X2.new) # ?
The result will not be correct: the implementation of bar with foo inlined is only valid while the implementation of the method being called is the same one that was used at the moment of inlining it.
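One common way to keep an inlined body valid in the face of polymorphism is a guarded inline: keep the inlined code behind a receiver-class check and fall back to generic dispatch when the guard fails. This is a plain-Ruby sketch of the idea only; it is not how fastruby works (fastruby uses multimethods instead, as the next section explains):

```ruby
class X
  def foo(a, b); a + b; end
end

class X2
  def foo(a, b); a * b; end
end

class Y
  def bar(x)
    i = 100_000
    ret = 0
    while i > 0
      if x.instance_of?(X)
        # guard holds: the inlined body of X#foo is valid
        a = ret
        b = i
        ret = a + b
      else
        ret = x.foo(ret, i) # guard failed: generic dispatch fallback
      end
      i -= 1
    end
    ret
  end
end

p Y.new.bar(X.new)  # => 5000050000
p Y.new.bar(X2.new) # => 0
```

The guard costs one class check per call, but the result stays correct for any receiver type.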

Transparent multimethod and inlining

From the earliest released versions of fastruby, the base concept of multimethods was applied. This means that for a given method there will be as many implementations as possible signatures (signatures given by the argument types, including the type of the receiver object). For example, the method Y#bar is implemented using two functions, one for class X and the other for class X2.

When the method is implemented this way, we know the types of the arguments inside a given function and can use that information to inline the correct method being called.

There will be a version of Y#bar for X:
class Y
  def bar(x) # we know the type of x is X
    i = 100000
    ret = 0
    while (i > 0)
      ret = (
        a = ret
        b = i
        a+b
      ) # x.foo(ret,i) # we know the implementation of X#foo
      i = i - 1
    end
    ret
  end
end
There will be a version of Y#bar for X2:
class Y
  def bar(x) # we know the type of x is X2
    i = 100000
    ret = 0
    while (i > 0)
      ret = (
        a = ret
        b = i
        a*b
      ) # x.foo(ret,i) # we know the implementation of X2#foo
      i = i - 1
    end
    ret
  end
end
And this is correct, because the dispatcher calls the version for X or X2 depending on the type of the argument. The only remaining problem is a change in the implementation of an inlined method (such as X#foo), when somebody does something like this:
class X
  def foo(a,b)
    a + b % 5
  end
end

Open Classes: Observers

The solution to the problem described in the previous paragraph is to rebuild methods containing inlined calls each time the implementation of an inlined method changes. To achieve this I have implemented an observer mechanism on the fastruby method object.
Each time a method is built, observers are created for each inlined method so that the method is rebuilt when any of them is triggered, and each time a method changes, all related observers are triggered.
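The rebuild-on-change idea can be sketched in plain Ruby (hypothetical class and method names for illustration; fastruby's real implementation differs):

```ruby
# Hypothetical sketch: a compiled method keeps a list of callbacks,
# one per method that inlined it, and fires them when it is redefined.
class CompiledMethod
  attr_reader :name

  def initialize(name)
    @name = name
    @observers = [] # callbacks of methods whose inlined copies depend on us
  end

  def on_change(&blk)
    @observers << blk
  end

  def replace! # called when the method is redefined
    @observers.each(&:call) # trigger rebuild of dependent methods
  end
end

foo = CompiledMethod.new(:foo)
rebuilt = []
# While building Y#bar with X#foo inlined, subscribe to X#foo:
foo.on_change { rebuilt << :bar }
foo.replace! # X#foo redefined -> Y#bar must be rebuilt
p rebuilt    # => [:bar]
```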




Wednesday, January 18, 2012

Fastruby 0.0.17 released: performance improvements

Fastruby is a gem that executes Ruby code much faster than normal. It is currently in transition from a spike to a usable gem, and it is released with incremental improvements whenever possible.

The main improvements in fastruby v0.0.17 are optimizations:
  • Direct calls to CFUNC methods on ruby1.9 (speedup for native methods like Fixnum#+, Fixnum#>, etc.)
  • Refactored the Ruby-to-C translation to use inlined code instead of anonymous functions for each expression, avoiding unnecessary C functions
  • Improved the non-local jumps return and next to avoid setjmp when possible by using normal gotos; this affects performance on both ruby1.9 and ruby1.8
  • Implemented scopes on the C stack instead of the heap when possible
Plus bug fixes. The overall result of the optimizations (especially the one relocating scopes onto the native stack) is a great time improvement on benchmark1.rb, for which fastruby takes about 20-30% less time than ruby1.9.


Install

You can clone the repository at github:
git clone git://github.com/tario/fastruby.git
git checkout v0.0.17
Or install it using gem install:
gem install fastruby 

Placing Ruby scopes on native stack instead of heap

Initially, at the very beginning of the project, the natural translation of Ruby to C led us to translate Ruby local variables as C local variables. But this implementation is wrong, since the Ruby stack can diverge into a tree while proc and continuation objects can potentially hold references to local variable scopes (e.g. lambdas can retain references to scopes, reading and writing them while other scopes "at the same level" are created in another branch). Many releases ago I solved this problem by implementing a structure that supports this "branching" behavior of Ruby scopes, the "stack chunk" (see Fastruby 0.0.8 released with support for lambdas and Callcc puzzle on fastruby IV: Execution stack as a DAG, lambda cases for more details).

This implementation, while allowing the creation and use of lambdas and continuation objects, implies a severe performance penalty, forcing all code to access local variables on the heap; heap access is slower than stack access, for reasons that won't be explained here.

The key issue here is: why can't the local variables of a Ruby scope live on the native stack? The reason is that references to the scope can potentially be created. Example:
def foo(a)
  proc do
    a = a + 1
  end
end

pr = foo(10)
p pr.call # 11
p pr.call # 12

In the example, a proc is created with a reference to the scope created for the method call. This scope must be allocated on the heap.

What about this?:

def foo(a)
  ret = 0
  a.each do |x|
    ret = ret + x
  end

  ret
end

p foo([1,2,3]) # 6

An innocent iteration over an array, right?
NO. Imagine this:
def foo(a)
  ret = 0
  a.each do |x|
    ret = ret + x
  end
  ret
end

class X
  def each(&blk)
    $a = proc(&blk)

    blk.call(1)
    blk.call(1)
    blk.call(1)
  end
end
x = X.new
p foo(x) # 3
p $a.call(1) # 4

Any block passed to any method can be used to create a lambda, so any method making block calls can potentially allow the creation of lambdas referencing the scopes created for those methods.
On the other hand, it is virtually impossible to create lambdas referencing the scopes of methods that make no block calls. So, since these scopes cannot be referenced by lambdas, methods without block calls could allocate their scopes on the stack... if not for

Continuations

Continuations are the most macabre and wonderful feature of Ruby. Continuations use setjmp/longjmp to work and also write directly to the native stack to restore execution data stored on it.
C local variables on the native stack are overwritten, each time a continuation is called, with the values they had at the moment the continuation was created using callcc. This behavior is wrong for Ruby variables, so Ruby variables allocated on the native stack would lead to unexpected behavior.
For example, this innocent loop would fail if the variables are allocated on stack:
require "continuation"

def foo
  a = 10
  callcc {|cont| $cont = cont}
  a = a - 1
  p a
  $cont.call if a > 0
end

When the scope of local variables is allocated on the heap, the method foo works as expected, displaying a countdown on standard output. But when the scope is allocated on the native stack, the first call to callcc copies the entire stack, including the variable a; each time the continuation is called to iterate the loop, the value of a is restored to the value it had at the moment the continuation was created (10). The result is an infinite loop displaying 9 all the time.

The key problem here is reading a local variable after a continuation has been created: after that point, all local variables allocated on the stack will have the values they had at the moment the continuation was created. So, reading a local after creating a continuation may result in unexpected behavior. In summary: any method where the sequence continuation -> variable read can occur may work in unexpected ways.

Converting Syntax trees to Graphs

The only way to detect potential continuation -> variable read sequences is to generate a graph from the syntax tree of the method being analyzed and then search for the sequence in the graph.

The graph is composed of the nodes of the AST as vertices, and an edge represents the possibility of executing a given destination node after an origin node. For example, the following source code:
def foo(a)
  if a.to_i == 0
    10
  else
    20
  end
end
Corresponds to the following syntax tree: (figure omitted)

And generates the following graph: (figure omitted)

In this case, there is no read of a local variable after a call on any path, so the method in the example can allocate its local variables on the native stack.
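The search described above can be sketched in plain Ruby. Here the graph is a hand-built adjacency hash and the node kinds (:call, :lvar, etc.) are given separately; both are stand-ins for what the real analyzer would derive from the AST:

```ruby
# Returns true if any path in the graph goes from a :call node to an
# :lvar node, i.e. a local might be read after a continuation is created.
def path_from_call_to_lvar?(edges, kinds)
  call_nodes = kinds.select { |_node, kind| kind == :call }.map(&:first)
  call_nodes.any? do |start|
    seen  = []
    stack = [start]
    while (node = stack.pop)
      next if seen.include?(node)
      seen << node
      succs = edges[node] || []
      return true if succs.any? { |s| kinds[s] == :lvar }
      stack.concat(succs)
    end
    false
  end
end

# The `while` example below: the condition reads a (lvar), the body
# calls b.foo and decrements a, and the loop edge returns to the condition.
kinds = { :cond => :lvar, :call_foo => :call, :decr => :lvar }
edges = { :cond => [:call_foo], :call_foo => [:decr], :decr => [:cond] }

p path_from_call_to_lvar?(edges, kinds) # => true: heap allocation needed
```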

On the other hand, the following method cannot use the native stack:
def foo(a,b)
  while (a > 0)
    b.foo
    a = a - 1
  end
end
Syntax tree: (figure omitted)

Graph: the use of while makes a cyclic path in the graph possible. Beyond that, there are many paths that make native-stack storage for locals prohibitive; they are marked in blue, green and orange: (figure omitted)

Note that all the prohibitive paths in the graph go from a call node to an lvar node. Of course, the method in the while example must be implemented with its local variable scope allocated on the heap.

Viewed this way, almost any method of normal complexity will fall into the "heap" category, since the only requirement is having at least one prohibitive path (from a call node to an lvar node) in the graph, and for now this is true. But this algorithm will gain greater relevance when the analyzer knows a priori which calls can potentially create a continuation and which calls never will (the latter can be ignored). Today, this improvement saves 20-30% of time compared with MRI 1.9.

For example, in the while code sample the called methods are "foo", ">" and "-". If the analyzer could safely assume that these calls will never create a continuation... > and - on numerics don't create continuation objects, but foo? Of course this will be a little more complicated, and the optimizations around it will probably involve type inference and qualifiers.



Thursday, November 10, 2011

Migrating everything to ruby1.9

Since the warning I got from an excellent talk by @yhara at @RubyConfAr: ruby 1.9 is the present, while ruby 1.8 is the past. The next step regarding my projects is to ensure that everything works on ruby 1.9 (while keeping compatibility with ruby1.8 at the same time).
This is the status of the more important projects.

ImageRuby

All tests in the ImageRuby spec pass with 1.9.2-p180, thanks to a contributed fix from amadanmath. Of course, I will be listening for bug reports regarding ruby1.9 compatibility.

Shikashi (and dependencies)

All tests in the Shikashi spec pass with 1.9.2-p180. Ruby 1.9-specific syntax is not covered by the spec; I have created a new issue for this (the issue belongs to the partialruby gem).

To make shikashi fully compatible with ruby1.9, I must find a replacement for RubyParser, which is not compatible with some ruby 1.9-specific syntax; for example, RubyParser is unable to recognize the new ruby 1.9 hash syntax.

Fastruby

This is the most complex case, these are the remaining issues:
  • The main headers of fastruby won't compile, since these sources use syntax not supported by ruby1.9
  • Use of C API specific to ruby1.8
  • Remove support for syntax specific to ruby1.8 (only when running under ruby1.8)
  • Add support for syntax specific to ruby1.9 (only when running under ruby1.9)
  • RubyInline does not work with ruby1.9
  • RubyParser does not work with ruby1.9

I will fork or find a fork of RubyInline to fix the issues related to ruby1.9 compatibility. For the case of RubyParser (which affects shikashi too), I will replace the parsing with Ripper, the native out-of-the-box parsing implementation in ruby1.9.
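For illustration, Ripper (bundled with ruby 1.9 and later) parses the new hash syntax out of the box:

```ruby
require "ripper"

# The 1.9 hash shorthand that RubyParser rejected at the time.
sexp = Ripper.sexp("{ a: 1 }")

# Ripper returns a nested-array s-expression rooted at :program;
# nil would indicate a parse error.
p sexp.first # => :program
```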

Tuesday, November 1, 2011

Fastruby 0.0.15 released. Callcc and continuation objects!

Fastruby is a gem that executes Ruby code much faster than normal. It is currently in transition from a spike to a usable gem, and it is released with incremental improvements whenever possible.
The v0.0.15 release adds support for continuation objects created using the callcc function. To achieve this, the implementation uses the non-linear stack created in previous releases.
This release also adds a few extra language-support improvements, such as default method arguments.

Install

You can clone the repository at github:
git clone git://github.com/tario/fastruby.git
git checkout v0.0.15
Or install it using gem install:
gem install fastruby

New Features

Fixes

Examples of code being supported

Example of continuation calls, tail recursion


require "fastruby"

fastruby '
class X
  def fact( n )
    a = callcc { |k| [ n, 1, k ] }
    n = a[0]
    f = a[1]
    k = a[2]
    if ( n == 0 ) then return f
    else
      k.call n-1, n*f, k
    end
  end
end
'

p X.new.fact(6) # 720

Default Arguments
require "fastruby"

fastruby '
class X
  def foo(a,b = a.reverse)
    a + b
  end
end
'

p X.new.foo("13") # 1331
p X.new.foo("xx", "yy") # xxyy

Passing proc objects as blocks
require "fastruby"

fastruby '
class X
  def bar
    yield(1)
    yield(2)
    yield(3)
  end

  def foo
    pr = proc do |x|
      p x
    end

    bar(&pr) # passing proc as a block
  end
end
'

X.new.foo

Receiving blocks as proc objects
require "fastruby"

fastruby '
class X
  def bar(&pr)
    pr.call(1)
    pr.call(2)
    pr.call(3)
  end

  def foo
    bar do |x|
      p x
    end
  end
end
'

X.new.foo


Sunday, October 23, 2011

Fastruby 0.0.14 released with support for method replacement

Fastruby is a gem that executes Ruby code much faster than normal. It is currently in transition from a spike to a usable gem, and it is released with incremental improvements whenever possible.
The v0.0.14 release makes an internal redesign of the method hash structure to allow storing pointers to function pointers. This is what allows referencing a fixed function pointer whose target may change, enabling method replacement while retaining most of the performance (fastruby calls now incur an extra indirection and a few extra checks).
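A plain-Ruby analogy of the pointer-to-function-pointer scheme (a sketch of the concept only, not fastruby's C implementation): call sites capture a fixed slot and dereference it on every call, so swapping the slot's target replaces the method everywhere at once, at the cost of one extra indirection.

```ruby
# A slot stands in for the fixed "pointer to a function pointer".
Slot = Struct.new(:fn)

slot = Slot.new(->(a) { a + 1 })      # current implementation
call_site = ->(a) { slot.fn.call(a) } # extra indirection on every call

p call_site.call(4) # => 5
slot.fn = ->(a) { a * 2 }             # method replacement: swap the target
p call_site.call(4) # => 8
```

The call site never changes; only the slot's target does, which mirrors how a fixed function-pointer pointer lets compiled call sites survive method replacement.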

Install

You can clone the repository at github:
git clone git://github.com/tario/fastruby.git
git checkout v0.0.14
Or install it using gem install:
gem install fastruby

New Features


Examples of code being supported

Replacement of methods

require "fastruby"

fastruby '
class X
  def foo(a)
    a+1
  end
end
'

x = X.new
p x.foo(4) # 5

fastruby '
class X
  def foo(a)
    a*2
  end
end
'

p x.foo(4) # 8


Remaining uncovered items in the v0.0.14 release:
  • Memory handling of function pointers in the method hash (will be fixed for v0.0.15)
  • Fix the limitation of 15 arguments when calling a method defined in fastruby from normal ruby
  • Fix duplication of objects in the cache when methods are replaced
  • Optimization of splat calls; currently these calls are executed using normal rb_funcall2 (will be improved for v0.2.0)
  • Optimization of calls to methods with a variable number of arguments; currently they are wrapped using rb_funcall (will be improved for v0.2.0)

Sunday, October 9, 2011

Fastruby 0.0.13 released with support for splat arguments and method with array arguments

Fastruby is a gem that executes Ruby code much faster than normal. It is currently in transition from a spike to a usable gem, and it is released with incremental improvements whenever possible.
The v0.0.13 release of fastruby makes a small internal design improvement regarding the translator class and implements splat arguments and methods with optional arguments (see examples below).

Install

You can clone the repository at github:
git clone git://github.com/tario/fastruby.git
git checkout v0.0.13
Or install it using gem install:
gem install fastruby

New Features



Examples of code being supported

Multiple arguments and call with splat

require "fastruby"

fastruby '
class X
  def foo(*array)
    array.each do |x|
      p x
    end
  end

  def bar(*args)
    foo(*args)
  end
end
'

x = X.new
x.foo(1,2,3) # prints 1,2 and 3
x.bar(4,5,6) # prints 4,5 and 6

Remaining uncovered items on v0.0.13 release

Sunday, October 2, 2011

Fastruby 0.0.11 released with support for retry, redo and foreach

Fastruby is a gem that executes Ruby code much faster than normal. It is currently in transition from a spike to a usable gem, and it is released with incremental improvements whenever possible.
The v0.0.11 release of fastruby makes internal improvements regarding non-local jumps and implements for-each support, including the retry and redo statements (both applicable to any kind of block, and retry also applicable in rescue clauses).

Install

You can clone the repository at github:
git clone git://github.com/tario/fastruby.git
git checkout v0.0.11
Or install it using gem install:
gem install fastruby

New Features


Examples of code being supported

For each with redo and retry

require "fastruby"

fastruby '
class X
  def foo
    sum = 0

    for i in (4..8)
      p i # 4 4 4 4 4 5 6 7 4 5 6 7 4 5 6 7 4 5 6 7 8
      sum = sum + i
      redo if sum < 20
      retry if sum < 100 and i == 7
    end

    sum
  end
end
'

x = X.new
x.foo # 46


Wednesday, September 28, 2011

Fastruby 0.0.9 released with support for proc

Fastruby is a gem that executes Ruby code much faster than normal. It is currently in transition from a spike to a usable gem, and it is released with incremental improvements whenever possible.
Following the release of version v0.0.8, which implemented several internal improvements including a new non-linear stack structure, the new release v0.0.9 implements support for procedure objects based on the previous implementation.

Install

You can clone the repository at github:
git clone git://github.com/tario/fastruby.git
git checkout v0.0.9
Or install it using gem install:
gem install fastruby

New Features


Examples of code being supported

return method from proc

require "fastruby"

fastruby '
class X
  def foo
    pr = Proc.new do
      return "return foo from inside proc"
    end

    pr.call

    return "return foo"
  end
end
'

x = X.new
x.foo # this returns "return foo from inside proc"

The behavior of Proc objects (those created using Proc.new) is very similar to the behavior of lambdas, but with a few small differences:
  • It is possible to do a non-local return from a Proc.new block out of the method where the block was defined; this non-local return fails with LocalJumpError when the method scope is already closed at the moment the return is invoked (in lambdas, return only acts as next; i.e., it only ends the lambda)
  • Break jumps are illegal from Proc.new blocks and always fail with LocalJumpError (in lambdas, break acts as next)
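The return difference can be demonstrated in plain ruby (no fastruby needed):

```ruby
def with_proc
  pr = Proc.new { return :from_proc }
  pr.call        # non-local return: ends with_proc right here
  :unreachable
end

def with_lambda
  l = lambda { return :from_lambda } # return acts like next
  l.call
  :after_lambda
end

def make_orphan
  # the defining scope is closed once make_orphan returns
  Proc.new { return :x }
end

p with_proc   # => :from_proc
p with_lambda # => :after_lambda
begin
  make_orphan.call
rescue LocalJumpError
  puts "LocalJumpError: the scope is already closed"
end
```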

As an additional note, block objects created using the proc method are exactly the same as block objects created using the lambda method; just check the ruby 1.8 source code to see something like this:

   ...
rb_define_global_function("proc", proc_lambda, 0);
rb_define_global_function("lambda", proc_lambda, 0);
...





Monday, September 19, 2011

Fastruby roadmap up to 0.5.0

Fastruby development (up to the 1.0.0 release) will be divided into two main phases:
  • Incremental proof of concept: transition from a base spike/PoC (version 0.0.1) to a usable gem (version 0.1.0).
  • Usable versions: transition from a minimally usable gem (version 0.1.0) to a stable API release (version 1.0.0).
This is a rough roadmap and might change in the future if new ideas or optimizations are discovered.

PoC Incremental Versions


Version 0.0.8: lambdas (already released)
Version 0.0.9: proc
Version 0.0.10: Gem packaging fixes

Version 0.0.11: for, retry, redo
Version 0.0.12: Full exception support

Version 0.0.13: Methods with variable number of arguments

Version 0.0.14: Support for method replacement

Version 0.0.15: continuation objects


Etc...

Version 0.1.0: First usable release
  • Any ruby code executable by MRI can be executed by fastruby

Usable Versions (Following semantic versioning)

Zero as the major version number is used while the API changes every day without backward compatibility (despite the fact that there will be backward compatibility when possible). Also, in 0.X.X versions the API is completely undefined.
Bugs and spec un-compliance of 0.1.0 release will be fixed on subsequent stabilization releases (0.1.1, 0.1.2, etc...)

Version 0.1.0: Execution of Ruby

Features:
  • Any executable code in ruby is executable via fastruby and gives the same results
  • Interoperation between ruby and fastruby is granted

Version 0.2.0: Fast execution of ruby, internal optimizations

Features
Version 0.3.0: Fast execution of ruby, C extension API

  • C extension points
    • API to define methods in C, by class, method name and signature allowing replacement of ruby standard lib implementation in the future.

Version 0.4.0: Fast execution of ruby, C extension API 2

  • C extension points
    • Macro methods (keeping dynamism). Examples:
      1. Fixnum#+(x:Fixnum) => INT2FIX(FIX2INT(self)+FIX2INT(x))
      2. String#[](x:Fixnum) => rb_str_new(RSTRING(self)->ptr + FIX2INT(x), 1)
      3. String#[]=(x:Fixnum,y:Fixnum) => RSTRING(self)->ptr[FIX2INT(x)]=FIX2INT(y)

Version 0.5.0: Faster standard lib

  • Ruby standard lib re-implemented in a separated gem optimized for fastruby
  • Fastruby works with and without optimized stdlib

Fastruby 0.0.8 released with support for lambdas

Fastruby is a gem that executes Ruby code much faster than normal. It is currently in transition from a spike to a usable gem, and it is released with incremental improvements whenever possible.
Version v0.0.8 was released with the new feature of supporting lambdas. The functional improvement does not appear to be important, but the underlying internal implementation will allow new features in the next releases, such as support for proc and continuation objects.

Install

You can clone the repository at github:
git clone git://github.com/tario/fastruby.git
Or install it using gem install:
gem install fastruby
New Features
Examples of code being supported

Scope retention (lambda has a reference to the scope where it was created)

require "fastruby"

fastruby '
class X
  attr_accessor :lambda_set, :lambda_get
  def initialize
    a = nil

    @lambda_set = lambda{|x| a = x}
    @lambda_get = lambda{a}
  end
end
'

x = X.new

x.lambda_set.call(88)
p x.lambda_get.call

Yield block of the scope

require "fastruby"

fastruby '
class X
  def bar
    lambda_obj = lambda{
      yield
    }
    lambda_obj
  end

  def foo
    a = 77
    bar do
      a
    end
  end
end
'

x = X.new

lambda_obj = x.foo
p lambda_obj.call # 77


Friday, September 16, 2011

Callcc puzzle on fastruby IV: Execution stack as a DAG, lambda cases

In order to implement lambdas and continuations (in the near future), a new structure to represent the stack was implemented; it will be available in the 0.0.8 release of fastruby as the base of lambda support. Call with current continuation will be implemented in 0.0.9.

The implementation will cover two main cases:
  • Normal linear stack (keeping performance when possible)
  • Non-linear stack (DAG), used when lambdas are created
The case of continuations will remain uncovered for now; the storage of the native stack is probably the most important outstanding issue to solve in order to implement continuations.

New Stack Structure


To represent a Ruby stack, a new structure was created: StackChunk. A stack chunk is a representation of a linear stack; allocating and de-allocating scope memory from it is far simpler than dynamic memory allocation (e.g. malloc).

Since the Ruby stack is non-linear, StackChunks are used as the longest linear sections and are chained following a DAG pattern. This is like the internal representation of frames used by MRI, with the difference that a stack chunk may contain many scopes, even all of them when representing a stack without divergences; in those cases a single dynamic memory allocation may be enough for all the scopes needed by the execution.

Three main cases

There are three main cases of interest where the new structure has relevance

Case 1: normal linear stack, single stack chunk

The StackChunk (like the native stack) is an entity that grows linearly and dynamically. So, to represent a linear stack, only one stack chunk is needed, and it grows dynamically as new scopes are allocated.

The stack chunk is created the first time a fastruby method creates a scope. At an arbitrary point, the stack looks like this:



When a Ruby method (built with fastruby) is called, a scope to hold local variables (and other information, such as the passed block) is created; the memory where this scope lives is provided by the stack chunk. There is no need to allocate Ruby objects, nor even dynamic memory in most cases. So, the linear stack chunk grows as the native stack grows:



Empty circles represent Ruby objects referenced by local variables; the native stack holds native C pointers to the scopes allocated in the stack chunk.

Since local variables live in these stack chunks instead of on the native stack, they can survive beyond the C frame where they were created.

Case 2: Lambdas, frozen and multiple stack chunks to represent DAG stack

Each time a lambda object is created, fastruby attaches a Ruby reference to the StackChunk in use at the moment of creating the lambda. Also, the state of that StackChunk and all its parents is changed to frozen: a frozen stack chunk cannot create new scopes, in order to keep alive all the scopes existing at the moment it was frozen, but the variables living in it may still change their values.
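A hypothetical plain-Ruby sketch of this freezing behavior (the real stack chunk is a C structure; the class and method names here are invented for illustration):

```ruby
# Sketch: a chunk hands out scopes linearly; freezing it (and its parent
# chain) blocks new scope allocation but still allows writes to the
# locals already living in it.
class StackChunk
  attr_reader :parent

  def initialize(parent = nil)
    @scopes = []
    @parent = parent
    @frozen = false
  end

  def push_scope(locals = {})
    raise "frozen chunk: allocate in a new chunk" if @frozen
    @scopes << locals
    locals
  end

  def freeze_chain! # called when a lambda captures this chunk
    chunk = self
    while chunk
      chunk.instance_variable_set(:@frozen, true)
      chunk = chunk.parent
    end
  end

  def frozen_chunk?
    @frozen
  end
end

root  = StackChunk.new
scope = root.push_scope(:a => 1)
root.freeze_chain!           # a lambda captured the chunk
scope[:a] = 2                # writes to existing locals still allowed
child = StackChunk.new(root) # new scopes go into a fresh chunk
child.push_scope(:b => 3)
```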



For example, if a lambda is created at a given stack point, then the stack rolls back a few frames and new ones are created (e.g. by calling other methods), the state will be something like this:


Red arrows represent Ruby references between Ruby objects; these references are created only to keep the related stack chunks alive through garbage collection. Ruby lambda objects store a reference to the stack chunk as of the moment the lambda was created. To handle the dynamic memory used by the stack, it is wrapped (gray arrows) by a Ruby object of class StackChunk; so the memory is freed when the garbage collector disposes of the object once it has no references.

Black arrows indicate native C pointers: the native stack holds native local variables pointing to scopes (representations of Ruby scopes, including Ruby local variables) living in the native heap allocated by the stack chunk.

Green arrows indicate native references (VALUE) to Ruby objects (represented by empty circles); to avoid these objects being disposed of by the GC, the stack chunk must implement the mark method and mark each object referenced this way. These values can change in both frozen and active stack chunks.

Case 3: Continuation objects (NOT IMPLEMENTED YET)

The handling of dynamic and DAG stack chunks is solved the same way as in the lambda case: each time a continuation object is created (using the callcc Ruby method), a reference to the stack chunk in use at that moment is attached to it, to prevent that stack chunk from being disposed of by the GC.
The new problem to overcome is recording what would be the representation of Ruby frames in fastruby; these representations are native local variables living on the native stack (including the most important one, the pointer to the scope). Since the native stack is linear, it discards its data as it shrinks and overwrites the old data when it grows back (e.g. when another method is called).
Maybe the solution in this case is to save the native stack together with the stack chunk and restore it each time a continuation object is called, but that is a topic for another article.



Wednesday, September 7, 2011

Callcc puzzle on fastruby III: Almost solved?

First of all, the plan to migrate to C++ was canceled; instead, the Ruby garbage collector will be used to handle the resources used by the fastruby stack entities.

The cost of instantiating objects on ruby

Since the Ruby garbage collector works only with Ruby objects, I need to encapsulate as Ruby objects whatever I want to be managed by the GC. The first option that comes to mind is to encapsulate every local variable scope in a Ruby object: bad idea. This would instantiate a Ruby object for each scope, and the cost of instantiating objects in Ruby is of a higher order than a single memory allocation in C. The problem with this approach can be appreciated in a spike which adds an instantiation of a Ruby object (using rb_objnew) for each scope to see the impact on overall performance, measured using benchmark scripts; the result was a 3X increase in the execution time of the main benchmark, enough to reject the approach of one Ruby object per locals scope.

Stack Chunk

To solve the issue of multiple Ruby object allocations, multiple locals scopes will be grouped and encapsulated as a single Ruby object, specifically those belonging to the same stack. Stack chunks will grow dynamically (using C malloc) as locals scopes are allocated, and all the dynamic memory used by a stack chunk will be de-allocated when it is collected by the Ruby GC.
For the cases of lambdas, procs and continuations, StackChunk objects will be referenced from a Ruby local variable in the Ruby scope, so lambdas duplicating Ruby stacks will make new references to stack chunks. Stack chunks will become immutable/frozen when lambdas or continuation objects are created, to prevent overwriting local variables with new scopes; local variables living in immutable/frozen stack chunks may still be changed, but frozen chunks reject the allocation of new scopes.

Stack Chunk properties
  • All local variables will be placed/allocated in the current stack chunk when the function frame is initialized
  • Each stack chunk is wrapped as a ruby object of type StackChunk to give the ruby garbage collector the interface (Data_Wrap_Struct) to release the chunk from memory when it has no references.
  • The StackChunk will be referred to from the local variable scope of MRI (Matz Ruby Interpreter) to prevent the object from being collected while it is still in use (e.g. until leaving the scope where the stack chunk was created). In the internal implementation of MRI, lambdas, procs and continuation objects retain references to MRI stack frames, so the issue of lambdas and continuations is solved (in part) by this.
  • Callcc, lambdas and procs will freeze the current stack chunk, which implies that new scopes can't be created in it; when new scopes are required, a new stack chunk will be created
Implementation

See https://github.com/tario/fastruby/commits/dynamic_locals

martes, 6 de septiembre de 2011

Callcc puzzle on fastruby II: The return of C++

In a previous post, I explained the problem of implementing callcc while keeping fastruby's performance; the focus then shifted to a simpler issue: lambdas.

Approaching the lambda issue allowed us to recognize the non-linear nature of the ruby stack: while the native stack grows by allocating space for local variables and shrinks by deallocating it, the non-linear ruby stack allows keeping isolated local variable scopes alive for lambda blocks (while previous scopes in the stack are deallocated), and even allows retaining all the scopes of a stack for continuation objects created with callcc.

The solution for the lambda case is as simple as changing the place where the local variable scope lives: the heap instead of the stack.

Making use of dynamic memory is not a game; misused, it may hurt or even kill (the ruby process, after a relatively slow death from cancer).


Release of allocated memory and C++

Each allocation must have its corresponding de-allocation. It's that simple. But not so simple when programming in C, because [...]
RAII, provided by C++, makes it easier to code scopes of resource initialization and finalization (dynamic memory, in this case) by using class destructors to release the resources; this covers function returns and C++ exceptions. It does not cover longjmp (analyzed later in this post). See the wikipedia article about RAII for more info and code examples.

Reference Counting

Immediately after creating a lambda object, the local variable scope is referenced twice: one reference from the lambda object itself, and one from the scope living in the stack (each additional lambda created in the same scope adds one more reference to the count). In this situation, when execution leaves the function's scope it must not de-allocate the local variables struct; it should only decrement the reference count by one, and the release should happen only when the count reaches zero. This algorithm is called Reference Counting.
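The counting scheme above can be sketched in Ruby for clarity (the real bookkeeping would live in the generated C code; names are illustrative):

```ruby
# Sketch of the reference-counting scheme: the frame holds one reference,
# each capturing lambda holds another, and the scope is released only
# when the last reference goes away.
class CountedScope
  attr_reader :refcount

  def initialize(vars)
    @vars = vars
    @refcount = 1          # the stack frame itself holds the first reference
    @released = false
  end

  def retain               # e.g. a lambda capturing this scope
    @refcount += 1
    self
  end

  def release              # leaving the frame, or a lambda being collected
    @refcount -= 1
    @released = true if @refcount.zero?   # only the last release frees it
  end

  def released?
    @released
  end
end

scope = CountedScope.new(:a => 32)
scope.retain                 # a lambda is created inside the scope
scope.release                # execution leaves the function
puts scope.released?         # => false, the lambda still needs the locals
scope.release                # the lambda is collected
puts scope.released?         # => true, now the scope can be freed
```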

Longjmp and C++ Raii

Using the C functions setjmp/longjmp for non-local gotos is incompatible with C++ RAII, since RAII depends on the destructors of stack-allocated objects being called. For example, a longjmp crossing a scope that allocates resources will prevent the release of those resources.
Apparently, the solution to this issue is to replace the current use of setjmp/longjmp with C++ exceptions, but after a spike I realized this might not be a good idea:

test.c using longjmp and setjmp
#include <setjmp.h>

jmp_buf jmp;

void function2() {
    longjmp(jmp, 1);
}

void function1() {
    if (setjmp(jmp) == 0) {
        function2(); /* jumps back to the setjmp above, which then returns 1 */
    }
}

int main() {
    int i;
    for (i = 0; i < 200000000; i++) {
        function1();
    }
    return 0;
}

It took 2.241 seconds

test.cpp using try/catch and throw
#include <exception>

void function2() {
    throw std::exception();
}

void function1() {
    try {
        function2();
    } catch (std::exception&) {
    }
}

int main() {
    int i;
    for (i = 0; i < 200000000; i++) {
        function1();
    }
    return 0;
}

It never finished, even after hours

Conclusion

C++ exceptions will always be slower than setjmp/longjmp, because C++ exceptions use non-local gotos as part of their implementation, among other complexities. I must look for another workaround for the longjmp problem


lunes, 5 de septiembre de 2011

Callcc puzzle on fastruby

Callcc, or "call with current continuation", is the implementation of coroutines in ruby, inherited from other languages such as Scheme and Lisp. Callcc allows the creation of continuation objects which hold context information (stack frame position, local variables of the entire stack, etc.) and which can be called to make a non-local goto jump to that context, like the longjmp and setjmp functions in C (see this explanation of callcc in ruby for more info).
The difference between callcc/Continuation#call and setjmp/longjmp is that callcc allows jumping forward, backward in the stack and even to a side stack, while setjmp/longjmp only allows jumping backward. This implies that while setjmp/longjmp works fine with a linear stack, callcc does not, and needs a non-linear stack: a complex dynamic tree structure instead of a single chunk of memory that grows as needed (the native stack).
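A minimal backward jump with callcc looks like this (callcc is built into ruby 1.8; on 1.9+ it lives in the continuation stdlib):

```ruby
# callcc is built into ruby 1.8; on newer rubies it must be required.
begin
  require 'continuation'
rescue LoadError
  # ruby 1.8: callcc is already available
end

# The first time, callcc returns the continuation object itself; calling
# it later re-enters that point, making callcc "return" nil instead,
# much like longjmp returning to a saved setjmp.
count = 0
cont = callcc { |cc| cc }
count += 1
cont.call(nil) if cont   # jump backward: execution resumes at callcc
puts count               # => 2, the line after callcc ran twice
```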

Maybe something easier: lambdas

For example, on released version 0.0.7, code like this shouldn't work:
require "fastruby"

fastruby '
  class X
    def foo
      a = 32
      lambda {|x|
        a+x
      }
    end
  end
'

p X.new.foo.call(16) # will it return 48, or will it fail?

But it works! Why? The current implementation allocates scopes for ruby local variables as a local variable of a struct type in C, so the locals live in the native stack at a fixed address, which is passed as a block parameter when calling lambda (using rb_funcall, etc.). That memory address stays associated with the block passed to lambda, and is later used from inside the block like in a normal block invocation. The problem is that the scope struct used from the lambda block is stale and no longer valid, since that stack space is below the stack pointer.

To obtain a failing example (in the grand tradition of TDD ;) ), we must call any other method after calling the method that returns the lambda and before calling the lambda:

require "fastruby"

fastruby '
  class X
    def foo
      a = 32
      lambda {|x|
        a+x
      }
    end

    def bar
    end
  end
'

lambda_object = X.new.foo
X.new.bar # this should not affect the lambda object, since it is not related...
lambda_object.call(16) # ...but it does, and the call does not return 48 as expected

The call to the lambda does not return 48 as expected; in fact, it could return any unpredictable value, even an invalid object causing a segmentation fault. The reason is that the call to X#bar overwrites the stale stack space used by the local scope associated with the lambda block; at that point, the value of the "local variable" may be any unpredictable garbage left in the stack.

Make it pass: alloc locals on the heap using malloc

And it passes, simply by moving the allocation of local variable scopes to the heap instead of the stack. Memory allocated on the heap never becomes stale... but it must be de-allocated, or the ruby process will die young of cancer. It is late at night and I will not write a test to assert that ruby does not die of cancer when six billion objects are allocated, but the de-allocation of local variable scopes will surely be done as a refactoring task.

And the scope deallocation is the real key issue here

But it is a topic for another post...


domingo, 28 de agosto de 2011

Fastruby v0.0.6 (PoC) released

Fastruby is a gem which allows executing ruby code faster than normal (about 20X faster than MRI 1.8); see this post for more info.

Improved ruby support

In the first released gem version of fastruby (v0.0.1), ruby support was very limited, since that version was designed as a spike (e.g. it couldn't handle block calls, exceptions, break, etc.).
The ruby support improvements are based on frame structures used at runtime and some other tricks, such as non-local gotos to implement return, next and break; this makes the generated code a bit more expensive, but still 20X faster (instead of the 100X of the first version).

Now (version 0.0.6) fastruby supports a wide range of ruby constructs, such as:
  • Exceptions (begin, rescue, ensure, end)
  • Classes and modules
  • Constants, globals
  • Flow control statements: if, unless, and case..when..end
  • Literals
  • Singleton methods (almost as slow as normal ruby)
  • Ruby built-in gotos (break, return, next)
But, for now, it leaves out the following:
  • Class variables (*)
  • Methods with a variable number of arguments (*)
  • Callcc (*)
(*) An issue has been created for each of these tasks

Native object cache to reduce bootstrap overhead

Usually, executing ruby code with fastruby implies parsing the code, translating it to C, and compiling it with gcc.
Even small code snippets may take a few seconds, a lot of time compared with executing the same code on MRI, which takes hundredths of a second; imagine what would happen with larger code...
The answer to this issue is the implementation of a cache, transparent to the invoker. In the image, a script is executed for the first time, taking 0.642 seconds; when the same script is executed again, it takes only 0.057 seconds. The script test.rb is a very simple test:
require "fastruby"

fastruby '
print "hello world\n"
'

The cache is located in $HOME/.fastruby, and the cache feature can be deactivated by setting the environment variable FASTRUBY_NO_CACHE to 1 when executing a ruby script that uses fastruby.

Implementation details of cache

Each code snippet has an associated SHA1 sum, and each SHA1 has a collection of native libraries, including both the main object and the builds of the methods defined in that snippet.
The diagram is only a conceptual model and does not represent any entity in the fastruby source code; the SHA1 association is implemented on the filesystem by saving each object collection in a directory named after the hexadecimal SHA1 of the corresponding code snippet (inspired by git internals :D )
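The key derivation can be sketched in a few lines (the directory layout below is illustrative only; the real cache lives under $HOME/.fastruby with its own structure):

```ruby
require 'digest/sha1'
require 'fileutils'
require 'tmpdir'

# Sketch of the cache layout: each snippet maps to a directory named
# after its SHA1; the compiled objects for that snippet would go inside.
snippet = 'print "hello world\n"'
key = Digest::SHA1.hexdigest(snippet)   # same snippet => same key => cache hit
dir = File.join(Dir.tmpdir, "fastruby-cache-demo", key)
FileUtils.mkdir_p(dir)                  # compiled .so objects would be stored here
puts key.length                         # => 40 hexadecimal characters
puts File.directory?(dir)               # => true
```

Because the key depends only on the snippet text, re-running an unchanged script finds its compiled objects already on disk and skips the gcc step entirely.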




Links

Fastruby github page: https://github.com/tario/fastruby
Previous post on fastruby: http://tario-project.blogspot.com/2011/07/fastruby-v001-poc-released.html

sábado, 30 de julio de 2011

Fastruby v0.0.1 (PoC) released

Fastruby is a gem which allows executing ruby code faster than normal (about 100X faster than MRI 1.8)

Fastruby IS NOT a separate ruby interpreter, so the design is simple

Fastruby IS NOT a DSL to generate C code using ruby, nor a Ruby-to-C translator: the goal of fastruby is to execute RUBY code

Core Concepts

Native build


All code processed by fastruby ends up with a native representation. In the current version, this is accomplished using RubyParser to parse the ruby code;
the ruby code is then translated to C and processed with RubyInline

Transparent multimethods (and multiblocks)

Methods processed by fastruby have multiple internal implementations depending on the types of the arguments.
Each possible signature has its own version of the method, and these are built at runtime when the method is called with a new signature.
The same concept will be applied to blocks (anonymous methods) in future releases
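The dispatch idea can be modeled in plain Ruby (a toy sketch with hypothetical names; fastruby actually compiles a specialized C version per signature, here a lambda stands in for the compiled code):

```ruby
# Toy model of "transparent multimethods": one specialized version per
# argument-class signature, built lazily the first time that signature
# is seen, then reused on later calls.
class Multimethod
  def initialize(&builder)
    @builder = builder
    @versions = {}                    # [arg classes] => built version
  end

  def call(*args)
    sig = args.map { |a| a.class }
    version = (@versions[sig] ||= @builder.call(sig))  # build on first use
    version.call(*args)
  end

  def built_signatures
    @versions.keys
  end
end

foo = Multimethod.new do |sig|
  # fastruby would compile a C version specialized for `sig` here
  lambda { |a, b| a + b }
end

puts foo.call(1, 2)               # => 3 (first call builds this signature)
puts foo.call("a", "b")           # => ab (a second signature is built)
puts foo.built_signatures.length  # => 2
```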

Type inference

Each version of a method is built for a specific signature, so the builder can assume a type for the arguments and build method calls using that assumption.
Wherever the translator can assume a type for an expression involved in a method call (used as argument or as receiver), this information can be used to emit
direct calls instead of normal, expensive ruby calls.

The current implementation can only infer types for method and block arguments, and for literals

Customization through build directives and API

To compensate for the described limitations, fastruby supports a few build directives that let the programmer help the inference.
The syntax of these directives is the same as a normal ruby call (see examples)
Fastruby will also define an API to customize aspects of its internals, e.g. the build method to trigger the build of methods with a specific signature (see examples)

The Image

The image of this post shows a display with the execution of the current benchmarks of the test suite

All of them follow the pattern of executing the code with normal ruby and with fastruby. The fourth benchmark adds a few additional measures (see the benchmark source for more info)

You can find the benchmark sources in the installed gem directory of fastruby or in the github repository, and the full resolution version of the image here

Installation


Installation is as simple as executing the well-known gem install:

sudo gem install fastruby

Examples

The syntax is simple, since one of the main goals of fastruby is to be transparent. The current API serves for testing and customization of this PoC version of fastruby

Prebuild of methods:
require "fastruby"

class X
  fastruby '
    def foo(a,b)
      a+b
    end
  '
end

X.build([X,String,String], :foo)

p X.new.foo("fast", "ruby") # will use the prebuilt method
p X.new.foo(["fast"], ["ruby"]) # will build foo for the X,Array,Array signature and then execute it

NOTE: it is not necessary to do this; methods will be built automatically when they are called for the first time

Variable "types"

As in static languages, you can declare a type for a variable to help the inference and gain some performance. This is done using the lvar_type directive:

class X
  fastruby '
    def foo
      lvar_type(i, Fixnum)
      i = 100
      while (i > 0)
        i = i - 1
      end
      nil
    end
  '
end


Without lvar_type, the calls to Fixnum#> and Fixnum#- would be dynamic and more expensive
