Tuesday, August 21, 2012

Fastruby 0.0.22: optimized dynamic calls

Fastruby is a gem that runs Ruby code much faster than normal. It is currently in transition between a spike and a usable gem, and it is released with incremental improvements whenever possible.

In the previous post I promised to improve the speed of dynamic call lookups. The speed optimization used to depend heavily on the possibility of inlining, and therefore on the type information found at build time. When no type information for a method call receiver can be obtained at build time, the call is encoded as dynamic, and the mechanism used to resolve dynamic calls so far was not optimal, leading to results significantly worse than MRI 1.9.
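As a general illustration of what resolving a dynamic call more cheaply can look like (this is the classic inline-cache idea, not fastruby's actual mechanism; the class and method names below are hypothetical), a call site can remember the last lookup keyed by the receiver's class and fall back to a full lookup only when that class changes:

# Inline-cache sketch (illustration only, not fastruby's code): remember the
# last method lookup keyed by the receiver's class.
class CallSiteCache
  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
  end

  def call(receiver, *args)
    klass = receiver.class
    unless klass.equal?(@cached_class)
      @cached_class = klass
      @cached_method = klass.instance_method(@name)  # slow path: full lookup
    end
    @cached_method.bind(receiver).call(*args)        # fast path: reuse the lookup
  end
end

# site = CallSiteCache.new(:foo); site.call(x, 1, 2) behaves like x.foo(1, 2)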

For example, the following code runs faster because of the type information provided by the lvar_type directive:


class Y
  def bar(x)
    i = 1000000
    lvar_type(i,Fixnum)
   
    ret = 0
    while i > 0
      ret = x.foo(i,i)
      i = i - 1
    end
    return ret
  end
end


In contrast, the same code without lvar_type:


class Y
  def bar(x)
    i = 1000000
    #removed lvar_type
   
    ret = 0
    while i > 0
      ret = x.foo(i,i)
      i = i - 1
    end
    return ret
  end
end


Running the version without lvar_type gives the following results:

Before (v0.0.21 and prior): [benchmark chart]

Now: [benchmark chart]
NOTE: When lvar_type is used, the results are the same as in v0.0.21 (see this post)

Next

Monday, April 23, 2012

Fastruby 0.0.21: inlining of corelib Fixnum methods, restored Ruby 1.8 compat

Fastruby is a gem that runs Ruby code much faster than normal. It is currently in transition between a spike and a usable gem, and it is released with incremental improvements whenever possible.

One of the main improvements in fastruby v0.0.20/v0.0.21 is the implementation of a few corelib methods (e.g. Fixnum#+) in Ruby so that they can be inlined. The other is the restored compatibility with Ruby 1.8.
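As a rough illustration of the kind of code this targets (the post names Fixnum#+ as the example; assuming - and > are handled the same way is my extrapolation), a tight loop dominated by Fixnum operators previously dispatched every operator through an opaque CFUNC:

# Illustration only: a loop dominated by Fixnum operators, which become
# inlining candidates once the corelib methods are implemented in Ruby.
class Accumulator
  def sum_to(n)
    total = 0
    while n > 0
      total = total + n
      n = n - 1
    end
    total
  end
end

p Accumulator.new.sum_to(1000)  # => 500500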

Install

You can clone the repository at github:
git clone git://github.com/tario/fastruby.git
git checkout v0.0.21

Or install it using gem install:
gem install fastruby


Speedup

Before (v0.0.19 and prior): [benchmark chart]

Now: [benchmark chart]

NOTE: Due to issues with 1.8 compatibility, inlining of corelib methods does not work on Ruby 1.8; this will be fixed in a future release.

Next

Tuesday, February 14, 2012

Fastruby 0.0.19 released: improved inlining, performance bugs fixed

Fastruby is a gem that runs Ruby code much faster than normal. It is currently in transition between a spike and a usable gem, and it is released with incremental improvements whenever possible.

The main improvements in fastruby v0.0.19 are refinements of the inliner algorithm:
  • Inlining of block calls (was not possible before)
  • Inlining of methods containing block calls (was not possible before)
And the fix of three important bugs:
  • Duplicated entries in $LOAD_PATH create incremental latency on require as new methods are built. Fixing this bug cut the total test time by 80% (from 150 seconds to 30 seconds on my machine). Moral: don't touch $LOAD_PATH if possible (a minimal illustration follows below)
  • Fixed the replacement of CFUNC calls by using the previously implemented observer mechanism for inlining
  • Fixed the reduction of case statements to use deterministic names for temporary variables, allowing caching of methods that use case statements
No new big features...
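A minimal illustration of the $LOAD_PATH moral above (plain Ruby, not fastruby-specific): duplicated entries multiply the directories that require has to scan, and deduplicating them is a one-line safeguard.

# Each duplicated entry makes `require` scan the same directory again.
$LOAD_PATH.unshift(File.expand_path("lib"))  # added once...
$LOAD_PATH.unshift(File.expand_path("lib"))  # ...and again by accident
$LOAD_PATH.uniq!                             # one line removes the duplicates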

Install

You can clone the repository at github:
git clone git://github.com/tario/fastruby.git
git checkout v0.0.19

Or install it using gem install:
gem install fastruby


Sunday, February 5, 2012

Fastruby 0.0.18 released: more performance improvements

Fastruby is a gem that runs Ruby code much faster than normal. It is currently in transition between a spike and a usable gem, and it is released with incremental improvements whenever possible.

The main improvements in fastruby v0.0.18 are two optimizations:
  • Method inlining, with the following limitations: no inlining of iter calls (blocks), no inlining of methods containing iter calls, and only one level of inlining for now
  • A refactor of the cache so it no longer uses the code snippet as the key; instead, caching is applied at another level using the syntax tree and other data as the key (a rough sketch follows after this list)
And a bug fix:
  • Fixed replacement of methods by redesigning the cache
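A hedged sketch of that caching idea, assuming nothing about fastruby's real cache beyond what the bullet above says: compiled results are keyed by the syntax tree plus the call signature rather than by the raw code snippet.

require "digest"

# Sketch only: key compiled artifacts by (syntax tree, signature) so that
# equivalent code reaches the same cache entry.
class BuildCache
  def initialize
    @store = {}
  end

  def fetch(sexp, signature)
    key = Digest::SHA1.hexdigest([sexp.inspect, signature.inspect].join("|"))
    @store[key] ||= yield   # translate and compile only on a cache miss
  end
end

cache = BuildCache.new
cache.fetch([:call, [:lvar, :a], :+, [:lit, 1]], [Fixnum]) { "compiled artifact" }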

Install

You can clone the repository at github:
git clone git://github.com/tario/fastruby.git
git checkout v0.0.18

Or install it using gem install:
gem install fastruby

Inlining in Ruby

Inlining is a very common and popular technique in statically compiled languages like C and C++, and even in managed static languages such as Java or .NET. It basically consists of replacing a method call with its implementation to save the overhead cost of calling a separate method.
In C programs this means avoiding call instructions, the allocation and deallocation of function frames, etc. In the Ruby world, while the concept of inlining becomes a little more complex, inlining saves the creation of Ruby frames (surely much more expensive than C frames) and the lookup in method tables, which costs as much as an access to an associative hash.

Many readers will realize that applying the inlining technique to dynamic languages such as JavaScript, Python or, in this case, Ruby is very complicated. If you do not, look at the following example.

Suppose that you have this ruby code:
class X
  def foo(a,b)
    a+b
  end
end

class Y
  def bar(x)
    i = 100000
    ret = 0
    while (i > 0)
      ret = x.foo(ret,i)
      i = i - 1
    end
    ret
  end
end

p Y.new.bar(X.new) # 5000050000

For now, ignore the loop code in bar. If we apply the inlining logic to X#foo, the code is converted to:
class Y
  def bar(x)
    i = 100000
    ret = 0
    while (i > 0)
      ret = (
        a = ret
        b = i
        a+b
      ) # x.foo(ret,i)
      i = i - 1
    end
    ret
  end
end

p Y.new.bar(X.new) # 5000050000

But there is a problem in this picture: what happens if we call the same bar method passing an object of a type other than X?
class X2
  def foo(a,b)
    a*b
  end
end

p Y.new.bar(X2.new) # ?
The result will not be correct: the implementation of the bar method with foo inlined is only valid while the implementation of the method being called is the same one that was used at the moment of inlining it.

Transparent multimethod and inlining

Since the earliest released versions of fastruby, the base concept of multimethods has been applied. This means that for a given method there are as many implementations as there are possible signatures (signatures given by argument types, including the type of the receiver object). For example, the method Y#bar is implemented using two functions: one for the X class and the other for the X2 class.
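A minimal sketch of the dispatch side of this idea, not fastruby's generated code (exact class matching only, hypothetical names): one callable per argument-class signature, selected at call time.

# Multimethod dispatch sketch: one implementation per signature of classes.
class MultiMethod
  def initialize
    @implementations = {}          # [Class, Class, ...] => callable
  end

  def define(signature, &impl)
    @implementations[signature] = impl
  end

  def call(*args)
    impl = @implementations[args.map { |a| a.class }]
    raise NoMethodError, "no specialization for #{args.map { |a| a.class }}" unless impl
    impl.call(*args)
  end
end

# Using the X and X2 classes defined above:
bar = MultiMethod.new
bar.define([X])  { |x| "Y#bar specialized for X" }
bar.define([X2]) { |x| "Y#bar specialized for X2" }
p bar.call(X.new)   # => "Y#bar specialized for X"
p bar.call(X2.new)  # => "Y#bar specialized for X2"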

When the method is implemented that way, we know the types of the arguments inside a given function and can use this information to inline the correct method being called.

There will be a version of Y#bar for X:
class Y
  def bar(x) # we know the type of x is X
    i = 100000
    ret = 0
    while (i > 0)
      ret = (
        a = ret
        b = i
        a+b
      ) # x.foo(ret,i) # we know the implementation of X#foo
      i = i - 1
    end
    ret
  end
end
And there will be a version of Y#bar for X2:
class Y
  def bar(x) # we know the type of x is X2
    i = 100000
    ret = 0
    while (i > 0)
      ret = (
        a = ret
        b = i
        a*b
      ) # x.foo(ret,i) # we know the implementation of X2#foo
      i = i - 1
    end
    ret
  end
end
And this is correct, because the dispatcher calls the version for X or X2 depending on the type of the argument. The only remaining problem is a change to the implementation of an inlined method (such as X#foo), when somebody does something like this:
class X
  def foo(a,b)
    a + b % 5
  end
end

Open Classes: Observers

The solution to the problem described in the previous paragraph is to rebuild methods with inlined calls each time the implementation of an inlined method changes. To achieve this I have implemented an observer mechanism on the fastruby method object.
Each time a method is built, observers are created for each inlined method so that the method is rebuilt when any of these observers is triggered, and each time a method changes, all related observers are triggered.
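A minimal sketch of that observer idea, assuming nothing about fastruby's internals beyond the paragraph above (the class and method names are hypothetical): each built method keeps a list of rebuild callbacks, and redefining an inlined method fires them.

# Observer sketch: callers that inlined this method register rebuild callbacks.
class BuiltMethod
  def initialize(name)
    @name = name
    @observers = []                 # callbacks to run when this method changes
  end

  def add_observer(&rebuild)
    @observers << rebuild
  end

  # Called when the method is redefined (e.g. its class is reopened).
  def implementation_changed
    @observers.each { |observer| observer.call }
  end
end

x_foo = BuiltMethod.new("X#foo")
x_foo.add_observer { puts "rebuilding Y#bar with the new X#foo inlined" }
x_foo.implementation_changed        # triggers the rebuild of Y#bar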




Wednesday, January 18, 2012

Fastruby 0.0.17 released: performance improvements

Fastruby is a gem that runs Ruby code much faster than normal. It is currently in transition between a spike and a usable gem, and it is released with incremental improvements whenever possible.

The main improvements in fastruby v0.0.17 are optimizations:
  • Direct calls to CFUNC methods on ruby1.9 (speeding up native methods like Fixnum#+, Fixnum#>, etc.)
  • Refactored the Ruby-to-C translation to use inlined code instead of an anonymous function for each expression, avoiding unnecessary C functions
  • Improved the non-local jumps return and next to avoid setjmp when possible by using normal gotos. This improves performance on both ruby1.9 and ruby1.8
  • Implemented scopes on the C stack instead of the heap when possible
And bug fixes.

The overall result of the optimizations (especially the one that relocates scopes onto the native stack) is a great time improvement on benchmark1.rb, for which fastruby takes about 20-30% less time than ruby1.9.


Install

You can clone the repository at github:
git clone git://github.com/tario/fastruby.git
git checkout v0.0.17
Or install it using gem install:
gem install fastruby 

Placing Ruby scopes on native stack instead of heap

Initially, at the very beginning of the project, the natural translation of Ruby to C led us to translate Ruby local variables as C local variables. But this implementation is wrong, since the Ruby stack can diverge into a tree while proc and continuation objects can potentially hold references to local variable scopes (e.g. lambdas can retain references to scopes and read and write them while other scopes "at the same level" are created in another branch). Many releases ago I solved this problem by implementing a structure to support this "branching" behavior of Ruby scopes, the "stack chunk" (see Fastruby 0.0.8 released with support for lambdas and Callcc puzzle on fastruby IV: Execution stack as a DAG, lambda cases for more details).

This implementation, while allowing the creation and use of lambdas and continuation objects, implies a severe performance penalty by forcing all code to access local variables on the heap; heap access is slower than stack access, for reasons that will not be explained here.

The key issue here is: why can't the local variables of a Ruby scope live on the native stack? The reason is that references to the scope can potentially be created. Example:
def foo(a)
  proc do
    a = a + 1
  end
end

pr = foo(10)
p pr.call # 11
p pr.call # 12
p pr.call # 12

In the example, a proc is created with a reference to the scope created for the method call. This scope must be allocated on the heap.

What about this?:

def foo(a)
  ret = 0
  a.each do |x|
    ret = ret + x
  end

  ret
end

p foo([1,2,3]) # 6

An innocent iteration over an array, right?
NO. Imagine this:
def foo(a)
  ret = 0
  a.each do |x|
    ret = ret + x
  end
  ret
end

class X
  def each(&blk)
    $a = proc(&blk)

    blk.call(1)
    blk.call(1)
    blk.call(1)
  end
end

x = X.new
p foo(x) # 3
p $a.call(1) # 4

Any block passed to any method can be used to create a lambda, so any method that makes block calls can potentially allow the creation of lambdas referencing the scope created for that method.
Conversely, it's virtually impossible to create lambdas referencing the scopes of methods that make no block calls. So, since these scopes cannot be referenced by lambdas, methods without block calls could allocate their scopes on the stack... if not for

Continuations

Continuations are the most macabre and wonderful feature of Ruby. Continuations use setjmp/longjmp to work, and they also write directly to the native stack to restore the execution data stored on it.
C local variables on the native stack are overwritten each time a continuation is called, reverting to the values they had at the moment the continuation was created using callcc. This behavior is wrong for Ruby variables, so Ruby variables allocated on the native stack lead to unexpected behavior.
For example, this innocent loop would fail if its variables were allocated on the stack:
require "continuation"

def foo
a = 10
callcc {|cont| $cont = cont}
a = a - 1
p a
$cont.call if a > 0
end

When the scope of local variables is allocated on the heap, the method foo works as expected, displaying a countdown on standard output. But when the scope is allocated on the native stack, the first call to callcc copies the entire stack, including the variable a, and stores it; each time the continuation is called to iterate the loop, the value of a is restored to the value it had at the moment the continuation was created (10), and the result is an infinite loop displaying 9 forever.

The key problem here is reading a local variable after a continuation has been created: once the continuation is called, all local variables allocated on the stack revert to the values they had at the moment the continuation was created. So, reading a local after creating a continuation may produce unexpected behavior. In summary, any method in which the sequence continuation -> variable read is possible can potentially behave unexpectedly.

Converting Syntax trees to Graphs

The only way to detect potential continuation -> variable read sequences is to generate a graph from the syntax tree of the method being analyzed and then search for the sequence in the graph.

The graph takes the nodes of the AST as vertices, and an edge represents the possibility of executing a given destination node after a given origin node. For example, the following source code:
def foo(a)
  if a.to_i == 0
    10
  else
    20
  end
end
Corresponds to the following syntax tree:

[syntax tree diagram]

And generates the following graph:

[graph diagram]
In that case, there is no read of a local variable after a call on any path, so the method in the example is able to allocate its local variables on the native stack.

On the other hand, the following method cannot use the native stack:
def foo(a,b)
  while (a > 0)
    b.foo
    a = a - 1
  end
end
Syntax tree:

[syntax tree diagram]

Graph:
The use of while makes a cyclic path possible in the graph. Beyond that, there are several paths that make native-stack storage for locals prohibitive; they are marked in blue, green and orange:

[graph diagram with the prohibitive paths highlighted]
Note that all prohibitive paths in the graph go from a call node to an lvar node; of course, the method from the while example must be implemented with its local variable scope allocated on the heap.

Viewed this way, almost any method of normal complexity will fall into the "heap" category, since the only requirement is having at least one possible prohibitive path (from a call node to an lvar node) in the graph, and for now this is usually the case. But this algorithm will gain relevance when the analyzer knows a priori which calls can potentially create a continuation and which calls never will (the latter can be ignored). Today, this improvement yields a 20-30% time gain compared with MRI 1.9.
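A hedged sketch of the detection step described above, not fastruby's real analyzer (the graph representation and node labels are my assumptions): given a directed graph over AST nodes, the scope must go on the heap if any lvar read is reachable from a call node.

require "set"

# Collect every node reachable from `start` by following the graph's edges.
def reachable_from(graph, start)
  seen = Set.new
  stack = [start]
  until stack.empty?
    node = stack.pop
    next if seen.include?(node)
    seen << node
    stack.concat(graph.fetch(node, []))
  end
  seen
end

# True if some path goes from a :call node to an :lvar node (heap required).
def heap_scope?(graph, node_types)
  lvar_nodes = node_types.select { |_, type| type == :lvar }.map { |node, _| node }.to_set
  call_nodes = node_types.select { |_, type| type == :call }.map { |node, _| node }
  call_nodes.any? { |call| (reachable_from(graph, call) & lvar_nodes).any? }
end

# Tiny model of the while example: the loop edge makes an lvar read reachable
# after the b.foo call, so the locals must live on the heap.
graph = { 1 => [2], 2 => [3], 3 => [1] }        # 1: a > 0, 2: b.foo, 3: a = a - 1
types = { 1 => :lvar, 2 => :call, 3 => :lvar }
p heap_scope?(graph, types)  # => true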

For example, in the while code sample the method calls are foo, > and -. If the analyzer could safely assume that these calls never create a continuation... > and - on numerics don't create continuation objects, but what about foo? Of course this will be a little more complicated, and the optimizations around it will probably involve type inference and qualifiers.



Monday, January 2, 2012

No more excuses for not testing: picotest 0.0.1 released

Picotest is a gem that lets you write complete test suites in a few lines, targeted primarily at testing individual methods, algorithms or little snippets of code.

The first gem release (0.0.1) is only a proof of concept to explore the idea, which is to avoid the "It's not worth testing this" syndrome. This problem occurs when we develop really small features such as helper functions, private calculation methods, etc., and then decide "it's not worth testing this, because the test code will be much bigger than the code being tested".

The solution offered by picotest is to provide a really small testing DSL that keeps the "proportion" even in the smallest cases, using hash syntax and other syntactic sugar.

Examples

For example, if you want to test the Math.sqrt function, you could use the following:
require "picotest"

suite(1 => 1, 4 => 2, 9 => 3, 16 => 4).test(Math.method(:sqrt))
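For comparison, a rough sketch of the plain assertions that this one-liner replaces, which is where the "proportion" argument comes from (Math.sqrt returns floats, so numeric equality is used):

# The same four cases written by hand, without picotest.
{1 => 1, 4 => 2, 9 => 3, 16 => 4}.each do |input, expected|
  actual = Math.sqrt(input)
  raise "sqrt(#{input}): expected #{expected}, got #{actual}" unless actual == expected
end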

There is no need to write all the cases one by one if you know how to calculate the input from the output (the inverse of sqrt); in that case you can use a reverse oracle:
require "picotest"

suite( lambda{|x| x**2} => _set(1,2,3,4,5,6,7,8,9,10)).test(Math.method(:sqrt))
# the following is the same as the above example
suite( lambda{|x| x**2} => _set(*(1..10))).test(Math.method(:sqrt))

And mocking is as follows:
class X
  attr_accessor :another_object
  def foo(data)
    data.split(";")[another_object.fieldno].to_f
  end
end

# you can return mock objects using mock method again
s.test X.new.method(:foo).mock(:another_object => mock(:fieldno => 2) )
You can read more examples at https://github.com/tario/picotest/tree/master/samples and in the README

Install

gem install picotest

Links