Tario's Project: How AST hook works and what are the current implementations

An AST (Abstract Syntax Tree) hook is a technique for controlling the behavior of certain Ruby node elements such as method calls, global variables, etc.

First of all, certain Ruby interpreters (including MRI) have an internal representation of the AST called the "node tree". Each piece of code that the interpreter reads for execution is parsed and represented as a node tree structure. This structure is then read at runtime and "executed" by the interpreter core/VM.
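To get a feel for what "code represented as a tree" means, the sketch below uses Ripper, the parser front end bundled with Ruby 1.9 and later. This is an assumption made for illustration only: Ripper's nested-array output is not the same structure as MRI's internal node tree, but it shows the same idea of source text becoming nested nodes.

```ruby
require 'ripper'

# Ripper returns the parse tree as nested arrays (S-expressions).
# Its layout differs from MRI's internal node tree; it is used here
# only to show what a parsed piece of code looks like.
source = "1 + 2"
tree = Ripper.sexp(source)

# The root is a :program node that wraps the list of statements.
root_type  = tree.first
statements = tree[1]

puts root_type.inspect
puts statements.inspect
```

Each nested array is one node: the first element names the node type and the remaining elements are its children.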

So, the tree hook technique consists of modifying that tree before it is read and executed by the interpreter, in order to perform certain "hooks". This is done by patching the tree and inserting new node elements.

Roughly, in the normal execution flow of the interpreter, the code is parsed and translated into the AST, which is finally executed by the VM that is part of the interpreter core.

When the AST hook operates, it changes the node tree after it is built and before it is executed.

For example, a call node can be intercepted by changing its receiver and method name so that the call is routed through a handler.

The resulting node layout is also a valid node tree structure. It emulates the original call after notifying a handler of the event, and the handler decides what to do.
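The call-node interception described above can be sketched in plain Ruby. The array layout `[:call, receiver, method_name, args]`, the function name `patch_call_node` and the `:my_handler` placeholder are assumptions made for illustration; real MRI nodes are C structures, not arrays.

```ruby
# Hypothetical node layout: [:call, receiver, method_name, args].
# Arrays stand in for the interpreter's real node structures.
def patch_call_node(node, handler_ref)
  type, receiver, method_name, args = node
  return node unless type == :call

  # receiver.hooked_method(:method_name)
  hooked = [:call, receiver, :hooked_method, [[:lit, method_name]]]
  # ... .set_hook_handler(handler)
  with_handler = [:call, hooked, :set_hook_handler, [[:lit, handler_ref]]]
  # ... .call(original arguments)
  [:call, with_handler, :call, args]
end

# account.withdraw(100) before patching
original = [:call, [:lvar, :account], :withdraw, [[:lit, 100]]]
patched  = patch_call_node(original, :my_handler)
```

The patched tree is still made only of ordinary call nodes, which is why the interpreter can execute it without knowing that a hook was installed.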

Current Implementations

MRI (Matz's Ruby Interpreter) Hack

The first implementation of tree patching was a hack of MRI that applies the patch directly to the node tree structure located in the memory of the Ruby interpreter process. Certain node pointers can be obtained in a C extension using a trick like this:



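VALUE hook_block(VALUE self, VALUE handler) {
    /* ruby_frame is an MRI-internal global; start from the current frame's node */
    process_node(ruby_frame->node->nd_recv, handler);
    return Qnil; /* a function declared to return VALUE must return one */
}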

The node tree can then be walked to apply the patch. For example, patching the call node:


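void patch_call_node(NODE* node, VALUE handler) {
    /* Argument lists for the two inserted calls */
    NODE* args1 = NEW_LIST(NEW_LIT(ID2SYM(node->nd_mid)));
    NODE* args2 = NEW_LIST(NEW_LIT(handler));

    /* Wrap the receiver: recv.<method_hooked_method>(:original_name) */
    node->nd_recv = NEW_CALL(node->nd_recv, method_hooked_method, args1);
    /* Wrap it again: ... .<method_set_hook_handler>(handler) */
    node->nd_recv = NEW_CALL(node->nd_recv, method_set_hook_handler, args2);

    /* The original node keeps its arguments but now invokes method_call */
    node->nd_mid = method_call;
}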
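To see what the patched tree does at runtime, here is a hedged Ruby sketch of the objects the rewritten call could talk to. It assumes the IDs `method_hooked_method`, `method_set_hook_handler` and `method_call` name Ruby methods called `hooked_method`, `set_hook_handler` and `call`. The classes `HookedCall` and `LoggingHandler` and the handler method `handle_method` are hypothetical names chosen for this example, not the actual evalhook API.

```ruby
# Hypothetical runtime support for the rewritten expression
#   recv.hooked_method(:name).set_hook_handler(handler).call(*args)
class HookedCall
  def initialize(receiver, method_name)
    @receiver = receiver
    @method_name = method_name
    @handler = nil
  end

  # Returns self so that .call can be chained, as in the patched tree.
  def set_hook_handler(handler)
    @handler = handler
    self
  end

  def call(*args, &block)
    # Notify the handler first; it could raise here to veto the call.
    @handler.handle_method(@receiver, @method_name) if @handler
    @receiver.__send__(@method_name, *args, &block)
  end
end

class Object
  # Entry point reached by the first inserted call node.
  def hooked_method(method_name)
    HookedCall.new(self, method_name)
  end
end

# Example handler that only records what it sees.
class LoggingHandler
  attr_reader :seen

  def initialize
    @seen = []
  end

  def handle_method(receiver, method_name)
    @seen << [receiver.class, method_name]
  end
end

handler = LoggingHandler.new
result = "abc".hooked_method(:upcase).set_hook_handler(handler).call
```

In this sketch the handler only observes the call; a stricter handler could raise an exception or redirect the call to a different method.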

Advantages

Easy to implement in C using interpreter code

Disadvantages

Must be implemented in C to access the interpreter's internal structures
Poor compatibility: the implementation relies on the internal structures of a particular Ruby interpreter version (e.g. it won't work in Ruby 1.9)

https://github.com/tario/evalhook/blob/v0.2.0/ext/evalhook_base/evalhook_base.c

Partial Ruby

In the implementation described above, the main problem of the hack was compatibility, because the tree patching is performed by changing internal structures of the interpreter, which may or may not exist in a given interpreter. It was done that way because the interpreter does not expose any API service for modifying the tree (in fact, many interpreters may have no node tree at all).

The solution is to create another interpreter in which the needed services are exposed, using resources that exist in the environment: parser, API and VM.
Basically, PartialRuby parses the input Ruby source file into an AST represented with Ruby structures, then executes that AST by emulating it with Ruby code, and finally passes the emulation code to the real interpreter.

In this scenario, PartialRuby exposes in its API the services needed to perform the node tree patching.
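The "AST in Ruby structures, emitted back as Ruby code, then handed to the real interpreter" pipeline can be illustrated with a very small sketch. The node shapes (`[:lit, value]`, `[:call, receiver, name, args]`) and the function name `emit` are assumptions for this example and are not taken from PartialRuby itself, which covers far more of the language.

```ruby
# Minimal emitter: turns a tiny array-based AST into Ruby source text.
# Hypothetical node shapes:
#   [:lit, value]
#   [:call, receiver, name, args]
def emit(node)
  case node.first
  when :lit
    node[1].inspect
  when :call
    _, receiver, name, args = node
    arg_src = args.map { |a| emit(a) }.join(", ")
    "#{emit(receiver)}.#{name}(#{arg_src})"
  else
    raise ArgumentError, "unsupported node: #{node.first}"
  end
end

# AST for 40 + 2, written by hand in place of a real parser's output.
ast = [:call, [:lit, 40], :+, [[:lit, 2]]]

source = emit(ast)     # Ruby code that emulates the AST
result = eval(source)  # executed by the real interpreter
```

Because the tree is an ordinary Ruby structure at the moment `emit` walks it, any patching step can run on it first without touching interpreter internals.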

Advantages

Can be implemented in pure Ruby
No access to Ruby interpreter internals needed: compatibility is guaranteed

Disadvantages

Must re-implement a part of the Ruby VM

https://github.com/tario/partialruby

Saturday, March 19, 2011
