Saturday, November 1, 2014

Lo and behold! A multi-dimensional translation table.

One of the biggest hurdles in this project has been the mechanics of translating Pessego-bytecode to x86 machine code (of course, it's a JIT!). I took 3 different approaches;

  • A completely control-flow based approach, checking the Pessego RM bytes bit-by-bit to construct an x86 instruction
  • An experiment with a multi-dimensional "translation table". 
  • A hybrid of the former 2


The first approach failed miserably. The instruction set is familiar to x86, but nowhere near compatible enough that it can be translated one-to-one. Quite soon I changed my approach.

The translation table had the upside that in theory it should be very fast because you index the translation table with the elements that form a Pessego instruction. Indexing takes some time, but a lot less than shifting and comparing bits on runtime. So, eager to find out whether this concept would work, I began to rewrite my translator for the 'PUT' opcode. However, I soon realized that this would take me ages. There are 2553 possible combinations (16,481,375 to be precise).

I pondered quite a while how to solve this matter. The idea of populating the matrix on runtime crossed my mind, but I figured that would not work because the x86 instructions are not compatible enough. Soon enough I just tried something and I managed to reduce the length of my "PUT" translator code from 700 lines and 75% completion to 156 lines and 100% completion.

How did I do it? Well, instead of abandoning the idea of automating the population of the table, I began to search for specific similarities between x86 and Pessego instructions. Also, when I learned more about how the x86 instructions are formed (opcode, R/M modifiers, etc.) I was able to programmatically generate the necessary bytes. Success!

This PAsm code:

Became this x86 code:

No comments:

Post a Comment