So... I was going through the branching/control flow opcodes, which I already expected to be difficult to implement. The problem is that the assembler calculates addresses and offsets according to the original instruction sizes, but x86 instructions are not the same size as Pessego instructions. The easiest solution is a temporary relocation table that maps every instruction address in the Pessego address space to the corresponding x86 address within the allocated executable memory area. This way I can just look up the new address by indexing the table with the original address. Sounded like a plan!
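To make the idea concrete, here is a rough Python sketch of such a table (not the actual compiler code). The mnemonics, the byte sizes, and the names compile_single_pass, PESSEGO_SIZE and X86_SIZE are all made up for illustration. The table is filled while instructions are translated in order, so only jumps to already translated addresses can be resolved.

```python
# Hypothetical encoded sizes in bytes; not the real Pessego or x86 encodings.
PESSEGO_SIZE = {"NOP": 1, "JMP": 3}
X86_SIZE = {"NOP": 1, "JMP": 5}


def compile_single_pass(program):
    """Translate instructions in order, recording where each one lands."""
    relocation = {}  # Pessego address -> x86 offset in executable memory
    emitted = []     # (x86 offset, mnemonic, relocated jump target or None)
    src = 0          # current address in the Pessego address space
    dst = 0          # current offset in the generated x86 code
    for mnemonic, target in program:
        relocation[src] = dst
        # Only works for targets that have already been translated.
        new_target = None if target is None else relocation[target]
        emitted.append((dst, mnemonic, new_target))
        src += PESSEGO_SIZE[mnemonic]
        dst += X86_SIZE[mnemonic]
    return relocation, emitted


# Two backward jumps: Pessego addresses 0 and 4 are known when they are used.
program = [("NOP", None), ("JMP", 0), ("NOP", None), ("JMP", 4)]
relocation, emitted = compile_single_pass(program)
print(relocation)
print(emitted)
```

The last jump targets Pessego address 4, which lands at x86 offset 6 here because the earlier JMP grew from 3 to 5 bytes.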
But then... damn! Of course, the instructions ahead had not been processed yet, so any reference to a label that comes after the current instruction could not be compiled. I fixed this by implementing a special pre-compilation pass that populates the relocation table before actually compiling the code.
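A minimal sketch of the two-pass idea in Python, under the simplifying assumption that every instruction has one fixed x86 size that is known before any code is emitted. The sizes and the names prepass and compile_program are hypothetical, not taken from the real compiler.

```python
# Hypothetical encoded sizes in bytes; not the real Pessego or x86 encodings.
PESSEGO_SIZE = {"NOP": 1, "JMP": 3}
X86_SIZE = {"NOP": 1, "JMP": 5}


def prepass(program):
    """Pass 1: only measure sizes and fill the relocation table."""
    relocation = {}
    src = dst = 0
    for mnemonic, _target in program:
        relocation[src] = dst
        src += PESSEGO_SIZE[mnemonic]
        dst += X86_SIZE[mnemonic]
    return relocation


def compile_program(program):
    """Pass 2: emit code; every target, forward or backward, is known."""
    relocation = prepass(program)
    emitted = []
    src = 0
    for mnemonic, target in program:
        new_target = None if target is None else relocation[target]
        emitted.append((relocation[src], mnemonic, new_target))
        src += PESSEGO_SIZE[mnemonic]
    return emitted


# Both jumps point forward, at the NOPs at Pessego addresses 6 and 7.
program = [("JMP", 6), ("JMP", 7), ("NOP", None), ("NOP", None)]
emitted = compile_program(program)
print(emitted)
```

If the x86 encoding of a jump depended on the distance to its target (short versus near jumps), the first pass could not know the sizes up front, so this sketch assumes a single fixed-size encoding per instruction.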
Another problem was "JMP A" (which jumps to the address stored in register A). Say A contains 0, which is the first instruction in the Pessego address space. Of course, this needs to be relocated, but the compiler cannot inject a relocated address into the code like I did with immediate addressing, because the contents of A are volatile and only known at runtime. To solve this, I added an internal syscall that the generated x86 code can use to get a relocated address at runtime.
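Simulated in Python, the runtime lookup could look roughly like this. The Runtime class, syscall_relocate and jmp_register are hypothetical names; the real thing is a call from generated x86 code into the compiler's runtime, which this sketch only imitates.

```python
class Runtime:
    """Toy stand-in for the runtime that the generated code calls into."""

    def __init__(self, relocation):
        self.relocation = relocation  # Pessego address -> x86 offset
        self.registers = {"A": 0}

    def syscall_relocate(self, pessego_addr):
        """Hypothetical internal syscall: translate an address at runtime."""
        try:
            return self.relocation[pessego_addr]
        except KeyError:
            raise ValueError(
                f"{pessego_addr:#x} is not the start of an instruction"
            ) from None

    def jmp_register(self, name):
        """What 'JMP A' boils down to: read the register, relocate, jump."""
        return self.syscall_relocate(self.registers[name])


# Made-up relocation table, as it would exist after compilation.
runtime = Runtime({0: 0, 3: 5, 6: 10})
runtime.registers["A"] = 3          # value only known while the program runs
next_x86 = runtime.jmp_register("A")
print(next_x86)
```

This also means the relocation table has to outlive compilation, at least for programs that use register-indirect jumps.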
By now, it mostly works. The only remaining problem is the separation of data and code sections. Sometimes "0" refers to the beginning of the data section (effectively the beginning of the executable memory), but sometimes "0" refers to the code offset within the executable memory. This will most likely be fixed once I normalize what "0" means in this context. I might also move the data section out of the executable memory. That seems nicer, more secure, and makes more sense too.
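One possible way to normalize "0", sketched in Python purely as an illustration and not as the fix that will actually go in: give data and code their own explicit base, and make every translation say which of the two it means. The Layout class, its method names, and the base addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    data_base: int  # host address where Pessego data address 0 lives
    code_base: int  # host address where the translated code starts

    def data_address(self, pessego_addr):
        """Translate a Pessego data address; no relocation table needed."""
        return self.data_base + pessego_addr

    def code_address(self, relocation, pessego_addr):
        """Translate a Pessego code address through the relocation table."""
        return self.code_base + relocation[pessego_addr]


# Made-up bases: data in its own non-executable buffer, code elsewhere.
layout = Layout(data_base=0x1000, code_base=0x2000)
relocation = {0: 0, 3: 5}

print(hex(layout.data_address(0)))              # "0" as a data address
print(hex(layout.code_address(relocation, 0)))  # "0" as a code address
```

With two separate bases, moving the data section out of the executable memory would only change data_base; the code side stays untouched.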
To be continued...