undar-lang/SPECIFICATION.org at main


#+TITLE: Project Specification

Binary interface

The VM does not use floating-point numbers; it uses fixed-point numbers instead.

This is for portability: some devices have no FPU, especially microcontrollers and some retro game systems like the PS1.

Numbers

type	size (bytes)	description
u8	1	unsigned 8bit, alias `char` and `byte`
bool	1	unsigned 8bit, `false` or `true`
i8	1	signed 8bit for interop
u16	2	unsigned 16bit for interop
i16	2	signed 16bit for interop
u32	4	unsigned 32bit, alias `nat`
i32	4	signed 32bit, alias `int`
f32	4	signed 32bit fixed-point number, alias `real`
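
The table does not say how the 32 bits of `real` are divided between integer and fraction. As a rough sketch only, the C below assumes a Q16.16 split (16 integer bits, 16 fraction bits); the names `REAL_FRAC_BITS`, `REAL_ONE` and the `real_*` helpers are made up for illustration and are not taken from the project.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: Q16.16 layout. The spec only says `real` is a signed
   32-bit fixed-point value; the split is chosen here for illustration. */
typedef int32_t real;

#define REAL_FRAC_BITS 16
#define REAL_ONE ((real)1 << REAL_FRAC_BITS)

/* Convert an integer to fixed point. */
static real real_from_int(int32_t n) {
    assert(n >= -32768 && n <= 32767); /* must fit in the 16 integer bits */
    return (real)(n * REAL_ONE);
}

/* Addition needs no scaling: both operands share the same scale. */
static real real_add(real a, real b) { return a + b; }

/* Multiply: widen to 64 bits, then divide out the extra scale factor. */
static real real_mul(real a, real b) {
    return (real)(((int64_t)a * (int64_t)b) / REAL_ONE);
}

/* Divide: pre-scale the dividend so the quotient keeps its fraction bits. */
static real real_div(real a, real b) {
    assert(b != 0);
    return (real)(((int64_t)a * REAL_ONE) / b);
}
```

Only integer instructions are involved, which is the point of avoiding an FPU. Overflow handling is left out of this sketch.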

Memory

Uses a Harvard-style architecture, meaning the code and RAM are split into two separate blocks.

In the C version you can see these are two separate arrays, 'code' and 'mem'.

During compilation, constants and local variables are put into 'mem'.
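
A minimal sketch of that split, assuming a struct that holds both arrays. Only the names 'code' and 'mem' come from the text (and 'mp' from the calling-convention notes); the sizes, element types, `VM` and `vm_put_const` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CODE_SIZE 1024 /* hypothetical capacity, in instructions */
#define MEM_SIZE  4096 /* hypothetical capacity, in bytes */

typedef struct {
    uint32_t code[CODE_SIZE]; /* instructions only; never written at run time */
    uint8_t  mem[MEM_SIZE];   /* constants, locals, then heap values */
    uint32_t pc;              /* index of the next instruction in code */
    uint32_t mp;              /* first free byte in mem */
} VM;

/* Compile-time helper: copy a constant into mem and return its address. */
static uint32_t vm_put_const(VM *vm, const void *data, uint32_t size) {
    assert(size <= MEM_SIZE - vm->mp); /* refuse to overflow mem */
    uint32_t addr = vm->mp;
    memcpy(&vm->mem[addr], data, size);
    vm->mp += size;
    return addr;
}
```

Because instructions and data live in different arrays, a data address can never alias an instruction, and the two spaces can use different element widths.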

Opcodes

2^n	count
2^3	8
2^4	16
2^5	32
2^6	64
2^7	128

could be encoded

op type [this is to maximize jump immediate and load immediate size]
memory location
local value / register
local value type

Simplest

[opcode][dest][src1][src2] [8][8][8][8]
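
For illustration, the four-byte layout could be packed into and unpacked from a u32 roughly as follows. Placing the opcode in the most significant byte is an assumption, and the function names are invented for this sketch.

```c
#include <stdint.h>

/* Pack [opcode][dest][src1][src2] into one 32-bit word.
   ASSUMPTION: the leftmost field is the most significant byte. */
static uint32_t encode4(uint8_t op, uint8_t dest, uint8_t src1, uint8_t src2) {
    return ((uint32_t)op << 24) | ((uint32_t)dest << 16) |
           ((uint32_t)src1 << 8) | (uint32_t)src2;
}

/* Field accessors: shift the wanted byte down and truncate to 8 bits. */
static uint8_t field_op(uint32_t instr)   { return (uint8_t)(instr >> 24); }
static uint8_t field_dest(uint32_t instr) { return (uint8_t)(instr >> 16); }
static uint8_t field_src1(uint32_t instr) { return (uint8_t)(instr >> 8); }
static uint8_t field_src2(uint32_t instr) { return (uint8_t)instr; }
```

Every field is byte-aligned, so decoding is a shift and a truncation, at the cost of leaving no room for an inline immediate wider than 8 bits.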

Maximize inline jump and load-immediate

[0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0]

[0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 0 | 0 0 ] noop
[0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 1 | 0 0 ] call
[0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 0 0 0][0 0 0 | 0 1 0 | 0 0 ] return
[0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 0 0 0][0 0 0 | 0 1 1 | 0 0 ] syscall?
[0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 0 0 1][0 0 0 | 1 0 0 | 0 0 ] exit

[0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0][0 0 0 0 0 0 | 0 1 ] jump ~1GB range
[0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0][0 0 0 0 0 0 | 1 0 ] load-immediate 2^30 max
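
The two-bit tag idea above can be sketched like this, assuming the rightmost bits in the diagrams are the least significant bits of the u32. The enum names and helper functions are hypothetical; only the tag values 0 1 (jump) and 1 0 (load-immediate) and the 30-bit payload come from the diagrams.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: the tag occupies the two least significant bits. */
enum {
    TAG_SYSTEM   = 0, /* noop, call, return, syscall, exit */
    TAG_JUMP     = 1, /* 30-bit jump target */
    TAG_LOAD_IMM = 2, /* 30-bit immediate */
    TAG_REG_OP   = 3  /* register-form instructions in the later drafts */
};

/* Build a tagged instruction word from a tag and a 30-bit payload. */
static uint32_t encode_tagged(uint32_t tag, uint32_t payload) {
    assert(tag <= 3u);
    assert(payload < (1u << 30)); /* only 30 bits remain after the tag */
    return (payload << 2) | tag;
}

static uint32_t tag_of(uint32_t instr)     { return instr & 3u; }
static uint32_t payload_of(uint32_t instr) { return instr >> 2; }
```

With 30 payload bits a jump can address 2^30 locations, which matches the "~1GB range" note.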

multibyte ops

ones that would be easier if they were multibyte
- jump
- load immediate
- syscall
- call

0 0 - system, lowest because lower opcodes are faster
0 1 - memory
1 0 - math
1 1 - jump

[0 0][0 0 0 0 0 0] = [system][no op]
[0 0][1 0 0 0 0 0] = [system][loadimm]
[0 0][0 0 0 0 0 0] = [system][return]

J [0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0][0 0 0 0 0 0 | 0 1]
L [0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0][0 | o o o o o | 1 0]

[0 0 0 0 0 | r r r][r r r r | r r r r][r r | r r r r r r][b b | t | o o o | 1 1]

[0 0 0 | r r r r r][0 0 0 | r r r r r][0 0 0 | r r r r r][b b | t | o o o | 1 1]
[0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 0 0 0][0 0 | 1 | 0 0 1 | 1 1] [math][add][f][8]
[0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 0 0 0][0 0 | 1 | 0 1 0 | 1 1] [math][sub][f][8]
[0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 0 0 0][0 0 | 1 | 0 1 1 | 1 1] [math][mul][f][8]
[0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 0 0 0][0 0 | 1 | 1 0 0 | 1 1] [math][div][f][8]
[0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 0 0 0][0 0 0 | 0 0 0 0 0][0 0 | 1 | 1 0 1 | 1 1] [math][mod][f][8]
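
A tentative decoder for the register-form layout above, reading the diagram left to right as most significant to least significant bit. The field names follow the letters in the diagram (b, t, o, r); what `b` and `t` mean and which register field is the destination are not stated, so the struct below keeps them as raw numbers. `RegOp` and `decode_regop` are invented names.

```c
#include <stdint.h>

typedef struct {
    uint8_t r[3];  /* the three 5-bit register fields, left to right */
    uint8_t width; /* b b   : 2-bit field, meaning not fixed by the notes */
    uint8_t type;  /* t     : 1-bit field */
    uint8_t op;    /* o o o : 3-bit operation selector */
    uint8_t tag;   /* 1 1   : low two bits */
} RegOp;

/* Split a 32-bit word into the fields drawn in the diagram. */
static RegOp decode_regop(uint32_t instr) {
    RegOp d;
    d.tag   = (uint8_t)(instr & 0x3u);
    d.op    = (uint8_t)((instr >> 2) & 0x7u);
    d.type  = (uint8_t)((instr >> 5) & 0x1u);
    d.width = (uint8_t)((instr >> 6) & 0x3u);
    d.r[2]  = (uint8_t)((instr >> 8) & 0x1Fu);
    d.r[1]  = (uint8_t)((instr >> 16) & 0x1Fu);
    d.r[0]  = (uint8_t)((instr >> 24) & 0x1Fu);
    return d;
}
```

Under this reading the `add` row's last byte is 0x27 and decodes to op 1, and the `mod` row's last byte is 0x37 and decodes to op 5.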

[0 1][0 0 1][0][0 0] = [math][sub][f][16]
[0 1][0 1 0][0][0 0] = [math][mul][f][32]
[0 1][0 1 1][0][0 0] = [math][div][f][?]
[0 1][1 0 0][0][0 0] = [math][and][f][?]

[1 1][0][0 0][0 0] = [jmp][u][eq]
[1 1][0][0 0][0 0] = [jmp][u][ne]
[1 1][0][0 0][0 0] = [jmp][u][lt]
[1 1][0][0 0][0 0] = [jmp][u][gt]

[1 1][1][0 0 0][0 0] = [jmp][s][le]
[1 1][1][0 0 0][0 0] = [jmp][s][ge]
[1 1][1][0 0 0][0 0] = [jmp][s][]
[1 1][1][0 0 0][0 0] = [jmp][s][]

3 2 2 1 […] i 32 0

3 1 3 1
[…] u rl j
[…] u ab j
[…] u eq j
[…] u nq j
[…] u lt j
[…] u le j
[…] u gt j
[…] u ge j

[jmp][dest][18] [lli][dest][2]

int 3

add sub mul div

jeq jne

jlt jle jgt jge

Maybe opcodes could be Huffman encoded? That way the smaller opcodes get more operand data within the u32

At compile time each function gets N locals (up to 255). These are allocated in memory along with everything else, but they come before the heap values.

Maybe instead of pushing onto a stack, push could write into a child frame? Pops would just mark which locals need to be replaced by the child function. Maybe we can get rid of call/return and just use jumps?

u8	u8	u8	u8
push/pop	parent_local	child_local	metadata

Maybe more flexible calling convention?

Memory-to-memory with register characteristics?

Passed in values

Copy each argument from the caller's local to the callee's local. This includes pointers.

child modifies the heap

If a child modifies a value in the parent's heap, do nothing; this is expected behavior.

If a child changes the size of the parent's heap, copy the heap value to the child's frame.

Returned values

If a primitive value, just copy from the child local to the parent local.

If a heap value is returned but placed in a new local in the parent, copy it from the child to the parent and update the frame's memory pointer.

If a heap value is replaced (i.e. the return sets a heap value with its modified version) then:
1. Sort each returned value by its pointer's location in memory, lowest first.
2. Move to the position of the returned value with the lowest pointer.
3. Read the fat pointer size of the earliest value.
4. Take the current size of the heap.
5. Move to just after the end of the size + ptr.
6. Copy all values from that location through the current end of the heap to the old start location of that value.
7. Subtract the old size of the value from the mp.
8. Copy the new sized value and put it at the current end of the heap.
9. Update the new pointer's local position.
10. Add the new size to the mp.
11. Repeat for each returned value that is a replaced heap value.
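
The replace procedure for a single heap value can be sketched as below. This is a simplification under stated assumptions: the old start and size are passed in directly instead of being read from a fat pointer, the new value comes from a buffer that does not overlap the region being moved, and pointers to other values that shift down are not fixed up. `heap_replace` and its parameters are hypothetical names; `mp` is the memory pointer mentioned in the text.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Replace the value at [old_start, old_start + old_size) with new_data.
   *mp marks the end of the heap. Returns the value's new start address,
   which the caller would store back into the parent's local. */
static uint32_t heap_replace(uint8_t *mem, uint32_t mem_size, uint32_t *mp,
                             uint32_t old_start, uint32_t old_size,
                             const uint8_t *new_data, uint32_t new_size) {
    uint32_t old_end = old_start + old_size;
    assert(old_end <= *mp);

    /* Slide everything after the old value down over it, closing the gap. */
    memmove(&mem[old_start], &mem[old_end], *mp - old_end);
    *mp -= old_size;

    /* Append the resized value at the current end of the heap. */
    assert(new_size <= mem_size - *mp);
    uint32_t new_start = *mp;
    memcpy(&mem[new_start], new_data, new_size);
    *mp += new_size;
    return new_start;
}
```

Processing returned values from the lowest address upward, as the list prescribes, means each `memmove` only shifts data that has not yet been handled.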

Opcodes are variable sized; each part is a multiple of 8 bits to maximize portability.

8 bit Type Z: [u8:opcode]
16 bit Type Q: [u8:opcode][u8:dest]
24 bit Type I: [u8:opcode][u8:dest][u8:src1]
32 bit Type M: [u8:opcode][u8:dest][u8:src1][u8:src2]
40 bit Type J: [u8:opcode][u32:dest]
56 bit Type S: [u8:opcode][u8:dest][u8:src1][u32:imm]
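
With variable-sized instructions the decoder needs each type's length to find the next opcode. A small sketch, assuming the code is stored as a byte stream and that u32 fields are little-endian (the byte order is not specified in these notes); `InstrType`, `instr_size` and `read_u32` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { TYPE_Z, TYPE_Q, TYPE_I, TYPE_M, TYPE_J, TYPE_S } InstrType;

/* Length of each instruction type in bytes, from the list above. */
static uint32_t instr_size(InstrType t) {
    switch (t) {
    case TYPE_Z: return 1; /* [u8:opcode] */
    case TYPE_Q: return 2; /* [u8:opcode][u8:dest] */
    case TYPE_I: return 3; /* [u8:opcode][u8:dest][u8:src1] */
    case TYPE_M: return 4; /* [u8:opcode][u8:dest][u8:src1][u8:src2] */
    case TYPE_J: return 5; /* [u8:opcode][u32:dest] */
    case TYPE_S: return 7; /* [u8:opcode][u8:dest][u8:src1][u32:imm] */
    }
    assert(!"unknown instruction type");
    return 0;
}

/* Read a u32 field byte by byte so the result does not depend on host
   alignment or endianness. ASSUMPTION: little-endian storage. */
static uint32_t read_u32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Reading multi-byte fields one byte at a time is what keeps this layout portable to hosts that cannot do unaligned 32-bit loads.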

Fixed size

All 32 bit instructions (registers are all 32 bit values)

Type I: [8:opcode][8:dest][16:immediate]

load-immediate : for small 16 bit consts, and lower part of 32 bit consts
load-upper-immediate : for large 32 bit consts
load-indirect : dest
store-indirect : dest
load-absolute : dest
store-absolute : dest
jump-absolute : dest
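
One possible reading of how load-immediate and load-upper-immediate combine to build a full 32-bit constant is sketched here. The exact semantics are assumptions: load-immediate is taken to zero-extend its 16-bit immediate, and load-upper-immediate is taken to replace only the upper half of the register. Function names and the byte order of the encoding are also hypothetical.

```c
#include <stdint.h>

/* Type I: [8:opcode][8:dest][16:immediate], opcode in the top byte. */
static uint32_t encode_type_i(uint8_t op, uint8_t dest, uint16_t imm) {
    return ((uint32_t)op << 24) | ((uint32_t)dest << 16) | (uint32_t)imm;
}

/* ASSUMED semantics: the register becomes the zero-extended immediate. */
static uint32_t load_immediate(uint16_t imm) {
    return (uint32_t)imm;
}

/* ASSUMED semantics: keep the low 16 bits, replace the high 16 bits. */
static uint32_t load_upper_immediate(uint32_t reg, uint16_t imm) {
    return ((uint32_t)imm << 16) | (reg & 0xFFFFu);
}
```

Under these assumptions, loading 0xDEADBEEF takes two instructions: load-immediate with 0xBEEF, then load-upper-immediate with 0xDEAD on the same register.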

Type M: [8:opcode][8:dest][8:src1][8:src2]

load-offset dest src1 src2
store-offset dest src1 src2
call dest num-of-args ptr-to-function
return dest return-arg
syscall id num-of-args ptr-to-memory
memset dest src1 count
long-jump : src1 used as address
add
sub
mul
div
and
or
xor
shift-left
shift-right
shift-right-sign-extend
neg
abs
mov : move register value
alloc : dest size unused
jeq (jump eq)
jne (jump not eq)
jgt (jump greater than)
jlt (jump less than)
jle (jump less than or eq)
jge (jump greater than or eq)
jf (jump if flag < zero)
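
To show how a Type M instruction might be executed, here is a minimal dispatch sketch for the four arithmetic operations on unsigned registers. The opcode numbers are invented (the notes do not assign any), the opcode is assumed to sit in the most significant byte, and `exec_type_m` is a hypothetical helper, not the project's interpreter loop.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL opcode numbering, for illustration only. */
enum { OP_ADD = 1, OP_SUB = 2, OP_MUL = 3, OP_DIV = 4 };

/* Execute one Type M instruction: [8:opcode][8:dest][8:src1][8:src2]. */
static void exec_type_m(uint32_t regs[256], uint32_t instr) {
    uint8_t op   = (uint8_t)(instr >> 24);
    uint8_t dest = (uint8_t)(instr >> 16);
    uint8_t src1 = (uint8_t)(instr >> 8);
    uint8_t src2 = (uint8_t)instr;

    switch (op) {
    case OP_ADD: regs[dest] = regs[src1] + regs[src2]; break;
    case OP_SUB: regs[dest] = regs[src1] - regs[src2]; break;
    case OP_MUL: regs[dest] = regs[src1] * regs[src2]; break;
    case OP_DIV:
        assert(regs[src2] != 0); /* a real VM would raise a trap here */
        regs[dest] = regs[src1] / regs[src2];
        break;
    default:
        assert(!"opcode not handled in this sketch");
    }
}
```

Since each register index is a full byte, all 256 slots are addressable and no bounds check is needed on `dest`, `src1` or `src2`.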

Type Z: [8:opcode][24:immediate]

noop : immediate is unused (all zeros)
jump : immediate jump
