JOP - Java Optimized Processor
First Approach: A general purpose accu/register machine

This design is the result of earlier research on simple processor structures. It has no JVM-specific parts. The processor is an accu/register machine: every result of an operation is stored in one special register, the accu, and the operands are a register and the accu. Loads and stores have two addressing modes: register, and indirect with displacement. All constants are realized as special registers. There are up to 1024 registers. All instructions except branches and jumps take one cycle (3-stage pipeline). The accu and the registers are 32 bits wide. Instructions are fetched from on-chip memory.

Instruction Set

The instruction set follows the RISC idea: it is very simple and fits in 16 bits.
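The accumulator model and the constant registers described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class `AccuMachine`, the register numbers `PC` and `ONE`, and the tuple encoding of instructions are hypothetical, and only mnemonics that appear in the listings on this page (ld, st, add, sub) are modeled. The real 16-bit encoding, the indirect addressing mode and the pipeline are not covered.

```python
# Toy model; the class, the register numbers and the tuple encoding are
# hypothetical and do not reflect JOP's real 16-bit instruction format.
MASK32 = 0xFFFFFFFF  # accu and registers are 32 bits wide


class AccuMachine:
    def __init__(self, num_regs=1024):
        self.accu = 0
        self.regs = [0] * num_regs

    def execute(self, op, reg):
        """Run one instruction whose operands are register `reg` and the accu."""
        if op == "ld":
            self.accu = self.regs[reg]
        elif op == "st":
            self.regs[reg] = self.accu
        elif op == "add":
            self.accu = (self.accu + self.regs[reg]) & MASK32
        elif op == "sub":
            self.accu = (self.accu - self.regs[reg]) & MASK32
        else:
            raise ValueError(f"unknown mnemonic: {op}")

    def run(self, program):
        for op, reg in program:
            self.execute(op, reg)


PC, ONE = 0, 1        # hypothetical register numbers
m = AccuMachine()
m.regs[PC] = 5
m.regs[ONE] = 1       # a constant realized as a register
# Increment a register through the accu: ld pc / add 1 / st pc
m.run([("ld", PC), ("add", ONE), ("st", PC)])
```

Every arithmetic result passes through `accu`, so even a simple increment needs three instructions: load, add, store.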
An example program: Serial.asm reads one line (with echo) from the serial port and prints the reversed string.

JVM Implementation

The current JVM (jvm.asm) is implemented in JOP assembler. Every instruction is fetched, decoded and executed in software. The JVM PC and SP are ordinary registers of JOP. Fetch and decode:
fetch // label
ld (pc+1) // pc points to old byte code
add jmp_tbl
st tmp
ld pc // increment pc
add 1
st pc
jp (tmp) // jump indirect in jump table
Part of jump table:
jmp_tbl
... // all 256 possible values are listed
bnz iadd // 0x60
bnz notimp
bnz notimp // means not yet implemented
bnz notimp
bnz isub
bnz notimp
bnz notimp
bnz notimp
...
Fetch and decode of a JVM instruction thus take 6 + 2*3 + n cycles, where n is the memory access time in cycles.

A sample JVM stack instruction:
iadd
ld (sp) // read first argument
st tmp
ld (sp-1) // read second argument
add tmp // execute
st (sp-1) // store back
ld sp // decrement stack pointer
sub 1
st sp
jp fetch // jmp to next fetch
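The fetch loop, the jump table and the iadd routine above can be modeled in a high-level language to make the control flow easier to follow. The sketch below is not a translation of jvm.asm: the class `Interpreter`, the helper `to_int32` and the list-based stack are assumptions made for illustration. The opcode values 0x60 and 0x64 match the positions of iadd and isub in the jump table excerpt, and the behavior of isub is assumed to follow the JVM specification, since its routine is not shown here.

```python
# Illustrative model only; class and method names are not taken from jvm.asm.
IADD, ISUB = 0x60, 0x64  # positions of iadd and isub in the jump table


def to_int32(value):
    """Wrap a Python int to a signed 32-bit value (accu and registers are 32 bit)."""
    return (value + 0x80000000) % 0x100000000 - 0x80000000


class Interpreter:
    def __init__(self, bytecode, stack):
        self.code = list(bytecode)
        self.mem = list(stack)        # stack memory, growing upward
        self.sp = len(self.mem) - 1   # sp points to the top of stack
        self.pc = -1                  # pc points to the old bytecode

    def notimp(self):
        raise NotImplementedError(f"bytecode 0x{self.code[self.pc]:02x}")

    def iadd(self):
        tmp = self.mem[self.sp]                          # ld (sp) / st tmp
        result = to_int32(self.mem[self.sp - 1] + tmp)   # ld (sp-1) / add tmp
        self.mem[self.sp - 1] = result                   # st (sp-1)
        self.sp -= 1                                     # ld sp / sub 1 / st sp

    def isub(self):
        tmp = self.mem[self.sp]
        self.mem[self.sp - 1] = to_int32(self.mem[self.sp - 1] - tmp)
        self.sp -= 1

    def step(self):
        opcode = self.code[self.pc + 1]   # ld (pc+1)
        handler = JMP_TBL[opcode]         # add jmp_tbl / st tmp
        self.pc += 1                      # ld pc / add 1 / st pc
        handler(self)                     # jp (tmp)

    def run(self):
        while self.pc + 1 < len(self.code):
            self.step()
        return self.mem[self.sp]


# All 256 possible values are listed; unimplemented ones point to notimp.
JMP_TBL = [Interpreter.notimp] * 256
JMP_TBL[IADD] = Interpreter.iadd
JMP_TBL[ISUB] = Interpreter.isub
```

For example, `Interpreter([IADD, ISUB], [10, 3, 4]).run()` first replaces 3 and 4 by their sum and then subtracts that sum from 10. Every bytecode passes through `step`, which corresponds to the central fetch loop of the assembler version.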
Executing iadd takes 8 + 3 + 3*n cycles. In the best case (assuming a local, single-cycle memory) a simple instruction takes:
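As a rough check, the two cycle formulas given above can be evaluated for a chosen memory access time n. The sketch below assumes that the cost of one bytecode is simply the sum of the fetch/decode formula and the execution formula, and it reads the 3 in both formulas as the cost of one branch or jump. The function names are hypothetical.

```python
BRANCH_CYCLES = 3  # reading of the formulas: each branch or jump costs 3 cycles


def cycles(simple, branches, mem_accesses, n):
    """Cycle count for a code sequence.

    simple       -- number of single-cycle instructions
    branches     -- number of branches and jumps
    mem_accesses -- number of memory accesses, each adding n cycles
    """
    return simple + branches * BRANCH_CYCLES + mem_accesses * n


def fetch_decode(n):
    return cycles(6, 2, 1, n)   # 6 + 2*3 + n


def iadd(n):
    return cycles(8, 1, 3, n)   # 8 + 3 + 3*n


def total(n):
    return fetch_decode(n) + iadd(n)
```

With n = 1, `fetch_decode(1)` gives 13 and `iadd(1)` gives 14, so under this reading a simple instruction such as iadd needs 27 cycles.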
Some Enhancements

With the jump table in local memory (zero wait states) at address 0, the fetch/decode can be reduced to 8 + 3 + n cycles:
fetch
ld (pc+1) // pc points to old byte code
st tmp
st tmp // read after write
ld (tmp)
st tmp
ld pc // increment pc
add 1
st pc
jp (tmp) // jump indirect to instruction
Extending the instruction set with accu-indirect load and jump instructions would reduce the fetch/decode to 5 + 3 + n cycles:
fetch
ld pc // increment pc
add 1
st pc
ld (a) // load instruction accu indirect
ld (a) // load address from jmp table accu indirect
jp (a) // jump accu indirect to instruction
The last possible enhancement with this architecture would be to copy the fetch code to the end of every instruction. This saves one jump but adds extra code:
iadd
ld (sp) // read first argument
st tmp
ld (sp-1) // read second argument
add tmp // execute
st (sp-1) // store back
ld sp // decrement stack pointer
sub 1
st sp
ld pc // increment pc
add 1
st pc
ld (a) // load instruction accu indirect
ld (a) // load address from jump table accu indirect
jp (a) // jump accu indirect to instruction
Now one instruction takes 13 + 3 + 4*n cycles. The main disadvantage is that we lose the central execution point of the fetch loop. This point would be ideal for switching threads and polling for I/O requests.

You can download the complete design.

More enhancements need a redesign: Second Approach: More specific for the JVM
Copyright © 2000-2007, Martin Schoeberl