JOP - Java Optimized Processor

First Approach: A general purpose accu/register machine

This design is the result of earlier research on simple processor structures. It has no JVM-specific parts.

The processor is an accu/register machine. This means that the result of every operation is stored in one special register, the accu. The operands are a register and the accu. Loads and stores have two addressing modes: register and indirect with displacement. All constants are realized as special registers. There are up to 1024 registers. All instructions except branches and jumps take one cycle (three-stage pipeline). The accu and the registers are 32 bits wide. Instructions are fetched from on-chip memory.
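The accu/register model described above can be sketched in a few lines of Python. This is only an illustration, not part of the JOP design: the class name `AccuMachine` and the constant `MASK32` are hypothetical, wrap-around at 32 bits is an assumption (the text only states the register width), and flags such as carry are not modelled. The method names follow the mnemonics of the instruction set listed below.

```python
MASK32 = 0xFFFFFFFF  # accu and registers are 32 bits wide


class AccuMachine:
    """Toy model: one accumulator plus a register file (hypothetical helper)."""

    def __init__(self, num_regs=1024):
        self.accu = 0
        self.regs = [0] * num_regs

    def ld(self, r):
        # accu <- reg
        self.accu = self.regs[r]

    def st(self, r):
        # reg <- accu
        self.regs[r] = self.accu

    def add(self, r):
        # accu <- accu + reg (assumed to wrap at 32 bits)
        self.accu = (self.accu + self.regs[r]) & MASK32

    def sub(self, r):
        # accu <- accu - reg (assumed to wrap at 32 bits)
        self.accu = (self.accu - self.regs[r]) & MASK32


# Computing regs[2] = regs[0] + regs[1] takes three instructions,
# because every result passes through the accu.
m = AccuMachine()
m.regs[0], m.regs[1] = 5, 7
m.ld(0)
m.add(1)
m.st(2)
```

A three-operand statement such as c = a + b therefore becomes a load, an operation, and a store.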

Instruction Set

The instruction set follows the RISC idea: it is very simple, and every instruction fits in 16 bits.

00dd ddrr rrrr rrrr	ld (reg+d)
01dd ddrr rrrr rrrr	st (reg+d)

1000 oooo oooo oooo	bz $+2+o
1001 oooo oooo oooo	bnz $+2+o
1010 oooo oooo oooo	bc $+2+o
1011 oooo oooo oooo	bnc $+2+o

1100 00rr rrrr rrrr	ld reg
1100 01rr rrrr rrrr	and reg
1100 10rr rrrr rrrr	or reg
1100 11rr rrrr rrrr	xor reg

1101 00rr rrrr rrrr	add reg
1101 01rr rrrr rrrr	sub reg
1101 10rr rrrr rrrr	st reg
1101 11rr rrrr rrrr	j (reg)

1110 00-- ---- ----	shr
1110 01-- ---- ----	shl
1110 10-- ---- ----	sbr
1110 11-- ---- ----	sbl
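As a reading aid for the encoding table, the following Python sketch splits a 16-bit word into the fields shown above. The function name `decode` and the returned dictionary layout are hypothetical. The table does not say whether the branch offset `o` is signed, so it is returned as a raw 12-bit field, and words starting with 1111 are reported as undefined because the table does not list them.

```python
ALU_1100 = ("ld reg", "and reg", "or reg", "xor reg")
ALU_1101 = ("add reg", "sub reg", "st reg", "j (reg)")
SHIFTS = ("shr", "shl", "sbr", "sbl")
BRANCHES = ("bz", "bnz", "bc", "bnc")


def decode(word):
    """Split a 16-bit instruction word into (mnemonic, operand fields)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction words are 16 bits wide")
    reg = word & 0x3FF            # rr rrrr rrrr: 10-bit register number
    sel = (word >> 10) & 0x3      # two selector bits below the top nibble
    top = word >> 12              # top nibble
    if top < 0b1000:              # 00dd dd.. or 01dd dd..
        name = "ld (reg+d)" if top < 0b0100 else "st (reg+d)"
        return name, {"d": (word >> 10) & 0xF, "reg": reg}
    if top <= 0b1011:             # 1000 .. 1011: conditional branches
        return BRANCHES[top & 0x3], {"o": word & 0xFFF}
    if top == 0b1100:
        return ALU_1100[sel], {"reg": reg}
    if top == 0b1101:
        return ALU_1101[sel], {"reg": reg}
    if top == 0b1110:             # shifts take no operand
        return SHIFTS[sel], {}
    return "undefined", {}        # 1111 ....: not listed in the table
```

The 10-bit register field is what gives the limit of 1024 registers.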

An example program:

Serial.asm reads one line (with echo) from the serial port and prints the reversed string.

JVM Implementation

The current JVM (jvm.asm) is implemented in JOP-assembler. Every instruction is fetched, decoded and executed in software. JVM PC and SP are ordinary registers of JOP.

Fetch and decode:

fetch                  // label
        ld    (pc+1)   // pc points to old byte code
        add   jmp_tbl
        st    tmp
        ld    pc       // increment pc
        add   1
        st    pc
        jp    (tmp)    // jump indirect in jump table

Part of jump table:

jmp_tbl
        ...            // all 256 possible values are listed
        bnz   iadd     // 0x60
        bnz   notimp
        bnz   notimp   // means not yet implemented
        bnz   notimp
        bnz   isub
        bnz   notimp
        bnz   notimp
        bnz   notimp
        ...

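It can be seen that the fetch and decode of a JVM instruction takes 6 + 2*3 + n cycles: six single-cycle instructions, two jumps of three cycles each (the indirect jump and the branch in the jump table), and n cycles added for the memory access time.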
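The table-driven dispatch can be modelled at a higher level in Python. This is a rough sketch, not a translation of jvm.asm: the operand stack is a plain Python list, the names `run`, `JUMP_TABLE`, and the handler functions are hypothetical, and results are assumed to wrap at 32 bits. The opcode values 0x60 and 0x64 are read off the jump table excerpt above; the operand order of `isub` follows the JVM specification.

```python
IADD, ISUB = 0x60, 0x64   # positions of iadd and isub in the jump table


def notimp(stack):
    raise NotImplementedError("bytecode not yet implemented")


def iadd(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append((a + b) & 0xFFFFFFFF)


def isub(stack):
    b = stack.pop()       # value2 is on top of the stack
    a = stack.pop()
    stack.append((a - b) & 0xFFFFFFFF)


# One entry for each of the 256 possible opcode values.
JUMP_TABLE = [notimp] * 256
JUMP_TABLE[IADD] = iadd
JUMP_TABLE[ISUB] = isub


def run(code, stack):
    pc = -1                            # pc points to the old bytecode
    while pc + 1 < len(code):
        opcode = code[pc + 1]          # ld (pc+1)
        handler = JUMP_TABLE[opcode]   # index into the jump table
        pc += 1                        # ld pc; add 1; st pc
        handler(stack)                 # jp (tmp)
    return stack
```

For example, `run(bytes([0x60]), [2, 3])` leaves `[5]` on the stack.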

A sample JVM stack instruction:

iadd
        ld    (sp)     // read first argument
        st    tmp
        ld    (sp-1)   // read second argument
        add   tmp      // execute
        st    (sp-1)   // store back
        ld    sp       // decrement stack pointer
        sub   1
        st    sp

        jp    fetch    // jmp to next fetch

The execution of this instruction takes 8 + 3 + 3*n cycles.
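The 3*n term comes from the three accesses to stack memory in the routine. The following Python sketch mirrors the routine line by line and counts those accesses. It is an illustration under assumptions: `mem` stands for the stack memory, `regs` is a dictionary holding the registers `sp` and `tmp`, and the addition is assumed to wrap at 32 bits.

```python
def iadd(mem, regs):
    """Mirror the iadd routine; return the number of stack memory accesses."""
    accesses = 0

    accu = mem[regs["sp"]]                     # ld (sp)
    accesses += 1
    regs["tmp"] = accu                         # st tmp
    accu = mem[regs["sp"] - 1]                 # ld (sp-1)
    accesses += 1
    accu = (accu + regs["tmp"]) & 0xFFFFFFFF   # add tmp
    mem[regs["sp"] - 1] = accu                 # st (sp-1)
    accesses += 1
    accu = regs["sp"]                          # ld sp
    accu = accu - 1                            # sub 1
    regs["sp"] = accu                          # st sp

    return accesses


# Stack with 2 below 40; sp points to the top element.
mem = [0, 0, 0, 2, 40, 0, 0, 0]
regs = {"sp": 4, "tmp": 0}
count = iadd(mem, regs)
```

The eight assignments correspond to the eight single-cycle instructions; the final jump back to fetch accounts for the remaining three cycles.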

In the best case (assuming a local, single cycle memory) a simple instruction takes:

12 cycles for fetch and decode
11 cycles for execution

Some Enhancements

With the jump table in local memory (zero wait states) at address 0, the fetch/decode can be reduced to 8 + 3 + n cycles:

fetch
        ld    (pc+1)   // pc points to old byte code
        st    tmp
        st    tmp      // read after write
        ld    (tmp)
        st    tmp
        ld    pc       // increment pc
        add   1
        st    pc
        jp    (tmp)    // jump indirect to instruction

A change in the instruction set, adding load and jump indirect via the accu, would reduce the fetch/decode to 5 + 3 + n cycles.

fetch
        ld    pc       // increment pc
        add   1
        st    pc
        ld    (a)      // load instruction accu indirect
        ld    (a)      // load address from jmp table accu indirect
        jp    (a)      // jump accu indirect to instruction

The last possible enhancement with this architecture would be to copy the fetch code to the end of every instruction. This saves one jump but adds extra code.

iadd
        ld    (sp)     // read first argument
        st    tmp
        ld    (sp-1)   // read second argument
        add   tmp      // execute
        st    (sp-1)   // store back
        ld    sp       // decrement stack pointer
        sub   1
        st    sp
        ld    pc       // increment pc
        add   1
        st    pc
        ld    (a)      // load instruction accu indirect
        ld    (a)      // load address from jump table accu indirect
        jp    (a)      // jump accu indirect to instruction

Now one instruction takes 13 + 3 + 4*n cycles. The main disadvantage is that we lose the central execution point of the fetch loop. This point would be ideal for switching threads and polling for I/O requests.
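To compare the variants, the cycle formulas quoted in this text can be collected in a small Python sketch. The function names are hypothetical. The parameter n is the added memory access time in cycles, and n = 0 is taken as the best case because it reproduces the 12 and 11 cycle figures given earlier.

```python
def fetch_baseline(n):
    return 6 + 2 * 3 + n      # jump table reached through a branch


def fetch_local_table(n):
    return 8 + 3 + n          # jump table in local memory at address 0


def fetch_accu_indirect(n):
    return 5 + 3 + n          # with ld and jump indirect via the accu


def iadd_execute(n):
    return 8 + 3 + 3 * n      # iadd routine ending in a jump to fetch


def iadd_inlined(n):
    return 13 + 3 + 4 * n     # fetch code copied to the end of iadd


def iadd_totals(n):
    """Total cycles for one iadd, fetch and decode included, per variant."""
    return {
        "baseline": fetch_baseline(n) + iadd_execute(n),
        "local jump table": fetch_local_table(n) + iadd_execute(n),
        "accu indirect": fetch_accu_indirect(n) + iadd_execute(n),
        "fetch inlined": iadd_inlined(n),
    }
```

With n = 0, `iadd_totals(0)` gives 23, 22, 19, and 16 cycles for the four variants, so inlining the fetch code saves seven cycles per iadd compared with the first version.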

You can download the complete design.

Further enhancements require a redesign: Second Approach: More specific for the JVM