================================================================================
B A S T A R D                                            disassembly environment


                        Intermediate Code Format



================================================================================
 Contents

 1. Introduction
 2. INT_CODE Object Definition
 3. Virtual Registers
 4. Operand Types
 5. Instruction Format
 6. Standard (Processor) Instructions
 7. Non-standard Instructions (Directives)
 8. Supported Traps
 9. Implementation: Intel to INT_CODE



================================================================================
 Introduction


	dream on. here's the lowdown:
		* no address expressions are allowed
		* no arch-specific registers may be used
		* no explicit addressing
		* no prefetch or branch delay instructions
		* register-memory architecture
		* SPARC-based instruction set




================================================================================
 INT_CODE Object Definition


This is how each line of intermediate code will be represented internally:


	struct INT_CODE {
		unsigned long id;
			/* the actual instruction */
		unsigned long opcode;
		unsigned long src, dest, aux;
			/* operand types */
		unsigned long sType, dType, aType;
			/* housekeeping stuff */
		unsigned long fn_id;		/* function owning this */
		unsigned long addr_id;		/* addr in original asm */ 
		unsigned int order;		/* order after addr_id */
		unsigned long cmt_id;		/* associated comment */
	};
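
Since several INT_CODE objects can derive from one original instruction, the
addr_id and order fields together fix the position of a line. A minimal sketch
of putting lines back in sequence with qsort(), assuming that addr_id values
sort in address order [this document does not say so] and using the
hypothetical helper names intcode_cmp() and intcode_sort():

```c
#include <stdlib.h>

/* Copy of the INT_CODE layout above, repeated so the sketch compiles alone. */
struct INT_CODE {
	unsigned long id;
	unsigned long opcode;
	unsigned long src, dest, aux;
	unsigned long sType, dType, aType;
	unsigned long fn_id;
	unsigned long addr_id;
	unsigned int order;
	unsigned long cmt_id;
};

/* Hypothetical helper: qsort() comparator, addr_id first, then order. */
int intcode_cmp(const void *a, const void *b)
{
	const struct INT_CODE *x = a, *y = b;

	if (x->addr_id != y->addr_id)
		return x->addr_id < y->addr_id ? -1 : 1;
	if (x->order != y->order)
		return x->order < y->order ? -1 : 1;
	return 0;
}

/* Hypothetical helper: sort an array of INT_CODE lines in place. */
void intcode_sort(struct INT_CODE *code, size_t n)
{
	qsort(code, n, sizeof(*code), intcode_cmp);
}
```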



================================================================================
 Virtual Registers


The following groups of registers will be used in INT_CODE representation:

	General Purpose
		g0, g1, g2, .... gFF
	
	Incoming Arguments
		i0, i1, i2, ... iFF
	
	Outgoing Arguments [parameters to called procedures]
		o0, o1, o2, ... oFF

	Local Variables
		l0, l1, l2, ... lFF

	Stack Pointer
		sp

	Frame Pointer
		fp

	Program Counter
		pc

	Condition Codes
		cz					/* zero flag */
		cn					/* negative flag */
		cv					/* overflow flag */
		cc					/* carry flag */

	Physical Registers
		r0, r1, r2, ... rMAX
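
The numbered register groups can be named mechanically. The sketch below
assumes that the index is printed as unpadded uppercase hex [as in gFF] and
that the '%' register prefix from the Operand Types section is wanted;
vreg_name() is a hypothetical helper, and physical registers are left out
because rMAX depends on the architecture.

```c
#include <stdio.h>

/* Hypothetical helper: write the name of virtual register 'index' of
 * class 'cls' ('g', 'i', 'o' or 'l') into buf.
 * Returns 0 on success, -1 on a bad class, bad index, or short buffer. */
int vreg_name(char cls, unsigned int index, char *buf, size_t len)
{
	int n;

	if (cls != 'g' && cls != 'i' && cls != 'o' && cls != 'l')
		return -1;
	if (index > 0xFF)	/* groups run from 0 to FF */
		return -1;
	n = snprintf(buf, len, "%%%c%X", cls, index);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```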




================================================================================
 Operand Types


The operand types are of the following format:

	00 00 00 00
	^^---------- global flags			{ deref }
	   ^^------- operand basic type		{ reg, imm, label }
	      ^^---- operand specific type		{ per-basetype-specific }
	         ^-- operand size			{ byte, long, qword }
	          ^- operand access			{ r, w, x }

...this can probably be whittled down to a short if need be.

An operand may be a 'virtual register', an immediate value, or a reference to
a label created with the .label directive [i.e., a NAME or CODE object from
the main bastard disassembly]. Operands may be dereferenced, in which case
their size attribute represents the size of the item pointed to, since the
operand itself will always contain data of size DEFAULT_MACHINE_ADDR_SIZE.

Note that an operand may never be an absolute address, a relative address, or
an address expression; absolute addresses should be referenced by labels, and
address expressions [absolute or relative] should be calculated in a register
prior to referencing the address.

In the AT&T syntax, most operands are prefixed by special characters denoting
their nature; this tradition will be followed when representing INT_CODE 
objects.

 	Character prefixes for operand types:
		immediate value signed/unsigned	'$'
		label                          	None
		label.local-label              	None
		register                       	'%'
		dereference any of the above   	'[]' or '*'
		comment					'#'
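
The prefix table can be applied with a few lines of C. This is only a sketch:
the op_kind enum and operand_format() are hypothetical names, and the '[]'
form of dereference is chosen over '*' arbitrarily.

```c
#include <stdio.h>

/* Hypothetical operand kinds, for this sketch only. */
enum op_kind { OPK_REG, OPK_IMM, OPK_LABEL };

/* Hypothetical helper: render operand 'text' with its prefix character,
 * wrapped in [] when dereferenced.
 * Returns 0 on success, -1 if buf is too small. */
int operand_format(enum op_kind kind, int deref, const char *text,
		   char *buf, size_t len)
{
	const char *prefix = "";	/* labels take no prefix */
	int n;

	if (kind == OPK_REG)
		prefix = "%";
	else if (kind == OPK_IMM)
		prefix = "$";

	if (deref)
		n = snprintf(buf, len, "[%s%s]", prefix, text);
	else
		n = snprintf(buf, len, "%s%s", prefix, text);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```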
	

	/* basic op types */
	enum basic_op_types 	{
		g_reg,		/* general register */
		i_reg,		/* incoming register */
		o_reg,		/* outgoing register */
		l_reg,		/* local register */
		spec_reg,		/* special register */
		r_reg,		/* "real" register */
		imm_val,		/* immediate value */
		label			/* address label */
	};


	/* specific op types */
	enum special_regs   	{
		sp_reg,		/* stack pointer */
		fp_reg,		/* frame pointer */
		pc_reg,		/* program counter */
		cz_reg,		/* Zero Condition */
		cn_reg,		/* Negative (Sign) Condition */
		cv_reg,		/* Overflow Condition */
		cc_reg		/* Carry Condition */
	};

	enum immediate_ops	{
		imm_byte,
		imm_ubyte,
		imm_hword,
		imm_uhword,
		imm_word,
		imm_uword
	};

	enum label_ops		{
		label_name,
		label_code,
		label_func,
		label_struct 	/* and so on and so on... */
	};

	#define DEREF_OP        0x10000000	/* global flag: operand is dereferenced */
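
Packing and unpacking an operand type word follows directly from the layout
at the top of this section. A minimal sketch, assuming the size nibble is the
high half and the access nibble the low half of the last byte; the optype_*
names and OPT_DEREF [the DEREF_OP value above, written as a single constant]
are hypothetical, and size and access are passed as raw numbers because their
values are not enumerated here.

```c
#define OPT_DEREF 0x10000000UL	/* hypothetical name for the deref flag */

/* Hypothetical helper: build an operand type word from its fields. */
unsigned long optype_pack(unsigned long flags, unsigned int basic,
			  unsigned int specific, unsigned int size,
			  unsigned int access)
{
	return (flags & 0xFF000000UL) |
	       ((unsigned long)(basic & 0xFF) << 16) |
	       ((unsigned long)(specific & 0xFF) << 8) |
	       ((unsigned long)(size & 0x0F) << 4) |
	       (unsigned long)(access & 0x0F);
}

/* Hypothetical accessors, one per field. */
unsigned int optype_basic(unsigned long t)    { return (t >> 16) & 0xFF; }
unsigned int optype_specific(unsigned long t) { return (t >> 8) & 0xFF; }
unsigned int optype_size(unsigned long t)     { return (t >> 4) & 0x0F; }
unsigned int optype_access(unsigned long t)   { return t & 0x0F; }
int optype_is_deref(unsigned long t)          { return (t & OPT_DEREF) != 0; }
```

With the enums above, a dereferenced frame pointer would be basic type 4
[spec_reg] and specific type 1 [fp_reg].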




================================================================================
 Instruction Format


Since this will never be a compilable architecture, there is no need for a very
efficient instruction set. Each instruction is 4 bytes, and the operands are
encoded in the INT_CODE structure; they are not present in the instruction
at all.

The reason for providing an 'opcode' is to represent the instruction set as
a collection of unrelated instructions that tend to have many modifiers [i.e.,
condition codes, signed/unsigned, etc.]; the mnemonic is generated from an
instruction, and the instruction itself provides information about the 'type'
of instruction.

Here is the basic opcode format, in 4 bytes:

		0x00 00 00 00
		            ^----- dest size
		           ^------ src size
		         ^-------- cond code
		        ^--------- trap or branch type [unused]
		     ^^----------- instruction
		  ^^-------------- instruction type [FPU, basic, special]
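
The opcode layout can be handled with shifts and masks. A minimal sketch,
with hypothetical opcode_* helper names; the instruction numbers themselves
are not assigned in this document, so the value used for 'instr' is up to
the caller.

```c
/* Hypothetical helper: build a 4-byte opcode from its fields.
 * The trap/branch type nibble is unused and left at zero. */
unsigned long opcode_pack(unsigned int itype, unsigned int instr,
			  unsigned int cond, unsigned int ssize,
			  unsigned int dsize)
{
	return ((unsigned long)(itype & 0xFF) << 24) |
	       ((unsigned long)(instr & 0xFF) << 16) |
	       ((unsigned long)(cond & 0x0F) << 8) |
	       ((unsigned long)(ssize & 0x0F) << 4) |
	       (unsigned long)(dsize & 0x0F);
}

/* Hypothetical accessors, one per field. */
unsigned int opcode_type(unsigned long op)  { return (op >> 24) & 0xFF; }
unsigned int opcode_instr(unsigned long op) { return (op >> 16) & 0xFF; }
unsigned int opcode_cond(unsigned long op)  { return (op >> 8) & 0x0F; }
unsigned int opcode_ssize(unsigned long op) { return (op >> 4) & 0x0F; }
unsigned int opcode_dsize(unsigned long op) { return op & 0x0F; }
```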
	



================================================================================
  Standard (Processor) Instructions


Fields
	Syntax : This is in the format
	              mnemonic [operand-type operand-name[, ...]]
	         ...where the operand types are any combination of
	              r  -- register operand
	              m  -- memory operand [i.e., code or address label]
	              i  -- immediate operand
	         Note that the first operand is always 'src', 'dest' is always
	         the last operand; in 2-operand instructions, the second operand
	         is either 'arg' or 'dest' depending on context, i.e. whether or 
	         not the argument is written.
					  
	Outputs : The direct effects of the instruction. Usually, the operand
	          named 'dest' [always the last operand] is overwritten.
				 
	Flags Affected : Side effects of the instruction. 
	
	Basic Form : The basic opcode format. This is in the form of 4 hexadecimal
	             bytes, with the second byte replaced by the appropriate
	             mnemonic, and is of the format
	                 instr-type mnemonic condition-code op-sizes
	             For most instructions, the standard format
	                 00 mnemonic 00 00
	             will apply, possibly with operand size specifiers replacing
	             the last byte. 
	             
		Operand sizes are:
			enum op_size { 	none, 
				byte, ubyte, 	/* 1, 2 *//* 00001b == signed */
				hword, uhword,	/* 3, 4 */
				word, uword, 	/* 5, 6 */
				dword, udword,	/* 7, 8 */
				qword, uqword,	/* 9, A */
				ext_prec	/* B */ /* extended precision */
			};

		Condition codes are:
			char *branch_type[] = {
				"n",	"a",	/*  NEVER, ALWAYS  */
				"e",	"ne",	/*  EQUAL, NOT EQUAL */
				"g",	"le",	/*  GREATER, LESSER/EQUAL */
				"l",	"ge",	/*  LESSER, GREATER/EQUAL */
				"neg",	"pos",	/*  NEGATIVE, POSITIVE */
				"cs",	"cc",	/*  CARRY, NO CARRY */
				"vs",	"vc"	/*  OVERFLOW, NO OVERFLOW */
			};

	Variants: Instructions with condition codes or operand-size specifications
	          will have a number of variant forms depending on the condition
	          code byte or the operand size byte. These variants are listed
	          with the full mnemonic and the corresponding condition code and
	          operand size bytes.
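
For the condition-code variants, the full mnemonic is the base mnemonic
followed by the entry of the condition table selected by the condition code.
A minimal sketch of that lookup; cond_suffix[] repeats the branch_type[]
table above, and cond_mnemonic() is a hypothetical helper.

```c
#include <stdio.h>

/* Same order as branch_type[] above; the index is the condition code. */
static const char *const cond_suffix[] = {
	"n",   "a",   "e",  "ne", "g",  "le", "l", "ge",
	"neg", "pos", "cs", "cc", "vs", "vc"
};

/* Hypothetical helper: build e.g. "bne" from base "b" and code 0x03.
 * Returns 0 on success, -1 on an unknown code or short buffer. */
int cond_mnemonic(const char *base, unsigned int cond, char *buf, size_t len)
{
	int n;

	if (cond >= sizeof(cond_suffix) / sizeof(cond_suffix[0]))
		return -1;
	n = snprintf(buf, len, "%s%s", base, cond_suffix[cond]);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```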
	          


add		Integer Addition 
	Adds 'src' and 'arg' operands
	Syntax:
		add 	r/m/i src, r/m/i arg, r/m dest
	Outputs:
		dest is overwritten
	Flags Affected:
		??
		%cc	Result overflowed (unsigned)
		%cv	Result overflowed (signed)
		%cn	Result is a negative number
	Basic Form:
		00 add 00 SD
	Variants:
		00 add  00 55 		; integer add
		00 addx 00 BB		; extended precision add 						
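
	The flag descriptions above can be written out in C. This is a sketch
	under assumptions: a 'word' is taken to be 32 bits, %cz is set on a
	zero result [the entry does not list it], and the names cond_flags
	and add_word() are hypothetical.

```c
#include <stdint.h>

/* Hypothetical holder for the four condition-code registers. */
struct cond_flags { int cz, cn, cv, cc; };

/* Hypothetical model of a 32-bit add: returns src + arg, fills in flags. */
uint32_t add_word(uint32_t src, uint32_t arg, struct cond_flags *f)
{
	uint32_t res = src + arg;

	/* carry: the unsigned sum wrapped around */
	f->cc = res < src;
	/* overflow: inputs share a sign that the result does not */
	f->cv = (int)(((uint32_t)~(src ^ arg) & (src ^ res)) >> 31) & 1;
	/* negative: top bit of the result */
	f->cn = (int)(res >> 31) & 1;
	/* zero: assumed, not listed in the entry above */
	f->cz = (res == 0);
	return res;
}
```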


and		Bitwise AND
	Bitwise AND of 'src' with 'arg'
	Syntax:
		and	r/m/i src, r/m/i arg, r/m dest
	Outputs:
		dest is overwritten
	Flags Affected:
		??
	Basic Form:
		00 and 00 SD
	Variants:
		00 andb 00 22
		00 andh 00 44
		00 andw 00 66
		00 andd 00 88


bcc		Branch on Condition
	Branch to new instruction address
	Syntax:
		b{cc}	r/m src
	Outputs:
		None.
	Flags Affected:
		None.
	Basic Form:
		00 b 0C 00					
	Variants:
		00 bn   00 00					
		00 ba   01 00					
		00 be   02 00					
		00 bne  03 00					
		00 bg   04 00					
		00 ble  05 00					
		00 bl   06 00					
		00 bge  07 00					
		00 bneg 08 00					
		00 bpos 09 00					
		00 bcs  0A 00					
		00 bcc  0B 00					
		00 bvs  0C 00					
		00 bvc  0D 00					
	

bclr		Bit Clear
	Clears bit number 'arg' in register or memory 'src'
	Syntax:
		bclr	r/m src, r/m/i arg
	Outputs:
		src is overwritten
	Flags Affected:
		None.
	Basic Form:
		00	bclr	00 S0
	Variants:
		00	bclrb	00 20
		00	bclrh	00 40
		00	bclrw	00 60
		00	bclrd	00 80


bset		Bit Set
	Sets bit number 'arg' in register or memory 'src'
	Syntax:
		bset	r/m src, r/m/i arg
	Outputs:
		src is overwritten
	Flags Affected:
		None.
	Basic Form:
		00	bset	00 S0
	Variants:
		00	bsetb	00 20
		00	bseth	00 40
		00	bsetw	00 60
		00	bsetd	00 80


btog		Bit Toggle
	Toggles bit number 'arg' in register or memory 'src'
	Syntax:
		btog	r/m src, r/m/i arg
	Outputs:
		src is overwritten
	Flags Affected:
		None.
	Basic Form:
		00	btog	00 S0
	Variants:
		00	btogb	00 20
		00	btogh	00 40
		00	btogw	00 60
		00	btogd	00 80


btst		Bit Test
	Sets zero flag to value of bit number 'arg' in register or memory 'src'
	Syntax:
		btst	r/m src, r/m/i arg
	Outputs:
		None.
	Flags Affected:
		%cz	Set to the value of the tested bit
	Basic Form:
		00	btst	00 S0
	Variants:
		00	btstb	00 20
		00	btsth	00 40
		00	btstw	00 60
		00	btstd	00 80
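
	The four bit instructions [bclr, bset, btog, btst] reduce to single
	mask operations. The sketch below assumes a 32-bit operand, that bit 0
	is the least significant bit, and that the bit number wraps at the
	operand width; none of these is stated above, and the bit_* names are
	hypothetical.

```c
#include <stdint.h>

/* bclr: clear bit 'arg' of 'src'. */
uint32_t bit_clear(uint32_t src, unsigned int arg)
{
	return src & (uint32_t)~(UINT32_C(1) << (arg & 31));
}

/* bset: set bit 'arg' of 'src'. */
uint32_t bit_set(uint32_t src, unsigned int arg)
{
	return src | (UINT32_C(1) << (arg & 31));
}

/* btog: toggle bit 'arg' of 'src'. */
uint32_t bit_toggle(uint32_t src, unsigned int arg)
{
	return src ^ (UINT32_C(1) << (arg & 31));
}

/* btst: value of bit 'arg' of 'src', i.e. what would be copied to %cz. */
int bit_test(uint32_t src, unsigned int arg)
{
	return (int)((src >> (arg & 31)) & 1);
}
```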


call		Call Procedure
	Call a procedure or subroutine
	Syntax:
		call	r/m src
	Outputs:
		None.
	Flags Affected:
		None.
	Basic Form:
		00 call 00 00
	Variants:
		None.
	

clr		Clear Register or Memory
	Sets 'src' to zero
	Syntax:
		clr	r/m src
	Outputs:
		src is overwritten
	Flags Affected:
		None.
	Basic Form:
		00	clr	00 S0
	Variants:
		00	clrb	00 20	
		00	clrh	00 40	
		00	clrw	00 60	
		00	clrd	00 80	


cmp		Compare two values
	Subtract 'arg' from 'src' and discard the results
	Syntax:
		cmp	r/m src, r/m/i arg
	Outputs:
		None.
	Flags Affected:
		??
	Basic Form:
		00 cmp	00 SD
	Variants:
		00 cmpb	00 22
		00 cmph	00 44
		00 cmpw	00 66
		00 cmpd	00 88
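
	A cmp is normally followed by a conditional branch or trap. The sketch
	below models that pairing under assumptions this document leaves open:
	32-bit operands, flags set the way most processors set them on a
	subtraction, and g/le/l/ge read as signed comparisons. The enum
	follows the order of the condition code table; all names are
	hypothetical.

```c
#include <stdint.h>

struct cond_flags { int cz, cn, cv, cc; };

/* Condition codes, numbered as in the condition code table. */
enum cond_code {
	C_N, C_A, C_E, C_NE, C_G, C_LE, C_L, C_GE,
	C_NEG, C_POS, C_CS, C_CC, C_VS, C_VC
};

/* Hypothetical model of cmp: flags of (src - arg), result discarded. */
void cmp_word(uint32_t src, uint32_t arg, struct cond_flags *f)
{
	uint32_t res = src - arg;

	f->cz = (res == 0);
	f->cn = (int)(res >> 31) & 1;
	f->cc = src < arg;	/* borrow */
	f->cv = (int)(((src ^ arg) & (src ^ res)) >> 31) & 1;
}

/* Hypothetical helper: does condition 'c' hold for flags 'f'? */
int cond_holds(enum cond_code c, const struct cond_flags *f)
{
	switch (c) {
	case C_N:   return 0;
	case C_A:   return 1;
	case C_E:   return f->cz;
	case C_NE:  return !f->cz;
	case C_G:   return !f->cz && f->cn == f->cv;
	case C_LE:  return f->cz || f->cn != f->cv;
	case C_L:   return f->cn != f->cv;
	case C_GE:  return f->cn == f->cv;
	case C_NEG: return f->cn;
	case C_POS: return !f->cn;
	case C_CS:  return f->cc;
	case C_CC:  return !f->cc;
	case C_VS:  return f->cv;
	case C_VC:  return !f->cv;
	}
	return 0;
}
```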


dec		Decrement
	Subtract 1 from 'src'
	Syntax:
		dec	r/m src
	Outputs:
		src is overwritten
	Flags Affected:
		??
	Basic Form:
		00 dec 00 S0
	Variants:
		00 decb 00 20
		00 dech 00 40
		00 decw 00 60
		00 decd 00 80


div		Divide
	Divide 'src' by 'arg'
	Syntax:
		div	r/m/i src, r/m/i arg, r/m dest
	Outputs:
		dest is overwritten
	Flags Affected:
		??
	Basic Form:
		00 div 00 SD
	Variants:
		00 div  00 55
		00 divx 00 BB
	

inc		Increment
	Add 1 to 'src'
	Syntax:
		inc	r/m src
	Outputs:
		src is overwritten
	Flags Affected:
		??
	Basic Form:
		00 inc 00 S0
	Variants:
		00 incb 00 20
		00 inch 00 40
		00 incw 00 60
		00 incd 00 80


jmp		Jump
	Unconditional branch: same as branch always 
	Syntax:
		jmp	r/m src
	Outputs:
		None.
	Flags Affected:
		None.
	Basic Form:
		00 jmp 00 00
	Variants:
		None.


ld 		Load
	Load memory 'src' to register 'dest'
	Syntax:
		ld		m src, r dest
	Outputs:
		dest is overwritten
	Flags Affected:
		None.
	Basic Form:
		00 ld 00 SD
	Variants:
		00 ldb 00 22
		00 ldh 00 44
		00 ldw 00 66
		00 ldd 00 88
	

mod		Modulus
	Set 'dest' to 'src' modulo 'arg'
	Syntax:
		mod	r/m/i src, r/m/i arg, r/m dest
	Outputs:
		dest is overwritten
	Flags Affected:
		??
	Basic Form:
		00 mod 00 SD
	Variants:
		00 mod  00 55		; integer mod


mul 		Multiply 
	Multiply 'src' by 'arg'
	Syntax:
		mul	r/m/i src, r/m/i arg, r/m dest
	Outputs:
		dest is overwritten
	Flags Affected:
		??
	Basic Form:
		00 mul 00 SD
	Variants:
		00 mul  00 55		; integer mul
		00 mulx 00 BB		; extended-precision mul


mv			Move
	Move register/imm 'src' to register 'dest'
	Syntax:
		mv		r/i src, r dest
	Outputs:
		dest is overwritten
	Flags Affected:
		None.
	Basic Form:
		00 mov 00 SD
	Variants:
		00 movb 00 22
		00 movh 00 44
		00 movw 00 66
		00 movd 00 88


neg		Neg
	Two's complement of 'src'
	Syntax:
		neg	r/m/i src, r/m dest
	Outputs:
		dest is overwritten
	Flags Affected:
		None.
	Basic Form:
		00 neg 00 S0
	Variants:
		00 negb 00 20
		00 negh 00 40
		00 negw 00 60
		00 negd 00 80


not		Not
	One's complement of 'src'
	Syntax:
		not	r/m/i src, r/m dest 
	Outputs:
		dest is overwritten
	Flags Affected:
		None.
	Basic Form:
		00 not 00 S0
	Variants:
		00 notb 00 20
		00 noth 00 40
		00 notw 00 60
		00 notd 00 80


or			Bitwise OR
	Bitwise OR of 'src' with 'arg'
	Syntax:
		or		r/m/i src, r/m/i arg, r/m dest
	Outputs:
		dest is overwritten
	Flags Affected:
		??
	Basic Form:
		00 or 00 SD
	Variants:
		00 orb 00 22
		00 orh 00 44
		00 orw 00 66
		00 ord 00 88


restore	Restore Context
	Restore register context
	Syntax:
		restore
	Outputs:
		None, though all registers are overwritten.
	Flags Affected:
		None.
	Basic Form:
		00 restore 00 00
	Variants:
		None


ret		Return
	Return from procedure or subroutine
	Syntax:
		ret
	Outputs:
		None.
	Flags Affected:
		None.
	Basic Form:
		00 ret 00 00 
	Variants:
		None


rol		Rotate Left
	Rotate 'src' left by 'arg' bits
	Syntax:
		rol	r/m/i src, r/m/i arg, r/m dest
	Outputs:
		dest is overwritten
	Flags Affected:
		None.
	Basic Form:
		00 rol 00 SD
	Variants:
		00 rolb 00 22
		00 rolh 00 44
		00 rolw 00 66
		00 rold 00 88


ror		Rotate Right
	Rotate 'src' right by 'arg' bits
	Syntax:
		ror	r/m/i src, r/m/i arg, r/m dest
	Outputs:
		dest is overwritten
	Flags Affected:
		None.
	Basic Form:
		00 ror 00 SD
	Variants:
		00 rorb 00 22
		00 rorh 00 44
		00 rorw 00 66
		00 rord 00 88
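
	C has no rotate operator, so rol and ror are usually modelled with a
	pair of shifts. A minimal sketch for 32-bit operands, assuming the
	rotate count wraps at the operand width; rol_word() and ror_word() are
	hypothetical names.

```c
#include <stdint.h>

/* rol: rotate 'src' left by 'arg' bits. */
uint32_t rol_word(uint32_t src, unsigned int arg)
{
	arg &= 31;
	/* the second mask keeps the shift count below 32 when arg is 0 */
	return (uint32_t)((src << arg) | (src >> ((32 - arg) & 31)));
}

/* ror: rotate 'src' right by 'arg' bits. */
uint32_t ror_word(uint32_t src, unsigned int arg)
{
	arg &= 31;
	return (uint32_t)((src >> arg) | (src << ((32 - arg) & 31)));
}
```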


save		Context save
	Save current register context
	Syntax:
		save
	Outputs:
		None.
	Flags Affected:
		None.
	Basic Form:
		00 save 00 00
	Variants:
		None


set		Set Bits
	Set all bits in 'src' register or memory location
	Syntax:
		set	r/m  src
	Outputs:
		src is overwritten
	Flags Affected:
		None.
	Basic Form:
		00 set 00 S0
	Variants:
		00 setb 00 20
		00 seth 00 40
		00 setw 00 60
		00 setd 00 80


sll		Shift Left Logical
	Shift 'src' left by 'arg' bits, zero extending
	Syntax:
		sll	r/m/i src, r/m/i arg, r/m dest
	Outputs:
		dest is overwritten
	Flags Affected:
		None.
	Basic Form:
		00 sll 00 SD
	Variants:
		00 sllb 00 22
		00 sllh 00 44
		00 sllw 00 66
		00 slld 00 88


sra		Shift Right Arithmetic
	Shift 'src' right by 'arg' bits, sign extending.
	Syntax:
		sra	r/m/i src, r/m/i arg, r/m dest
	Outputs:
		dest is overwritten
	Flags Affected:
		None.
	Basic Form:
		00 sra 00 SD
	Variants:
		00 srab 00 22
		00 srah 00 44
		00 sraw 00 66
		00 srad 00 88

	
srl		Shift Right Logical
	Shift 'src' right by 'arg' bits, zero extending. 
	Syntax:
		srl	r/m/i src, r/m/i arg, r/m dest
	Outputs:
		dest is overwritten
	Flags Affected:
		None.
	Basic Form:
		00 srl 00 SD
	Variants:
		00 srlb 00 22
		00 srlh 00 44
		00 srlw 00 66
		00 srld 00 88
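
	The difference between sra and srl is only in what fills the vacated
	high bits. Right-shifting a negative signed value is
	implementation-defined in C, so the sketch below builds the sign fill
	by hand on an unsigned value; it assumes 32-bit operands and a shift
	count that wraps at the operand width, and the function names are
	hypothetical.

```c
#include <stdint.h>

/* srl: shift right, filling with zeros. */
uint32_t srl_word(uint32_t src, unsigned int arg)
{
	return src >> (arg & 31);
}

/* sra: shift right, filling with copies of the sign bit. */
uint32_t sra_word(uint32_t src, unsigned int arg)
{
	uint32_t res;

	arg &= 31;
	res = src >> arg;
	if (arg != 0 && (src & UINT32_C(0x80000000)))
		res |= (uint32_t)~(UINT32_C(0xFFFFFFFF) >> arg);
	return res;
}
```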


st			Store
	Store register 'src' to memory 'dest'
	Syntax:
		st		r src, m dest 
	Outputs:
		dest is overwritten
	Flags Affected:
		None.
	Basic Form:
		00 st 00 SD
	Variants:
		00 stb 00 22
		00 sth 00 44
		00 stw 00 66
		00 std 00 88


sub		Subtract
	Subtract 'arg' from 'src'
	Syntax:
		sub	r/m/i src, r/m/i arg, r/m dest
	Outputs:
		dest is overwritten
	Flags Affected:
		??
	Basic Form:
		00 sub 00 SD
	Variants:
		00 sub  00 55		; integer subtraction
		00 subx 00 BB		; extended-precision subtraction
	

swap		Swap		
	Swap contents of register and reg/memory
	Syntax:
		swap	r/m src, r/m dest
	Outputs:
		src and dest are overwritten
	Flags Affected:
		None.
	Basic Form:
		00 swap 00 SD
	Variants:
		00 swapb 00 22
		00 swaph 00 44
		00 swapw 00 66
		00 swapd 00 88
	

tcc		Trap on Condition
	Generate machine trap number 'src'
	Syntax:
		t{cc}	i src
	Outputs:
		None.
	Basic Form:
		00 t TC 00
	Variants:
		00 tn   00 00					
		00 ta   01 00					
		00 te   02 00					
		00 tne  03 00					
		00 tg   04 00					
		00 tle  05 00					
		00 tl   06 00					
		00 tge  07 00					
		00 tneg 08 00					
		00 tpos 09 00					
		00 tcs  0A 00					
		00 tcc  0B 00					
		00 tvs  0C 00					
		00 tvc  0D 00					
	
	
tret		Trap Return
	Return from trap handler.
	Syntax:
		tret
	Outputs:
		None.
	Flags Affected:
		None.
	Basic Form:
		00 tret 00 00
	Variants:
		None.


tst		Test
	Test 'src' for a non-zero value
	Syntax:
		tst	r/m src
	Outputs:
		None.
	Flags Affected:
		??
	Basic Form:
		00 test 00 S0
	Variants:
		00 testb 00 20
		00 testh 00 40
		00 testw 00 60
		00 testd 00 80


xor 		Exclusive OR
	Bitwise XOR of 'src' with 'arg'
	Syntax:
		xor r/m/i src, r/m/i arg, r/m dest
	Outputs:
		dest is overwritten
	Flags Affected:
		??
	Basic Form:
		00 xor 00 SD 
	Variants:
		00 xorb 00 22
		00 xorh 00 44
		00 xorw 00 66
		00 xord 00 88




================================================================================
  Non-standard Instructions (Directives)


.label
	Generate a symbolic code address for the current location
	Syntax:
		.label		id of CODE object
	Basic Form:
		01 .label 00 00
	Notes: 

.data
	Generate a symbolic data address for the current location
	Syntax:
		.data			id of FUNC_LOCAL object
	Basic Form:
		01 .data 00 00
	Notes: 

.global
	Generate a global symbolic data address for the current location
	Syntax:
		.global		id of NAME object
	Basic Form:
		01 .global 00 00 
	Notes: 


.frame
	Enter stack frame
	Syntax:
		.frame
	Basic Form:
		01 .frame 00 00
	Notes: 


.unframe
	Exit stack frame
	Syntax:
		.unframe
	Basic Form:
		01 .unframe 00 00
	Notes: 


.proc
	Generate a global symbolic code address for the current location
	Syntax:
		.proc			id of FUNCTION object
	Basic Form:
		01 .proc 00 00
	Notes: 

.asm
	Unknown assembler instruction -- verbatim from user
	Syntax:
		.asm			id of CODE object 
	Basic Form:
		01 .asm 00 00
	Notes: 

.block
	Open code block
	Syntax:
		.block		id of INT_CODE object 
	Basic Form:
		01 .block 00 00
	Notes: 
		The INT_CODE object is the condition which "owns" or applies to the
		block. A block may have a NULL INT_CODE object, meaning it is an
		arbitrary block -- always executed.

.unblock 
	Close code block
	Syntax:
		.unblock		id of INT_CODE object
	Basic Form:
		01 .unblock 00 00
	Notes:
		The INT_CODE object is the .block statement that opened this block.
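
	Because each .unblock names the .block that opened it, a pass over the
	code can check the pairing with a stack. A minimal sketch, assuming
	blocks nest strictly; the dir_line record, the MAX_DEPTH limit and
	blocks_balanced() are hypothetical stand-ins for the real INT_CODE
	handling.

```c
#include <stddef.h>

enum dir_kind { DIR_BLOCK, DIR_UNBLOCK, DIR_OTHER };

/* Hypothetical, cut-down view of one line of intermediate code. */
struct dir_line {
	enum dir_kind kind;
	unsigned long id;	/* id of this INT_CODE line */
	unsigned long ref;	/* INT_CODE id named by the directive */
};

#define MAX_DEPTH 64		/* arbitrary nesting limit for the sketch */

/* Returns 1 if every .unblock names the innermost open .block and no
 * block is left open at the end; 0 otherwise. */
int blocks_balanced(const struct dir_line *code, size_t n)
{
	unsigned long open[MAX_DEPTH];
	size_t depth = 0, i;

	for (i = 0; i < n; i++) {
		if (code[i].kind == DIR_BLOCK) {
			if (depth == MAX_DEPTH)
				return 0;
			open[depth++] = code[i].id;
		} else if (code[i].kind == DIR_UNBLOCK) {
			if (depth == 0 || open[depth - 1] != code[i].ref)
				return 0;
			depth--;
		}
	}
	return depth == 0;
}
```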

.clobber
	Overwrite register contents
	Syntax:
		.clobber		register
	Basic Form:
		01 .clobber 00 00
	Notes:
		Informs decompiler that 'register' has been cleared of its original
		contents. This is not used when the register is modified [ e.g.
		add %r1, %r2, %r2 ] but only when the new contents are not based
		on the old contents [ e.g. mov %r1, %r2 ]. This is intended to 
		make managing 'dead' registers easier.
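
	The rule in the notes above amounts to checking whether the written
	register is also read by the same instruction. A minimal sketch,
	assuming registers are compared by an opaque numeric id;
	needs_clobber() is a hypothetical helper.

```c
/* Hypothetical helper: returns 1 if a .clobber applies to register
 * 'dest', i.e. it is written but is not one of the 'nsrc' inputs. */
int needs_clobber(unsigned long dest, const unsigned long *srcs, int nsrc)
{
	int i;

	for (i = 0; i < nsrc; i++)
		if (srcs[i] == dest)
			return 0;	/* new value depends on the old one */
	return 1;			/* old contents are dead */
}
```

	With registers r1 and r2 numbered 1 and 2, the add example above has
	inputs {1, 2} and dest 2, so no .clobber is wanted; the move example
	has input {1} and dest 2, so one is.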

.calc
	Dynamic Calculation of Address
	Syntax:
		.calc
	Basic Form:
		01 .calc 00 00
	Notes:
		Informs the decompiler that the following instructions are a
		dynamic address calculation [e.g., an effective or SIB address
		in Intel syntax]. This is merely a 'hint' for treating these
		instructions correctly, and has no bearing on the code itself.

.uncalc
	End Dynamic Calculation of Address
	Syntax:
		.uncalc
	Basic Form:
		01 .uncalc 00 00
	Notes:
		Marks the end of a dynamic address calculation.
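
	As an illustration of how an Intel base + index*scale + displacement
	address might be flattened into a register between .calc and .uncalc,
	here is a sketch that emits the text of such a sequence. The choice of
	instructions, the scratch register, and the helper names emit() and
	lower_sib() are all assumptions; the actual translation lives in the
	i386 sources named at the end of this document.

```c
#include <stdio.h>
#include <string.h>

/* Append 'line' plus a newline to the string in buf.
 * Returns 0 on success, -1 if it does not fit. */
int emit(char *buf, size_t len, const char *line)
{
	size_t used = strlen(buf);

	if (used + strlen(line) + 2 > len)
		return -1;
	strcat(buf, line);
	strcat(buf, "\n");
	return 0;
}

/* Hypothetical lowering: compute base + index*scale + disp into the
 * scratch register 'tmp', using 'src, arg, dest' operand order. */
int lower_sib(const char *base, const char *index, unsigned int scale,
	      long disp, const char *tmp, char *buf, size_t len)
{
	char line[64];

	if (len == 0)
		return -1;
	buf[0] = '\0';
	if (emit(buf, len, ".calc"))
		return -1;
	snprintf(line, sizeof(line), "mul %%%s, $%u, %%%s", index, scale, tmp);
	if (emit(buf, len, line))
		return -1;
	snprintf(line, sizeof(line), "add %%%s, %%%s, %%%s", tmp, base, tmp);
	if (emit(buf, len, line))
		return -1;
	snprintf(line, sizeof(line), "add %%%s, $%ld, %%%s", tmp, disp, tmp);
	if (emit(buf, len, line))
		return -1;
	return emit(buf, len, ".uncalc");
}
```

	The resulting register can then be used as a dereferenced operand,
	which keeps address expressions out of the operands themselves.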




================================================================================
 Supported Traps


Basically, every Intel INT instruction, as well as the IN and OUT instructions,
will be implemented as a trap; these are OS-specific, and so must be handled in
the EXT_OS module. Still, there may be a need for some 'magic numbers' to
identify trap types in the intermediate code.

We'll see.



================================================================================
 Implementation: Intel to INT_CODE


Please refer to
	src/arch/i386/i386_intcode.c
	src/arch/i386/i386_intcode.h
	src/arch/i386/i386_intcode.table
