Is it possible to implement subroutine call without a stack nor indirect addressing?-CodePudding

is it possible to use subroutine with a cpu that doesn't feature indirect addressing nor a way to store the program counter, it would only feature :

2 register A and B ( and a status register for carry and zero flag)
And those instructions:
- load the A register with an address as operand
- store the A register with an address as operand
- load an immediate value into the A register
- move the A register into the B register
- move the B register into the A register
- add A to B and store the result in A
- sub B from A and stoee the result in A
- nand A with B
- nor a with B
- branch if carry with a target address as operand
- branch if zero with a target address as operand
- jump with a target address as operand
- halt the cpu
- nop (do nothing for a clock cycle)

The problem is, to return from a subroutine you have to store the program counter somewhere first when you call it and the instruction set doesn't allow me to do this also with indirect addressing I can't return to a variable address.

Is it even possible to implement some sort of subroutine without self modifying code (or using an assembler to keep count of which instruction we are at)?

CodePudding user response：

Evaluation of JC and JZ conditional jumps allow a form of "hack" where subroutines can hardcode four return addresses. The two flags, one bit each, are two bit. Two bits are four possible combinations: 00, 01, 10, 11. These are for the flags C:0 and Z:0, C:0 and Z:1, C:1 and Z:0, C:1 and Z:1.

These sequences at the end of the subroutine evaluate those four conditions, and branch to four different return addresses ("RTN" means the hardcoded return address specified in operand. )

1: JC,5
2: JZ,4
3: JPM,RTN
4: JPM,RTN
5: JZ,7
6: JPM,RTN
7: JMP,RTN

If JC,5 is 0, it continues to JZ,4. If JZ,4 is 0, it picks the first return address. If JC,5 is 0 but JZ,4 is 1, it picks the second return address. If JC,5 is 1 and JZ,7 is 0, it picks the third return address. If JC,5 is 1, and JZ,7 is 1, it picks the fourth address.

A "subroutine" is just when a sequence of instructions that is part of a normal program is able to be reused, thanks to that different return addresses can be specified. So, if JC, JZ and JMP allows four different return addresses, it seems like that would implement a very limited form of subroutines, using the constraints specified in the question.

I am a beginner, and I thought this was interesting to think about, but this "hack" was my initial thought from the question.

CodePudding user response：

Is it possible to implement subroutine call without a stack

Yes. The PDP-8 did not employ a stack — that is to say it had no direct hardware support for stack, such as a stack pointer register. However, it did allow for function calling, just not recursion.

nor indirect addressing?

That's harder.

In calling a function, the PDP-8 would store the return address (linkage) into the first word of the function (so that first word is data, and the instructions for the function started at that address 1). Returning though, required indirection, so as to return back to the proper caller.

Indirection can sometimes be substituted with self-modifying code, but here that would be complicated unless limited to a small range of memory that the indirection needed to accomplish.

is it possible to use subroutine with a cpu that doesn't feature indirect addressing nor a way to store the program counter, it would only feature :

I suppose that a (potentially large) if-then structure could be generated by hand, or by compiler, to decide where to return, given a call site's value as follows:

At each call site to function X, pass a unique integer as parameter, uip. The called function, X, then, at the end, does:

if (uip == 1) goto resume_call_site_1;
if (uip == 2) goto resume_call_site_2;

etc.., using only (direct) goto's (no indirection). To do this by hand would work for small programs, but otherwise be an ongoing maintenance issue.

In the end I suppose, we see that even without hardware support, there are ways to simulate things to accomplish our intent. There has always been an alternation of advancing software requirements and hardware-provided features.

CodePudding user response：

TL:DR: callers could pass an integer index, and subroutines could compare/branch to figure out which call-site to return to. With the set of call-sites hard-coded into each subroutine (hopefully by automated tools that build these blocks of code.)

With no indirect data addressing, you're going to need self-modifying code to loop over arrays for example, like the Little Man Computer (See Programming arrays in LMC for an example).

Your machine is still Turing-complete (except for having limited RAM) without that; even a machine with just one instruction (dec-and-branch) is Turing complete without self-modifying code, but a program to implement some algorithms might be pretty big.

With no indirect jumps, only with a destination embedded in their machine code, you can't have one instruction that returns directly to different possible callers, without self-modifying or cross-modifying code. So it's not efficiently possible, but you can hack something up with multiple conditional branches to figure out where to jump.

Self-modifying code is the normal way to implement subroutines on a machine like this, if it's a Von Neumann architecture, not Harvard where the program is fixed, not modifiable as data. (On a Harvard design, you normally would provide some indirection since programs can't synthesize it with self-modifying code.)

For example, the callee stores a return address into the operand for a jmp in the callee, before jumping there. (Perhaps a fixed length before the symbol address, so functions return by jumping to that space before their start address.) That's still self-modifying machine code which you asked to avoid, or at least cross-modifying.

Some historical machines worked essentially that way, with each function having a fixed slot for its caller to store a return address. The Ferranti Pegasus doesn't have indirect jumps, so self-modifying code was part of the calling convention. It's not re-entrant, so recursion or mutual recursion don't work.

or using an assembler to keep count of which instruction we are at)?

How would that help? If there's no simple way to return to any of multiple call sites in machine code (even knowing all the addresses), an assembler won't have the building blocks to build a ret or indirect-jump instruction.

An assembler can make it less cumbersome to implement a complex mechanism if we do have one, though.

Strictly avoiding modifying machine code at run-time

Without indirect jumps, the only way to get run-time variable control-flow is conditional branches. Each branch gives you two possible paths of execution (taken or fall-through).

Instead of a return address, a caller can pass an integer telling it which call-site to return to. The subroutine will have to know about all possible call-sites, having code like if (0==B) goto 0x1234 / if (1==B) goto 0x1246 / etc., or something equivalent after loading the return index into the A register, where those addresses follow mov/jump instructions.

The caller can pass a return index along with other args, either in a fixed memory location (per function), or in a register for the callee to store until it's ready to return. (@BipedalJoe had basically the same idea, but this version uses an integer instead of condition-codes which subroutines would have to save / restore if they wanted to do much before returning.)

So you couldn't load new code and call it; you'd have to construct a switch block of conditional branches for it to use when returning.

Performance would suck as every return goes through a sequence of compare/branch, and it would be a nightmare to maintain by hand as you added new function calls. You'd definitely want an assembler linker to create these sequences of branches. The number of total call-sites in your program is unbounded, except by program memory size and the size of an integer.

For subroutines with more than a couple callsites, you'd actually use a tree of branches like nested ifs instead of an if / elseif chain. You'd still need a branch or jump for every possible return address, but any single execution would only run about log2(n) branches before returning, instead of O(n), where n is the number of call-sites. (Like a compiler might do for a big switch if it didn't or couldn't use a jump table.)

If you aren't doing indirect calls, each call-site can only be returned-to by one subroutine, the one that it jumps to after storing args and return index to memory (or loading into A and B). So there wouldn't be a saving in code-size from having all subroutines share one big switch tree. The same subroutine might make multiple calls to different subroutines, but each call-site within it has its own address.

I think the best strategy to do multiple branches on one value would be to get it into the B register, then you can li A, 3 and sub A, B, setting flags and destroying the constants, but leaving B (return index) unchanged. So if you had 8 possible return addresses, index 0..7, you'd branch on CF from 3-B to select between the 0..3 range or the 4..7 range. Then within each of those, again, 1-B and 5-B respectively.

subfoo:
  ... the real work

  load  A, foo_ret_index      # static storage for our return index
  mov   B, A
  li    A, 3
  sub   A, B          # assuming sub sets Carry flag on borrow, like x86 (opposite of ARM), so jb (below) is jump if carry-set.  Otherwise of course jae is the available jump
  jb    ret4to7       # jump on unsigned below; if ( 3u < B ) goto

# return index 0..3
  li    A, 1
  sub   A, B
  jb    ret2to3
# return address 0..1, with FLAGS set from 1 - idx
############### these are the actual jumps back to callers ############
  je    foo_callsite1  # jump if equal, based on the Zero flag
  jmp   foo_callsite0  # only possibility left is idx==0

 ret2to3:  # B = 2 (A=-1) or B = 3 (A=-2).  
       # Not sure if we can avoid an LI using NAND, NOR, ADD, or SUB to set CF or ZF based on those values.
   li  2
   sub A, B
   ...
   je / jmp
  

ret4to7:
   ... similar code here

A is the only possible destination for load and li, I'm writing 2-operand syntax just for clarity. Some ISAs with restrictions like that would leave the destination implicit like li 3.

A chain of branches can be more compact, since we can get B=1 and then repeatedly subtract, checking ZF. Like if (--A == 0) goto dst.

  li    A, 1
  mov   B, A
  load  A, foo_ret_index      # static storage for our return index
  sub   A, B
  jb    foo_callsite0         # 0u < 1
  je    foo_callsite1         # 1 == 1
  sub   A, B
  je    foo_callsite2
  sub   A, B
  je    foo_callsite3
  ...

After ruling out zero, we could subtract by 2 with a jb and je for every sub. But that would take more overhead at the start; it takes 3 instructions to get a new constant into B and reload A, so it's only worthwhile for a long chain, where you might want a tree anyway unless optimizing purely for code size.

An li B, imm instruction would help a lot. Or if load / mov set FLAGS, then we can check for 0 without a sub first. I was assuming they wouldn't, since most ISAs don't do that. Most ISAs have a way to test A,A or something to set FLAGS according to an AND, but you only have NAND/NOR which would invert the register. I guess we could actually work with that, using add and counting up toward 0 after doing idx = ~idx = 0u - idx - 1

The same mechanism could be used to implement function pointers for indirect calls. And you might share the same return branch tree between subroutines that are all possible callees of some indirect-call sites, if you aren't using one global tree.