T E C H N I C A L  M E M O R A N D U M


Subject:        RISC OS Application Image Format
                (previously Arthur Image Format)

Reference:      PLG-AIF

Issue:          1.00

Author:         Lee Smith, 2nd September 1987
                Lionel Haines, 26th October 1988
                Lee Smith, 23rd January 1989

Distribution:   Not restricted.


-----------------------------------------------------------------------------
Programming Languages Group, Acorn Computers Limited,
Fulbourn Road, Cherry Hinton, Cambridge, CB1 4JN, England.
-----------------------------------------------------------------------------


Copyright Acorn Computers Limited 1989


Neither the whole nor any part of the information contained in this technical
memorandum may be adapted or reproduced in any material form except with the
prior written approval of Acorn Computers Limited (Acorn).

The information contained in this technical memorandum relates to ongoing
developments. Whilst it is given in good faith by Acorn, it is acknowledged
that there may be errors or omissions.


                H I S T O R Y


    14-Aug-87   First written & shown to Roger Wilson
    14-Aug-87   Limited circulation for comment (RWilson, RCownie, MJordan,
                NRaine, PFellows, JThackray, HMeekings, SWoodward); support
                for Dbug continues to worry LDS...
    19-Aug-87   Released to Richard Evans of TopExpress for comment.
    02-Sep-87   Major revision following further thought.
    26-Oct-88   Corrections and modifications.
    17-Jan-89   Major revision:-
                  - improved support for relocatable images
                  - improved support for ASD/Dbug
                  - editorial review
    23-Jan-89   More minor editing and clarification
    02-Feb-89   Removed final restriction on circulation of issue 1.00


Properties of AIF
-----------------

1.  An AIF image is loaded into memory at its load address and entered at its
    first word (compatible with old-style Arthur/Brazil ADFS images).

2.  An AIF image may be compressed and can be self-decompressing (to support
    faster loading from floppy disks, and better use of floppy-disk space).

3.  If created with suitable linker options, an AIF image may relocate itself |
    at load time. Self-relocation is supported in two, distinct senses:-      |

    a.  One-time Position-Independence: A relocatable image can be loaded     |
        at any address (not just its load address) and will execute there     |
        (compatible with version 0.03 of AIF).

    b.  Specified Working Space Relocation: A suitably created relocatable    |
        image will copy itself from where it is loaded to the high address    |
        end of applications memory, leaving space above the copied image      |
        as noted in the AIF header (see below).                               |

    In addition, similar relocation code and similar linker options support   |
    many-time position independence of RISC OS Relocatable Modules. This is   |
    explained further in a later section and in document PLG-AMF.             |

4.  AIF images support being debugged by the Arthur Symbolic Debugger (ASD),  |
    for C, Fortran and Pascal. Version 0.04 of AIF (and later) together with  |
    version 3.00 (or later) of ASD, and version 3.00 (or later) of link,      |
    supports debugging at the symbolic assembler level (hitherto done by      |
    Dbug). Low-level and source-level debugging support are orthogonal        |
    (capabilities of debuggers notwithstanding, both, either, or neither      |
     kind of debugging  support may be present in an AIF image).              |

    A separate document (PLG-DEBUG) describes the format of debugger tables.  |

    Debugging tables have the property that all references from them to code
    and data (if any) are in the form of relocatable addresses. After loading |
    an image at its load address these values are effectively absolute. All   |
    references between debugger table entries are in the form of offsets from
    the beginning of the debugging data area. Thus, following relocation of a
    whole image, the debugging data area itself is position independent and
    can be copied by the debugger.


The Layout of an AIF Image
--------------------------

The layout of an AIF image is as follows:-

    +----------------------+
    | Header               |
    +----------------------+
    | Compressed image     |
    +----------------------+
    | Decompression data   |    This data is position-independent
    +----------------------+
    | Decompression code   |    This code is position-independent
    +----------------------+

The Header is small, fixed in size, and described below. In a compressed
AIF image, the header is NOT compressed.

Once an image has been decompressed---or if it is uncompressed in the first
place---it has the following layout:-

    +----------------------+
    | Header               |
    +----------------------+
    | Read-Only area       |
    +----------------------+
    | Read-Write area      |
    +----------------------+
    | Debugging data       |    (optional)
    +----------------------+
    | Self-relocation code |    MUST be position independent                  |
    +----------------------+                                                  |
    | Relocation list      |    List of words to relocate, terminated by -1   |
    +----------------------+

Debugging data are absent unless the image has been linked appropriately      |
and, in the case of source-level debugging, unless the constituent components |
of the image have been compiled appropriately.                                |

The relocation list is a list of byte offsets from the beginning of the AIF   |
header, of words to be relocated, followed by a word containing -1.           |
The relocation of non-word values is not supported.                           |

After the execution of the self-relocation code---or if the image is not
self-relocating---the image has the following layout:-

    +----------------------+
    | Header               |
    +----------------------+
    | Read-Only area       |
    +----------------------+
    | Read-Write area      |
    +----------------------+
    | Debugging data       |    (optional)
    +----------------------+

At this stage a debugger is expected to copy the debugging data (if present)  |
somewhere safe, otherwise they will be overwritten by the zero-initialised    |
data and/or the heap/stack data of the program. A debugger can seize control  |
at the appropriate moment by copying, then modifying, the third word of the   |
AIF header (see below).                                                       |


AIF Header Layout
-----------------

    +----------------------+
00: | BL DecompressCode    |    BLNV 0 if the image is not compressed.
    +----------------------+
04: | BL SelfRelocCode     |    BLNV 0 if the image is not self-relocating.
    +----------------------+
08: | BL ZeroInitCode      |    BLNV 0 if the image has none.
    +----------------------+
0C: | BL ImageEntryPoint   |    BL to make header addressable via R14.
    +----------------------+
10: | SWI Exit             |    Just in case silly enough to return...
    +----------------------+
14: | Image ReadOnly size  |    Includes header size and any padding
    +----------------------+
18: | Image ReadWrite size |    Exact size - a multiple of 4 bytes
    +----------------------+
1C: | Image Debug size     |    Exact size - a multiple of 4 bytes
    +----------------------+
20: | Image zero-init size |    Exact size - a multiple of 4 bytes
    +----------------------+
24: | Image debug type     |    0, 1, 2, or 3 (see note below).               |
    +----------------------+
28: | Image base           |    Address of the AIF header - set by link       |
    +----------------------+ 
2C: | Work Space           |    Min work space - in bytes - to be reserved    |
    |                      |    by a self-moving relocatable image.           |
    +----------------------+
30: | Four reserved words  |
    | ...initially 0...    |
    +----------------------+
40: | Zero-init code       |
    | (16 words as below)  |    Header is 32 words long.
    +----------------------+


BL is used everywhere to make the header addressable via R14 (but beware the
PSR bits) in a position-independent manner and to ensure that the header will
be position-independent.

It is required that an image be re-enterable at its first instruction.
Therefore, after decompression, the decompression code must reset the first
word of the header to BLNV 0. Similarly, following self-relocation, the       |
second word of the header must be reset to BLNV 0. This causes no additional  |
problems with the read-only nature of the code segment - both decompression   |
and relocation code must write to it anyway. So, on systems with memory       |
protection, both the decompression code and the self-relocation code must     |
be bracketed by system calls to change the access status of the read-only     |
section (first to writable, then back to read-only).                          |

The image debug type has the following meaning:-                              |

0:  No debugging data are present.                                            |
1:  Low-level debugging data are present.                                     |
2:  Source level (ASD) debugging data are present.                            |
3:  1 and 2 are present together.                                             |

All other values are reserved to Acorn.                                       |


Zero-Initialisation Code
------------------------

The Zero-initialisation code is as follows:-

    BIC     IP, LR, #&FC000003  ; clear status bits -> header + &C
    ADD     IP, IP, #8          ; -> Image ReadOnly size
    LDMIA   IP, {R0,R1,R2,R3}   ; various sizes
    CMPS    R3, #0
    MOVLES  PC, LR              ; nothing to do
    SUB     IP, IP, #&14        ; image base
    ADD     IP, IP, R0          ; + RO size
    ADD     IP, IP, R1          ; + RW size = base of 0-init area
    MOV     R0, #0
    MOV     R1, #0
    MOV     R2, #0
    MOV     R4, #0
ZeroLoop
    STMIA   IP!, {R0,R1,R2,R4}
    SUBS    R3, R3, #16
    BGT     ZeroLoop
    MOVS    PC, LR              ; 16 words in total.


Relationship between Header Sizes and Linker Pre-defined Symbols
----------------------------------------------------------------

AIFHeader.ImageBase            = Image$$RO$$Base

AIFHeader.ImageBase + 
  AIFHeader.ROSize             = Image$$RW$$Base

AIFHeader.ImageBase +
  AIFHeader.ROSize +
    AIFHeader.RWSize           = Image$$ZI$$Base

AIFHeader.ImageBase +
  AIFHeader.ROSize +
    AIFHeader.RWSize +
      AIFHeader.ZeroInitSize   = Image$$RW$$Limit


Self Relocation
---------------

Two kinds of self-relocation are supported by AIF and one by AMF; for         |
completeness, all three are described here.                                   |

One-time position independence is supported by relocatable AIF images.        |
Many-time position independence is required for AMF Relocatable Modules.      |
And only AIF images can self-move to a location which leaves a requested      |
amount of workspace.                                                          |

Why are there three different kinds of self-relocation?                       |

1.  The rules for constructing RISC OS applications do not forbid acquired    |
    position-dependence. Once an application has begun to run, it is not, in  |
    general, possible to move it, as it isn't possible to find all the data   |
    locations which are being used as position-dependent pointers. So, AIF    |
    images can be relocated only once. Afterwards, the relocation table is    |
    over-written by the application's zero-initialised data, heap, or stack.  |


2.  In contrast, the rules for constructing a RISC OS Relocatable Modules     |
    (RM) require that it be prepared to shut itelf down, be moved in memory,  |
    and start itself up again. Shut-down and start-up are notified to a RM    |
    by special service calls to it. Clearly, a RM must be relocatable many    |
    times so its relocation table is not overwritten after first use.         |

3.  Relocatable Modules are loaded under the control of a Relocatable Module  |
    Area (RMA) manager which decides where to load a module initially and     |
    where to  move each module to whenever the RMA is reorganised.            |
    In contrast, an application is loaded at its load address and is then     |
    on its own until it exits or faults. An application can only be moved     |
    by itself (and then only once, before it begins execution proper).        |


Self-Relocation Code for RMF Modules
------------------------------------

In this case there is no AIF header, the code must be executable many times,  |
and it must be symbolically addressable from the Relocatable Module header.   |
The code below must be the last area of the RMF image, following the          |
relocation list. Note that it is best thought of as an additional area.       |

    IMPORT  |Image$$RO$$Base|       ; where the image is linked at...
    EXPORT  |__RelocCode|           ; referenced from the RM header
;
; The module image has already been loaded at/moved to its target address.    |
; It only remains to relocate location-dependent addresses. The list of       |
; offsets to be relocated, terminated by (-1), immediately follows End.       |
; Note that the address values here (e.g. |__RelocCode|) will appear in the   |
; list of places to be relocated, allowing the code to be re-executed.        |
;
|__RelocCode|
    LDR     R1, RelocCode           ; value of __RelocCode (before relocation)
    SUB     IP, PC, #12             ; value of __RelocCode now
    SUBS    R1, IP, R1              ; relocation offset
    MOVEQS  PC, LR                  ; relocate by 0 so nothing to do
    LDR     IP, ImageBase           ; image base prior to relocation...
    ADD     IP, IP, R1              ; ...where the image really is
    ADR     R2, End
RelocLoop
    LDR     R0, [R2], #4
    CMNS    R0, #1                  ; got list terminator?
    MOVLES  PC, LR                  ; yes => return
    LDR     R3, [IP, R0]            ; word to relocate
    ADD     R3, R3, R1              ; relocate it
    STR     R3, [IP, R0]            ; store it back
    B       RelocLoop               ; and do the next one
RelocCode   DCD    |__RelocCode|
ImageBase   DCD    |Image$$RO$$Base|
End                                 ; the list of locations to relocate starts
                                    ; here (each is an offset from the base of
                                    ; the module) and is terminated by -1.

Note that this code, and the associated list of locations to relocate, is
added automatically to a relocatable module image by the linker (as a
consequence of issuing the command "link -module...").


Self-Move and Self-Relocation Code for AIF
------------------------------------------

This code is added to the end of an AIF image by the linker, immediately      |
before the list of relocations (terminated by -1). Note that the code is      |
entered via a BL from the second word of the AIF header so, on entry,         |
R14 -> AIFHeader + 8.                                                         |

RelocCode
    BIC     IP, LR, #&FC000003      ; clear flag bits; -> AIF header + &08
    SUB     IP, IP, #8              ; -> header address
    MOV     R0, #&FB000000          ; BLNV #0
    STR     R0, [IP, #4]            ; won't be called again on image re-entry
; does the code need to be moved?
    LDR     R9, [IP, #&2C]          ; min free space requirement
    CMPS    R9, #0                  ; 0 => no move, just relocate
    BEQ     RelocateOnly
; calculate the amount to move by...
    LDR     R0, [IP, #&20]          ; image zero-init size
    ADD     R9, R9, R0              ; space to leave = min free + zero init
    SWI     GetEnv                  ; MemLimit -> R1
    ADR     R2, End                 ; -> End
01  LDR     R0, [R2], #4            ; load relocation offset, increment R2
    CMNS    R0, #1                  ; terminator?
    BNE     %B01                    ; No, so loop again
    SUB     R3, R1, R9              ; MemLimit - freeSpace
    SUBS    R0, R3, R2              ; amount to move by
    BLE     RelocateOnly            ; not enough space to move...
    BIC     R0, R0, #15             ; a multiple of 16...
    ADD     R3, R2, R0              ; End + shift
    ADR     R8, %F01                ; intermediate limit for copy-up
;
; copy everything up memory, in descending address order, branching
; to the copied copy loop as soon as it has been copied.
;
01  LDMDB   R2!, {R4-R7}
    STMDB   R3!, {R4-R7}
    CMP     R2, R8                  ; copied the copy loop?
    BGT     %B01                    ; not yet
    ADD     R4, PC, R0
    MOV     PC, R4                  ; jump to copied copy code
01  LDMDB   R2!, {R4-R7}
    STMDB   R3!, {R4-R7}
    CMP     R2, IP                  ; copied everything?
    BGT     %B01                    ; not yet
    ADD     IP, IP, R0              ; load address of code
    ADD     LR, LR, R0              ; relocated return address
RelocateOnly
    LDR     R1, [IP, #&28]          ; header + &28 = code base set by Link
    SUBS    R1, IP, R1              ; relocation offset
    MOVEQ   PC, LR                  ; relocate by 0 so nothing to do
    STR     IP, [IP, #&28]          ; new image base = actual load address
    ADR     R2, End                 ; start of reloc list
RelocLoop
    LDR     R0, [R2], #4            ; offset of word to relocate
    CMNS    R0, #1                  ; terminator?
    MOVEQS  PC, LR                  ; yes => return
    LDR     R3, [IP, R0]            ; word to relocate
    ADD     R3, R3, R1              ; relocate it
    STR     R3, [IP, R0]            ; store it back
    B       RelocLoop               ; and do the next one
End                                 ; The list of offsets of locations to re-
                                    ; locate starts here; terminated by -1.