U N I F O R M

British Broadcasting Corporation

The purpose of this document is to detail the BBC standard interchange format, UNIFORM, which will allow different Archimedes applications to read each other's data.

THE COPYBOARD

Part of the philosophy behind BBC-UniForm is that it should be possible to transfer not just a whole file from one application to another, but also part of a file, selected from within the application.

To this end, it is envisaged that there will be a standard file, the CopyBoard, stored in a known place, probably in &.System (or some similar directory), which any application can read and write. The CopyBoard is an ordinary BBC-UniForm file, but one that is shared by all applications. It holds a selected section of data, whether text, a spreadsheet, graphics, etc.

All applications that support BBC-UniForm should provide a method of copying all or part of their data onto the CopyBoard, preferably in a way that is transparent to the user, ie. as part of the application's normal cut, copy and paste routines.

For speed, it may be sensible to keep a copy of the CopyBoard in memory, and only write it to disc when the application terminates or another application is about to be called.

FILE TYPE

Each file has a 'file type': a number in the range &000-&FFF. It is used to determine the sort of data the file contains, and to allow applications to recognise their own files. The file type is held within the load and execution addresses of the file, along with the date and time stamp. An extension to OSGBPB allows the file type of each file in a directory to be read, along with its filename, in one operation.

File types &E00-&FFF are reserved for use by Acorn; file types &800-&DFF are to be allocated by Acorn to third-party software houses. The remaining types (&000-&7FF) are user defined. File type &800 is a BBC-UniForm file.
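The layout of the type inside the load and execution addresses is not spelled out above. As a hedged sketch, assuming the convention later documented for RISC OS (top 12 bits of the load address set to &FFF, file type in bits 8-19, and a 40-bit time stamp in centiseconds since 1 January 1900 split between the low byte of the load address and the whole execution address), a reader could recover the type and classify it against the allocation ranges. All function names here are hypothetical.

```python
UNIFORM_FILE_TYPE = 0x800  # from the spec: file type &800 is a BBC-UniForm file


def file_type_from_addresses(load, exec_addr):
    """Return (file_type, timestamp) for a typed file, or None if untyped.

    Assumed layout (RISC OS convention, not stated in this spec):
      load = &FFFtttdd, exec = &dddddddd
    where ttt is the file type and dd..dd is a 40-bit centisecond count.
    """
    load &= 0xFFFFFFFF
    if (load >> 20) != 0xFFF:
        return None  # genuine load/exec addresses, no type stored
    file_type = (load >> 8) & 0xFFF
    timestamp = ((load & 0xFF) << 32) | (exec_addr & 0xFFFFFFFF)
    return file_type, timestamp


def addresses_for(file_type, timestamp):
    """Inverse of file_type_from_addresses under the same assumed layout."""
    if not 0 <= file_type <= 0xFFF or not 0 <= timestamp < 1 << 40:
        raise ValueError("file type or time stamp out of range")
    load = 0xFFF00000 | (file_type << 8) | ((timestamp >> 32) & 0xFF)
    return load, timestamp & 0xFFFFFFFF


def owner_of(file_type):
    """Classify a file type using the allocation ranges given in the spec."""
    if not 0 <= file_type <= 0xFFF:
        raise ValueError("file types run from &000 to &FFF")
    if file_type >= 0xE00:
        return "Acorn"
    if file_type >= 0x800:
        return "third party"
    return "user"
```

For example, a load address of &FFF80012 would mark a BBC-UniForm file (type &800), which `owner_of` places in the third-party range.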
FILE STRUCTURE

The file consists basically of ASCII information, with special data preceded by a Control Sequence Introducer (CSI, shown in this document as ||, 'double bar').

Text consists of ISO character codes (8-bit) and BBC-UniForm control sequences. The text does not contain any 'padding' spaces. A new line can be started, without starting a new paragraph, by using the NEW LINE character. A paragraph is terminated with a CARRIAGE RETURN.

For spreadsheets, the values are stored row by row, with all the items in one row occupying one text line. Each cell is separated from the next by a TAB character (there is also a TAB at the beginning of the line). The line is terminated with a NEW LINE character, and then the next row begins. The last row in the sheet should be terminated by a CARRIAGE RETURN instead of a NEW LINE.

For databases, records are stored as single-line paragraphs, with each field separated by a TAB.

NEW LINE characters (ASCII 10) are used to begin a new line without beginning a new paragraph. Normal line spacing is used between the previous line and the new line. This code should only be used where the user has specifically entered a new line.

CARRIAGE RETURN characters (ASCII 13) are used to end paragraphs, in place of NEW LINE characters. The normal line spacing plus the inter-paragraph spacing is used between the previous line and the new line.

The CSI is followed by a letter which specifies the type of data that follows. There are basically three types of control sequence: a single letter; a letter followed by two numbers, delimited by another CSI; or a letter followed by a block of data. Numbers used in control sequences are 32-bit integers unless otherwise stated.
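The spreadsheet and database layouts described under FILE STRUCTURE can be sketched as a pair of encoders and decoders. The function names are hypothetical, and because the spec does not say how a value containing a TAB, NEW LINE or CARRIAGE RETURN should be escaped, this sketch simply rejects such values rather than inventing an escape.

```python
TAB, NL, CR = "\t", "\n", "\r"  # TAB, NEW LINE (ASCII 10), CARRIAGE RETURN (ASCII 13)


def _check(value):
    """Reject values that would corrupt the row/field structure."""
    if any(ch in value for ch in (TAB, NL, CR)):
        raise ValueError(f"cannot store {value!r} without an escape convention")
    return value


def encode_sheet(rows):
    """Each row: a leading TAB, then cells separated by TABs.
    Rows end in NEW LINE, except the last, which ends in CARRIAGE RETURN."""
    lines = [TAB + TAB.join(_check(str(cell)) for cell in row) for row in rows]
    return NL.join(lines) + CR if lines else ""


def decode_sheet(text):
    """Reverse of encode_sheet; cells come back as strings."""
    if not text:
        return []
    if not text.endswith(CR):
        raise ValueError("the last row must end with a CARRIAGE RETURN")
    rows = []
    for line in text[:-1].split(NL):
        if not line.startswith(TAB):
            raise ValueError("each row must start with a TAB")
        rows.append(line[1:].split(TAB))
    return rows


def encode_records(records):
    """Each record is a one-line paragraph: fields joined by TAB, ended by CR."""
    return "".join(TAB.join(_check(str(f)) for f in rec) + CR for rec in records)


def decode_records(text):
    """Reverse of encode_records; fields come back as strings."""
    if not text:
        return []
    if not text.endswith(CR):
        raise ValueError("each record must end with a CARRIAGE RETURN")
    return [para.split(TAB) for para in text[:-1].split(CR)]
```

So a two-by-two sheet of 1, 2 / 3, 4 becomes TAB 1 TAB 2 NEW LINE TAB 3 TAB 4 CARRIAGE RETURN, while database records carry no leading TAB and each ends in its own CARRIAGE RETURN.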
SINGLE LETTER CSIs

||B   Turn bold on
||b   Turn bold off
||I   Turn italic on
||i   Turn italic off
||U   Turn underlining on
||u   Turn underlining off
||l   Left justify text
||c   Centre text
||r   Right justify text
||f   Fully justify text
||-   Discretionary hyphen - a discretionary hyphen is normally invisible. If the word containing it falls at the end of a line and the whole word will not fit, the word may be hyphenated at the position of the discretionary hyphen. NB. The hyphen character is the same as minus.
||N   Start a new page - this code should only be used where the user has specifically requested a new page, and not where a normal page break occurs in a piece of text.
||{   Literal {
||}   Literal }
||||  Literal CSI character

TWO NUMBER CSIs

For both of the following codes, the numbers are in decimal ASCII. The two numbers are separated by a comma, and the last number is followed by a CSI character to mark its end.

||M x,y   Move by x 72000ths of an inch forward and y 72000ths of an inch down. If x or y is negative, move backwards or up respectively.
||m x,y   Move by x hundredths of the current horizontal point size forward and y hundredths of the current vertical point size down. If x or y is negative, move backwards or up respectively.

These two move commands are used to represent superscripts and subscripts, kerning, and non-destructive backspaces.

DATA BLOCK CSIs

All of the following sequences contain a block of data, and they share a common format. The CSI is followed by the letter code, then by a number in decimal ASCII giving the length of the data block (not including the CSI, the letter code, or the length string itself). The end of the number is marked by another CSI character (also not included in the length). The data that follows depends on the letter code.
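The three kinds of control sequence can be read with a small tokenizer. This is a minimal sketch under stated assumptions: the CSI is taken to be the two characters `||` exactly as printed in this document (if the real CSI is a single byte, change the `CSI` constant), the data-block letters are those listed in the entries that follow (F, L, R, P, p, D, %, V and ?), lengths are counted in characters (one per 8-bit ISO code), and macros are not handled. `tokenize` and `data_block` are hypothetical names.

```python
CSI = "||"  # assumption: the CSI as printed in this document

SINGLE = set("BbIiUulcrf-N")   # single-letter sequences
TWO_NUMBER = set("Mm")         # ||M x,y|| and ||m x,y||
DATA_BLOCK = set("FLRPpD%V?")  # letters followed by a length-prefixed data block


def data_block(letter, data):
    """Frame data as: CSI, letter, decimal length, CSI, data."""
    return f"{CSI}{letter}{len(data)}{CSI}{data}"


def tokenize(stream):
    """Split a UniForm stream into ("text"|"code"|"move"|"block", ...) tuples."""
    tokens, text = [], []

    def flush():
        if text:
            tokens.append(("text", "".join(text)))
            text.clear()

    i = 0
    while i < len(stream):
        if not stream.startswith(CSI, i):
            text.append(stream[i])
            i += 1
            continue
        i += len(CSI)
        if stream.startswith(CSI, i):  # CSI CSI -> literal CSI
            text.append(CSI)
            i += len(CSI)
            continue
        if i >= len(stream):
            raise ValueError("stream ends inside a control sequence")
        code = stream[i]
        i += 1
        if code in "{}":  # ||{ and ||} -> literal braces
            text.append(code)
        elif code in SINGLE:
            flush()
            tokens.append(("code", code))
        elif code in TWO_NUMBER:
            end = stream.index(CSI, i)  # the second number ends at a CSI
            x, y = (int(n) for n in stream[i:end].split(","))
            flush()
            tokens.append(("move", code, x, y))
            i = end + len(CSI)
        elif code in DATA_BLOCK:
            end = stream.index(CSI, i)  # the decimal length ends at a CSI
            length = int(stream[i:end])
            start = end + len(CSI)
            flush()
            tokens.append(("block", code, stream[start:start + length]))
            i = start + length
        else:
            raise ValueError(f"unknown control sequence {code!r}")
    flush()
    return tokens
```

Reading a data block by its declared length, rather than scanning for a terminator, is what lets a block carry arbitrary bytes (sprites, VDU streams, PostScript) without escaping. The flip side is that a length that is even one character out desynchronises everything after it.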
||F   Font specification - the data consists of the font name, terminated by a comma; the horizontal point size (in 1/16ths of a point, as a decimal ASCII number), terminated by a comma; and the vertical point size (in 1/16ths of a point, as a decimal ASCII number), terminated by a CSI character.

||L   Line spacing - the data consists of the leading, terminated by a comma; the space before a paragraph, terminated by a comma; and the space after a paragraph, terminated by a CSI character. All three values are in 1/72000ths of an inch, as decimal ASCII numbers.

||R   Ruler settings - the data consists of the left margin, terminated by a comma; the indent margin, terminated by a comma; the right margin, terminated by a comma; then, for each tab, either L, C, R or D (for Left, Centre, Right and Decimal tab) followed immediately by its position, terminated by a comma, except for the last tab, which is terminated by a CSI character. All margins and tab positions are in 1/72000ths of an inch, as decimal ASCII numbers. If the ruler places the indent margin to the left of the left margin (ie. a hanging indent), the left margin forms an implicit tab position.

||P   A painting (bit image) - the data begins with an x and a y scaling factor, where 1000 is actual size, 500 means that the image should be scaled down to 50%, 3000 means that it should be scaled up to 300% of its original size, etc. These two numbers are in decimal ASCII, separated by a comma, and the last one is terminated by a CSI character. The rest of the data block is a sprite definition:

      Word    Meaning
      1       Total length of sprite and control block
      2-4     Sprite name, up to 12 characters with trailing nulls
      5       Width of sprite (number of words per row - 1)
      6       Height of sprite (number of rows - 1)
      7       First bit used (left-hand end of row)
      8       Last bit used (right-hand end of row)
      9       Offset to sprite image
      10      Offset to sprite transparency mask
      11      MODE the sprite was defined in
      12-n    Palette information (see below)
      n-XX    The sprite image and, if present, the transparency mask

      The palette information consists of a pair of words for each logical colour of the MODE. The first word corresponds to the first flash state, the second to the second flash state. The four bytes of each word are (highest first) Blue, Green, Red and "Physical", as would be sent to VDU19.

||p   A bit image - the data has exactly the same format as a painting (||P), but is not displayed. Instead, it is used to define a sprite which is then printed using a Drawing sequence.

||D   A Drawing - the data begins with an x and a y scaling factor, where 1000 is actual size, 250 means that the image should be scaled down to 25%, 1050 means that it should be scaled up to 105% of its original size, etc. These two numbers are in decimal ASCII, each terminated by a comma. The next four decimal ASCII numbers give the bounding box of the drawing in the same coordinate system as the drawing (ie. BBC screen coordinates); the last of these is terminated by a CSI character, as in the sprite definition. The rest of the data block is a stream of BBC VDU codes. These codes should not change the graphics or text windows, alter the palette, or reprogram characters.

||%   PostScript program - the data consists of a PostScript program, or program segment. This control sequence should normally be followed immediately by either a painting or a drawing sequence that gives a representation of what the program produces (except for program segments). The PostScript program is terminated by a QQ.

||V   Videotex frame - the data consists of either T, 1, 2 or 3 (for Teletext, CEPT 1, CEPT 2 and CEPT 3 respectively), followed by the frame data. CEPT 1, 2 and 3 are stored as a data stream with ESCAPE codes.
      Teletext is stored as a series of 42-byte packets, as created by the BBC ATS. After the frame image, additional information such as routing can be stored. It is hoped that a standard can be devised for this information, especially for viewdata systems.

||?   Application-specific data - the data consists of an application code (6 letters) terminated by a comma; the rest of the data is completely free format.

MACROS

It is possible to define simple macros, which can contain any data required. To start a macro definition, use '{' followed by a decimal ASCII number (0 to 100), terminated by a comma, followed by an optional name, again terminated by a comma. If there is no name, the second comma must still be included. Names can include any character except a comma. All the characters that follow, including any CSI sequences, are part of the macro. The definition is terminated by a '}'.

To recall a macro at any point in the data, use '{', followed by the macro number in decimal ASCII, terminated by '}'. Macros should be defined before they are used.

The main use of macros will be in style-based word processors or page layout programs, allowing standard paragraph or text formats to be defined once and then used wherever they are needed.

Negative macro numbers are used to 'turn off' macros: if macro 5 turns on bold, macro -5 should turn bold off. Negative macros are not defined explicitly, as they should return the font, style, etc. to the state they were in before the macro was called.

Macro 0 is normally used to set up a page in page-based applications, including printing any header. Macro 1 is used to finish a page, including printing any footer. Macros 0 and 1 can be redefined throughout a file, but other macros should only be defined once.
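The macro syntax can be illustrated with a writer and a one-level expander. This is a sketch under several assumptions not stated in the spec: the CSI is the two characters `||` as printed here; a definition produces no output of its own; a recalled body is inserted as-is, without expanding any recalls inside it; and negative recalls emit nothing because the sketch does not track font or style state. It also scans characters rather than tokens, so a data block containing a raw brace would confuse it. All function names are hypothetical.

```python
CSI = "||"  # assumption: the CSI as printed in this document


def define_macro(number, body, name=""):
    """Build a definition: '{', number, ',', optional name, ',', body, '}'."""
    if not 0 <= number <= 100:
        raise ValueError("macro numbers run from 0 to 100")
    if "," in name:
        raise ValueError("macro names may not contain a comma")
    return f"{{{number},{name},{body}}}"


def recall_macro(number):
    """Build a recall such as {5} or, to turn macro 5 off, {-5}."""
    return f"{{{number}}}"


def _skip_csi(stream, i):
    """Return the index just past the CSI escape starting at i."""
    i += len(CSI)
    return i + len(CSI) if stream.startswith(CSI, i) else i + 1


def _find_close(stream, i):
    """Find the '}' closing a macro, ignoring escaped braces such as ||}."""
    while i < len(stream):
        if stream.startswith(CSI, i):
            i = _skip_csi(stream, i)
        elif stream[i] == "}":
            return i
        else:
            i += 1
    raise ValueError("unterminated macro")


def expand_macros(stream):
    """Remove definitions and replace each positive recall with its body."""
    macros, out, i = {}, [], 0
    while i < len(stream):
        if stream.startswith(CSI, i):
            j = _skip_csi(stream, i)
            out.append(stream[i:j])  # copy the escape through unchanged
            i = j
        elif stream[i] == "{":
            close = _find_close(stream, i + 1)
            head, sep, rest = stream[i + 1:close].partition(",")
            number = int(head)
            if sep:  # definition {n,name,body}; the name is not needed here
                _name, sep2, body = rest.partition(",")
                if not sep2:
                    raise ValueError("a definition needs two commas")
                macros[number] = body
            elif number >= 0:  # recall; KeyError if not yet defined
                out.append(macros[number])
            # Negative recalls ask the reader to restore the state saved before
            # the macro ran; this sketch tracks no state, so it emits nothing.
            i = close + 1
        else:
            out.append(stream[i])
            i += 1
    return "".join(out)
```

Because macros 0 and 1 may be redefined, storing definitions in a dictionary keyed by number means a later definition simply replaces the earlier one for all subsequent recalls.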