1 History

22 Jul 1994 Revision A --- Initial draft revision, passed round for comment.

12 Aug 1994 Revision B --- Incorporated reviewers' comments. Added specification of mbuf manager module.

16 Aug 1994 Revision C --- Description of mbuf manager module tidied up and clarified.

09 Nov 1994 Revision D --- Large number of changes introduced following formal review, and feedback from external reviewers. Main areas of change are:

  Device drivers can now be identified by the address of their Driver Information Block. A new subsection has been added to the introduction to explain this feature.
  The concept of a protocol handle has been removed.
  Register usage for service calls has been changed.
  Service calls Service_ProtocolDying, Service_FindNetworkDriver, and Service_NetworkDriverStatus are now obsolete.
  The new service calls Service_DCIProtocolDying, Service_DCIDriverStatus and Service_DCIFrameTypeFree have been added.
  All SWI calls now have a flags register.
  The gaps in the SWI chunk from earlier versions of the DCI have been closed.
  The Filter SWI has been heavily reworked:
    - Frame type, and frame level have been merged into a single register; register numbers have been shuffled to fill the gap left by frame level.
    - The read flag has been removed --- it is no longer possible to read current filter levels.
    - Releasing a filter is now achieved via a flag bit, rather than by specifying a frame level of FRMLVL_NONE.
    - The ensure safe flag bit has been added.
  IEEE 802.3 frames are more complicated than originally thought; the concept of frame type for these frames has been extensively revised.
  The rx_frame_type field in struct rx_hdr now only contains the last 2 bytes of the MAC header.
  A completely new memory manager has been designed.
  The dib_swibase field has been moved to the top of a struct dib.
  Unsafe data are no longer flagged to the Transmit SWI.
  Added a small section on SWI re-entrancy, and re-enabling interrupts.
  Added new sections on acceptance tests, and on development test strategies.

03 Feb 1995 Revision E --- Corrected some typos, then made some alterations and additions:

  DCIProtocolDying has been renamed to Service_DCIProtocolStatus, and its register usage changed; the official DCI version is now 4.01.
  Added a section on returning errors from SWI calls which defines some standard error numbers.
  Added the new SWI Stats.
  Created a whole new section describing the standard statistics interface.
  Added a couple of lines (in a new miscellanea section) about network card self-tests.

14 Mar 1995 Revision F --- A few more changes:

  Added a paragraph to the description of Service_DCIDriverStatus to define (by cross-reference) the format of the supported DCI version.
  Service_DCIFrameTypeFree has been changed to reflect the changes made to the Filter SWI for revision D, i.e. frame type and frame level have been merged into a single 32-bit register, r2; parameters that were in registers r4 & r5 have been moved into r3 & r4 respectively. This is now DCI version 4.02.
  Corrected some typos.
  Table of standardised errors (hopefully) made more explicit by adding a column detailing all the error numbers.
  Added a paragraph to the statistics section clarifying when statistics are gathered (i.e. for all frames).
  Added some new members to struct stats:
    - st_tx_general_errors
    - st_unwanted_frames
    - st_rx_general_errors
  Added codes 9 & 10 for st_interface_type field in struct stats.
  Removed field st_net_error from struct stats --- it has been made redundant by new field st_tx_general_errors.
  Field st_link_status in struct stats made into a bitfield.

10 Apr 1995 Revision G --- A couple more changes:

  The definition of a Driver Information Block has been extended to include a copy of the Inquire flags. This is now version 4.03 of the DCI.
  Added a couple of entry points missing from struct mbctl (copy_p and copy_u).
5 Sep 1995 Revision 1.00 --- Wrote a new subsection on virtual interfaces, and added virtual interface flag bits to the Inquire SWI.

14 Apr 1997 Revision 1.01 --- SWI MulticastRequest added. This is now version 4.04 of the DCI. Fixed some formatting errors in the Impression version of the specification.

2 Unresolved Questions

There are no unresolved questions (at the moment).

3 Introduction

This document describes version 4 of the Device Control Interface ("DCI"), an interface between protocol modules and device driver modules in the RISC OS networking system.

3.1 Objectives

This new version is needed to overcome deficiencies/errors in the existing interface. The specific points it aims to address are:

  Full support for multiple protocol modules in a single system: older versions of the DCI notionally provided this support, but there were implicit features of the design which made support for more than one protocol module at any one time difficult.

  Support for multicast and promiscuous frame reception. Promiscuous reception is when an Ethernet interface receives all frames, regardless of their destination address. Multicasting is a method used to transmit a single frame to multiple hosts simultaneously; it can be viewed as a form of limited broadcast: individual hosts on a network can choose whether or not they wish to receive multicast frames. (The glossary of Internet terms in "Internetworking with TCP/IP" by Douglas Comer defines multicasting as "A technique that allows copies of a single [frame] to be passed to a selected subset of all possible destinations . . . . broadcast is a special form of multicast in which the subset of machines to receive a copy of a [frame] consists of the entire set.")

  The need for improved data transfer rates compared to previous DCI versions, and the formal adoption of techniques already used to improve data throughput.
Backwards compatibility

The radical changes and new features being introduced with this new version of the DCI, along with the inbuilt lack of flexibility in previous versions, combine to make any attempt at backwards compatibility impossible. With this lack of compatibility, care has been taken to ensure that there is no overlap between DCI 4 compliant modules, and other modules loaded on the same machine that implement earlier versions of the DCI, specifically:

  1. In both old and new DCI versions, it is the responsibility of the protocol module to initialise the interface between itself and a device driver after either actively or passively learning of the device driver's presence; DCI 4 has replaced the active (Service_FindNetworkDriver) and passive (Service_NetworkDriverStatus) service calls used by protocol modules with Service_EnumerateNetworkDrivers and Service_DCIDriverStatus respectively.

  2. To prevent old device drivers getting confused by Service_ProtocolStatus, which no longer provides a sensible (as far as old DCI versions are concerned) value in r2, this service call has been made obsolete, and has been replaced by Service_DCIProtocolStatus.

3.2 Principles of operation

The principle behind the interface is that protocol modules register a list of desirable frame types with network device drivers. When a device driver receives a frame, it passes it along to the protocol module that expressed an interest in the frame's type. Transmission is much simpler --- the protocol module passes the frame to be transmitted to the appropriate device driver. In both cases it is generally the recipient of the frame that assumes responsibility for the memory containing the frame (i.e. the protocol module for received packets, the device driver for transmitted packets).

Identifying Device Drivers

Device drivers are always identified by their "Driver Information Block", described in section 4.1.
These Driver Information Blocks are used in a number of service calls, and are also given to protocol modules along with received frames. There are fields within a Driver Information Block which uniquely identify each interface, but to prevent protocol modules having to make laborious (i.e. strcmp()) comparisons of these fields, a device driver should maintain a single, static, Driver Information Block for each interface it controls. In this way, protocol modules need only compare the addresses of Driver Information Blocks to identify an interface.

This scheme means that any use of the rmtidy RISC OS command will kill any network stack on the machine --- this is not a great problem, since anyone who uses rmtidy in a modern RISC OS system is asking for all the trouble that they are about to receive. However, if a device driver module is re-initialised (via rmreinit), then the address of its Driver Information Block will change; therefore any protocol module's handler for the Service_DCIDriverStatus service call (section 4.2) cannot compare addresses, but must fall back to comparing those fields which uniquely identify an interface, i.e. dib_name & dib_unit.

3.3 Device Driver considerations

Important points for device driver writers to note are:

  1. The DCI interface is optimised in various ways for Ethernet device drivers, specifically
     a) Physical network addresses are 48-bit quantities.
     b) Protocol modules identify the physical network frames they wish to receive by the type of the frame. (For example, the Internet module claims frame types 0x800 (IP), 0x806 (ARP), and 0x8035 (RevARP). This is a 16-bit value transmitted as part of the Ethernet header.)
     Drivers for other types of network hardware will need to emulate an Ethernet driver at this interface by mapping "virtual Ethernet" values onto the real values meaningful to the network hardware.

  2. At startup, driver modules must set the variable Inet$EtherType to the textual name of the controlled physical interface (e.g.
"en", "ea"), with a suffix of '0'. This is for backwards compatibility with versions of Acorn's TCP/IP Protocol Suite software already in the field. The textual name is the same string as the field dib_name in the Driver Information Block (see section 4.1 for a description of Driver Information Blocks). Note that this variable reflects the last driver initialised, i.e. a driver will always set this variable, regardless of whether or not it has been set previously.

4 Service Calls

As explained in the section on backwards compatibility (Section 3), the service calls defined in earlier versions of the DCI are all now obsolete. To summarise, these calls are

  1. Service_ProtocolStatus
  2. Service_FindNetworkDriver
  3. Service_NetworkDriverStatus

The new service calls defined in DCI 4 are

  1. Service_EnumerateNetworkDrivers (service call 0x9b).
  2. Service_DCIDriverStatus (service call 0x9d).
  3. Service_DCIFrameTypeFree (service call 0x9e).
  4. Service_DCIProtocolStatus (service call 0x9f).

Note that the old, unnamed, service call 0x41200, which was never part of any formal DCI specification, but which used to be issued during finalisation of the Internet module, has now been officially replaced by Service_DCIProtocolStatus.

4.1 Data Structures

Some user applications need to associate device drivers with the physical location of the network hardware, i.e. with which "slot" the hardware occupies. In order to support complex networking cards (e.g. one card with multiple, independent, interfaces), the concept of a slot is overloaded with a minor device number; the interpretation of this minor device number is device driver dependent.
Using C, a slot can be expressed as

    struct slot
    {
        unsigned int slotid:8,
                     minor:8,
                     pcmciaslot:5,  /* must be zero if not a
                                     * PCMCIA virtual slot */
                     mbz:11;        /* must be zero */
    };

In this, and all other C code fragments, the standard Norcroft RISC OS compiler is assumed --- with this example, this means that bitfields start at the least significant end of a word, i.e. slotid is bits 0--7, minor bits 8--15, and so on.

Device drivers are free to interpret the minor field as they wish, but a typical use would be to discriminate multiple units on a single physical card.

The pcmciaslot field is used to differentiate between cards in different PCMCIA slots (unfortunately, PCMCIA also uses the word "slot" to refer to the physical connection to a card). This field only has any significance when slotid is a PCMCIA virtual slot (see immediately below for a description of virtual slots).

As well as the physical expansion card slots, which now number from 0--8 with the introduction of the latest Acorn machines, there are also a number of "virtual" slots, i.e. network interfaces which don't use hardware in an expansion card slot. The list of physical and virtual slots can be summarised as

    0--7     Physical expansion card slots
    8        Risc PC network position
    16--31   PCI slots
    128      Parallel port
    129      Serial port (e.g. PPP)
    130      Econet socket
    131      PCMCIA cards

Note: there is only one PCMCIA virtual slot; this one virtual slot refers to PCMCIA hardware which may contain more than one physical PCMCIA slot. The pcmciaslot field within the slot number can be used by the device driver to differentiate between physical PCMCIA slots.
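Because the bitfield layout above depends on the compiler, code that may be built with other compilers could pack and unpack the slot word with explicit shifts instead. The following is a minimal sketch under that assumption; the function and macro names are illustrative only and are not part of the DCI.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions follow the layout described above:
 * slotid bits 0-7, minor bits 8-15, pcmciaslot bits 16-20,
 * bits 21-31 must be zero. All names here are hypothetical. */
#define SLOT_PCMCIA 131u   /* PCMCIA virtual slot, from the slot list */

/* Build a slot word from its three fields. */
uint32_t slot_pack(unsigned slotid, unsigned minor, unsigned pcmciaslot)
{
    assert(slotid <= 0xFFu && minor <= 0xFFu && pcmciaslot <= 0x1Fu);
    /* pcmciaslot must be zero unless this is the PCMCIA virtual slot */
    assert(slotid == SLOT_PCMCIA || pcmciaslot == 0);
    return (uint32_t)slotid |
           ((uint32_t)minor << 8) |
           ((uint32_t)pcmciaslot << 16);
}

/* Extract the individual fields again. */
unsigned slot_id(uint32_t w)     { return w & 0xFFu; }
unsigned slot_minor(uint32_t w)  { return (w >> 8) & 0xFFu; }
unsigned slot_pcmcia(uint32_t w) { return (w >> 16) & 0x1Fu; }
```

For example, slot_pack(SLOT_PCMCIA, 1, 2) would describe minor device 1 on the card in physical PCMCIA slot 2.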
Driver Information Blocks

Device drivers identify themselves via a Driver Information Block, which can be expressed in C syntax as

    struct dib
    {
        unsigned int   dib_swibase;
        char          *dib_name;
        unsigned int   dib_unit;
        unsigned char *dib_address;
        char          *dib_module;
        char          *dib_location;
        struct slot    dib_slot;
        unsigned int   dib_inquire;
    };

The fields within this structure are:

dib_swibase
    The base of the device driver's allocated SWI chunk.
dib_name
    A pointer to a short, NUL-terminated, textual name unique to the driver (e.g. "en", "ppp").
dib_unit
    The unit number.
dib_address
    A pointer to a 6-byte character array which contains the hardware address of the interface.
dib_module
    A pointer to a string containing the title of the driver module (e.g. "Ether3").
dib_location
    A pointer to a string which attempts to describe the physical location of the interface. A typical string would be somewhere between 8 and 40 characters long and would be of the form "Network Expansion Slot", or "Expansion Slot 0, port #1" etc.
dib_slot
    The slot number for this unit.
dib_inquire
    A copy of the flags returned from the Inquire SWI (section 5.3).

Note that there is a subtle, but important, distinction between this definition of a Driver Information Block and its definition in previous versions of the DCI: the new definition has one Driver Information Block per unit, rather than one per device driver; if a device driver controls several units, then it must provide one struct dib per unit.

Units: a single device driver may control more than one physical network interface, either by driving multiple network cards, or by driving multiple interfaces on a single card. The driver is responsible for allocating a number to each interface under its control --- the first interface being unit 0, the second interface being unit 1, and so on. Any particular network connection can then be uniquely identified by its driver name and unit number, e.g. en0, ea2.
If an interface is found to be faulty during any hardware check its driver may perform, the interface must still be assigned a unit number, and must still appear in the enumerated list of device drivers.

Chained Driver Information Blocks

The results from the Service_EnumerateNetworkDrivers service call are chained together into a linked list of Driver Information Blocks. The C structure used for this linked list is

    struct chaindib
    {
        struct chaindib *chd_next;
        struct dib      *chd_dib;
    };

Just in case the fields within this structure are not self-evident, they are:

chd_next
    A pointer to the next entry in the linked list. The last entry in the list contains a NULL pointer.
chd_dib
    A pointer to the Driver Information Block for this entry in the linked list.

Protocol Information Blocks

In much the same way that device drivers are identified by their Driver Information Block, older versions of the DCI used to contain a Protocol Information Block which identified individual protocol modules. This Protocol Information Block is not needed in DCI 4, and has been removed.

4.2 Service Call Descriptions

Service_EnumerateNetworkDrivers
Service Call 0x9b

On entry:
    r0 = pointer to head of linked list of device drivers
    r1 = 0x9b (reason code)

On exit:
    r0 = pointer to new head of linked list
    All other registers are preserved.

Use:
This service call is used to obtain a list of all active network device drivers in the system. When the service call is issued, r0 is a NULL pointer; upon receipt of this call, a network device driver should chain Driver Information Blocks to the head of the list, one for each logical interface the driver controls. Section 4.1 describes the struct chaindib used to hold the linked list of Driver Information Blocks.

This service call should never be claimed.

Note: struct chaindibs are transient objects: they should be allocated from RMA by the device drivers, and freed back into the RMA by the protocol module which issued the service call.
The Driver Information Blocks referenced by the struct chaindibs must be static data, as explained in section 3.2.

Service_DCIDriverStatus
Service Call 0x9d

On entry:
    r0 = pointer to Driver Information Block describing this driver
    r1 = 0x9d (reason code)
    r2 = status (0 = starting, 1 = terminating)
    r3 = DCI version supported

On exit:
    All registers are preserved.

Use:
Service_DCIDriverStatus is issued by a network driver module during its initialisation (r2 = 0), and finalisation (r2 = 1) calls. If a network device driver controls multiple logical interfaces, then a separate service call must be issued for each interface the driver is responsible for.

Upon receipt of this service call from a driver that is starting up (i.e. r2 = 0), a protocol module should add the driver to its list of known device drivers. If a service call is received from a driver that is terminating, the protocol module should scan its list of known device drivers for a Driver Information Block matching the one addressed by r0, removing it from the list if a match is found. A Driver Information Block is uniquely identified by its dib_name and dib_unit fields, therefore a comparison of these two fields is sufficient to prove a match.

The supported DCI version passed in r3 is in the same format as described for the DCIVersion SWI (Section 5.3), i.e. 404 decimal for this version.

When this call is issued while starting, device drivers should be able to receive SWIs raised by protocol modules; this means that the service call cannot be directly issued from a driver's initialisation routine --- see section 4.3 for an explanation of this feature, along with ways to overcome it. When this call is issued during finalisation, protocol modules should not expect to be able to issue SWIs, therefore no special action to allow this is required of the driver module issuing the service call.

This service call should never be claimed.
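The matching rule described above for a terminating driver (compare dib_name and dib_unit, since the block's address may have changed) might be sketched as follows. The structures are those of section 4.1; the helper names and the array-based table of known drivers are assumptions made for illustration, not something the DCI prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Structures as given in section 4.1. */
struct slot {
    unsigned int slotid:8, minor:8, pcmciaslot:5, mbz:11;
};

struct dib {
    unsigned int   dib_swibase;
    char          *dib_name;
    unsigned int   dib_unit;
    unsigned char *dib_address;
    char          *dib_module;
    char          *dib_location;
    struct slot    dib_slot;
    unsigned int   dib_inquire;
};

/* Hypothetical helper: non-zero if both blocks describe the same interface. */
int dib_same_interface(const struct dib *a, const struct dib *b)
{
    assert(a && b && a->dib_name && b->dib_name);
    if (a == b)
        return 1;                          /* same static block: cheap match */
    return a->dib_unit == b->dib_unit &&
           strcmp(a->dib_name, b->dib_name) == 0;
}

/* Hypothetical helper: drop the entry matching 'dying' from a protocol
 * module's table of known drivers and return the new entry count. */
size_t forget_driver(struct dib **known, size_t count, const struct dib *dying)
{
    for (size_t i = 0; i < count; i++) {
        if (dib_same_interface(known[i], dying)) {
            known[i] = known[count - 1];   /* table order is not preserved */
            return count - 1;
        }
    }
    return count;                          /* no match: nothing to remove */
}
```

A protocol module's handler for a terminating driver (r2 = 1) could call forget_driver() with the block addressed by r0.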
Service_DCIFrameTypeFree
Service Call 0x9e

On entry:
    r0 = pointer to Driver Information Block
    r1 = 0x9e (reason code)
    r2 = frame type being released
    r3 = address level of former claim
    r4 = error level of former claim

On exit:
    r1 = 0 to claim the call, or preserved to pass it on.
    All other registers are preserved.

Use:
This service call is issued by a device driver when a protocol module releases a claim it formerly had on a frame type (i.e. when the frame type becomes free for claiming by a different protocol module). This release may have been either explicit (the protocol module called the Filter SWI), or implicit (the protocol module issued a Service_DCIProtocolStatus service call).

If a protocol module wishes to claim the newly relinquished frame type for itself, it should use the Filter SWI to do so, and then claim the service call by setting r1 to 0.

Note: the concepts of frame types and the various filtering levels available are explained fully in section 6.2.

Service_DCIProtocolStatus
Service Call 0x9f

On entry:
    r0 = Protocol module's private word pointer
    r1 = 0x9f (reason code)
    r2 = status (0 = starting, 1 = terminating)
    r3 = DCI version supported
    r4 = Pointer to protocol module's title string

On exit:
    All registers are preserved.

Use:
Service_DCIProtocolStatus is issued by a protocol module during its initialisation (r2 = 0), and finalisation (r2 = 1) calls.

The private word pointer in r0 is the same as that supplied by the protocol module in the Filter SWI (see section 5.3).

The supported DCI version passed in r3 is in the same format as described for the DCIVersion SWI, i.e. 404 decimal for this version.

The title string pointed to by r4 should be identical to the title string in the protocol module's header.
This string is not used anywhere else in the DCI --- it is intended for use by modules that rely on the protocol module, but which do not communicate with it via the DCI; these modules need to have the name of significant protocol modules built into them.

As with device drivers issuing Service_DCIDriverStatus, protocol modules should already be capable of handling any SWIs at the time they issue this service call to announce that they are starting; the techniques described in section 4.3 are equally valid for protocol modules as they are for device drivers.

When terminating, the protocol module which issued this service call must be prepared to handle receive events for all frame types it has not explicitly relinquished until the service call returns; once the call has returned, device drivers should have deleted all references to the protocol module which issued the service call.

If necessary, device drivers may enable interrupts while processing the service call, but they should return with the interrupt state preserved.

Device drivers must never claim this service call.

Note: as already mentioned in section 4, the old, unnamed, service call 0x41200 will not be issued by DCI 4 compliant versions of the Internet module when it is terminating.

4.3 Device Driver Initialisation

When a device driver module initialises, it is expected to issue a Service_DCIDriverStatus service call to announce that it is starting; at the time this startup call is issued, the module must also be capable of handling SWIs raised by any protocol module interested in the device driver. Unfortunately, the RISC OS kernel does not recognise a module's SWIs until after its initialisation routine has returned, which means that driver modules must take explicit steps to allow SWIs to be caught before issuing the service call, i.e. they must either: 1. Install a handler on the unknown SWI software vector ("UKSWIV"), and check all unknown SWIs for an appropriate chunk number. 2.
Set up a callback handler in the initialisation routine, and then issue the service call from within this callback handler.

5 Device Driver SWIs

All network device drivers must provide a SWI call interface which protocol modules can use to

  - send control commands
  - pass data
  - obtain information

to/from a device driver. Each device driver will have its own, unique, SWI chunk, so a protocol module must use the dib_swibase field from the Driver Information Block (see section 4.1) to determine the base of a driver's SWI chunk. The SWI calls that a device driver must supply (and their offsets) are

    Offset   SWI name
    0        DCIVersion
    1        Inquire
    2        GetNetworkMTU
    3        SetNetworkMTU
    4        Transmit
    5        Filter
    6        Stats
    7        MulticastRequest

Re-entrancy

All device driver SWIs are potentially re-entrant, i.e. the protocol modules are not expected to take any explicit action to prevent re-entrance. If re-entrancy is undesirable during the processing of a SWI, then the device driver should take explicit steps to prevent this, e.g. by disabling interrupts. In order to minimise interrupt latency within the machine as a whole, device drivers should take steps to minimise the length of time during which interrupts are disabled.

Byte Sex

All data passed between the protocol modules and device drivers use the host byte sex, i.e. little-endian, whereas all data transmitted over the wire are in network, or big-endian, byte sex. Device drivers are responsible for converting data to the appropriate sex.

5.1 Errors

All the SWI descriptions given later in this section ignore what will happen if the SWI needs to return an error. There will be circumstances in which it is necessary to return an error from a SWI call, and the standard RISC OS mechanism is used, i.e. the device driver will set the V flag, and return with R0 pointing to an error block.
Apart from the unavoidable corruption of R0, all other registers which are declared as being preserved by the SWI are still preserved, even when an error is returned.

Error numbers usually equate to Unix error numbers, as defined in the standard header file "errno.h"; these numbers are always less than 128, and are converted into offsets within the standard error block that has been defined for DCI 4 and Internet. This error block starts at &20E00; for example, the error EINVAL (invalid argument, defined as 22, &16) would be returned as error number &20E16.

There are certain circumstances (e.g., the Transmit SWI indicating that transmission is blocked) where an appropriate Unix error number does not exist --- in this situation, a custom error number is defined specifically for this one error condition.

Standardised Errors

In an attempt to force some consistency, this sub-section defines some of the errors which various SWIs may return, and the circumstances in which these errors may be returned. This is not meant to be an exhaustive list, merely to cover all the errors explicitly mentioned in this document, plus some other common faults.

    SWI Name           Unix Error   Error Number   Circumstances
    (Any)              EINVAL       0x20e16        Incorrect flags word in r0.
                       ENXIO        0x20e06        Invalid unit number supplied.
    SetNetworkMTU      ENOTTY       0x20e19        Illegal op for device.
    Transmit           -            0x20e86        Transmission is blocked.
                       ENETDOWN     0x20e32        Network hardware is down.
                       EMSGSIZE     0x20e28        Frame length > network MTU.
                       ENOBUFS      0x20e37        Not enough mbufs available.
    Filter             -            0x20e87        Frame type already claimed.
                       EINVAL       0x20e16        Trying to claim illegal frame type.
                       EINVAL       0x20e16        Trying to release a non-existent claim.
                       EPERM        0x20e01        Trying to free another protocol's claim.
    MulticastRequest   EINVAL       0x20e16        Trying to claim illegal frame type.
                       EINVAL       0x20e16        Trying to release a non-existent claim.
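The mapping from Unix error numbers to DCI error numbers described in section 5.1 can be sketched in C as below. The numeric values are taken from the table above; the macro, enum and function names are illustrative assumptions, and the Unix values are spelled out rather than taken from the host's errno.h, since they may differ between systems.

```c
#include <assert.h>

/* Base of the error block shared by DCI 4 and Internet (section 5.1). */
#define DCI_ERROR_BASE      0x20E00u

/* Custom errors with no Unix equivalent (names are hypothetical). */
#define DCI_ERR_TX_BLOCKED  0x20E86u  /* Transmit: transmission is blocked */
#define DCI_ERR_FT_CLAIMED  0x20E87u  /* Filter: frame type already claimed */

/* Unix error values as implied by the error numbers in the table. */
enum {
    DCI_EPERM    = 1,
    DCI_ENXIO    = 6,
    DCI_EINVAL   = 22,
    DCI_ENOTTY   = 25,
    DCI_EMSGSIZE = 40,
    DCI_ENETDOWN = 50,
    DCI_ENOBUFS  = 55
};

/* Convert a Unix error number (always less than 128) into the
 * corresponding error number within the DCI error block. */
unsigned int dci_error_number(unsigned int unix_errno)
{
    assert(unix_errno < 128u);
    return DCI_ERROR_BASE + unix_errno;
}
```

With these definitions, dci_error_number(DCI_EINVAL) yields &20E16, matching the worked example in the text.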
5.2 Changes in DCI 4

The device driver SWI interface has been given an extensive overhaul for DCI 4: the complete break between this, and older versions of the DCI (as explained in the section about backwards compatibility, section 3.1) means that all SWIs, even DCIVersion, can be altered without worrying about the impact on non-DCI 4 modules within a machine. The major changes made for DCI 4 are:

  1. All SWIs now use R0 as a flag word: this provides an easy route to alter SWI functionality in any future versions of the DCI that prove to be necessary. All bits of this flag word should be set to zero, except where explicitly stated otherwise.

  2. Several new SWIs have been added, i.e.
     - SetNetworkMTU
     - Inquire
     - Filter
     - Stats
     - MulticastRequest (as of DCI 4.04)

  3. SWI NetworkMTU has been renamed to GetNetworkMTU, to differentiate it from the new SWI SetNetworkMTU.

  4. SWI NetworkIfSend has been renamed to Transmit, mainly because it is less of a mouthful.

  5. Some old SWIs have been deleted (see below).

  6. Offsets of SWIs within a driver's SWI chunk have been changed to fill in the gaps left by the deleted SWIs; for example, DCIVersion has been moved from offset 4 to the more logical offset of 0.

Deleted SWI Details

As mentioned above, some of the SWIs from earlier versions of the DCI have been removed from DCI 4. These SWIs are:

NetworkIfStart --- The decision on whether network hardware should be enabled lies with the device driver, not with a protocol module (consider the situation where one protocol module believes that the hardware should be enabled, while a different module is of the opinion that it should be disabled). As far as protocol modules are concerned, they can only reasonably expect the hardware to be enabled when they have declared an interest in one or more frame types; if they have no declared interests, then whether the hardware is enabled or not is of no significance to them.
NetworkIfUp --- this SWI has been removed for the same reasons as NetworkIfStart (q.v.).

NetworkIfDown --- this SWI has been removed for the same reasons as NetworkIfStart (q.v.).

TxEventRequired --- DCI 4 no longer uses events to communicate between device drivers and protocol modules, therefore this SWI has become redundant.

5.3 SWI Descriptions

DCIVersion
SWI (dib_swibase + 0)

On entry:
    r0 = flags (all bits must be zero)

On exit:
    r1 = Supported DCI version number (this version = 404 decimal)
    All other registers are preserved.

Use:
Returns DCI major and minor version numbers supported by the device driver. The supported DCI version number is calculated as (major version × 100) + minor version.

Note: earlier versions of the DCI only returned the major version, i.e. 1 or 2 (as opposed to 100 or 200). There was no formal version 3 of the DCI.

Inquire
SWI (dib_swibase + 1)

On entry:
    r0 = flags (all bits must be zero)
    r1 = unit number

On exit:
    r2 = Bitmap of supported features (see below)
    All other registers are preserved.

Use:
This SWI is used to ascertain the characteristics of a device driver. The flag bits within r2 are:

    Bit 0:        Multicast reception is supported
    Bit 1:        Promiscuous reception is supported
    Bit 2:        Interface receives its own transmitted packets
    Bit 3:        Station number required
    Bit 4:        Interface can receive erroneous packets
    Bit 5:        Interface has a hardware address
    Bit 6:        Driver can alter interface's hardware address
    Bit 7:        Interface is a point to point link
    Bit 8:        Driver supplies standard statistics
    Bit 9:        Driver supplies extended statistics
    Bit 10:       This is a virtual interface
    Bit 11:       This virtual interface is software based
    Bit 12:       This interface can selectively receive multicast packets (i.e. SWI MulticastRequest is available)
    Bits 13--31:  Reserved, must be zero.

Most of these flags are self-explanatory; "Station number required" (bit 3) is used by AUN software to find out whether the underlying network requires a fixed "pseudo-Econet" station number (i.e.
set in CMOS RAM), or whether a dynamic station number allocation mechanism can be employed. For example, physical Econet requires a fixed station number and its driver should set bit 3 of the flags, but Ethernet does not, and any such driver should leave bit 3 clear. The concept of virtual interfaces (bits 10 and 11) is explained in section 9.2.

Some of these characteristics are inter-related, specifically:

  - If bit 5 (interface has a hardware address) is not set, then bit 6 (driver can alter hardware address) is ignored.
  - A driver cannot supply extended statistics (bit 9), without also supplying standard statistics (bit 8).
  - If bit 11 (virtual interface is software based) is set, then bit 10 (this is a virtual interface) should always be set as well.

GetNetworkMTU
SWI (dib_swibase + 2)

On entry:
    r0 = flags (all bits must be zero)
    r1 = unit number

On exit:
    r2 = MTU
    All other registers are preserved.

Use:
This SWI returns the MTU (Maximum Transmission Unit) for the unit specified in r1. Ethernet has a fixed MTU of 1500 bytes; other hardware layers (e.g. PPP) may have a variable MTU.

Note: this SWI has changed with respect to earlier versions of the DCI, in as much as

  1. There is now the standard flags word in r0.
  2. A unit number is now passed in r1.
  3. Results are returned in r2.
  4. A default return (r2 = 0), implying the Ethernet MTU, is no longer supported.

SetNetworkMTU
SWI (dib_swibase + 3)

On entry:
    r0 = flags (all bits must be zero)
    r1 = unit number
    r2 = new MTU

On exit:
    All registers are preserved.

Use:
For those device drivers that allow it (e.g. PPP), this SWI sets the Maximum Transmission Unit for the unit given in r1. If the device driver has an immutable MTU, then it must still support this SWI, but return an error indicating an illegal operation.

Note: protocol modules can only ever consider this MTU as a guideline --- other protocol modules may set a different MTU for the same logical unit.
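Returning to the Inquire SWI, the inter-related flag bits listed in its description can be checked mechanically. The following is a minimal sketch, assuming a caller that already holds the r2 bitmap (for example from dib_inquire); the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions are those listed for r2 of the Inquire SWI;
 * the names are illustrative only. */
#define INQ_HAS_HWADDR      (1u << 5)
#define INQ_CAN_SET_HWADDR  (1u << 6)
#define INQ_STATS_STANDARD  (1u << 8)
#define INQ_STATS_EXTENDED  (1u << 9)
#define INQ_VIRTUAL         (1u << 10)
#define INQ_VIRTUAL_SOFT    (1u << 11)
#define INQ_RESERVED        (~(uint32_t)0x1FFFu)   /* bits 13-31 */

/* Non-zero if the bitmap obeys the rules stated for the Inquire flags. */
int inquire_flags_consistent(uint32_t f)
{
    if (f & INQ_RESERVED)
        return 0;   /* reserved bits must be zero */
    if ((f & INQ_STATS_EXTENDED) && !(f & INQ_STATS_STANDARD))
        return 0;   /* extended statistics need standard statistics */
    if ((f & INQ_VIRTUAL_SOFT) && !(f & INQ_VIRTUAL))
        return 0;   /* software based implies virtual */
    return 1;
}

/* Bit 6 is ignored unless bit 5 is also set. */
int inquire_can_alter_address(uint32_t f)
{
    assert(inquire_flags_consistent(f));
    return (f & INQ_HAS_HWADDR) && (f & INQ_CAN_SET_HWADDR);
}
```

Note that bit 6 without bit 5 is treated as consistent here, because the text says the bit is ignored rather than forbidden.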
Transmit
SWI (dib_swibase + 4)

On entry:
    r0 = flags (see below)
    r1 = unit number
    r2 = frame type
    r3 = pointer to mbuf chains containing data to transmit
    r4 = (byte aligned) pointer to destination hardware address
    r5 = (byte aligned) pointer to source hardware address (if applicable)
On exit:
    All registers are preserved.
Use:
    This SWI is a request from the protocol module for the device driver to send the packet addressed by r3 to the hardware address specified in r4. The "frame type" passed in r2 is something of a misnomer: the value given is copied into the last 2 bytes of an Ethernet frame header, i.e. it is the length field according to the IEEE 802.3 spec., and the frame type as far as Ethernet 2.0 is concerned. If a previous frame is still being transmitted, the driver should queue the new request if possible, otherwise return an error indicating that transmission is blocked.

The flag bits within r0 are:
    Bit 0:      0 --- Use interface's own hardware address.
                1 --- Use address given by r5 for source hardware address.
    Bit 1:      0 --- Device driver assumes ownership of memory resources
                1 --- Protocol module retains ownership of memory resources
    Bits 2--31: Reserved, must be zero.

Regardless of who initially allocated the memory resources (i.e. mbuf chains) passed in r3, it is the new owner of these resources (i.e. the device driver if r0, bit 1 = 0; the protocol module if r0, bit 1 = 1) that is responsible for returning these resources to the free pool when they are no longer needed.

This SWI uses the scheme described in section 6.3 for linking several received mbuf chains to pass multiple output chains to the device driver via a single call to this SWI. Care must be taken to ensure that the flag bits in r0 are applicable to all mbuf chains passed to the driver.
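The ownership rule carried by the Transmit flags can be sketched as a few C helpers. The TX_* macro names and function names below are hypothetical, chosen for illustration only; the bit meanings are those given in the flag list above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the Transmit SWI flag bits in r0. */
#define TX_SRC_ADDR_IN_R5 (1u << 0)  /* 1: r5 supplies the source address */
#define TX_PROTOCOL_OWNS  (1u << 1)  /* 1: protocol module keeps the mbufs */
#define TX_RESERVED_MASK  (~(TX_SRC_ADDR_IN_R5 | TX_PROTOCOL_OWNS))

/* Bits 2--31 are reserved and must be zero. */
static bool tx_flags_valid(uint32_t flags)
{
    return (flags & TX_RESERVED_MASK) == 0;
}

/* With bit 1 clear the driver becomes the owner of every chain passed
 * in r3, and so must return them to the free pool when it has finished. */
static bool tx_driver_must_free(uint32_t flags)
{
    return (flags & TX_PROTOCOL_OWNS) == 0;
}

/* With bit 0 set the driver takes the source address from r5. */
static bool tx_uses_r5_source(uint32_t flags)
{
    return (flags & TX_SRC_ADDR_IN_R5) != 0;
}
```

Because one flags word covers every chain linked into the call, a caller that mixes owned and retained chains would need to split them across separate Transmit calls.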
The data passed to the driver can be either "safe" or "unsafe" (section 8.3 explains the concept of unsafe data). If the device driver is given ownership of memory resources, and needs to keep these resources after the Transmit SWI has finished, then it must use the ensure_safe function of the memory manager (section 8.2) to obtain a safe copy of the data.

Note: the register numbers for this call have changed from earlier versions of the DCI, this change being made in an attempt to standardise register usage as far as possible.

Filter
SWI (dib_swibase + 5)

On entry:
    R0 = flags (see below)
    R1 = unit number
    R2 = frame type
    R3 = address level (for write)
    R4 = error level (for write)
    R5 = private word pointer
    R6 = address of handler routine for received frames
On exit:
    All registers are preserved.
Use:
    This SWI is the mechanism by which protocol modules inform device drivers which Ethernet frame types they would like to be passed. A full description of this interface is provided in section 6.2.

The flag bits within R0 are:
    Bit 0:      0 --- Claim a frame type.
                1 --- Release a previous claim on the frame type.
    Bit 1:      0 --- Device drivers can pass unsafe mbuf chains to the receive handler
                1 --- Device drivers should ensure_safe mbuf chains before passing them to the receive handler.
    Bit 2:      0 --- The protocol module wants all multicast frames (if indicated by R3)
                1 --- The protocol module will ask for specific multicast frames.
    Bits 3--31: Reserved, must be zero.

The private word pointer passed in R5 is the address of the protocol module's private word, which itself contains the address of the module's workspace. This pointer is passed round in R0 by the protocol module in the Service_DCIProtocolStatus service call.

When a device driver receives a network frame of a type claimed by a protocol module, it will call the routine given in R6. Section 6.3 describes the parameters which must be passed to this received frame handler.
The concept of safe and unsafe data, as used in bit 1 of the flags, is explained in section 8.3.

This SWI should return an error when:
    - An illegal frame type is claimed.
    - A frame type is already claimed. If a protocol module receives this error from a device driver, it can use Service_DCIFrameTypeFree to learn when the frame type is again free for claiming.
    - An attempt is made to free a frame type which has not been previously claimed by the protocol module.

Stats
SWI (dib_swibase + 6)

On entry:
    r0 = flags (see below)
    r1 = unit number
    r2 = pointer to buffer for holding results
On exit:
    All registers are preserved.
Use:
    This SWI is the mechanism by which device drivers return statistics they have gathered while running. A full description of these statistics, including the structure copied into the buffer addressed by r2, is provided in section 7.

The flag bits within r0 are:
    Bit 0:      0 --- Return an indication of which statistics are gathered.
                1 --- Return the statistics themselves.
    Bits 1--31: Reserved, must be zero.

The buffer addressed by r2 must be large enough to hold the full statistics structure, i.e. at least 100 bytes long; the driver is free to copy that many bytes into the buffer without thought for the consequences if the buffer is too small.

MulticastRequest
SWI (dib_swibase + 7)

On entry:
    R0 = flags (see below)
    R1 = unit number
    R2 = frame type
    R3 = (byte aligned) pointer to multicast hardware (MAC) address
    R4 = (word aligned) pointer to multicast logical address (e.g. pointer to IP address for frame type 0x800)
    R5 = private word pointer
    R6 = address of handler routine for received frames
On exit:
    All registers are preserved.
Use:
    This SWI is the mechanism by which protocol modules specify which destination multicast addresses they wish to receive.
The flag bits within R0 are: Bit 0: 0 --- Request a multicast address 1 --- Release a multicast address Bit 1: 0 --- Requesting/releasing specific multicast address (as specified by R3,R4) 1 --- Requesting/releasing all multicast addresses (R3 and R4 irrelevant) Bits 2--31: Reserved, must be zero. If a protocol module calls the Filter SWI with bit 2 of R0 clear, then it will receive all multicast frames (if the address level is multicast or promiscuous). If, however, it sets bit 2 of R0, and the address level is multicast, then it will initially receive no frames (usually -- see below). To start to receive certain multicasts, it should call this SWI. R3 will point to a MAC address -- Ethernet drivers will use only this. Non-Ethernet drivers will probably need to know what logical address is being requested, as there may not be a one-to-one mapping between the logical and hardware multicast addresses for the specified frame type (as indeed there isn't for IP). Contact Acorn for details of what to pass in R4 for specific frame types. R1, R2, R5, and R6 must match the values passed into the Filter SWI so that the device driver can tell which filter this call is intended for. It is not expected that the device driver will do software filtering of multicasts (beyond ensuring that specific and broadcast filters don't receive any multicasts). This is up to the protocol modules. The intention of this SWI is that it should be used to set up hardware filtering where possible; protocol modules may receive more multicasts than they requested. For example, if one protocol module is using selective multicasts, while another, older protocol module isn't, the selective module will probably end up receiving all multicasts because the hardware filtering will have had to be switched off for the unselective protocol module. 
This actually aids compatibility with DCI 4.03 driver modules -- a new protocol module need only set bit 2 of R0 when calling Filter, then ignore any "SWI not known" errors from the MulticastRequest SWI. It will then work fine with older drivers. Device drivers will need to track which filters are requesting which multicast addresses, so that when a filter is released or a protocol module dies all its multicast claims can be automatically removed. However, as specified above, there is no need to check whether a multicast filter has requested a specific multicast address before passing a received frame to it. This SWI provides no function for filters with an address level other than ADDRLVL_MULTICAST, and if called for such a filter should return EINVAL. 6 Received Frames One of the major changes between DCI 4 and earlier DCI versions is the scheme used for handling received frames. The main changes introduced with this version are: 1. Support for multicast and promiscuous frames. 2. Improved handling of IEEE 802.3 format frames. 3. Protocol modules are informed of received frames with a direct call into a handler routine, rather than via an event. The principle of operation is that protocol modules register an interest in one or more frame types with a device driver, defining various filtering parameters in the process. When a device driver receives a network frame, it uses the frame type and filtering parameters to decide which (if any) protocol module should be passed the frame. Any one received frame can be passed to one or no protocol modules, it is not possible for a single frame to be given to multiple protocol modules. 6.1 Frame Class --- Ethernet 2 and IEEE 802.3 All Ethernet frames have a 14 byte MAC header: 6 bytes of destination hardware address, 6 bytes of source hardware address, and 2 more bytes. 
Unfortunately, there are two competing "standards" which place a different interpretation on these last 2 bytes: Ethernet 2.0, which considers them as 16 bits of frame type, and IEEE 802.3, which treats them as 16 bits of frame length. It is obviously not possible to refer to Ethernet 2.0 and IEEE 802.3 as different types of frame, so this document uses the term "class" to refer to the property of being either an Ethernet 2.0 or an IEEE 802.3 frame.

Since all Ethernet frames must be no more than 1500 bytes long, a device driver should assume that any received frame with an Ethernet 2.0 "frame type" of 0--1500 is an IEEE 802.3 frame, and that everything else is an Ethernet 2.0 frame. (Note: although Ethernet frames should be padded to a minimum length of 46 bytes, frame lengths < 46 are still legal values.)

All IEEE 802.3 class frames should also conform to the IEEE 802.2 standard for Logical Link Control --- this latter standard defines a set of services to be supported, and provides a method to identify the type of an 802.3 class frame. The implementation of this IEEE 802.2 Logical Link Control layer cannot be the responsibility of any specific protocol module, and it would be inefficient to make each device driver responsible for the implementation, so DCI 4 caters for the scheme shown in figure 1.

Note: the implementor cannot be a protocol module for two reasons:
    1. Protocol modules are frame type specific, whereas the standard services which an IEEE 802.2 implementor must provide are frame type independent.
    2. The software that implements the IEEE 802.2 layer will be expected to filter frame types, and pass them along to protocol modules; therefore, obviously, this software cannot be a standard protocol module itself.
    +------------------+     +------------------+
    |                  |     |                  |
    |   Ethernet 2.0   |     |    IEEE 802.3    |
    |    Protocols     |     |    Protocols     |
    |                  |     |                  |
    +------------------+     +------------------+
        /|\     /|\              /|\     /|\
         |       |                |       |
         |       |               \|/     \|/
         |       |           +------------------+
         |       |           |                  |
         |       |           |    IEEE 802.2    |
         |       |           |   Implementor    |
         |       |           |                  |
         |       |           +------------------+
         |       |                   /|\
         |       |                    |
        \|/     \|/                  \|/
    +-------------------------------------------+
    |                                           |
    |                  Device                   |
    |                  Drivers                  |
    |                                           |
    +-------------------------------------------+

    Figure 1: Filtering Ethernet 2.0 and IEEE 802.3 class frames

In this scheme, device drivers can differentiate between the two frame classes, and, furthermore, can distinguish Ethernet 2.0 frame types. However, no effort is made to ascertain frame types for IEEE 802.3 frames, and all frames of this class are passed to a pseudo-protocol module which implements the IEEE 802.2 Logical Link Control layer, and which provides a similar interface to DCI 4, allowing IEEE 802.3 protocol modules to claim specific frame types.

6.2 Frame Filtering

A protocol module uses the Filter SWI to identify a number of criteria which a received frame must match before being passed by the device driver to the protocol module; these criteria are
    - Frame type
    - Address level
    - Error level

Only one protocol module is allowed to claim any given frame type, and when claimed, that frame type is never passed to any other protocol module. For example, if one protocol module has claimed a frame type with an address filter of specifically addressed packets only, then a second protocol module:
    1. cannot claim the same frame type with an address level of promiscuous.
    2. can claim all frame types not specifically registered, with (e.g.) an address level of multicast, but will not be passed any broadcast frames of the type claimed by the first protocol module (which will not receive the frame either, because the address level will filter out broadcast packets).
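The frame class rule from section 6.1 (a value of 0--1500 in the last 2 bytes of the MAC header means IEEE 802.3, anything else means Ethernet 2.0) can be sketched as below. The enum and function names are hypothetical; the sketch assumes the 14-byte MAC header is available as raw bytes, with the 2-byte field in network (big-endian) byte order as it appears on the wire.

```c
#include <stdint.h>

/* Hypothetical names for the two frame classes. */
enum frame_class { CLASS_IEEE_802_3, CLASS_ETHERNET_2 };

/* Classify a frame from its 14-byte MAC header: 6 bytes destination,
 * 6 bytes source, then 2 bytes of type (Ethernet 2.0) or length
 * (IEEE 802.3), most significant byte first. */
static enum frame_class classify_frame(const uint8_t mac_header[14])
{
    uint16_t last2 = (uint16_t)((mac_header[12] << 8) | mac_header[13]);

    /* 0..1500 can only be a length, so the frame is IEEE 802.3. */
    return last2 <= 1500 ? CLASS_IEEE_802_3 : CLASS_ETHERNET_2;
}
```

For instance, a header ending in 0x08 0x00 (IP) classifies as Ethernet 2.0, while one ending in 0x05 0xDC (1500) classifies as IEEE 802.3.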
Frame Type

DCI 4 splits the 32-bit frame type into two 16-bit subfields --- the hi-order 16 bits specify the frame class and level, while the lo-order 16 bits provide the exact frame type (where significant). Expressed in C format, the class/level subfield can take the following values:

    #define FRMLVL_E2SPECIFIC 0x0001
    #define FRMLVL_E2SINK     0x0002
    #define FRMLVL_E2MONITOR  0x0003
    #define FRMLVL_IEEE       0x0004

All other values for this subfield are illegal --- any attempt to use them in a Filter SWI should generate an error. Similarly, if the hi-order subfield of the frame type is FRMLVL_E2SPECIFIC, then the lo-order subfield can take any value from 0x0000 -- 0xffff; otherwise it must be set to 0x0000, and any other value passed to Filter should be treated as an error.

The precise meanings of the class/level subfield values are:
    Specific: this is the standard frame level filter --- the protocol module is only passed Ethernet 2.0 frames whose type matches that given in the lo-order, frame type subfield.
    Sink: pass all Ethernet 2.0 frames that are not explicitly claimed by any protocol module.
    Monitor: pass all Ethernet 2.0 frames to the protocol module.

For Ethernet 2.0 frames, the table below gives a summary of what frame levels are allowed on new claims, given the highest level of filtering currently active (monitor is considered higher than sink, and both of these levels are considered higher than normal, i.e. FRMLVL_E2SPECIFIC):

    Highest Current Level    New Levels Allowed
    ---------------------    ---------------------
    (Nothing)                Normal, Sink, Monitor
    Normal                   Normal, Sink
    Sink                     Normal
    Monitor                  (nothing)

Address Level

The four levels of address level filtering can be expressed in C as

    #define ADDRLVL_SPECIFIC    0
    #define ADDRLVL_NORMAL      1
    #define ADDRLVL_MULTICAST   2
    #define ADDRLVL_PROMISCUOUS 3

These levels are:
    Specific: only pass frames addressed to the interface's specific hardware address.
    Normal: only pass frames addressed to the interface's specific hardware address, and broadcast frames.
    Multicast: pass all specifically addressed, broadcast and multicast frames. If bit 2 of R0 was set on entry to the DCI Filter SWI, and the DCI Inquire SWI returns with bit 12 set, then the driver should attempt to filter multicast frames -- see the DCI MulticastRequest SWI for details. Otherwise, all multicast frames will be passed.
    Promiscuous: pass all frames of the appropriate frame type, with no address matching at all.

Most Ethernet controllers can perform this address filtering at a hardware level, but, obviously, the hardware needs to be configured to the loosest level of filtering requested by any protocol module. In the situation where two protocol modules have specified two different levels of address filtering, the device driver must still filter out unwanted frames; protocol modules are only responsible for filtering out any unwanted subset of multicast frames.

Error Level

A device driver should provide two levels of error filtering; in C these are

    #define ERRLVL_NO_ERRORS 0
    #define ERRLVL_ERRORS    1

These levels are:
    No errors: only pass frames that are received error free.
    Errors: pass all frames, regardless of error state.

6.3 Received Frame Handlers

A major difference between DCI 4 and earlier versions of the DCI is the method used to notify protocol modules of the arrival of frames in which they have registered an interest. The main features of these receive handlers are
1. Device drivers call a direct entry point within the protocol module (earlier DCI versions used a receive event). The address of this direct entry point is passed to the device driver at the same time the frame type is claimed via the Filter SWI.
2. A device driver can pass several received frames to the protocol module with one call to the receive handler, rather than having to call the protocol module once per frame.
3.
The protocol module becomes the new owner of all mbufs passed to its receive handler by device drivers: it is the protocol module that is responsible for freeing all resources once they are no longer needed.

Handler Details

On entry:
    r0  = pointer to Driver Information Block describing the source interface
    r1  = pointer to head of mbuf list of received frames
    r12 = protocol module's private word pointer, i.e. value passed in r5 to Filter SWI
On exit:
    All registers preserved.
Interrupt Status:
    Both interrupts and fast interrupts are enabled by the received frame handler.

Details of the exact structure of the mbuf list of received frames are given in the section below. Each received frame has a header which can be described in terms of the following C structure:

    struct rx_hdr
    {
        void *rx_ptr;
        unsigned int rx_tag;
        unsigned char rx_src_addr[6], _spad[2];
        unsigned char rx_dst_addr[6], _dpad[2];
        unsigned int rx_frame_type;
        unsigned int rx_error_level;
    };

The fields in this structure are:

rx_ptr
    This field is for internal use by the receive handler; its value is undefined upon entry.
rx_tag
    This field is reserved for use by the IEEE 802.2 implementor, and must be set to zero by the device driver.
rx_src_addr
    The hardware source address of the frame. (Must be zeroed if hardware addresses are not supported.)
_spad
    Space filler to align the next field (rx_dst_addr) on a word boundary. Must be zero filled.
rx_dst_addr
    The hardware destination address of the frame. (Must be zeroed if hardware addresses are not supported.)
_dpad
    Space filler to align the next field (rx_frame_type) on a word boundary. Must be zero filled.
rx_frame_type
    The length (for IEEE 802.3), or the type (for Ethernet 2.0) of the received frame, i.e. the last 2 bytes of the frame's MAC header.
rx_error_level
    This field is zero if the frame was received with no errors, otherwise it contains a driver specific error code.
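Putting the three filtering criteria of section 6.2 together with the received frame header above, a driver's per-claim decision could look roughly like the sketch below. This is an assumption-laden illustration, not the DCI's prescribed algorithm: struct filter and the three functions are hypothetical, arbitration between competing claims (sink versus specific, for example) is not modelled, and multicast destinations are recognised by the usual Ethernet convention of the low bit of the first address byte.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FRMLVL_E2SPECIFIC   0x0001
#define FRMLVL_E2SINK       0x0002
#define FRMLVL_E2MONITOR    0x0003
#define FRMLVL_IEEE         0x0004
#define ADDRLVL_SPECIFIC    0
#define ADDRLVL_NORMAL      1
#define ADDRLVL_MULTICAST   2
#define ADDRLVL_PROMISCUOUS 3
#define ERRLVL_NO_ERRORS    0
#define ERRLVL_ERRORS       1

struct rx_hdr {
    void *rx_ptr;
    unsigned int rx_tag;
    unsigned char rx_src_addr[6], _spad[2];
    unsigned char rx_dst_addr[6], _dpad[2];
    unsigned int rx_frame_type;
    unsigned int rx_error_level;
};

/* Hypothetical record of one claim made through the Filter SWI. */
struct filter {
    uint32_t frame_type;  /* hi 16 bits: class/level, lo 16 bits: type */
    int addr_level;       /* ADDRLVL_* */
    int err_level;        /* ERRLVL_*  */
};

/* Legal frame type words: level 1..4, lo half zero unless E2SPECIFIC. */
static bool frame_type_word_valid(uint32_t ft)
{
    uint32_t level = ft >> 16, type = ft & 0xffffu;
    if (level < FRMLVL_E2SPECIFIC || level > FRMLVL_IEEE)
        return false;
    return level == FRMLVL_E2SPECIFIC || type == 0;
}

static bool address_passes(const struct rx_hdr *h,
                           const unsigned char own[6], int addr_level)
{
    static const unsigned char bcast[6] =
        { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

    if (addr_level == ADDRLVL_PROMISCUOUS)
        return true;
    if (memcmp(h->rx_dst_addr, own, 6) == 0)
        return true;                             /* specifically addressed */
    if (memcmp(h->rx_dst_addr, bcast, 6) == 0)
        return addr_level >= ADDRLVL_NORMAL;     /* broadcast */
    if (h->rx_dst_addr[0] & 1)
        return addr_level >= ADDRLVL_MULTICAST;  /* multicast */
    return false;
}

static bool filter_accepts(const struct filter *f, const struct rx_hdr *h,
                           const unsigned char own[6])
{
    uint32_t level = f->frame_type >> 16;
    bool is_8023 = h->rx_frame_type <= 1500;     /* class rule, section 6.1 */

    if (f->err_level == ERRLVL_NO_ERRORS && h->rx_error_level != 0)
        return false;
    if (!address_passes(h, own, f->addr_level))
        return false;
    if (level == FRMLVL_IEEE)
        return is_8023;
    if (is_8023)
        return false;
    if (level == FRMLVL_E2SPECIFIC)
        return (f->frame_type & 0xffffu) == h->rx_frame_type;
    return true;  /* sink or monitor; claim arbitration not shown */
}
```

A claim for IP frames addressed to the interface or broadcast would, under these assumptions, use frame_type = (FRMLVL_E2SPECIFIC << 16) | 0x0800 with ADDRLVL_NORMAL.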
This frame header is passed in the first mbuf of each frame; the first byte of the frame data is in the second mbuf in the chain.

Mbuf Chaining

An mbuf contains two fields which point to the next mbuf in a linked list, specifically
    m_next --- typically used to link mbufs in a chain.
    m_list --- typically used to link separate mbuf chains together.

When a device driver calls a protocol module's receive handler, it uses a single mbuf chain to hold each received frame, and can link several frames together for passing via the single call. Figure 2 shows how m_next and m_list are used to link chains of mbufs together into a list. Note that, although this structure allows different frame types to be passed to the protocol module (because the first mbuf in each chain contains a struct rx_hdr which includes the rx_frame_type field), the receive handler is only given a single Driver Information Block, and therefore all the frames passed in any one call to the handler must come from a single unit.

    +--------+     +--------+     +--------+     +--------+     +------+
    | m_next |---->| m_next |---->| m_next |---->| m_next |---->| NULL |
    +--------+     +--------+     +--------+     +--------+     +------+
    |        |     |        |     |        |     |        |
    |        |     |        |     |        |     |        |
    |        |     |        |     |        |     |        |
    +--------+     +--------+     +--------+     +--------+
    | m_list |     |        |     |        |     |        |
    +--------+     +--------+     +--------+     +--------+
        |
        |
       \|/
    +--------+     +--------+     +------+
    | m_next |---->| m_next |---->| NULL |
    +--------+     +--------+     +------+
    |        |     |        |
    |        |     |        |
    |        |     |        |
    +--------+     +--------+
    | m_list |     |        |
    +--------+     +--------+
        |
        |
       \|/
    +--------+     +--------+     +--------+     +------+
    | m_next |---->| m_next |---->| m_next |---->| NULL |
    +--------+     +--------+     +--------+     +------+
    |        |     |        |     |        |
    |        |     |        |     |        |
    |        |     |        |     |        |
    +--------+     +--------+     +--------+
    | m_list |     |        |     |        |
    +--------+     +--------+     +--------+
        |
        |
       \|/
    +------+
    | NULL |
    +------+

    Figure 2: Linking mbuf chains

7 Statistics

7.1 Introduction

The Inquire SWI makes mention of device drivers supporting both standard, and extended statistics interfaces.
This version of the DCI does not define an extended statistics interface, but it does define a standard statistics interface, which this section describes. This document defines what it considers to be the definitive list of network parameters, and the driver maintains a subset of these (remembering that the whole set is a valid subset). An independent set of statistics is maintained for each unit that the driver controls.

The Stats SWI serves two purposes:
    1. It identifies which statistics the driver gathers for a particular unit.
    2. It allows reading of the gathered statistics.

7.2 Data Structures

The statistics structure is written in C code as:

    struct stats
    {
        /*
         * general information
         */
        unsigned char st_interface_type;
        unsigned char st_link_status;
        unsigned char st_link_polarity;
        unsigned char st_blank1;
        unsigned long st_link_failures;
        unsigned long st_network_collisions;

        /*
         * transmit statistics
         */
        unsigned long st_collisions;
        unsigned long st_excess_collisions;
        unsigned long st_heartbeat_failures;
        unsigned long st_not_listening;
        unsigned long st_net_error;
        unsigned long st_tx_frames;
        unsigned long st_tx_bytes;
        unsigned long st_tx_general_errors;
        unsigned char st_last_dest_addr[8];

        /*
         * receive statistics
         */
        unsigned long st_crc_failures;
        unsigned long st_frame_alignment_errors;
        unsigned long st_dropped_frames;
        unsigned long st_runt_frames;
        unsigned long st_overlong_frames;
        unsigned long st_jabbers;
        unsigned long st_late_events;
        unsigned long st_unwanted_frames;
        unsigned long st_rx_frames;
        unsigned long st_rx_bytes;
        unsigned long st_rx_general_errors;
        unsigned char st_last_src_addr[8];
    };

The fields within this structure are:

st_interface_type
    A single byte coding the specific hardware interface type.
    Values so far defined are:

    Code  Interface type
    ----  --------------
    1     10Base5
    2     10Base2
    3     10BaseT
    4     Combination 10Base5/10Base2
    5     Combination 10Base2/10BaseT
    6     Reduced Squelch 10BaseT
    7     Acorn Econet
    8     Serial line
    9     Parallel port
    10    Combination 10Base5/10Base2/10BaseT

st_link_status
    A bitfield describing the current state of the interface; significant bits are:
    Bit 0:     0 --- Interface bad (i.e. self-test failed).
               1 --- Interface OK.
    Bit 1:     0 --- Interface is inactive.
               1 --- Interface is active.
    Bits 2--3: Describe the currently configured receive level as follows:
               00 --- Accept directly addressed frames only.
               01 --- Accept directly addressed and broadcast frames only.
               10 --- Accept direct, broadcast, and multicast frames.
               11 --- Promiscuous mode, accept all frames.
    Bits 4--7: Reserved, must be zero.
st_link_polarity
    Indicates polarity of network connection; contains either 1 (polarity correct), or 0 (polarity incorrect).
st_blank1
    Unused, must be set to zero.
st_link_failures
    Counts the number of times a good link went away.
st_network_collisions
    Counts the total number of collisions on the network.
st_collisions
    The number of times a collision has occurred when trying to transmit a packet.
st_excess_collisions
    A count of excess transmit collisions.
st_heartbeat_failures
    The number of times the Signal Quality Error Test failed to detect a collision.
st_not_listening
    A count of the number of times when the remote station was not listening. This statistic will usually be specific to Acorn Econet.
st_net_error
    General TX error (typically used by the EconetA module).
st_tx_frames
    The total number of frames transmitted since driver initialisation.
st_tx_bytes
    The total number of bytes transmitted since driver initialisation.
st_tx_general_errors
    A count of the number of non-specific network errors that occurred during transmission.
st_last_dest_addr
    Hardware address of the last interface to which a frame was sent.
st_jabbers
    The number of times the interface was caught jabbering.
st_unwanted_frames
    The number of frames received, but not claimed by any protocol module.
st_rx_frames
    The total number of frames received since driver initialisation.
st_rx_bytes
    The total number of bytes received since driver initialisation.
st_rx_general_errors
    A count of the number of non-specific network errors that occurred during frame reception.
st_last_src_addr
    Hardware address of the last interface from which a frame was received.

Note that statistics are gathered for all frames received --- even if a driver subsequently decides that no protocol wants a given frame, that frame still appears in the relevant receive statistics (i.e. st_rx_bytes, st_last_src_addr etc.).

7.3 Statistics

The basic interface for reading statistics from a driver is the Stats SWI, outlined in section 5.3. There are two different forms of this SWI, selected by bit 0 of r0 --- the first form is used to determine which statistics are supported by the driver, while the second form is used to read the statistics.

To indicate which statistics it supports, a device driver returns a statistics structure with all bits in those fields it does support set to 1, and all bits in those fields it doesn't support set to 0. Those fields which are a variable length (i.e. st_last_dest_addr & st_last_src_addr) use the same mechanism to indicate which parts of the field are valid. For example, a standard Ethernet interface which uses 6-byte hardware addresses would return st_last_src_addr set to 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, whereas a PPP driver, which does not use hardware addresses and therefore would not support this field, would return it set to 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00.

When returning the statistics, all multi-byte fields are returned with host byte ordering.
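Interpreting the "which statistics are supported" form of the Stats SWI comes down to testing for all-ones fields. A minimal sketch follows; the helper names are hypothetical, and the sketch works on individual field values rather than the whole struct stats so that it stays self-contained.

```c
#include <stdbool.h>
#include <stddef.h>

/* A counter field is supported when the driver returned it with every
 * bit set in the "supported" form of the Stats SWI (r0 bit 0 = 0). */
static bool counter_supported(unsigned long mask_value)
{
    return mask_value == ~0UL;
}

/* For the 8-byte address fields (st_last_dest_addr, st_last_src_addr),
 * count how many leading bytes are marked valid with 0xff. */
static size_t valid_addr_bytes(const unsigned char mask[8])
{
    size_t n = 0;
    while (n < 8 && mask[n] == 0xff)
        n++;
    return n;
}
```

With the examples from the text, the Ethernet mask yields 6 valid address bytes and the PPP mask yields 0.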
8 Memory Management 8.1 Introduction In all versions of the DCI, data pass across the interface between protocol modules and device drivers in "mbufs". These are based upon the data structures originally developed for handling network data within BSD Unix kernels. Mbufs within DCI 4 are noticeably different from their brethren, both those from BSD, and those from earlier versions of the DCI, the main distinctions being: - Mbufs and the data they describe no longer occupy a contiguous piece of memory. - It is no longer the responsibility of protocol modules to allocate and maintain pools of free memory --- DCI 4 introduces a single, centralised, memory manager module which all protocol and device driver modules claim memory from in the form of mbufs. - The set of function calls and macros for manipulating mbufs (i.e. those operations defined in mbuf.c and mbuf.h) provided by the new memory manager module are completely changed from those used in earlier versions of the DCI. Any module being upgraded to DCI 4 will have to have all these calls changed to the new versions. 8.2 Memory Manager Module Overview Memory management for packet storage in earlier versions of the DCI is performed with mbufs. DCI 4 also uses mbuf based packet storage, but there are some differences. These differences are for the following reasons: 1. Correct design oversights in previous versions of the DCI. 2. Provide a more modular, and upgradeable, system. 3. Offer single mbuf arbiter with optimised routines available to all DCI4 components. The memory manager, the arbiter module, is central to the DCI4 mbuf scheme. It performs most of the low level work associated with mbufs, as well as relieving both protocol and client modules of some tedium. A complete specification of the memory manager is available separately; this document is designed to guide the reader conversant with "traditional" mbufs through using DCI4 mbufs. 
Communication with the memory manager is centred around an mbctl structure. This is stored in the client's memory, and is mainly initialised by the memory manager to contain useful information, including the addresses of a number of routines within the memory manager for the client to call directly.

Direct entry points are designed to permit the easy inter-operation of assembler and APCS code (such as that generated by the Norcroft C compiler), and roughly obey APCS. A list of entry/exit characteristics follows (using APCS register naming convention):
    1. a1 always points at an mbctl structure for all direct entry calls
    2. the processor must be in supervisor mode
    3. a1--a4 are the only parameter registers
    4. a2--a4 and ip are corrupted by the call
    5. a1 is either the call result or corrupted
    6. other registers preserved by call
    7. the processor flags are preserved by the call
    8. no V set error convention (incompatible with APCS)
    9. in general, an error results in a1=0 on exit
    10. IRQ state preserved across call
    11. IRQs may be disabled during calls
    12. IRQs may be enabled during calls ONLY if specifically documented
    13. FIQs assumed enabled on entry
    14. FIQs preserved across calls

Currently, no direct entry point routine will enable interrupts if they are disabled on entry.

The header file mbuf.h provides some macros to manage interfacing with the memory manager routines. These direct entry points provide access to allocator and free routines, along with a whole host of support routines. Rather than each protocol implementing its own mbuf scheme, and each device driver having to choose the correct mbuf pool to allocate from, all protocols and all device drivers perform their allocations and frees via these direct entry points.
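From C, a direct entry call amounts to calling through a function pointer held in the mbctl structure, passing the mbctl itself as the first argument and treating a zero result as failure. The sketch below illustrates only that calling pattern on a host machine: the cut-down struct mbuf and struct mbctl, the live counter, and the malloc-backed fake_alloc and fake_freem routines are all hypothetical stand-ins, not the real memory manager, and the meaning assumed for a NULL ptr argument (nothing to import) is a guess.

```c
#include <stddef.h>
#include <stdlib.h>

/* Cut-down, hypothetical stand-ins for the real structures. */
struct mbuf {
    struct mbuf *m_next;
    size_t m_len;
};

struct mbctl {
    struct mbuf *(*alloc)(struct mbctl *, size_t bytes, void *ptr);
    void (*freem)(struct mbctl *, struct mbuf *mp);
    size_t live;   /* illustration only: mbufs currently allocated */
};

/* Stand-in allocator: one mbuf per request, NULL (a1 = 0) on failure. */
static struct mbuf *fake_alloc(struct mbctl *mbc, size_t bytes, void *ptr)
{
    struct mbuf *m = calloc(1, sizeof *m);
    (void)ptr;     /* assumed: NULL means there is no data to import */
    if (m == NULL)
        return NULL;
    m->m_len = bytes;
    mbc->live++;
    return m;
}

/* Stand-in for freem: release every mbuf along the m_next chain. */
static void fake_freem(struct mbctl *mbc, struct mbuf *mp)
{
    while (mp != NULL) {
        struct mbuf *next = mp->m_next;
        free(mp);
        mbc->live--;
        mp = next;
    }
}

/* Client-side pattern: call via the mbctl, check for the zero result,
 * and free what we own once finished. Returns 0 on success. */
static int use_and_release(struct mbctl *mbc, size_t bytes)
{
    struct mbuf *m = mbc->alloc(mbc, bytes, NULL);
    if (m == NULL)
        return -1;
    mbc->freem(mbc, m);
    return 0;
}
```

In a real client the function pointers would be filled in by the memory manager when the session is opened, rather than assigned by hand.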
Structures

The new memory manager module uses two main data structures: the mbuf structure, each one of which describes a piece of atomically allocated memory, and struct mbctl, the control structure which describes the exact interface between the memory manager and one of its clients.

Note that, although a struct mbuf is recognisably similar to the structure used in "traditional" memory management schemes, there are enough differences in the new structure to render it incompatible with the macros defined in the traditional versions of mbuf.h. The new definition of an mbuf is shown below:

    typedef struct mbuf
    {
        struct mbuf *m_next;          /* next mbuf in chain */
        struct mbuf *m_list;          /* next mbuf in list (clients only) */
        ptrdiff_t m_off;              /* current offset to data from
                                       * mbuf itself */
        size_t m_len;                 /* current byte count */
        const ptrdiff_t m_inioff;     /* original offset to data from
                                       * mbuf itself */
        const size_t m_inilen;        /* original byte count (for
                                       * underlying data) */
        unsigned char m_type;         /* client use only */
        const unsigned char m_sys1;   /* mbuf manager use only */
        const unsigned char m_sys2;   /* mbuf manager use only */
        unsigned char m_flags;        /* client use only */
        struct pkthdr m_pkthdr;       /* client use only */
    } dci4_mbuf;

The MLEN macro value is no longer directly applicable --- each mbuf must have its maximum size checked individually. Likewise, resetting m_off now requires examining the m_inioff field of the mbuf; just setting it to zero is no longer good enough.

m_act has been renamed m_list.

m_inilen and m_inioff are provided to replace the MMINOFF and MMAXLEN macros, as the values are now dependent upon the particular mbuf in question.

m_indir has disappeared --- specific routines exist for determining if an mbuf chain contains unsafe data.

The mbuf structure has been separated from the underlying storage it describes. The underlying storage blocks may now be different sizes (128 and 1536 byte blocks are currently used).
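The consequences of these changes for client code can be sketched with a few helpers. The struct below is a cut-down copy of the mbuf definition, keeping only the fields used here, and the helper names are hypothetical (the real mbuf.h supplies its own macros, which are not reproduced). Treating m_inilen as the per-mbuf capacity is an assumption based on its comment.

```c
#include <stddef.h>

/* Cut-down copy of struct mbuf: only the fields this sketch touches. */
struct mbuf {
    struct mbuf *m_next;       /* next mbuf in chain */
    struct mbuf *m_list;       /* next chain in list */
    ptrdiff_t m_off;           /* current offset to data from mbuf itself */
    size_t m_len;              /* current byte count */
    const ptrdiff_t m_inioff;  /* original offset to data */
    const size_t m_inilen;     /* original byte count of underlying data */
};

/* The data is found at an offset from the mbuf's own address, since the
 * mbuf and its storage are no longer one contiguous block. */
static void *mbuf_data(struct mbuf *m)
{
    return (char *)m + m->m_off;
}

/* Resetting the offset means restoring m_inioff, not writing zero. */
static void mbuf_reset_offset(struct mbuf *m)
{
    m->m_off = m->m_inioff;
}

/* Capacity must be read per mbuf (assumed here to be m_inilen). */
static size_t mbuf_capacity(const struct mbuf *m)
{
    return m->m_inilen;
}

/* Total bytes held in one chain, following m_next. */
static size_t chain_bytes(const struct mbuf *m)
{
    size_t total = 0;
    for (; m != NULL; m = m->m_next)
        total += m->m_len;
    return total;
}

/* Number of chains in a list, following m_list from each chain head. */
static size_t list_chains(const struct mbuf *m)
{
    size_t count = 0;
    for (; m != NULL; m = m->m_list)
        count++;
    return count;
}
```

A receive handler given a list of frames could use list_chains to count the frames and chain_bytes to size each one, remembering that the first mbuf of each received chain holds the frame header rather than frame data.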
Earlier versions of the DCI had big mbufs, but they were weakly defined and dtom did not work with them. DCI 4 corrects both these points --- indirect mbufs, which at times were not distinguishable from large mbufs, have been formalised into unsafe mbufs.

As an mbuf chain passes around the system, ownership of that chain is also transferred. Ownership brings with it the responsibility to free the mbuf chain (unless it is transferred to another component, although this is unlikely).

The other crucial structure in DCI 4 memory management is mbctl:

    typedef struct mbctl
    {
        /* reserved for mbuf manager use in establishing context */
        int opaque;               /* mbuf manager use only */

        /* Client initialises before session is established */
        size_t mbcsize;           /* size of mbctl structure from
                                   * client */
        unsigned int mbcvers;     /* client version of mbuf manager
                                   * spec */
        unsigned long flags;      /* */
        size_t advminubs;         /* Advisory desired minimum
                                   * underlying block size */
        size_t advmaxubs;         /* Advisory desired maximum
                                   * underlying block size */
        size_t mincontig;         /* client required min
                                   * ensure_contig value */
        unsigned long spare1;     /* Must be set to zero on
                                   * initialisation */

        /* Mbuf manager initialises during session establishment */
        size_t minubs;            /* Minimum underlying block size */
        size_t maxubs;            /* Maximum underlying block size */
        size_t maxcontig;         /* Maximum contiguify block size */
        unsigned long spare2;     /* Reserved for future use */

        /* Allocation routines */
        struct mbuf *             /* MBC_DEFAULT */
            (* alloc) (struct mbctl *, size_t bytes, void *ptr);
        struct mbuf *             /* Parameter driven */
            (* alloc_g) (struct mbctl *, size_t bytes, void *ptr, unsigned long flags);
        struct mbuf *             /* MBC_UNSAFE */
            (* alloc_u) (struct mbctl *, size_t bytes, void *ptr);
        struct mbuf *             /* MBC_SINGLE */
            (* alloc_s) (struct mbctl *, size_t bytes, void *ptr);
        struct mbuf *             /* MBC_CLEAR */
            (* alloc_c) (struct mbctl *, size_t bytes, void *ptr);

        /* Ensuring routines */
        struct mbuf *
            (* ensure_safe) (struct mbctl *, struct mbuf *mp);
struct mbuf * (* ensure_contig) (struct mbctl *, struct mbuf *mp, size_t bytes); /* Freeing routines */ void (* free) (struct mbctl *, struct mbuf *mp); void (* freem) (struct mbctl *, struct mbuf *mp); void (* dtom_free) (struct mbctl *, struct mbuf *mp); void (* dtom_freem) (struct mbctl *, struct mbuf *mp); /* Support routines */ struct mbuf * /* No ownership transfer though */ (* dtom) (struct mbctl *, void *ptr); int /* Client retains mp ownership */ (* any_unsafe) (struct mbctl *, struct mbuf *mp); int /* Client retains mp ownership */ (* this_unsafe) (struct mbctl *, struct mbuf *mp); size_t /* Client retains mp ownership */ (* count_bytes) (struct mbctl *, struct mbuf *mp); struct mbuf * /* Client retains old, new ownership */ (* cat) (struct mbctl *, struct mbuf *old, struct mbuf *new); struct mbuf * /* Client retains mp ownership */ (* trim) (struct mbctl *, struct mbuf *mp, int bytes, void *ptr); struct mbuf * /* Client retains mp ownership */ (* copy) (struct mbctl *, struct mbuf *mp, size_t off, size_t len); struct mbuf * /* Client retains mp ownership */ (* copy_p) (struct mbctl *, struct mbuf *mp, size_t off, size_t len); struct mbuf * /* Client retains mp ownership */ (* copy_u) (struct mbctl *, struct mbuf *mp, size_t off, size_t len); struct mbuf * /* Client retains mp ownership */ (* import) (struct mbctl *, struct mbuf *mp, size_t bytes, void *ptr); struct mbuf * /* Client retains mp ownership */ (* export) (struct mbctl *, struct mbuf *mp, size_t bytes, void *ptr); } dci4_mbctl; end{verbatim } Some of the fields the client initialises are present to permit future versions of the memory manager to tune themselves as tightly as possible to the setup they are asked to support. Note that dtom is no longer a macro. Don't worry --- it's efficient assembler, and it works with all sizes of mbuf the memory manager cares to use. Using the memory manager module Basic use of the memory manager is performed as follows 1. 
module loads and looks for memory manager 2. if memory manager is absent, module goes into a pre-active state, awaiting the arrival of the memory manager 3. once the memory manager is present, a "session" is opened with it 4. the device driver/protocol may now become active if it is pre-active (this might involve delaying arrival service calls until now) 5. the device driver/protocol uses direct entry points to communicate with the memory manager 6. the device driver/protocol is about to die --- it first closes the open session with the memory manager 7. the device driver/protocol can now die Initialisation with the memory manager is performed with code something like: static _kernel_oserror *open_mbuf_manager_session(void) { _kernel_swi_regs r ; memset(&mbctl, 0, sizeof(struct mbctl)); mbctl.mbcsize = sizeof(struct mbctl); mbctl.mbcvers = MBUF_MANAGER_VERSION; mbctl.flags = 0; mbctl.advminubs = 0; mbctl.advmaxubs = 0; mbctl.mincontig = 0; mbctl.spare1 = 0; r.r[0] = (int) &mbctl; return(_kernel_swi(XOS_Bit Mbuf_OpenSession, &r, &r)); } Finalisation is performed with code something like: static _kernel_oserror *close_mbuf_manager_session(void) {r ; r.r[0] = (int) &mbctl; return(_kernel_swi(XOS_Bit Mbuf_CloseSession, &r, &r)); } A quick summary of the available direct entry point routines: alloc: standard allocator. Can import data, but cannot zero the underlying storage, force single mbuf allocation or allocate unsafe data. alloc_g: the allocator which can emulate the other allocator functions. alloc_u: allocate unsafe mbufs alloc_s: force allocation to a single mbuf alloc_c: clear the underlying storage after allocation. ensure_safe: Examines each mbuf and returns a modified mbuf chain if any mbufs are unsafe. ensure_contig: Ensures that a given region of the described data is contiguous in memory, to permit structures to be "cast over it". 
free: Frees a single mbuf freem: Frees an mbuf chain dtom_free: Performs a dtom operation and then a free operation on the result dtom_freem: Performs a dtom operation and then a freem operation on the result dtom: Transform a data pointer to the mbuf describing it any_unsafe: scan an mbuf chain for unsafe mbufs this_unsafe: determine whether an mbuf is safe or unsafe. count_bytes: return the number of bytes described by an mbuf chain cat: concatenate two mbuf chains together trim: adjust m _len and m _off values to remove data from an mbuf chain. copy: produce an mbuf chain containing a copy of the data described by an mbuf chain. copy_p: produce an mbuf chain containing a copy of the data described by an mbuf chain. The only difference between this routine and copy is that this routine assumes that the m_type, m_flags and m_pkthdr fields contain important data which should be preserved during the copy. copy_u: produce an unsafe copy of of the data described by an mbuf chain. import: import data from raw memory into an mbuf chain export: export from from an mbuf chain into raw memory So, a device driver might use the allocator as follows: struct mbuf *mp = mbctl.alloc(&mbctl, packlen, NULL); which allocates an mbuf chain of "packlen" bytes. The entire chain might later be freed thus: mbctl.freem(&mbctl, mp); The reason all the direct entry point calls take a (struct mbctl *) value as there first parameter is to permit the mbuf manager to establish a context within which it is operating (ie find its workspace!). Finally, the memory manager supports the DCI4 statistics interface. This can be useful in fine tuning your DCI4 component. 8.3 Unsafe Data The concept of indirect data mbufs was introduced in earlier versions of the DCI to eliminate the need for data to be copied where this is possible; this results in a significant improvement in frame rates. Typically, an mbuf is used to indirect to user data, rather than making a private copy of that data. 
An important implication of this is that a protocol module cannot rely on the indirect pointers after any system call which uses them has returned to the user: if the module needs to keep the data after this time, then it must make a private copy of the data.

Protocol modules should always know whether an mbuf chain is unsafe or not (since they create the chain in the first place); device drivers are informed via the flags in the Transmit SWI whether or not the passed mbuf chain is safe --- if they need to use any data from an unsafe mbuf chain after the SWI has returned, then they must make a copy of that chain.

9 Miscellanea

9.1 Network Card Self-Tests

All network cards should support at least one *-command, used to initiate a hardware self-test. This *-command should be of the form nametest, where name is the driver name as supplied in the dib_name field of the driver information block.

When invoked, the self-test command should, to the best of its ability, ascertain whether the network hardware is still functioning correctly, and print a short success/failure message. As part of this self-test, the drivers should, where possible, perform a live network test (this is because many network faults are due to cabling problems rather than hardware failures, and a live network test may be able to detect these problems).

9.2 Virtual Interfaces

When running some form of PC emulator under RISC OS, it is frequently desirable to run a second protocol stack within the emulator that is independent of a similar protocol stack running on the native OS (the classic example being a TCP/IP stack running with one Internet address under RISC OS, and a PC based TCP/IP stack running under the emulator with a different Internet address).
One hardware-based solution to this problem would be to simply have two Ethernet cards in the same machine, each dedicated to one of the competing protocol stacks; the obvious downside to this solution would be the expense --- any software-based solution that allowed one card to support two interfaces would be much cheaper.

The optional software solution supported by DCI 4 is the concept of virtual interfaces. A virtual interface is where a driver creates a second unit for a physical interface, this unit having an Ethernet hardware address that is different from the hardware address for the first, "real" unit for the interface.

Exactly how this virtual interface is implemented is highly dependent upon the Ethernet controller chip used by the interface. Some controllers allow more than one hardware address to be specified for the one interface; this obviously makes the implementation of a virtual interface a relatively easy task. For controllers that do not allow more than one hardware address, the device driver will need to put the interface into promiscuous mode, and discard frames with unwanted hardware addresses under software control.

For any unit which is a virtual interface, bit 10 of the Inquire flags for that unit should be set; if the virtual interface uses software filtering of Ethernet hardware addresses, then bit 11 of these flags should also be set.

Appendix A Mbuf Manager Module Specification

Mbuf Manager Module Specification - Version 1.00 Issue 2 - LIVE

Copyright (C) 1994 ANT Limited, PO BOX 300, Cambridge, England. All rights reserved.

Redistribution and use in source code and executable binary forms are permitted provided that: (1) source distributions retain this entire copyright notice and comment, and (2) distributions including executable binaries contain the following acknowledgement: "This product includes software developed by ANT Limited and its contributors. Copyright (C) ANT Limited 1994."
and also in the documentation and other materials provided with the distribution and in all advertising materials mentioning features or use of this software. Neither the name of ANT Limited nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. NOT INTENDED FOR USE IN LIFE CRITICAL APPLICATIONS.

Contents:

     0  History and outstanding
     1  Conventions
     2  Basics of operation
     3  Mbuf manager goals
     4  The mbuf structure and its uses
     5  Unsafe data
     6  Mbuf manager sessions
     7  Allocation routines
     8  The allocation phase
     9  The clearing phase
    10  The copying phase
    11  Freeing mbufs
    12  Support and ensuring routines
    13  SWI entry points
    14  Service calls
    15  The life and death of an mbuf manager
    16  Glossary
    17  Appendix - DCI4 mbuf manager client interface contract
        Appendix - Supplementary clarifications

0 History and outstanding:

0.01 Issue 1. 14th September 1994. Borris
A draft specification containing most necessary details and only a few contradictions.

0.01 Issue 2. 15th September 1994. Borris
An internally coherent document that specifies something that can usefully be implemented. An example "mbuf.h" header file illustrates how C clients might be started. This file also contains a set of wrappers to make interfacing existing traditional mbuf source code easier (such as the internet source code).

0.02 Issue 1. 21st September 1994. Borris
All safe mbufs are now dtom'able. Better specified direct entry point register conventions. Basically, these entry points are now suitable for APCS routines - a2-a4 trashed on exit and maybe a1. Rewrote the description of allocation to remove ambiguities and increase understandability. Added a table summarising owner transfer for the direct entry points.

0.02 Issue 2. ?
September 1994. Borris
Minor corrections. Stricter definitions for most of the support routines. Fixed duplicate maxcontig field definition in mbctl.

0.99 Issue 1 ? Borris
Removed the restriction that the copying phase of an allocation only occurred if there was a data import to be performed (i.e. ptr != NULL). Renamed maxcontig_c to mincontig and maxcontig_m to maxcontig in struct mbctl. mbctl.trim returns the mbuf chain it was given for convenience. m_type field is initialised to the value one for each mbuf allocated. [this has subsequently been taken out again - see below] Clarified m_list field after allocation.

1.00 Issue 1, 14 November 1994, Borris
Service_MbufManagerStatus (0xa2) defined and added. MBC_USERMODE flag for getting user mode direct entry points added. Details added in the OpenSession SWI documentation. Added section on life and death of an mbuf manager to detail startup interactions (the mmintro document that provides an introduction of DCI2 to DCI4 mbuf usage has also been expanded similarly). Definitions for MBC_DONTWAIT and MBC_DTOMABLE added, although these flags are not currently used (they probably will be one day, though). Reduced currently outstanding section correspondingly. Extended legal stuff at head of document. m_type from allocation is undefined - device drivers are expected to initialise it before supplying received data to protocols though. Field means nothing when data moves from protocol to device driver. m_copy variants called m_copy_p (preserve m_type information) and m_copy_u (make an unsafe copy) have been added and changes to reflect this made to header files and structure definitions. Added MT_HEADER and MT_DATA related comments to the DCI4 protocol/device driver interface contract section. Corrections to dci4 protocol/client contract summary table.

1.00 Issue 2, November 1994, Borris
Removed comment about ensure_contig being able to do its job at the tail end of an mbuf chain as well as the head.
Added comment about unsafe mbufs not being suitable for ensure_contig to operate on.

1.01 Issue 3, April 1997, KBracey
Added information about m_pkthdr and m_flags fields.

Currently outstanding:

C is acceptable as an illustrative language but is not ideally suited to being a definition language. Language neutral versions of all structures, etc, need producing.

Not all macros contained within the reference mbuf.h file are documented, although their behaviour is readily determinable from this specification.

More should probably be said about unsafe data shadowing safe data.

The dci4 client interface contract appendix needs tightening up.

1 Conventions:

'quoted strings' should be envisaged in an italic font.

"quoted strings" indicate phrases with specific technical interpretation, typically only when first introduced.

[bracketed text] is an aside to reviewers. Comments on such text are encouraged.

77 hyphen (-) characters denote table and figure delimitation.

2 Basics of operation:

A "client" is a program (typically a module and in practice probably constrained to be a module) that uses the facilities of the "mbuf manager". Clients establish a session with the mbuf manager on initialisation, use memory management facilities from the mbuf manager during their normal operation, and then close the session with the mbuf manager just prior to their shutdown.

Memory is manipulated through the use of a descriptor structure, called an mbuf. An mbuf describes a number of contiguous bytes in memory. The mbuf manager imposes some restrictions, requirements and conventions upon the use of mbufs. A group of clients typically implements a further interface contract of its own - the DCI4 client interface contract is such an interface (see Appendix).
3 Mbuf manager goals:

* Hide any mechanics not necessary for client operation
* Provide single mbuf pool, rather than one per protocol
* Provide facilities suitable for device drivers and protocol modules
* Provide a balanced compromise between conflicting design goals
* Provide negligible long term fragmentation
* Provide efficient implementations of facilities offered
* Permit pre-allocation of packet storage space (DCI2 doesn't)
* Permit modular component upgrade path (DCI2 doesn't)

4 The mbuf structure and its uses:

Mbufs are used to provide a descriptive structure layer that describes and dictates access to a conceptual block of memory. In conventional memory management, the user manipulates a pointer to a block of memory. With mbuf memory management, the user manipulates a pointer to a descriptive structure (an "mbuf"), which itself provides the means to obtain a pointer to the block of memory that it "describes". Further, these structures are chained together to form a linked list, providing a form of scatter/gather memory description. The chain of mbufs describes a single conceptual block of memory, even though the actual memory used might well be scattered throughout real memory. Thus, an extra layer of structuring is inserted between the user and the block of memory being manipulated when mbuf memory management is used.

By accessing a block of memory through an mbuf structure, it is possible to record size, type and some degree of linkage information. By manipulating the fields within an mbuf, it is possible to efficiently add and remove data from a "described" block of memory in a variety of fashions.

Mbufs are allocated and freed by the mbuf manager. Programs that request allocation and freeing operations are called clients. A client never has a "struct mbuf" itself, only pointers that at some stage came from the mbuf manager.
(This allows the size of an mbuf to grow later; in particular mbuf managers may differ in the amount of private data stored with the mbuf.)

The C definition of an mbuf structure:

    struct ifnet;

    struct pkthdr
    {
        int             len;    /* total packet length */
        struct ifnet   *rcvif;  /* receiving interface */
    };

    typedef struct mbuf
    {
        struct mbuf        *m_next;    /* next mbuf in chain */
        struct mbuf        *m_list;    /* next mbuf in list (clients only) */
        ptrdiff_t           m_off;     /* current offset to data from mbuf itself */
        size_t              m_len;     /* current byte count */
        const ptrdiff_t     m_inioff;  /* original offset to data from mbuf itself */
        const size_t        m_inilen;  /* original byte count (for underlying data) */
        unsigned char       m_type;    /* client use only */
        const unsigned char m_sys1;    /* mbuf manager use only */
        const unsigned char m_sys2;    /* mbuf manager use only */
        unsigned char       m_flags;   /* client use only */
        struct pkthdr       m_pkthdr;  /* client use only */
    } dci4_mbuf;

struct mbuf *m_next;

Although mbufs may be manipulated individually, they are almost always used as an mbuf chain. The 'm_next' field of the mbuf structure points at the next mbuf in an mbuf chain. There is no back pointer. Successive mbufs describe conceptually "later" bytes of memory, even if the underlying blocks of actual memory used to hold these bytes are not stored consecutively or "later" in memory. The end of an mbuf chain is indicated by the 'm_next' field containing the NULL pointer.

A client may allocate an mbuf chain for internal use or for communicating data to another (mbuf manager) client. An mbuf chain may exist only for a small fraction of a second or it may be used and retained for weeks. It is the task of the mbuf manager to ensure that such usage does not cause anything other than negligible memory fragmentation.

ptrdiff_t m_off;

The allocation of an mbuf is always accompanied by the allocation of an additional block of memory (see the discussion on unsafe data later for the exceptions to this).
It is this additional block of memory that the mbuf is said to "describe". Access to this described memory is through manipulation of the address of the mbuf itself and fields contained within the mbuf. The location of this memory, relative to the mbuf itself, is not defined, other than that the 'm_off' field contains a suitable bias to access this memory.

size_t m_len;

The 'm_off' field is the bias to add to the address of the mbuf to obtain the address of the first byte of data described by that mbuf. The 'm_len' field specifies the number of bytes contained in the described data. This might be envisaged as follows:

Relationships between the address of an mbuf, and the 'm_off', 'm_next' and 'm_len' fields.

                       Mbufs                      Described data

    mbuf pointer ------> ========= --------+-------> =========
                         |       |        / \        |       |
                         | m_off | ------/   \       |       |
                         |       |            \      |       |
                         | m_len | -----------+      |       |
                         |       |             \     |       |
                         | m_next| --\          \    |       |
                         |       |    \          \   |       |
                         =========     \          --> =========
                                       /
                /---------------------/
                |
                \------> ========= --------+-------> =========
                         |       |        / \        |       |
                         | m_off | ------/   \       |       |
                         |       |            \      |       |
                         | m_len | -----------+      |       |
                         |       |             \     |       |
                         | m_next| -->NULL      \    |       |
                         |       |               \   |       |
                         =========                --> =========

Notes:

Only three of the fields of an mbuf are indicated. This is for clarity purposes.

The example illustrates an mbuf chain formed from two mbufs.

Example C code to flatten an mbuf chain into the single sequence of bytes that it conceptually describes:

    /* struct mbuf and mtod are as supplied by mbuf.h; memcpy needs <string.h> */
    void flatten_mbuf_chain(struct mbuf *mp, char *buffer)
    {
        for ( ; mp != NULL; mp = mp->m_next)
        {
            memcpy(buffer, mtod(mp, char *), mp->m_len);
            buffer += mp->m_len;
        }
    }

Notes:

As with most example programs, no thought has been given to the handling of exceptional circumstances.

This algorithm applies equally to safe and unsafe mbuf chains.

'mtod' is a macro, and adds the address of the first byte of the mbuf to the value of the 'm_off' field of the mbuf, yielding the address of the first byte of data described by the mbuf. It also performs a type cast.
'mtod' is supplied as a C macro for convenience.

Only byte alignment is required for the data described by the 'm_off' and 'm_len' fields. All clients must fully cope with non-word aligned addresses and lengths when manipulating the data described by an mbuf, although optimisations for aligned data are encouraged, as is the generation of aligned data. An mbuf structure itself is always at least word aligned in memory.

When the mbuf manager performs an allocation for a client and returns an mbuf chain to a client, that client is deemed to have taken "ownership" of that mbuf chain. The precise implications of ownership form part of the interface contract between clients of the mbuf manager. For example, the DCI4 specification specifies an interface contract between compliant modules using mbufs. One of the prime responsibilities of ownership is to free the mbuf chain at some stage (or transfer ownership).

Whenever the mbuf manager returns an mbuf chain, it transfers ownership of this chain at the same time. Whenever the mbuf manager receives an mbuf chain it also takes ownership of that chain. A summary of ownership transfer for direct entry points is given later. Exceptions to these rules are detailed individually throughout the text.

In order to minimise long term fragmentation (and for various other implementation reasons), the sizes of the underlying memory blocks that may be allocated for an mbuf to describe are constrained to a number of sizes. This permits the situation where an mbuf describes only some of the underlying memory actually allocated.

const ptrdiff_t m_inioff;
const size_t m_inilen;

The 'm_inioff' and 'm_inilen' fields provide a description of the underlying block of memory in the same way as the 'm_off' and 'm_len' fields (respectively) provide a description of the described block of memory (which resides somewhere within the underlying block, although not necessarily always at the start or end of it).
Any values of 'm_len' and 'm_off' are permitted provided that the described block of memory is fully contained within the underlying block described by 'm_inioff' and 'm_inilen' (but see the field validity table below).

struct mbuf *m_list;
unsigned char m_type;

The 'm_list' and 'm_type' fields are provided for the convenience of the client. The contents of these fields when an mbuf is passed from one client to another is part of the interface contract between clients. When a client owns an mbuf chain, it can set these fields to whatever values it requires. If these fields are used, a client must explicitly initialise them. The mbuf manager never examines these fields.

const unsigned char m_sys1;
const unsigned char m_sys2;

The 'm_sys1' and 'm_sys2' fields are private fields maintained by the mbuf manager. They should never be read or written by a client under any circumstances, even transiently. The mbuf manager is entitled to asynchronously examine these fields if it requires.

unsigned char m_flags;
struct pkthdr m_pkthdr;

The 'm_flags' field, and the 'm_pkthdr' field in version 0.15 or later of the mbuf manager, are provided for the convenience of the client. The contents of these fields when an mbuf is passed from one client to another is part of the interface contract between clients. When a client owns an mbuf chain, it can set these fields to whatever values it requires. If these fields are used, a client must explicitly initialise them. The mbuf manager never examines these fields.

When an mbuf chain is freed, the storage required for the mbuf(s) comprising the chain and the underlying memory associated with each mbuf is placed back into the free pool and becomes available for subsequent re-allocation. It is not possible to free only one of the mbuf and the underlying storage associated with that mbuf. This ensures that, as long as an mbuf chain is allocated, the data it describes is also correctly allocated.
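The containment rule for 'm_off' and 'm_len' stated earlier in this section can be sketched as a small check. This is an illustration only, for safe mbufs: sketch_mbuf is a cut-down stand-in for the real structure and mbuf_region_ok is a hypothetical helper name, not a routine offered by the mbuf manager.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for struct mbuf: only the fields this sketch needs. */
struct sketch_mbuf {
    ptrdiff_t       m_off;     /* current offset to data from mbuf itself */
    size_t          m_len;     /* current byte count */
    const ptrdiff_t m_inioff;  /* original offset to data from mbuf itself */
    const size_t    m_inilen;  /* original byte count of underlying data */
};

/* Hypothetical helper: returns 1 if the described block (m_off, m_len)
 * lies wholly inside the underlying block (m_inioff, m_inilen), else 0. */
static int mbuf_region_ok(const struct sketch_mbuf *m)
{
    size_t lead;

    assert(m != NULL);

    if (m->m_off < m->m_inioff)
        return 0;                       /* starts before underlying block */

    lead = (size_t)(m->m_off - m->m_inioff);
    if (lead > m->m_inilen)
        return 0;                       /* starts after underlying block */

    return m->m_len <= m->m_inilen - lead;  /* must not run off the end */
}
```

A check of this kind might be useful as a debugging aid in a client that adjusts 'm_off' and 'm_len' by hand.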
5 Unsafe data:

"Unsafe data" is a concept that breaks some of the rules just outlined. In particular, the memory described by an unsafe mbuf is not underlying memory allocated by the mbuf manager in the fashion just described.

When an unsafe mbuf is allocated, the mbuf manager does not allocate associated underlying storage. Rather, the mbuf is available for the client to set the 'm_off' and 'm_len' fields such that a portion of memory beyond the control of the mbuf manager is described by the mbuf. The only requirement of the mbuf manager on the 'm_off' and 'm_len' fields for an unsafe mbuf is that the data they describe must be valid whenever an mbuf manager operation implicitly or explicitly accesses them. In practice, the only time when these fields may hold random values is when the mbuf (chain) is being freed or transiently during update within a client.

The mbuf manager can tell whether an mbuf describes safe or unsafe data from examination of the mbuf. When an unsafe mbuf is freed, there is no freeing action performed on the associated data. It is because there is no directly enforceable relationship (by the mbuf manager) between the lifetime of an unsafe mbuf and the data it describes that the data is termed "unsafe".

"Unsafe data" does not imply incorrect behaviour. The phrase is used as a reminder of the additional constraints on the described data, and serves to encourage the programmer to take the necessary extra precautions. Unsafe data is often used when it is not necessary to copy the data into a safe mbuf chain, eliminating a copy operation and increasing performance. A client may arrange its internal strategies to permit the use of unsafe data as an optimisation.

In order for the concept of unsafe data to be useful, some statement about the lifetime and validity of the described data must be possible. The client interface contract normally adds further detail to the requirements for unsafe mbufs.
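The way a client makes an unsafe mbuf describe memory outside the mbuf manager's control can be sketched as follows. This is an illustration under stated assumptions: sketch_mbuf is a cut-down stand-in for the real structure, describe_unsafe and sketch_mtod are hypothetical names, and in real use the mbuf itself would come from the mbuf manager (for example via alloc_u) rather than being a local variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for struct mbuf: only the fields this sketch needs. */
struct sketch_mbuf {
    struct sketch_mbuf *m_next;  /* next mbuf in chain */
    ptrdiff_t           m_off;   /* offset to data from mbuf itself */
    size_t              m_len;   /* byte count */
};

/* Same idea as the mtod macro: mbuf address plus m_off gives the address
 * of the first described byte. Integer arithmetic is used because the
 * data and the mbuf are unrelated objects; this assumes a flat address
 * space with ptrdiff_t and uintptr_t of equal width. */
static void *sketch_mtod(struct sketch_mbuf *m)
{
    assert(m != NULL);
    return (void *)((uintptr_t)m + (uintptr_t)m->m_off);
}

/* Hypothetical helper: point an unsafe mbuf at client-owned memory.
 * The caller remains responsible for keeping 'data' valid for as long
 * as the mbuf may be examined. */
static void describe_unsafe(struct sketch_mbuf *m, void *data, size_t len)
{
    assert(m != NULL && data != NULL);
    m->m_off = (ptrdiff_t)((uintptr_t)data - (uintptr_t)m);
    m->m_len = len;
}
```

Note that m_off may legitimately be negative here, since the client's data can lie at a lower address than the mbuf.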
Whenever an unsafe mbuf is supplied to the mbuf manager in a context in which the mbuf manager may examine the data described, the client pledges that this data will remain valid until that call into the mbuf manager returns.

In practice there are two useful ways in which unsafe mbuf chains may be manipulated:

1) All use of the data is completed before the recipient (client or mbuf manager) returns execution control to the supplying client. It is still the responsibility of the recipient client to free the mbuf chain.

2) The recipient client copies the unsafe data described into a safe mbuf chain before control returns back to the supplying client. Responsibility for freeing the unsafe mbuf chain still lies with the recipient client.

Ideally, all recipient clients would be capable of processing all mbuf chains they receive prior to returning control. Such clients could always use unsafe data in an efficient manner. As a worst case fallback, whenever a client is supplied with an mbuf chain it always performs an 'ensure_safe' operation to ensure that the data is safe; this always entails data copying for unsafe mbuf chains. In practice, "ensuring" (see the description of "ensuring" later) potentially unsafe mbuf chains to safe mbuf chains only when necessary is a reasonable compromise.

It is possible (indeed permitted) to allocate a safe mbuf and then generate a second reference to the same data with an unsafe mbuf. Data described in an mbuf chain may thus be "referenced" in almost the same way any other data may be used in an unsafe mbuf. The difference is that an unsafe mbuf will be required for each describing mbuf in the chain, rather than a single unsafe mbuf to describe a single conceptual region of memory. Additional constraints are also imposed to ensure that the original mbuf chain is not freed before the unsafe mbuf chain. Freeing the safe mbuf chain after the return of the call where the unsafe mbuf chain is used will achieve correct operation.
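The compromise of "ensuring" only when necessary might be sketched as below. Everything here is a stand-in chosen so the fragment compiles on its own: sketch_mbuf and sketch_mbctl are reduced structures, keep_beyond_call is a hypothetical client routine, and the two stub_ functions merely imitate the any_unsafe and ensure_safe entry points (the real ensure_safe copies data, which the stub does not). A NULL result on failure is assumed from the general "a1=0 on error" convention.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_mbuf {
    struct sketch_mbuf *m_next;
    int                 unsafe;   /* stand-in for the manager's private state */
};

struct sketch_mbctl {
    int                 (*any_unsafe) (struct sketch_mbctl *, struct sketch_mbuf *mp);
    struct sketch_mbuf *(*ensure_safe)(struct sketch_mbctl *, struct sketch_mbuf *mp);
};

static int stub_ensure_calls;     /* counts how often a copy was requested */

/* Stub: report whether any mbuf in the chain is marked unsafe. */
static int stub_any_unsafe(struct sketch_mbctl *mbc, struct sketch_mbuf *mp)
{
    (void)mbc;
    for ( ; mp != NULL; mp = mp->m_next)
        if (mp->unsafe)
            return 1;
    return 0;
}

/* Stub: pretend to convert the chain to a safe one. */
static struct sketch_mbuf *stub_ensure_safe(struct sketch_mbctl *mbc,
                                            struct sketch_mbuf *mp)
{
    struct sketch_mbuf *m;
    (void)mbc;
    stub_ensure_calls++;
    for (m = mp; m != NULL; m = m->m_next)
        m->unsafe = 0;
    return mp;
}

/* Hypothetical client routine: call before retaining a chain beyond the
 * current call. Only pays for ensure_safe when the chain needs it. */
static struct sketch_mbuf *keep_beyond_call(struct sketch_mbctl *mbc,
                                            struct sketch_mbuf *mp)
{
    assert(mbc != NULL && mp != NULL);
    if (!mbc->any_unsafe(mbc, mp))
        return mp;                     /* already safe: no copy needed */
    return mbc->ensure_safe(mbc, mp);  /* NULL assumed to mean failure */
}
```

A client that completes all processing before returning would not need this step at all.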
In some traditional mbuf implementations, use is made of native memory management facilities to provide the ability to remap memory into "mbuf visible" regions, thus avoiding memory copying. This facility is not available in this specification. To some degree, unsafe data lessens this lack.

The 'm_inioff' and 'm_inilen' fields of an unsafe mbuf are initialised according to the allocation method used. See later for details.

Summary of mbuf field validity: client <=> mbuf manager

    Field       From mbuf   To mbuf
    =====       =========   =======
    m_next      valid       valid
    m_list      NULL        invalid
    m_off       valid       invalid
    m_len       valid       invalid
    m_inioff    valid       valid*
    m_inilen    valid       valid*
    m_type      invalid     invalid
    m_sys1      opaque      opaque
    m_sys2      opaque      opaque
    m_sys3      opaque      opaque
    m_pkthdr    invalid     invalid

Notes:

    Field:      the name of a field within an mbuf structure.
    From mbuf:  an mbuf chain being passed from the mbuf manager to a client.
    To mbuf:    an mbuf chain being passed from a client to the mbuf manager.
    valid:      the field meets the criteria stated within this document.
    valid*:     unsafe mbufs need not describe real memory in the underlying storage. There is never any freeing of the underlying storage of an unsafe mbuf.
    NULL:       the NULL pointer.
    invalid:    any value may be present. No manipulation of such a value should ever be made. Either the field should be ignored or it should be initialised prior to use.
    opaque:     never read and never written by any client.

The 'm_inioff' and 'm_inilen' fields of a safe mbuf are never altered by a client - only read. The 'm_inioff' and 'm_inilen' fields of an unsafe mbuf may be altered by the client.

6 Mbuf manager sessions:

The period of time when a client may allocate (and otherwise use) mbufs from the mbuf manager is termed a "session". A session is initiated (opened) and terminated (closed) with SWI calls.
A client must allocate and maintain an mbuf manager control structure (an "mbctl" structure) for a duration encompassing a session (typically, the client has a static structure within its data area for this purpose). A pointer to this structure is supplied to the mbuf manager during both initialisation and termination calls and all direct entry points.

This structure contains, amongst other things, a set of function pointers. These function pointers provide direct entry points into individual routines within the mbuf manager. They are initialised by the mbuf manager during session initialisation and remain valid until session termination. All of the performance critical routines of the mbuf manager (such as mbuf allocation and freeing) are accessed through these direct entry points, which incur considerably less overhead than SWI routines.

These entry points are designed to permit the easy inter-operation of assembler and APCS code (such as that generated by the Norcroft C compiler), and roughly obey APCS. A list of entry/exit characteristics follows (using APCS register naming convention):

- a1 always points at an mbctl structure for all direct entry calls
- the processor must be in supervisor mode (but see MBC_USERMODE)
- a1-a4 are the only parameter registers
- a2-a4 and ip are corrupted by the call
- a1 is either the call result or corrupted
- other registers preserved by call
- the processor flags are preserved by the call
- no V set error convention (incompatible with APCS)
- in general, an error results in a1=0 on exit
- IRQ state preserved across call
- IRQs may be disabled during calls
- IRQs may be enabled during calls ONLY if specifically documented
- FIQs assumed enabled on entry
- FIQs preserved across calls

Currently, no direct entry point routine will enable interrupts if they are disabled on entry.
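From C, the conventions listed above amount to passing the mbctl pointer as the first argument of every direct entry call and testing the result against zero. The following is a sketch only: sketch_mbuf and sketch_mbctl are reduced stand-ins, with_packet_buffer is a hypothetical client routine, and stub_alloc and stub_freem are simple imitations of the real entry points (a one-element pool with an assumed 1536 byte limit) so that the fragment can be compiled in isolation.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_mbuf {
    struct sketch_mbuf *m_next;
    size_t              m_len;
};

struct sketch_mbctl {
    struct sketch_mbuf *(*alloc)(struct sketch_mbctl *, size_t bytes, void *ptr);
    void                (*freem)(struct sketch_mbctl *, struct sketch_mbuf *mp);
};

static struct sketch_mbuf stub_pool;   /* one-element pool for the stubs */
static int                stub_in_use;

/* Stub allocator: fails (returns NULL) for oversized or repeated requests. */
static struct sketch_mbuf *stub_alloc(struct sketch_mbctl *mbc, size_t bytes,
                                      void *ptr)
{
    (void)mbc; (void)ptr;
    if (stub_in_use || bytes > 1536)
        return NULL;
    stub_in_use = 1;
    stub_pool.m_next = NULL;
    stub_pool.m_len = bytes;
    return &stub_pool;
}

/* Stub free: returns the single mbuf to the pool. */
static void stub_freem(struct sketch_mbctl *mbc, struct sketch_mbuf *mp)
{
    (void)mbc;
    assert(mp == &stub_pool);
    stub_in_use = 0;
}

/* Hypothetical client routine: returns 0 on success, -1 if no memory. */
static int with_packet_buffer(struct sketch_mbctl *mbc, size_t bytes)
{
    /* mbctl pointer goes first (a1); result comes back in a1 */
    struct sketch_mbuf *mp = mbc->alloc(mbc, bytes, NULL);

    if (mp == NULL)
        return -1;          /* a1 == 0 on exit signals an error */

    /* ... use the chain here; the client owns it ... */

    mbc->freem(mbc, mp);    /* ownership passes back to the manager */
    return 0;
}
```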
C definition of an mbuf manager control structure:

    typedef struct mbctl
    {
        /* Reserved for mbuf manager use in establishing context */
        int opaque;                 /* mbuf manager use only */

        /* Client initialises before session is established */
        size_t mbcsize;             /* size of mbctl structure from client */
        unsigned int mbcvers;       /* client version of mbuf manager spec */
        unsigned long flags;        /* */
        size_t advminubs;           /* Advisory desired minimum underlying block size */
        size_t advmaxubs;           /* Advisory desired maximum underlying block size */
        size_t mincontig;           /* client required min ensure_contig value */
        unsigned long spare1;       /* Must be set to zero on initialisation */

        /* Mbuf manager initialises during session establishment */
        size_t minubs;              /* Minimum underlying block size */
        size_t maxubs;              /* Maximum underlying block size */
        size_t maxcontig;           /* Maximum contiguify block size */
        unsigned long spare2;       /* Reserved for future use */

        /* Allocation routines */
        struct mbuf *               /* MBC_DEFAULT */
            (* alloc) (struct mbctl *, size_t bytes, void *ptr);
        struct mbuf *               /* Parameter driven */
            (* alloc_g) (struct mbctl *, size_t bytes, void *ptr, unsigned long flags);
        struct mbuf *               /* MBC_UNSAFE */
            (* alloc_u) (struct mbctl *, size_t bytes, void *ptr);
        struct mbuf *               /* MBC_SINGLE */
            (* alloc_s) (struct mbctl *, size_t bytes, void *ptr);
        struct mbuf *               /* MBC_CLEAR */
            (* alloc_c) (struct mbctl *, size_t bytes, void *ptr);

        /* Ensuring routines */
        struct mbuf *
            (* ensure_safe) (struct mbctl *, struct mbuf *mp);
        struct mbuf *
            (* ensure_contig) (struct mbctl *, struct mbuf *mp, size_t bytes);

        /* Freeing routines */
        void
            (* free) (struct mbctl *, struct mbuf *mp);
        void
            (* freem) (struct mbctl *, struct mbuf *mp);
        void
            (* dtom_free) (struct mbctl *, struct mbuf *mp);
        void
            (* dtom_freem) (struct mbctl *, struct mbuf *mp);

        /* Support routines */
        struct mbuf *               /* No ownership transfer though */
            (* dtom) (struct mbctl *, void *ptr);
        int                         /* Client retains mp ownership */
            (* any_unsafe) (struct mbctl *, struct mbuf *mp);
        int                         /* Client retains mp ownership */
            (* this_unsafe) (struct mbctl *, struct mbuf *mp);
        size_t                      /* Client retains mp ownership */
            (* count_bytes) (struct mbctl *, struct mbuf *mp);
        struct mbuf *               /* Client retains old, new ownership */
            (* cat) (struct mbctl *, struct mbuf *old, struct mbuf *new);
        struct mbuf *               /* Client retains mp ownership */
            (* trim) (struct mbctl *, struct mbuf *mp, int bytes, void *ptr);
        struct mbuf *               /* Client retains mp ownership */
            (* copy) (struct mbctl *, struct mbuf *mp, size_t off, size_t len);
        struct mbuf *               /* Client retains mp ownership */
            (* copy_p) (struct mbctl *, struct mbuf *mp, size_t off, size_t len);
        struct mbuf *               /* Client retains mp ownership */
            (* copy_u) (struct mbctl *, struct mbuf *mp, size_t off, size_t len);
        struct mbuf *               /* Client retains mp ownership */
            (* import) (struct mbctl *, struct mbuf *mp, size_t bytes, void *ptr);
        struct mbuf *               /* Client retains mp ownership */
            (* export) (struct mbctl *, struct mbuf *mp, size_t bytes, void *ptr);
    } dci4_mbctl;

Prior to establishing a session, the client initialises the following fields of the mbctl structure:

    size_t mbcsize;
    unsigned int mbcvers;
    unsigned long flags;
    size_t advminubs;
    size_t advmaxubs;
    size_t mincontig;
    unsigned long spare1;

The values a client initialises these fields to are defined as follows:

mbcsize: The size of the mbctl structure. This is the size of the structure understood by the compiler/assembler at compilation time. Future versions of this specification may add other fields. In C, one might use "sizeof(struct mbctl)".

mbcvers: The version of the mbuf manager specification that the client is implemented against. This is the major version times one hundred plus the minor version. Minor version number changes indicate bug fixes and the possible introduction of small and upwardly compatible changes. Major revision number changes indicate major and possibly not entirely backwardly compatible changes.
flags: This bitset supplies various pieces of information to the mbuf manager. See the description of the Mbuf_OpenSession SWI later on for details of suitable values to enter in this field.

advminubs: Advisory minimum underlying block size. This value advises the mbuf manager of the smallest underlying block size that the client thinks appropriate for its requirements. Traditional mbuf clients might well use the original value of MLEN (112 in most cases) for this value. If no particular value seems appropriate, a client should set this field to zero.

advmaxubs: Advisory maximum underlying block size. This value advises the mbuf manager of the largest underlying block size that the client thinks appropriate for its requirements. An ethernet device driver client might well use the ethernet MTU (maximum transmission unit) value of 1500. If no particular value seems appropriate, a client should set this field to zero.

mincontig: This specifies the maximum size the client will ever specify to the "contiguify" routine. If the mbuf manager can never meet the value specified, it will refuse to open the session. If no particular value seems appropriate, a client should set this field to zero.

spare1: This field must be initialised to zero.

The contents of all other fields of the mbctl structure are irrelevant at the start of session initiation.

The next stage of session initiation is the issuing of an Mbuf_OpenSession SWI call, supplying the address of this mbctl structure to the mbuf manager as a parameter. If the session requested can be supported by the mbuf manager then it will initialise all other fields of the mbctl structure before returning. If the session cannot be supported, then no fields of the mbctl structure will be modified by the mbuf manager and an error will be returned. If no error is returned, the session is established and the client may use the direct entry points now available.
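The client-side preparation described above can be sketched in C as follows. This is only an illustration under stated assumptions: 'struct mbctl_client_part' is a hypothetical cut-down stand-in for the real mbctl structure (so 'mbcsize' here is the size of the stand-in, where a real client would use the size of the full structure), 'mbc_version' and 'mbctl_prepare' are illustrative helper names, and the version numbers passed in are placeholders rather than values taken from this specification. The Mbuf_OpenSession SWI call itself is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-down stand-in for struct mbctl: only 'opaque' and the
 * client-initialised fields are modelled. Field names follow the text. */
struct mbctl_client_part {
    int           opaque;
    size_t        mbcsize;
    unsigned int  mbcvers;
    unsigned long flags;
    size_t        advminubs;
    size_t        advmaxubs;
    size_t        mincontig;
    unsigned long spare1;
};

/* Encode a specification version as major * 100 + minor. */
static unsigned int mbc_version(unsigned int major, unsigned int minor)
{
    assert(minor < 100);
    return major * 100u + minor;
}

/* Fill in the fields a client must set before issuing Mbuf_OpenSession.
 * A value of zero for the advisory sizes or 'contig' means "no preference". */
static void mbctl_prepare(struct mbctl_client_part *mbc,
                          unsigned int major, unsigned int minor,
                          size_t min_ubs, size_t max_ubs, size_t contig)
{
    memset(mbc, 0, sizeof *mbc);       /* also zeroes flags and spare1 */
    mbc->mbcsize   = sizeof *mbc;      /* size as seen at compile time */
    mbc->mbcvers   = mbc_version(major, minor);
    mbc->advminubs = min_ubs;
    mbc->advmaxubs = max_ubs;
    mbc->mincontig = contig;
}
```

A driver with no particular contiguity requirement might call 'mbctl_prepare(&mbc, major, minor, 112, 1500, 0)', using the MLEN and ethernet MTU figures mentioned in the field descriptions.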
A session is terminated with the Mbuf_CloseSession SWI call, supplying it the address of the same mbctl structure used to establish the connection.

    int opaque;

The 'opaque' field is for the use of the mbuf manager. It is initialised during session establishment. It must never be read or written by a client during a session.

All the direct entry points take a fixed first parameter of the address of the mbctl structure used to establish the session. This permits the mbuf manager to establish any necessary context.

7 Allocation routines:

    struct mbuf *               /* Parameter driven */
        (* alloc_g) (struct mbctl *, size_t bytes, void *ptr, unsigned long flags);

There is one general purpose allocation routine (alloc_g), and a number of more specialised allocation routines. All of these are accessed through the direct entry addresses contained in the initialised mbctl structure. The particular values the mbuf manager supplies for the addresses of the direct entry point routines are chosen to be as optimal as possible for the client's indicated requirements. The functionality of these specific routines may be accessed through the general purpose routine; they are provided solely for performance reasons.

A successful allocation returns a chain of mbufs that satisfy all the criteria of the allocation. An unsuccessful allocation returns the NULL pointer. A NULL pointer indicates either a lack of some resource (typically mbufs or underlying storage) or a set of criteria that cannot be satisfied.

If, for whatever reason, an allocation is constrained to a single mbuf, then the 'm_next' field of that mbuf will always be zero. In other words, whatever the entry flags may suggest, effectively, an mbuf chain is always returned.

The 'flags' bitset provides a list of constraints and deviations that are to be applied to an allocation. The default allocation has all bits of the 'flags' bitset clear. In particular, the default allocation is for safe data.
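The relationship between the general purpose routine and the specialised ones can be sketched as thin wrappers that supply an implied 'flags' value. This is a toy model, not the mbuf manager's implementation: the 'struct mbuf' and 'struct mbctl' shown are hypothetical stand-ins with a single statically held mbuf, the 'seen_flags' member exists only so the effect is observable, and the routines are plain functions rather than entries in a control structure. The MBC_ values are those given in the table of defined bits in this section.

```c
#include <assert.h>
#include <stddef.h>

#define MBC_DEFAULT 0x00000000ul
#define MBC_UNSAFE  0x00000001ul
#define MBC_SINGLE  0x00000002ul
#define MBC_CLEAR   0x00000004ul

/* Hypothetical toy types: one mbuf slot per session, no real storage. */
struct mbuf  { struct mbuf *m_next; unsigned long seen_flags; };
struct mbctl { struct mbuf slot; int in_use; };

/* Toy general purpose allocator: records the flags it was given and
 * returns NULL when its single slot is already taken (lack of resource). */
static struct mbuf *alloc_g(struct mbctl *mbc, size_t bytes, void *ptr,
                            unsigned long flags)
{
    (void)bytes;
    (void)ptr;
    if (mbc->in_use)
        return NULL;
    mbc->in_use = 1;
    mbc->slot.m_next = NULL;        /* a single mbuf is still a chain */
    mbc->slot.seen_flags = flags;
    return &mbc->slot;
}

/* Each specialised routine behaves like alloc_g with an implied flag. */
static struct mbuf *alloc(struct mbctl *mbc, size_t bytes, void *ptr)
{ return alloc_g(mbc, bytes, ptr, MBC_DEFAULT); }

static struct mbuf *alloc_u(struct mbctl *mbc, size_t bytes, void *ptr)
{ return alloc_g(mbc, bytes, ptr, MBC_UNSAFE); }

static struct mbuf *alloc_s(struct mbctl *mbc, size_t bytes, void *ptr)
{ return alloc_g(mbc, bytes, ptr, MBC_SINGLE); }

static struct mbuf *alloc_c(struct mbctl *mbc, size_t bytes, void *ptr)
{ return alloc_g(mbc, bytes, ptr, MBC_CLEAR); }
```

In every case the caller must test the result against the NULL pointer before use.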
The defined bits are as follows:

                MBC_DEFAULT    0x00000000ul
    Bit 0       MBC_UNSAFE     0x00000001ul
    Bit 1       MBC_SINGLE     0x00000002ul
    Bit 2       MBC_CLEAR      0x00000004ul

All other bits are undefined and must be zero.

Allocation consists of up to three internal phases:

    - allocation
    - clearing
    - copying

Roughly speaking, the allocation phase always happens, the clearing phase happens when the MBC_CLEAR bit is set, and the copying phase happens when 'ptr' is not the NULL pointer. If the allocation phase fails the clearing and copying phases are always skipped and the NULL pointer returned. The clearing and copying phases are not capable of failing (merely not happening).

8 The allocation phase:

Table of different allocation options

    MBC_UNSAFE    bytes    MBC_SINGLE    type
        0           0          ?           1
        0          \=0         0           2
        0          \=0         1           3
        1           0          ?           4
        1          \=0         ?           5

Notes:

    Column headings:
        MBC_UNSAFE: the value of the MBC_UNSAFE bit
        bytes:      the value of the 'bytes' parameter
        MBC_SINGLE: the value of the MBC_SINGLE bit
        type:       reference to detailed description

    Column contents:
        0:    equal to zero, or flag clear
        1:    flag set
        \=0:  not equal to zero
        ?:    any value (0 or 1 for bits)

1) MBC_UNSAFE = 0, bytes = 0, MBC_SINGLE = ?

The first available mbuf is chosen (so the setting of MBC_SINGLE is irrelevant). The actual size of the described data returned is unknown in advance, other than it is equal to or larger than the minimum underlying block size of the mbuf manager (the 'minubs' field of the mbctl structure). 'm_len' and 'm_off' are set to reflect the underlying block (ie 'm_len' = 'm_inilen' and 'm_off' = 'm_inioff', respectively).

Should a clearing or copying phase occur, then the value used for 'bytes' will be the value of 'm_len' in the newly allocated mbuf. Such copying might be the start of a variable sized mbuf chain building algorithm.
    m_next:     NULL - always only one mbuf
    m_list:     NULL
    m_off:      describes underlying block
    m_len:      size of underlying block
    m_inioff:   describes underlying block
    m_inilen:   size of underlying block

2) MBC_UNSAFE = 0, bytes \= 0, MBC_SINGLE = 0

A chain of an arbitrary number of mbufs is allocated, with a total described data size of 'bytes' bytes.

    m_next:     chain of mbufs returned
    m_list:     NULL
    m_off:      describes allocated memory
    m_len:      summed over the chain, gives 'bytes'
    m_inioff:   describes underlying block
    m_inilen:   size of underlying block

3) MBC_UNSAFE = 0, bytes \= 0, MBC_SINGLE = 1

Precisely one mbuf is allocated to describe the required number of bytes. It is possible for such allocations to fail due to not being able to locate an mbuf and underlying block with sufficient size.

    m_next:     NULL - always only one mbuf
    m_list:     NULL
    m_off:      describes allocated memory
    m_len:      'bytes'
    m_inioff:   describes underlying block
    m_inilen:   size of underlying block

4) MBC_UNSAFE = 1, bytes = 0, MBC_SINGLE = ?

A single unsafe mbuf is allocated and set to describe no data. The value of 'ptr' is irrelevant. The MBC_CLEAR flag will be forced clear. 'm_off' and 'm_inioff' will describe the same value as the NULL pointer, and 'm_len' and 'm_inilen' will be zero.

This is the only circumstance in which an mbuf (anywhere in an allocated chain) is allocated with zero in the 'm_len' field and returned directly from an allocation routine. (Note the anomaly for the 'copy' routine when asked to duplicate zero bytes.)

The data described by an unsafe mbuf is not suitable for 'dtom', unless the data described is a "shadow" or "reference" to some previously allocated safe data, in which case 'dtom' will return the mbuf pointer for the original, safe, mbuf.

    m_next:     NULL - always only one mbuf
    m_list:     NULL
    m_off:      describes the NULL pointer
    m_len:      zero
    m_inioff:   describes the NULL pointer
    m_inilen:   zero

5) MBC_UNSAFE = 1, bytes \= 0, MBC_SINGLE = ?

A single unsafe mbuf is allocated.
The 'm_len' field is set to the value of 'bytes'. 'm_off' is set to describe 'ptr', whatever the value of 'ptr'. This means supplying 'ptr' as the NULL pointer will cause the unsafe mbuf returned to have a non-zero byte count but for the data described to occur from address zero onwards. This is only useful when 'm_off' is later initialised to describe real memory.

The data described by an unsafe mbuf is not suitable for 'dtom', unless the data described is a "shadow" or "reference" to some previously allocated safe data, in which case 'dtom' will return the mbuf pointer for the original, safe, mbuf.

    m_next:     NULL - always only one mbuf
    m_list:     NULL
    m_off:      describes 'ptr'
    m_len:      'bytes'
    m_inioff:   describes 'ptr'
    m_inilen:   'bytes'

9 The clearing phase:

The clearing phase will set to zero all the underlying bytes in the allocated mbuf chain. The number of bytes zeroed is directly independent of the 'bytes' and 'ptr' values ('bytes' as zero indirectly dictates the number of bytes allocated).

The clearing phase only occurs if the following conditions are all met:

    1. The allocation phase succeeded
    2. The MBC_CLEAR bit is set *
    3. The MBC_UNSAFE bit is clear

    *: The MBC_CLEAR bit can be cleared during the allocation phase. This clearing overrides any value the bit may have had on entry to the allocation routine and prevents the clearing phase from occurring.

10 The copying phase:

The copying phase copies data from 'ptr' into the described data. The number of bytes copied is the number of bytes described by the mbuf chain. This is the value of 'bytes' supplied to the allocation routine if 'bytes' was non-zero, and the underlying block size if 'bytes' was zero. The copying phase may be viewed as importing data into an mbuf chain from "raw" memory. A client cannot determine if copied bytes were cleared during the clearing phase or not.

The copying phase only occurs if the following conditions are all met:

    1. The allocation phase succeeded
    2. The MBC_UNSAFE bit is clear
    3. 'ptr' is not the NULL pointer

Summary of allocator routines and implicit or explicit 'flags' settings

    Allocator   Control over flags
    alloc       MBC_DEFAULT (0)
    alloc_g     parameter to the call
    alloc_s     MBC_SINGLE
    alloc_u     MBC_UNSAFE
    alloc_c     MBC_CLEAR

11 Freeing mbufs:

    void (* free) (struct mbctl *, struct mbuf *mp);
    void (* freem) (struct mbctl *, struct mbuf *mp);
    void (* dtom_free) (struct mbctl *, struct mbuf *mp);
    void (* dtom_freem) (struct mbctl *, struct mbuf *mp);

Once an mbuf chain or an individual mbuf is finished with, it is freed and its resources become available for re-allocation. Variants of the free call are available that free either just a single mbuf (without examining the 'm_next' field of the mbuf supplied) or the entire chain it describes. The routine that frees a single mbuf is called 'free' and the routine that potentially frees multiple mbufs is called 'freem'. Additionally, routines that perform the equivalent of a 'dtom' call followed by a freeing call are provided. No action is performed by a free call if supplied the NULL pointer.

Summary of mbuf and mbuf chain freeing routines:

    free        Free single mbuf (ignores 'm_next' field)
    freem       Free entire mbuf chain (uses 'm_next' field)
    dtom_free   Performs 'dtom' action then behaves the same as 'free'
    dtom_freem  Performs 'dtom' action then behaves the same as 'freem'

12 Support and ensuring routines:

    struct mbuf * (* dtom) (struct mbctl *, void *ptr);

Under the right circumstances, it is possible to perform a transformation from the address of any byte described by an mbuf to the address of the mbuf describing that byte. This transformation is performed with the 'dtom' routine.

The presence of a 'm_next' field, and the lack of a hypothetical 'm_prev' field means that 'dtom' provides access to a portion of the conceptually described data, starting with the first byte described by the mbuf that describes the supplied address, and extending to the end of the mbuf chain.
The client cannot directly determine if the mbuf returned by 'dtom' is the first mbuf in a chain or an mbuf part-way along a chain.

The required circumstances for 'dtom' to operate correctly are that the described data is safe. Applying 'dtom' to an unsuitable address will return the NULL pointer. The NULL pointer is always an unsuitable address for 'dtom'.

The 'dtom' and 'mtod' transformations are not entirely symmetrical. 'dtom' will always return the address of the mbuf owning the underlying storage referenced, independent of the number of unsafe mbufs also referencing that storage. The transformation performed by 'dtom' is necessarily based on the address supplied; this includes whether the address is within the region(s) of memory controlled by the mbuf manager or not. For this reason, if a safe mbuf chain has been "shadowed" with an unsafe mbuf chain, then 'dtom' will always return the original safe mbuf. Further, freeing the chain returned by 'dtom' will free the original, safe mbuf chain, leaving the unsafe mbuf chain describing (through reference) now unknown data (this certainly warrants the description "unsafe data" and is one of the reasons for the 'ensure_safe' routine, although in this particular case it would be far too late to make the call to 'ensure_safe'). It is the client's responsibility to ensure that such problems do not occur (typically through appropriate interface contracts between clients).

    struct mbuf * (* ensure_contig) (struct mbctl *, struct mbuf *mp, size_t bytes);

For a protocol client, the removal of protocol layers (headers or trailers) when a packet passes up a protocol stack is often made easier if all of the bytes constituting a particular header level are contiguous. (This permits the protocol to conceptually overlay the received packet with a structure describing the protocol header.)
The 'ensure_contig' routine is used to ensure such "contiguousness" requirements are met by an mbuf chain, and "contiguifies" the specified number of bytes at the head of the mbuf chain supplied. Described data that is contiguified is also always at least word aligned for the first byte. This helps with the overlaying of word-orientated structures.

    struct mbuf * (* ensure_safe) (struct mbctl *, struct mbuf *mp);

If safe data is required and the safeness of an mbuf chain is uncertain, then the 'ensure_safe' routine may be used to ensure that data in the returned mbuf chain is safe.

In both cases (that is, 'ensure_contig' and 'ensure_safe'), "ensure" is used as a technical term meaning:

    IF the data (mbuf chain) meets the required condition
    THEN
        return the data unmodified
    ELSE
        return some data that does meet the required condition
    FI

The process of generating data that does meet the required condition involves allocating one or more mbufs with appropriate constraints (equivalent to MBC_SINGLE, and MBC_DEFAULT allocation constraints) and replacing existing mbufs in the chain with these new mbufs. Any existing mbuf that is replaced is automatically freed.

Any of the ensure routines may fail if they have to perform allocations and there is a lack of resource. If this happens, then the entire mbuf chain supplied is freed and the NULL pointer is returned. If the ensure operation does not fail, then the returned mbuf chain will meet the desired criteria.

Whether an ensure operation fails or succeeds, the client must use the returned mbuf chain. There is an ownership transfer to the mbuf manager whilst the ensure operation is performed and then another ownership transfer back to the client of the resulting mbuf chain that meets the criteria - these two mbuf chains may happen to be the same, but the loss and regain of ownership means a client cannot tell.
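The "release and gain" behaviour of the ensure routines can be illustrated with a toy model. Everything here is an assumption made for the sketch: the two-member 'struct mbuf', the 'unsafe' marker, the 'budget' counter that simulates running out of resources, and the 'toy_' routines built on malloc. Data copying is not modelled. The point is the calling pattern: the client overwrites its own pointer with the result, and after a failure it holds nothing to free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy mbuf: a link and a marker saying the data is unsafe. */
struct mbuf { struct mbuf *m_next; int unsafe; };

static int budget;   /* hypothetical: allocations still permitted */

static struct mbuf *toy_alloc(int unsafe)
{
    struct mbuf *m;
    if (budget <= 0)
        return NULL;                 /* simulated lack of resource */
    m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    budget--;
    m->m_next = NULL;
    m->unsafe = unsafe;
    return m;
}

static void toy_freem(struct mbuf *m)
{
    while (m != NULL) {
        struct mbuf *next = m->m_next;
        free(m);
        m = next;
    }
}

/* Toy ensure_safe: a chain that is already safe comes back unchanged;
 * otherwise each unsafe mbuf is replaced by a fresh safe one and the old
 * mbuf is freed. If a replacement cannot be allocated, the whole chain
 * is freed and NULL is returned. */
static struct mbuf *toy_ensure_safe(struct mbuf *mp)
{
    struct mbuf **link = &mp;
    while (*link != NULL) {
        struct mbuf *cur = *link;
        if (cur->unsafe) {
            struct mbuf *fresh = toy_alloc(0);
            if (fresh == NULL) {
                toy_freem(mp);       /* chain is no longer the client's */
                return NULL;
            }
            fresh->m_next = cur->m_next;
            *link = fresh;
            free(cur);
        }
        link = &(*link)->m_next;
    }
    return mp;
}
```

A client would write 'chain = toy_ensure_safe(chain);' and then test 'chain' against NULL, never keeping a second copy of the old pointer across the call.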
If the mbuf chain meets the indicated criteria on entry, then the ensure routine cannot fail and will always return the supplied pointer without modification.

Under some circumstances the 'ensure_contig' routine may be able to avoid an allocation by moving data around within the existing mbuf chain (this requires some underlying bytes not described by the mbuf chain itself). If these circumstances apply, they cannot generate a failure condition themselves. Thus, if only shuffling is required, then 'ensure_contig' cannot fail, but if shuffling and allocation are required, then 'ensure_contig' can fail through lack of resources for the allocation.

Note that an mbuf chain may contain a mixture of mbufs, each with its own characteristics. For example, an individual mbuf may contain safe or unsafe data, it may meet some contiguity requirement or not and there may or may not be underlying bytes described. An mbuf chain may be composed of any mixture of such mbufs. The ensure routines arrange that all necessary mbufs within a chain meet the desired criteria.

'ensure_contig' cannot operate on unsafe mbufs and will fail (ie free the supplied chain and return NULL) if asked to do so.

'ensure_safe' returns an mbuf chain where every byte described by it is known to be safe. 'ensure_contig' returns an mbuf chain where the first 'N' bytes are known to be contiguous.

    int (* any_unsafe) (struct mbctl *, struct mbuf *mp);
    int (* this_unsafe) (struct mbctl *, struct mbuf *mp);

A client may determine if an mbuf chain has any unsafe data in it with the 'any_unsafe' routine. This returns 0 for either no unsafe data (ie all data safe) or if supplied the NULL pointer. It returns 1 if the mbuf chain supplied contains unsafe data. The 'this_unsafe' routine returns the same values but only examines the mbuf supplied - that is, it does not follow the 'm_next' field.
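The difference between the two tests is only whether 'm_next' is followed, which the following sketch shows. It assumes a hypothetical 'unsafe' member on a cut-down 'struct mbuf'; this specification does not say how the mbuf manager actually records safeness, and the real routines also take the mbctl pointer as their first parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down mbuf with an explicit unsafe marker. */
struct mbuf { struct mbuf *m_next; int unsafe; };

/* Examine only the mbuf supplied; the NULL pointer counts as safe. */
static int this_unsafe(const struct mbuf *mp)
{
    return (mp != NULL && mp->unsafe) ? 1 : 0;
}

/* Walk the whole chain; return 1 as soon as an unsafe mbuf is found. */
static int any_unsafe(const struct mbuf *mp)
{
    for (; mp != NULL; mp = mp->m_next) {
        if (mp->unsafe)
            return 1;
    }
    return 0;
}
```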
    size_t (* count_bytes) (struct mbctl *, struct mbuf *mp);

The number of bytes described by an mbuf chain may be quickly determined with the 'count_bytes' routine. If supplied the NULL pointer, then 0 is returned.

    struct mbuf * (* cat) (struct mbctl *, struct mbuf *old, struct mbuf *new);

One mbuf chain may be appended to the end of another mbuf chain with the 'cat' routine. The mbuf chain that gets appended to is the first mbuf parameter ('old'). The mbuf chain to be appended is the second mbuf parameter ('new'). Note that there is no ownership transfer of the second mbuf parameter. If 'old' is the NULL pointer, then 'new' is returned without examination. If 'old' is not the NULL pointer and 'new' is the NULL pointer then 'old' is returned without any modifications made to it.

    struct mbuf * (* trim) (struct mbctl *, struct mbuf *mp, int bytes, void *ptr);

The 'trim' routine is used to remove bytes from either the head or the tail of an mbuf chain. It may optionally copy the bytes described to another piece of memory (performing a "flattening" operation in the process). All trimming is performed by adjusting 'm_off' and 'm_len'. No mbufs are removed (unlinked and freed) from either the head or the tail of the chain.

If 'bytes' is greater than zero, then it specifies the number of bytes to remove from the head of the mbuf chain. If 'bytes' is zero then no alterations are performed and no data is copied. If 'bytes' is less than zero, then the absolute value of 'bytes' is the number of bytes to remove from the tail of the mbuf chain. If the NULL pointer is supplied for the mbuf chain, no operations are performed.

If 'ptr' is not the NULL pointer, then any bytes "trimmed" from the mbuf chain will be copied into the supplied area of memory. If 'ptr' is the NULL pointer, then no data copying is performed and only a trimming operation occurs.

The magic value M_COPYALL may be used for the 'bytes' parameter to indicate the entire mbuf chain.
This is only useful if a copy is also being performed, although it will always correctly set the mbuf chain to describe zero bytes. If the number of bytes to trim (after tail adjustment if applicable) is greater than the number of bytes described by the mbuf chain, then behaviour is as if M_COPYALL was supplied for a trim byte count.

    struct mbuf * (* copy) (struct mbctl *, struct mbuf *mp, size_t off, size_t len);
    struct mbuf * (* copy_p) (struct mbctl *, struct mbuf *mp, size_t off, size_t len);
    struct mbuf * (* copy_u) (struct mbctl *, struct mbuf *mp, size_t off, size_t len);

The 'copy' routine is used to duplicate a portion of an mbuf chain. An mbuf chain is always allocated, even if it eventually describes zero bytes. This distinguishes successful allocations that required no data from unsuccessful allocations and makes the behaviour of the 'len' parameter more orthogonal. The returned mbuf chain will have a minimum byte count of zero and a maximum byte count equal to the byte count of the supplied mbuf chain.

The portion of the mbuf chain copied is the intersecting region between the described data of the supplied mbuf chain and the region starting 'off' bytes into the supplied chain and continuing for 'len' bytes. The first byte described by an mbuf chain is byte 0.

The 'alloc' allocator routine is used. This means the returned mbuf chain will be safe, may have any number of mbufs, is directly suitable for the 'dtom' routine and any unused underlying storage may hold random values.

If a lack of resources occurs, the NULL pointer is returned. If 'mp' is the NULL pointer, then the NULL pointer is returned.

If 'len' holds the magic value M_COPYALL, then all remaining bytes in the mbuf chain from 'off' onwards are copied. M_COPYALL has the value 0x7f000000, in hexadecimal in C notation.

The mbuf chain supplied will not be altered.
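The "intersecting region" rule determines how many bytes the returned chain describes. A minimal sketch of that arithmetic follows; 'copy_span' is a hypothetical helper name, and 'total' stands for the byte count of the supplied chain (the value 'count_bytes' would report). It computes a length only and performs no allocation or copying.

```c
#include <assert.h>
#include <stddef.h>

#define M_COPYALL 0x7f000000   /* value given in the text above */

/* Number of bytes in the intersection of [0, total) with the region
 * that starts at 'off' and runs for 'len' bytes. */
static size_t copy_span(size_t total, size_t off, size_t len)
{
    size_t remaining;

    if (off >= total)
        return 0;                  /* region starts beyond the data */
    remaining = total - off;
    if (len == M_COPYALL || len > remaining)
        return remaining;          /* clip to the end of the chain */
    return len;
}
```

For a chain describing 100 bytes, an offset of 90 with a length of 20 yields 10 bytes, and an offset of 120 yields a chain describing zero bytes, which is still a successful result as far as 'copy' is concerned.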
The 'copy_p' routine behaves as 'copy' does, except that the m_type, m_flags and m_pkthdr fields are assumed to contain significant information. This prevents two small mbufs with different m_type values from being merged into a single larger mbuf. This does not prevent a single mbuf being copied into more than one mbuf; this would replicate the m_type field. If the usage of m_type is more sophisticated than the simple 'tagging' discussed above, then it is likely that the client will require a custom copying routine.

The 'copy_u' routine produces an unsafe copy of an mbuf chain. This means no new underlying storage will be allocated. The chain returned will have the same number of mbufs as that supplied. The m_type, m_flags and m_pkthdr fields are replicated from the old chain to the new chain. The 'alloc_u' routine is used to perform the new allocations.

    struct mbuf * (* import) (struct mbctl *, struct mbuf *mp, size_t bytes, void *ptr);

Data may be copied from raw memory into an mbuf chain with the 'import' routine. Copying starts at 'ptr' for reading and the first byte described by the mbuf chain for writing. Copying proceeds until either the entire mbuf chain has been filled or 'bytes' bytes have been read. The magic value M_COPYALL may be used to indicate that the entire mbuf chain should be filled. The return value is the mbuf chain supplied. If either the mbuf chain or 'ptr' is the NULL pointer then no operation is performed.

    struct mbuf * (* export) (struct mbctl *, struct mbuf *mp, size_t bytes, void *ptr);

Data may be copied from an mbuf chain into raw memory with the 'export' routine. Copying starts at the first byte described by the mbuf chain for reading and 'ptr' for writing. Copying proceeds until either the entire mbuf chain has been copied or 'bytes' bytes have been written to raw memory. The magic value M_COPYALL may be used to indicate that the entire mbuf chain should be copied to raw memory. The return value is the mbuf chain supplied.
If either the mbuf chain or 'ptr' is the NULL pointer then no operation is performed.

Summary of mbuf chain ownership transfer for direct entry points

    Routine         Category
    alloc           Fresh
    alloc_g         Fresh
    alloc_u         Fresh
    alloc_s         Fresh
    alloc_c         Fresh
    ensure_safe     Release and gain
    ensure_contig   Release and gain
    free            Release
    freem           Release
    dtom_free       Release
    dtom_freem      Release
    dtom            No transfer
    any_unsafe      No transfer
    this_unsafe     No transfer
    count_bytes     No transfer
    cat             No transfer
    trim            No transfer
    copy            Fresh *
    copy_p          Fresh *
    copy_u          Fresh *
    import          No transfer
    export          No transfer

Notes:

    Fresh:            A new mbuf chain is generated and the client receives ownership of this new chain when it receives the chain itself.
    Fresh *:          Ownership of the supplied mbuf chain remains with the caller.
    Release:          Ownership of the mbuf chain supplied is passed to the mbuf manager.
    Release and gain: Ownership of the mbuf chain supplied is passed to the mbuf manager. Ownership of the returned chain is passed to the client at the same time as the mbuf chain itself.
    No transfer:      No ownership transfer occurs (and hence no linkage changes are performed), but the mbuf manager does expect a valid mbuf chain that remains static during the period of the call.

13 SWI Entry points

All SWIs defined here obey the RISC OS convention of indicating success by returning with the V flag clear and failure by returning with the V flag set and r0 pointing at a standard RISC OS error block. For convenience, this is omitted from the definition of each SWI individually.

All SWIs defined here also obey the standard convention regarding interrupts, unless otherwise specified. That is: IRQ interrupt state is preserved across the SWI, although it may be enabled during the SWI. FIQ interrupt state is assumed enabled and not altered. This is described as the normal behaviour in the text below.

Mbuf_OpenSession (Mbuf_SWI + 0)

Purpose: This SWI is used by clients to establish a session with the mbuf manager.
This informs the mbuf manager of the client's mbuf requirements, and informs the client of the direct entry point addresses into the mbuf manager that are appropriate for its mbuf requirements.

A certain amount of validation of the proposed session is performed by the mbuf manager. This may result in modifications of behaviour within the mbuf manager and it may also result in a refusal to accept a session, with an error being returned in the normal fashion.

The flags field of the mbctl structure provides additional information to the mbuf manager about the required session. The only flag with a defined meaning at present is the MBC_USERMODE flag.

The MBC_USERMODE flag may be specified in the flags field to request that the direct entry points be suitable for user mode calling. If this is not specified, then the direct entry points must be called in supervisor mode. If MBC_USERMODE is specified, then the direct entry points supplied must be called in user mode. This permits normal user mode applications to interact with the mbuf manager. Care must be taken with unsafe data to ensure that the memory described is valid when the user mode application is not the current application, and hence might not have its memory currently 'mapped in'.

All other bits in the flags bitset should be zero.

    MBC_USERMODE    0 - This client requires supervisor mode direct entry points.
                    1 - This client requires user mode direct entry points.

Entry: r0 Address of an 'mbctl' structure

Exit: All registers preserved

Interrupts: As normal.

Errors: As normal. "Mbuf manager unsuitable for client"

Notes: The addresses of the routines supplied for the direct entry points may vary according to the requirements indicated by a client. In some circumstances, the mbuf manager is able to supply a "null" routine: ie an immediate return. Further details of the fields and their uses are found elsewhere in this document.
Mbuf_CloseSession (Mbuf_SWI + 1)

Purpose: This SWI is used to terminate a session that has previously been successfully created with the Mbuf_OpenSession SWI.

Entry: r0 Address of 'mbctl' structure supplied to previous Mbuf_OpenSession

Exit: All registers preserved.

Interrupts: As normal.

Errors: As normal. "No such session"

Notes: This SWI is used to terminate a session. The address supplied must be the same as that supplied to Mbuf_OpenSession when the session was created. Whether an error is returned or not, the client must consider the session closed after issuing this SWI.

Mbuf_Memory (Mbuf_SWI + 2)

Purpose: Provides a means to limit the maximum amount of memory that may be claimed by the mbuf manager for mbuf and underlying storage.

Entry: r0 Either 0 or the new desired limit, in bytes.

Exit: r0 Approximate limit active when SWI issued.

Interrupts: As normal, except interrupts are not enabled during processing.

Errors: As normal.

Notes: If zero is supplied as the new desired limit, then the limit is not altered and only an examination is performed. Limits are specified in bytes. They are approximate figures only (due to the underlying allocations being performed in granularities larger than a single byte). They are normally within about one kilobyte of the actual value. A new, larger limit does not cause more memory to be automatically claimed. The mbuf manager may attempt to dynamically maintain an appropriately sized free pool. This limit provides a ceiling to any dynamic fluctuations. A user interface might well use a granularity of four kilobytes.

Mbuf_Statistic (Mbuf_SWI + 3)

Purpose: This SWI provides an entry point that conforms to the DCI4 Statistic Interface. Please refer to that document for further details.

Mbuf_Control (Mbuf_SWI + 4)

Purpose: This is a general purpose control interface to the mbuf manager. Different implementations may implement different control calls.

Entry: r0 Control call number. This dictates further register usage.
Entry: r0 0 (Mbuf manager version)
Exit: r0 Mbuf manager version in MMmm format (major * 100 plus minor, in decimal)
Interrupts: As normal unless otherwise stated.
Errors: As normal. "No such mbuf manager control call"
Notes: Issuing this SWI with a reason code of 0 is a good method of checking for the presence of the mbuf manager.

14 Service calls:

The Mbuf Manager issues service calls to notify clients and potential clients of desired and actual state changes. The service call used is Service_MbufManagerStatus (service call number 0xa2, in C). A reason code is passed in r0 to indicate the reason for the service call.

Service_MbufManagerStatus - 0xa2
Entry: r0 Reason code
       r1 Service_MbufManagerStatus (0xa2)
Exit: Registers preserved, service call never claimed.

The defined reason codes are as follows:

r0 = MbufManagerStatus_Started (0)
The Mbuf Manager has started and is now available for use. It is possible to issue SWIs to the mbuf manager as soon as this service call has been seen (it is issued from a callback to ensure this is possible).

r0 = MbufManagerStatus_Stopping (1)
The Mbuf Manager is finishing. There are no open sessions if this reason code is used. The mbuf manager will refuse to die if there are any open sessions.

r0 = MbufManagerStatus_Scavenge (2)
This reason code is used to indicate that the mbuf manager is running short of allocatable memory. Any clients with allocated data that may be easily recreated (such as cached data) should release this memory, ideally before returning from the service call.

All other reason codes should be ignored.

15 The life and death of an mbuf manager:

The components necessary to form a working DCI4 environment may be loaded in a number of different orders. To permit sensible, defined behaviour for all of these orders, the mbuf manager and all device drivers provide mechanisms to announce their arrival and departure, and for their presence to be determined through a polled action.
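As a rough illustration of the two mechanisms described so far (the announced service call and the polled version query), the following C sketch dispatches the Service_MbufManagerStatus reason codes and decodes the MMmm version number. The service number and reason code values come from the text above; the client_state structure and both function names are hypothetical and are not part of the specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from the service call description above. */
#define Service_MbufManagerStatus   0xa2u
#define MbufManagerStatus_Started   0u
#define MbufManagerStatus_Stopping  1u
#define MbufManagerStatus_Scavenge  2u

/* Hypothetical client-side bookkeeping; not part of the specification. */
struct client_state {
    bool manager_present;   /* mbuf manager is currently available */
    unsigned scavenges;     /* number of scavenge requests seen */
};

/* Handle one service call. The call is never claimed, so this only
   updates the client's own state and leaves r0 and r1 alone. */
static void on_service_call(struct client_state *cs, uint32_t r0, uint32_t r1)
{
    assert(cs != NULL);
    if (r1 != Service_MbufManagerStatus)
        return;                       /* some other service call */

    switch (r0) {
    case MbufManagerStatus_Started:
        cs->manager_present = true;   /* SWIs may be issued from now on */
        break;
    case MbufManagerStatus_Stopping:
        cs->manager_present = false;  /* no sessions are open at this point */
        break;
    case MbufManagerStatus_Scavenge:
        cs->scavenges++;              /* a real client would free cached data here */
        break;
    default:
        break;                        /* all other reason codes are ignored */
    }
}

/* Split an MMmm version number (major * 100 plus minor) into its parts. */
struct mbuf_version { unsigned major, minor; };

static struct mbuf_version decode_version(uint32_t r0)
{
    struct mbuf_version v;
    v.major = (unsigned)(r0 / 100u);
    v.minor = (unsigned)(r0 % 100u);
    return v;
}
```

Under this decoding, a returned value of 403 would correspond to major version 4, minor version 3.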
The mbuf manager announces its arrival with the Service_MbufManagerStatus service call with a reason code of MbufManagerStatus_Started. A client that cannot detect the presence of the mbuf manager when it (the client) loads (via an Mbuf_Control SWI call, for example) should place itself in a 'pre-active' state and await this service call. The mbuf manager will respond to SWIs when the service call is issued.

Any attempt to kill the mbuf manager when there are open sessions will be refused. If the mbuf manager is requested to die and there are no open sessions, it will issue the MbufManagerStatus_Stopping reason code in a Service_MbufManagerStatus service call to notify potential clients of this. Clients that can remain inactive without an open session with the mbuf manager should do so, to give the user more flexibility should it be necessary to upgrade the mbuf manager. Either way, killing all modules with open sessions should always permit the mbuf manager to be killed.

16 Glossary

A "NULL pointer" is a pointer with all bits clear. For virtually all programs, under virtually all circumstances, it is not a valid address for examination or modification.

A linked list of mbufs constructed with the 'm_next' field is referred to as a "chain of mbufs". A linked list of mbufs constructed with the 'm_list' field is referred to as a "list of mbufs" or a "list of mbuf chains". A chain and a list may consist of just one mbuf. The ends of the chain and list are indicated by the NULL pointer. Lists of chains are constructed, but not vice versa (certainly within this specification).

A1 Appendix - DCI4 mbuf manager client interface contract:

Ownership of an mbuf chain grants permission to examine and modify the described data, alter the order of the chain, etc. It also brings the responsibility to either pass ownership to another client or to ensure that the mbuf chain is (eventually) freed.
In short, ownership permits useful things to be done with an mbuf chain, and carries with it the responsibility to free the data.

When a client obtains a pointer to an mbuf chain from outside itself (i.e. from another client or from the mbuf manager), it is deemed to have taken ownership of that chain. When it supplies that pointer to another client or the mbuf manager, it has lost ownership. An mbuf chain is never owned by more than one client.

The presence of asynchronous execution mechanisms (such as interrupts) requires a more precise definition. During ownership transfer, there is a transient period where the relinquishing client has called the recipient client, but the call has not yet returned. Depending upon the precise definition, during this period of time, the mbuf chain could be viewed as being owned by zero, one or two clients. The definition for this specification is that the mbuf chain is owned by zero clients. This requires the relinquishing client to take whatever steps are necessary to ensure that it cannot continue to access an mbuf chain before calling the recipient client. An example of such steps might be to remove the mbuf chain pointer from a list of such pointers examined by an interrupt routine of the relinquishing client, or to set a semaphore of some form. The mbuf pointer supplied to the call that transfers ownership should be the only copy of that pointer value that the relinquishing client has. In short, once ownership transfer is committed to, the original owner has already lost ownership. [Is this adequate? Does it supply the necessary degree of precision?]

Whenever one client passes an mbuf chain that describes unsafe data (an "unsafe mbuf chain") to another client, it pledges that the data will remain valid until the recipient client returns through the thread of control that supplied the unsafe mbuf chain.
If the recipient client were to retain the supplied mbuf chain beyond this point in time, then it might describe invalid data (it is possible that exceptions will arise if an attempt is made to access the described data, for example).

A device driver never supplies a protocol module with an unsafe mbuf chain (this is a DCI4 protocol to device driver restriction only). [I dislike this restriction, but it is necessary for the sweeping 'dtom' statement to apply.]

When a device driver supplies a received packet chain to a protocol module, the m_type field of the first mbuf holds MT_HEADER (2) and all the other m_type fields hold MT_DATA (1). When a protocol module supplies an mbuf chain to a device driver, the m_type field of all described mbufs is invalid and must not influence the behaviour of the device driver.

Summary of mbuf field validity (DCI4 client <=> DCI4 client):

Field       From protocol   From device driver
m_next      valid           valid
m_list      valid*          valid*
m_off       valid           valid
m_len       valid           valid
m_inioff    valid           valid
m_inilen    valid           valid
m_type      invalid         valid
m_sys1      opaque          opaque
m_sys2      opaque          opaque
m_flags     invalid         invalid
m_pkthdr    invalid         invalid

Notes:
Field: the name of a field within an mbuf structure.
From protocol: an mbuf chain being passed from a protocol to a device driver for transmission.
From device driver: a newly received mbuf chain for protocol processing.
valid: the field meets the criteria stated within this document.
valid*: This is a list of mbuf chains. A protocol can avoid fragmenting datagrams down to the device driver MTU by using a list of unsafe mbufs to shadow the real data. The device driver uses 'ensure_safe' if it needs to retain an mbuf beyond the transmit call returning.
invalid: any value may be present. No manipulation of such a value should ever be made. Either the field should be ignored or it should be initialised prior to use.
opaque: never read and never written by any client.
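The m_type rule for received chains stated above (MT_HEADER on the first mbuf, MT_DATA on every following mbuf) can be checked with a short walk along m_next. The sketch below is illustrative only: struct mbuf_sketch is a cut-down, hypothetical stand-in that holds just the two fields needed, and does not reflect the layout of the real mbuf structure. The function names are likewise hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values taken from the text above. */
#define MT_DATA   1
#define MT_HEADER 2

/* ASSUMPTION: reduced stand-in for the real mbuf structure, holding
   only the fields this check reads. The real layout differs. */
struct mbuf_sketch {
    struct mbuf_sketch *m_next;   /* next mbuf in the chain, or NULL */
    int m_type;                   /* MT_HEADER or MT_DATA on receive */
};

/* True if a chain received from a device driver carries the m_type
   values the contract promises: MT_HEADER first, MT_DATA thereafter. */
static bool rx_chain_types_ok(const struct mbuf_sketch *m)
{
    if (m == NULL || m->m_type != MT_HEADER)
        return false;
    for (m = m->m_next; m != NULL; m = m->m_next) {
        if (m->m_type != MT_DATA)
            return false;
    }
    return true;
}

/* Debug-build check a protocol module might apply on receipt. */
static void check_rx_chain(const struct mbuf_sketch *m)
{
    assert(rx_chain_types_ok(m));
}
```

No equivalent check applies in the transmit direction, since m_type is invalid on chains passed from a protocol module to a device driver.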
The 'm_inioff' and 'm_inilen' fields of a safe mbuf are never altered by a client - only read. The 'm_inioff' and 'm_inilen' fields of an unsafe mbuf may be altered by the client. A2 Appendix - Supplementary clarifications: There are no supplementary clarifications known to be required at present.