Path: archiver1.google.com!news2.google.com!news1.google.com!newsfeed.stanford.edu!newsfeeds.belnet.be!news.belnet.be!psinet-eu-nl!psiuk-p4!uknet!psiuk-p3!uknet!psiuk-n!news.pace.co.uk!nh.pace.co.uk!kbracey.cam.pace.co.uk%kbracey From: Kevin Bracey Newsgroups: comp.sys.acorn.programmer Subject: Re: Acorn's squeeze Date: Thu, 13 Dec 2001 14:43:17 GMT Organization: Posted on a server owned by Pace Micro Technology plc Lines: 74 Message-ID: <3ed643e84a.kbracey@kbracey.cam.pace.co.uk> References: <9vaa04$2ina$1@mimas.salford.ac.uk> NNTP-Posting-Host: kbracey.cam.pace.co.uk X-Trace: nh.pace.co.uk 1008255472 29864 136.170.129.213 (13 Dec 2001 14:57:52 GMT) X-Complaints-To: newsmaster@news.cam.pace.co.uk NNTP-Posting-Date: 13 Dec 2001 14:57:52 GMT To: p.f.john...@salford.ac.uk In-Reply-To: <9vaa04$2ina$1@mimas.salford.ac.uk> X-Organization: Pace Micro Technology User-Agent: Messenger-Pro/2.50a (MsgServe/1.50) (RISC-OS/3.71) NewsHound/1.38 In message <9vaa04$2in...@mimas.salford.ac.uk> Paul F. Johnson wrote: > Hi, > > Probably just me and my usual lack of info/knowledge/anything else (!), but > having looked through the PRMs, I can't find anything on the algorhythm > used in Acorn's squeeze program and was wondering if it ever was made > public. Don't think so. Here's a description I wrote, by reverse engineering it. Summary of the compressed data format: typedef struct { word decoded_size; word encoded_size; word tables_size; word nshorts; word nlongs; word bytestomove; } comp_table; <- encoded_size -> <- tables_size -> |-----encoded data-----|---dictionary----|-tbl-|--decomp code-| <- -> bytestomove <- decoded_size -> |-------------------------------decoded data--------------------------------| The image is scanned for up to 1792 common 32-bit words (short entries) and up to 1792 common top 24 bits of words (long entries). These form the dictionary, which is stored compressed above the compressed data. The dictionary consists of t->nshorts + t->nlongs entries. The entries are represented by: 00 WW XX YY -> previous entry + YYXXWW (long) 00 WW XX YY ZZ -> previous entry + ZZYYXXWW (short) 01-09 -> n entries consecutive from the last one 0A-5B -> previous entry + (n-&0A) 5C-AD XX -> previous entry + ((n-&5C)<<8) + XX AE-FF XX YY -> previous entry + ((n-&AE)<<16) + YYXX Data is compressed two words at a time, and must be decompressed from the top down. The compressed form starts with a single byte (at the highest memory location), split into two nibbles. The least-significant nibble represents the lower-addressed word, the most-significant nibble the higher. Further bytes are read from decreasing memory addresses, first for the low word, then the higher, depending on the contents of the nibble. Words encoding forms are: 0 -> 00000000 WW XX YY ZZ 1 -> WWXXYYZZ ZZ 2-8 -> short[((n-2) << 8) + ZZ] YY ZZ 9-F -> (long[((n-9) << 8) + ZZ] << 8) + YY For example, (45 12 34 56 78 51) represents (12345678 short[345]) The encoding scheme is well suited to 32-bit oriented data, particularly ARM instructions. Before compression, the data would normally be padded out to an even number of words with zeros. This extra trailing data is harmless for AIF. A six-word header precedes the decompression code, giving details of the data and dictionary sizes. It also gives an indication of when to shuffle the final compressed data out of the way when decompressing in place. -- Kevin Bracey, Principal Software Engineer Pace Micro Technology plc Tel: +44 (0) 1223 518566 645 Newmarket Road Fax: +44 (0) 1223 518526 Cambridge, CB5 8PB, United Kingdom WWW: http://www.pace.co.uk/