Silicon Junction: Storing records on a PIC web server

In a previous blog, I looked at using the TCP/IP stack on an Olimex PIC SBC to implement a web server. In this blog, I look at storing and retrieving data on the EEPROM on the SBC. My data is a collection of records of different types. The SBC in use, the Olimex PIC-WEB SBC, uses an Atmel AT45DB011. This chip has its idiosyncrasies that the storage process takes advantage of.

Using the Atmel AT45DB011

The Atmel AT54DB011 is a 128 KB EEPROM. It has the following characteristics.

The part is page oriented. This means writing a single byte or a random sequence of bytes is not easy but writing a whole page is easier. Reading sequentially across pages is not easy.
Due to the serial interface, reading or writing non-sequential bytes means supplying the address first and then the data.
The 128KB is split up into pages of 264 bytes each - yeah, that’s right - not 256 bytes.
The part includes two internal 264 byte RAM buffers. These serve as the temporary buffer for writing into the EEPROM. They can also be used as general purpose RAM.
Reading multiple bytes is easy by reading sequentially. When the read moves past the end of the page, it goes back to the beginning of the same page and not the next page.
Writing a byte means reading a page into the internal buffer, changing the bytes in the buffer and then writing back the whole buffer into the page.
Writing a page takes about 20ms.
A status byte can be read to tell you if the part is busy. On packages with more pins than an 8 pin SOIC, a pin provides the ready/busy status.
Once in a while, you have to rewrite all pages in a sector

The storage scheme will be page based. All reads and writes have to be sequential. Sequential reads cannot be across page boundaries. Writes are executed by first copying the page to the RAM buffer. updating it and then writing the page back.

Structuring the Data

The data is organised as a set of record types. Each record type has a fixed length. So what if the record does not have a fixed length? There are several ways of ensuring that it is.

If the variable length field has a fixed maximum length, this can be used as the size. This will mean that if the data is smaller, there will be wasted space. For example, if the field is a string and it cannot exceed 16 characters, the size of the field is set at 16 characters and the remainder is filled with zeros.

If a record has multiple instances of a field, that field becomes a new record type. A field in the new record type will tie it back to the parent record. This technique is used often in database design and is one of the steps of a process called normalisation.

The variable length field can be split into chunks. The chunks are stored in a new record type. This is handy if the maximum size is quite large but you don't want to waste space. This is a combination of the previous two approaches.

Now that we have a bunch of fixed record types, we store only records of the same type in a page. This does mean that the record size has to be less than the page size. A few bytes are used for a page header and in the remaining space, we fit in as many records as we can.

EEPROM Page Structure

Each page in use has a signature - the first byte set to a particular value - to differentiate it from an empty page. Each record type is represented by a single byte value. The page header holds the signature byte, the record type and the number of records already stored in the page. The rest of the space holds one or more instances of the record. There may be some wasted space at the end of the page that is too small for a record to fit.

RAM Variables

A typedef for a structure for each of the record types is the first step. We also need a union of all possible record structures with a byte at the start holding the record type currently held. This will be our universal record. This is used for all read and write operations. So the data stored in the EEPROM will be the binary representation of the structure for each record type. Finally, we need a structure to hold the page header.

A page index array with one byte for each page holds the record type held in each page. Empty pages are represented by a 0xFF. A variable holds the number of pages used. Each record will have a sequence number but it is not stored anywhere. This sequence number can be used to reference the record in other related records.

Access Functions

Startup initialisation: The list of pages is scanned and the record type for each page is stored in the page index array. A 0xFF indicates an unused page. The page count is also updated.
Get record by sequence: Scan the page index array for records of that type. Read the record header to get the count of records and add them up. When the sequence is reached, get the offset of the record within the page. Use this to read the record into the record buffer.
Save a record: Start from the last page and look in the page array index for a page of the right type. If it does not have enough space for a new record or there are no pages of that type, start a new page for that type. Use the count held in the record header to get the offset into the page. Write the record at the offset and update the count.
Search for a record: All searches use one or more fields in a record. The fields have to be contiguous. Each record of the type is loaded in sequence and the bytes are compared at the right offset. If the record matches, the buffer can be used or the sequence can be returned. As records can make references to other records, the search routine uses its own record buffer.
Parsing routines: Records may be received as a delimited character string. Parsing routines for each record type can be used to convert this string into a record.
Formatting routines: Record may be sent as a delimited character string. Formatting routines for each record can be used to convert a record to a string.

I will need more functions as I go along but these should keep me going for a while.

Some problems to be fixed

There are a few FAT based file systems already available. These are quite large and not particularly suited for reading or writing a random access file structure. This scheme is simple enough but has some drawbacks. Some you fix over time, some you gotta live with.

There is no update routine at the moment but that is relatively simple. I will implement it when I get to it.
There is no delete. This is a bit more involved. I may take the simple approach of marking a record as deleted and trying to reuse it when a new record is needed.
The routines are single threaded and cannot handle simultaneous web requests. This is normally fatal for a web application but should be OK for an embedded web server.
A certain amount of space is wasted on each page. At the moment, the space on a 128K EEPROM looks massive but at some point I may have to look at using the space more efficiently.

Silicon Junction

Saturday, May 4, 2013

Storing records on a PIC web server