GTech Booster • A delve into MS-DOS filing system

Now that Microsoft has open sourced MS-DOS it’s a good time to look at filing systems, blocks and the FAT and how they are used.

Let’s explore the basics of a DOS – Disk Operating System. There is a big problem with disk drives. No, not the constant running out of space that we used to suffer from, but the deeper computer science and engineering problem of actually making them useful at all.

We are so familiar with disk drives being delivered to us in a usable form that it is first necessary to go back to the raw hardware to see what the problem is. In a hard disk drive the data is stored on the disk as magnetic patterns written in complete concentric tracks.

More accurately each track is split into a number of different sectors and data can be read or written by specifying the head, track and sector involved. If you want to read head 2, track 10, sector 5 then the hard disk moves its head array to track 10 and reads the sector markers passing underneath head 2 until it detects sector 5.

Then, and only then, does it start reading data and passing it on to the CPU or to memory for later use. Notice that you can read or write a sector but you can’t work with anything smaller and if you want to work with anything bigger you need to manipulate multiple sectors.

This is the main characteristic of a “block” device. It can read or write a block of data of a given size and everything you do with it has to take this into account. Modern “disk” devices like SSDs which don’t work in the same way as a magnetic disk still divide the storage they provide into blocks. Compare a block device with the behaviour of main memory (RAM) and you will begin to see the difference and the similarities.

To access main memory you give an address and this results in a single word of data being retrieved. To access a block device, a hard disk say, you have to supply head, track and sector numbers and a block of data is retrieved. For a more modern device you might just have to give the block number and it will workout where the block is stored.

At this point you might like to ask yourself what is wrong with this arrangement. If it is good enough for main memory it should work for disk drives! Well imagine that you now want to edit the letter you wrote last week. It is stored at head 2, track 5, sector 6 and you are required to specify this as 2,5,6 when the text editor asks what you want to edit.

If the letter is large enough to fill more than one sector or block you will have to remember the follow on sector’s address as well as the next and the next and so on… You now begin to see the problem.

Hard disk manufacturers leave the stage at the point where the hard disk is working but only in terms of the computer supplying simple read/write commands and head, track and sector addresses. What is needed is something to turn this into an easy to use storage device with a filing system.

Files and filing systems

You could see the filing system as a hardware problem but, when you start thinking of how flexible it has to be, it very rapidly turns into a software problem. It is much easier to write some software to organise the disk into a filing system than it is to build custom hardware to do the same job.

This software was initially called a “DOS” standing for Disk Operating System but later on the term became expanded to cover all sorts of additional features such as looking after other I/O devices. That is the term DOS became synonymous with Operating System because managing the disk was the main task of any OS.

Today we tend to think of the operating system as more than just a DOS and the term File System is often used to refer to that part of an OS which manages the disk or any mass storage device. The creation of a DOS, and the disk drives themselves, has often been a chicken and egg type of problem. Large computer companies such as IBM had no such difficulties.

They could build a disk drive and assign the manpower needed to write the DOS to make it useful. Smaller companies might be able to produce an add-on disk drive for a mini-computer, say, but where the DOS was to come from was left to the customers.

In the early days many a non-specialist programmer had had the task of putting together a custom DOS for a cheap disk drive, usually a floppy disk drive.. This was particularly true during the microprocessor revolution. We all got landed with the CP/M operating system because an early Intel development system needed a DOS to be useful and Gary Kildall wrote one.

You could even say that we are still landed with Windows because Microsoft produced a DOS for the new 16-bit generation of Intel processors. It is even claimed that Bill Gates himself invented a basic method of organising a disk drive, the FAT, and hence implemented a DOS.

So how does a DOS work? The best way to understand is to try to implement a DOS for yourself.

Working with blocks

For simplicity it is better to imagine that every sector has a unique address or block number rather than head, track sector number – after all this is how modern drives work.

The simplest way of organising the disk is to keep a list of file names and the blocks that are used to store the data. For example,

MyLetter  01,02,10
Notes	  03,15,09

indicates that the file MyLetter is composed of block 1,2 and 10 and Notes is composed of blocks 03,15 and 09. Simple enough but there are a number of unresolved questions in this method. The first is, where is this list to be stored? The obvious answer is on the disk drive itself. Any disk drive has to use a proportion of the space to store a catalogue of what is on the drive – this data has to be stored somewhere.

The most simple method is to use a fixed area – usually the first few tracks. The problem with this is that a fixed size catalogue can only hold the details of a set number of files. This means that a disk can appear full even though there is storage space left on it – early systems really did give a “Disk Full” message when in reality only the fixed sized directory had been used up.

On the other hand if you reserve too much space for the catalogue and then only store a few large files then much of it will be wasted. Keeping a list of the file names is only the start of the problem. How are you going to keep track of which blocks are used and which are free?.

How are you going to deal with the problem that the number of blocks that a file uses is variable and so you need a variable amount of space to record them in? The best way to answer these and other questions is to describe how MS-DOS, once the world’s favourite operating system, does it.

The FAT

Windows inherits MS-DOS’s filing system FAT – File Allocation Table. Although it has been superseded by the NTFS filing system in most cases it is still a supported option and often used for small and simple storage devices such as USB “drives”. NTFS is more complex than FAT and hence doesn’t make a good example.

The FAT is a table of data written on the disk drive at a fixed location when it is formatted. In theory the table contains one entry for each block on the disk. When the FAT is first written to the disk every entry is zero and a zero is used to indicate that block is free for use.

As well as the FAT the disk also contains a fixed area – the root directory – that is used to record filenames. Each root directory entry consists of a filename and the number of the first block in the file. As every file has at least one block, a directory entry that is all zeros is free for use.

If the file consists of more than one block then the number of the second block is stored in the FAT. Where? The answer is that it is stored in the entry corresponding to the first block.

That is, each entry in the FAT is either zero, indicating that the block is free for use, or it “points” to the next block in the “chain” and the start of the chain is stored in the root directory entry along with the file name.

The final block in the chain is marked by the use of the special code FFF8 to FFFF. The code FFF7 is used to indicate that the block is damaged and should not be used.