A delve into MS-DOS filing system

Reading a file
So to read a file you first find its name, MyLetter say, in the root directory. You then know the number of the first block that it uses to store data – say block 05. You can now read block 05 and process the data it contains.
When you want to read the next block you go to entry 05 in the FAT and read the block number of the next block in the file 03 say. You can now read and process block 03. When you want to read the next block you look in the FAT in entry 03 to get the next block number.
You carry on like this working your way along the file until the FAT entry is bigger than or equal to FFF8. At this point you know you have got to the end of the file and there isn’t another block.
Writing a file
Writing a file is just as easy.
First you find a free root directory entry and store the file name in it. You then search the FAT for the first zero entry. When you have found a zero entry you know that the block with the same number as the table entry is free and so you store this number in the root directory. The program then generates data until it fills a block when it is written to the disk. Another block is then added to the file by first searching the FAT for a zero entry and then writing the number of the table entry in the previous FAT location. The process continues in this way until the final block is written and its entry in the FAT is set to FFF8 (or greater).
One advantage of using the FAT method is that files can be stored scattered all over a drive!
EOF
The only subtle point is what the program is to do if it doesn’t use a complete final block? The answer is that most programs store an EOF or End Of File code to mark the end of useful data. In addition, the operating system stores a file length in the form of a byte count in the root directory which can be used to decide where the file ends.
There are some other points of implementation but nothing too difficult or complex. For example, the block size used in the FAT is usually not a single cluster because this would make the FAT too large slowing things down and wasting disk space. Instead sectors which usually store 512 Bytes are grouped together to form clusters which are then treated as allocation blocks in the FAT.
The number of clusters that a FAT can handle depends on the size of each entry in the FAT. The first FAT-based operating system used 12-bit entries and could handle 2^12 clusters, i.e. 4096. If each cluster is 4Kbytes, i.e. 8 standard sectors, this makes the largest disk that a FAT 12 system can manage, 32MBytes. As disks got bigger the FAT had to be extended to use 16-bit entries, i.e. 65536 clusters or 2GBytes using 64 sector clusters. Now we have FAT32 which uses 32-bit entries, i.e. 4,294,967,296 clusters or 2TeraBytes using 64 sector clusters.
The advantage of knowing how the FAT system works is that many of the mysterious messages will now make better sense. For example, if a disk check on a FAT disk reports a lost cluster this corresponds to an allocation unit that has a non-zero FAT entry but isn’t part of any file – it’s lost, it’s come adrift – and the only sensible thing to do is convert it into a file. In fact errors of this sort in the FAT are so dangerous that the operating system keeps two copies of the FAT which it compares every time the machine is started.
The only other important idea that we haven’t dealt with is the way the root directory is expanded to allow additional directories. Put simply the root directory can contain entries which point to files that are themselves treated as directories – i.e. sub directories. This is the reason that most filing systems are hierarchical and have a tree structure. Filing system such as NTFS and the Linux ext2 work in roughly same way – they all implement some method of keeping track of which blocks are used and belong to which files. Easy to say, but very difficult to do in an efficient and reliable way.
More Information
Some content may require login.