FAT: Difference between revisions

(Ported FAT12 page. Lotsawork, really)
m (Bot: Replace deprecated source tag with syntaxhighlight)
 
(104 intermediate revisions by 54 users not shown)
Line 1: Line 1:
{{Filesystems}}
File Allocation Table (FAT) was introduced with DOS v1.0 (and possibly CP/M), supposedly written by Bill Gates. FAT is a very simple filesystem which is nothing more than a singly linked list of clusters. A FAT filesystem uses very little memory and is one of, if not the, most basic filesystems in existence today.
The '''File Allocation Table''' ('''FAT''') was the native file system of MS-DOS. FAT was originally introduced by Marc McDonald in Stand-alone Disk BASIC with 8-bit FAT entries and 16 byte directory entries. The better known FAT12 variant, with 12-bit FAT entries and 32 byte directory entries, was introduced with DOS. FAT is a very simple file system -- nothing more than a singly-linked list of clusters in a gigantic table. A FAT file system uses very little memory (unless the OS caches the whole allocation table in memory) and is one of, if not the, most basic file system in use today.


== About FAT ==
== Overview ==
There are several different versions of the FAT file system. Each version was designed for a different size of storage media.
=== FAT 12 ===
FAT 12 was designed for floppy disks and can manage a maximum size of 16 megabytes because it uses 12 bits to address the clusters.


=== FAT 16 ===
There are two versions of this simplified FAT, FAT12 and FAT16. FAT12 was designed for floppy disks and can manage a maximum size of 16 MB using 12-bit cluster numbers. FAT16 was designed for early hard disks and could handle a maximum size of 64K clusters * cluster_size. The larger the hard disk, the larger the cluster size would be, which led to large amounts of "slack space" on the disk.
FAT 16 was designed for early hard disks and could handle a maximum size of 64K clusters * the cluster size. The larger the hard disk, the larger the cluster size would be, which leads to large amounts of "slack space" on the disk.


=== FAT 32 ===
FAT12+FAT16 filesystems have fixed size for filenames of "8.3" and limited support for file attributes. You could also read the FAT12 document. You could also check out the FAT tutorial reported by Kemp.
FAT 32 was introduced to us by Windows95-B and Windows98. FAT32 solved some of FAT's problems. No more 64K max clusters! Although FAT32 uses 32 bits per FAT entry, only the bottom 28 bits are actually used to address clusters on the disk (top 4 bits are reserved). With 28 bits per FAT entry, the filesystem can address a maximum of about 270 million clusters in a partition. This enables very large hard disks to still maintain reasonably small cluster sizes and thus reduce slack space between files.


=== ExFAT ===
'''Note that the FAT filesystem is covered by software patents.'''
{{Main|ExFAT}}
ExFAT is the filesystem used on SDXC cards, created by Microsoft. It is FAT32 with actually 32 bits per FAT entry, with the ability to indicate a file is fully consecutive on disk (allowing you to skip reading the FAT), some more advanced features and a fully redesigned file entry system. Since it's so similar to FAT32, please merge any bits of info from the exFAT article into this one.


Microsoft has published the official specification at https://docs.microsoft.com/en-us/windows/win32/fileio/exfat-specification .
=== VFAT ===


=== VFAT ===
VFAT is an extension of FAT16 and FAT12 that has the ability to use long filenames (up to 255 characters, I think). First introduced by Windows 95, it uses a "kludge" whereby long filenames are marked with a "volume label" attribute and filenames are subsequently stored in the 8.3 format in sequential directory entries. (This is a bit of an oversimplification, but close enough).
VFAT is an extension to the FAT file system that has the ability to use long filenames (up to 255 characters). First introduced by Windows 95, it uses a "kludge" whereby long filenames are marked with a "volume label" attribute and filenames are subsequently stored in 11 byte chunks in sequential directory entries. (This is a bit of an oversimplification, but close enough).

=== FAT32 ===

FAT32 was introduced to us by Windows95-B and Windows98. FAT32 solved some of FAT's problems. No more 64kb max clusters! FAT32, as its name suggests, can handle a maximum of 4gig clusters per partition. This enables very large hard disks to still maintain very small cluster sizes and thus reduce slack space between files.

''Is FAT32 really able to handle 4G clusters ? last time i looked to a FAT table, it looked to have last 4 bits cleared all the time, including the 'end of chain' tag, giving me the feeling that it was somehow more a "FAT28" than "FAT32"'' -- PypeClicker

''Correct - FAT32 is actually only FAT28. Top 4 bits are currently "Undefined" (according to Microsoft's specification). Note that this does not mean 'top four bits should be "0"' - they should be honored. This means that if you need to alter a FAT entry, and the top 4 bits contain 1001, you should write it back to disk containing 1001 and not 0000.'' -- djhayman
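
The rule described in the comment above (use only the low 28 bits when reading, write the top 4 bits back unchanged) can be sketched as follows. The function names are hypothetical and chosen only for illustration; this is a minimal sketch, not code taken from any particular driver.

```c
#include <stdint.h>

/* Hypothetical helper: value to use when following a cluster chain.
   The reserved top 4 bits are masked off. */
uint32_t fat32_entry_cluster(uint32_t entry)
{
    return entry & 0x0FFFFFFFu;
}

/* Hypothetical helper: build the value to write back to disk.
   The top 4 bits of the entry as it was read from disk are preserved,
   and only the low 28 bits are replaced. */
uint32_t fat32_merge_entry(uint32_t old_entry, uint32_t new_cluster)
{
    return (old_entry & 0xF0000000u) | (new_cluster & 0x0FFFFFFFu);
}
```

For example, if the entry on disk has 1001 in its top 4 bits, merging in an end-of-chain value keeps those bits at 1001 rather than clearing them.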


== Implementation Details ==
== Implementation Details ==
The FAT file system views the storage media as a flat array of clusters. If the physical media does not address its data as a flat list of sectors (really old hard disks and floppy disks) then the cluster numbers will need to be translated before being sent to the disk. The storage media is organized into three basic areas.


* The boot record
The File Allocation Table (FAT) is a table stored on every hard or floppy disk that indicates the status and location of all data clusters that are on the disk. The File Allocation Table can be considered to be the "table of contents" of a disk. If the file allocation table is damaged or lost, then a disk is unreadable. In a file server the FAT data is sometimes kept in the computer RAM for quick access and is easily lost if the system crashes as the result of a power failure.
* The File Allocation Table (FAT)
* The directory and data area


=== Boot Record ===
The File Allocation Table is maintained by the operating system that provides a map of the clusters (the basic unit of logical storage on a disk) that a file has been stored in. When you write a new file, the file is stored in one or more clusters that are not necessarily next to each other; they may be rather widely scattered over the disk. A typical cluster size is 2,048 Bytes, 4,096 Bytes or 8,192 Bytes.* The operating system creates a FAT entry for the new file that records where each cluster is located and their sequential order. When you read a file, the operating system reassembles the file from clusters and places it as an entire file where you want to read it.
The boot record occupies one sector, and is always placed in logical sector number zero of the "partition". If the media is not divided into partitions, then this is the beginning of the media. This is the easiest sector on the partition for the computer to locate when it is loaded. If the storage media is partitioned (such as a hard disk), then the beginning of the actual media contains an [[MBR (x86)]] or other form of partition information. In this case each partition's first sector holds a [[Volume Boot Record]].


==== BPB (BIOS Parameter Block) ====
The hard disk is physically arranged by cylinders, heads, and sectors, that is how it is addressed by the hardware controller and the ROM BIOS, which addresses it at a physical level. For the operating system and other programs, however, this is cumbersome, since the physical number of cylinders, heads, and sectors varies from disk to disk. It would be convenient to view the disk as simply a large continuous block of sectors with simple sequential addresses.


The boot record contains both code and data, mixed together. The data that isn't code is known as the BPB.
MS-DOS does, in fact, view the sectors on a disk as a one-dimensional array of sectors numbered from 0 to n-1, where n is the total number of sectors on the disk. It therefore must translate from the logical sector numbers to physical cylinder-head-sector, or CHS, addresses. In doing so, MS-DOS sequentially numbers all the sectors of head 0, cylinder 0, then all the sectors of head 1, cylinder 0, and so on for each head, and then repeats this for each cylinder, to the end of the disk.
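
The numbering order described above implies a simple conversion from a logical sector number to a CHS address. The sketch below assumes the caller already knows the geometry (sectors per track and number of heads); the struct and function names are made up for this example.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch. */
struct chs {
    uint32_t cylinder;
    uint32_t head;
    uint32_t sector;   /* CHS sector numbers start at 1, not 0 */
};

/* Convert a logical sector number (counted from 0) to CHS.
   Sectors are numbered along one track first, then across the heads of a
   cylinder, then across cylinders. Both geometry values must be non-zero. */
struct chs lba_to_chs(uint32_t lba, uint32_t sectors_per_track, uint32_t heads)
{
    struct chs out;
    out.sector   = (lba % sectors_per_track) + 1;
    out.head     = (lba / sectors_per_track) % heads;
    out.cylinder = lba / (sectors_per_track * heads);
    return out;
}
```

With a 1.44MB floppy geometry (18 sectors per track, 2 heads), logical sector 0 maps to cylinder 0, head 0, sector 1, and logical sector 18 maps to cylinder 0, head 1, sector 1.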


{| {{wikitable}}
Furthermore, MS-DOS logically divides this array of sectors into five distinct areas, which are, in the order they appear on the disk,
|-
! Offset (decimal)
! Offset (hex)
! Size (in bytes)
! Meaning
|-
| 0
| 0x00
| 3
| The first three bytes EB 3C 90 disassemble to JMP SHORT 3C NOP. (The 3C value may be different.) The reason for this is to jump over the disk format information (the BPB and EBPB). Since the first sector of the disk is loaded into ram at location 0x0000:0x7c00 and executed, without this jump, the processor would attempt to execute data that isn't code. Even for non-bootable volumes, code matching this pattern (or using the E9 jump opcode) is required to be present by both Windows and OS X. To fulfil this requirement, an infinite loop can be placed here with the bytes EB FE 90.
|-
| 3
| 0x03
| 8
| OEM identifier. The first 8 bytes (3 - 10) identify the version of DOS being used. The next eight bytes, 29 3A 63 7E 2D 49 48 and 43, read out the name of the version. The official FAT Specification from Microsoft says that this field is really meaningless and is ignored by MS FAT drivers; however, it does recommend the value "MSWIN4.1", as some 3rd party drivers supposedly check it and expect it to have that value. Older versions of DOS also report MSDOS5.1, a Linux-formatted floppy will likely carry "mkdosfs" here, and FreeDOS-formatted disks have been observed to have "FRDOS5.1" here. If the string is less than 8 bytes, it is padded with spaces.
|-
| 11
| 0x0B
| 2
| The number of Bytes per sector (remember, all numbers are in the little-endian format).
|-
| 13
| 0x0D
| 1
| Number of sectors per cluster.
|-
| 14
| 0x0E
| 2
| Number of reserved sectors. The boot record sectors are included in this value.
|-
| 16
| 0x10
| 1
| Number of File Allocation Tables (FAT's) on the storage media. Often this value is 2.
|-
| 17
| 0x11
| 2
| Number of root directory entries (must be set so that the root directory occupies entire sectors).
|-
| 19
| 0x13
| 2
| The total sectors in the logical volume. If this value is 0, it means there are more than 65535 sectors in the volume, and the actual count is stored in the Large Sector Count entry at 0x20.
|-
| 21
| 0x15
| 1
| This Byte indicates the [https://en.wikipedia.org/wiki/Design_of_the_FAT_file_system#BPB20_OFS_0Ah media descriptor type].
|-
| 22
| 0x16
| 2
| Number of sectors per FAT. FAT12/FAT16 only.
|-
| 24
| 0x18
| 2
| Number of sectors per track.
|-
| 26
| 0x1A
| 2
| Number of heads or sides on the storage media.
|-
| 28
| 0x1C
| 4
| Number of hidden sectors. (i.e. the LBA of the beginning of the partition.)
|-
| 32
| 0x20
| 4
| Large sector count. This field is set if there are more than 65535 sectors in the volume, resulting in a value which does not fit in the ''Number of Sectors'' entry at 0x13.
|}


Note: the "geometry" of the media (sectors per track, heads, and perhaps the number of bytes in a sector) is not necessarily known correctly by the program that originally formats the media. Also, if the media is moved (from the computer that formatted it) to another machine with a different BIOS -- then the new BIOS may specify a different geometry for the same media. So it is generally a very bad idea to trust the "SPT" or "heads" numbers. Get them from the BIOS instead, if possible.
* The partition table,
* The boot record,
* The File Allocation Table (FAT),
* The root directory, and
* The data area.


Note2: many of the values in the BPB are not correctly "aligned". That is, word-sized values are not stored on word ("even" address) boundaries. On some architectures, accessing misaligned words may cause the code to crash. Making a copy of the BPB (somewhere else in memory and shifted up one byte) may solve the problem.
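
One way to sidestep the alignment problem entirely is to assemble each field from individual bytes instead of casting the sector buffer to a struct. The following is a minimal sketch under that assumption; the struct layout and all names are hypothetical, and only the offsets come from the BPB table.

```c
#include <stdint.h>

/* Byte-wise little-endian loads: safe at any alignment. */
uint16_t bpb_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t bpb_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical in-memory copy of the common BPB fields. */
struct fat_bpb {
    uint16_t bytes_per_sector;
    uint8_t  sectors_per_cluster;
    uint16_t reserved_sectors;
    uint8_t  fat_count;
    uint16_t root_entries;
    uint32_t total_sectors;
    uint16_t sectors_per_fat;   /* FAT12/FAT16 only */
    uint32_t hidden_sectors;
};

/* Fill "b" from the raw boot sector "s" (at least 36 bytes). */
void fat_parse_bpb(const uint8_t *s, struct fat_bpb *b)
{
    b->bytes_per_sector    = bpb_le16(s + 0x0B);
    b->sectors_per_cluster = s[0x0D];
    b->reserved_sectors    = bpb_le16(s + 0x0E);
    b->fat_count           = s[0x10];
    b->root_entries        = bpb_le16(s + 0x11);
    b->total_sectors       = bpb_le16(s + 0x13);
    if (b->total_sectors == 0)          /* more than 65535 sectors */
        b->total_sectors = bpb_le32(s + 0x20);
    b->sectors_per_fat     = bpb_le16(s + 0x16);
    b->hidden_sectors      = bpb_le32(s + 0x1C);
}
```

The parsed copy can then be used freely on architectures that fault on misaligned accesses, at the cost of a few extra shifts per field.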
The first four areas of the disk, collectively called the system area, are used by MS-DOS to keep track of the contents of the disk. The largest area of the disk, the data area, is where all user files and data reside. MS-DOS uses a special numbering scheme for the area called cluster numbering which is in addition to, but independent of logical sector numbers.


==== Extended Boot Record ====
The boot record occupies one sector, and is always placed in logical sector number (LSN) 0, which is physically cylinder 0, head 0, sector 1, the first sector of the first head of the first cylinder on the disk. This is the easiest sector on the disk for the computer to locate when it begins running.
The extended boot record information comes right after the BPB. The data at the beginning is known as the EBPB. It contains different information depending on whether this partition is a FAT 12, FAT 16, or FAT 32 filesystem. Immediately following the EBPB is the actual boot code, then the standard 0xAA55 boot signature, to fill out the 512-byte boot sector. Offsets shown are from the start of the standard boot record.


===== FAT 12 and FAT 16 =====
The File Allocation Table (FAT) is an array of integers in which each element represents one cluster in the data area. For each cluster in the data area the corresponding entry in the FAT contains a code which indicates the status of the cluster. The cluster may be available for use, it may be reserved by the operating system, it may be unavailable due to a bad sector on the disk, or it may be in use by a file.
{| {{wikitable}}

|-
MS-DOS maintains a hierarchical directory structure in which there is one entry for every file on the disk.
! Offset (decimal)

! Offset (hexadecimal)
Data area is where all user files and data reside.
! Length (in bytes)

! Meaning
== FAT 12 (Diskette) ==
|-
| 36
| 0x024
| 1
|Drive number. The value here should be identical to the value returned by BIOS interrupt 0x13, or passed in the DL register; i.e. 0x00 for a floppy disk and 0x80 for hard disks. This number is useless because the media is likely to be moved to another machine and inserted in a drive with a different drive number.
|-
| 37
| 0x025
| 1
| Flags in Windows NT. Reserved otherwise.
|-
| 38
| 0x026
| 1
|Signature (must be 0x28 or 0x29).
|-
| 39
| 0x027
| 4
|VolumeID 'Serial' number. Used for tracking volumes between computers. You can ignore this if you want.
|-
| 43
| 0x02B
| 11
|Volume label string. This field is padded with spaces.
|-
| 54
| 0x036
| 8
|System identifier string. This field is a string representation of the FAT file system type. It is padded with spaces. The spec says never to trust the contents of this string for any use.
|-
| 62
| 0x03E
| 448
|Boot code.
|-
| 510
| 0x1FE
| 2
|Bootable partition signature 0xAA55.
|}


=== Boot Sector ===
===== FAT 32 =====
{| {{wikitable}}

|-
==== Contents ====
! Offset (decimal)
{| {{wikitable}}
! Offset (hexadecimal)
! Length (in bytes)
! Meaning
|-
| 36
| 0x024
| 4
|Sectors per FAT. The size of the FAT in sectors.
|-
|-
| 40
! BYTE
| 0x028
! <pre> +0 +1 +2 +3 +4 +5 +6 +7 </pre>
| 2
! meaning
| Flags.
|-
|-
| 0-2
| 42
| 0x02A
| <pre> 0000 eb 3c 90 .. .. .. .. .. </pre>
| 2
| Jump to start of boot code
|FAT version number. The high byte is the major version and the low byte is the minor version. FAT drivers should respect this field.
|-
|-
| 44
| 3 - 10
| 0x02C
| <pre> 0000 .. .. .. 6d 6b 64 6f 73 </pre>
| 4
| OEM identifier (mkdosfs)
|The cluster number of the root directory. Often this field is set to 2.
|-
| 48
| 0x030
| 2
|The sector number of the FSInfo structure.
|-
| 50
| 0x032
| 2
|The sector number of the backup boot sector.
|-
| 52
| 0x034
| 12
|Reserved. When the volume is formatted these bytes should be zero.
|-
| 64
| 0x040
| 1
|Drive number. The values here are identical to the values returned by the BIOS interrupt 0x13. 0x00 for a floppy disk and 0x80 for hard disks.
|-
|-
| 65
|
| 0x041
| <pre> 0008 66 73 00 .. .. .. .. .. </pre>
| 1
| (program/OS being used to format)
|Flags in Windows NT. Reserved otherwise.
|-
|-
| 66
| 11 - 12
| 0x042
| <pre> 0008 .. .. .. 00 02 .. .. .. </pre>
| 1
| The number of Bytes per sector (512)
|Signature (must be 0x28 or 0x29).
|-
|-
| 67
| 13
| 0x043
| <pre> 0008 .. .. .. .. .. 01 .. .. </pre>
| 4
| Number of sectors per allocation unit (cluster)
|Volume ID 'Serial' number. Used for tracking volumes between computers. You can ignore this if you want.
|-
|-
| 71
| 14 - 15
| 0x047
| <pre> 0008 .. .. .. .. .. .. 01 00 </pre>
| 11
| Number of reserved sectors
|Volume label string. This field is padded with spaces.
|-
| 82
| 0x052
| 8
|System identifier string. Always "FAT32&nbsp;&nbsp;&nbsp;". The spec says never to trust the contents of this string for any use.
|-
| 90
| 0x05A
| 420
|Boot code.
|-
| 510
| 0x1FE
| 2
|Bootable partition signature 0xAA55.
|}


=== FSInfo Structure (FAT32 only) ===
{| {{wikitable}}
|-
! Offset (decimal)
! Offset (hexadecimal)
! Length (in bytes)
! Meaning
|-
| 0
| 0x0
| 4
|Lead signature (must be 0x41615252 to indicate a valid FSInfo structure)
|-
| 4
| 0x4
| 480
|Reserved, these bytes should never be used
|-
| 484
| 0x1E4
| 4
|Another signature (must be 0x61417272)
|-
| 488
| 0x1E8
| 4
|Contains the last known free cluster count on the volume. If the value is 0xFFFFFFFF, then the free count is unknown and must be computed. However, this value might be incorrect and should at least be range checked (<= volume cluster count)
|-
| 492
| 0x1EC
| 4
|Indicates the cluster number at which the filesystem driver should start looking for available clusters. If the value is 0xFFFFFFFF, then there is no hint and the driver should start searching at 2. Typically this value is set to the last allocated cluster number. As with the previous field, this value should be range checked.
|-
| 496
| 0x1F0
| 12
|Reserved
|-
| 508
| 0x1FC
| 4
|Trail signature (0xAA550000)

|}
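
The checks listed in the FSInfo table can be sketched as below: verify all three signatures, then range check the free cluster count before trusting it. The function names are hypothetical, and "cluster_count" is assumed to have been computed by the caller from the BPB.

```c
#include <stdint.h>

/* Byte-wise little-endian 32-bit load. */
uint32_t fsinfo_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the 512-byte sector carries all three FSInfo signatures. */
int fsinfo_is_valid(const uint8_t *sec)
{
    return fsinfo_le32(sec + 0)   == 0x41615252u &&
           fsinfo_le32(sec + 484) == 0x61417272u &&
           fsinfo_le32(sec + 508) == 0xAA550000u;
}

/* Returns the stored free cluster count, or 0xFFFFFFFF when it is unknown
   or fails the range check, meaning it has to be computed from the FAT. */
uint32_t fsinfo_free_count(const uint8_t *sec, uint32_t cluster_count)
{
    uint32_t n = fsinfo_le32(sec + 488);
    if (n > cluster_count)   /* also covers the 0xFFFFFFFF "unknown" case */
        return 0xFFFFFFFFu;
    return n;
}
```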

==== exFat boot record ====
For exFAT the whole boot record was recreated from scratch instead of extending the existing FAT12/16/32 boot records even further. You can recognize exFAT by noticing that in the FAT12/16/32 boot record, the "bytes per sector" is zero.
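
As a rough illustration of that recognition rule, the check below looks only at the two bytes where a FAT12/16/32 boot record stores "bytes per sector" (offset 0x0B). The function name is hypothetical, and a real driver would likely validate more of the boot record than this.

```c
#include <stdint.h>

/* Returns 1 if the "bytes per sector" field of a FAT12/16/32-style boot
   record is zero, which suggests the exFAT layout should be used instead. */
int boot_sector_looks_like_exfat(const uint8_t *sector)
{
    return sector[0x0B] == 0 && sector[0x0C] == 0;
}
```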

{| {{wikitable}}
|-
|-
! Offset (decimal)
| 16
! Offset (hex)
| <pre> 0010 02 .. .. .. .. .. .. .. </pre>
! Size (in bytes)
| Number of FAT's on the diskette
! Meaning
|-
|-
| 0
| 17 - 18
| 0x00
| <pre> 0010 .. e0 00 .. .. .. .. .. </pre>
| 3
| Number of directory entries (BIOS Parameters)
| The first three bytes EB 3C 90 disassemble to JMP SHORT 3C NOP. (The 3C value may be different.) The reason for this is to jump over the disk format information (the BPB and EBPB). Since the first sector of the disk is loaded into ram at location 0x0000:0x7c00 and executed, without this jump, the processor would attempt to execute data that isn't code. Even for non-bootable volumes, code matching this pattern (or using the E9 jump opcode) is required to be present by both Windows and OS X. To fulfil this requirement, an infinite loop can be placed here with the bytes EB FE 90.
|-
|-
| 3
| 19 - 20
| 0x03
| <pre> 0010 .. .. .. 40 0b .. .. .. </pre>
| 8
| The total sectors in the logical volume
| OEM identifier. This contains the string "EXFAT ". Not to be used for filesystem determination, but it's a nice hint.
|-
|-
| 11
| 21
| 0x0B
| <pre> 0010 .. .. .. .. .. f0 .. .. </pre>
| 53
| Media descriptor type
| Set to zero. This makes sure any FAT driver will not be able to load it.
|-
|-
| 64
| 22 - 23
| 0x40
| <pre> 0010 .. .. .. .. .. .. 09 00 </pre>
| 8
| Number of sectors per FAT
| Partition offset. No idea why the partition itself would have this, but it's here. Might be wrong. Probably best to just ignore.
|-
|-
| 72
| 24 - 25
| 0x48
| <pre> 0018 12 00 .. .. .. .. .. .. </pre>
| 8
| Number of sectors per track
| Volume length.
|-
|-
| 80
| 26 - 27
| 0x50
| <pre> 0018 .. .. 02 00 .. .. .. .. </pre>
| 4
| Number of heads or sides on the diskette
| FAT offset (in sectors) from start of partition.
|-
|-
| 84
| 28 - 31
| 0x54
| <pre> 0018 .. .. .. .. 00 00 00 00 </pre>
| 4
| Number of hidden sectors
| FAT length (in sectors).
|-
|-
| 88
| 32 - 35
| 0x58
| <pre> 0020 00 00 00 00 .. .. .. .. </pre>
| 4
| Large amount of sector on media
| Cluster heap offset (in sectors).
|-
|-
| 92
| 36
| 0x5C
| <pre> 0020 .. .. .. .. 00 .. .. .. </pre>
| 4
| Drive number
| Cluster count
|-
|-
| 96
| 37
| 0x60
| <pre> 0020 .. .. .. .. .. 00 .. .. </pre>
| 4
| Flags
| Root directory cluster. Typically 4 (but just read this value).
|-
|-
| 100
| 38
| 0x64
| <pre> 0020 .. .. .. .. .. .. 29 .. </pre>
| 4
| Signature (must be 0x28 or 0x29)
| Serial number of partition.
|-
|-
| 104
| 39 - 42
| 0x68
| <disk-dependent>
| 2
| VolumeID 'Serial' number (ignore this)
| Filesystem revision
|-
|-
| 106
| 43 - 53
| 0x6A
| <pre> 0028 .. .. .. 20 20 20 20 20 </pre>
| 2
| Volume label,
| Flags
|-
|-
| 108
|
| 0x6C
| <pre> 0030 20 20 20 20 20 20 .. .. </pre>
| 1
| padded with spaces
| Sector shift
|-
|-
| 109
| 54 - 61
| 0x6D
| <pre> 0030 .. .. .. .. .. .. F A </pre>
| 1
| system identifier
| Cluster shift
|-
|-
| 110
|
| 0x6E
| <pre> 0038 T 1 2 20 20 20 .. .. </pre>
| 1
| (padded with space)
| Number of FATs
|-
|-
| 111
| 62-509
| 0x6F
| <pre> 0038 .. .. .. .. .. .. 0e 1f </pre>
| 1
| Start of Bootstrap routine
| Drive select
|-
|-
| 112
|
| 0x70
| <pre> 0040 be 5b 7c ac 22 c0 74 0b </pre>
| 1
| Bootstrap routine (cont'd)
| Percentage in use
|-
|-
| 510
| 113
| 0x71
| <pre> 01f0 .. .. .. .. .. .. 55 aa </pre>
| 7
| BIOS boot Signature (dw 0xAA55)
| Reserved (set to 0).
|}
|}


To read the filesystem, find out how big a 'sector' and a 'cluster' are. A sector is (1 << sectorshift) bytes, a cluster is (1 << (sectorshift + clustershift)) bytes. Then, find the start of the FAT and the start of the cluster heap (note that the first cluster is *still* cluster 2).
==== Reading the Boot Sector ====


<syntaxhighlight lang="C">
''' Bytes (0-2) '''
// This allows you to zero-index clusters:
uint64_t clusterArray = clusterheapoffset * sectorsize - 2 * clustersize;
uint64_t fatOffset = fatoffset * sectorsize;
uint64_t usablespace = clustercount * clustersize;
</syntaxhighlight>


Note that all values in the BPB are now naturally aligned and that this code is *significantly* simpler than FAT32's BPB reading.
The first three bytes EB 3C and 90 disassemble to JMP SHORT 3C NOP. The reason for this is to jump over the disk format information. Since the first sector of the disk is loaded into RAM at location 0x0000:0x7c00 and executed, without this jump, the processor would attempt to execute data that isn't code.


=== File Allocation Table ===
''' Bytes (3 - 10) '''
The File Allocation Table (FAT) is a table stored on the storage media that indicates the status and location of all data clusters that are on the disk. It can be considered the "table of contents" of a disk. The cluster may be available for use, it may be reserved by the operating system, it may be unavailable due to a bad sector on the disk, or it may be in use by a file. The clusters of a file need not be right next to each other on the disk. In fact it is likely that they are scattered widely throughout the disk. The FAT allows the operating system to follow the "chain" of clusters in a file.


==== FAT 12 ====
The first 8 bytes (3 - 10) identify the version of DOS being used. The next eight bytes, 29 3A 63 7E 2D 49 48 and 43, read out the name of the version. The official FAT Specification from Microsoft says that this field is really meaningless and is ignored by MS FAT drivers; however, it does recommend the value "MSWIN4.1", as some 3rd party drivers supposedly check it and expect it to have that value. Older versions of DOS also report MSDOS5.1, and a Linux-formatted floppy will likely carry "mkdosfs" here. If the string is less than 8 bytes, it is padded with zeroes.
FAT 12 uses 12 bits to address the clusters on the disk. Each 12 bit entry in the FAT points to the next cluster of a file on the disk. Given a valid cluster number, here is how you extract the value of the next cluster in the cluster chain:
<syntaxhighlight lang="C">
unsigned char FAT_table[sector_size * 2]; // needs two in case we straddle a sector
unsigned int fat_offset = active_cluster + (active_cluster / 2);// multiply by 1.5
unsigned int fat_sector = first_fat_sector + (fat_offset / sector_size);
unsigned int ent_offset = fat_offset % sector_size;


//at this point you need to read two sectors from disk starting at "fat_sector" into "FAT_table".
''' Bytes (11 - 12) '''


unsigned short table_value = *(unsigned short*)&FAT_table[ent_offset];
The next two Bytes (11 - 12), 00 and 02, is the number of Bytes per sector. The first thing you do when reading a pair of Bytes is reverse them to read 02 00. 0200 is the number of Bytes per sector in hexadecimal or 512 Bytes per sector in decimal.


table_value = (active_cluster & 1) ? table_value >> 4 : table_value & 0xfff;
''' Byte 13 '''


//the variable "table_value" now has the information you need about the next cluster in the chain.
Byte 13 is the number of sectors per allocation (cluster). In this case it is one.
</syntaxhighlight>
If "table_value" is greater than or equal to (>=) 0xFF8 then there are no more clusters in the chain. This means that the whole file has been read. If "table_value" equals (==) 0xFF7 then this cluster has been marked as "bad". "Bad" clusters are prone to errors and should be avoided. If "table_value" is not one of the above cases then it is the cluster number of the next cluster in the file.


The entries under index 0 and 1 are reserved. Index 0 is used as a value in other entries signifying that the given cluster is free, with the corresponding first entry in the table holding the value of the BPB_Media field in its low 8 bits and 0xf in its top 4 bits. For example, if BPB_Media is 0xF8, then the zeroth entry should hold the value 0xFF8. The second entry (index 1) is unused but must hold the value 0xFFF.
''' Bytes (14 - 15) '''


FAT12 uses an entry size that is not evenly divisible by 8 bits. This has some consequences.
These two Bytes, 01 and 00, indicate the number of reserved sectors. Again you must reverse the Bytes to 00 01. There is one reserved sector.


First is storage in the table. Consider successive entries with values 0x123 and 0x456. In the bytes of the table, they'll be stored 0x23 0x61 0x45. Note that if you do little-endian 16-bit loads, you get 0x6123 at offset 0 and 0x4561 at offset 1, letting you recover the original two entry values with the shifts, masks, and offsets seen in the above code block.
''' Byte 16 '''


The second is that, as seen above with the offsets used being 0 and 1, those word bytes might not be 16-bit aligned. That usually just means the x86 takes a slower path to load the word if you do e.g. <tt>*(unsigned short *)bytes</tt>, but if you're using something like UBSan to avoid undefined behavior, those UB-catching routines can be triggered (usually resulting in a panic) if you don't load the two bytes separately and stick them together yourself.
This is the first Byte of the second row and it indicates the number of FAT's on the diskette. There are two.


The third consequence is that the word bytes might not be *sector* aligned. Which means if your code loads a single sector of the table, it needs a special case where it loads two if the entry straddles the sector-size boundary. Or you can just load two sectors every time as seen above.
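
Loading the two bytes separately, as suggested for the alignment issue, might look like the sketch below. It assumes the whole FAT (or at least the bytes covering the entry) is already in memory as a byte array, so the sector-straddling case does not arise; the function name is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Read one 12-bit FAT entry without any unaligned 16-bit load.
   "fat" points at the start of the table in memory. */
uint16_t fat12_get_entry(const uint8_t *fat, uint32_t cluster)
{
    size_t off = cluster + cluster / 2;   /* multiply by 1.5 */
    /* Stick the two bytes together by hand, low byte first. */
    uint16_t word = (uint16_t)(fat[off] | (fat[off + 1] << 8));
    /* Odd entries use the high 12 bits, even entries the low 12 bits. */
    return (cluster & 1) ? (uint16_t)(word >> 4) : (uint16_t)(word & 0x0FFF);
}
```

Applied to the three table bytes 0x23 0x61 0x45 from the storage example, entry 0 reads back as 0x123 and entry 1 as 0x456.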
''' Bytes (17 -18) '''


==== FAT 16 ====
This indicates the number of directory entries. Reversing the Bytes E0 00 to 00 E0 and converting the number to decimal we have 224 directory entries.
FAT 16 uses 16 bits to address the clusters on the disk. Because of this, it is much easier to extract the values out of a 16 bit File Allocation Table. Here is how it is done:
<syntaxhighlight lang="C">
unsigned char FAT_table[sector_size];
unsigned int fat_offset = active_cluster * 2;
unsigned int fat_sector = first_fat_sector + (fat_offset / sector_size);
unsigned int ent_offset = fat_offset % sector_size;


//at this point you need to read from sector "fat_sector" on the disk into "FAT_table".
''' Bytes (19 - 20) '''


unsigned short table_value = *(unsigned short*)&FAT_table[ent_offset];
The total sectors in the logical volume. Reversing 40 and 0B to 0B 40 and converting the number to decimal we have 2880 sectors in the logical volume. If that value is 0, it means there are more than 65535 sectors in the volume, and the actual count is stored in "Large Sectors" (bytes 32-35).


//the variable "table_value" now has the information you need about the next cluster in the chain.
''' Byte 21 '''
</syntaxhighlight>
If "table_value" is greater than or equal to (>=) 0xFFF8 then there are no more clusters in the chain. This means that the whole file has been read. If "table_value" equals (==) 0xFFF7 then this cluster has been marked as "bad". "Bad" clusters are prone to errors and should be avoided. If "table_value" is not one of the above cases then it is the cluster number of the next cluster in the file.
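
Putting those cases together, following a whole FAT16 chain can be sketched as a loop. This example assumes the entire FAT is held in memory as a byte array with "entries" 16-bit entries; the function names and the choice to return 0 on a damaged chain are assumptions made for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Byte-wise load of one 16-bit FAT entry. */
uint16_t fat16_get_entry(const uint8_t *fat, uint16_t cluster)
{
    size_t off = (size_t)cluster * 2;
    return (uint16_t)(fat[off] | (fat[off + 1] << 8));
}

/* Count the clusters in the chain that starts at "start".
   Returns 0 if the chain hits a bad-cluster mark, a free or reserved
   value, an out-of-range cluster, or loops back on itself. */
size_t fat16_chain_length(const uint8_t *fat, size_t entries, uint16_t start)
{
    size_t n = 0;
    uint16_t c = start;
    while (c >= 2 && c < entries && n < entries) {
        uint16_t next = fat16_get_entry(fat, c);
        n++;
        if (next >= 0xFFF8)     /* no more clusters in the chain */
            return n;
        if (next == 0xFFF7)     /* marked as "bad" */
            return 0;
        c = next;               /* cluster number of the next cluster */
    }
    return 0;
}
```

The "n < entries" guard bounds the loop so a corrupted table that contains a cycle cannot make it spin forever.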


The entries under index 0 and 1 are reserved. The zeroth entry is reserved because index 0 is used as the value of other entries to signify that the given cluster is free. The zeroth entry has to hold the value of the BPB_Media field in the low 8 bits, and the rest of the bits have to be set to one. For example, if BPB_Media is 0xF8, then the zeroth entry should hold the value 0xFFF8. The first entry is reserved for the future and must hold the value 0xFFFF.
This Byte (F0) indicates the [http://support.microsoft.com/kb/q140418/ media descriptor type], which is here a 1.44MB floppy.


==== FAT 32 and exFAT ====
''' Bytes (22 - 23) '''
FAT 32 uses 28 bits to address the clusters on the disk. The highest 4 bits are reserved. This means that they should be ignored when read and unchanged when written. exFAT uses the full 32 bits to encode cluster numbers. Reading an entry is similar to the same operation on a 16 bit FAT:
<syntaxhighlight lang="C">
unsigned char FAT_table[sector_size];
unsigned int fat_offset = active_cluster * 4;
unsigned int fat_sector = first_fat_sector + (fat_offset / sector_size);
unsigned int ent_offset = fat_offset % sector_size;


//at this point you need to read from sector "fat_sector" on the disk into "FAT_table".
These two Bytes, 09 and 00, indicate the number of sectors per FAT. Reversing 09 and 00 to 00 09, we see that we have nine sectors per FAT.


//remember to ignore the high 4 bits.
''' Bytes (24 - 25) '''
unsigned int table_value = *(unsigned int*)&FAT_table[ent_offset];
if (fat32) table_value &= 0x0FFFFFFF;


//the variable "table_value" now has the information you need about the next cluster in the chain.
Number of sectors per track. Reversing the Bytes 12 and 00 to 00 12, there are eighteen sectors per track.
</syntaxhighlight>
If "table_value" is greater than or equal to (>=) 0x0FFFFFF8 (or 0xFFFFFFF8 for exFAT) then there are no more clusters in the chain. This means that the whole file has been read. If "table_value" equals (==) 0x0FFFFFF7 (or 0xFFFFFFF7 for exFAT) then this cluster has been marked as "bad". "Bad" clusters are prone to errors and should be avoided. If "table_value" is not one of the above cases then it is the cluster number of the next cluster in the file.


The entries under index 0 and 1 are reserved. The zeroth entry is reserved because index 0 is used as the value of other entries to signify that the given cluster is free. The zeroth entry has to hold the value of the BPB_Media field in the low 8 bits, and the rest of the bits have to be set to one. For example, if BPB_Media is 0xF8, then the zeroth entry should hold the value 0xFFFFFFF8. The first entry is reserved for the future and must hold the value 0xFFFFFFFF.
''' Bytes (26 - 27) '''


Note that on exFAT, some files are not written out into the FAT. In the case that a file is fully contiguous, exFAT allows the operating system to encode this information and not update the FAT for this file. Unlike FAT32 therefore, the FAT table is not used for allocation status of a cluster; instead there is an allocation bitmap to handle that. See below under directory entries for that.
These two Bytes indicate the number of heads or sides on the diskette. Reversing the Bytes 02 and 00 to 00 02, we see that there are two sides to the diskette.


''' Bytes (28 - 29) '''

Number of hidden sectors. Both Bytes read zero, no hidden sectors.

''' Byte 30 '''

Start of bootstrap routine is zero.

=== Directory ===
''ToDo: this information is weak, lacks clarifications about padding (how is A.B exactly encoded), what the times and dates refer to, etc.''
==== Contents ====
{| {{Wikitable}}
|-
! Bytes
! Meaning
|-
| 0 - 10
| File name with extension
|-
| 11
| Attributes of the file
|-
| 12 - 21
| Reserved Bytes
|-
| 22 - 23
| Indicate the time
|-
| 24 - 25
| Indicate the date
|-
| 26 - 27
| Indicate the entry cluster value
|-
| 28 - 31
| Indicate the size of the file
|}

==== Reading the Directory ====

We had a thread where it shows that root directory might not be that simple. It seems like we should read all entries, skipping entries marked as 'volume label' if any. -- PypeClicker

'''Bytes (0 - 10)'''

Starting with the Byte 2620, the first 11 Bytes (0 - 10) are the name of the file with extension. If the 11 byte string is PROCESSATXT, then the 8.3 filename is PROCESSA.TXT since the first 8 bytes of the string comprise the filename and the last 3 are the extension. If the filename is shorter than 8 bytes or the extension is shorter than 3, padding spaces are added, e.g. a file name of LOADER.RC would be encoded simply as <tt>"LOADER&nbsp;&nbsp;RC&nbsp;"</tt> (that's two spaces after LOADER and one after RC).

'''Byte 11'''

This Byte lists the attributes of the file. To read this you must convert the hexadecimal Byte to binary. In this case 20 (hex) is converted to 0010 0000. Each of the eight bits represents an attribute of the file. When a bit is on, indicated by a one, the file has that attribute. Starting with the rightmost bit (bit 0) and working over to the leftmost bit (bit 7), the attributes are: read only, hidden, system file, volume label, sub-directory and archive; the last two bits (bits 6 and 7) are reserved. In this particular file it is the 5th bit that is on, meaning that it is an archive file.

'''Bytes (12 - 21)'''

These are the reserved Bytes.

'''Bytes (22 - 23)'''

These two Bytes, 4E and 7B, indicate the time the file was made. To retrieve the time reverse the Bytes to 7B 4E and convert to binary 0111 1011 0100 1110. The hour is read from the first five bits, the minutes are read from the next six bits, and the seconds are read from the last five bits. So our time Bytes are read like this: 01111 011010 01110. Reading the Bytes, the hour is 15, the minutes are 26, and the seconds are 14. Important: the seconds must be multiplied by 2 to get the true second reading. So the time that the file was created was 15:26:28 military time or 3:26:28 PM.

'''Bytes (24 - 25)'''

These two Bytes, 96 and 26, indicate the date the file was made. To retrieve the date reverse the Bytes to 26 96 and convert to binary 0010 0110 1001 0110. The year is read from the first seven bits, the month is read from the next four bits, and the day is read from the last five bits. So our date Bytes are read like this: 0010011 0100 10110. Reading the Bytes, the year is 19, the month is 4, and the day is 22. 1980 must be added to the year number to get the correct year the file was made. So the date that the file was made was April 22, 1999.

'''Bytes (26 - 27)'''

These two Bytes, 02 and 00, indicate the entry cluster value for both the FAT and the Open Space Area. More about this in the last two sections (File Allocation Table Entry Cluster Values and Location of File in the Open Space Area).

=== Directories on FAT12/16/32 ===
A directory entry simply stores the information needed to know where a file's data or a folder's children are stored on the disk. It also holds information such as the entry's name, size, and creation time. There are two types of directory entries in a FAT file system: standard 8.3 directory entries, which appear on all FAT file systems, and Long File Name directory entries, which are optionally present to allow for longer file names.

==== Standard 8.3 format ====
{| {{Wikitable}}
|-
! Offset (in bytes)
! Length (in bytes)
! Meaning
|-
| 0
| 11
| 8.3 file name. The first 8 characters are the name and the last 3 are the extension.
|-
| 11
| 1
| Attributes of the file. The possible attributes are: <pre>READ_ONLY=0x01 HIDDEN=0x02 SYSTEM=0x04 VOLUME_ID=0x08 DIRECTORY=0x10 ARCHIVE=0x20 LFN=READ_ONLY|HIDDEN|SYSTEM|VOLUME_ID </pre> (LFN means that this entry is a [[#Long_File_Names|long file name entry]])
|-
| 12
| 1
| Reserved for use by Windows NT.
|-
| 13
| 1
| Creation time in hundredths of a second, although the official FAT Specification from Microsoft says it is tenths of a second. Range 0-199 inclusive. Based on simple tests, Ubuntu16.10 stores either 0 or 100 while Windows7 stores 0-199 in this field.
|-
| 14
| 2
| The time that the file was created. Multiply Seconds by 2.
{| {{Wikitable}}
|-
| Hour
| 5 bits
|-
| Minutes
| 6 bits
|-
| Seconds
| 5 bits
|}
|-
| 16
| 2
| The date on which the file was created.
{| {{Wikitable}}
|-
| Year
| 7 bits
|-
| Month
| 4 bits
|-
| Day
| 5 bits
|}
|-
| 18
| 2
| Last accessed date. Same format as the creation date.
|-
| 20
| 2
| The high 16 bits of this entry's first cluster number. For FAT 12 and FAT 16 this is always zero.
|-
| 22
| 2
| Last modification time. Same format as the creation time.
|-
| 24
| 2
| Last modification date. Same format as the creation date.
|-
| 26
| 2
| The low 16 bits of this entry's first cluster number. Use this number to find the first cluster for this entry.
|-
| 28
| 4
| The size of the file in bytes.
|}
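As a rough sketch of how the name, time and date fields of an 8.3 entry might be decoded, the helpers below follow the bit layout described above (5/6/5 bits for the time, 7/4/5 bits for the date, space padding for the name). The struct and function names are invented for this example; the inputs are assumed to have already been read as little-endian 16-bit words.

```c
#include <stdint.h>

/* Hypothetical helper types, not part of any FAT API. */
struct fat_time { unsigned hour, minute, second; };
struct fat_date { unsigned year, month, day; };

/* Time word: bits 15-11 hour, bits 10-5 minutes, bits 4-0 seconds / 2. */
struct fat_time fat_decode_time(uint16_t raw)
{
    struct fat_time t;
    t.hour   = (raw >> 11) & 0x1F;
    t.minute = (raw >> 5) & 0x3F;
    t.second = (raw & 0x1F) * 2;
    return t;
}

/* Date word: bits 15-9 years since 1980, bits 8-5 month, bits 4-0 day. */
struct fat_date fat_decode_date(uint16_t raw)
{
    struct fat_date d;
    d.year  = 1980 + ((raw >> 9) & 0x7F);
    d.month = (raw >> 5) & 0x0F;
    d.day   = raw & 0x1F;
    return d;
}

/* Turn the 11-byte, space-padded name field into "NAME.EXT".
   out must hold at least 13 bytes (8 + '.' + 3 + NUL). */
void fat_format_83(const unsigned char raw[11], char out[13])
{
    int n = 0;
    for (int i = 0; i < 8 && raw[i] != ' '; i++)
        out[n++] = (char)raw[i];
    if (raw[8] != ' ') {            /* only emit the dot if there is an extension */
        out[n++] = '.';
        for (int i = 8; i < 11 && raw[i] != ' '; i++)
            out[n++] = (char)raw[i];
    }
    out[n] = '\0';
}
```

With the example bytes used on this page, the time word 0x7B4E decodes to 15:26:28 and the date word 0x2696 to 22 April 1999. The name helper ignores special cases such as a first byte of 0x05 or 0xE5.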


==== Long File Names ====
Long file name entries ''always'' have a regular 8.3 entry to which they belong. The long file name entries are always placed immediately before their 8.3 entry. Here is the format of a long file name entry.
{| {{Wikitable}}
|-
! Offset (in bytes)
! Length (in bytes)
! Meaning
|-
| 0
| 1
| The order of this entry in the sequence of long file name entries. This value helps you to know where in the file's name the characters from this entry should be placed.
|-
| 1
| 10
| The first 5, 2-byte characters of this entry.
|-
| 11
| 1
| Attribute. Always equals 0x0F. (the long file name attribute)
|-
| 12
| 1
| Long entry type. Zero for name entries.
|-
| 13
| 1
| Checksum generated of the short file name when the file was created. The short filename can change without changing the long filename in cases where the partition is mounted on a system which does not support long filenames.
|-
| 14
| 12
| The next 6, 2-byte characters of this entry.
|-
| 26
| 2
| Always zero.
|-
| 28
| 4
| The final 2, 2-byte characters of this entry.
|}
Here is an example of what a regular 8.3 entry with one long file name entry preceding it might look like in a hex editor:
<pre>
41 62 00 69 00 6E 00 00 00 FF FF 0F 00 7F FF FF FF FF FF FF FF FF FF FF FF FF 00 00 FF FF FF FF
42 49 4E 20 20 20 20 20 20 20 20 10 00 00 F7 01 D5 38 D5 38 00 00 F7 01 D5 38 03 00 00 00 00 00
</pre>
And in a text editor:
<pre>
Ab.i.n..........................
BIN ......8.8.....8......
</pre>
The first line is the long file name entry (the second line is the regular 8.3 entry). The very first byte (41) tells us two important pieces of information. First, the one (01) tells us that this is the first long file name entry for the regular 8.3 entry. Second the forty (40) part tells us that this is also the ''last'' long file name entry for this regular 8.3 entry. The next 10 bytes spell out the first part of the long file name. In this case they read:
<pre>
b 00 i 00 n 00 00 00 FF FF
</pre>
Notice that each character is two bytes long and that the name is null terminated. The two FF's at the end are the padding at the end of the long file name. This is also what the other FF's in the long file name entry are.
The final important thing to notice about the long file name entry is its attribute byte at offset 11. The 0x0F attribute allows us to verify that this is indeed a long file name entry.
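The checksum byte at offset 13 ties a long file name entry to its 8.3 entry. A minimal sketch of the computation, following the rotate-and-add scheme given in Microsoft's FAT specification, is shown below; the function name <tt>lfn_checksum</tt> is chosen for this example only.

```c
#include <stdint.h>

/* Checksum over the 11 raw bytes of the short (8.3) name field. */
uint8_t lfn_checksum(const uint8_t short_name[11])
{
    uint8_t sum = 0;
    for (int i = 0; i < 11; i++) {
        /* rotate the running sum right by one bit, then add the next byte */
        sum = (uint8_t)(((sum & 1) << 7) + (sum >> 1) + short_name[i]);
    }
    return sum;
}
```

For the hex dump above, the short name is "BIN" padded with eight spaces, and the result matches the 7F stored at offset 13 of the long file name entry. A driver can compare the two values and discard long name entries whose checksum does not match.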


'''Bytes (28 - 31)'''

These four Bytes 1B, 0C, 00, 00 indicate the size of the file. Reversing the Bytes to 00 00 0C 1B and converting the number to decimal, the size of the file is 3099 Bytes.

=== Directories on exFAT ===
exFAT redesigned these directory entries from the ground up.

{| {{Wikitable}}
|-
! Offset (in bytes)
! Length (in bytes)
! Meaning
|-
| 0
| 1
| Entry type
|-
| 1
| 31
| Rest of entry.
|}

The base for every entry is that they are all still 32 bytes, and they all start with the type in the first byte. The types I've encountered that are relevant for reading files from disk:

==== File entry ====
{| {{Wikitable}}
|-
! Offset (in bytes)
! Length (in bytes)
! Meaning
|-
| 0
| 1
| Entry type = 0x85
|-
| 1
| 1
| Count of secondary entries.
|-
| 2
| 2
| Checksum of entry set
|-
| 4
| 2
| File attributes
|-
| 6
| 2
| Reserved
|-
| 8
| 4
| Creation date and time
|-
| 12
| 4
| Modification date and time
|-
| 16
| 4
| Access date and time
|-
| 20
| 1
| Creation time in hundredths of a second (0-199) to be added to the FAT style date/time for more accuracy. See FAT12 entry for format of date/time.
|-
| 21
| 1
| Modification time in hundredths of a second (0-199).
|-
| 22
| 1
| UTC offset for creation time
|-
| 23
| 1
| UTC offset for modification time
|-
| 24
| 1
| UTC offset for access time
|-
| 25
| 7
| Reserved.
|}

==== Stream "extension" entry ====
It's called an extension, but it's 100% required to exist directly after the "file" entry.

{| {{Wikitable}}
|-
! Offset (in bytes)
! Length (in bytes)
! Meaning
|-
| 0
| 1
| Entry type = 0xC0
|-
| 1
| 1
| Secondary flags
|-
| 2
| 1
| Reserved
|-
| 3
| 1
| Name length
|-
| 4
| 2
| Name hash
|-
| 6
| 2
| Reserved
|-
| 8
| 8
| Valid data length. When writing large files, exFAT allocates the whole file first, and then incrementally updates this as data is written. According to Microsoft's exFAT specification, reads beyond this length (but within the data length) should return zeros.
|-
| 16
| 4
| Reserved
|-
| 20
| 4
| First cluster.
|-
| 24
| 8
| Data length.
|}

==== File name entry ====
{| {{Wikitable}}
|-
! Offset (in bytes)
! Length (in bytes)
! Meaning
|-
| 0
| 1
| Entry type = 0xC1
|-
| 1
| 1
| Flags
|-
| 2
| 30
| File name characters (15 UTF16 code units).
|}

To actually use these, they typically come in the order:

* File entry
* Stream extension entry
* File name entry
* (Additional file name entries)

The file entry has the file metadata info, the stream extension tells you how it's stored and the file name entries tell you what it's called. There is no 8.3 name any more.

When reading the file, the second bit (bit 1, mask 0x02) in the stream extension secondary flags indicates whether the file is stored as an extent or whether you need to use the FAT. If it is set, the file is contiguous and the FAT is not up to date; if it is clear, the FAT is accurate and needs to be used (but could still say the file is contiguous).

=== Finding the Beginning of the Boot, FAT, Directory, and Open Space ===
====Boot Sector====

As stated in the introduction, the Boot Sector is always placed in logical sector number (LSN) 0, 0000.

====File Allocation Table (FAT)====

The File Allocation Table begins after the Boot Sector. To find the starting Byte, find the length of the Boot Sector, which is one sector multiplied by the number of Bytes per sector (Bytes 11 and 12 of the Boot Sector). The File Allocation Table begins at 0200 (hex).

====Directory====

The Directory begins after both the Boot Sector and the File Allocation Tables. To find the starting Byte, find the number of File Allocation Tables on the diskette (Byte 16 of the Boot Sector) and multiply this number by the number of sectors per FAT (Bytes 22 and 23 of the Boot Sector). Add the number of Boot Sectors (which is one) to give you the total number of sectors of both the FAT and Boot sectors. Multiply the total number of sectors by the number of Bytes per sector (Bytes 11 and 12 of the Boot Sector), giving you the starting Byte of the Directory.

(2 * 9) + 1 = 19 sectors; 19 sectors * 512 Bytes/sector = 9728 Bytes (decimal) or 2600 Bytes (hex). The start of the Directory is 2600.

====Open Space====

The Open Space begins after the directory. To find the beginning of the Open Space you need to find the size of the directory in Bytes and add that to the beginning Byte of the Directory (2600). To find the size of the directory, multiply the number of directory entries (Bytes 17 and 18 of the Boot Sector) by the Bytes per directory entry, which is 32 (decimal).

224 directory entries * 32 Bytes/directory entry = 7168 Bytes (decimal) or 1C00 (hex)

1C00 Bytes (hex) + 2600 Bytes (hex) = 4200 Bytes (hex). The start of the Open Space is 4200.

=== File Allocation Table Entry Cluster Values ===


Starting with the entry cluster value (Bytes 26 and 27 in the Directory), find the values (02 and 00) and reverse them to read 00 02 (hex). The result being **2 (hex or decimal).


Because this result, the value 2, is the same in hexadecimal and decimal, converting to decimal is not necessary; just remember that the next step is done in decimal. Multiply this number 2 by 1.5, giving the number 3 (decimal) or 3 (hex). Now, go to the File Allocation Table and retrieve the 3rd (0203) and 4th (0204) Bytes. Remember to start your counting from zero. Take the two Bytes 03 and 40 and reverse them to 40 03. Because 3 is a whole integer, AND the binary value of the hexadecimal number 4003 (0100 0000 0000 0011) with the binary value of the hexadecimal number 0FFF (0000 1111 1111 1111). The result is **3 (hex or decimal).

Convert the result above to decimal (if necessary) and multiply by 1.5, giving 4.5. So now we extract the 4th (0204) and 5th (0205) numbers from the File Allocation Table, which are 40 and 00. Reverse the hexadecimal numbers to read 00 40. Because 4.5 is not a whole integer we right shift 0040 by four bits to read 0004. The result is **4 (hex or decimal).

Multiply 4 (decimal) by 1.5, giving 6. Now we go to the 6th (0206) and 7th (0207) numbers in the File Allocation Table, which are 05 and 60. Reverse the numbers to read 60 05. Because 6 is a whole integer, AND the binary value of the hexadecimal number 6005 (0110 0000 0000 0101) with the binary value of the hexadecimal number 0FFF (0000 1111 1111 1111). The result is **5 (hex or decimal).

Multiply 5 (decimal) by 1.5, giving 7.5. Now we read the 7th (0207) and 8th (0208) numbers in the File Allocation Table, which are 60 and 00. Reversing the numbers we have 00 60. Because 7.5 is a fractional number we right shift 0060 by four bits to read 0006. The result is **6 (hex or decimal).

Multiply 6 (decimal) by 1.5, giving 9. Read the 9th (0209) and 10th (020A) numbers in the File Allocation Table, which are 07 and 80. Reverse the numbers to read 80 07. Because 9 is a whole integer, AND the binary value of the hexadecimal number 8007 (1000 0000 0000 0111) with the binary value of the hexadecimal number 0FFF (0000 1111 1111 1111). The result is **7 (hex or decimal).

Take the decimal result above and multiply by 1.5, giving 10.5. Read the 10th (020A) and 11th (020B) Bytes in the File Allocation Table; the numbers are 80 and 00. Reversing the numbers we have 00 80. Because 10.5 is not a whole integer we right shift 0080 by four bits to 0008. The result is **8 (hex or decimal).

'' This number in decimal form is also used to calculate the Location of File in Open Space Area (Next Section). ''

Multiply 8 (decimal) by 1.5, giving 12. Now extract the 12th (020C) and 13th (020D) Bytes in the File Allocation Table. These numbers are FF 0F. Reverse the numbers to read 0F FF. This value 0FFF (hex) indicates the end of this file.

=== Location of File in Open Space Area ===

To find the location of the file in the Open Space Area take the decimal results of the File Allocation Table entry cluster values, as denoted by the double asterisks, and subtract 2. Then multiply by the number of Bytes per sector, which is indicated in Bytes 11 and 12 in the Boot Sector. In this case the Bytes per sector value is 512 (decimal). Finally take this value in Bytes, convert it to hexadecimal, and add it onto the starting location of the Open Space Area, which in this case is 4200.

(2 - 2) sectors * 512 Bytes per sector = 0 Bytes (decimal); 0 Bytes (hex) + 4200 Bytes = 4200. Entry value of the first cluster is 4200.

(3 - 2) sectors * 512 Bytes per sector = 512 Bytes (decimal); 200 Bytes (hex) + 4200 Bytes = 4400. Entry value of the second cluster is 4400.

(4 - 2) sectors * 512 Bytes per sector = 1024 Bytes (decimal); 400 Bytes (hex) + 4200 Bytes = 4600. Entry value of the third cluster is 4600.

(5 - 2) sectors * 512 Bytes per sector = 1536 Bytes (decimal); 600 Bytes (hex) + 4200 Bytes = 4800. Entry value of the fourth cluster is 4800.

(6 - 2) sectors * 512 Bytes per sector = 2048 Bytes (decimal); 800 Bytes (hex) + 4200 Bytes = 4A00. Entry value of the fifth cluster is 4A00.

(7 - 2) sectors * 512 Bytes per sector = 2560 Bytes (decimal); A00 Bytes (hex) + 4200 Bytes = 4C00. Entry value of the sixth cluster is 4C00.

(8 - 2) sectors * 512 Bytes per sector = 3072 Bytes (decimal); C00 Bytes (hex) + 4200 Bytes = 4E00. Entry value of the seventh cluster is 4E00.

== Programming Guide ==
This section is intended to give you information about common functions that are performed on a FAT file system.

=== Reading the Boot Sector ===
The Boot Sector is always placed at logical sector number zero. You can either read the boot sector into an array and access its members that way or you can read it into a structure and access it through the structure. Either way, the values in the boot sector need to be readily available in order to do much of anything with a FAT file system.

Here is an example of some boot sector structures in C.
<syntaxhighlight lang="C">
typedef struct fat_extBS_32
{
	//extended fat32 stuff
	unsigned int		table_size_32;
	unsigned short		extended_flags;
	unsigned short		fat_version;
	unsigned int		root_cluster;
	unsigned short		fat_info;
	unsigned short		backup_BS_sector;
	unsigned char 		reserved_0[12];
	unsigned char		drive_number;
	unsigned char 		reserved_1;
	unsigned char		boot_signature;
	unsigned int 		volume_id;
	unsigned char		volume_label[11];
	unsigned char		fat_type_label[8];

}__attribute__((packed)) fat_extBS_32_t;

typedef struct fat_extBS_16
{
	//extended fat12 and fat16 stuff
	unsigned char		bios_drive_num;
	unsigned char		reserved1;
	unsigned char		boot_signature;
	unsigned int		volume_id;
	unsigned char		volume_label[11];
	unsigned char		fat_type_label[8];

}__attribute__((packed)) fat_extBS_16_t;

typedef struct fat_BS
{
	unsigned char 		bootjmp[3];
	unsigned char 		oem_name[8];
	unsigned short 	        bytes_per_sector;
	unsigned char		sectors_per_cluster;
	unsigned short		reserved_sector_count;
	unsigned char		table_count;
	unsigned short		root_entry_count;
	unsigned short		total_sectors_16;
	unsigned char		media_type;
	unsigned short		table_size_16;
	unsigned short		sectors_per_track;
	unsigned short		head_side_count;
	unsigned int 		hidden_sector_count;
	unsigned int 		total_sectors_32;

	//this will be cast to its specific type once the driver actually knows what type of FAT this is.
	unsigned char		extended_section[54];

}__attribute__((packed)) fat_BS_t;
</syntaxhighlight>
Important pieces of information that can be extracted from the boot sector include:

'''Total sectors in volume (including VBR):'''
<syntaxhighlight lang="C">
total_sectors = (fat_boot->total_sectors_16 == 0)? fat_boot->total_sectors_32 : fat_boot->total_sectors_16;
</syntaxhighlight>

'''FAT size in sectors:'''
<syntaxhighlight lang="C">
fat_size = (fat_boot->table_size_16 == 0)? fat_boot_ext_32->table_size_32 : fat_boot->table_size_16;
</syntaxhighlight>

'''The size of the root directory (unless you have FAT32, in which case the size will be 0):'''
<syntaxhighlight lang="C">
root_dir_sectors = ((fat_boot->root_entry_count * 32) + (fat_boot->bytes_per_sector - 1)) / fat_boot->bytes_per_sector;
</syntaxhighlight>
This calculation will round up. 32 is the size of a FAT directory entry in bytes.

'''The first data sector (that is, the first sector in which directories and files may be stored):'''
<syntaxhighlight lang="C">
first_data_sector = fat_boot->reserved_sector_count + (fat_boot->table_count * fat_size) + root_dir_sectors;
</syntaxhighlight>

'''The first sector in the File Allocation Table:'''
<syntaxhighlight lang="C">
first_fat_sector = fat_boot->reserved_sector_count;
</syntaxhighlight>

'''The total number of data sectors:'''
<syntaxhighlight lang="C">
data_sectors = total_sectors - (fat_boot->reserved_sector_count + (fat_boot->table_count * fat_size) + root_dir_sectors);
</syntaxhighlight>

'''The total number of clusters:'''
<syntaxhighlight lang="C">
total_clusters = data_sectors / fat_boot->sectors_per_cluster;
</syntaxhighlight>
This rounds down.

'''The FAT type of this file system:'''
<syntaxhighlight lang="C">
if (fat_boot->bytes_per_sector == 0) //the BPB fields are zeroed on exFAT
{
fat_type = ExFAT;
}
else if(total_clusters < 4085)
{
fat_type = FAT12;
}
else if(total_clusters < 65525)
{
fat_type = FAT16;
}
else
{
fat_type = FAT32;
}
</syntaxhighlight>
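The individual formulas above can be combined into one routine. The following is only a sketch for FAT12/16/32 (exFAT is left out): the <tt>fat_geometry_in</tt> and <tt>fat_layout</tt> structures and both function names are hypothetical, and the inputs are assumed to have been extracted from the boot sector already, with <tt>total_sectors</tt> and <tt>fat_size</tt> chosen from their 16-bit or 32-bit fields as shown earlier.

```c
#include <stdint.h>

enum fat_type { FAT12, FAT16, FAT32 };

/* Hypothetical, already-decoded subset of the BPB fields used above. */
struct fat_geometry_in {
    uint32_t bytes_per_sector, sectors_per_cluster, reserved_sector_count;
    uint32_t table_count, root_entry_count, total_sectors, fat_size;
};

struct fat_layout {
    uint32_t root_dir_sectors, first_fat_sector, first_root_dir_sector;
    uint32_t first_data_sector, data_sectors, total_clusters;
    enum fat_type type;
};

struct fat_layout fat_compute_layout(const struct fat_geometry_in *g)
{
    struct fat_layout l;
    /* 32 bytes per directory entry, rounded up to whole sectors */
    l.root_dir_sectors = (g->root_entry_count * 32 + g->bytes_per_sector - 1)
                         / g->bytes_per_sector;
    l.first_fat_sector = g->reserved_sector_count;
    l.first_root_dir_sector = l.first_fat_sector + g->table_count * g->fat_size;
    l.first_data_sector = l.first_root_dir_sector + l.root_dir_sectors;
    l.data_sectors = g->total_sectors - l.first_data_sector;
    l.total_clusters = l.data_sectors / g->sectors_per_cluster; /* rounds down */
    if (l.total_clusters < 4085)       l.type = FAT12;
    else if (l.total_clusters < 65525) l.type = FAT16;
    else                               l.type = FAT32;
    return l;
}

/* First sector of a data cluster, relative to the start of the partition. */
uint32_t fat_first_sector_of_cluster(const struct fat_layout *l,
                                     uint32_t sectors_per_cluster,
                                     uint32_t cluster)
{
    return (cluster - 2) * sectors_per_cluster + l->first_data_sector;
}
```

For a floppy with 512-byte sectors, one reserved sector, two FATs of nine sectors each and 224 root entries (and, as an assumption for this example, 2880 total sectors), this places the root directory at sector 19 and the data area at sector 33, and classifies the volume as FAT12.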


=== Reading Directories ===
The first step in reading directories is finding and reading the root directory. On FAT 12 and FAT 16 volumes the root directory is at a fixed position immediately after the File Allocation Tables:
<syntaxhighlight lang="C">
first_root_dir_sector = first_data_sector - root_dir_sectors;
</syntaxhighlight>

In FAT32 and exFAT, the root directory lives in the data area, starting at the cluster given in the boot sector, and can be a cluster chain. In exFAT it cannot be encoded as an extent and will always be present in the FAT.
<syntaxhighlight lang="C">
root_cluster_32 = extBS_32->root_cluster;
</syntaxhighlight>

For each given cluster number we can calculate the first sector of it (relative to the partition's offset):
<syntaxhighlight lang="C">
first_sector_of_cluster = ((cluster - 2) * fat_boot->sectors_per_cluster) + first_data_sector;
</syntaxhighlight>

After the correct cluster has been loaded into memory, the next step is to read and parse all of the entries in it. Each entry is 32 bytes long. For each 32 byte entry this is the flow of execution:
# If the first byte of the entry is equal to 0 then there are no more files/directories in this directory. FirstByte==0, finish. FirstByte!=0, goto 2.
# If the first byte of the entry is equal to 0xE5 then the entry is unused. FirstByte==0xE5, goto 8, FirstByte!=0xE5, goto 3.
# Is this entry a long file name entry? If the byte at offset 11 of the entry equals 0x0F, then it is a long file name entry. Otherwise, it is not. Byte11==0x0F, goto 4. Byte11!=0x0F, goto 5.
# Read the portion of the long filename into a temporary buffer. Goto 8.
# Parse the data for this entry using the table from further up on this page. It would be a good idea to save the data for later. Possibly in a virtual file system structure. goto 6
# Is there a long file name in the temporary buffer? Yes, goto 7. No, goto 8
# Apply the long file name to the entry that you just read and clear the temporary buffer. goto 8
# Increment pointers and/or counters and check the next entry. (goto number 1)
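The flow above might be sketched as follows for a single cluster that is already in memory. This is an illustration under simplifying assumptions: it only records where each 8.3 entry sits and how many long file name entries preceded it, leaving the actual name assembly to the caller. <tt>fat_scan_directory</tt> and <tt>fat_dir_hit</tt> are names invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define FAT_DIRENT_SIZE 32
#define FAT_ATTR_LFN    0x0F

struct fat_dir_hit {
    size_t   offset;     /* byte offset of the 8.3 entry in the buffer */
    unsigned lfn_count;  /* long file name entries directly before it */
};

/* Scan one buffer of directory entries. Returns the number of 8.3 entries
   stored in out; *end_seen is set to 1 if the end marker (first byte 0)
   was found, so the caller knows not to follow the cluster chain further. */
size_t fat_scan_directory(const uint8_t *buf, size_t len,
                          struct fat_dir_hit *out, size_t max_out,
                          int *end_seen)
{
    size_t found = 0;
    unsigned lfn_count = 0;
    *end_seen = 0;
    for (size_t off = 0; off + FAT_DIRENT_SIZE <= len; off += FAT_DIRENT_SIZE) {
        const uint8_t *e = buf + off;
        if (e[0] == 0x00) { *end_seen = 1; break; }            /* step 1 */
        if (e[0] == 0xE5) { lfn_count = 0; continue; }         /* step 2 */
        if (e[11] == FAT_ATTR_LFN) { lfn_count++; continue; }  /* steps 3-4 */
        if (found < max_out) {                                 /* steps 5-7 */
            out[found].offset = off;
            out[found].lfn_count = lfn_count;
            found++;
        }
        lfn_count = 0;                                         /* step 8 */
    }
    return found;
}
```

A real driver would also have to carry the pending long file name state across cluster boundaries, which this sketch does not attempt.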

This process should be repeated until all of the entries have been read from the cluster. You should then check to see if there is another cluster following this one in the cluster chain or if this is the last cluster in the chain. See the [[#Following_Cluster_Chains|section below]] and the [[#File_Allocation_Table|FAT]] section for more information. You should do the above process for each cluster in the chain, following it until there are no more clusters left in the chain. Then you can check if any of the entries that you just read are directories. If they are, they should each be read in the same way, starting with their first cluster number, which is stored in the entry.

=== Following Cluster Chains ===
There are two basic steps to following cluster chains. The first step is to find out if there is another "link" (cluster) following the current one in the chain. The second step is to actually use the value read from the FAT to read the next sector. Here is the basic idea:
# Extract the value from the FAT for the ''current'' cluster. (Use the previous section on the File Allocation Table for details on how exactly to extract the value.) goto number 2
# Is this cluster marked as the last cluster in the chain? (again, see the above section for more details) Yes, goto number 4. No, goto number 3
# Read the cluster represented by the extracted value and return for more directory parsing.
# The end of the cluster chain has been found. Our work here is finished. :)
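As a sketch of these steps for FAT32, the function below walks a chain in a FAT that has been loaded into memory as an array of 32-bit entries (an assumption made for brevity; a real driver would usually read the FAT sector by sector). The name <tt>fat32_collect_chain</tt> is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FAT32_EOC_MIN 0x0FFFFFF8u  /* this value and above end the chain */
#define FAT32_BAD     0x0FFFFFF7u  /* bad cluster marker */

/* Collect the cluster numbers of a chain starting at `first`.
   Returns the number of clusters stored in out, or (size_t)-1 if the chain
   hits a free, bad or out-of-range cluster or does not fit in max_out. */
size_t fat32_collect_chain(const uint32_t *fat, size_t fat_entries,
                           uint32_t first, uint32_t *out, size_t max_out)
{
    size_t n = 0;
    uint32_t cluster = first;
    for (;;) {
        if (cluster < 2 || cluster >= fat_entries || n == max_out)
            return (size_t)-1;
        out[n++] = cluster;
        uint32_t next = fat[cluster] & 0x0FFFFFFF;  /* ignore the high 4 bits */
        if (next >= FAT32_EOC_MIN)
            return n;                               /* end of chain reached */
        if (next == FAT32_BAD)
            return (size_t)-1;
        cluster = next;
    }
}
```

The <tt>max_out</tt> limit doubles as a guard against looping forever on a corrupted FAT whose chain points back at itself.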

=== Reading extents ===
On exFAT, files can have a bit set in their flags that indicates the file is stored as an extent. This means that the whole file is contiguous, and that the file size plus the first cluster indicate where the (whole) file is. The FAT entries will contain garbage and are not to be trusted.

To read this, do the same calculation as above, except you may in every step assume that the next cluster is the numerically next cluster and that enough sectors have been allocated for the file size.
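A minimal sketch of that lookup is shown below. It assumes the secondary flags, first cluster and data length have already been read from the stream extension entry; the function name and the convention of returning 0 for "not available" are choices made for this example.

```c
#include <stdint.h>

#define EXFAT_FLAG_NO_FAT_CHAIN 0x02  /* bit 1 of the stream extension's secondary flags */

/* Cluster that holds byte `offset` of an extent-based (contiguous) file.
   Returns 0 if the file is not extent-based (the FAT must be walked instead)
   or if the offset lies past the end of the file. */
uint32_t exfat_extent_cluster(uint8_t secondary_flags, uint32_t first_cluster,
                              uint64_t data_length, uint32_t bytes_per_cluster,
                              uint64_t offset)
{
    if (!(secondary_flags & EXFAT_FLAG_NO_FAT_CHAIN))
        return 0;
    if (offset >= data_length)
        return 0;
    /* clusters are numerically consecutive, so no FAT lookup is needed */
    return first_cluster + (uint32_t)(offset / bytes_per_cluster);
}
```

Returning 0 is safe as a sentinel here because valid data clusters start at 2.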

== Creating a fresh FAT filesystem ==
Typically during development you want to create a disk image with a FAT filesystem. There are two common approaches for this, either by using a utility that works directly on images, or by using a [[Loopback Device]] and using the OS' own driver to work on the image. A less common alternative is to have an actual disk in your drive.

The most buildscript-friendly tool is [[MTools]] - which can do all operations directly on a disk image using the -i argument and supplies every DOS command related to files in this fashion, only prefixed with an <tt>m</tt>. It can also use a configuration file to access drives in their DOS fashion, allowing you to use for instance A: and C: as actual drives. The tool can be built out of the box for Windows and is included in many a linux package manager.

Linux-only developers can, often with a bit of sudo and permission magic, automate the [[Loopback Device]] in combination with <tt>mkdosfs</tt> or <tt>mkfs.vfat</tt> as well as partition editing. This method is less portable as the commands often can't be reused outside of Linux. Several developers also make the error of passing -F to mkdosfs in an attempt to choose a FAT size, which often has the effect of creating a corrupt filesystem since the result doesn't follow [[#Reading_the_Boot_Sector|the official rule for FAT sizes anymore]].

Windows users can make use of [[Virtual Floppy Drive|VFD]] for loopback devices. It comes with a GUI, but at the cost of not being properly automatable in a script.

==See Also==

=== Threads ===
* from raw bits to directory listing (code posted) in the [[Topic:11247|forum]]
* Public Domain FAT32 code in the [[Topic:13993|forum]]
* FAT12/FAT16 bootsector code in the [[Topic:21155|forum]]


=== External Links ===
* [http://www.osdever.net/downloads/docs/fatgen103.zip FAT32 File System Specification] - from Cottontail OS Development Library
* [http://download.microsoft.com/download/1/6/1/161ba512-40e2-4cc9-843a-923143f3456c/fatgen103.doc FAT32 File System Specification] - from Microsoft (documentation, but in tutorial style with many code examples)
* [http://board.flatassembler.net/topic.php?t=12680 About an error in the above specification]
* http://scottie.20m.com/fat.htm
* http://www.maverick-os.dk/FileSystemFormats/FAT12_FileSystem.html
* http://www.pjrc.com/tech/8051/ide/fat32.html
* [http://web.archive.org/web/20170112194555/http://www.viralpatel.net/taj/tutorial/fat.php Intro into Sectors and Addressing]
* http://elm-chan.org/fsw/ff/00index_e.html - simple (V)FAT12/16/32 read/write library with good documentation
* http://gitorious.org/unix-stuff/fat-util - Utility to read, remove and extract files on FAT12, 16 and 32
* http://www.larwe.com/zws/products/dosfs/index.html - fat12/16/32 compatible fs driver
* http://www.isdaman.com/alsos/protocols/fats/nowhere/FAT.HTM


[[Category:Filesystems]]
[[de:FAT]]

Latest revision as of 05:14, 9 June 2024


The File Allocation Table (FAT) was the native file system of MS-DOS. FAT was originally introduced by Marc McDonald in Stand-alone Disk BASIC with 8-bit FAT entries and 16 byte directory entries. The better known FAT12 variant, with 12-bit FAT entries and 32 byte directory entries, was introduced with DOS. FAT is a very simple file system -- nothing more than a singly-linked list of clusters in a gigantic table. A FAT file system uses very little memory (unless the OS caches the whole allocation table in memory) and is one of, if not the, most basic file system in use today.

Overview

There are several different versions of the FAT file system. Each version was designed for a different size of storage media.

FAT 12

FAT 12 was designed for floppy disks and can manage a maximum size of 16 megabytes because it uses 12 bits to address the clusters.

FAT 16

FAT 16 was designed for early hard disks and could handle a maximum size of 64K clusters * the cluster size. The larger the hard disk, the larger the cluster size would be, which leads to large amounts of "slack space" on the disk.

FAT 32

FAT 32 was introduced to us by Windows95-B and Windows98. FAT32 solved some of FAT's problems. No more 64K max clusters! Although FAT32 uses 32 bits per FAT entry, only the bottom 28 bits are actually used to address clusters on the disk (top 4 bits are reserved). With 28 bits per FAT entry, the filesystem can address a maximum of about 270 million clusters in a partition. This enables very large hard disks to still maintain reasonably small cluster sizes and thus reduce slack space between files.

ExFAT

Main article: ExFAT

ExFAT is the filesystem used on SDXC cards, created by Microsoft. It is FAT32 with actually 32 bits per FAT entry, with the ability to indicate a file is fully consecutive on disk (allowing you to skip reading the FAT), some more advanced features and a fully redesigned file entry system. Since it's so similar to FAT32, please merge any bits of info from the exFAT article into this one.

Microsoft has published the official specification at https://docs.microsoft.com/en-us/windows/win32/fileio/exfat-specification .

VFAT

VFAT is an extension to the FAT file system that has the ability to use long filenames (up to 255 characters). First introduced by Windows 95, it uses a "kludge" whereby long filename entries are marked with an attribute combination that includes "volume label" (0x0F), and the name is subsequently stored in chunks of 13 two-byte characters in sequential directory entries. (This is a bit of an oversimplification, but close enough.)

Implementation Details

The FAT file system views the storage media as a flat array of clusters. If the physical media does not address its data as a flat list of sectors (really old hard disks and floppy disks) then the cluster numbers will need to be translated before being sent to the disk. The storage media is organized into three basic areas.

  • The boot record
  • The File Allocation Table (FAT)
  • The directory and data area

Boot Record

The boot record occupies one sector, and is always placed in logical sector number zero of the "partition". If the media is not divided into partitions, then this is the beginning of the media. This is the easiest sector on the partition for the computer to locate when it is loaded. If the storage media is partitioned (such as a hard disk), then the beginning of the actual media contains an MBR (x86) or other form of partition information. In this case each partition's first sector holds a Volume Boot Record.

BPB (BIOS Parameter Block)

The boot record contains both code and data, mixed together. The data that isn't code is known as the BPB.

Offset (decimal) Offset (hex) Size (in bytes) Meaning
0 0x00 3 The first three bytes EB 3C 90 disassemble to JMP SHORT 3C NOP. (The 3C value may be different.) The reason for this is to jump over the disk format information (the BPB and EBPB). Since the first sector of the disk is loaded into ram at location 0x0000:0x7c00 and executed, without this jump, the processor would attempt to execute data that isn't code. Even for non-bootable volumes, code matching this pattern (or using the E9 jump opcode) is required to be present by both Windows and OS X. To fulfil this requirement, an infinite loop can be placed here with the bytes EB FE 90.
3 0x03 8 OEM identifier. These 8 bytes (offsets 3 to 10) nominally name the version of DOS that formatted the volume; for example, the bytes 29 3A 63 7E 2D 49 48 43 are one value that may be found here. The official FAT Specification from Microsoft says that this field is really meaningless and is ignored by MS FAT drivers; however, it does recommend the value "MSWIN4.1" as some 3rd party drivers supposedly check it and expect it to have that value. Older versions of DOS also report MSDOS5.1, a Linux-formatted floppy is likely to carry "mkdosfs" here, and FreeDOS-formatted disks have been observed to have "FRDOS5.1" here. If the string is less than 8 bytes, it is padded with spaces.
11 0x0B 2 The number of Bytes per sector (remember, all numbers are in the little-endian format).
13 0x0D 1 Number of sectors per cluster.
14 0x0E 2 Number of reserved sectors. The boot record sectors are included in this value.
16 0x10 1 Number of File Allocation Tables (FAT's) on the storage media. Often this value is 2.
17 0x11 2 Number of root directory entries (must be set so that the root directory occupies entire sectors).
19 0x13 2 The total sectors in the logical volume. If this value is 0, it means there are more than 65535 sectors in the volume, and the actual count is stored in the Large Sector Count entry at 0x20.
21 0x15 1 This Byte indicates the media descriptor type.
22 0x16 2 Number of sectors per FAT. FAT12/FAT16 only.
24 0x18 2 Number of sectors per track.
26 0x1A 2 Number of heads or sides on the storage media.
28 0x1C 4 Number of hidden sectors. (i.e. the LBA of the beginning of the partition.)
32 0x20 4 Large sector count. This field is set if there are more than 65535 sectors in the volume, resulting in a value which does not fit in the Number of Sectors entry at 0x13.

Note: the "geometry" of the media (sectors per track, heads, and perhaps the number of bytes in a sector) is not necessarily known correctly by the program that originally formats the media. Also, if the media is moved (from the computer that formatted it) to another machine with a different BIOS -- then the new BIOS may specify a different geometry for the same media. So it is generally a very bad idea to trust the "SPT" or "heads" numbers. Get them from the BIOS instead, if possible.

Note2: many of the values in the BPB are not correctly "aligned". That is, word-sized values are not stored on word ("even" address) boundaries. On some architectures, accessing misaligned words may cause the code to crash. Making a copy of the BPB (somewhere else in memory and shifted up one byte) may solve the problem.
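
One way to sidestep the alignment problem entirely is to assemble each value from individual bytes instead of casting pointers. The following is a minimal sketch, not a definitive implementation: the helper names rd16, rd32 and bpb_parse_basic and the struct bpb_basic are made up for illustration, and the offsets are the ones from the BPB table above.

```c
#include <stdint.h>

/* Read a little-endian 16-bit value byte by byte, so alignment never matters. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Same idea for a little-endian 32-bit value. */
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical holder for a handful of BPB fields. */
struct bpb_basic {
    uint16_t bytes_per_sector;      /* offset 11 */
    uint8_t  sectors_per_cluster;   /* offset 13 */
    uint16_t reserved_sector_count; /* offset 14 */
    uint8_t  table_count;           /* offset 16 */
    uint32_t hidden_sector_count;   /* offset 28 */
};

/* Pull the fields out of a raw boot sector buffer. */
static struct bpb_basic bpb_parse_basic(const uint8_t *sector)
{
    struct bpb_basic b;
    b.bytes_per_sector      = rd16(sector + 11);
    b.sectors_per_cluster   = sector[13];
    b.reserved_sector_count = rd16(sector + 14);
    b.table_count           = sector[16];
    b.hidden_sector_count   = rd32(sector + 28);
    return b;
}
```

Because every access is a single-byte load, this works the same way on architectures that fault on misaligned words and on big-endian machines.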

Extended Boot Record

The extended boot record information comes right after the BPB. The data at the beginning is known as the EBPB. It contains different information depending on whether this partition is a FAT 12, FAT 16, or FAT 32 filesystem. Immediately following the EBPB is the actual boot code, then the standard 0xAA55 boot signature, to fill out the 512-byte boot sector. Offsets shown are from the start of the standard boot record.

FAT 12 and FAT 16
Offset (decimal) Offset (hexadecimal) Length (in bytes) Meaning
36 0x024 1 Drive number. The value here should be identical to the value returned by BIOS interrupt 0x13, or passed in the DL register; i.e. 0x00 for a floppy disk and 0x80 for hard disks. This number is useless because the media is likely to be moved to another machine and inserted in a drive with a different drive number.
37 0x025 1 Flags in Windows NT. Reserved otherwise.
38 0x026 1 Signature (must be 0x28 or 0x29).
39 0x027 4 VolumeID 'Serial' number. Used for tracking volumes between computers. You can ignore this if you want.
43 0x02B 11 Volume label string. This field is padded with spaces.
54 0x036 8 System identifier string. This field is a string representation of the FAT file system type. It is padded with spaces. The spec says never to trust the contents of this string for any use.
62 0x03E 448 Boot code.
510 0x1FE 2 Bootable partition signature 0xAA55.
FAT 32
Offset (decimal) Offset (hexadecimal) Length (in bytes) Meaning
36 0x024 4 Sectors per FAT. The size of the FAT in sectors.
40 0x028 2 Flags.
42 0x02A 2 FAT version number. The high byte is the major version and the low byte is the minor version. FAT drivers should respect this field.
44 0x02C 4 The cluster number of the root directory. Often this field is set to 2.
48 0x030 2 The sector number of the FSInfo structure.
50 0x032 2 The sector number of the backup boot sector.
52 0x034 12 Reserved. When the volume is formatted these bytes should be zero.
64 0x040 1 Drive number. The values here are identical to the values returned by the BIOS interrupt 0x13. 0x00 for a floppy disk and 0x80 for hard disks.
65 0x041 1 Flags in Windows NT. Reserved otherwise.
66 0x042 1 Signature (must be 0x28 or 0x29).
67 0x043 4 Volume ID 'Serial' number. Used for tracking volumes between computers. You can ignore this if you want.
71 0x047 11 Volume label string. This field is padded with spaces.
82 0x052 8 System identifier string. Always "FAT32   ". The spec says never to trust the contents of this string for any use.
90 0x05A 420 Boot code.
510 0x1FE 2 Bootable partition signature 0xAA55.


FSInfo Structure (FAT32 only)

Offset (decimal) Offset (hexadecimal) Length (in bytes) Meaning
0 0x0 4 Lead signature (must be 0x41615252 to indicate a valid FSInfo structure)
4 0x4 480 Reserved, these bytes should never be used
484 0x1E4 4 Another signature (must be 0x61417272)
488 0x1E8 4 Contains the last known free cluster count on the volume. If the value is 0xFFFFFFFF, then the free count is unknown and must be computed. However, this value might be incorrect and should at least be range checked (<= volume cluster count)
492 0x1EC 4 Indicates the cluster number at which the filesystem driver should start looking for available clusters. If the value is 0xFFFFFFFF, then there is no hint and the driver should start searching at 2. Typically this value is set to the last allocated cluster number. As with the previous field, this value should be range checked.
496 0x1F0 12 Reserved
508 0x1FC 4 Trail signature (0xAA550000)
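
The signature and range checks described in the table can be sketched as follows. This is an illustrative sketch only: fsinfo_parse, struct fsinfo_hints and the cluster_count parameter (the number of data clusters on the volume) are hypothetical names, and the assumption that valid cluster numbers run from 2 to cluster_count + 1 follows from clusters being numbered from 2.

```c
#include <stdint.h>
#include <stdbool.h>

/* Little-endian 32-bit read that is safe at any offset. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical result type: the two hints plus whether each can be trusted. */
struct fsinfo_hints {
    bool     free_count_known;
    uint32_t free_count;
    bool     next_free_known;
    uint32_t next_free;   /* falls back to 2 when there is no usable hint */
};

/* Returns false if any of the three signatures does not match. */
static bool fsinfo_parse(const uint8_t sector[512], uint32_t cluster_count,
                         struct fsinfo_hints *out)
{
    if (le32(sector + 0)   != 0x41615252u) return false; /* lead signature */
    if (le32(sector + 484) != 0x61417272u) return false; /* second signature */
    if (le32(sector + 508) != 0xAA550000u) return false; /* trail signature */

    uint32_t free_count = le32(sector + 488);
    uint32_t next_free  = le32(sector + 492);

    /* 0xFFFFFFFF means "unknown"; anything out of range is treated the same. */
    out->free_count_known = free_count != 0xFFFFFFFFu &&
                            free_count <= cluster_count;
    out->free_count = out->free_count_known ? free_count : 0;

    out->next_free_known = next_free != 0xFFFFFFFFu &&
                           next_free >= 2 &&
                           next_free <= cluster_count + 1;
    out->next_free = out->next_free_known ? next_free : 2;
    return true;
}
```

When a hint is rejected, the driver still has to compute the real value itself, for example by scanning the FAT for free entries.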

exFAT boot record

For exFAT the whole boot record was recreated from scratch instead of extending the existing FAT12/16/32 boot records even further. You can recognize exFAT by noticing that in the FAT12/16/32 boot record, the "bytes per sector" is zero.

Offset (decimal) Offset (hex) Size (in bytes) Meaning
0 0x00 3 The first three bytes EB 3C 90 disassemble to JMP SHORT 3C NOP. (The 3C value may be different.) The reason for this is to jump over the disk format information (the BPB and EBPB). Since the first sector of the disk is loaded into ram at location 0x0000:0x7c00 and executed, without this jump, the processor would attempt to execute data that isn't code. Even for non-bootable volumes, code matching this pattern (or using the E9 jump opcode) is required to be present by both Windows and OS X. To fulfil this requirement, an infinite loop can be placed here with the bytes EB FE 90.
3 0x03 8 OEM identifier. This contains the string "EXFAT" padded with three spaces to 8 bytes. Not to be used for filesystem determination, but it's a nice hint.
11 0x0B 53 Set to zero. This makes sure any FAT driver will not be able to load it.
64 0x40 8 Partition offset. No idea why the partition itself would have this, but it's here. Might be wrong. Probably best to just ignore.
72 0x48 8 Volume length.
80 0x50 4 FAT offset (in sectors) from start of partition.
84 0x54 4 FAT length (in sectors).
88 0x58 4 Cluster heap offset (in sectors).
92 0x5C 4 Cluster count
96 0x60 4 Root directory cluster. Typically 4 (but just read this value).
100 0x64 4 Serial number of partition.
104 0x68 2 Filesystem revision
106 0x6A 2 Flags
108 0x6C 1 Sector shift
109 0x6D 1 Cluster shift
110 0x6E 1 Number of FATs
111 0x6F 1 Drive select
112 0x70 1 Percentage in use
113 0x71 7 Reserved (set to 0).

To read the filesystem, find out how big a 'sector' and a 'cluster' are. A sector is (1 << sectorshift) bytes, a cluster is (1 << (sectorshift + clustershift)) bytes. Then, find the start of the FAT and the start of the cluster heap (note that the first cluster is *still* cluster 2).

    
// This allows you to zero-index clusters:
uint64_t clusterArray = clusterheapoffset * sectorsize - 2 * clustersize;
uint64_t fatOffset = fatoffset * sectorsize;
uint64_t usablespace = clustercount * clustersize;

Note that all values in the BPB are now naturally aligned and that this code is *significantly* simpler than FAT32's BPB reading.

File Allocation Table

The File Allocation Table (FAT) is a table stored on the storage media that indicates the status and location of all data clusters that are on the disk. It can be considered the "table of contents" of a disk. The cluster may be available for use, it may be reserved by the operating system, it may be unavailable due to a bad sector on the disk, or it may be in use by a file. The clusters of a file need not be right next to each other on the disk. In fact it is likely that they are scattered widely throughout the disk. The FAT allows the operating system to follow the "chain" of clusters in a file.

FAT 12

FAT 12 uses 12 bits to address the clusters on the disk. Each 12 bit entry in the FAT points to the next cluster of a file on the disk. Given a valid cluster number, here is how you extract the value of the next cluster in the cluster chain:

unsigned char FAT_table[sector_size * 2]; // needs two in case we straddle a sector
unsigned int fat_offset = active_cluster + (active_cluster / 2);// multiply by 1.5
unsigned int fat_sector = first_fat_sector + (fat_offset / sector_size);
unsigned int ent_offset = fat_offset % sector_size;

//at this point you need to read two sectors from disk starting at "fat_sector" into "FAT_table".

unsigned short table_value = *(unsigned short*)&FAT_table[ent_offset];

table_value = (active_cluster & 1) ? table_value >> 4 : table_value & 0xfff;

//the variable "table_value" now has the information you need about the next cluster in the chain.

If "table_value" is greater than or equal to (>=) 0xFF8 then there are no more clusters in the chain. This means that the whole file has been read. If "table_value" equals (==) 0xFF7 then this cluster has been marked as "bad". "Bad" clusters are prone to errors and should be avoided. If "table_value" is not one of the above cases then it is the cluster number of the next cluster in the file.

The entries at index 0 and 1 are reserved. Index 0 cannot name a data cluster, because the value 0 in any other entry signifies that the given cluster is free. Entry 0 itself holds the value of the BPB_Media field in its low 8 bits and 0xF in its top 4 bits. For example, if BPB_Media is 0xF8, then entry 0 should hold the value 0xFF8. Entry 1 is unused but must hold the value 0xFFF.

FAT12 uses an entry size that is not evenly divisible by 8 bits. This has some consequences.

First is storage in the table. Consider successive entries with values 0x123 and 0x456. In the bytes of the table, they'll be stored 0x23 0x61 0x45. Note that if you do little-endian 16-bit loads, you get 0x6123 at offset 0 and 0x4561 at offset 1, letting you recover the original two entry values with the shifts, masks, and offsets seen in the above code block.

The second is that, as seen above with the offsets used being 0 and 1, those word bytes might not be 16-bit aligned. That usually just means the x86 takes a slower path to load the word if you do e.g. *(unsigned short *)bytes, but if you use something like UBSan to avoid undefined behavior, those UB-catching routines can be triggered (usually resulting in a panic) if you don't load the two bytes separately and stick them together yourself.

The third consequence is that the word bytes might not be *sector* aligned. This means that if your code loads a single sector of the table, it needs a special case where it loads two if the entry straddles the sector-size boundary. Alternatively, you can just load two sectors every time, as seen above.
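
The three consequences above mostly disappear if the FAT (or the relevant part of it) is held in memory as a plain byte array and each entry is assembled from two separate byte loads. Below is a minimal sketch under that assumption; fat12_get and fat12_set are hypothetical helper names, and fat is assumed to point at the first byte of the table.

```c
#include <stdint.h>
#include <stddef.h>

/* Read the 12-bit entry for `cluster` from an in-memory FAT. */
static uint16_t fat12_get(const uint8_t *fat, uint32_t cluster)
{
    size_t off = cluster + cluster / 2;              /* cluster * 1.5 */
    /* Two single-byte loads: no alignment requirement, no pointer cast. */
    uint16_t word = (uint16_t)(fat[off] | (fat[off + 1] << 8));
    return (cluster & 1) ? (uint16_t)(word >> 4) : (uint16_t)(word & 0x0FFF);
}

/* Write a 12-bit value without disturbing the neighbouring entry. */
static void fat12_set(uint8_t *fat, uint32_t cluster, uint16_t value)
{
    size_t off = cluster + cluster / 2;
    if (cluster & 1) {
        /* Odd entry: high nibble of the first byte plus the whole second byte. */
        fat[off]     = (uint8_t)((fat[off] & 0x0F) | ((value << 4) & 0xF0));
        fat[off + 1] = (uint8_t)((value >> 4) & 0xFF);
    } else {
        /* Even entry: whole first byte plus low nibble of the second byte. */
        fat[off]     = (uint8_t)(value & 0xFF);
        fat[off + 1] = (uint8_t)((fat[off + 1] & 0xF0) | ((value >> 8) & 0x0F));
    }
}
```

With the bytes 0x23 0x61 0x45 from the example above, fat12_get returns 0x123 for entry 0 and 0x456 for entry 1.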

FAT 16

FAT 16 uses 16 bits to address the clusters on the disk. Because of this, it is much easier to extract the values out of a 16 bit File Allocation Table. Here is how it is done:

unsigned char FAT_table[sector_size];
unsigned int fat_offset = active_cluster * 2;
unsigned int fat_sector = first_fat_sector + (fat_offset / sector_size);
unsigned int ent_offset = fat_offset % sector_size;

//at this point you need to read from sector "fat_sector" on the disk into "FAT_table".

unsigned short table_value = *(unsigned short*)&FAT_table[ent_offset];

//the variable "table_value" now has the information you need about the next cluster in the chain.

If "table_value" is greater than or equal to (>=) 0xFFF8 then there are no more clusters in the chain. This means that the whole file has been read. If "table_value" equals (==) 0xFFF7 then this cluster has been marked as "bad". "Bad" clusters are prone to errors and should be avoided. If "table_value" is not one of the above cases then it is the cluster number of the next cluster in the file.

The entries at index 0 and 1 are reserved. Entry 0 is reserved because the value 0 in any other entry signifies that the given cluster is free. Entry 0 has to hold the value of the BPB_Media field in its low 8 bits, and the rest of the bits have to be set to one. For example, if BPB_Media is 0xF8, then entry 0 should hold the value 0xFFF8. Entry 1 is reserved for future use and must hold the value 0xFFFF.

FAT 32 and exFAT

FAT 32 uses 28 bits to address the clusters on the disk. The highest 4 bits are reserved, which means that they should be ignored when read and left unchanged when written. exFAT uses the full 32 bits to encode cluster numbers. Reading an entry works much like the same operation on a 16 bit FAT:

unsigned char FAT_table[sector_size];
unsigned int fat_offset = active_cluster * 4;
unsigned int fat_sector = first_fat_sector + (fat_offset / sector_size);
unsigned int ent_offset = fat_offset % sector_size;

//at this point you need to read from sector "fat_sector" on the disk into "FAT_table".

//remember to ignore the high 4 bits.
unsigned int table_value = *(unsigned int*)&FAT_table[ent_offset];
if (fat32) table_value &= 0x0FFFFFFF;

//the variable "table_value" now has the information you need about the next cluster in the chain.

If "table_value" is greater than or equal to (>=) 0x0FFFFFF8 (or 0xFFFFFFF8 for exFAT) then there are no more clusters in the chain. This means that the whole file has been read. If "table_value" equals (==) 0x0FFFFFF7 (or 0xFFFFFFF7 for exFAT) then this cluster has been marked as "bad". "Bad" clusters are prone to errors and should be avoided. If "table_value" is not one of the above cases then it is the cluster number of the next cluster in the file.

The entries at index 0 and 1 are reserved. Entry 0 is reserved because the value 0 in any other entry signifies that the given cluster is free. Entry 0 has to hold the value of the BPB_Media field in its low 8 bits, and the rest of the bits have to be set to one. For example, if BPB_Media is 0xF8, then entry 0 should hold the value 0x0FFFFFF8 on FAT 32 (ignoring the reserved top 4 bits) or 0xFFFFFFF8 on exFAT. Entry 1 is reserved for future use and must hold the value 0x0FFFFFFF (0xFFFFFFFF on exFAT).
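
The end-of-chain and bad-cluster thresholds for the different FAT widths follow one pattern, so they can be folded into a single helper. The sketch below is illustrative only: the enum and function names are made up, it implements just the rules stated in this article, and on exFAT a value of 0 does not reliably mean "free" because allocation is tracked in the bitmap.

```c
#include <stdint.h>

enum fat_variant { FAT_VARIANT_12, FAT_VARIANT_16, FAT_VARIANT_32, FAT_VARIANT_EXFAT };

enum fat_entry_class {
    FAT_ENTRY_FREE,      /* value 0 */
    FAT_ENTRY_RESERVED,  /* value 1, never a valid data cluster */
    FAT_ENTRY_NEXT,      /* value is the next cluster in the chain */
    FAT_ENTRY_BAD,       /* ...FF7 */
    FAT_ENTRY_END        /* ...FF8 and above */
};

/* Classify a raw FAT entry value according to the FAT width in use. */
static enum fat_entry_class fat_classify(enum fat_variant variant, uint32_t value)
{
    uint32_t mask;
    switch (variant) {
    case FAT_VARIANT_12: mask = 0x00000FFFu; break;
    case FAT_VARIANT_16: mask = 0x0000FFFFu; break;
    case FAT_VARIANT_32: mask = 0x0FFFFFFFu; break; /* top 4 bits reserved */
    default:             mask = 0xFFFFFFFFu; break; /* exFAT uses all 32 bits */
    }
    value &= mask;

    if (value == 0) return FAT_ENTRY_FREE;
    if (value == 1) return FAT_ENTRY_RESERVED;
    if (value >= (mask & 0xFFFFFFF8u)) return FAT_ENTRY_END;
    if (value == (mask & 0xFFFFFFF7u)) return FAT_ENTRY_BAD;
    return FAT_ENTRY_NEXT;
}
```

A caller would typically stop reading on FAT_ENTRY_END and treat FAT_ENTRY_FREE, FAT_ENTRY_RESERVED or FAT_ENTRY_BAD in the middle of a chain as a sign of corruption.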

Note that on exFAT, some files are not written out into the FAT. In the case that a file is fully contiguous, exFAT allows the operating system to encode this information and not update the FAT for this file. Unlike FAT32 therefore, the FAT table is not used for allocation status of a cluster; instead there is an allocation bitmap to handle that. See below under directory entries for that.

Directories on FAT12/16/32

A directory entry simply stores the information needed to know where a file's data or a folder's children are stored on the disk. It also holds information such as the entry's name, size, and creation time. There are two types of directory entries in a FAT file system: standard 8.3 directory entries, which appear on all FAT file systems, and Long File Name directory entries, which are optionally present to allow for longer file names.

Standard 8.3 format

Offset (in bytes) Length (in bytes) Meaning
0 11 8.3 file name. The first 8 characters are the name and the last 3 are the extension.
11 1 Attributes of the file. The possible attributes are:
READ_ONLY=0x01 HIDDEN=0x02 SYSTEM=0x04 VOLUME_ID=0x08 DIRECTORY=0x10 ARCHIVE=0x20 LFN=READ_ONLY|HIDDEN|SYSTEM|VOLUME_ID 
(LFN means that this entry is a long file name entry)
12 1 Reserved for use by Windows NT.
13 1 Creation time in hundredths of a second, although the official FAT Specification from Microsoft says it is tenths of a second. Range 0-199 inclusive. Based on simple tests, Ubuntu16.10 stores either 0 or 100 while Windows7 stores 0-199 in this field.
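14 2 The time that the file was created. The 16-bit value is laid out, from the most significant bit down, as:
Hour 5 bits (bits 15-11)
Minutes 6 bits (bits 10-5)
Seconds 5 bits (bits 4-0), stored in units of 2 seconds, so multiply the stored value by 2.
16 2 The date on which the file was created. The 16-bit value is laid out, from the most significant bit down, as:
Year 7 bits (bits 15-9), counted from 1980
Month 4 bits (bits 8-5)
Day 5 bits (bits 4-0)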
18 2 Last accessed date. Same format as the creation date.
20 2 The high 16 bits of this entry's first cluster number. For FAT 12 and FAT 16 this is always zero.
22 2 Last modification time. Same format as the creation time.
24 2 Last modification date. Same format as the creation date.
26 2 The low 16 bits of this entry's first cluster number. Use this number to find the first cluster for this entry.
28 4 The size of the file in bytes.
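
The packed time and date fields can be unpacked with a few shifts and masks. The following sketch assumes the usual layout with hours and years in the most significant bits and years counted from 1980; struct fat_datetime and fat_decode_datetime are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical unpacked form of a FAT date/time pair. */
struct fat_datetime {
    uint16_t year;    /* full year, e.g. 2008 */
    uint8_t  month;   /* 1-12 */
    uint8_t  day;     /* 1-31 */
    uint8_t  hour;    /* 0-23 */
    uint8_t  minute;  /* 0-59 */
    uint8_t  second;  /* 0-58, always even: stored with 2-second resolution */
};

/* Decode the 16-bit date and time words of a directory entry. */
static struct fat_datetime fat_decode_datetime(uint16_t date, uint16_t time)
{
    struct fat_datetime dt;
    dt.year   = (uint16_t)(1980 + (date >> 9));   /* 7 bits, years since 1980 */
    dt.month  = (uint8_t)((date >> 5) & 0x0F);    /* 4 bits */
    dt.day    = (uint8_t)(date & 0x1F);           /* 5 bits */
    dt.hour   = (uint8_t)(time >> 11);            /* 5 bits */
    dt.minute = (uint8_t)((time >> 5) & 0x3F);    /* 6 bits */
    dt.second = (uint8_t)((time & 0x1F) * 2);     /* 5 bits, times 2 */
    return dt;
}
```

The hundredths-of-a-second field at offset 13 can be added on top of the creation time to recover odd seconds and sub-second precision.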

Long File Names

Long file name entries always have a regular 8.3 entry to which they belong. The long file name entries are always placed immediately before their 8.3 entry. Here is the format of a long file name entry.

Offset (in bytes) Length (in bytes) Meaning
0 1 The order of this entry in the sequence of long file name entries. This value helps you to know where in the file's name the characters from this entry should be placed.
1 10 The first 5, 2-byte characters of this entry.
11 1 Attribute. Always equals 0x0F. (the long file name attribute)
12 1 Long entry type. Zero for name entries.
13 1 Checksum generated of the short file name when the file was created. The short filename can change without changing the long filename in cases where the partition is mounted on a system which does not support long filenames.
14 12 The next 6, 2-byte characters of this entry.
26 2 Always zero.
28 4 The final 2, 2-byte characters of this entry.
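
The 13 characters of one long file name entry are spread over three separate fields, so a small offset table makes them easier to collect. This is a sketch under the assumption that the sequence number occupies the low bits of the order byte and that 0x40 only flags the last entry; lfn_entry_chars and lfn_entry_position are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

/* Byte offsets of the 13 two-byte characters inside a 32-byte LFN entry. */
static const uint8_t lfn_char_offsets[13] = {
    1, 3, 5, 7, 9,          /* first 5 characters  */
    14, 16, 18, 20, 22, 24, /* next 6 characters   */
    28, 30                  /* final 2 characters  */
};

/* Copy the characters of one LFN entry into `out`, stopping at the 0x0000
 * terminator. Returns the number of characters copied (0 to 13). */
static size_t lfn_entry_chars(const uint8_t entry[32], uint16_t out[13])
{
    size_t n = 0;
    for (size_t i = 0; i < 13; i++) {
        const uint8_t *p = entry + lfn_char_offsets[i];
        uint16_t ch = (uint16_t)(p[0] | (p[1] << 8));
        if (ch == 0x0000)
            break;          /* end of name; the rest is 0xFFFF padding */
        out[n++] = ch;
    }
    return n;
}

/* Index in the full name where this entry's first character belongs.
 * The caller should first check that the sequence number is at least 1. */
static size_t lfn_entry_position(uint8_t order)
{
    return (size_t)((order & 0x3F) - 1) * 13;
}
```

Since the entries are stored on disk with the highest sequence number first, writing each piece at lfn_entry_position() into a buffer reassembles the name regardless of the order in which the entries are visited.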

Here is an example of what a regular 8.3 entry with one long file name entry preceding it might look like in a hex editor:

41 62 00 69 00 6E 00 00 00 FF FF 0F 00 7F FF FF FF FF FF FF FF FF FF FF FF FF 00 00 FF FF FF FF
42 49 4E 20 20 20 20 20 20 20 20 10 00 00 F7 01 D5 38 D5 38 00 00 F7 01 D5 38 03 00 00 00 00 00

And in a text editor:

Ab.i.n..........................
BIN        ......8.8.....8......

The first line is the long file name entry (the second line is the regular 8.3 entry). The very first byte (41) tells us two important pieces of information. First, the one (01) tells us that this is the first long file name entry for the regular 8.3 entry. Second the forty (40) part tells us that this is also the last long file name entry for this regular 8.3 entry. The next 10 bytes spell out the first part of the long file name. In this case they read:

b 00 i 00 n 00 00 00 FF FF

Notice that each character is two bytes long and that the name is null terminated. The two FF's at the end are the padding at the end of the long file name. This is also what the other FF's in the long file name entry are. The final important thing to notice about the long file name entry is its attribute byte at offset 11. The 0x0F attribute allows us to verify that this is indeed a long file name entry.
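
The checksum byte at offset 13 (7F in the example) ties the long file name entry to its 8.3 entry. This article does not spell out the algorithm; the sketch below uses the commonly documented one, a rotate-right-by-one followed by an add over the 11 bytes of the short name. The function name lfn_checksum is made up for illustration.

```c
#include <stdint.h>

/* Compute the long-file-name checksum over the 11-byte 8.3 name
 * (8 name bytes followed by 3 extension bytes, space padded, no dot). */
static uint8_t lfn_checksum(const uint8_t short_name[11])
{
    uint8_t sum = 0;
    for (int i = 0; i < 11; i++) {
        /* Rotate the running sum right by one bit, then add the next byte. */
        sum = (uint8_t)(((sum & 1) << 7) + (sum >> 1) + short_name[i]);
    }
    return sum;
}
```

For the short name "BIN" padded with eight spaces, as in the example above, this yields 0x7F, matching the checksum byte in the long file name entry. If the stored checksum does not match the 8.3 entry that follows, the long name is stale and should be ignored.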

Directories on exFAT

exFAT redesigned these directory entries from the ground up.

Offset (in bytes) Length (in bytes) Meaning
0 1 Entry type
1 31 Rest of entry.

The base for every entry is that they are all still 32 bytes, and they all start with the type in the first byte. The types I've encountered that are relevant for reading files from disk:

File entry

Offset (in bytes) Length (in bytes) Meaning
0 1 Entry type = 0x85
1 1 Count of secondary entries.
2 2 Checksum of entry set
4 2 File attributes
6 2 Reserved
8 4 Creation date and time
12 4 Modification date and time
16 4 Access date and time
20 1 Creation time in hundredths of a second (0-199) to be added to the FAT style date/time for more accuracy. See FAT12 entry for format of date/time.
21 1 Modification time in hundredths of a second (0-199).
22 1 UTC offset for creation time
23 1 UTC offset for modification time
24 1 UTC offset for access time
25 7 Reserved.

Stream "extension" entry

It's called an extension, but it's 100% required to exist directly after the "file" entry.

Offset (in bytes) Length (in bytes) Meaning
0 1 Entry type = 0xC0
1 1 Secondary flags
2 1 Reserved
3 1 Name length
4 2 Name hash
6 2 Reserved
8 8 Valid data length. When writing large files, exFAT allocates the whole file first, and then incrementally updates this as data is written. It can therefore be smaller than the data length; bytes beyond the valid data length have not been written yet and should be read as zeros.
16 4 Reserved
20 4 First cluster.
24 8 Data length.

File name entry

Offset (in bytes) Length (in bytes) Meaning
0 1 Entry type = 0xC1
1 1 flags
2 30 File name characters (15 UTF16 code units).

To actually use these, they typically come in the order:

- File entry - Stream extension entry - File name entry - (Additional file name entries)

The file entry has the file metadata info, the stream extension tells you how it's stored and the file name entries tell you what it's called. There is no 8.3 name any more.

When reading the file, the second bit (mask 0x02) of the stream extension's secondary flags indicates whether the file is stored as an extent or whether you need to use the FAT. If it is set, the file is contiguous and the FAT is not up to date; if it is clear, the FAT is accurate and needs to be used (although the file could still happen to be contiguous).


Programming Guide

This section is intended to give you information about common functions that are performed on a FAT file system.

Reading the Boot Sector

The Boot Sector is always placed at logical sector number zero. You can either read the boot sector into an array and access its members that way, or you can read it into a structure and access it through the structure. Either way, the values in the boot sector need to be readily available in order to do much of anything with a FAT file system.

Here is an example of some boot sector structures in C.

typedef struct fat_extBS_32
{
	//extended fat32 stuff
	unsigned int		table_size_32;
	unsigned short		extended_flags;
	unsigned short		fat_version;
	unsigned int		root_cluster;
	unsigned short		fat_info;
	unsigned short		backup_BS_sector;
	unsigned char 		reserved_0[12];
	unsigned char		drive_number;
	unsigned char 		reserved_1;
	unsigned char		boot_signature;
	unsigned int 		volume_id;
	unsigned char		volume_label[11];
	unsigned char		fat_type_label[8];

}__attribute__((packed)) fat_extBS_32_t;

typedef struct fat_extBS_16
{
	//extended fat12 and fat16 stuff
	unsigned char		bios_drive_num;
	unsigned char		reserved1;
	unsigned char		boot_signature;
	unsigned int		volume_id;
	unsigned char		volume_label[11];
	unsigned char		fat_type_label[8];
	
}__attribute__((packed)) fat_extBS_16_t;

typedef struct fat_BS
{
	unsigned char 		bootjmp[3];
	unsigned char 		oem_name[8];
	unsigned short 	        bytes_per_sector;
	unsigned char		sectors_per_cluster;
	unsigned short		reserved_sector_count;
	unsigned char		table_count;
	unsigned short		root_entry_count;
	unsigned short		total_sectors_16;
	unsigned char		media_type;
	unsigned short		table_size_16;
	unsigned short		sectors_per_track;
	unsigned short		head_side_count;
	unsigned int 		hidden_sector_count;
	unsigned int 		total_sectors_32;
	
	//this will be cast to its specific type once the driver actually knows what type of FAT this is.
	unsigned char		extended_section[54];
	
}__attribute__((packed)) fat_BS_t;

Important pieces of information that can be extracted from the boot sector include:

Total sectors in volume (including VBR):

total_sectors = (fat_boot->total_sectors_16 == 0)? fat_boot->total_sectors_32 : fat_boot->total_sectors_16;

FAT size in sectors:

fat_size = (fat_boot->table_size_16 == 0)? fat_boot_ext_32->table_size_32 : fat_boot->table_size_16;

The size of the root directory (unless you have FAT32, in which case the size will be 0):

root_dir_sectors = ((fat_boot->root_entry_count * 32) + (fat_boot->bytes_per_sector - 1)) / fat_boot->bytes_per_sector;

This calculation will round up. 32 is the size of a FAT directory entry in bytes.


The first data sector (that is, the first sector in which directories and files may be stored):

first_data_sector = fat_boot->reserved_sector_count + (fat_boot->table_count * fat_size) + root_dir_sectors;


The first sector in the File Allocation Table:

first_fat_sector = fat_boot->reserved_sector_count;


The total number of data sectors:

data_sectors = total_sectors - (fat_boot->reserved_sector_count + (fat_boot->table_count * fat_size) + root_dir_sectors);


The total number of clusters:

total_clusters = data_sectors / fat_boot->sectors_per_cluster;

This rounds down.

The FAT type of this file system:

if (fat_boot->bytes_per_sector == 0) 
{
   fat_type = ExFAT;
}
else if(total_clusters < 4085) 
{
   fat_type = FAT12;
} 
else if(total_clusters < 65525) 
{
   fat_type = FAT16;
} 
else
{
   fat_type = FAT32;
}
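
The individual calculations above can be combined into one function. Note that on exFAT the bytes-per-sector field is zero, so that check has to come before any division by it. This is a minimal sketch, not a definitive implementation: struct fat_layout, enum fat_kind and fat_compute_layout are hypothetical names, and total_sectors and fat_size are assumed to have already been chosen from their 16-bit or 32-bit fields as shown earlier.

```c
#include <stdint.h>

enum fat_kind { KIND_UNKNOWN, KIND_FAT12, KIND_FAT16, KIND_FAT32, KIND_EXFAT };

/* Hypothetical summary of the values derived from the boot sector. */
struct fat_layout {
    uint32_t root_dir_sectors;
    uint32_t first_fat_sector;
    uint32_t first_data_sector;
    uint32_t data_sectors;
    uint32_t total_clusters;
    enum fat_kind kind;
};

static struct fat_layout fat_compute_layout(uint16_t bytes_per_sector,
                                            uint8_t sectors_per_cluster,
                                            uint16_t reserved_sector_count,
                                            uint8_t table_count,
                                            uint16_t root_entry_count,
                                            uint32_t total_sectors,
                                            uint32_t fat_size)
{
    struct fat_layout l = {0};

    /* exFAT zeroes the old BPB; bail out before dividing by zero. */
    if (bytes_per_sector == 0) {
        l.kind = KIND_EXFAT;
        return l;
    }
    if (sectors_per_cluster == 0) {
        l.kind = KIND_UNKNOWN;   /* not a usable FAT boot sector */
        return l;
    }

    /* Root directory size, rounded up to whole sectors (0 on FAT32). */
    l.root_dir_sectors = ((uint32_t)root_entry_count * 32 +
                          (bytes_per_sector - 1)) / bytes_per_sector;
    l.first_fat_sector  = reserved_sector_count;
    l.first_data_sector = reserved_sector_count +
                          (uint32_t)table_count * fat_size +
                          l.root_dir_sectors;
    l.data_sectors   = total_sectors - l.first_data_sector;
    l.total_clusters = l.data_sectors / sectors_per_cluster; /* rounds down */

    if (l.total_clusters < 4085)       l.kind = KIND_FAT12;
    else if (l.total_clusters < 65525) l.kind = KIND_FAT16;
    else                               l.kind = KIND_FAT32;
    return l;
}
```

For a 1.44 MB floppy (512-byte sectors, 1 sector per cluster, 1 reserved sector, 2 FATs of 9 sectors, 224 root entries, 2880 sectors) this gives 14 root directory sectors, a first data sector of 33, 2847 clusters, and therefore FAT12.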

Reading Directories

The first step in reading directories is finding and reading the root directory. On FAT 12 or FAT 16 volumes the root directory is at a fixed position immediately after the File Allocation Tables:

first_root_dir_sector = first_data_sector - root_dir_sectors;

In FAT32 and exFAT, the root directory lives in the data area, starts at a given cluster, and can be a cluster chain. In exFAT it cannot be encoded as an extent and will always be present in the FAT.

root_cluster_32 = extBS_32->root_cluster;

For each given cluster number we can calculate the first sector of it (relative to the partition's offset):

first_sector_of_cluster = ((cluster - 2) * fat_boot->sectors_per_cluster) + first_data_sector;

After the correct cluster has been loaded into memory, the next step is to read and parse all of the entries in it. Each entry is 32 bytes long. For each 32 byte entry this is the flow of execution:

  1. If the first byte of the entry is equal to 0 then there are no more files/directories in this directory. FirstByte==0, finish. FirstByte!=0, goto 2.
  2. If the first byte of the entry is equal to 0xE5 then the entry is unused. FirstByte==0xE5, goto 8, FirstByte!=0xE5, goto 3.
  3. Is this entry a long file name entry? If the attribute byte (offset 11) of the entry equals 0x0F, then it is a long file name entry. Otherwise, it is not. AttributeByte==0x0F, goto 4. AttributeByte!=0x0F, goto 5.
  4. Read the portion of the long filename into a temporary buffer. Goto 8.
  5. Parse the data for this entry using the table from further up on this page. It would be a good idea to save the data for later. Possibly in a virtual file system structure. goto 6
  6. Is there a long file name in the temporary buffer? Yes, goto 7. No, goto 8
  7. Apply the long file name to the entry that you just read and clear the temporary buffer. goto 8
  8. Increment pointers and/or counters and check the next entry. (goto number 1)
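
The flow above maps naturally onto a loop with a small classification step. The sketch below only counts the regular 8.3 entries in one buffer to keep it short; dirent_classify and dir_count_short_entries are hypothetical names, and a real driver would collect long file name pieces and parse each 8.3 entry where the comments indicate.

```c
#include <stdint.h>
#include <stddef.h>

enum dirent_kind { DIRENT_END, DIRENT_UNUSED, DIRENT_LFN, DIRENT_SHORT };

/* Steps 1 to 3 of the flow: decide what kind of entry this is. */
static enum dirent_kind dirent_classify(const uint8_t entry[32])
{
    if (entry[0] == 0x00) return DIRENT_END;     /* no more entries */
    if (entry[0] == 0xE5) return DIRENT_UNUSED;  /* deleted or free slot */
    if (entry[11] == 0x0F) return DIRENT_LFN;    /* long file name piece */
    return DIRENT_SHORT;                         /* regular 8.3 entry */
}

/* Walk one buffer of directory data (for example one cluster).
 * Returns the number of 8.3 entries seen; *done is set to 1 when the
 * end-of-directory marker was found, so no further clusters are needed. */
static size_t dir_count_short_entries(const uint8_t *buf, size_t len, int *done)
{
    size_t count = 0;
    *done = 0;
    for (size_t off = 0; off + 32 <= len; off += 32) {
        switch (dirent_classify(buf + off)) {
        case DIRENT_END:
            *done = 1;
            return count;
        case DIRENT_UNUSED:
            break;                  /* step 8: just move on */
        case DIRENT_LFN:
            break;                  /* step 4: buffer the name piece here */
        case DIRENT_SHORT:
            count++;                /* steps 5 to 7: parse, attach long name */
            break;
        }
    }
    return count;
}
```

If done is still 0 after the buffer has been processed, the caller follows the cluster chain to the next cluster of the directory and calls the function again.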

This process should be repeated until all of the entries have been read from the cluster. You should then check to see if there is another cluster following this one in the cluster chain or if this is the last cluster in the chain. See the section below and the FAT section for more information. You should do the above process for each cluster in the chain, following it until there are no more clusters left in the chain. Then you can check if any of the entries that you just read are directories. If they are, they should each be read in the same way, starting with their first cluster number, which is stored in the entry.

Following Cluster Chains

There are two basic steps to following cluster chains. The first step is to find out if there is another "link" (cluster) following the current one in the chain. The second step is to actually use the value read from the FAT to read the next cluster. Here is the basic idea:

  1. Extract the value from the FAT for the _current_ cluster. (Use the previous section on the File Allocation Table for details on how exactly to extract the value.) goto number 2
  2. Is this cluster marked as the last cluster in the chain? (again, see the above section for more details) Yes, goto number 4. No, goto number 3
  3. Read the cluster represented by the extracted value and return for more directory parsing.
  4. The end of the cluster chain has been found. Our work here is finished. :)
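
These steps can be sketched for FAT 16 as a loop over an in-memory copy of the FAT. The assumptions here are that the whole table has been read into the byte array fat, that fat_entries is the number of 16-bit entries in it, and that the caller supplies an output buffer; fat16_walk_chain is a hypothetical name. The bound on the output buffer doubles as protection against chains that loop back on themselves.

```c
#include <stdint.h>
#include <stddef.h>

/* Collect the cluster numbers of a chain starting at `first`.
 * Returns the number of clusters stored in `out`, or 0 if the chain is
 * broken (free, reserved, bad or out-of-range link) or longer than `max`. */
static size_t fat16_walk_chain(const uint8_t *fat, uint32_t fat_entries,
                               uint16_t first, uint16_t *out, size_t max)
{
    size_t n = 0;
    uint16_t cluster = first;

    /* Valid data clusters are >= 2, below the bad marker, and inside the table. */
    while (cluster >= 2 && cluster < 0xFFF7 && cluster < fat_entries) {
        if (n == max)
            return 0;               /* too long for the buffer, or a cycle */
        out[n++] = cluster;         /* step 3: this cluster belongs to the file */
        /* Step 1: extract the FAT value for the current cluster. */
        cluster = (uint16_t)(fat[cluster * 2] | (fat[cluster * 2 + 1] << 8));
    }

    /* Step 2/4: only an end-of-chain marker is a clean finish. */
    return (cluster >= 0xFFF8) ? n : 0;
}
```

A driver that cannot afford to cache the whole FAT would instead read the relevant FAT sector inside the loop, as in the extraction code from the File Allocation Table section.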

Reading extents

On exFAT, files can have a bit set in their flags that indicate it is stored as extent-based. This means that the whole file is contiguous, and that the file size plus the first cluster indicate where the (whole) file is. The FAT entries will contain garbage and are not to be trusted.

To read this, do the same calculation as above, except you may in every step assume that the next cluster is the numerically next cluster and that enough sectors have been allocated for the file size.
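
For an extent-based file the position of any byte can be computed directly, without touching the FAT. The following sketch assumes the cluster heap offset has already been converted to bytes and that the first cluster of the heap is cluster 2; all three function names are made up for illustration, and the 0x02 mask is the "second bit" of the stream extension's secondary flags mentioned earlier.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if the stream extension's secondary flags mark the file as one extent. */
static bool exfat_is_contiguous(uint8_t secondary_flags)
{
    return (secondary_flags & 0x02) != 0;
}

/* Number of clusters covered by `data_length` bytes, rounded up.
 * cluster_size must be non-zero. */
static uint32_t extent_cluster_count(uint64_t data_length, uint32_t cluster_size)
{
    return (uint32_t)((data_length + cluster_size - 1) / cluster_size);
}

/* Byte offset within the volume of byte `pos` of a contiguous file. */
static uint64_t extent_byte_offset(uint64_t cluster_heap_offset_bytes,
                                   uint32_t first_cluster,
                                   uint32_t cluster_size,
                                   uint64_t pos)
{
    /* Cluster numbering starts at 2, so cluster 2 sits at the heap start. */
    return cluster_heap_offset_bytes +
           (uint64_t)(first_cluster - 2) * cluster_size + pos;
}
```

The caller is still responsible for checking pos against the file's data length before reading.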

Creating a fresh FAT filesystem

Typically during development you want to create a disk image with a FAT filesystem. There are two common approaches for this, either by using a utility that works directly on images, or by using a Loopback Device and using the OS' own driver to work on the image. A less common alternative is to have an actual disk in your drive.

The most buildscript-friendly tool is MTools - which can do all operations directly on a disk image using the -i argument and supplies every DOS command related to files in this fashion, only prefixed with an m. It can also use a configuration file to access drives in their DOS fashion, allowing you to use for instance A: and C: as actual drives. The tool can be built out of the box for Windows and is included in many a linux package manager.

Linux-only developers can, often with a bit of sudo and permission magic, automate the Loopback Device in combination with mkdosfs or mkfs.vfat as well as partition editing. This method is less portable as the commands often can't be reused outside of Linux. Several developers also make the error of passing -F to mkdosfs in an attempt to choose a FAT size, which often has the effect of creating a corrupt filesystem since the result doesn't follow the official rule for FAT sizes anymore.

Windows users can make use of VFD for loopback devices. It comes with a GUI, but at the cost of not being properly automatable in a script.

See Also

Threads

  • from raw bits to directory listing (code posted) in the forum
  • Public Domain FAT32 code in the forum
  • FAT12/FAT16 bootsector code in the forum

External Links