FAT

Latest revision as of 05:14, 9 June 2024 by Xtex (compared against the revision as of 18:56, 12 September 2021 by Candy; 14 intermediate revisions by 7 users).
{{Filesystems}}
The '''File Allocation Table''' ('''FAT''') was the native file system of MS-DOS. FAT was originally introduced by Marc McDonald in Stand-alone Disk BASIC with 8-bit FAT entries and 16-byte directory entries. The better known FAT12 variant, with 12-bit FAT entries and 32-byte directory entries, was introduced with DOS. FAT is a very simple file system: little more than a singly-linked list of clusters kept in one large table. A FAT file system uses very little memory (unless the OS caches the whole allocation table in memory) and is one of the most basic file systems in use today, if not the most basic.

== Overview ==
…

=== FAT 32 ===
FAT32 was introduced with Windows 95B (OSR2) and Windows 98, and it solved some of FAT's problems; in particular, a volume is no longer limited to about 64K clusters. Although FAT32 uses 32 bits per FAT entry, only the bottom 28 bits are actually used to address clusters on the disk (the top 4 bits are reserved). With 28 bits per FAT entry, the file system can address a maximum of about 268 million (2<sup>28</sup>) clusters in a partition. This lets very large hard disks keep reasonably small cluster sizes and thus reduces slack space (the unused tail of each file's last cluster).

=== ExFAT ===
…
| 0x11
| 2
| Number of root directory entries (must be set so that the root directory occupies entire sectors).
|-
| 19
…
To read the file system, first find out how big a 'sector' and a 'cluster' are. A sector is (1 << sectorshift) bytes, and a cluster is (1 << (sectorshift + clustershift)) bytes. Then find the start of the FAT and the start of the cluster heap (note that the first cluster is still cluster 2).
<syntaxhighlight lang="C">
uint64_t sectorsize  = 1ull << sectorshift;
uint64_t clustersize = 1ull << (sectorshift + clustershift);

// Bias the cluster heap base so that cluster N starts at clusterArray + N * clustersize,
// even though the first cluster in the heap is numbered 2.
uint64_t clusterArray = clusterheapoffset * sectorsize - 2 * clustersize;
uint64_t fatOffset = fatoffset * sectorsize;
uint64_t usablespace = clustercount * clustersize;
</syntaxhighlight>
Note that all values in the exFAT BPB are naturally aligned, so this code is significantly simpler than reading the FAT32 BPB.
…

==== FAT 12 ====
FAT 12 uses 12 bits to address the clusters on the disk. Each 12-bit entry in the FAT points to the next cluster of a file on the disk. Given a valid cluster number, here is how you extract the value of the next cluster in the cluster chain:
<syntaxhighlight lang="C">
unsigned char FAT_table[sector_size * 2]; // two sectors, in case the entry straddles a sector boundary
unsigned int fat_offset = active_cluster + (active_cluster / 2); // multiply by 1.5
unsigned int fat_sector = first_fat_sector + (fat_offset / sector_size);
unsigned int ent_offset = fat_offset % sector_size;

// at this point you need to read two sectors from disk, starting at "fat_sector", into "FAT_table".

// assemble the little-endian 16-bit word byte by byte (avoids an unaligned load)
unsigned short table_value = FAT_table[ent_offset] | (FAT_table[ent_offset + 1] << 8);

// odd clusters use the high 12 bits of the word, even clusters the low 12 bits
table_value = (active_cluster & 1) ? table_value >> 4 : table_value & 0xfff;

// the variable "table_value" now has the information you need about the next cluster in the chain.
</syntaxhighlight>
If "table_value" is greater than or equal to (>=) 0xFF8, there are no more clusters in the chain, which means the whole file has been read. If "table_value" equals (==) 0xFF7, this cluster has been marked as "bad". "Bad" clusters are prone to errors and should be avoided. Otherwise, "table_value" is the cluster number of the next cluster in the file.

The entries under index 0 and 1 are reserved.
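The lookup above can be packaged as a pair of small helpers. This is a minimal sketch, assuming the FAT (or at least the part covering the requested cluster) is already in memory as one contiguous byte buffer starting at the first FAT sector; `fat12_entry`, `fat12_classify`, and the `fat12_kind` enum are hypothetical names, not an existing API. Because the 16-bit word is assembled byte by byte, the helper behaves the same on any host byte order and never performs an unaligned load.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of a FAT12 entry value. */
enum fat12_kind { FAT12_FREE, FAT12_NEXT, FAT12_BAD, FAT12_END };

/* Return the 12-bit FAT entry for `cluster`. `fat` must point at the first
 * byte of the FAT and be large enough to contain the entry's two bytes. */
static uint16_t fat12_entry(const uint8_t *fat, uint32_t cluster)
{
    size_t offset = (size_t)cluster + cluster / 2;   /* cluster * 1.5, rounded down */
    /* assemble the little-endian 16-bit word byte by byte */
    uint16_t word = (uint16_t)(fat[offset] | (fat[offset + 1] << 8));
    return (cluster & 1) ? (uint16_t)(word >> 4)     /* odd cluster: high 12 bits */
                         : (uint16_t)(word & 0xFFF); /* even cluster: low 12 bits */
}

/* Interpret an entry value using the FAT12 values described in this section. */
static enum fat12_kind fat12_classify(uint16_t value)
{
    if (value >= 0xFF8) return FAT12_END;   /* end of chain */
    if (value == 0xFF7) return FAT12_BAD;   /* bad cluster */
    if (value == 0x000) return FAT12_FREE;  /* free cluster */
    return FAT12_NEXT;                      /* number of the next cluster */
}
```

For example, entries 0x123 and 0x456 at clusters 2 and 3 are stored as the bytes 0x23 0x61 0x45, and `fat12_entry` returns 0x123 for cluster 2 and 0x456 for cluster 3. Following a chain means calling `fat12_entry` repeatedly until `fat12_classify` reports an end or bad marker.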
Of the two reserved entries, index 0 is also the value other entries use to signify that the given cluster is free; the entry at index 0 itself holds the value of the BPB_Media field in its low 8 bits and 0xF in its top 4 bits. For example, if BPB_Media is 0xF8, the zeroth entry should hold the value 0xFF8. The second entry (index 1) is unused but must hold the value 0xFFF.

Since FAT12 uses an entry size that is not evenly divisible by 8 bits, interpreting the FAT on a little-endian machine can be slightly confusing, and this has some consequences.

The first is storage in the table. Consider two successive entries with values 0x123 and 0x456. The first byte holds the bottom two nibbles of the first entry (0x23), and its highest nibble goes into the bottom nibble of the second byte (0x?1). Since the next entry now starts mid-byte, only its lowest nibble fits in that byte (0x6?), and its two highest nibbles go into the third byte (0x45). The two entries are therefore stored as the bytes 0x23 0x61 0x45. If you do little-endian 16-bit loads, you get 0x6123 at offset 0 and 0x4561 at offset 1, which lets you recover the original two entry values with the shifts, masks, and offsets seen in the code block above.

The second is that, as the offsets 0 and 1 show, these 16-bit words are not necessarily 16-bit aligned. On x86 that usually just means a slower path when loading the word with e.g. <tt>*(unsigned short *)bytes</tt>, but if you use something like UBSan to catch undefined behavior, its checks can trigger (usually resulting in a panic) unless you load the two bytes separately and combine them yourself.

The third consequence is that an entry might straddle a sector boundary (this happens when ent_offset is sector_size - 1).
Code that loads only a single sector of the table therefore needs a special case that also loads the following sector when the entry crosses the sector-size boundary. Alternatively, you can simply load two sectors every time, as the code above does.

==== FAT 16 ====
FAT 16 uses 16 bits to address the clusters on the disk. Because of this, it is much easier to extract the values out of a 16-bit File Allocation Table. Here is how it is done:
<syntaxhighlight lang="C">
unsigned char FAT_table[sector_size];
unsigned int fat_offset = active_cluster * 2;
unsigned int fat_sector = first_fat_sector + (fat_offset / sector_size);
unsigned int ent_offset = fat_offset % sector_size;

// at this point you need to read one sector from disk, at "fat_sector", into "FAT_table".
// (a 16-bit entry always starts at an even offset, so it never straddles a sector boundary)

unsigned short table_value = FAT_table[ent_offset] | (FAT_table[ent_offset + 1] << 8);

// the variable "table_value" now has the information you need about the next cluster in the chain.
</syntaxhighlight>
If "table_value" is greater than or equal to (>=) 0xFFF8, there are no more clusters in the chain, which means the whole file has been read. If "table_value" equals (==) 0xFFF7, this cluster has been marked as "bad". "Bad" clusters are prone to errors and should be avoided. Otherwise, "table_value" is the cluster number of the next cluster in the file.

The entries under index 0 and 1 are reserved. The zeroth entry is reserved because index 0 is used as the value of other entries to signify that the given cluster is free. The zeroth entry has to hold the value of the BPB_Media field in its low 8 bits, with the rest of the bits set to one. For example, if BPB_Media is 0xF8, the zeroth entry should hold the value 0xFFF8. The first entry is reserved for future use and must hold the value 0xFFFF.

==== FAT 32 and exFAT ====
FAT 32 uses 28 bits to address the clusters on the disk. The highest 4 bits are reserved, which means they should be ignored when read and left unchanged when written. exFAT uses the full 32 bits to encode cluster numbers.
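Leaving the reserved bits unchanged means a FAT32 update is a read-modify-write of the whole 32-bit entry. The sketch below assumes the FAT is held in memory as a little-endian byte buffer and that the cluster's entry lies within it; `fat32_get`, `fat32_set`, and the `load_le32`/`store_le32` helpers are hypothetical names. exFAT entries have no reserved bits, so there an entry would simply be written whole.

```c
#include <stddef.h>
#include <stdint.h>

#define FAT32_ENTRY_MASK 0x0FFFFFFFu /* low 28 bits address clusters */

/* Load/store a little-endian 32-bit value without an unaligned access. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Read a FAT32 entry, ignoring the reserved high 4 bits. */
static uint32_t fat32_get(const uint8_t *fat, uint32_t cluster)
{
    return load_le32(fat + (size_t)cluster * 4) & FAT32_ENTRY_MASK;
}

/* Write a FAT32 entry, keeping whatever is in the reserved high 4 bits. */
static void fat32_set(uint8_t *fat, uint32_t cluster, uint32_t value)
{
    uint8_t *p = fat + (size_t)cluster * 4;
    uint32_t old = load_le32(p);
    store_le32(p, (old & ~FAT32_ENTRY_MASK) | (value & FAT32_ENTRY_MASK));
}
```

If a tool had stored 0xA in the reserved nibble of an entry, `fat32_set` keeps it, while `fat32_get` never exposes it to the cluster-chain logic.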
Reading an entry works the same way as on a 16-bit FAT:
<syntaxhighlight lang="C">
unsigned char FAT_table[sector_size];
unsigned int fat_offset = active_cluster * 4;
unsigned int fat_sector = first_fat_sector + (fat_offset / sector_size);
unsigned int ent_offset = fat_offset % sector_size;

// at this point you need to read one sector from disk, at "fat_sector", into "FAT_table".

unsigned int table_value = FAT_table[ent_offset]
                         | (FAT_table[ent_offset + 1] << 8)
                         | (FAT_table[ent_offset + 2] << 16)
                         | ((unsigned int)FAT_table[ent_offset + 3] << 24);

// on FAT32 (but not exFAT), ignore the reserved high 4 bits
if (fat_type == FAT32) table_value &= 0x0FFFFFFF;

// the variable "table_value" now has the information you need about the next cluster in the chain.
</syntaxhighlight>
If "table_value" is greater than or equal to (>=) 0x0FFFFFF8 (or 0xFFFFFFF8 for exFAT), there are no more clusters in the chain, which means the whole file has been read. If "table_value" equals (==) 0x0FFFFFF7 (or 0xFFFFFFF7 for exFAT), this cluster has been marked as "bad". "Bad" clusters are prone to errors and should be avoided. Otherwise, "table_value" is the cluster number of the next cluster in the file.

The entries under index 0 and 1 are reserved. The zeroth entry is reserved because index 0 is used as the value of other entries to signify that the given cluster is free. The zeroth entry has to hold the value of the BPB_Media field in its low 8 bits, with the remaining bits of the entry set to one (on FAT32 this excludes the 4 reserved high bits). For example, if BPB_Media is 0xF8, the zeroth entry should hold the value 0x0FFFFFF8 on FAT32 and 0xFFFFFFF8 on exFAT. The first entry is reserved for future use and must hold the value 0xFFFFFFFF.

Note that on exFAT, some files are not written out into the FAT at all. When a file is fully contiguous, exFAT allows the operating system to encode this information instead of updating the FAT for that file. Therefore, unlike on FAT32, the FAT is not used to track whether a cluster is allocated; an allocation bitmap handles that instead. See the directory entries section below.
…
| 13
| 1
| Creation time in hundredths of a second, although the official FAT specification from Microsoft says it is tenths of a second. Range 0-199 inclusive. Based on simple tests, Ubuntu 16.10 stores either 0 or 100, while Windows 7 stores 0-199 in this field.
|-
| 14
…
| 20
| 1
| Creation time in hundredths of a second (0-199), to be added to the FAT-style date/time for more accuracy. See the FAT12 directory entry for the date/time format.
|-
| 21
| 1
| Modification time in hundredths of a second (0-199).
|-
| 22
…
Here is an example of some boot sector structures in C.
<syntaxhighlight lang="C">
typedef struct fat_extBS_32
{
	// … (remaining fields and structure definitions omitted in this excerpt)
}__attribute__((packed)) fat_BS_t;
</syntaxhighlight>
Important pieces of information that can be extracted from the boot sector include:

'''Total sectors in volume (including VBR):'''
<syntaxhighlight lang="C">
total_sectors = (fat_boot->total_sectors_16 == 0) ? fat_boot->total_sectors_32 : fat_boot->total_sectors_16;
</syntaxhighlight>

'''FAT size in sectors:'''
<syntaxhighlight lang="C">
fat_size = (fat_boot->table_size_16 == 0) ? fat_boot_ext_32->table_size_32 : fat_boot->table_size_16;
</syntaxhighlight>

'''The size of the root directory (unless you have FAT32, in which case the size will be 0):'''
<syntaxhighlight lang="C">
root_dir_sectors = ((fat_boot->root_entry_count * 32) + (fat_boot->bytes_per_sector - 1)) / fat_boot->bytes_per_sector;
</syntaxhighlight>
This calculation rounds up. 32 is the size of a FAT directory entry in bytes.
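As an alternative to casting the sector to the packed structures, the same three values can be read straight from the raw boot sector bytes at their standard BPB offsets (11 for bytes per sector, 17 for the root entry count, 19 and 32 for the 16- and 32-bit sector counts, 22 and 36 for the 16- and 32-bit FAT sizes). This is a sketch under those assumptions; `fat_geometry` and `fat_read_geometry` are hypothetical names, and an exFAT boot sector (whose bytes-per-sector field is zero) is rejected rather than parsed.

```c
#include <stdint.h>

/* Little-endian field loads: FAT12/16/32 BPB fields are not naturally aligned. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical container for the values derived from the boot sector. */
struct fat_geometry {
    uint32_t total_sectors;    /* total sectors in the volume, including the VBR */
    uint32_t fat_size;         /* sectors per FAT */
    uint32_t root_dir_sectors; /* 0 on FAT32 */
};

/* `bs` points at a 512-byte boot sector. Returns all zeros for exFAT. */
static struct fat_geometry fat_read_geometry(const uint8_t *bs)
{
    struct fat_geometry g = {0, 0, 0};
    uint16_t bytes_per_sector = le16(bs + 11);
    if (bytes_per_sector == 0)
        return g; /* exFAT: the legacy BPB area is zero */

    uint16_t root_entry_count = le16(bs + 17);
    uint16_t total_sectors_16 = le16(bs + 19);
    uint16_t table_size_16    = le16(bs + 22);
    uint32_t total_sectors_32 = le32(bs + 32);
    uint32_t table_size_32    = le32(bs + 36); /* FAT32 extended BPB */

    g.total_sectors = total_sectors_16 ? total_sectors_16 : total_sectors_32;
    g.fat_size      = table_size_16 ? table_size_16 : table_size_32;
    /* 32 bytes per directory entry, rounded up to whole sectors */
    g.root_dir_sectors = ((uint32_t)root_entry_count * 32 + bytes_per_sector - 1)
                         / bytes_per_sector;
    return g;
}
```

For a standard 1.44 MB floppy (512-byte sectors, 224 root entries, 2880 sectors, 9 sectors per FAT) this gives a 14-sector root directory.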
'''The first data sector (that is, the first sector in which directories and files may be stored):'''
<syntaxhighlight lang="C">
first_data_sector = fat_boot->reserved_sector_count + (fat_boot->table_count * fat_size) + root_dir_sectors;
</syntaxhighlight>

'''The first sector in the File Allocation Table:'''
<syntaxhighlight lang="C">
first_fat_sector = fat_boot->reserved_sector_count;
</syntaxhighlight>

'''The total number of data sectors:'''
<syntaxhighlight lang="C">
data_sectors = total_sectors - (fat_boot->reserved_sector_count + (fat_boot->table_count * fat_size) + root_dir_sectors);
</syntaxhighlight>

'''The total number of clusters:'''
<syntaxhighlight lang="C">
total_clusters = data_sectors / fat_boot->sectors_per_cluster;
</syntaxhighlight>
This rounds down.

'''The FAT type of this file system:'''
<syntaxhighlight lang="C">
if (sectorsize == 0)
{
   // … (remaining cases omitted in this excerpt)
   fat_type = FAT32;
}
</syntaxhighlight>

=== Reading Directories ===
The first step in reading directories is finding and reading the root directory. On FAT12 and FAT16 volumes, the root directory is at a fixed position immediately after the File Allocation Tables:
<syntaxhighlight lang="C">
first_root_dir_sector = first_data_sector - root_dir_sectors;
</syntaxhighlight>
On FAT32 and exFAT, the root directory is stored in the data area, starting at the cluster given in the boot sector, and can span a cluster chain. On exFAT the root directory cannot be encoded as a contiguous extent, so its chain is always present in the FAT.
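The steps above combine into a routine that classifies the volume and locates its root directory: a fixed sector range on FAT12/FAT16, a cluster chain on FAT32. This sketch assumes the cluster-count thresholds from Microsoft's FAT specification (fewer than 4085 clusters means FAT12, fewer than 65525 means FAT16, anything else FAT32) and that exFAT has already been ruled out, as the type check above does by testing for a zero sector size first. The `fat_root` struct, `fat_type_from_clusters`, and `fat_locate_root` are hypothetical names.

```c
#include <stdint.h>

enum fat_type { FAT12, FAT16, FAT32 };

/* Hypothetical result: where the root directory begins. */
struct fat_root {
    enum fat_type type;
    uint32_t first_sector; /* first sector of the root directory, relative to the partition */
    uint32_t cluster;      /* FAT32: first cluster of the root chain; otherwise 0 */
};

static enum fat_type fat_type_from_clusters(uint32_t total_clusters)
{
    if (total_clusters < 4085)  return FAT12;
    if (total_clusters < 65525) return FAT16;
    return FAT32;
}

/* Inputs are the boot-sector values computed in the snippets above. */
static struct fat_root fat_locate_root(uint32_t total_sectors, uint32_t reserved_sector_count,
                                       uint32_t table_count, uint32_t fat_size,
                                       uint32_t root_dir_sectors, uint32_t sectors_per_cluster,
                                       uint32_t root_cluster_32)
{
    uint32_t first_data_sector = reserved_sector_count + table_count * fat_size + root_dir_sectors;
    uint32_t data_sectors = total_sectors - first_data_sector;
    uint32_t total_clusters = data_sectors / sectors_per_cluster; /* rounds down */

    struct fat_root r;
    r.type = fat_type_from_clusters(total_clusters);
    if (r.type == FAT32) {
        /* root directory is an ordinary cluster chain in the data area */
        r.cluster = root_cluster_32;
        r.first_sector = (root_cluster_32 - 2) * sectors_per_cluster + first_data_sector;
    } else {
        /* fixed-size root directory right after the FATs */
        r.cluster = 0;
        r.first_sector = first_data_sector - root_dir_sectors;
    }
    return r;
}
```

For a standard 1.44 MB floppy (2880 sectors, 1 reserved sector, two 9-sector FATs, 14 root directory sectors, 1 sector per cluster), this yields FAT12 with the root directory starting at sector 19.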
On FAT32, the first cluster of the root directory is read from the extended boot sector:
<syntaxhighlight lang="C">
root_cluster_32 = extBS_32->root_cluster;
</syntaxhighlight>
For any given cluster number, we can calculate its first sector (relative to the partition's offset):
<syntaxhighlight lang="C">
first_sector_of_cluster = ((cluster - 2) * fat_boot->sectors_per_cluster) + first_data_sector;
</syntaxhighlight>
After the correct cluster has been loaded into memory, the next step is to read and parse all of the entries in it. Each entry is 32 bytes long. For each 32-byte entry, this is the flow of execution: