Ext2
Revision as of 02:59, 17 November 2010
The Second Extended Filesystem (ext2fs) is a rewrite of the original Extended Filesystem and, as such, is also based around the concept of "inodes." Ext2 served as the de facto filesystem of Linux for nearly a decade, from the early 1990s to the early 2000s, when it was superseded by the journaling file systems ext3 and ReiserFS. It has native support for UNIX ownership and access rights, symbolic and hard links, and other properties that are common among UNIX-like operating systems. Organizationally, it divides disk space into groups called "block groups." These groups distribute data across the disk, which helps to minimize head movement as well as the impact of fragmentation. Further, some (if not all) groups are required to contain backups of important data that can be used to rebuild the file system in the event of disaster.
Important Note: All values are little-endian unless otherwise specified
Basic Concepts
What is a Block?
The Ext2 file system divides disk space into logical blocks of contiguous space. A block need not be the same size as a sector of the disk the file system resides on. The block size can be determined by reading the field starting at byte 24 in the Superblock.
What is a Block Group?
Blocks, along with inodes, are divvied up into "block groups." These are nothing more than contiguous groups of blocks.
Each block group reserves a few of its blocks for special purposes such as:
- A bitmap of free/allocated blocks within the group
- A bitmap of allocated inodes within the group
- A table of inode structures that belong to the group
- Depending upon the revision of Ext2 used, some or all block groups may also contain a backup copy of the Superblock and the Block Group Descriptor Table.
What is an Inode?
An inode is a structure on the disk that represents a file, directory, symbolic link, etc. Inodes reside in a particular Block Group's inode table (see below), and are stored separately from the actual contents of the file.
Superblock
The first step in implementing an Ext2 driver is to find, extract, and parse the Superblock. The Superblock contains all information about the layout of the file system, and may also contain other important information such as which optional features were used to create the file system. Once you have finished with the Superblock, the next step is to look at the Block Group Descriptor Table.
Locating the Superblock
The Superblock is always located at byte 1024 from the beginning of the volume and is exactly 1024 bytes in length. For example, if the disk uses 512 byte sectors, the Superblock will begin at LBA 2 and will occupy all of sectors 2 and 3.
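The location rule above can be sketched in a few lines of Python. This is a minimal illustration, not a driver: the function names `superblock_sectors` and `read_superblock` are hypothetical, and `volume` is assumed to be any seekable binary file object (a disk image opened with `open(path, "rb")`, for example).

```python
import io

SUPERBLOCK_OFFSET = 1024  # bytes from the beginning of the volume
SUPERBLOCK_SIZE = 1024    # bytes


def superblock_sectors(sector_size=512):
    """Return (first_lba, sector_count) of the sectors holding the Superblock."""
    first_lba = SUPERBLOCK_OFFSET // sector_size
    last_lba = (SUPERBLOCK_OFFSET + SUPERBLOCK_SIZE - 1) // sector_size
    return first_lba, last_lba - first_lba + 1


def read_superblock(volume):
    """Read the raw 1024-byte Superblock from a seekable binary file object."""
    volume.seek(SUPERBLOCK_OFFSET)
    raw = volume.read(SUPERBLOCK_SIZE)
    if len(raw) != SUPERBLOCK_SIZE:
        raise ValueError("volume is too small to contain a Superblock")
    return raw


# Usage with an in-memory stand-in for a disk image.
image = io.BytesIO(bytes(1024) + b"\x01" * 1024)
raw_superblock = read_superblock(image)
```

With 512 byte sectors, `superblock_sectors()` gives LBA 2 and a count of 2 sectors, matching the example in the text.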
Determining the Number of Block Groups
From the Superblock, extract the size of each block, the total number of inodes, the total number of blocks, the number of blocks per block group, and the number of inodes in each block group. From this information we can infer the number of block groups there are by:
- Rounding up the total number of blocks divided by the number of blocks per block group
- Rounding up the total number of inodes divided by the number of inodes per block group
- Both (and check them against each other)
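As a rough sketch, the "both, and check them against each other" option could look like the following. The function names are hypothetical, and the code follows the two rounding formulas exactly as they are stated above.

```python
def ceil_div(a, b):
    """Integer division that rounds up instead of down."""
    return -(-a // b)


def block_group_count(total_blocks, blocks_per_group,
                      total_inodes, inodes_per_group):
    """Compute the number of block groups both ways and cross-check them."""
    by_blocks = ceil_div(total_blocks, blocks_per_group)
    by_inodes = ceil_div(total_inodes, inodes_per_group)
    if by_blocks != by_inodes:
        raise ValueError(
            "inconsistent Superblock: %d groups by blocks, %d by inodes"
            % (by_blocks, by_inodes))
    return by_blocks
```

A mismatch between the two results is a cheap early hint that the Superblock is corrupt or was parsed incorrectly.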
Base Superblock Fields
These fields are present in all versions of Ext2
Starting Byte | Ending Byte | Size in Bytes | Field Description |
---|---|---|---|
0 | 3 | 4 | Total number of inodes in file system |
4 | 7 | 4 | Total number of blocks in file system |
8 | 11 | 4 | Number of blocks reserved for superuser (see offset 80) |
12 | 15 | 4 | Total number of unallocated blocks |
16 | 19 | 4 | Total number of unallocated inodes |
20 | 23 | 4 | Block number of the block containing the superblock |
24 | 27 | 4 | log2 (block size) - 10. (In other words, the number to shift 1,024 to the left by to obtain the block size) |
28 | 31 | 4 | log2 (fragment size) - 10. (In other words, the number to shift 1,024 to the left by to obtain the fragment size) |
32 | 35 | 4 | Number of blocks in each block group |
36 | 39 | 4 | Number of fragments in each block group |
40 | 43 | 4 | Number of inodes in each block group |
44 | 47 | 4 | Last mount time (in POSIX time) |
48 | 51 | 4 | Last written time (in POSIX time) |
52 | 53 | 2 | Number of times the volume has been mounted since its last consistency check (fsck) |
54 | 55 | 2 | Number of mounts allowed before a consistency check (fsck) must be done |
56 | 57 | 2 | Ext2 signature (0xef53), used to help confirm the presence of Ext2 on a volume |
58 | 59 | 2 | File system state (see below) |
60 | 61 | 2 | What to do when an error is detected (see below) |
62 | 63 | 2 | Minor portion of version (combine with Major portion below to construct full version field) |
64 | 67 | 4 | POSIX time of last consistency check (fsck) |
68 | 71 | 4 | Interval (in seconds) between forced consistency checks (fsck) |
72 | 75 | 4 | Operating system ID from which the filesystem on this volume was created (see below) |
76 | 79 | 4 | Major portion of version (combine with Minor portion above to construct full version field) |
80 | 81 | 2 | User ID that can use reserved blocks |
82 | 83 | 2 | Group ID that can use reserved blocks |
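The table above maps directly onto a little-endian `struct` format string. The sketch below is one possible way to parse it; the field names in `BaseSuperblock` are made up for readability and are not taken from any official header.

```python
import struct
from collections import namedtuple

# 13 x uint32 (bytes 0-51), 6 x uint16 (52-63), 4 x uint32 (64-79), 2 x uint16 (80-83)
BASE_FORMAT = "<13I6H4I2H"

# Hypothetical field names, listed in the same order as the table rows.
BaseSuperblock = namedtuple("BaseSuperblock", [
    "inodes_count", "blocks_count", "reserved_blocks", "free_blocks",
    "free_inodes", "superblock_block", "log_block_size", "log_frag_size",
    "blocks_per_group", "frags_per_group", "inodes_per_group",
    "mount_time", "write_time",
    "mount_count", "max_mount_count", "signature", "state", "errors",
    "minor_version",
    "last_check", "check_interval", "creator_os", "major_version",
    "reserved_uid", "reserved_gid",
])

EXT2_SIGNATURE = 0xEF53


def parse_base_superblock(raw):
    """Decode the base fields from the raw 1024-byte Superblock."""
    sb = BaseSuperblock._make(struct.unpack_from(BASE_FORMAT, raw, 0))
    if sb.signature != EXT2_SIGNATURE:
        raise ValueError("Ext2 signature not found")
    return sb


def block_size(sb):
    """Bytes 24-27 hold log2(block size) - 10, so shift 1024 left by it."""
    return 1024 << sb.log_block_size
```

Checking the signature before trusting any other field is a cheap way to avoid interpreting a non-Ext2 volume.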
File System States
Value | State Description |
---|---|
1 | File system is clean |
2 | File system has errors |
Error Handling Methods
Value | Action to Take |
---|---|
1 | Ignore the error (continue on) |
2 | Remount file system as read-only |
3 | Kernel panic |
Creator Operating System IDs
Value | Operating System |
---|---|
0 | Linux |
1 | GNU HURD |
2 | MASIX (an operating system developed by Rémy Card, one of the developers of ext2) |
3 | FreeBSD |
4 | Other "Lites" (BSD4.4-Lite derivatives such as NetBSD, OpenBSD, XNU/Darwin, etc.) |
Extended Superblock Fields
These fields are only present if the Major version (specified in the base superblock fields) is greater than or equal to 1.
Starting Byte | Ending Byte | Size in Bytes | Field Description |
---|---|---|---|
84 | 87 | 4 | First non-reserved inode in file system. (In versions < 1.0, this is fixed as 11) |
88 | 89 | 2 | Size of each inode structure in bytes. (In versions < 1.0, this is fixed as 128) |
90 | 91 | 2 | Block group that this superblock is part of (if backup copy) |
92 | 95 | 4 | Optional features present (features that are not required to read or write, but usually result in a performance increase. see below) |
96 | 99 | 4 | Required features present (features that are required to be supported to read or write. see below) |
100 | 103 | 4 | Features that, if not supported, require the volume to be mounted read-only (see below) |
104 | 119 | 16 | File system ID (what is output by blkid) |
120 | 135 | 16 | Volume name (C-style string: characters terminated by a 0 byte) |
136 | 199 | 64 | Path volume was last mounted to (C-style string: characters terminated by a 0 byte) |
200 | 203 | 4 | Compression algorithms used (see Required features above) |
204 | 204 | 1 | Number of blocks to preallocate for files |
205 | 205 | 1 | Number of blocks to preallocate for directories |
206 | 207 | 2 | (Unused) |
208 | 223 | 16 | Journal ID (same style as the File system ID above) |
224 | 227 | 4 | Journal inode |
228 | 231 | 4 | Journal device |
232 | 235 | 4 | Head of orphan inode list |
236 | 1023 | 788 | (Unused) |
Optional Feature Flags
These are optional features for an implementation to support, but offer performance or reliability gains to implementations that do support them.
Flag Value | Description |
---|---|
0x0001 | Preallocate some number of (contiguous?) blocks (see byte 205 in the superblock) to a directory when creating a new one (to reduce fragmentation?) |
0x0002 | AFS server inodes exist |
0x0004 | File system has a journal (Ext3) |
0x0008 | Inodes have extended attributes |
0x0010 | File system can resize itself for larger partitions |
0x0020 | Directories use hash index |
Required Feature Flags
These features, if present on a file system, are required to be supported by an implementation in order to correctly read from or write to the file system.
Flag Value | Description |
---|---|
0x0001 | Compression is used |
0x0002 | Directory entries contain a type field |
0x0004 | File system needs recovery |
0x0008 | File system uses a journal device |
Read-Only Feature Flags
These features, if present on a file system, are required in order for an implementation to write to the file system, but are not required to read from the file system.
Flag Value | Description |
---|---|
0x0001 | Sparse superblocks and group descriptor tables |
0x0002 | File system uses a 64-bit file size |
0x0004 | Directory contents are stored in the form of a Binary Tree |
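The three flag fields suggest a simple decision at mount time. The sketch below assumes an imaginary driver that understands only a few of the flags listed above; the constants `SUPPORTED_REQUIRED` and `SUPPORTED_READ_ONLY` and the function `mount_mode` are hypothetical names.

```python
# Flags this imaginary driver claims to understand (an assumption for the example).
SUPPORTED_REQUIRED = 0x0002            # directory entries contain a type field
SUPPORTED_READ_ONLY = 0x0001 | 0x0002  # sparse superblocks, 64-bit file size


def mount_mode(required, read_only,
               supported_required=SUPPORTED_REQUIRED,
               supported_read_only=SUPPORTED_READ_ONLY):
    """Decide how a volume may be mounted from its feature flag fields.

    required  -- value of bytes 96-99 of the Superblock
    read_only -- value of bytes 100-103 of the Superblock
    Optional features (bytes 92-95) never restrict mounting, so they are ignored.
    """
    if required & ~supported_required:
        return "refuse"      # an unknown required feature: do not touch the volume
    if read_only & ~supported_read_only:
        return "read-only"   # safe to read, unsafe to write
    return "read-write"
```

For example, a volume with the compression flag (0x0001) set in the required field would be refused by this driver, while one with only the binary-tree directory flag (0x0004) set in the read-only field could still be mounted read-only.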
Block Group Descriptor Table
The Block Group Descriptor Table contains a descriptor for each block group within the file system. The number of block groups within the file system, and correspondingly, the number of entries in the Block Group Descriptor Table, is described above. Each descriptor contains information regarding where important data structures for that group are located.
Locating the Block Group Descriptor Table
The table is located in the block immediately following the Superblock. So if the block size (determined from a field in the superblock) is 1024 bytes per block, the Block Group Descriptor Table will begin at block 2. For any other block size, it will begin at block 1. Remember that blocks are numbered starting at 0, and that block numbers don't usually correspond to physical block addresses.
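Expressed as code, the rule above is a one-line special case. The function names here are hypothetical, and the byte offset assumes that block numbers are counted from the beginning of the volume.

```python
def bgdt_start_block(block_size):
    """Block number at which the Block Group Descriptor Table begins."""
    # With 1024-byte blocks the Superblock occupies block 1, so the table is
    # in block 2. With larger blocks the Superblock fits inside block 0 and
    # the table starts in block 1.
    return 2 if block_size == 1024 else 1


def bgdt_byte_offset(block_size):
    """Byte offset of the table from the beginning of the volume."""
    return bgdt_start_block(block_size) * block_size
```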
Block Group Descriptor
A Block Group Descriptor contains information regarding where important data structures for that block group are located.
Starting Byte | Ending Byte | Size in Bytes | Field Description |
---|---|---|---|
0 | 3 | 4 | Block address of block usage bitmap |
4 | 7 | 4 | Block address of inode usage bitmap |
8 | 11 | 4 | Starting block address of inode table |
12 | 13 | 2 | Number of unallocated blocks in group |
14 | 15 | 2 | Number of unallocated inodes in group |
16 | 17 | 2 | Number of directories in group |
18 | 31 | 14 | (Unused) |
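Since each descriptor is 32 bytes long, entry N of the table starts at byte N * 32. A minimal parsing sketch follows; the `GroupDescriptor` field names and the `parse_group_descriptor` function are hypothetical, and `table` is assumed to be the raw bytes of the Block Group Descriptor Table.

```python
import struct
from collections import namedtuple

# 3 x uint32, 3 x uint16, then 14 unused bytes (skipped with "x" pad bytes)
DESCRIPTOR_FORMAT = "<3I3H14x"
DESCRIPTOR_SIZE = struct.calcsize(DESCRIPTOR_FORMAT)  # 32 bytes

GroupDescriptor = namedtuple("GroupDescriptor", [
    "block_bitmap", "inode_bitmap", "inode_table",
    "free_blocks", "free_inodes", "directories",
])


def parse_group_descriptor(table, group):
    """Decode the descriptor of block group `group` from the raw table bytes."""
    offset = group * DESCRIPTOR_SIZE
    return GroupDescriptor._make(
        struct.unpack_from(DESCRIPTOR_FORMAT, table, offset))
```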
Inodes
Like blocks, each inode has a numerical address. It is extremely important to note that unlike block addresses, inode addresses start at 1.
With Ext2 versions prior to Major version 1, inodes 1 to 10 are reserved and should be in an allocated state. Starting with version 1, the first non-reserved inode is indicated via a field in the Superblock. Of the reserved inodes, number 2 is arguably the most significant, as it is used for the root directory. Inode 1 keeps track of bad blocks, but it does not have any special status in the Linux kernel.
Determining which Block Group contains an Inode
From an inode address (remember that they start at 1), we can determine which group the inode is in, by using the formula:
block group = (inode - 1) / INODES_PER_GROUP
where INODES_PER_GROUP is a field in the Superblock. The division is an integer division (the result is rounded down).
Finding an inode inside of a Block Group
Once we know which group an inode resides in, we can look up the actual inode by first retrieving that block group's inode table's starting address (see Block Group Descriptor above). The index of our inode in this block group's inode table can be determined by using the formula:
index = (inode - 1) % INODES_PER_GROUP
where INODES_PER_GROUP is a field in the Superblock (the same field which was used to determine which block group the inode belongs to).
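Both formulas can be combined into one helper. The last two values it returns (which block of the inode table holds the inode, and where inside that block) are not spelled out in the text; they are an assumption that follows from the inode table being a contiguous array of fixed-size inode structures. The name `locate_inode` is hypothetical, and the default inode size of 128 is the fixed size used by versions before 1.0.

```python
def locate_inode(inode, inodes_per_group, inode_size=128, block_size=1024):
    """Return (group, index, block_in_table, offset_in_block) for an inode.

    inode addresses start at 1, hence the (inode - 1) in both formulas.
    """
    if inode < 1:
        raise ValueError("inode addresses start at 1")
    group = (inode - 1) // inodes_per_group
    index = (inode - 1) % inodes_per_group
    # Assumption: the inode table is a packed array of inode structures.
    byte_offset = index * inode_size
    block_in_table = byte_offset // block_size
    offset_in_block = byte_offset % block_size
    return group, index, block_in_table, offset_in_block
```

The block to read is then the inode table's starting block address from the Block Group Descriptor plus `block_in_table`.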
Inode Data Structure
Byte Range | Description |
---|---|
0–1 | File mode (type and permissions) |
2–3 | Lower 16 bits of user ID |
4–7 | Lower 32 bits of size in bytes |
8–11 | Access Time |
12–15 | Change Time |
16–19 | Modification time |
20–23 | Deletion time |
24–25 | Lower 16 bits of group ID |
26–27 | Link count |
28–31 | Sector count |
32–35 | Flags |
36–39 | Unused |
40–87 | 12 direct block pointers |
88–91 | 1 single indirect block pointer |
92–95 | 1 double indirect block pointer |
96–99 | 1 triple indirect block pointer |
100–103 | Generation number (NFS) |
104–107 | Extended attribute block (File ACL) |
108–111 | Upper 32 bits of size / Directory ACL |
112–115 | Block address of fragment |
116–116 | Fragment index in block |
117–117 | Fragment size |
118–119 | Unused |
120–121 | Upper 16 bits of user ID |
122–123 | Upper 16 bits of group ID |
124–127 | Unused |
Each inode has a static number of fields, and additional information might be stored in extended attributes and indirect block pointers. The allocation status of an inode is determined using the inode bitmap, whose location is given in the group descriptor.
Ext2, like UFS, was designed to handle small files efficiently. Therefore, each inode can store the addresses of the first 12 blocks that a file has allocated. These are called direct pointers. If a file needs more than 12 blocks, a block is allocated to store the remaining addresses. The pointer to that block is called an indirect block pointer and is stored in the inode. The addresses in the block are all four bytes long, so the number of addresses per block is the block size divided by four.
If a file has more blocks than can fit in the 12 direct pointers and the indirect block, a double indirect block is used. With a double indirect block, the inode points to a block that contains a list of single indirect block pointers, each of which points to a block that contains a list of direct pointers. Lastly, if a file needs still more space, it can use a triple indirect block pointer. A triple indirect block contains addresses of double indirect blocks, which contain addresses of single indirect blocks. Each inode contains 12 direct pointers, one single indirect pointer, one double indirect pointer, and one triple indirect pointer.
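To make the pointer scheme concrete, the sketch below maps the N-th block of a file (counting from 0) to the level of indirection that reaches it and to the index to follow at each level. The function name `classify_block` and its return format are hypothetical.

```python
def classify_block(n, block_size=1024):
    """Map logical block n of a file to (level, indices).

    indices lists the entry to follow at each step: one index into the
    direct pointers or the indirect block, two for double indirection,
    three for triple indirection.
    """
    if n < 0:
        raise ValueError("block index must not be negative")
    per_block = block_size // 4  # addresses are four bytes each
    if n < 12:
        return "direct", (n,)
    n -= 12
    if n < per_block:
        return "single", (n,)
    n -= per_block
    if n < per_block ** 2:
        return "double", (n // per_block, n % per_block)
    n -= per_block ** 2
    if n < per_block ** 3:
        return ("triple",
                (n // per_block ** 2, (n // per_block) % per_block, n % per_block))
    raise ValueError("block index is beyond the reach of triple indirection")
```

With 1024-byte blocks each indirect block holds 256 addresses, so the single indirect block covers file blocks 12 through 267, and the double indirect block starts at file block 268.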
An inode also contains the file's size, ownership, and temporal information. The size value in newer versions of ExtX is 64 bits, but older versions had only 32 bits and therefore could not handle files over 4GB. Newer versions utilize an unused field for the upper 32 bits of the size value and set a read-only compatible feature flag when a large file exists.
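Combining the two halves of the size is a shift and an OR. In this sketch, `lower32` and `upper32` stand for the inode fields at bytes 4–7 and 108–111, and `has_64bit_size` is a hypothetical boolean saying whether the 64-bit file size read-only feature flag (0x0002) is set. For directories the field at bytes 108–111 holds the directory ACL instead, so this applies to regular files only.

```python
def file_size(lower32, upper32, has_64bit_size):
    """Return the size in bytes of a regular file."""
    if has_64bit_size:
        return (upper32 << 32) | lower32
    # Without the feature, bytes 108-111 are not part of the size.
    return lower32
```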
"Ownership" information is stored using the user and group ID.
Directories
Directories are files which contain the information needed to find files within the filesystem. The root directory is inode 2.
Filemode flags
Permission Flag | In Octal | Description |
---|---|---|
0x001 | 0001 | Other—execute permission |
0x002 | 0002 | Other—write permission |
0x004 | 0004 | Other—read permission |
0x008 | 0010 | Group—execute permission |
0x010 | 0020 | Group—write permission |
0x020 | 0040 | Group—read permission |
0x040 | 0100 | User—execute permission |
0x080 | 0200 | User—write permission |
0x100 | 0400 | User—read permission |
Flag Value | In Octal | Description |
---|---|---|
0x200 | 01000 | Sticky bit |
0x400 | 02000 | Set group ID |
0x800 | 04000 | Set user ID |
Bits 12-15
Type Value | Description |
---|---|
0x1000 | FIFO |
0x2000 | Character device |
0x4000 | Directory |
0x6000 | Block device |
0x8000 | Regular file |
0xA000 | Symbolic link |
0xC000 | Unix socket |
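The file mode field packs all three tables above into 16 bits, so decoding it is a matter of masking. The sketch below is illustrative; `decode_mode` and `permission_string` are hypothetical helper names.

```python
# Type values from the "Bits 12-15" table.
FILE_TYPES = {
    0x1000: "FIFO",
    0x2000: "character device",
    0x4000: "directory",
    0x6000: "block device",
    0x8000: "regular file",
    0xA000: "symbolic link",
    0xC000: "Unix socket",
}


def decode_mode(mode):
    """Split a file mode into (type name, permission bits, special bits)."""
    file_type = FILE_TYPES.get(mode & 0xF000, "unknown")
    permissions = mode & 0x01FF  # the nine read/write/execute flags
    special = mode & 0x0E00      # sticky bit, set group ID, set user ID
    return file_type, permissions, special


def permission_string(mode):
    """Render the nine permission flags in the familiar rwxrwxrwx form."""
    letters = "rwxrwxrwx"
    return "".join(
        letter if mode & (0x100 >> i) else "-"
        for i, letter in enumerate(letters))
```

For instance, a mode of 0x41ED decodes to a directory with permissions 0755.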
Inode flags
Flag Value | Description |
---|---|
0x00000001 | Secure deletion (not used) |
0x00000002 | Keep a copy of data when deleted (not used) |
0x00000004 | File compression (not used) |
0x00000008 | Synchronous updates—new data is written immediately to disk |
0x00000010 | Immutable file (content cannot be changed) |
0x00000020 | Append only |
0x00000040 | File is not included in 'dump' command |
0x00000080 | A-time is not updated |
0x00001000 | Hash indexed directory |
0x00002000 | File data is journaled with Ext3 |
Directory Information
Byte Range | Description |
---|---|
0–3 | Inode |
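4–5 | Record Length (add this to the offset of the current record to get to the next record) |
6 | Name Length |
7 | File Type (see Filetypes below; only meaningful when the required feature flag 0x0002, "Directory entries contain a type field", is set) |
8–X | File name data |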
Filetypes
Description | Value |
---|---|
EXT2_FT_UNKNOWN | 0 |
EXT2_FT_REG_FILE | 1 |
EXT2_FT_DIR | 2 |
EXT2_FT_CHRDEV | 3 |
EXT2_FT_BLKDEV | 4 |
EXT2_FT_FIFO | 5 |
EXT2_FT_SOCK | 6 |
EXT2_FT_SYMLINK | 7 |
EXT2_FT_MAX | 8 |
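Walking the records in one directory block could be sketched as follows. Two things here are assumptions rather than statements from the tables above: entries whose inode field is 0 are treated as unused and skipped, and a record length below 8 is treated as corruption (it would otherwise loop forever). The name `iter_directory_entries` is hypothetical.

```python
import struct

ENTRY_HEADER = "<IHBB"  # inode, record length, name length, file type
ENTRY_HEADER_SIZE = struct.calcsize(ENTRY_HEADER)  # 8 bytes


def iter_directory_entries(block):
    """Yield (inode, name, file_type) for each used entry in a directory block."""
    offset = 0
    while offset + ENTRY_HEADER_SIZE <= len(block):
        inode, rec_len, name_len, file_type = struct.unpack_from(
            ENTRY_HEADER, block, offset)
        if rec_len < ENTRY_HEADER_SIZE:
            raise ValueError("corrupt directory entry at offset %d" % offset)
        if inode != 0:  # assumption: inode 0 marks an unused entry
            start = offset + ENTRY_HEADER_SIZE
            yield inode, block[start:start + name_len], file_type
        offset += rec_len  # the record length leads to the next record
```

Names are returned as `bytes` because the name data is not guaranteed to be valid text in any particular encoding.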
Putting it all together
How To Read A File
1. Read the Superblock to find the size of each block, the number of blocks per group, the number of inodes per group, and the starting block of the first group.
2. Read the first entry of the Block Group Descriptor Table to find the location of the inode table, then read inode 2, which is the root directory.
3. The directory information is located in the data blocks that the inode points to. Read all of the data blocks associated with the inode.
4. Iterate through the directory information to find the directory or file you are looking for.
5. Read the inode bitmap to find out whether the inode pointed to by the directory entry is allocated.
6. If it is allocated and is a directory, go to step 3. If it is not allocated, continue with step 4.
7. Read the block pointers in the inode to find where its data is located.
8. Read the first 12 blocks. If the file is smaller than 12 blocks, read only the number of blocks the file occupies (to get that number, divide the size of the file stored in the inode by the size of each block and round up).
9. If the file is larger than 12 blocks:
   - Read the indirect block.
   - For each block beyond the first 12, read the corresponding pointer from the indirect block (i.e. when reading block 13, read pointer 1 from the indirect block).
   - If the file needs more blocks, read the double indirect pointer, which points to a block containing pointers to single indirect blocks.
   - If the file still needs more blocks, read the triple indirect pointer.
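The block-reading part of the procedure above can be sketched as a generator that lists a file's data block addresses in order. This is a simplified illustration under stated assumptions: `read_block` is a hypothetical callback that returns the contents of a block given its address, `inode_raw` is the raw inode structure, and sparse files (pointers with the value 0 inside the file) are not handled.

```python
import struct


def file_block_numbers(inode_raw, size, block_size, read_block):
    """Yield the addresses of a file's data blocks, in file order."""
    # Bytes 40-99 of the inode: 12 direct pointers, then the single,
    # double and triple indirect pointers.
    pointers = struct.unpack_from("<15I", inode_raw, 40)
    remaining = -(-size // block_size)  # file size in blocks, rounded up
    per_block = block_size // 4         # four-byte addresses per block

    def walk(block, depth):
        nonlocal remaining
        if remaining == 0:
            return
        if depth == 0:  # `block` is a data block
            remaining -= 1
            yield block
            return
        # `block` holds pointers that sit one level closer to the data.
        entries = struct.unpack_from("<%dI" % per_block, read_block(block))
        for entry in entries:
            if remaining == 0:
                return
            yield from walk(entry, depth - 1)

    for position, pointer in enumerate(pointers):
        depth = max(0, position - 11)  # positions 12, 13, 14 -> depth 1, 2, 3
        yield from walk(pointer, depth)
```

Because the walk stops as soon as the expected number of blocks has been produced, indirect blocks that the file does not need are never read.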
Links
- ext2-doc project: Second Extended File System - implementation-oriented documentation, describes internal structure in human language.
- Design and Implementation of the Second Extended Filesystem (overview)