Ext4: Difference between revisions

{{Filesystems}}
While the ext4 filesystem originated as a series of patches to the [[ext3]] filesystem, it was later rebranded as a dedicated filesystem that shares much of its design with [[ext2]] and [[ext3]]. Like ext3, it supports [[journaling]]. Among the upgrades are larger maximums (file size, filesystem size, files per folder, folders per folder, etc.) and features inspired by existing filesystems such as [[XFS]].
 
''This information is based on the Ext4 implementation as of Linux 5.9rc3. Ext4 is subject to change, so one may wish to also check the latest kernel headers for new information.''
 
== Basic Concepts ==
{{Stub}}
 
== Superblock ==
 
Line 308 ⟶ 314:
 
== Block Group Descriptor ==
 
See [[Ext2#Block_Group_Descriptor_Table|Ext2]] wiki page for an introduction to block group descriptor tables.
 
In Ext4, the block group descriptors keep their Ext2 role of recording the locations of important data structures, and additionally support new features such as flex block groups and meta block groups.
 
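With flex block groups, several adjacent block groups are tied together into one logical flex block group. Because each group descriptor records the location of both bitmaps and the inode table, this metadata does not have to reside inside its own block group and can be packed together for the whole flex group. This allows for better data locality.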
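
As a minimal sketch of the grouping, assume the superblock stores the flex group size as a base-2 logarithm (the Linux header calls this field s_log_groups_per_flex). The function names below are hypothetical and only illustrate the index arithmetic.

```python
def flex_group_of(block_group: int, log_groups_per_flex: int) -> int:
    """Return the index of the flex group that contains block_group.

    Assumes the flex group size is stored as log2(groups per flex group).
    """
    return block_group >> log_groups_per_flex


def first_group_of_flex(flex_group: int, log_groups_per_flex: int) -> int:
    """Return the first block group belonging to flex_group."""
    return flex_group << log_groups_per_flex


# With 16 groups per flex group (log2 = 4), block groups 0..15 share
# flex group 0 and block group 16 starts flex group 1.
print(flex_group_of(15, 4), flex_group_of(16, 4))
```

A driver would read the logarithm from the superblock once at mount time and reuse it for every lookup.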
 
With meta block groups, the filesystem is partitioned into multiple 'metablock' groups, each of which is a cluster of block groups whose group descriptors fit in a single block. This strategy also raises the maximum filesystem size from 256TiB (without metablock groups) to 512PiB.
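
The two size limits can be reproduced with a short calculation. This is a sketch under stated assumptions that do not appear above: 4 KiB blocks, one bitmap block per group (so 32768 blocks per group), 64-byte group descriptors, all descriptors confined to a single block group when metablock groups are not used, and a 32-bit block group number when they are.

```python
BLOCK_SIZE = 4096                    # assumed block size in bytes
BLOCKS_PER_GROUP = 8 * BLOCK_SIZE    # one bitmap block tracks 32768 blocks
DESC_SIZE = 64                       # assumed descriptor size in bytes

TiB = 2**40
PiB = 2**50

group_bytes = BLOCK_SIZE * BLOCKS_PER_GROUP       # 128 MiB per block group

# Without metablock groups: every descriptor must fit in one block group.
max_groups_plain = group_bytes // DESC_SIZE       # 2**21 groups
limit_plain = max_groups_plain * group_bytes      # 2**48 bytes

# With metablock groups: limited by a 32-bit block group number instead.
max_groups_meta = 2**32
limit_meta = max_groups_meta * group_bytes        # 2**59 bytes

print(limit_plain // TiB, "TiB")   # 256 TiB
print(limit_meta // PiB, "PiB")    # 512 PiB
```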
 
=== Locating the Block Group Descriptors ===
 
Locating the block group descriptors works as in Ext2, except when meta block groups or flex block groups are in use.
 
 
Flex block groups are in use when the required feature flag 'Flex Block Group' is set. The structure for that is found in the superblock data structure.

Flex block group info structure (all fields are atomic integers):
 
{| {{wikitable}}
! Starting Byte
! Ending Byte
! Size in Bytes
! Description
|-
| 0 || 7 || 8 || Atomic 64 bit free clusters.
|-
| 8 || 11 || 4 || Atomic free inodes.
|-
| 12 || 15 || 4 || Atomic used directories.
|}
 
=== Block Group Descriptor ===
 
Block group descriptor Structure:
Line 392 ⟶ 433:
| 0x0004 || Block group's inode table is zeroed.
|}
 
== Multiple Mount Protection ==
 
Multiple Mount Protection (MMP) protects a filesystem against being mounted more than once at the same time, which would lead to dangerous data races. This feature writes a sequence number into the block referenced by the MMP block superblock field.
 
To check for MMP, check the required feature flag Multiple Mount Protection, and the magic field in the block referenced by the MMP block superblock field.
 
When mounting, the driver checks the sequence number in the MMP block. If the sequence number is the 'fs check running' code or any unknown code above the maximum MMP sequence value, the filesystem is not safe to mount, even if the timestamp is outdated.

While running, the driver checks the MMP block sequence number at the interval specified in the MMP check superblock field. If it does not match the in-memory sequence number, a different host has mounted the filesystem, and the driver must remount it read-only. If it does match, the driver increments the number in memory and on disk. The driver also writes the hostname and mount path into the MMP block when open() succeeds.

The minimum MMP check interval is 5 seconds and the maximum is 300 seconds.
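
The checks above can be sketched as follows. The three sequence codes are the values used in the Linux ext4 header and are an assumption here, since the sequence table is not part of this section. The function names, the clamping of the interval, and the wrap-around to 1 after the maximum are likewise assumptions of this sketch rather than a description of any particular driver.

```python
from typing import Optional

# Assumed sequence codes (taken from the Linux ext4 header).
MMP_SEQ_CLEAN = 0xFF4D4D50   # filesystem was unmounted cleanly
MMP_SEQ_FSCK = 0xE24D4D50    # a filesystem check is running
MMP_SEQ_MAX = 0xE24D4D4F     # largest ordinary sequence number

MMP_MIN_CHECK_INTERVAL = 5    # seconds
MMP_MAX_CHECK_INTERVAL = 300  # seconds


def seq_forbids_mount(seq: int) -> bool:
    """True if the on-disk sequence number alone rules out mounting.

    An ordinary value (<= MMP_SEQ_MAX) returns False, although a real
    driver would still need further checks before trusting it.
    """
    if seq == MMP_SEQ_CLEAN:
        return False
    if seq == MMP_SEQ_FSCK:
        return True
    return seq > MMP_SEQ_MAX      # unknown code above the maximum


def clamp_check_interval(seconds: int) -> int:
    """Keep a check interval inside the documented 5..300 second range."""
    return max(MMP_MIN_CHECK_INTERVAL, min(seconds, MMP_MAX_CHECK_INTERVAL))


def next_sequence(disk_seq: int, mem_seq: int) -> Optional[int]:
    """Periodic check: return the new sequence number to store in memory
    and on disk, or None if another host changed the block (remount
    read-only in that case)."""
    if disk_seq != mem_seq:
        return None
    nxt = mem_seq + 1
    return 1 if nxt > MMP_SEQ_MAX else nxt   # assumed wrap-around
```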
 
=== MMP structure ===
 
{| {{wikitable}}
! Starting Byte
! Ending Byte
! Size in Bytes
! Description
|-
| 0 || 3 || 4 || MMP signature (0x004d4d50)
|-
| 4 || 7 || 4 || Sequence Number ([[#MMP_Sequences|see below]])
|-
| 8 || 15 || 8 || Last updated time (does not affect algorithm)
|-
| 16 || 79 || 64 || Hostname of the system that opened the filesystem (does not affect algorithm)
|-
| 80 || 111 || 32 || Mount path on the system that opened the filesystem (does not affect algorithm)
|-
| 112 || 113 || 2 || Interval to check MMP block
|-
| 114 || 115 || 2 || Padding.
|-
| 116 || 1019 || 904 || Padding.
|-
| 1020 || 1023 || 4 || Checksum (crc32c(UUID+MMP Block number))
|}
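
The table maps directly onto a fixed 1024-byte record. The sketch below decodes it with Python's struct module, assuming little-endian fields (as for other ext4 on-disk structures) and NUL-padded text fields. The type and function names are hypothetical, and the checksum is returned without being verified.

```python
import struct
from typing import NamedTuple

MMP_MAGIC = 0x004D4D50
# magic, sequence, time, hostname, mount path, interval, 2 + 904 pad, checksum
MMP_FORMAT = "<IIQ64s32sHH904sI"
assert struct.calcsize(MMP_FORMAT) == 1024


class MmpBlock(NamedTuple):
    sequence: int
    last_update: int
    hostname: str
    mount_path: str
    check_interval: int
    checksum: int


def _text(raw: bytes) -> str:
    """Decode a NUL-padded field (assumed encoding: ASCII)."""
    return raw.split(b"\0", 1)[0].decode("ascii", "replace")


def parse_mmp(block: bytes) -> MmpBlock:
    """Decode the first 1024 bytes of an MMP block."""
    (magic, seq, when, host, path,
     interval, _pad1, _pad2, checksum) = struct.unpack(MMP_FORMAT, block[:1024])
    if magic != MMP_MAGIC:
        raise ValueError("not an MMP block: bad signature")
    return MmpBlock(seq, when, _text(host), _text(path), interval, checksum)
```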
 
== Journaling ==
 
See [[Journaling]] for a high level description of a filesystem journal.
 
Ext4 uses the [[Jbd2]] Journaling layer.
 
Ext4 defines the journal inode as inode 8. The superblock can hold a backup copy of 68 bytes of the journal inode (its block map and size). The journal is a hidden file in the filesystem that usually occupies an entire block group and is preferably placed in the middle of the volume.
 
The optional filesystem journal feature protects against filesystem corruption if the system crashes. Important writes first go to the journal, a small contiguous sliver of disk. Once they have been flushed to disk, the driver writes a commit record to the journal. Later, the driver writes the transactions to their final locations on disk. If the system crashes during this second write, the driver can simply replay the journal up to the last commit record. If the write succeeds, the transaction's record is removed from the journal.
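
The sequence above can be illustrated with a toy in-memory model. This is a sketch of the general write-ahead idea only and does not follow the jbd2 on-disk format; the class and method names are hypothetical.

```python
class ToyJournal:
    """Toy model: 'disk' holds final block contents, 'log' is the journal."""

    def __init__(self):
        self.disk = {}   # block number -> data at its final location
        self.log = []    # pending transactions, oldest first

    def write_transaction(self, blocks):
        """Steps 1 and 2: journal the blocks, then mark them committed."""
        entry = {"blocks": dict(blocks), "committed": False}
        self.log.append(entry)        # data reaches the journal first
        entry["committed"] = True     # commit record written afterwards
        return entry

    def checkpoint(self, entry):
        """Steps 3 and 4: copy blocks to their final place, drop the record."""
        self.disk.update(entry["blocks"])
        self.log.remove(entry)

    def replay(self):
        """After a crash: redo committed transactions, discard the rest."""
        for entry in list(self.log):
            if entry["committed"]:
                self.checkpoint(entry)
            else:
                self.log.remove(entry)


journal = ToyJournal()
journal.write_transaction({10: b"new metadata"})
# Simulated crash here: the final write never happened.
journal.replay()
print(journal.disk[10])
```

Replay is idempotent in this model: running it again after a second crash finds either the same committed transaction or an empty log.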
 
The default journaling strategy is 'ordered', which writes only filesystem metadata through the journal. If stronger guarantees are preferred, the filesystem can use the 'journal' strategy, which writes both metadata and data through the journal at the cost of slower operation. The filesystem can also use the 'writeback' strategy, where data is not flushed to the disk before a metadata update.
 
Ext4 may also use a separate journal device, identified by the journal UUID in the superblock. The separate journal device begins with 1024 bytes of padding, followed by an ext4 superblock with a matching UUID. The journal starts at the next complete block.
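
As a small worked example of that layout, the byte offset at which the journal starts on a separate device can be computed by rounding up past the padding and the superblock. The sketch assumes the ext4 superblock occupies 1024 bytes; the function name is hypothetical.

```python
PADDING_BYTES = 1024
SUPERBLOCK_BYTES = 1024   # assumed size of the ext4 superblock


def external_journal_offset(block_size: int) -> int:
    """Byte offset of the first complete block after the superblock."""
    end_of_superblock = PADDING_BYTES + SUPERBLOCK_BYTES
    blocks_needed = -(-end_of_superblock // block_size)   # ceiling division
    return blocks_needed * block_size


for size in (1024, 2048, 4096):
    print(size, external_journal_offset(size))
```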
 
===Journal Superblock Fields===
 
Ext4 uses [[Jbd2]] as the journal.
 
All fields are big endian unless otherwise specified.
 
{| {{wikitable}}
! Starting Byte
! Ending Byte
! Size in Bytes
! Description
|-
| 0 || 11 || 12 || Journal header ([[#Journal_Header|see below]])
|-
| 12 || 15 || 4 || Block size of the journal device.
|-
| 16 || 19 || 4 || Total number of blocks in the journal device.
|-
| 20 || 23 || 4 || First block of journal information.
|-
| 24 || 27 || 4 || First journal transaction expected.
|-
| 28 || 31 || 4 || First block of the journal.
|-
| 32 || 35 || 4 || Errno, if the journal has an error.
|-
| 36 || 39 || 4 || Optional (compatible) features present.
|-
| 40 || 43 || 4 || Required (incompatible) features present.
|-
| 44 || 47 || 4 || Features that, if unsupported, require the journal to be mounted read-only. There are no read-only features as of Linux 5.9rc3.
|-
| 48 || 63 || 16 || Journal UUID.
|-
| 64 || 67 || 4 || Number of filesystems using this journal.
|-
| 68 || 71 || 4 || Block number of the journal superblock copy.
|-
| 72 || 75 || 4 || Maximum journal blocks per transaction. This is unused as of Linux 5.9rc3.
|-
| 76 || 79 || 4 || Maximum data blocks per transaction. This is unused as of Linux 5.9rc3.
|-
| 80 || 80 || 1 || Checksum algorithm.
|-
| 81 || 83 || 3 || Padding.
|-
| 84 || 251 || 168 || Padding.
|-
| 252 || 255 || 4 || Checksum of the journal superblock.
|-
| 256 || 1023 || 768 || UUIDs of the filesystems sharing this journal (up to 48 entries of 16 bytes each).
|}
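
A minimal sketch of reading a few of these fields follows. It uses the byte offsets from the table and big-endian decoding, skips the 12-byte journal header, and leaves the feature words and checksum untouched. The function name and dictionary keys are hypothetical, and reading the errno field as a signed value is an assumption.

```python
import struct
import uuid


def parse_journal_superblock(buf: bytes) -> dict:
    """Decode selected journal superblock fields from a 1024-byte buffer."""
    if len(buf) < 1024:
        raise ValueError("journal superblock needs 1024 bytes")
    # Offsets 12..31: five consecutive big-endian 32-bit fields.
    (block_size, total_blocks, first_info_block,
     first_transaction, first_block) = struct.unpack_from(">5I", buf, 12)
    (errno_value,) = struct.unpack_from(">i", buf, 32)   # assumed signed
    (nr_users,) = struct.unpack_from(">I", buf, 64)
    return {
        "block_size": block_size,
        "total_blocks": total_blocks,
        "first_info_block": first_info_block,
        "first_transaction": first_transaction,
        "first_block": first_block,
        "errno": errno_value,
        "uuid": uuid.UUID(bytes=bytes(buf[48:64])),
        "nr_users": nr_users,
    }
```

Using unpack_from with explicit offsets keeps each read tied to a row of the table, which makes the code easy to check against it.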
 
===Journal Header===
 
All fields are big endian unless otherwise specified.
 
{| {{wikitable}}
! Starting Byte
! Ending Byte
! Size in Bytes
! Description
|-
| 0 || 3 || 4 || Magic signature (0xc03b3998)
|-
| 4 || 7 || 4 || Block Type ([[#Journal_Block_Type|see below]])
|-
| 8 || 11 || 4 || Journal transaction for this block.
|}
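
Since every journal block starts with this header, a driver can use it to recognise journal blocks. The sketch below decodes the three fields as big-endian 32-bit integers and rejects buffers with the wrong signature; the function name is hypothetical and the block type is returned as a raw number.

```python
import struct

JOURNAL_MAGIC = 0xC03B3998


def parse_journal_header(buf: bytes):
    """Return (block_type, transaction) from a 12-byte journal header."""
    magic, block_type, transaction = struct.unpack_from(">III", buf, 0)
    if magic != JOURNAL_MAGIC:
        raise ValueError("not a journal block: bad signature")
    return block_type, transaction


header = struct.pack(">III", JOURNAL_MAGIC, 2, 41)
print(parse_journal_header(header))
```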
 
 
==See Also==
===External Links===
* [https://www.kernel.org/doc/html/latest/filesystems/ext4/index.html ext4 Data Structures and Algorithms] - detailed description of ext4 from the Linux kernel documentation.
* [https://github.com/torvalds/linux/blob/master/fs/ext4/ext4.h Linux ext4 header] - This header contains useful definitions and declarations.
 