ISO 9660

From OSDev.wiki
Revision as of 14:49, 27 January 2010 by osdev>Aj (Formatting error.)

ISO 9660 is the standard filesystem for CD-ROMs.

Overview and Caveats

ISO 9660 is not a complex file system, but it has a few quirks that are worth remembering. Some operating systems also create non-compliant CDs, so beware! The main example of this is the character set that is available for file names. Strictly, this consists of A-Z (upper case only!), digits and underscores. Many operating systems also allow lower case letters and other characters. Linux displays lower case file names to the user even though the CD contents actually contain upper case characters.

Another (perhaps little-known and little-utilised) feature of the ISO 9660 file system is that a single file system can span multiple CDs. This is handled via "set numbers" (the Volume Set Size and Volume Sequence Number fields).

Sector Size

An ISO 9660 sector is normally 2KiB (2048 bytes) long. Although the specification allows for alternative sector sizes, you will rarely find anything other than 2KiB.

Numerical Formats

Another quirk is that ISO 9660 uses several numbering formats, and multi-byte numbers are often recorded in "both-endian" format (that is, LSB first, immediately followed by the same value MSB first). For this reason, 32 bit LBAs often appear in 8 byte fields. ISO 9660 section 7.3 defines the 32 bit formats, and fields throughout the specification are described as being in 7.3.1 format (LSB first), 7.3.2 format (MSB first) or 7.3.3 format (both-endian). Where a both-endian field exists, code for a little-endian architecture such as x86 can simply use the first 32 bit sequence (the LSB-first copy) and ignore bytes 4-7.
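
As an illustration of the both-endian layout, the following Python sketch decodes 32 bit and 16 bit both-endian fields and checks that the two copies agree. The function names are made up for this example, and the consistency check is optional; a minimal driver could read only the LSB-first half.

```python
import struct

def read_both_endian_u32(buf: bytes, offset: int = 0) -> int:
    """Decode an 8 byte both-endian field: 32 bits LSB first, then MSB first."""
    (little,) = struct.unpack_from("<I", buf, offset)
    (big,) = struct.unpack_from(">I", buf, offset + 4)
    if little != big:
        raise ValueError(f"both-endian halves disagree: {little:#x} != {big:#x}")
    return little

def read_both_endian_u16(buf: bytes, offset: int = 0) -> int:
    """Decode a 4 byte both-endian field: 16 bits LSB first, then MSB first."""
    (little,) = struct.unpack_from("<H", buf, offset)
    (big,) = struct.unpack_from(">H", buf, offset + 2)
    if little != big:
        raise ValueError(f"both-endian halves disagree: {little:#x} != {big:#x}")
    return little

def write_both_endian_u32(value: int) -> bytes:
    """Encode a 32 bit value as LSB-first bytes followed by MSB-first bytes."""
    return struct.pack("<I", value) + struct.pack(">I", value)
```

For example, the value 0x12345678 is stored as the eight bytes 78 56 34 12 12 34 56 78.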

Permitted Characters

The specification refers to two sets of characters: 'a-characters' and 'd-characters'. You will see these terms used in the descriptor tables throughout this article. The character sets are:

a-characters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 _ 
              ! " % & ' ( ) * + , - . / : ; < = > ?

d-characters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 _
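
A driver or authoring tool may want to check a field against these sets. The sketch below uses exactly the characters listed above. It assumes that fields are padded on the right with 0x20 bytes, so trailing spaces are stripped before checking. The helper names are hypothetical.

```python
import string

# Character sets exactly as listed in this article.
D_CHARACTERS = frozenset(string.ascii_uppercase + string.digits + "_")
A_CHARACTERS = D_CHARACTERS | frozenset("!\"%&'()*+,-./:;<=>?")

def _uses_only(field: bytes, allowed: frozenset) -> bool:
    # Assumption: trailing 0x20 bytes are padding and not part of the value.
    text = field.decode("ascii", errors="replace").rstrip(" ")
    return all(ch in allowed for ch in text)

def is_d_string(field: bytes) -> bool:
    """True if the field contains only d-characters (plus trailing padding)."""
    return _uses_only(field, D_CHARACTERS)

def is_a_string(field: bytes) -> bool:
    """True if the field contains only a-characters (plus trailing padding)."""
    return _uses_only(field, A_CHARACTERS)
```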

Volume Descriptors

When preparing to mount a CD, your first action will be reading the volume descriptors (specifically, you will be looking for the Primary Volume Descriptor).

Sectors 0x00-0x0F of the CD are reserved for system use. This means that the Volume Descriptors can be found starting at sector 0x10. The format of the volume descriptors is as follows:

Offset Length (bytes) Field Name Description
0 1 Type Volume Descriptor type code (see below).
1 5 Identifier Always 'CD001'.
6 1 Version Volume Descriptor Version (0x01).
7 2041 Data Depends on the volume descriptor type.

Each volume descriptor is therefore exactly one sector (2KiB) long.
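
The common header can be split off with a few lines of code. This is only a sketch: the function name and the choice to raise `ValueError` on a bad identifier are assumptions made for the example.

```python
import struct

SECTOR_SIZE = 2048  # the usual ISO 9660 sector size

def parse_descriptor_header(sector: bytes):
    """Return (type_code, version, data) for one volume descriptor sector."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("a volume descriptor occupies exactly one sector")
    type_code, identifier, version = struct.unpack_from("<B5sB", sector, 0)
    if identifier != b"CD001":
        raise ValueError("missing 'CD001' standard identifier")
    # Bytes 7..2047 are interpreted according to the type code.
    return type_code, version, sector[7:]
```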

Volume Descriptor Type Codes

Value Description
0 Boot Record
1 Primary Volume Descriptor
2 Supplementary Volume Descriptor
3 Volume Partition Descriptor
4-254 Reserved
255 Volume Descriptor Set Terminator

When starting out with a basic CD, we are going to be interested in the Primary Volume Descriptor, which points us to the root directory and the path tables. Either of these allows us to find any file on the CD. Using the path table suits minimal implementations that do not wish to search the directory hierarchy node by node. A linear scan of the path table may be slower (string comparisons across every directory on the disc), but it is easier to implement.

Volume Descriptor Set Terminator

The Volume Descriptor Set Terminator does not currently define bytes 7-2047 of its Volume Descriptor. This means that the only fields in use for the volume set terminator are the type code (255), the standard identifier ('CD001') and the descriptor version (0x01).
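
Putting the last few sections together, a loader might walk the descriptors from sector 0x10 until it meets the terminator. The sketch below assumes 2KiB sectors and an image that is already in memory as a `bytes` object; the function names are made up for this example.

```python
SECTOR_SIZE = 2048
FIRST_DESCRIPTOR_SECTOR = 0x10
TYPE_PRIMARY = 1
TYPE_TERMINATOR = 255

def iter_volume_descriptors(image: bytes):
    """Yield (type_code, sector_bytes) up to and including the terminator."""
    sector = FIRST_DESCRIPTOR_SECTOR
    while True:
        start = sector * SECTOR_SIZE
        data = image[start:start + SECTOR_SIZE]
        if len(data) < SECTOR_SIZE or data[1:6] != b"CD001":
            raise ValueError(f"sector {sector:#x} is not a volume descriptor")
        yield data[0], data
        if data[0] == TYPE_TERMINATOR:
            return
        sector += 1

def find_primary_volume_descriptor(image: bytes) -> bytes:
    """Return the 2KiB Primary Volume Descriptor sector."""
    for type_code, data in iter_volume_descriptors(image):
        if type_code == TYPE_PRIMARY:
            return data
    raise LookupError("no Primary Volume Descriptor before the terminator")
```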

The Boot Record

The first type of Volume Descriptor is the "Boot Record". The descriptor format is as follows:

Offset Length (bytes) Field Name Value Description
0 1 Type 0 Zero indicates a boot record.
1 5 Identifier 'CD001' Always 'CD001'.
6 1 Version 0x01 Volume Descriptor Version (0x01).
7 32 Boot System Identifier (string) ID of the system which can act on and boot the system from the boot record in a-characters.
39 32 Boot Identifier (string) Identification of the boot system defined in the rest of this descriptor in a-characters.
71 1977 Boot System Use Unspecified Custom - used by the boot system.
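
A minimal sketch of pulling the fields out of a Boot Record sector follows. The function name is hypothetical, and stripping trailing spaces and zero bytes from the identifiers is an assumption about how the strings are padded.

```python
def parse_boot_record(sector: bytes) -> dict:
    """Split a Boot Record volume descriptor into its fields."""
    if sector[0] != 0 or sector[1:6] != b"CD001":
        raise ValueError("not a Boot Record volume descriptor")
    return {
        "version": sector[6],
        # Assumption: identifiers are padded with spaces or zero bytes.
        "boot_system_id": sector[7:39].decode("ascii").rstrip(" \x00"),
        "boot_id": sector[39:71].decode("ascii").rstrip(" \x00"),
        # Bytes 71..2047 are defined by the boot system, not by ISO 9660.
        "boot_system_use": sector[71:2048],
    }
```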

The Primary Volume Descriptor

This is a lengthy descriptor, but it contains some very useful information for reading the rest of the file system.

Offset Length (bytes) Field Name Description
0 1 Type Code Always 0x01 for a Primary Volume Descriptor.
1 5 Standard Identifier Always 'CD001'.
6 1 Version Always 0x01.
7 1 Unused Always 0x00.
8 32 System Identifier The name of the system that can act upon sectors 0x00-0x0F for the volume in a-characters.
40 32 Volume Identifier Identification of this volume in d-characters.
72 8 Unused Field
80 8 Volume Space Size Number of Logical Blocks in which the volume is recorded. This is a 32 bit value in both-endian format.
88 32 Unused Field
120 4 Volume Set Size The size of the set in this logical volume (number of disks). This is a 16 bit value in both-endian format.
124 4 Volume Sequence Number The number of this disk in the Volume Set. This is a 16 bit value in both-endian format.
128 4 Logical Block Size The size in bytes of a logical block in both-endian format. NB: This means that a logical block on a CD could be something other than 2KiB!
132 8 Path Table Size The size in bytes of the path table in 32 bit both-endian format.
140 4 Location of Type-L Path Table LBA location of the path table, recorded in LSB-first (little endian) format. The path table pointed to also contains LSB-first values.
144 4 Location of the Optional Type-L Path Table LBA location of the optional path table, recorded in LSB-first (little endian) format. The path table pointed to also contains LSB-first values. Zero means that no optional path table exists.
148 4 Location of Type-M Path Table LBA location of the path table, recorded in MSB-first (big-endian) format. The path table pointed to also contains MSB-first values.
152 4 Location of Optional Type-M Path Table LBA location of the optional path table, recorded in MSB-first (big-endian) format. The path table pointed to also contains MSB-first values. Zero means that no optional path table exists.
156 34 Root Directory Record Directory entry for the root directory. Note that this is not an LBA address, it is the actual Directory Record, which contains a zero-length Directory Identifier, hence the fixed 34 byte size.
190 128 Volume Set Identifier Identifier of the volume set of which this volume is a member in d-characters.
318 128 Publisher Identifier The volume publisher in a-characters. If unspecified, all bytes should be 0x20. For extended publisher information, the first byte should be 0x5F, followed by an 8.3 format file name. This file must be in the root directory and the filename is made from d-characters.
446 128 Data Preparer Identifier The identifier of the person(s) who prepared the data for this volume. Format as per Publisher Identifier.
574 128 Application Identifier Identifies how the data are recorded on this volume. Format as per Publisher Identifier.
702 37 Copyright File Identifier Identifies a file containing copyright information for this volume set. The file must be contained in the root directory and is in 8.3 format. If no such file is identified, the characters in this field are all set to 0x20.
739 37 Abstract File Identifier Identifies a file containing abstract information for this volume set in the same format as the Copyright File Identifier field.
776 37 Bibliographic File Identifier Identifies a file containing bibliographic information for this volume set. Format as per the other File Identifier fields.
813 17 Volume Creation Date and Time Date and Time format as specified below.
830 17 Volume Modification Date and Time Date and Time format as specified below.
847 17 Volume Expiration Date and Time Date and Time format as specified below. After this date and time, the volume should be considered obsolete. If unspecified, then the information is never considered obsolete.
864 17 Volume Effective Date and Time Date and Time format as specified below. Date and time from which the volume should be used. If unspecified, the volume may be used immediately.
881 1 File Structure Version An 8 bit number specifying the directory records and path table version (always 0x01).
882 1 Unused Always 0x00.
883 512 Application Used Contents not defined by ISO 9660.
1395 653 Reserved Reserved by ISO.
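
The fields a simple reader needs most can be extracted as in the sketch below. It reads only the LSB-first half of each both-endian field, which is a shortcut suitable for little-endian hosts, and it assumes string fields are padded with spaces. The function and key names are made up for this example.

```python
import struct

def parse_pvd(sector: bytes) -> dict:
    """Extract the commonly needed fields from a Primary Volume Descriptor."""
    if sector[0] != 1 or sector[1:6] != b"CD001":
        raise ValueError("not a Primary Volume Descriptor")

    def u16(offset: int) -> int:  # LSB-first half of a 16 bit both-endian field
        return struct.unpack_from("<H", sector, offset)[0]

    def u32(offset: int) -> int:  # LSB-first half of a 32 bit both-endian field
        return struct.unpack_from("<I", sector, offset)[0]

    return {
        "system_id": sector[8:40].decode("ascii").rstrip(" "),
        "volume_id": sector[40:72].decode("ascii").rstrip(" "),
        "volume_space_size": u32(80),
        "volume_set_size": u16(120),
        "volume_sequence_number": u16(124),
        "logical_block_size": u16(128),
        "path_table_size": u32(132),
        "l_path_table_lba": u32(140),
        # The Type-M location is stored MSB first only.
        "m_path_table_lba": struct.unpack_from(">I", sector, 148)[0],
        "root_directory_record": sector[156:190],
    }
```

The 34 byte `root_directory_record` value is a complete Directory Record and can be decoded like any other directory entry.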

Date and Time Format

The date / time format used in the Primary Volume Descriptor is:

Offset Size Description
0 4 Year from 1 to 9999.
4 2 Month from 1 to 12.
6 2 Day from 1 to 31.
8 2 Hour from 0 to 23.
10 2 Minute from 0 to 59.
12 2 Second from 0 to 59.
14 2 Hundredths of a second from 0 to 99.
16 1 Offset from GMT in 15 minute intervals from -48 (West) to +52 (East)

All fields except for the offset from GMT are in ASCII digits. An unspecified date and time is represented by 16 '0' digits, followed by a zero in the last field.
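
A possible decoder for this 17 byte format is sketched below. It treats the final byte as a signed 8 bit count of 15 minute intervals and returns `None` for an unspecified date; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def parse_pvd_datetime(field: bytes):
    """Decode a 17 byte PVD date/time field, or return None if unspecified."""
    if len(field) != 17:
        raise ValueError("PVD date/time fields are 17 bytes long")
    digits = field[:16].decode("ascii")
    if digits == "0" * 16 and field[16] == 0:
        return None  # unspecified date and time
    # The last byte is a signed count of 15 minute intervals from GMT.
    quarter_hours = int.from_bytes(field[16:17], "little", signed=True)
    return datetime(
        int(digits[0:4]), int(digits[4:6]), int(digits[6:8]),
        int(digits[8:10]), int(digits[10:12]), int(digits[12:14]),
        int(digits[14:16]) * 10_000,  # hundredths of a second to microseconds
        tzinfo=timezone(timedelta(minutes=15 * quarter_hours)),
    )
```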

The Path Table

The Path Table contains a well-ordered sequence of records describing every directory extent on the CD. There is one caveat: the Path Table can only contain 65536 records, due to the 16 bit "Parent Directory Number" field. If there are more directories than this on the disc, some CD authoring software will ignore the limit and create a non-compliant CD (this applies to some earlier versions of Nero, for example). If your file system driver uses the path table, you should be aware of this possibility. Windows uses the Path Table and will fail with such non-compliant CDs (the additional nodes exist but appear as zero-byte). Linux, which uses the directory tables, is not affected by this issue.

The location of the path tables can be found in the Primary Volume Descriptor. There are two table types - the L-Path table (relevant to x86) and the M-Path table. The only difference between these two tables is that multi-byte values in the L-Table are LSB-first and the values in the M-Table are MSB-first.

The structure of a Path Table Entry is as follows:

Offset Size Description
0 1 Length of Directory Identifier
1 1 Extended Attribute Record Length
2 4 Location of Extent (LBA). This is in a different format depending on whether this is the L-Table or M-Table (see explanation above).
6 2 Directory number of parent directory (an index in to the path table). This is the field that limits the table to 65536 records.
8 (variable) Directory Identifier (name) in d-characters.
(variable) 1 Padding Field - contains a zero if the Length of Directory Identifier field is odd, not present otherwise. This means that each table entry will always start on an even byte number.

The path table is in ascending order of directory level and is alphabetically sorted within each directory level.
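
The entry layout above can be walked as in the following sketch. It expects exactly "Path Table Size" bytes from the PVD, and it relies on two details from the standard that are not spelled out above: records are numbered from 1 in the order they appear, and the root directory is record 1 with a one byte identifier of 0x00 (shown here as "/"). The function name and dictionary keys are made up for this example.

```python
import struct

def parse_path_table(table: bytes, little_endian: bool = True) -> list:
    """Parse an L-Table (default) or M-Table into a list of entries."""
    fmt = "<IH" if little_endian else ">IH"
    entries = []
    pos = 0
    while pos + 8 <= len(table):
        name_length = table[pos]
        if name_length == 0:
            break  # assumption: trailing zero bytes mean there are no more entries
        extent_lba, parent = struct.unpack_from(fmt, table, pos + 2)
        raw_name = table[pos + 8:pos + 8 + name_length]
        entries.append({
            "number": len(entries) + 1,  # directory numbers start at 1
            "name": "/" if raw_name == b"\x00" else raw_name.decode("ascii"),
            "ear_length": table[pos + 1],
            "extent_lba": extent_lba,
            "parent": parent,
        })
        # A padding byte follows identifiers of odd length.
        pos += 8 + name_length + (name_length & 1)
    return entries
```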

Directories

At some point when reading from an ISO 9660 CD, you will need a directory record to locate a file, even if you generally use the path table to locate the directory initially. Unlike the path tables, there is only one version of each directory table, and multi byte numbers are in both-endian format. A directory record is laid out as follows:

Offset Size Description
0 1 Length of Directory Record.
1 1 Extended Attribute Record length.
2 8 Location of extent (LBA) in both-endian format.
10 8 Data length (size of extent) in both-endian format.
18 7 Recording date and time (see format below).
25 1 File flags (see below).
26 1 File unit size for files recorded in interleaved mode, zero otherwise.
27 1 Interleave gap size for files recorded in interleaved mode, zero otherwise.
28 4 Volume sequence number - the volume that this extent is recorded on, in 16 bit both-endian format.
32 1 Length of file identifier (file name).
33 (variable) File identifier. For files, this terminates with a ';' character followed by the file version number in ASCII coded decimal ('1').
(variable) 1 Padding field - zero if length of file identifier is even, otherwise, this field is not present. This means that a directory entry will always start on an even byte number.
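
A single Directory Record can be decoded along the following lines. The sketch reads only the LSB-first half of each both-endian field and returns the identifier and recording date as raw bytes. The function name and dictionary keys are hypothetical.

```python
import struct

def parse_directory_record(buf: bytes, pos: int = 0):
    """Decode the Directory Record starting at buf[pos], or return None."""
    record_length = buf[pos]
    if record_length == 0:
        return None  # no record starts here (zero padding)
    name_length = buf[pos + 32]
    return {
        "record_length": record_length,
        "ear_length": buf[pos + 1],
        "extent_lba": struct.unpack_from("<I", buf, pos + 2)[0],
        "data_length": struct.unpack_from("<I", buf, pos + 10)[0],
        "recorded_at": bytes(buf[pos + 18:pos + 25]),  # 7 byte date/time
        "flags": buf[pos + 25],
        "file_unit_size": buf[pos + 26],
        "interleave_gap": buf[pos + 27],
        "volume_sequence_number": struct.unpack_from("<H", buf, pos + 28)[0],
        "identifier": bytes(buf[pos + 33:pos + 33 + name_length]),
    }
```

The next record in the same sector starts at `pos + record_length`.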

Even if a directory spans multiple sectors, the directory entries are not permitted to cross the sector boundary (unlike the path table). Where there is not enough space to record an entire directory entry at the end of a sector, that sector is zero-padded and the next consecutive sector is used.

Some of the above fields need explanation. Unfortunately, the date/time format is different from that used in the Primary Volume Descriptor. The Date/Time format is:

Offset Size Description
0 1 Number of years since 1900.
1 1 Month of the year from 1 to 12.
2 1 Day of the month from 1 to 31.
3 1 Hour of the day from 0 to 23.
4 1 Minute of the hour from 0 to 59.
5 1 Second of the minute from 0 to 59.
6 1 Offset from GMT in 15 minute intervals from -48 (West) to +52 (East).

This is quite a contrast to the PVD which contains ASCII encoded decimal values, but this format is presumably used to save disc space over a large number of entries.
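
Decoding the 7 byte form is straightforward, as the sketch below shows. Returning `None` when all seven bytes are zero is an assumption about how an unspecified date is recorded, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

def parse_record_datetime(field: bytes):
    """Decode the 7 byte date/time of a Directory Record."""
    if len(field) != 7:
        raise ValueError("directory record date/time fields are 7 bytes long")
    if field == bytes(7):
        return None  # assumption: all zeroes means "not specified"
    years_since_1900, month, day, hour, minute, second = field[:6]
    # The last byte is a signed count of 15 minute intervals from GMT.
    quarter_hours = int.from_bytes(field[6:7], "little", signed=True)
    return datetime(
        1900 + years_since_1900, month, day, hour, minute, second,
        tzinfo=timezone(timedelta(minutes=15 * quarter_hours)),
    )
```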

The other field that needs some explanation is the File Flags field. It consists of one bit flags, as follows:

Bit Description
0 If set, the existence of this file need not be made known to the user (basically a 'hidden' flag).
1 If set, this record describes a directory (in other words, it is a subdirectory extent).
2 If set, this file is an "Associated File".
3 If set, the extended attribute record contains information about the format of this file.
4 If set, owner and group permissions are set in the extended attribute record.
5 & 6 Reserved
7 If set, this is not the final directory record for this file (for files spanning several extents, for example files over 4GiB long).
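
In code, the flags are convenient to handle as a bit mask. The flag names in this sketch are invented for readability and do not come from the specification.

```python
from enum import IntFlag

class FileFlags(IntFlag):
    """Bits of the File Flags byte; member names are illustrative only."""
    HIDDEN = 0x01         # bit 0
    DIRECTORY = 0x02      # bit 1
    ASSOCIATED = 0x04     # bit 2
    RECORD_FORMAT = 0x08  # bit 3: format details in the extended attribute record
    PERMISSIONS = 0x10    # bit 4: owner/group permissions in the extended attribute record
    MULTI_EXTENT = 0x80   # bit 7: more directory records follow for this file

def describe_flags(value: int) -> list:
    """Return the names of all flags set in a File Flags byte."""
    return [flag.name for flag in FileFlags if value & flag]
```

A directory entry scanner would typically test `flags & FileFlags.DIRECTORY` to decide whether an extent is a subdirectory or a file.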

Locating Data on the CD

By now, you should be able to see that there are two main ways to navigate to a file record. You can either search the path table, or you can search the full directory structure. You may find it more convenient and faster to cache the path table, loading directories only when necessary.

Searching the Path Table

If you are using the Path Table method, you will still need Directory Records to find the file you are looking for. Basically, you match the path in reverse order: find a Path Table entry whose name matches the last directory in the path, then follow the "Parent Directory" links to check each preceding component. Once you have located the directory containing the file you want, load that Directory and scan it for the appropriate file name.
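
The reverse match can be sketched as follows. The example assumes the path table has already been parsed into a list of `(name, extent_lba, parent_number)` tuples in table order, with record numbers starting at 1 and the root directory as record 1 (its own parent). Both function names are hypothetical.

```python
def _matches(entries, number, parts):
    """Check path components from last to first, following parent links."""
    for part in reversed(parts):
        name, _lba, parent = entries[number - 1]
        if number == 1 or name != part:
            return False  # reached the root too early, or the name differs
        number = parent
    return number == 1  # every component matched and we arrived at the root

def find_directory(entries, path):
    """Return the extent LBA of the directory named by an absolute path."""
    parts = [p for p in path.split("/") if p]
    for number in range(1, len(entries) + 1):
        if _matches(entries, number, parts):
            return entries[number - 1][1]
    raise FileNotFoundError(path)
```

For '/BOOT/MYLOADER/STAGE2.BIN' you would call `find_directory(entries, "/BOOT/MYLOADER")` and then scan the returned directory extent for the file itself.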

Recursing from the Root Directory

Alternatively, you can ignore the Path Table and just cache the root directory from the Primary Volume Descriptor. You then load each directory in turn. For example, for the path '/BOOT/MYLOADER/STAGE2.BIN':

  1. Read the PVD into memory. Bytes 156-189 contain the root directory entry.
  2. Load the root directory by reading the LBA and Length values in this root directory entry.
  3. Scan the directory entry identifiers for 'BOOT' (directory identifiers carry no ';1' version suffix).
  4. If found, use the LBA and length values to load the 'BOOT' directory into memory.
  5. Repeat steps 3 and 4 for the directory identifier 'MYLOADER'.
  6. Scan the 'MYLOADER' directory for 'STAGE2.BIN;1'. If found, you can now use the LBA value to load your file into memory.
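
The steps above can be sketched in Python as follows. This is an illustration rather than a complete driver: it assumes the whole image is in memory, that the PVD is the first descriptor (sector 0x10), that logical blocks are 2KiB, and that directory identifiers have no version suffix while file identifiers end in ';' plus a version. The names `iter_records` and `lookup` are made up for this example.

```python
import struct

SECTOR_SIZE = 2048

def iter_records(extent: bytes):
    """Yield (identifier, lba, size, flags) for each record in a directory."""
    pos = 0
    while pos < len(extent):
        length = extent[pos]
        if length == 0:
            # Records never cross a sector boundary: skip the zero padding.
            pos = (pos // SECTOR_SIZE + 1) * SECTOR_SIZE
            continue
        name_length = extent[pos + 32]
        identifier = extent[pos + 33:pos + 33 + name_length]
        (lba,) = struct.unpack_from("<I", extent, pos + 2)
        (size,) = struct.unpack_from("<I", extent, pos + 10)
        yield identifier, lba, size, extent[pos + 25]
        pos += length

def lookup(image: bytes, path: str):
    """Return (lba, size) of the file or directory at an absolute path."""
    pvd = image[0x10 * SECTOR_SIZE:0x11 * SECTOR_SIZE]  # assumed to be the PVD
    root = pvd[156:190]
    (lba,) = struct.unpack_from("<I", root, 2)
    (size,) = struct.unpack_from("<I", root, 10)
    parts = [p for p in path.split("/") if p]
    for index, part in enumerate(parts):
        is_last = index == len(parts) - 1
        wanted = part.encode("ascii")
        extent = image[lba * SECTOR_SIZE:lba * SECTOR_SIZE + size]
        for identifier, e_lba, e_size, flags in iter_records(extent):
            is_dir = bool(flags & 0x02)
            name = identifier if is_dir else identifier.split(b";")[0]
            # Intermediate components must be directories.
            if name == wanted and (is_dir or is_last):
                lba, size = e_lba, e_size
                break
        else:
            raise FileNotFoundError(path)
    return lba, size
```

Calling `lookup(image, "/BOOT/MYLOADER/STAGE2.BIN")` returns the starting LBA and byte length of the file, which can then be read from the image.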

See Also

Articles

  • El-Torito, a standard for creating bootable CD-ROMs
