Hierarchical VFS Theory

Purpose of this article

This article was written to explain the basic concepts behind a hierarchical VFS. Articles about tag-based VFSs and DB-VFSs are proposed for a later date. This article does not seek to explain how to code or implement a VFS: it explains the structure and justification behind conventional VFS design. The short code sketches included below are illustrative only, not implementation guidance.

What is a hierarchical VFS?

In contrast with a tag-based VFS or a DB-VFS, a hierarchical VFS is concerned with caching the folder hierarchies of a set of concrete filesystems. A tag-based VFS, for example, would be a tag indexer, which might most efficiently be implemented (for a UTF-8 based kernel) as an array of 256 "top layer" character indexes, with each of those array entries pointing to a further array of 256 indexes. You would then parse tag addresses (as opposed to file paths) by quickly indexing into the top-level array using the first letter of the tag, then into the sub-array with the second letter of the tag. From there, you would have a B-tree under each sub-index which points to a list of all tags sharing the same first two letters.
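
By way of illustration only (the tag-based design itself is deferred to a later article), such a two-level index might be sketched in C as follows; every type and function name here is hypothetical:

/* Hypothetical sketch of the two-level tag index described above.
 * Names and types are illustrative, not a reference implementation. */
struct tag_btree;                      /* B-tree of tags sharing the same first two bytes */

struct tag_index_level2 {
    struct tag_btree *buckets[256];    /* indexed by the second byte of the tag name */
};

struct tag_index {
    struct tag_index_level2 *top[256]; /* indexed by the first byte of the tag name */
};

/* Locate the B-tree holding all tags whose names begin with name[0], name[1].
 * Assumes the tag name is at least two bytes long. */
static struct tag_btree *tag_index_find_bucket(struct tag_index *idx, const unsigned char *name)
{
    struct tag_index_level2 *l2 = idx->top[name[0]];
    if (l2 == NULL)
        return NULL;
    return l2->buckets[name[1]];
}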

Tag-based VFSs are generally not concerned with hierarchy, but with intersection. A tag-based address would look something like "Shaniqua:music:Jazz". Those three tags may individually have about 100 files each in their listings, but intersecting the listing sets incrementally yields the set of files which carry all three tags. An empty set is taken to be analogous to an "empty folder" in a hierarchical file-system.
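
Again purely as a sketch, resolving a tag address then amounts to intersecting the (sorted) file-ID listings of its tags one by one; the function below is illustrative, not a prescribed design:

#include <stddef.h>

/* Sketch: intersect two sorted lists of file IDs (one list per tag) into 'out'.
 * Resolving "Shaniqua:music:Jazz" would apply this step twice, once for each
 * additional tag; an empty result is the tag-based analogue of an empty folder. */
static size_t intersect_ids(const unsigned *a, size_t na,
                            const unsigned *b, size_t nb,
                            unsigned *out)
{
    size_t i = 0, j = 0, n = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j])
            i++;
        else if (b[j] < a[i])
            j++;
        else {
            out[n++] = a[i];
            i++;
            j++;
        }
    }
    return n;
}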

Hierarchical file-systems do not work by intersection, but by iterative descent through sub-levels. For a hierarchical VFS, your aim would therefore be to quickly index paths and their sub-files by name, which makes for easy parsing of hierarchical filesystem paths. A tree structure is usually taken to be the most efficient way of handling a hierarchical VFS.

Conventional tree caching format

Keep in mind always that a VFS is nothing more than a structural cache of the layout of the underlying concrete filesystems mounted on it; you cache the folder hierarchy as you go deeper into it.

The main distinction that must be made when designing VFS tree caching is the one between t-nodes and i-nodes, where "t-node" stands for "tag-node" and "i-node" stands for "info-node". In conventional UNIX kernels (and in NT as well, though there the facility is used by the kernel while a software-based ".lnk" format is employed for userspace), a single filesystem node may point to the same file data as another filesystem node. That is, you may have two, three, four, or more "files", possibly in different folders, each of which shows up in the directory listing of its respective folder, all pointing to the same file data. Writing to the data through one of those files produces a change in contents from the other files' point of view, since they all point to the same data.

These are called file-system links. That is, file data is to be treated separately from the tag-node which points to it. A tag-node gives a collection of bytes (file data) a name. An info-node describes file data. A tag-node does nothing more than assign a name and point to the i-node which describes the data with which that file-name is associated.

Info-nodes are then expected to record information such as: the creation date of the file, the last modification date, the last access date, the size of the file data, and any unique ID number which identifies that file on the underlying concrete file-system. Keep firmly in mind that virtual file systems are not the only ones that use the t-node/i-node arrangement: many concrete file systems do as well. As a result, hereafter this article will refer to VFS inodes as "VFS inodes" and to inodes on a concrete filesystem as "concrete inodes".

Tag node cache

Whenever the VFS parses a file path, it sequentially descends the tree from the VFS root, matching sub-folder names until it reaches the last part of the file path; at that point, it checks whether the last component of the file path is a sub-folder. If it turns out not to be a sub-folder, the VFS then checks whether it is a sub-file. If that fails, or if the descent failed at an earlier component, the VFS assumes the path to be invalid.

A corollary of this is that the VFS must be able to quickly descend from the root node, through a list of t-nodes which give the file-name of each sub-file or folder.

A process of logic reveals that a file or folder t-node contains only the name of the file or folder, and a pointer to the inode for that file or folder. Where, then, would the VFS cache file-listings for a cached folder? In that folder's VFS inode. Specifically, the folder t-node would point to a folder VFS inode. To search the sub-folders or sub-files of a folder t-node, you would dereference the t-node's VFS inode pointer, and read the list of sub-files and sub-folders from that VFS inode.

What is the list of files and folders really, then? It is a list of t-nodes. So:

t-node -> inode with sub-folder and sub-file list (which is really just a list of more t-nodes) -> search the t-nodes -> find the match for the next path component -> t-node -> inode with sub-folder and sub-file list (again, t-nodes) -> search the t-nodes -> find the next path component -> t-node -> and so on and so forth.

That is essentially the structure of a hierarchical VFS.

Folder or file t-node:
+-----------------------------------------------------------------------------+-----------------------+
| <file or folder name>         ...                                           | pointer to inode      |
+-----------------------------------------------------------------------------+-----------------------+

Folder i-node:
+-------------------------------------------+
| Pointer to list of tnodes for sub-folders |
+-------------------------------------------+
| Pointer to list of tnodes for sub-files   |
+-------------------------------------------+
| int  num_subfolders; | int num_subfiles   |
+-------------------------------------------+
| concrete FS inode number                  |
+-------------------------------------------+

File i-node:
+-------------------------------------------+
| Creation date | Date of last modification |
+-------------------------------------------+
| File size                                 |
+-------------------------------------------+
| [Optional] Lock | [Optional] State flags  |
+-------------------------------------------+
| Concrete FS inode number                  |
+-------------------------------------------+
| Other things you think are necessary      |
+-------------------------------------------+
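
The diagrams above translate fairly directly into C. The following is a minimal sketch of those structures and of the descent described in the previous section; it assumes flat arrays of child t-node pointers and fixed-length names purely for brevity, and every identifier in it is hypothetical:

#include <stddef.h>
#include <string.h>

/* Sketch of the t-node / VFS-inode layout drawn above. For brevity, the folder
 * and file i-node fields are merged into one structure here. */
struct vfs_inode;

struct vfs_tnode {
    char              name[256];   /* file or folder name */
    struct vfs_inode *inode;       /* pointer to the VFS inode describing the data */
};

struct vfs_inode {
    int                is_folder;
    /* Folder inodes: the cached listing, i.e. lists of child t-nodes. */
    struct vfs_tnode **subfolders;
    struct vfs_tnode **subfiles;
    int                num_subfolders;
    int                num_subfiles;
    /* File inodes: descriptive information. */
    unsigned long long creation_date;
    unsigned long long modification_date;
    unsigned long long file_size;
    /* Both: the inode number on the underlying concrete filesystem. */
    unsigned long      concrete_inode_no;
};

/* Find a child t-node by name within a cached folder inode. */
static struct vfs_tnode *folder_find_child(struct vfs_inode *folder, const char *name)
{
    int i;
    for (i = 0; i < folder->num_subfolders; i++)
        if (strcmp(folder->subfolders[i]->name, name) == 0)
            return folder->subfolders[i];
    for (i = 0; i < folder->num_subfiles; i++)
        if (strcmp(folder->subfiles[i]->name, name) == 0)
            return folder->subfiles[i];
    return NULL;
}

/* Walk a path such as "/usr/share/doc" from the root t-node: for each component,
 * dereference the current t-node's inode, search its child t-nodes, and descend.
 * Returns the t-node for the last component, or NULL if the path is invalid.
 * (A real VFS would also ask the concrete FS driver to fill uncached folders.) */
static struct vfs_tnode *vfs_lookup(struct vfs_tnode *root, const char *path)
{
    struct vfs_tnode *cur = root;
    char component[256];

    while (*path != '\0' && cur != NULL) {
        while (*path == '/')            /* skip path separators */
            path++;
        if (*path == '\0')
            break;

        size_t len = 0;
        while (path[len] != '/' && path[len] != '\0' && len < sizeof(component) - 1)
            len++;
        memcpy(component, path, len);
        component[len] = '\0';
        path += len;

        if (!cur->inode->is_folder)     /* cannot descend through a file */
            return NULL;
        cur = folder_find_child(cur->inode, component);
    }
    return cur;
}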

Support for file-system links

Two unique t-nodes in the tree which point to the same inode are said to be "links" to that inode; by extension, a read or write through either t-node (file-name) updates the same file. There is no extra work required: just treat them as t-nodes which point to the same inode. You can also set flags to identify them as links, although it makes no real difference to the VFS proper. The only thing you need to note about links is that even though your VFS unifies all mounted concrete file systems under one tree, links can only be local to a single concrete file-system: this is only common sense. Remember, links point to the same inode. If there were a link on one concrete file-system which pointed to an inode on another concrete filesystem (in other words, on another disk), how could it be pointing to the same file data?
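
Continuing the hypothetical structures from the previous sketch, creating a link is then nothing more than adding a second t-node whose inode pointer aliases the first:

/* Sketch only: give an existing inode a second name. 'storage' is a
 * caller-provided t-node slot in the target folder's listing. */
static struct vfs_tnode *vfs_make_link(struct vfs_tnode *target, const char *new_name,
                                       struct vfs_tnode *storage)
{
    strncpy(storage->name, new_name, sizeof(storage->name) - 1);
    storage->name[sizeof(storage->name) - 1] = '\0';
    storage->inode = target->inode;   /* both names now resolve to the same inode */
    return storage;
}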

Thoughts on linking the Inode-based VFS to concrete hierarchical filesystems which do not use an inode structure

Most recent concrete filesystems use an inode-based approach to identifying file data on the disk they represent. Each distinct piece of file data is given a unique inode ID within that disk's filesystem, so every file's data is uniquely identified on that disk by the inode its filesystem has assigned to it. If you wanted to create a generic filesystem driver interface that simplified filesystem interfacing for all filesystems, and you chose to use the "inode" concept, you would have no problem doing so with, for example, the EXT class of filesystems, which use inodes.

When your kernel requests a folder listing, the filesystem driver would return tag-nodes for the names of all sub-files and sub-folders, along with the inode related to each tag-node, and you would make sure to store the concrete filesystem's inode ID for each file as given by the concrete filesystem. If, later on, the user renamed a file within a folder, you would simply find that folder in your cached VFS hierarchy and tell the filesystem driver to rename the file named "foo" on folder inode X within that driver's represented filesystem to "bar", as the user requested: there is a direct mapping between your VFS and the inodes on the underlying filesystem.
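
A sketch of what such a generic, inode-centred driver interface might look like is given below; the structure and function names are invented purely for illustration:

/* Hypothetical generic filesystem-driver interface built around inode IDs.
 * A directory entry pairs a name (t-node) with the concrete inode it refers to. */
struct fs_dirent {
    char          name[256];
    unsigned long inode_no;       /* concrete FS inode ID for this entry */
};

struct fs_driver {
    /* Fill 'out' with up to 'max' entries of the folder identified by folder_inode;
     * return the number of entries, or a negative error code. */
    int (*list_folder)(unsigned long folder_inode, struct fs_dirent *out, int max);

    /* Rename the entry called old_name inside folder_inode to new_name. */
    int (*rename)(unsigned long folder_inode, const char *old_name, const char *new_name);
};

/* For an inode-based concrete FS such as one of the EXT filesystems, the kernel's
 * rename request maps directly: rename "foo" to "bar" inside the folder whose
 * concrete inode is X. */
static int vfs_rename_in_folder(struct fs_driver *drv, unsigned long folder_concrete_inode,
                                const char *old_name, const char *new_name)
{
    return drv->rename(folder_concrete_inode, old_name, new_name);
}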

What if you wish to write a driver for your kernel for, say, one of the FAT filesystems? FAT does not use inodes to uniquely identify file data. If you were to pass a request to a FAT driver to rename tag-node "foo" on folder inode X to "bar", it would naturally fail, since the FAT driver would never have returned a concrete FS inode ID to you when it handed over that folder's file listing in the first place.

A plausible solution would be to require all drivers to maintain file-data -> inode mappings. In such a case, when the driver is initialized, it initializes a counter from which it will pull fake inode IDs to hand to the kernel when it gives over file listings. It passes these out on the fly, creating new inode IDs as it gives the kernel new file listings, and it maintains a table of the mappings between the inode IDs it has handed out and whatever format it uses internally to uniquely identify file data. It would, of course, ensure that the inode IDs it hands out on the fly remain unique across mounts.

In that case, when the kernel requests the file listing of a folder, the driver passes the kernel the folder's inode information, assigning it a new inode ID as it hands it over and recording this in a table. Later on, the kernel, unaware that translation work is being done behind the scenes in the driver, passes a request for file "foo" on inode X (an inode the kernel does not know has been created on the fly) to be renamed to "bar".

The driver now looks at the inode mapping table and sees that it assigned file data group Y the inode ID X when it passed it to the kernel. The driver then renames the file as appropriate, and things go on as usual, with the kernel unaware of the mapping activity being done in the driver to make it conform to your driver interface, which requires inodes to uniquely identify files within a single concrete filesystem.
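
A minimal sketch of that bookkeeping inside a hypothetical FAT driver follows; the flat mapping table and all names are illustrative only:

/* Hypothetical FAT driver bookkeeping: FAT has no inodes, so the driver invents
 * inode IDs on the fly and remembers which on-disk location each one stands for. */
struct fat_location {
    unsigned cluster;     /* first cluster of the directory holding the entry */
    unsigned dirent_idx;  /* index of the entry within that directory */
};

#define FAT_MAX_FAKE_INODES 4096

static struct fat_location fat_inode_map[FAT_MAX_FAKE_INODES];
static unsigned long       fat_next_inode = 1;   /* counter initialised at mount time */

/* Called while handing a folder listing to the kernel: allocate a fresh fake inode
 * ID for an entry and record the mapping so later requests can be translated back. */
static unsigned long fat_assign_inode(struct fat_location loc)
{
    unsigned long id = fat_next_inode++;
    if (id >= FAT_MAX_FAKE_INODES)
        return 0;        /* out of fake inodes; a real driver would grow the table */
    fat_inode_map[id] = loc;
    return id;
}

/* Called when the kernel asks to rename "foo" to "bar" in folder inode X:
 * translate X back to the FAT location the driver handed out earlier. */
static int fat_rename(unsigned long folder_inode, const char *old_name, const char *new_name)
{
    if (folder_inode == 0 || folder_inode >= FAT_MAX_FAKE_INODES)
        return -1;
    struct fat_location loc = fat_inode_map[folder_inode];
    /* ...walk the directory at loc.cluster, find old_name, rewrite its entry
     * as new_name (omitted in this sketch)... */
    (void)loc; (void)old_name; (void)new_name;
    return 0;
}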