Talk:PureFS

From OSDev.wiki
Jump to navigation Jump to search

Valid for PureFS V1.1.0

Some notes and questions

I have looked over the page and I have some notes and questions (in no particular order) about the file system as it currently is specified:

  1. How are hash collisions handled? All the hashes are only 64 bits, but are of arbitrarily long data, which (at least theoretically) makes it pretty simple to for example construct 2 paths with identical hashes. This means that there are an infinite number of paths that refer to the same file, and most of them do not have the property that the hash of the path to the parent matches what is stored in the file node (breaking what I assume to be an invariant).
  2. File lookup is potentially slow. Finding an arbitrary file entails scanning the entire file system for a file node with a matching hash, and you need to scan the whole file system to know a path is invalid, even if you do it component-by-component. This could be improved by storing file nodes in a hash table or some search tree keyed on the hash.
  3. How are programs like "ls" supposed to work? Directories only act as markers and do not keep track of their children, which means it is up to the OS to keep track of a list of children for a directory, which is (again) built by scanning the whole file system. Doing this is expensive for large file systems, both in terms of memory and speed.
  4. The way data blocks are arranged makes various operations (like seeking) expensive. The OS needs to cache all the block pointers, and if the cache is ever purged (e.g. to free up memory), it needs to basically read the whole file block-by-block to refill it. That, and the non-power-of-2 block size, might also make it annoying to deal with when trying to implement a page cache.
  5. How are new data blocks allocated? Consider a set of unused blocks in the middle of a partition (as can happen after a file is deleted). How do you keep track of them to allow them to be reused when a new file is created, or when a file is extended?
  6. I feel like the attribute bits could be arranged better. Instead of a "read-only" and a "write-only" bit why not have a "readable" bit and a "writable" bit (a la Unix permissions). Doing that gets rid of the invalid state of a file being both read-only and write-only at the same time, and it fits with the existing "executable" bit. The "system" bit also feels unnecessary, and I assume only exists because FAT has it (same with "hidden", but there I can see some merit to having it).
  7. Symlinks are sub-optimal. Unix-like OSes provide a way to read the exact path a symlink points to. Storing only a hash of the path means that this operation is very expensive (need to again traverse the whole file system to find the target file and all it's parents to reconstruct a path). This also means that you can't have symlinks to paths that don't exist.
  8. Some of the terminology is a bit confusing IMO. As far as I can tell, what is called a partition is actually closer to e.g. a BTRFS subvolume rather than an actual partition (as in one from the MBR or GPT)? Perhaps it'd be a good idea not to overload the term for no reason.

--Qookie 02:43, 24 May 2024 (CDT)

Hello, thanks for your questions. The goal of this file system is not speed, but ease of understanding and implementation. That is why you can install another FS on the disk with this FS. If you chase speed, the FS will lose its simplicity.

Here are my answers to your questions:

1. In fact, I'm still thinking about this problem. Perhaps I'll add 1 more field for a hash of the file path and parent directory. This way, 2 hashes of the file will be calculated and checked against the data, which will avoid most collisions.

2. As I described above, in the pursuit of speed, simplicity is lost.

3. Although it is slow, it is possible. Simply search for files whose `ParentHash` matches the hash of the path to the parent directory.

4. You're right. But the OS should not store blocks. It just reads them from disk.

5. When creating a new file, you need to go to the first data block, move it to the end and rewrite the `NextData` field if it is not 0. In its place you need to write `PureFsFile`, add the file data to the end of the `PureFsData` records and change `NumFiles` , `LastFile`, `FirstData`, `LastData`.

6. Good idea, I'll change some attributes.

7. You are right here too. I'll think about how to solve this problem.

8. I don't know English very well and use a translator. If you see errors, please correct them, I will be very grateful.

If you have ideas on how to make FS better, please let me know.

Drkeph 05:21, 24 May 2024 (CDT)