resources for learning about file system implementations?


Summary of results


Books and Textbooks

  • "Practical File System Design" by Dominic Giampaolo: Frequently recommended for its detailed exploration of file system design, though some consider it dated.
  • "File Systems: Structures and Algorithms" by Thomas R. Harbron: Another recommended textbook.
  • "The Design of the UNIX Operating System" by Maurice J. Bach: Covers file systems comprehensively, despite being older.
  • "The Design and Implementation of the 4.4 BSD Operating System" by Marshall Kirk McKusick: Detailed information on the original Fast File System.
  • "Linux Kernel Development" by Robert Love (chapters 13-14): Focuses on Linux file systems.
  • "The Linux Programming Interface": Contains relevant chapters on file systems.
  • "Operating Systems: Three Easy Pieces" by Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau: Free and highly recommended for foundational knowledge.

Online Resources

  • U Wisc Operating Systems open textbook: The Persistence section is particularly useful.
  • Usenix FAST conferences: Archives contain newer and experimental approaches to file systems.
  • Kernel.org documentation: Overview of the VFS layer.
  • Dan Luu's post on filesystem errors: Covers failure modes and links to relevant papers.

Standards and Policies

  • Filesystem Hierarchy Standard (FHS): Guidance on filesystem paths.
  • Debian Policy Manual: Specific to Debian and derivatives.

Additional Recommendations

1.

Practical File System Design by Dominic Giampaolo

File Systems: Structures and Algorithms by Thomas R. Harbron

There is probably a relevant chapter in The Linux Programming Interface on file systems as well.

2.

For an in-depth understanding of filesystems, I highly recommend "The Design of the UNIX Operating System" by Maurice J. Bach. It might be an older book, but it covers filesystems in a comprehensive and insightful way.

3.

This series has been really informative regarding the history of filesystems - when each breakthrough occurred and the historical tradeoffs that were considered at each point.

https://blog.koehntopp.info/2023/05/05/50-years-in-filesyste...

4.

As a starter, I would recommend the Persistence section of the U Wisc Operating Systems open textbook: https://pages.cs.wisc.edu/~remzi/OSTEP/. In particular, the cited papers are the foundations of the major file system approaches used by many (probably most) systems today.

In the non-textbook sources, I would also recommend archives of the Usenix FAST conferences (File and Storage Technologies), where you often see newer and experimental approaches to file systems.

e.g. F2FS, https://www.usenix.org/conference/fast15/technical-sessions/...

5.

Slightly OT.

What resources would people here suggest for learning about file systems? I see a lot of new file systems like zfs, btrfs, etc. I looked for resources but couldn't find anything substantial.

I want to learn how they work so that I can appreciate projects like this and compare them.

I looked into the Build Your Own X repo but didn't find anything. I found a book called Practical File System Design: The BeOS file system but it's apparently dated, and I'm not sure I want too much of a deep dive.

7.

I remember learning about filesystem design from the classic "demon" book (The Design and Implementation of the 4.4 BSD Operating System by Marshall Kirk McKusick) which has a lot of details about the original Fast File System.

8.

Practical File System Design: The Be File System covers the designs of other file systems briefly and goes from FFS to XFS, NTFS, and ext2 (the book was written in the late 90s).

9.

For anyone looking to learn a bit more about filesystems, ext2 strikes a great balance of simplicity and real-world practicality, making it good to learn from. Code here: https://elixir.bootlin.com/linux/v3.9/source/fs/ext2
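
As a taste of how approachable the ext2 on-disk format is, here is a minimal sketch that decodes a few superblock fields from a raw volume image. The superblock location (byte 1024), the field offsets, and the 0xEF53 magic follow the ext2 on-disk layout; the helper names `parse_ext2_superblock` and `make_fake_image` are made up for illustration, and the fake image is only a stand-in so the example runs without a real disk.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # the ext2 superblock starts 1024 bytes into the volume
EXT2_MAGIC = 0xEF53       # value of the s_magic field


def parse_ext2_superblock(image: bytes) -> dict:
    """Decode a handful of little-endian fields from an ext2 superblock."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 58:
        raise ValueError("image too small to contain a superblock")
    # s_inodes_count at offset 0, s_blocks_count at offset 4 (both 32-bit)
    inodes_count, blocks_count = struct.unpack_from("<II", sb, 0)
    # s_log_block_size at offset 24: block size is 1024 shifted left by it
    (log_block_size,) = struct.unpack_from("<I", sb, 24)
    # s_magic at offset 56 (16-bit)
    (magic,) = struct.unpack_from("<H", sb, 56)
    if magic != EXT2_MAGIC:
        raise ValueError(f"not an ext2 superblock (magic {magic:#06x})")
    return {
        "inodes": inodes_count,
        "blocks": blocks_count,
        "block_size": 1024 << log_block_size,
    }


def make_fake_image(inodes: int, blocks: int, log_block_size: int) -> bytes:
    """Hypothetical helper: build a tiny in-memory image with only a superblock."""
    sb = bytearray(1024)
    struct.pack_into("<II", sb, 0, inodes, blocks)
    struct.pack_into("<I", sb, 24, log_block_size)
    struct.pack_into("<H", sb, 56, EXT2_MAGIC)
    return bytes(SUPERBLOCK_OFFSET) + bytes(sb)


if __name__ == "__main__":
    print(parse_ext2_superblock(make_fake_image(128, 1024, 2)))
```

On a real system the same parser could be pointed at the first 2 KiB of an ext2 image file, assuming you have read access to it.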

11.

Not entirely related, but recently I began re-reading Practical File System Design with the Be File System by Dominic Giampaolo. Not exactly an up-to-date text, but a good guide to the tradeoffs involved in designing a file system.

* https://web.archive.org/web/20170213221835/http://www.nobius...

12.

If you haven't seen it before, you might find this useful https://www.kernel.org/doc/html/latest/filesystems/vfs.html

It's an overview of the VFS layer, which is how they do all the filesystem-specific stuff while maintaining a consistent interface from the kernel.
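
To make the "consistent interface" idea concrete, here is a toy sketch of VFS-style dispatch. It is not how the Linux VFS is actually structured (the kernel uses C structs of function pointers and much more machinery); the class names `FileSystem`, `MemFS`, `GeneratedFS`, and `VFS` are invented for this example, and only a single `read` operation is modelled.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """The one interface every filesystem driver must implement in this toy."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        ...


class MemFS(FileSystem):
    """Files held in a dict, standing in for an on-disk filesystem."""

    def __init__(self, files: dict):
        self.files = dict(files)

    def read(self, path: str) -> bytes:
        try:
            return self.files[path]
        except KeyError:
            raise FileNotFoundError(path) from None


class GeneratedFS(FileSystem):
    """Contents are synthesized on demand, loosely like a procfs-style filesystem."""

    def read(self, path: str) -> bytes:
        return f"generated:{path}".encode()


class VFS:
    """Routes each path to the filesystem mounted at the longest matching prefix."""

    def __init__(self):
        self.mounts = {}

    def mount(self, mount_point: str, fs: FileSystem) -> None:
        self.mounts[mount_point.rstrip("/") or "/"] = fs

    def read(self, path: str) -> bytes:
        for mp in sorted(self.mounts, key=len, reverse=True):
            prefix = "" if mp == "/" else mp
            if path == mp or path.startswith(prefix + "/"):
                # Hand the driver a path relative to its own root.
                return self.mounts[mp].read(path[len(prefix):] or "/")
        raise FileNotFoundError(path)


if __name__ == "__main__":
    vfs = VFS()
    vfs.mount("/", MemFS({"/etc/hostname": b"box\n"}))
    vfs.mount("/gen", GeneratedFS())
    print(vfs.read("/etc/hostname"), vfs.read("/gen/status"))
```

The caller only ever talks to `VFS.read`; which driver services the request depends purely on where the path is mounted.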

13.

"We covered the grandfather of most modern file systems, BSD FFS; the fast and unsafe grandchild, ext2; the odd-ball cousin, HFS; the burly nephew, XFS; and the blue-suited distant relative, NTFS. Each of these file systems has its own characteristics and target audiences. BSD FFS set the standard for file systems for approximately 10 years. Linux ext2 broke all the rules regarding safety and also blew the doors off the performance of its predecessors. HFS addressed the needs of the GUI of the Macintosh although design decisions made in 1984 seem foolhardy in our current enlightened day. The aim of XFS is squarely on large systems offering huge disk arrays. NTFS is a good, solid modern design that offers many interesting and sophisticated features and fits well into the overall structure of Windows NT."

Thanks for the link to the book!

14.

IMO, reading books about operating systems internals / implementation, particularly the parts about disk management and file systems, block vs. character I/O, the buffer cache, fragmentation, etc., would help, too. Some are Unix-based terms; translate for other OSes.

15.

no - a filesystem implementation on an ordinary OS has more than what you mention, including interfaces to disk device drivers

16.

Dan Luu has a post (https://danluu.com/filesystem-errors/) that covers some of the same ground and links to papers with more information on the failure modes of file systems in the face of errors from the underlying block device. Prabhakaran et al. (https://research.cs.wisc.edu/wind/Publications/iron-sosp05.p...) did a bunch of filesystem testing (in 2005!), and their paper includes discussion of how to generate "realistic" filesystem errors, as well as discussion of how the then state-of-the-art filesystems (ext3, Reiser (!), and JFS) perform in the face of these errors.

I'm unaware of any research newer than Dan Luu's post on filesystem error handling.

19.

Learn about the file system, permissions, processes, signals, threads, the scheduler, standard input and output, networking, etc.

22.

Unfortunately, file system development is a pretty niche skill set these days, and the majority of the experts in the field are employed maintaining existing file systems (ext4, xfs, apfs, etc).

One thing I’ve been bugging Kent to do is to write documentation about the design and internal workings of bcachefs; very little about modern file system design is actually written down anywhere, and a detailed reference manual would attract more people to work in this area.

23.

There are several file systems: a virtual file system (in the resource manager), a network file system (in netfs, with a cache in RAM), and a local persistent file system (in localfs, using littlefs and IndexedDB). A pipe driver is there, and some signalling as well.

At the moment the project is a proof of concept, for sure a lot of things are not developed and not debugged. If it appears that the project is interesting enough, I will have to find people helping me to make it a real product

24.

Filesystems are literally what it says on the tin: a filing system. Look in library and secretarial annals for the earliest foundational thinking from which computing's idea of filesystems was born: a systemization of behaviors and abstractions that facilitate the organization, addressing, and access of data.

Go to any library, or talk to any long time/old school secretary or warden of archived paperwork, and I assure you, they will be happy to extoll the virtues of simple or reckonable information storage.

A hierarchical data store comes baked in with an opportunity to implement topical locality for the end user, which allows you to use pathfinding logic baked into your brain to navigate the corpus of information in question. Content-addressable stores require praying that the layers of cryptography work, or that you have enough understanding of the implementation details and tooling around the store to find what you need.
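
The contrast can be sketched in a few lines. This is a toy illustration only: `ContentStore` and `PathStore` are made-up names, both keep everything in memory, and SHA-256 is just one plausible choice of content hash.

```python
import hashlib


class ContentStore:
    """Content-addressable: the only way back to a blob is its hash."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class PathStore:
    """Hierarchical: names carry meaning, so a user can browse by topic."""

    def __init__(self):
        self._files = {}

    def put(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def list_dir(self, directory: str) -> list:
        prefix = directory.rstrip("/") + "/"
        return sorted(p for p in self._files if p.startswith(prefix))


if __name__ == "__main__":
    cas = ContentStore()
    key = cas.put(b"Q3 invoice")
    print(key)  # an opaque 64-character hex string

    tree = PathStore()
    tree.put("/invoices/2023/q3.txt", b"Q3 invoice")
    print(tree.list_dir("/invoices"))  # the path itself says what this is
```

With the path store, a person can guess where to look; with the content store, they need the key or some external index that maps meaning to keys.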

In short, find | grep being strictly necessary, rather than a fallback, means you've failed at organizing things so your user can understand where the hell something even is, and why it is there.

I assure you, more harm is done by forgetting the fundamental human way of life that computing tries to plaster over, as we inflict impedance mismatch on Users by forcing them to search in a way that makes sense only to the machine, rather than to them.

Sometimes a little less ideal computational performance pays dividends in ease of picking up.

25.

Of course it depends on the filesystem. I’d be interested in what it is concretely, for each of the commonly used desktop filesystems.

26.

Does anyone have any resources for other operating systems? This might be one of those things that I need to test drive OSes for but it'd be neat to just read about.

27.

Here's some info from gherkin0 back from June of 2016:

gherkin0 on June 26, 2016:

> Fun fact: Dominic Giampaolo (who wrote the BeOS file system) is on the APFS team. His book "Practical File System Design" is an excellent description of a traditional UNIX file system design. May be out of print now but I think used copies turn up on Amazon.

It looks like he has a PDF up on his website:

http://www.nobius.org/~dbg/practical-file-system-design.pdf

29.

Looks interesting

“because it is a very small system, it is a good opportunity to easily understand, how Linux file systems are constructed”

Could be what I’m looking for

31.

When you test a file system, are you testing it via the libc APIs: creating, deleting, checking filesystem objects and so on? Or is it done through a lower-level API (that I’d hope is the same across filesystems, different features excepted)?
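
For what the first option might look like, here is a minimal smoke-test sketch that exercises whatever filesystem backs a directory purely through the OS-level calls exposed by Python's `os` module (thin wrappers over the platform's open/write/fsync/rename/unlink). The function names `smoke_test` and `run` are invented for this example, and it says nothing about how any particular filesystem project actually tests.

```python
import os
import tempfile


def smoke_test(directory: str) -> None:
    """Create, sync, stat, rename, read back, and delete one file."""
    path = os.path.join(directory, "probe.txt")
    renamed = os.path.join(directory, "probe.renamed")
    payload = b"hello filesystem\n"

    # O_EXCL makes creation fail if the file already exists.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    try:
        assert os.write(fd, payload) == len(payload)
        os.fsync(fd)  # ask the filesystem to push the data to stable storage
    finally:
        os.close(fd)

    assert os.stat(path).st_size == len(payload)

    os.rename(path, renamed)
    assert not os.path.exists(path)
    with open(renamed, "rb") as f:
        assert f.read() == payload

    os.unlink(renamed)
    assert os.listdir(directory) == []


def run() -> bool:
    # The temporary directory lives on whatever filesystem hosts the temp dir.
    with tempfile.TemporaryDirectory() as d:
        smoke_test(d)
    return True


if __name__ == "__main__":
    print("ok" if run() else "failed")
```

To aim it at a specific filesystem, one would call `smoke_test` on a directory inside a mount of that filesystem; the calls themselves stay the same.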

32.

How hard is it to maintain and keep secure an entire modern, complex, evolving file system which the OS itself does not even use (so nothing like simple, frozen, and interchange-standard FAT) in an operating system kernel?

Very hard.

Instead there are 3rd party implementations. Those are independently maintained, serving directly the (comparatively small) target audience.

33.

There is also a book on the BeFS filesystem:

http://www.nobius.org/dbg/practical-file-system-design.pdf

(Legal source, the website is the author's.)

Dominic Giampaolo now works at Apple on APFS.

34.

I’ve had this research paper on my reading list for a while (but haven’t gotten to reading the full thing) [1]. It's not necessarily just a file system: it lays out an entire operating system backed by a database, where interactions with OS state are done through SQL.

1. https://vldb.org/pvldb/vol15/p21-skiadopoulos.pdf

36.

Some secondary reading, I've referred to FHS at times. It's the 'Filesystem Hierarchy Standard':

https://refspecs.linuxfoundation.org/FHS_3.0/fhs/index.html

They provide guidance on how a given filesystem path should be used.

This has informed the default SELinux policies greatly; familiarity turns hassle into informed assumptions/ease.

37.

Previously on 'Esoteric Filesystem Week':

0. Linux's SystemV Filesystem Support Being Orphaned https://news.ycombinator.com/item?id=34818040 by rbanffy 3 days ago, 70 points, 73 comments

1. TabFS – a browser extension that mounts the browser tabs as a filesystem https://news.ycombinator.com/item?id=34847611 by pps 1 day ago, 961 points, 185 comments

2. Vramfs – GPU VRAM based file system for Linux https://news.ycombinator.com/item?id=34855134 by pabs3 1 day ago, 226 points, 71 comments

38.

It depends a bit on your objectives - do you want to build your own OS, learn more about the theory and algorithms used in operating systems, or do you want to know details about commercial OSes?

For a hands-on view on building operating systems, I can recommend taking a look at xv6 [1] as the next step. This is a modern reimplementation of a subset of the 6th edition Unix, running on RISC-V in its current version (an older version targeted x86) [2].

The classical textbooks used in OS courses (Tanenbaum's "Modern Operating Systems", Silberschatz' "Operating System Concepts" and Stallings' "Operating Systems") are more on the theoretical side, whereas Tanenbaum's Minix books are more hands-on. A current, somewhat more hands-on, textbook is "Operating Systems: Three Easy Pieces" by Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau. Additional bonus - it's free [3].

For more hands-on resources, there's a lot of material available. osdev.org [4] provides lots of OS development knowledge in wiki form. Stephen Marz' "The Adventures of OS: Making a RISC-V Operating System using Rust" [5] is a hands-on course written as a series of blog posts to build your own OS for RISC-V (but it needs to be updated since RISC-V has changed in some details since the articles were published). I learned about OS theory and practice using Doug Comer's Xinu books [17] and I still think these are very useful.

You can then dig into older Unix(-like) systems, e.g. BSD [6,7] and/or check out more modern approaches such as Microkernels (I can especially recommend L4, starting from Jochen Liedtke's papers [8] and checking out work on seL4 [9]), capability-based systems (CHERI [10] and an old, but good overview book on capability architectures [11]), or - for something a bit more exotic - Plan 9 [12].

If you want to gain an insight into commercial operating systems, there are a number of interesting books on Apple's OS X and iOS [13,14], Windows [15] and even VMS [16].

[1] https://github.com/mit-pdos/xv6-riscv/

[2] Russ Cox, Frans Kaashoek, Robert Morris

xv6: a simple, Unix-like teaching operating system

https://pdos.csail.mit.edu/6.S081/2020/xv6/book-riscv-rev1.pdf

[3] https://pages.cs.wisc.edu/~remzi/OSTEP/

[4] https://wiki.osdev.org/Main_Page

[5] https://osblog.stephenmarz.com

[6] Marshall Kirk McKusick, Keith Bostic, Michael J. Karels

The Design and Implementation of the 4.4BSD Operating System

Addison Wesley, ISBN-13: 978-0201549799

[7] Marshall Kirk McKusick, George V. Neville-Neil, Robert N.M. Watson

The Design and Implementation of the FreeBSD Operating System

Addison Wesley, ISBN-13: 978-0321968975

[8] https://en.wikipedia.org/wiki/L4_microkernel_family

[9] https://sel4.systems

[10] https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/

[11] Henry M. Levy

Capability-Based Computer Systems

https://homes.cs.washington.edu/~levy/capabook/

[12] Francisco J Ballesteros

Notes on the Plan 9 3rd edition Kernel Source

http://www.r-5.org/files/books/computers/internals/unix/Francisco_Ballesteros-Notes_on_the_Plan_9_Kernel_Source-EN.pdf

[13] Amit Singh

Mac OS X Internals: A Systems Approach

Pearson 2006, ISBN-13: 978-0-321-46795-9

[14] Jonathan Levin

*OS Internals (Volume I: User Mode, Volume II: Kernel Mode, Volume III: Security & Insecurity)

https://newosxbook.com/home.html

[15] https://learn.microsoft.com/en-us/sysinternals/resources/windows-internals

[16] VAX/VMS Internals and Data Structures

http://www.bitsavers.org/pdf/dec/vax/vms/training/EY-00014-DP_VMS_Internals_and_Data_Structures_1984.pdf

[17] https://xinu.cs.purdue.edu

39.

Another file system I am interested in is GEFS, "good enough fs" (or rather "great experimental file shredder" until stable ;-). It's based on B-epsilon trees, a data structure which wasn't around when ZFS was designed. The idea is to build a ZFS-like fs without the size and complexity of ZFS. So far it's Plan 9 only and not production ready, though there is a chance it could be ported to OpenBSD. A talk was given at NYC*BUG: https://www.nycbug.org/index?action=view&id=10688

Code: http://shithub.us/ori/gefs/HEAD/info.html

40.

Linux does not have a single filesystem, but many of them. I guess you could create one that implements versioning. As a matter of fact, around 2000 I used one (Unix, not Linux). It was a version control system where different file versions appeared under different POSIX paths. I liked it, but it was at least as complicated as git, and most developers in our organization did not really understand it and complained when things went wrong.

For system administration I use etckeeper these days. That serves nearly the same purpose. But for programming VMS was superior unless you are disciplined and commit frequently enough.

41.

Filesystems introduce complexity that may not be needed, especially on smaller targets.

I'm personally not averse to file systems. I've implemented (but not yet published) a couple for testing purposes. It's probable that one or more of these will eventually be included in Konilo. But that won't happen until I write a program that needs this and have an implementation that I'm happy with.

42.

Add to that all the hype around file systems, with BeOS FS or WinFS (I had to look up the name of the abandoned Vista FS; I am almost sure it had another name). Plenty of new file systems were created during that era as well.

File management was a key activity at that time.

43.

Have you ever seen IFS (Installable File System) SDK for Windows and the associated documentation? And then compared that, for example to linux vfs?

No wonder that so few people tried. They can spend their time on something simpler, like on building a highway bridge to Hawaii or something.

