A very minimal distributed filesystem meant solely for write-once-read-many workloads on differently sized drives in my homelab. It takes inspiration from lizardfs, moosefs, and razorfs.
hak8or ddac47d743 (De)Serializing of Inodes and Chunk Table (without resize compression) 6 days ago

readme.md

MyFS

A dead simple file system optimized for homelab use, where we care primarily about write-once-read-many use cases while making use of many variably sized drives via configurable erasure-encoded chunks.

FileSystem Configuration

The figures below use the following assumptions.

| Property | Value |
| --- | --- |
| Rows per Chunk | 3 |
| Word Size (Bytes) | 4 |
| Total Storage | 35 TB |
| Total Files | 1,000,000 |
| Inode Base Size (Bytes) | 1,024 |
| ChunkCacheEntry Size (Bytes) | 16 |
| ChunkIDs per Inode | 16 |
| ChunkIDs per Table | 8,192 |
| ChunkID Tables per Inode | 2,048 |
| Inode Size (Bytes) | 17,536 |
| Inode Total Size (MB) | 16,724 |
| ChunkTable Size (Bytes) | 65,536 |
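
The inode figures above are consistent with 8 bytes per ChunkID (65,536 bytes per table divided by 8,192 IDs per table). That width is an inference from the numbers, not something stated here, and it is also assumed for each ChunkID table reference. A minimal sketch of the arithmetic, with hypothetical function names:

```cpp
#include <cstdint>

// Assumption: one ChunkID (or ChunkID table reference) takes 8 bytes,
// inferred from ChunkTable size / ChunkIDs per table.
constexpr std::uint64_t kChunkIdBytes = 65536 / 8192;

// Inode size = base size + one slot per direct ChunkID and per ChunkID table.
constexpr std::uint64_t inode_size_bytes(std::uint64_t base_bytes,
                                         std::uint64_t direct_chunk_ids,
                                         std::uint64_t chunk_id_tables) {
    return base_bytes + kChunkIdBytes * (direct_chunk_ids + chunk_id_tables);
}

// Total inode footprint in MB (2^20 bytes here), rounded to nearest.
constexpr std::uint64_t inode_total_mb(std::uint64_t files,
                                       std::uint64_t inode_bytes) {
    return (files * inode_bytes + (1ULL << 19)) / (1ULL << 20);
}

static_assert(inode_size_bytes(1024, 16, 2048) == 17536, "matches the table");
static_assert(inode_total_mb(1000000, 17536) == 16724, "matches the table");
```

The MB figure only matches when 1 MB is taken as 2^20 bytes.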

We get the following data. This will be useful for getting a sense of memory needs and lookup times.

| Words/Row | Data/Chunk (Bytes) | Total Chunks | Total Projections | Chunk Cache (MB) | Data/Inode (KB) | ChunkTables | Data/ChunkTable (MB) | Max File Size (MB) | ChunkTables Total (MB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 64 | 768 | 45,812,984,491 | 229,064,922,453 | 699,050.67 | 12 | 5,592,406 | 6 | 12,288 | 349,525 |
| 128 | 1,536 | 22,906,492,245 | 114,532,461,227 | 349,525.33 | 24 | 2,796,203 | 12 | 24,576 | 174,763 |
| 256 | 3,072 | 11,453,246,123 | 57,266,230,613 | 174,762.67 | 48 | 1,398,102 | 24 | 49,152 | 87,381 |
| 512 | 6,144 | 5,726,623,061 | 28,633,115,307 | 87,381.33 | 96 | 699,051 | 48 | 98,304 | 43,691 |
| 1024 | 12,288 | 2,863,311,531 | 14,316,557,653 | 43,690.67 | 192 | 349,526 | 96 | 196,608 | 21,845 |
| 2048 | 24,576 | 1,431,655,765 | 7,158,278,827 | 21,845.33 | 384 | 174,763 | 192 | 393,216 | 10,923 |
| **4096** | **49,152** | **715,827,883** | **3,579,139,413** | **10,922.67** | **768** | **87,382** | **384** | **786,432** | **5,461** |
| 8192 | 98,304 | 357,913,941 | 1,789,569,707 | 5,461.33 | 1,536 | 43,691 | 768 | 1,572,864 | 2,731 |
| 16384 | 196,608 | 178,956,971 | 894,784,853 | 2,730.67 | 3,072 | 21,846 | 1,536 | 3,145,728 | 1,365 |
| 32768 | 393,216 | 89,478,485 | 447,392,427 | 1,365.33 | 6,144 | 10,923 | 3,072 | 6,291,456 | 683 |
| 65536 | 786,432 | 44,739,243 | 223,696,213 | 682.67 | 12,288 | 5,462 | 6,144 | 12,582,912 | 341 |
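
The derived columns can be reproduced from the assumptions. The sketch below is a reconstruction that matches the published figures, not code from this repository, and all names in it are hypothetical. It assumes total storage is 2^45 bytes (32 TiB, roughly 35 TB), that KB and MB are powers of two, that there are 5 projections per chunk, and that the ChunkTables count is rounded up while the other counts are rounded to nearest.

```cpp
#include <cstdint>

struct Layout {
    std::uint64_t data_per_chunk_bytes;
    std::uint64_t total_chunks;
    std::uint64_t total_projections;
    double        chunk_cache_mb;
    std::uint64_t data_per_inode_kb;
    std::uint64_t chunk_tables;
    std::uint64_t data_per_table_mb;
    std::uint64_t max_file_mb;
    std::uint64_t chunk_tables_total_mb;
};

// Values from the assumptions table; kTotalBytes and kProjections are inferred.
constexpr std::uint64_t kTotalBytes      = 1ULL << 45;  // assumption: 32 TiB
constexpr std::uint64_t kMiB             = 1ULL << 20;
constexpr std::uint64_t kRowsPerChunk    = 3;
constexpr std::uint64_t kWordBytes       = 4;
constexpr std::uint64_t kProjections     = 5;
constexpr std::uint64_t kCacheEntryBytes = 16;
constexpr std::uint64_t kIdsPerInode     = 16;
constexpr std::uint64_t kIdsPerTable     = 8192;
constexpr std::uint64_t kTablesPerInode  = 2048;
constexpr std::uint64_t kTableBytes      = 65536;

constexpr std::uint64_t div_round(std::uint64_t a, std::uint64_t b) { return (a + b / 2) / b; }
constexpr std::uint64_t div_ceil(std::uint64_t a, std::uint64_t b) { return (a + b - 1) / b; }

Layout derive_layout(std::uint64_t words_per_row) {
    Layout l{};
    l.data_per_chunk_bytes = words_per_row * kWordBytes * kRowsPerChunk;
    l.total_chunks         = div_round(kTotalBytes, l.data_per_chunk_bytes);
    l.total_projections    = div_round(kProjections * kTotalBytes, l.data_per_chunk_bytes);
    l.chunk_cache_mb       = static_cast<double>(kTotalBytes) / l.data_per_chunk_bytes
                             * kCacheEntryBytes / kMiB;
    l.data_per_inode_kb    = kIdsPerInode * l.data_per_chunk_bytes / 1024;
    l.chunk_tables         = div_ceil(kTotalBytes, l.data_per_chunk_bytes * kIdsPerTable);
    l.data_per_table_mb    = kIdsPerTable * l.data_per_chunk_bytes / kMiB;
    // Counts only data reachable through ChunkID tables, as the table does.
    l.max_file_mb           = kTablesPerInode * l.data_per_table_mb;
    l.chunk_tables_total_mb = div_round(l.chunk_tables * kTableBytes, kMiB);
    return l;
}
```

Calling `derive_layout(4096)` reproduces the 4096 words/row line, and `derive_layout(64)` the first line.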

Benchmarks

Below is a rough benchmark using Catch2’s BENCHMARK macro. Performance could be better; for example, I bet we could optimize the overflow bit handling to use a byte per column instead, so we won’t have to do bitwise operations. Throughput levels off somewhere between 64 and 256 words per bin, which makes smaller chunks practical and opens the door to spreading them across multiple threads.

| Words/Row | Mode | Transform Time | Transform Chunks/s | Transform MB/s | Inverse Time | Inverse Chunks/s | Inverse MB/s |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 5 | Rel | 32 ns | 31,250K | 625 | 144 ns | 6,944K | 139 |
| 64 | Rel | 0.8 µs | 1,250K | 320 | 1.7 µs | 588K | 151 |
| 96 | Rel | 1.2 µs | 833K | 320 | 2.5 µs | 400K | 154 |
| 128 | Rel | 1.6 µs | 625K | 320 | 2.9 µs | 345K | 177 |
| 256 | Rel | 3.0 µs | 333K | 341 | 5.7 µs | 175K | 179 |
| 1024 | Rel | 12 µs | 83K | 338 | 23 µs | 44K | 176 |
| 4096 | Rel | 48 µs | 21K | 341 | 92 µs | 11K | 178 |
| 16384 | Rel | 197 µs | 5K | 332 | 375 µs | 3K | 175 |
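
The throughput figures can be rechecked from the timings. Judging from the numbers alone, the MB/s column appears to count one row's worth of bytes (words per row times 4) per iteration, in decimal megabytes, rather than all 3 rows. The helper names below are hypothetical.

```cpp
#include <cstdint>

// Iterations (chunks) per second for a given time per iteration.
constexpr double chunks_per_second(double ns_per_iteration) {
    return 1e9 / ns_per_iteration;
}

// Assumption: throughput counts words_per_row * 4 bytes per iteration.
// bytes per ns equals GB/s, so multiply by 1000 for decimal MB/s.
constexpr double megabytes_per_second(std::uint64_t words_per_row,
                                      double ns_per_iteration) {
    return static_cast<double>(words_per_row) * 4.0 / ns_per_iteration * 1000.0;
}
```

For the 5-word transform (32 ns) this gives 31.25 million chunks/s and 625 MB/s. For the 64-word transform (800 ns) it gives 1.25 million chunks/s and about 320 MB/s.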

Keep in mind the following:

  • Time is how long one iteration of a Mojette Transform or Inverse takes.
  • Each word is 4 bytes long, with 2 overflow bits in a separate bitmap.
  • Mojette Transform generates 5 projections, not the minimum 3.
  • Mojette Inverse includes a copy of the projections.
  • This is with 3 data rows and 5 projections.
  • “Rel” mode is CMake’s “Release” mode with all sanitizers disabled.
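
For readers unfamiliar with the transform being timed, here is a minimal sketch of a forward Mojette projection over a 3-row chunk. It is not this project's implementation. It assumes projection directions (p, 1) with p from -2 to 2 and one common bin-index convention. It also stores each bin in a 64-bit integer instead of a 32-bit word plus a separate 2-bit overflow bitmap. All names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// One projection for direction (p, q = 1) of a row-major rows x cols grid.
// Each word at (row, col) is added to bin col + p * row, shifted so that the
// smallest bin index is 0. The result has cols + |p| * (rows - 1) bins.
std::vector<std::uint64_t> mojette_projection(const std::vector<std::uint32_t>& grid,
                                              std::size_t rows, std::size_t cols, int p) {
    const std::size_t step = static_cast<std::size_t>(std::abs(p));
    const std::size_t span = step * (rows - 1);
    std::vector<std::uint64_t> bins(cols + span, 0);
    for (std::size_t r = 0; r < rows; ++r) {
        // For negative p the raw index would go below zero, so offset by span.
        const std::size_t shift = (p < 0) ? span - step * r : step * r;
        for (std::size_t c = 0; c < cols; ++c) {
            bins[c + shift] += grid[r * cols + c];  // sum of 3 words fits in 34 bits
        }
    }
    return bins;
}

// Assumed direction set: p = -2, -1, 0, 1, 2, giving 5 projections per chunk.
std::vector<std::vector<std::uint64_t>> mojette_transform(const std::vector<std::uint32_t>& grid,
                                                          std::size_t rows, std::size_t cols) {
    std::vector<std::vector<std::uint64_t>> projections;
    for (int p = -2; p <= 2; ++p) {
        projections.push_back(mojette_projection(grid, rows, cols, p));
    }
    return projections;
}
```

With 3 rows, a bin sums at most 3 words, which is why 2 extra bits per 32-bit word are enough to hold the overflow.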