I built fossil around one idea: a packed file should be able to tell you how it was made. While it compresses, it records what it did, and fossil explain reads it back. The demos below run the real thing in the page, and everything after explains how it works.
Playground
Drop a file and fossil packs it right here. The breakdown below is what fossil explain shows: which model handled each block, and how much it saved. Nothing is uploaded.
Entropy map
Drop in a file to see its entropy. This is the same heatmap fossil map draws and the stats fossil inspect reports. It runs in your browser, so the file never leaves your machine.
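For a sense of what the heatmap measures, here is a minimal sketch of per-block Shannon entropy in bits per byte. The function names and the idea of passing the block size as a parameter are my assumptions for illustration, not fossil's actual code.

```rust
// Hypothetical helper names; a sketch, not fossil's real API.

/// Shannon entropy of a byte slice, in bits per byte (0.0 to 8.0).
fn shannon_entropy(data: &[u8]) -> f64 {
    if data.is_empty() {
        return 0.0;
    }
    // Count how often each byte value occurs.
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    let n = data.len() as f64;
    // H = -sum(p * log2(p)) over the byte values that actually appear.
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Entropy of each fixed-size block (block_size must be non-zero).
fn block_entropies(data: &[u8], block_size: usize) -> Vec<f64> {
    data.chunks(block_size).map(shannon_entropy).collect()
}
```

A block of one repeated byte scores 0, and a block where all 256 values are equally likely scores 8, which is the range a heatmap like this would color.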
What it is
Most compressors just hand you a smaller file. fossil does that too, but it also keeps a record of how, so you can go back and see what it decided.
It works in blocks. The file is cut into 4 KB chunks, and each one is handled on its own. For every chunk fossil runs a handful of small models, checks what each one produces, and keeps the smallest. Different parts of a file usually want different methods, so choosing per block beats picking one method for the whole file.
Because that choice is saved, fossil explain can walk back through a packed file and tell you, block by block, what it used and why.
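The per-block choice can be sketched roughly like this. It uses only two stand-in models (RAW and a simple RLE), and every name here is hypothetical; the real model list is longer and the real encoders are certainly different.

```rust
// Hypothetical names; a sketch of "try each model, keep the smallest".

/// An encoder takes a block and returns its encoded bytes.
type Encoder = fn(&[u8]) -> Vec<u8>;

/// RAW: store the block unchanged.
fn encode_raw(block: &[u8]) -> Vec<u8> {
    block.to_vec()
}

/// RLE: emit (run length, byte) pairs, with runs capped at 255.
fn encode_rle(block: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < block.len() {
        let byte = block[i];
        let mut run = 1;
        while i + run < block.len() && block[i + run] == byte && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(byte);
        i += run;
    }
    out
}

/// Try every candidate on one block and keep the smallest output.
/// Ties go to the earlier candidate, so RAW wins when nothing helps.
fn pick_model(block: &[u8]) -> (&'static str, Vec<u8>) {
    let candidates: [(&'static str, Encoder); 2] = [
        ("RAW", encode_raw as Encoder),
        ("RLE", encode_rle as Encoder),
    ];
    candidates
        .iter()
        .map(|(name, encode)| (*name, encode(block)))
        .min_by_key(|(_, encoded)| encoded.len())
        .expect("candidate list is never empty")
}

/// Choose a model per block and report (model name, encoded size).
/// block_size must be non-zero.
fn plan(data: &[u8], block_size: usize) -> Vec<(&'static str, usize)> {
    data.chunks(block_size)
        .map(|block| {
            let (name, encoded) = pick_model(block);
            (name, encoded.len())
        })
        .collect()
}
```

Keeping the model name next to each block's output is what makes an explain-style walk possible later: the decision is data, not something to re-derive.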
Commands
| Command | What it does |
|---|---|
| fossil pack <in> [out] | compress a file or directory into a .fossil (omit the input to pack the clipboard) |
| fossil lift | fossilize the clipboard, then copy the .fossil back to it |
| fossil unpack <file.fossil> [out] | restore the original (checks the CRC first) |
| fossil inspect <file> | per-block breakdown: entropy, model, savings |
| fossil map <file> | entropy heatmap, or block models for a .fossil |
| fossil explain <file.fossil> | the reconstruction recipe (--block N for one block) |
pack --lossy[=bits] drops the low bits of each byte for a smaller file; --best-effort packs already-compressed inputs losslessly instead of refusing, and --images-only limits lossy to raw images. pack --verify round-trips the result before writing it, and unpack --trust skips the CRC check. pack --fast skips the slow models for much faster packing, at the cost of a slightly worse ratio.
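To make "drops the low bits" concrete, here is one way it could look, assuming the dropped bits are simply zeroed in place. Whether fossil zeroes them or repacks the remaining bits is not something this page states, and drop_low_bits is a made-up name.

```rust
// Hypothetical sketch: zeroing is an assumption about how the bits are dropped.

/// Zero the lowest `bits` bits of every byte (`bits` must be 0..=8).
fn drop_low_bits(data: &[u8], bits: u32) -> Vec<u8> {
    assert!(bits <= 8, "a byte only has 8 bits");
    // Build the mask in u16 so that bits == 8 gives 0x00 without overflowing.
    let mask = ((0xFFu16 << bits) & 0xFF) as u8;
    data.iter().map(|&b| b & mask).collect()
}
```

With 2 bits dropped, 0x13 becomes 0x10 and each byte can only take 64 distinct values, which is what gives the later models more repetition to work with.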
How it works
Each model looks for a different kind of structure. Run-length squashes long runs of one byte. Huffman and range coding give common bytes shorter codes. LZ replaces repeated chunks with pointers back to earlier copies, and LZR packs those pointers a bit tighter with some context, like LZMA. BWT reorders the data so similar contexts line up, which helps the stages after it. The generator spots simple patterns like counters and gradients and stores the rule instead of the bytes.
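The generator is the least familiar of these, so here is a rough sketch of the idea: recognise a constant fill or a counter-style ramp and keep the rule instead of the bytes. The Rule type and both function names are invented for illustration, and the real generator likely handles more patterns than these two.

```rust
// Hypothetical types and names; only fills and byte-wise ramps are covered.

/// A rule that can regenerate a block without storing its bytes.
#[derive(Debug, PartialEq)]
enum Rule {
    /// Every byte equals `value`.
    Fill { value: u8, len: usize },
    /// Byte i equals start + i * step, wrapping at 256.
    Ramp { start: u8, step: u8, len: usize },
}

/// Return a rule if the block is a constant fill or an arithmetic ramp.
fn detect_rule(block: &[u8]) -> Option<Rule> {
    let (&first, rest) = block.split_first()?;
    let len = block.len();
    // The step between the first two bytes (0 if there is only one byte).
    let step = rest.first().map_or(0, |&second| second.wrapping_sub(first));
    // Every neighbouring pair must differ by that same step.
    let consistent = block.windows(2).all(|w| w[1].wrapping_sub(w[0]) == step);
    if !consistent {
        return None;
    }
    Some(if step == 0 {
        Rule::Fill { value: first, len }
    } else {
        Rule::Ramp { start: first, step, len }
    })
}

/// Rebuild the original bytes from a rule.
fn expand(rule: &Rule) -> Vec<u8> {
    match *rule {
        Rule::Fill { value, len } => vec![value; len],
        Rule::Ramp { start, step, len } => (0..len)
            .map(|i| start.wrapping_add((i as u8).wrapping_mul(step)))
            .collect(),
    }
}
```

A 4 KB counter block collapses to a start, a step, and a length, which no general-purpose coder gets close to.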
Every block tries all of these and keeps whichever comes out smallest:
| Model | Best for |
|---|---|
| RAW | the fallback, stored as-is |
| RLE | adjacent repeated bytes |
| ENTROPY | skewed byte frequencies (canonical Huffman) |
| LZ | repeated substrings |
| LZH | LZ, then Huffman |
| LZR | LZ tokens range-coded with a literal context (LZMA-style) |
| BWTM | Burrows-Wheeler + move-to-front + range coding |
| RANGE | adaptive range coding, no stored table |
| PPM | order-1 context (each byte predicted from the previous one) |
| GEN | formulas like constant fills and ramps |
| DELTA | smooth, slowly-changing data |
| CSV | tabular data, transposed so columns group together |
| WORD | text with repeating words, dictionary-coded |
| SIGNAL | 8/16/24-bit audio and sensor data: FLAC-style windowed LPC, mid/side, partitioned Rice |
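
As an example of why a transform like DELTA helps on smooth data, here is a minimal byte-wise delta coder. It is a sketch under the assumption that deltas are taken mod 256 against the previous byte; the function names are hypothetical and fossil's DELTA model may differ in detail.

```rust
// Hypothetical names; a plain previous-byte delta, wrapping at 256.

/// Replace each byte with its difference from the previous one.
/// The first byte is differenced against 0, so it is stored as-is.
fn delta_encode(data: &[u8]) -> Vec<u8> {
    let mut prev = 0u8;
    data.iter()
        .map(|&b| {
            let d = b.wrapping_sub(prev);
            prev = b;
            d
        })
        .collect()
}

/// Undo delta_encode by keeping a running sum.
fn delta_decode(deltas: &[u8]) -> Vec<u8> {
    let mut prev = 0u8;
    deltas
        .iter()
        .map(|&d| {
            prev = prev.wrapping_add(d);
            prev
        })
        .collect()
}
```

Slowly changing input turns into values clustered around 0 (and 255 for small negative steps), which is the skewed distribution an entropy stage wants.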
A few things about the format. Tiny or random files are stored as-is, so they never grow. Lengths use varints. Every file gets a CRC32, so corruption turns up on unpack. The blocks are still 4 KB, but the LZ models can look back up to 64 KB into what they've already seen, so a repeat far from its original only costs a pointer instead of a second copy. Raw images (PPM and BMP) get filtered row by row first (PNG-style), so the models see small differences instead of raw pixels. Packing a folder runs one LZ pass over everything, so duplicate files cost almost nothing.
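Two of those format details are easy to show in code. This is a sketch that assumes LEB128-style varints and the standard IEEE CRC-32; the page only says "varints" and "CRC32", so the exact encodings fossil uses are my assumption, and the function names are hypothetical.

```rust
// Hypothetical names; LEB128 and the IEEE polynomial are assumptions.

/// Encode an unsigned integer as a LEB128-style varint:
/// 7 payload bits per byte, high bit set on every byte except the last.
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7F) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Decode a varint from the front of `input`.
/// Returns (value, bytes consumed), or None if the input is truncated.
fn read_varint(input: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    // A u64 never needs more than 10 varint bytes.
    for (i, &byte) in input.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// CRC-32 (IEEE, reflected polynomial 0xEDB88320), computed bit by bit.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All ones if the low bit is set, all zeros otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}
```

A 4 KB block length (4096) costs two varint bytes instead of a fixed four or eight, and a single flipped bit anywhere in the payload changes the checksum.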
How it measures up
On the sample files, against gzip -9 and zstd -19 (percent smaller, bigger is better):
| file | fossil | gzip -9 | zstd -19 |
|---|---|---|---|
| mixed.bin | 78.2% | 74.1% | 77.0% |
| bigmix.bin | 78.3% | 74.7% | 77.3% |
| cat.ppm | 73.1% | 51.1% | 57.7% |
| wave.pcm | 81.0% | 1.7% | 2.3% |
| cat.jpg | ~0% | ~0% | ~0% |
These are just the sample files, so other data lands differently. The structured files do well because fossil picks a model per block. The image gets filtered row by row first (PNG-style), so the models see small differences instead of raw pixels. wave.pcm is audio, and fossil fits a predictor to each block the way FLAC does. gzip and zstd don't predict audio, so they barely touch it. The jpg is already compressed, so there's nothing left to take.
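To show why prediction matters so much for audio, here is a sketch using the order-2 fixed predictor from FLAC as a stand-in. fossil's SIGNAL model fits LPC coefficients per block rather than using fixed ones, so treat this as a simplified illustration; the function names are hypothetical.

```rust
// Hypothetical names; a fixed predictor stands in for fitted LPC.

/// Residuals of an order-2 fixed predictor:
/// predict x[n] as 2*x[n-1] - x[n-2] and keep only the error.
fn residuals_order2(samples: &[i16]) -> Vec<i32> {
    samples
        .iter()
        .enumerate()
        .map(|(n, &x)| {
            let x = i32::from(x);
            if n < 2 {
                x // warm-up samples are kept verbatim
            } else {
                let prediction =
                    2 * i32::from(samples[n - 1]) - i32::from(samples[n - 2]);
                x - prediction
            }
        })
        .collect()
}

/// Rebuild the samples by running the same predictor on decoded output.
fn reconstruct_order2(residuals: &[i32]) -> Vec<i16> {
    let mut out: Vec<i32> = Vec::with_capacity(residuals.len());
    for (n, &r) in residuals.iter().enumerate() {
        let sample = if n < 2 {
            r
        } else {
            r + 2 * out[n - 1] - out[n - 2]
        };
        out.push(sample);
    }
    out.into_iter().map(|s| s as i16).collect()
}
```

A smooth waveform leaves residuals near zero, which a Rice coder stores in a few bits each. A byte-oriented LZ matcher sees the same samples as almost never repeating, which fits the wave.pcm row above.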
Install
fossil builds from source with Rust.
# install Rust (skip if you already have it)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# install fossil
cargo install --git https://github.com/punctuations/fossil
fossil help

# install Rust from https://rustup.rs (run rustup-init.exe), then in PowerShell:
cargo install --git https://github.com/punctuations/fossil
fossil help