fossil

I built fossil around one idea: a packed file should be able to tell you how it was made. While it compresses, it records what it did, and fossil explain reads it back. The demos below run the real thing in the page, and everything after explains how it works.

Playground

Drop a file and fossil packs it right here. The breakdown below is what fossil explain shows: which model handled each block, and how much it saved. Nothing is uploaded.

Entropy map

Drop in a file to see its entropy. This is the same heatmap fossil map draws and the stats fossil inspect reports. It runs in your browser, so the file never leaves your machine.
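
For a rough idea of what the heatmap is measuring, here is a minimal sketch of per-block Shannon entropy. The names `shannon_entropy` and `entropy_map` are made up for this example, and using bits per byte over 4 KB blocks is my assumption about what the map plots, not a copy of fossil's code.

```rust
/// Shannon entropy of a byte slice, in bits per byte (0.0 to 8.0).
/// Hypothetical helper for illustration, not fossil's actual function.
fn shannon_entropy(block: &[u8]) -> f64 {
    if block.is_empty() {
        return 0.0;
    }
    // Count how often each byte value occurs.
    let mut counts = [0usize; 256];
    for &b in block {
        counts[b as usize] += 1;
    }
    let n = block.len() as f64;
    // Sum -p * log2(p) over the byte values that actually appear.
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Entropy of each fixed-size block. `block_size` must be non-zero;
/// 4096 would match the 4 KB blocks described on this page.
fn entropy_map(data: &[u8], block_size: usize) -> Vec<f64> {
    data.chunks(block_size).map(shannon_entropy).collect()
}
```

A block of one repeated byte scores 0, and a block where all 256 values are equally common scores 8, which is why already-compressed or random data shows up as uniformly "hot".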

What it is

Most compressors just hand you a smaller file. fossil does that too, but it also keeps a record of how, so you can go back and see what it decided.

It works in blocks. The file is cut into 4 KB chunks, and each one is handled on its own. For every chunk fossil runs a handful of small models, checks what each one produces, and keeps the smallest. Different parts of a file usually want different methods, so choosing per block beats picking one method for the whole file.
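
The per-block choice can be sketched in a few lines. This is a toy with only two models (stored as-is and a simple run-length coder); the `Model` enum, `rle_encode`, `pack_block` and `pack` are names invented for the example, and the real model set and on-disk layout are different.

```rust
/// Hypothetical model tags for this sketch; fossil's real set is larger.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Model {
    Raw,
    Rle,
}

/// Toy run-length encoder: (count, byte) pairs, runs capped at 255.
fn rle_encode(block: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < block.len() {
        let byte = block[i];
        let mut run = 1;
        while i + run < block.len() && block[i + run] == byte && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(byte);
        i += run;
    }
    out
}

/// Try every model on one block and keep the smallest output.
/// Ties go to the earlier candidate, so Raw wins when nothing helps.
fn pack_block(block: &[u8]) -> (Model, Vec<u8>) {
    let candidates = vec![
        (Model::Raw, block.to_vec()),
        (Model::Rle, rle_encode(block)),
    ];
    candidates
        .into_iter()
        .min_by_key(|(_, bytes)| bytes.len())
        .expect("candidate list is never empty")
}

/// Cut the input into fixed-size blocks (non-zero size) and pick a model for each.
fn pack(data: &[u8], block_size: usize) -> Vec<(Model, Vec<u8>)> {
    data.chunks(block_size).map(pack_block).collect()
}
```

The model tag returned next to each block is the kind of record that makes a later walk-through possible: a decoder needs it anyway, so keeping it readable costs very little.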

Because that choice is saved, fossil explain can walk back through a packed file and tell you, block by block, what it used and why.

Commands

`fossil pack <in> [out]`	compress a file or directory into a `.fossil` (omit the input to pack the clipboard)
`fossil lift`	fossilize the clipboard, then copy the `.fossil` back to it
`fossil unpack <file.fossil> [out]`	restore the original (checks the CRC first)
`fossil inspect <file>`	per-block breakdown: entropy, model, savings
`fossil map <file>`	entropy heatmap, or block models for a `.fossil`
`fossil explain <file.fossil>`	the reconstruction recipe (`--block N` for one block)

pack --lossy[=bits] drops the low bits of each byte for a smaller file; --best-effort packs already-compressed inputs losslessly instead of refusing, and --images-only limits lossy to raw images. pack --verify round-trips the result before writing it, and unpack --trust skips the CRC check. pack --fast skips the slow models for much faster packing, trading a little ratio.
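
To make "drops the low bits" concrete, here is a small sketch of the idea. `drop_low_bits` is a hypothetical helper, and I'm assuming the dropped bits are simply zeroed; fossil may shift or store them differently.

```rust
/// Zero the `bits` lowest bits of every byte (`bits` must be 0..=8).
/// Hypothetical illustration of a lossy pre-pass, not fossil's own code.
fn drop_low_bits(data: &[u8], bits: u32) -> Vec<u8> {
    assert!(bits <= 8, "a byte only has 8 bits");
    // Build the mask in u16 so that bits == 8 does not overflow the shift.
    let mask = (0xFFu16 << bits) as u8;
    data.iter().map(|&b| b & mask).collect()
}
```

With `bits = 2`, the byte `0b1011_0111` becomes `0b1011_0100`. Fewer distinct values and longer runs give the models more to work with, at the cost of not getting the original bytes back.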

How it works

Each model looks for a different kind of structure. Run-length squashes long runs of one byte. Huffman and range coding give common bytes shorter codes. LZ replaces repeated chunks with pointers back to earlier copies, and LZR packs those pointers a bit tighter with some context, like LZMA. BWT reorders the data so similar contexts line up, which helps the stages after it. The generator spots simple patterns like counters and gradients and stores the rule instead of the bytes.
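
As an illustration of "stores the rule instead of the bytes", here is a sketch that recognises constant fills and wrapping counters. The `Ramp` struct and both functions are invented for this example; the patterns fossil's generator actually detects, and how it encodes them, are not shown here.

```rust
/// A rule that regenerates a block: begin at `start`, add `step` per byte (wrapping).
/// Hypothetical type for this sketch.
#[derive(Debug, PartialEq)]
struct Ramp {
    start: u8,
    step: u8,
    len: usize,
}

/// Return a rule if the block is a constant fill (step 0) or a wrapping counter.
fn detect_ramp(block: &[u8]) -> Option<Ramp> {
    let (&start, rest) = block.split_first()?;
    // The first two bytes fix the step; a one-byte block counts as a fill.
    let step = match rest.first() {
        Some(&second) => second.wrapping_sub(start),
        None => 0,
    };
    // Every neighbouring pair must differ by that same step.
    let fits = block
        .windows(2)
        .all(|pair| pair[1].wrapping_sub(pair[0]) == step);
    if fits {
        Some(Ramp { start, step, len: block.len() })
    } else {
        None
    }
}

/// Rebuild the original bytes from the stored rule.
fn expand_ramp(rule: &Ramp) -> Vec<u8> {
    let mut value = rule.start;
    (0..rule.len)
        .map(|_| {
            let current = value;
            value = value.wrapping_add(rule.step);
            current
        })
        .collect()
}
```

A 4 KB counter then costs a start, a step and a length rather than 4096 bytes.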

Every block tries all of these and keeps whichever comes out smallest:

Model	Best for
RAW	the fallback, stored as-is
RLE	adjacent repeated bytes
ENTROPY	skewed byte frequencies (canonical Huffman)
LZ	repeated substrings
LZH	LZ, then Huffman
LZR	LZ tokens range-coded with a literal context (LZMA-style)
BWTM	Burrows-Wheeler + move-to-front + range coding
RANGE	adaptive range coding, no stored table
PPM	order-1 context (each byte predicted from the one before it)
GEN	formulas like constant fills and ramps
DELTA	smooth, slowly-changing data
CSV	tabular data, transposed so columns group together
WORD	text with repeating words, dictionary-coded
SIGNAL	8/16/24-bit audio and sensor data: FLAC-style windowed LPC, mid/side, partitioned Rice
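
The BWTM row chains move-to-front after the Burrows-Wheeler transform. Here is a minimal sketch of just the move-to-front step, using the textbook version with a 256-entry table; `mtf_encode` and `mtf_decode` are names for this example, and fossil's variant may differ in detail.

```rust
/// Move-to-front: emit each byte's current position in a recency list,
/// then move that byte to the front. Recently seen bytes get small codes.
fn mtf_encode(data: &[u8]) -> Vec<u8> {
    let mut table: Vec<u8> = (0..=255).collect();
    data.iter()
        .map(|&b| {
            let idx = table
                .iter()
                .position(|&t| t == b)
                .expect("table holds every byte value");
            let byte = table.remove(idx);
            table.insert(0, byte);
            idx as u8
        })
        .collect()
}

/// Inverse transform: look the byte up by position, then move it to the front.
fn mtf_decode(codes: &[u8]) -> Vec<u8> {
    let mut table: Vec<u8> = (0..=255).collect();
    codes
        .iter()
        .map(|&c| {
            let byte = table.remove(c as usize);
            table.insert(0, byte);
            byte
        })
        .collect()
}
```

After a BWT pass the data tends to come in runs of the same byte, which move-to-front turns into runs of zeros. That skew is what the range coder at the end of the chain takes advantage of.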

A few things about the format. Tiny or random files are stored as-is, so they never grow. Lengths use varints. Every file gets a CRC32, so corruption turns up on unpack. The blocks are still 4 KB, but the LZ models can look back up to 64 KB into what they've already seen, so a repeat far from its original only costs a pointer instead of a second copy. Raw images (PPM and BMP) get filtered row by row first (PNG-style), so the models see small differences instead of raw pixels. Packing a folder runs one LZ pass over everything, so duplicate files cost almost nothing.
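
Two of those pieces are easy to show in isolation. The sketch below assumes LEB128-style varints (7 bits per byte, high bit as a continuation flag) and the common reflected CRC-32 with polynomial 0xEDB88320; both are my guesses at the usual choices, not a statement of fossil's exact byte layout, and the function names are made up.

```rust
/// LEB128-style varint: 7 payload bits per byte, high bit means "more follows".
/// Assumed encoding for illustration.
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Decode one varint; returns the value and how many bytes it used.
fn read_varint(input: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    // A u64 never needs more than 10 varint bytes.
    for (i, &byte) in input.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // truncated or over-long input
}

/// Bitwise CRC-32 (reflected, polynomial 0xEDB88320), the variant used by zip and PNG.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All ones if the low bit is set, all zeros otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}
```

Small lengths fit in one byte this way, and a checksum mismatch on unpack means the bytes changed somewhere between packing and restoring.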

How it measures up

On the sample files, against gzip -9 and zstd -19 (percent smaller, bigger is better):

file	fossil	gzip -9	zstd -19
mixed.bin	78.2%	74.1%	77.0%
bigmix.bin	78.3%	74.7%	77.3%
cat.ppm	73.1%	51.1%	57.7%
wave.pcm	81.0%	1.7%	2.3%
cat.jpg	~0%	~0%	~0%

These are just the sample files, so other data lands differently. The structured files do well because fossil picks a model per block. The image gets filtered row by row first (PNG-style), so the models see small differences instead of raw pixels. wave.pcm is audio, and fossil fits a predictor to each block the way FLAC does. gzip and zstd don't predict audio, so they barely touch it. The jpg is already compressed, so there's nothing left to take.
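
To see why prediction matters so much for audio, here is a sketch using the simplest kind of predictor: a fixed order-2 one that guesses each sample by extending the line through the previous two. This is a stand-in for illustration only. The SIGNAL model is described above as fitting windowed LPC per block, which is more capable, and the function names here are invented.

```rust
/// Residuals of a fixed order-2 predictor: prediction = 2*x[n-1] - x[n-2].
/// The first two samples fall back to simpler guesses (0, then the previous sample).
fn predict_residuals(samples: &[i16]) -> Vec<i32> {
    samples
        .iter()
        .enumerate()
        .map(|(n, &x)| {
            let prediction = match n {
                0 => 0,
                1 => i32::from(samples[0]),
                _ => 2 * i32::from(samples[n - 1]) - i32::from(samples[n - 2]),
            };
            i32::from(x) - prediction
        })
        .collect()
}

/// Inverse: rebuild each sample from the same prediction plus its residual.
fn restore_samples(residuals: &[i32]) -> Vec<i16> {
    let mut out: Vec<i16> = Vec::with_capacity(residuals.len());
    for (n, &r) in residuals.iter().enumerate() {
        let prediction = match n {
            0 => 0,
            1 => i32::from(out[0]),
            _ => 2 * i32::from(out[n - 1]) - i32::from(out[n - 2]),
        };
        out.push((prediction + r) as i16);
    }
    out
}
```

For the samples 1000, 1100, 1200, 1290, 1370 the residuals are 1000, 100, 0, -10, -10. Once the predictor locks on, the numbers left to store are small, and small numbers are cheap to code. A byte-oriented LZ pass sees no repeated substrings in the same data, which fits the table above.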

Install

fossil builds from source with Rust.

# install Rust (skip if you already have it)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# install fossil
cargo install --git https://github.com/punctuations/fossil

fossil help

# install Rust from https://rustup.rs (run rustup-init.exe), then in PowerShell:
cargo install --git https://github.com/punctuations/fossil

fossil help