Resume that actually resumes

Your Wi-Fi drops at 2.1 GB of a 3 GB download. A bad download manager starts over. A merely OK one resumes from byte zero of the last chunk. We built the one that resumes from the byte it left off, on the right chunk, and remembers how each host treated it last time.

Default chunk size: 1 MB

Concurrent connections: 8

State save interval: 2 seconds

What "resume" usually means (and doesn't)

I've lost a 4 GB download to hotel Wi-Fi twice and decided that was enough. Most download tools claim to support resume. Pull the Wi-Fi at 2.1 GB of a 3 GB file and watch what actually happens. Some start over at byte zero. The 2.1 GB you already had is gone. Some keep the partial file, reconnect, and start appending from wherever the last write landed, which is usually a coin flip between "the byte after the last successful flush" and "somewhere inside a buffer that never made it to disk." Resume works for the demo and fails for the drive home.

Resume is an architectural decision you either make on day one or you don't. You can't sprinkle it on later. If the download model assumes one continuous stream, every resume story is a retrofit. If the model assumes many independent pieces, resume is what the thing already does.

Chunks are the atom of recovery

Anything above about 10 MB gets split into 1 MB chunks. Each chunk is its own request with its own Range header, its own retry budget, its own place in the completion map. Eight chunks stream in parallel by default. The file you see on disk is stitched from those chunks when they all land.

The Range header is what makes this work over HTTP. A chunk isn't a Veloxar-internal concept the server has to know about. It's just a byte range the server already knows how to serve.

GET /video.mp4 HTTP/1.1
Host: cdn.example.com
Range: bytes=2097152-3145727

HTTP/1.1 206 Partial Content
Content-Range: bytes 2097152-3145727/3221225472
Content-Length: 1048576
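To make the arithmetic behind that exchange concrete, here is a minimal sketch of how a chunk index maps to a Range header. The function names are mine for illustration, not Veloxar's API, and I'm assuming 1 MB means 1,048,576 bytes, which is what the Content-Length above implies.

```python
from typing import Tuple

CHUNK_SIZE = 1024 * 1024  # 1 MB chunks, as described in the article


def chunk_count(total_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of chunks needed to cover the file (ceiling division)."""
    return -(-total_size // chunk_size)


def chunk_range(index: int, total_size: int,
                chunk_size: int = CHUNK_SIZE) -> Tuple[int, int]:
    """Return the inclusive (first_byte, last_byte) pair for one chunk."""
    first = index * chunk_size
    if index < 0 or first >= total_size:
        raise IndexError(f"chunk {index} is outside a {total_size}-byte file")
    # The final chunk may be short, so clamp to the last byte of the file.
    last = min(first + chunk_size, total_size) - 1
    return first, last


def range_header(index: int, total_size: int) -> str:
    """Format the HTTP Range header value for one chunk."""
    first, last = chunk_range(index, total_size)
    return f"bytes={first}-{last}"
```

Chunk 2 of a 3,221,225,472-byte file comes out as bytes=2097152-3145727, the same range as in the request above.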

When the network drops mid-file, Veloxar doesn't have to rebuild "where we were" by counting bytes written to a stream. It already knows. Chunks 0 through 2,149 completed, chunks 2,150 and 2,151 were mid-flight, chunks 2,152 through 3,071 never started. On reconnect, it asks for exactly the ranges it's missing. The 2.1 GB already on disk stays on disk. ChunkedDownloadManager is the file where this all lives. The interesting parts are where individual chunks get scheduled, cancelled, and retried without the surrounding file knowing anything happened.
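A rough sketch of "asks for exactly the ranges it's missing" might look like the following. The completion map here is a plain dict from chunk index to bytes already on disk; that shape and the function name are assumptions for illustration, not the real internals of ChunkedDownloadManager.

```python
from typing import Dict, List, Tuple

CHUNK_SIZE = 1024 * 1024  # 1 MB chunks


def missing_ranges(total_size: int, received: Dict[int, int],
                   chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, int, int]]:
    """Return (chunk_index, first_byte, last_byte) for every range still needed.

    `received` maps chunk index -> bytes already on disk for that chunk.
    Chunks that never started are simply absent from the map.
    """
    plan = []
    count = -(-total_size // chunk_size)  # ceiling division
    for index in range(count):
        start = index * chunk_size
        end = min(start + chunk_size, total_size) - 1
        length = end - start + 1
        # Never trust a recorded offset larger than the chunk itself.
        done = min(received.get(index, 0), length)
        if done < length:
            plan.append((index, start + done, end))
    return plan
```

Completed chunks drop out of the plan entirely, partial chunks pick up at their recorded offset, and untouched chunks are requested whole.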

Saving state every two seconds, not every chunk

ChunkStateManager writes the completion map to disk every 2 seconds. Not after every chunk, because that's too often for large files where chunks finish in bursts. And not at the end, because "at the end" is exactly when the process is most likely to die. Two seconds is the compromise. The worst case for a crash is that you repeat about two seconds of work.

The subtle thing is what gets saved. It's not "the last byte we wrote." It's the full set of which chunks are done, which are partial, and how far each partial one got. That's the difference between "resume from the start of the current chunk" (which some tools do) and "resume from inside the chunk we were working on." With 1 MB chunks the difference is small per-chunk, but multiplied across eight parallel streams on a slow connection, restarting all eight in-flight chunks is real wasted bandwidth.
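Here is a hedged sketch of interval-based saving. StateSaver is a hypothetical stand-in for ChunkStateManager, the JSON layout is invented, and the write-to-temp-then-rename step is my assumption about how to keep a crash mid-save from corrupting the state file; the article doesn't say how Veloxar does it.

```python
import json
import os
import tempfile
import time
from typing import Dict, Optional

SAVE_INTERVAL_SECONDS = 2.0  # the interval from the article


class StateSaver:
    """Hypothetical stand-in for ChunkStateManager (not Veloxar's real class)."""

    def __init__(self, path: str, interval: float = SAVE_INTERVAL_SECONDS):
        self.path = path
        self.interval = interval
        self._last_save: Optional[float] = None

    def maybe_save(self, received: Dict[int, int],
                   now: Optional[float] = None) -> bool:
        """Write the completion map if the interval elapsed; True if written."""
        now = time.monotonic() if now is None else now
        if self._last_save is not None and now - self._last_save < self.interval:
            return False
        self._write_atomically(received)
        self._last_save = now
        return True

    def _write_atomically(self, received: Dict[int, int]) -> None:
        # Assumption: write a temp file, then rename it over the old state,
        # so a crash mid-write leaves the previous state intact.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as handle:
                json.dump({str(k): v for k, v in received.items()}, handle)
                handle.flush()
                os.fsync(handle.fileno())
            os.replace(tmp, self.path)
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise


def load_state(path: str) -> Dict[int, int]:
    """Read the map back, restoring integer chunk indexes."""
    with open(path) as handle:
        return {int(k): v for k, v in json.load(handle).items()}
```

The map records bytes received per chunk rather than a done/not-done flag, which is what lets a resume start from inside a partial chunk.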

The zero-byte file trap

This is the bug that ships in every download tool written in a hurry. A previous attempt crashed before any bytes were written. A zero-byte .part file is sitting on disk. The recovery path sees "file exists" and decides to resume. It sends Range: bytes=0-. The server returns 200 OK, full body. The tool appends to the zero-byte file. Sometimes that's fine. Sometimes the tool's resume logic decides the existing .part file is authoritative, truncates the response to "new data only," and you end up with a file that's smaller than it should be. I've seen variants where the tool gets stuck in a loop re-resuming zero bytes.

A .part file that's zero bytes is not a resume candidate. It's garbage. DownloadRecoveryPolicy treats zero-byte partials as "delete and start over," not "resume from offset 0." This is the kind of one-line check that separates download managers that quietly corrupt files from ones that don't.
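The check itself is small. This sketch shows the decision only; the function names and the returned labels are illustrative, not DownloadRecoveryPolicy's actual interface.

```python
import os
from typing import Optional


def part_size(path: str) -> Optional[int]:
    """Size of the .part file in bytes, or None if it doesn't exist."""
    try:
        return os.path.getsize(path)
    except FileNotFoundError:
        return None


def recovery_action(size: Optional[int]) -> str:
    """Decide what to do with a possible partial file of the given size."""
    if size is None:
        return "start_fresh"
    if size == 0:
        # A zero-byte partial carries no progress: treat it as garbage.
        return "delete_and_restart"
    return "resume"
```

A caller would run recovery_action(part_size(path)) before sending any Range header, so a zero-byte file never reaches the resume path.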

Hosts have a reputation

The other half of resume is what you remember between sessions. Most download managers treat every launch as a fresh start. If a host rate-limited you at 8 parallel connections yesterday, you'll happily open 8 parallel connections to it today and eat the 429 all over again.

HostRateLimitTracker persists per-host observations across sessions. Each 429 steps the host's allowed concurrency down: 8 to 4, 4 to 2, 2 to 1. A cooldown of 60 seconds gets added per hit, stacking up to about 300 seconds. Successes gradually restore the limit back upward. The first download to a fragile host on a fresh launch starts at whatever that host has earned, not at the default.
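The step-down rule can be sketched like this. The halving, the 60-second step, and the 300-second cap come from the description above. The restore rule (double again after 20 consecutive successes) is purely my assumption, since "gradually" is all the article says. Persistence and cooldown decay are left out.

```python
from dataclasses import dataclass

DEFAULT_CONCURRENCY = 8
COOLDOWN_STEP_SECONDS = 60
COOLDOWN_CAP_SECONDS = 300
SUCCESSES_PER_STEP_UP = 20  # assumption: the article only says "gradually"


@dataclass
class HostState:
    """Hypothetical per-host record; not HostRateLimitTracker's real schema."""
    concurrency: int = DEFAULT_CONCURRENCY
    cooldown_seconds: int = 0
    success_streak: int = 0

    def record_429(self) -> None:
        # Halve the allowed concurrency: 8 -> 4 -> 2 -> 1, never below 1.
        self.concurrency = max(1, self.concurrency // 2)
        # Each hit adds 60 s of cooldown, stacking up to the 300 s cap.
        self.cooldown_seconds = min(
            COOLDOWN_CAP_SECONDS, self.cooldown_seconds + COOLDOWN_STEP_SECONDS
        )
        self.success_streak = 0

    def record_success(self) -> None:
        self.success_streak += 1
        if (self.success_streak >= SUCCESSES_PER_STEP_UP
                and self.concurrency < DEFAULT_CONCURRENCY):
            # Step back up one level at a time, capped at the default.
            self.concurrency = min(DEFAULT_CONCURRENCY, self.concurrency * 2)
            self.success_streak = 0
```

Storing one of these records per hostname, and loading it before the first request of a session, is what lets a fragile host start at the concurrency it has earned.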

This pairs with resume in a way that only matters when you trust both pieces. Resume gets you back to the byte you left off. Host memory gets you back to the concurrency that host will actually honour, so you don't immediately trip the same rate limit that caused the disconnect in the first place. Without the memory, resume is a loop.

When resume isn't the right answer

The retry and recovery policy has a branch that's as important as the resume logic itself: the branch where it refuses to resume. If a chunk fails with a 404, the URL isn't coming back. If the decoder says the bytes so far are garbage, appending more bytes won't fix them. If the server is explicitly saying 403, retrying at any byte offset is just going to get you blocked harder.

DownloadRecoveryPolicy splits errors the same way the retry classifier does. A network timeout, a lost connection, a DNS failure, a 5xx: those are resumable. The partial file stays, the state file stays, the next attempt picks up where the last one stopped. A 4xx like the 403 and 404 above, a decode failure, a bad URL: those aren't. (A 429 is the one 4xx that doesn't end the download: it goes to the host tracker and the download comes back at lower concurrency.) The .part file and its state get cleaned up together, so the next attempt doesn't try to resume from a corrupt offset and your disk doesn't quietly fill with orphaned partials.
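As a sketch, the split could be expressed as a single predicate. The Failure enum and is_resumable are names I made up for illustration. Treating 429 as resumable is my reading of the host-memory section rather than something the policy description states outright.

```python
from enum import Enum
from typing import Optional


class Failure(Enum):
    """Hypothetical failure categories; not Veloxar's real error types."""
    TIMEOUT = "timeout"
    CONNECTION_LOST = "connection_lost"
    DNS = "dns"
    HTTP = "http"
    DECODE = "decode"
    BAD_URL = "bad_url"


def is_resumable(kind: Failure, status: Optional[int] = None) -> bool:
    """True if the partial file and its state should be kept for a retry."""
    if kind in (Failure.TIMEOUT, Failure.CONNECTION_LOST, Failure.DNS):
        return True
    if kind is Failure.HTTP:
        if status is None:
            raise ValueError("HTTP failures need a status code")
        if status == 429:
            # Assumption: rate limits are handled by lowering concurrency,
            # so the bytes already on disk are still worth keeping.
            return True
        return 500 <= status <= 599
    # Decode failures and bad URLs: clean up the .part file and its state.
    return False
```

When this returns False, the .part file and the state file should be deleted in the same step, so neither outlives the other.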

Chunk-level retry is independent of file-level retry. If one of eight parallel chunks fails with a recoverable error, only that chunk retries, with its own backoff, jitter, and budget. The other seven never notice. It's the same thundering-herd fix from the queue story, applied one level down. The atoms that retry are chunks, not whole files, and they're decorrelated from each other by design.
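A per-chunk retry loop with backoff, jitter, and a budget might look roughly like this. I'm using exponential backoff with full jitter as the illustration; the article doesn't name the scheme Veloxar uses. The base delay, cap, and budget of five attempts are made-up values.

```python
import random
import time
from typing import Callable


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt))."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def fetch_with_retry(fetch: Callable[[], bytes], budget: int = 5,
                     sleep: Callable[[float], None] = time.sleep,
                     rng: Callable[[], float] = random.random) -> bytes:
    """Fetch one chunk, retrying recoverable errors up to `budget` attempts."""
    for attempt in range(budget):
        try:
            return fetch()
        except ConnectionError:
            if attempt == budget - 1:
                raise  # budget spent: surface the failure to the file level
            sleep(backoff_delay(attempt, rng=rng))
    raise ValueError("budget must be at least 1")
```

Because each chunk draws its own random delays, eight chunks that fail at the same instant don't all retry at the same instant.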

After the Wi-Fi drops

You start a 3 GB download on flaky hotel Wi-Fi. Two thirds of the way through, the access point drops for 40 seconds (my personal record is longer, and the coffee was terrible). When it comes back, Veloxar picks up within a second or two, streams the chunks it was missing, and finishes. No progress bar snapping back to zero. No "resuming" blinking for a minute while the tool reconstructs state.

A week later you come back to a host that rate-limited you on the last session. The first download opens fewer connections than usual, because the app remembers. It completes faster than it would have if you'd walked into the same 429 wall twice. You don't see any of that either, which is the whole point. Resume and host memory are features you only notice when they aren't there.