16 archive formats · 6 naming schemes · 2 parallel extractions

The naming zoo

You queue a release from a file host. It arrives as seventeen files with names that somebody, at some point in the last thirty years, thought were a good idea. You open the download folder and stare at this:

showname.s01e04.part01.rar
showname.s01e04.part02.rar
showname.s01e04.part03.rar

bigfile.rar
bigfile.r00
bigfile.r01
bigfile.r02

release.7z.001
release.7z.002
release.7z.003

archive.zip
archive.z01
archive.z02

movie.001
movie.002
movie.003

backup.tar.gz.aa
backup.tar.gz.ab
backup.tar.gz.ac

Six different conventions for "this is one file, split into pieces." Nothing in the filenames tells a normal user which piece is the first one or how many pieces there should be. The RAR-lettered layout is the meanest of them: the first part is the one without a number. I had to look that up the first time I hit it, years ago, and I still see people get tripped up by it.

Veloxar's MultiPartArchiveDetector knows about all six. You drop files into the download folder; it groups them by base name, figures out the scheme, sorts the parts, and hands the first one to the extractor. You never type a part number.
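
A minimal sketch of the group-sort-pick step, assuming each file has already been matched to a base name and a part index. The Part struct and the firstParts function are hypothetical names for illustration, not Veloxar's actual types.

```swift
import Foundation

// Hypothetical record: one downloaded file, already matched to a
// base name and a numeric part index.
struct Part {
    let file: String
    let base: String
    let index: Int
}

/// Groups parts by base name, sorts each group by index, and returns
/// the file that should be handed to the extractor for each set.
func firstParts(of parts: [Part]) -> [String: String] {
    Dictionary(grouping: parts, by: { $0.base })
        .compactMapValues { group in
            group.sorted { $0.index < $1.index }.first?.file
        }
}

let downloaded = [
    Part(file: "release.7z.003", base: "release", index: 3),
    Part(file: "release.7z.001", base: "release", index: 1),
    Part(file: "release.7z.002", base: "release", index: 2),
    Part(file: "movie.002", base: "movie", index: 2),
    Part(file: "movie.001", base: "movie", index: 1),
]
let heads = firstParts(of: downloaded)
```

The arrival order of the files does not matter here; the sort inside each group decides which file goes first.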

Why one detector, six patterns

The cheap version of this is a switch statement on extension. That falls apart almost immediately. .001 could be an HJSplit part or a 7-Zip part depending on what sits before it. A filename like backup.tar.gz.001 has three dots and no signature telling you whether the outer wrapper is HJSplit or 7z. And movie.release.2026.001 has four dot-separated segments, the last of which is an "extension" that is just a number.

So the detector runs five anchored regexes against each filename, in order, and stops at the first match. The sixth scheme, Unix split, is recognized by its letter suffix instead. The patterns, lifted straight from the source:

(.rarMulti,      #"^(.+)\.part(\d+)\.rar$"#)   // part01.rar
(.rarMultiOld,   #"^(.+)\.r(\d{2,3})$"#)        // .r00
(.sevenZipParts, #"^(.+)\.7z\.(\d{3})$"#)       // .7z.001
(.zipSplit,      #"^(.+)\.z(\d{2})$"#)          // .z01
(.hjSplit,       #"^(.+)\.(\d{3})$"#)           // .001
// sixth: Unix split via .aa/.ab suffixes

Ordering matters. .7z.001 has to be checked before the bare HJSplit .001, otherwise every 7-Zip multi-volume gets misclassified as HJSplit and the base name comes out as release.7z instead of release. The RAR-lettered case has its own wrinkle. bigfile.rar is only a "first part" if there is also a bigfile.r00 sitting next to it; a lone .rar is just a plain archive. The detector handles that in a second pass after the main scan, because you can only know "this is the first part of a set" once you have seen the rest of the set.
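
Here is a rough sketch of first-match classification plus the second pass for old-style RAR sets. The five patterns are the ones quoted above; everything else (the Scheme and PartMatch types, classify, oldStyleRarHeads, and the choice of NSRegularExpression) is an assumption made for illustration. The Unix split case is left out because its pattern is not shown.

```swift
import Foundation

// Hypothetical types; only the pattern strings come from the listing above.
enum Scheme { case rarMulti, rarMultiOld, sevenZipParts, zipSplit, hjSplit }

struct PartMatch: Equatable {
    let scheme: Scheme
    let base: String
    let index: Int
}

// Order matters: the specific .7z.001 rule sits ahead of the bare .001 rule.
let rawPatterns: [(Scheme, String)] = [
    (.rarMulti,      #"^(.+)\.part(\d+)\.rar$"#),
    (.rarMultiOld,   #"^(.+)\.r(\d{2,3})$"#),
    (.sevenZipParts, #"^(.+)\.7z\.(\d{3})$"#),
    (.zipSplit,      #"^(.+)\.z(\d{2})$"#),
    (.hjSplit,       #"^(.+)\.(\d{3})$"#),
]
let patterns: [(Scheme, NSRegularExpression)] = rawPatterns.map {
    ($0.0, try! NSRegularExpression(pattern: $0.1))
}

/// Returns the first pattern that matches, or nil for a non-part filename.
func classify(_ name: String) -> PartMatch? {
    let whole = NSRange(name.startIndex..., in: name)
    for (scheme, regex) in patterns {
        guard let m = regex.firstMatch(in: name, options: [], range: whole),
              let baseRange = Range(m.range(at: 1), in: name),
              let indexRange = Range(m.range(at: 2), in: name),
              let index = Int(name[indexRange]) else { continue }
        return PartMatch(scheme: scheme, base: String(name[baseRange]), index: index)
    }
    return nil
}

/// Second pass: "<base>.rar" heads an old-style set only when at least
/// one "<base>.rNN" file was seen in the same folder.
func oldStyleRarHeads(in names: [String]) -> [String] {
    let oldBases = Set(
        names.compactMap { classify($0) }
            .filter { $0.scheme == .rarMultiOld }
            .map { $0.base }
    )
    return names.filter {
        $0.hasSuffix(".rar") && oldBases.contains(String($0.dropLast(4)))
    }
}
```

With this ordering, classify("release.7z.001") yields the 7-Zip scheme with base "release". A lone .rar matches none of the five patterns, which is why the second pass has to look at the whole folder listing rather than one name at a time.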

Missing parts are a failure, not a surprise

The worst time to find out you are missing .part07.rar is forty-five minutes into extraction, when unrar reports a CRC error, rolls the whole thing back, and leaves you scrolling your history for the link that never finished. Multi-part archives fail late by default. Finding out early is something you have to do on purpose.

The detector does that work. For each group of parts it takes the minimum and maximum indices it found, walks the integer range between them, and flags any index that has no file. The result carries a missingParts array; isComplete is just missingParts.isEmpty. If the gap list is non-empty at the moment you try to extract, you see a message naming the exact missing indices, and the extraction never runs. No CRC mystery at the forty-five minute mark.
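
The gap check is small enough to sketch in a few lines. The property names missingParts and isComplete follow the description above; the PartSet struct and its indices field are assumed names for this example.

```swift
import Foundation

// Hypothetical container for the part indices found for one archive set.
struct PartSet {
    let indices: [Int]

    /// Every index between the lowest and highest found that has no file.
    var missingParts: [Int] {
        guard let lowest = indices.min(), let highest = indices.max() else {
            return []
        }
        let present = Set(indices)
        return (lowest...highest).filter { !present.contains($0) }
    }

    var isComplete: Bool { missingParts.isEmpty }
}

let incomplete = PartSet(indices: [1, 2, 4, 7])
let complete = PartSet(indices: [3, 1, 2])
```

A min-to-max walk on its own cannot see a missing final volume, since nothing sits beyond the highest index found; any extra check for that case is outside this sketch.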

This check happens before extraction starts, not during. The pre-flight catches the cases the tool itself would only discover halfway through: a .part07.rar that never finished downloading, a .z01 that went to the wrong folder, a 7-Zip volume where someone deleted .002 thinking it was a duplicate of .001.

Two extractions at once, not a hundred

Veloxar supports sixteen archive formats end to end, among them ZIP, RAR, 7Z, TAR, ISO, DMG, CAB, ARJ, LZH, ACE, plus the compound shapes TAR loves to wear (tar.gz, tar.bz2, tar.xz, tgz, tbz2). The extractor dispatches on format. ZIP goes through FileManager.unzipItem, the FileManager extension that the ZIPFoundation package provides (it is not part of Foundation itself). RAR and 7Z shell out to unrar and 7z. TAR calls /usr/bin/tar. DMG and ISO go through hdiutil mount, a copy pass, then a detach.
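
To make the dispatch concrete, here is a sketch of a filename-to-backend lookup. The Backend enum and the backend(forFilename:) function are hypothetical. Routing the compound tar shapes to tar is an assumption, and the formats whose backend is not named above are marked as such rather than guessed.

```swift
import Foundation

// Hypothetical labels for the extraction paths named in the text.
enum Backend: Equatable {
    case unzipItem, unrar, sevenZip, tar, hdiutil
    case notStated   // backend not described for this format
}

/// Maps a filename to an extraction backend, or nil if it is not an archive.
/// Matching on the whole lowercased name handles two-part extensions such as
/// ".tar.gz", which URL.pathExtension would report as just "gz".
func backend(forFilename name: String) -> Backend? {
    let lower = name.lowercased()
    let table: [(suffixes: [String], backend: Backend)] = [
        // Assumption: compound tar shapes go to the same tar path as plain .tar.
        ([".tar.gz", ".tar.bz2", ".tar.xz", ".tgz", ".tbz2", ".tar"], .tar),
        ([".zip"], .unzipItem),
        ([".rar"], .unrar),
        ([".7z"], .sevenZip),
        ([".dmg", ".iso"], .hdiutil),
        ([".cab", ".arj", ".lzh", ".ace"], .notStated),
    ]
    for row in table where row.suffixes.contains(where: { lower.hasSuffix($0) }) {
        return row.backend
    }
    return nil
}
```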

Which means extraction is a mix of fast in-process work and slow subprocess work that pins a CPU core and hammers the disk. If you drop a hundred archives onto the app at once and let all of them extract in parallel, you get a machine that sounds like a jet engine and finishes slower than if you had done them one at a time.

ArchiveExtractor caps the active extractions at two. The rest queue up behind a structured task group. Two is the number that keeps the disk busy without making the system unresponsive, and it matches how the app thinks about download concurrency: generous with I/O, conservative with CPU-bound work. Each extraction gets a 300-second timeout and up to three retries, so a single flaky archive cannot wedge the whole queue.
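
One common way to cap a task group at two is to seed it with two child tasks and add the next one each time a child finishes. The sketch below does that; runCapped is a hypothetical helper, not ArchiveExtractor's real interface, and the timeout and retry handling are left out.

```swift
import Foundation

/// Runs `work` over `items` with at most `limit` child tasks in flight.
/// Results come back in the same order as `items`. Assumes limit >= 1.
func runCapped<T: Sendable, R: Sendable>(
    _ items: [T],
    limit: Int = 2,
    work: @escaping @Sendable (T) async -> R
) async -> [R] {
    await withTaskGroup(of: (Int, R).self, returning: [R].self) { group in
        var results = [R?](repeating: nil, count: items.count)
        var next = 0

        // Seed the group with up to `limit` tasks.
        while next < min(limit, items.count) {
            let i = next
            group.addTask { (i, await work(items[i])) }
            next += 1
        }

        // Each completion frees a slot for the next queued item.
        while let (i, result) = await group.next() {
            results[i] = result
            if next < items.count {
                let j = next
                group.addTask { (j, await work(items[j])) }
                next += 1
            }
        }
        return results.compactMap { $0 }
    }
}
```

Because a new task is only added after group.next() returns, the number of running children never exceeds the limit, however many archives are queued.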

Passwords, and why they go through stdin

Some archives are encrypted. RAR and 7Z both accept passwords on the command line with a flag like -p<password>. Do not do this. The command line of every running process is visible to any other user on the machine via ps, and a password that appears in argv appears in shell history, in process accounting logs, and sometimes in crash reports. Putting a secret in argv is putting it on a billboard.

Veloxar passes archive passwords to unrar and 7z through stdin. The external tool reads the password from its standard input rather than from its arguments, and the byte sequence never appears in any process listing. Small detail, but it is also the kind of thing that, when you get it wrong, ends up in a security writeup with your product's name in the title.
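
A minimal sketch of the stdin approach using Foundation's Process and Pipe. The runFeedingStdin helper is a hypothetical name, and /bin/cat stands in for the real tool so the example runs anywhere; the actual arguments Veloxar passes to unrar and 7z are not shown here.

```swift
import Foundation

/// Launches `tool` and writes `secret` to its standard input, so the
/// secret never appears in the argument list.
func runFeedingStdin(
    tool: URL,
    arguments: [String],
    secret: String
) throws -> (status: Int32, output: String) {
    let process = Process()
    process.executableURL = tool
    process.arguments = arguments            // no secret in here

    let stdinPipe = Pipe()
    let stdoutPipe = Pipe()
    process.standardInput = stdinPipe
    process.standardOutput = stdoutPipe

    try process.run()

    // Write the secret, then close the pipe so the child sees end-of-input.
    stdinPipe.fileHandleForWriting.write(Data((secret + "\n").utf8))
    stdinPipe.fileHandleForWriting.closeFile()

    let data = stdoutPipe.fileHandleForReading.readDataToEndOfFile()
    process.waitUntilExit()
    return (process.terminationStatus, String(decoding: data, as: UTF8.self))
}

// Stand-in tool: cat simply echoes whatever arrives on stdin.
let echoed = try runFeedingStdin(
    tool: URL(fileURLWithPath: "/bin/cat"),
    arguments: [],
    secret: "hunter2"
)
```

While the child runs, ps shows only the tool path and its arguments; the secret travels through the pipe and nowhere else.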

In the folder

You queue a multi-part release. Seventeen files land in the download folder over the next ten minutes. Veloxar groups them, figures out they are three separate archive sets (one RAR-numbered, one 7-Zip, one stray ZIP from the extras folder), and starts extracting the first two in parallel. The third waits its turn.

If something is missing, say the host's .part09.rar 404'd, you find out before the extractor spends twenty minutes making progress it is going to throw away. You never type a part number, never pick a "first file," never right-click and go hunting for the correct context-menu entry. The auto-cleanup option can delete the archives once extraction succeeds, but it is off by default. People are surprisingly attached to their .rar files, and I would rather ask than surprise anyone.

The archive is an implementation detail of how the file moved across the network. Once it is on your disk, you should get the thing inside it, not a homework assignment about split conventions from 1998.