Open Source Adventures: Episode 17: Universal Command Line Unpacker unall
Tomasz Wegrzanowski
Posted on March 16, 2022
Another small showcase. unall is a command line tool I use a lot, and I'm truly baffled that nothing like that exists, on any operating system, command line or GUI or whatever.
unall is a universal unpacker. You download some archive, you do unall whatever.7z and it unpacks it and, if that worked correctly, moves the original to trash.
It's available in my unix-utilities repo, and I think it's the most useful tool in the whole repo.
It sounds stupidly simple, but there's really nothing like that.
Why All Other Unpacking Tools Are Bad
The first problem is formats. There's a stupid number of them, and it's very common for the file extension to not correspond to the format - for example .jar is really a .zip archive as far as unpacking is concerned, as are a million other extensions. So step one of unpacking is figuring out the format.
There are also double-packed formats, like .tar.gz and .tar.bz2, which should be treated as if they were a single archive.
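To make the double-extension point concrete, here's a minimal sketch of extension-based detection. The FORMATS table and the detect_by_extension name are made up for this example (the real table is in the script further down); the only idea it shows is that longer extensions must be tried before shorter ones, otherwise foo.tar.gz would be treated as a plain .gz file.

```ruby
# Hypothetical, shortened table just for this sketch.
FORMATS = {
  tgz: %w[.tgz .tar.gz],
  tbz2: %w[.tbz2 .tar.bz2],
  zip: %w[.zip .jar],
  single_file: %w[.gz .bz2],
}

# Returns [format, extension], or nil if no known extension matches.
# Longer extensions are tried first, so "foo.tar.gz" matches ".tar.gz"
# rather than ".gz".
def detect_by_extension(filename)
  name = filename.downcase
  pairs = FORMATS.flat_map { |fmt, exts| exts.map { |ext| [fmt, ext] } }
  pairs.sort_by { |_, ext| -ext.size }.find { |_, ext| name.end_with?(ext) }
end
```

Returning the matched extension together with the format is handy later, because stripping it from the file name gives the obvious directory name.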
A small additional concern with format detection is multipart archives, like foo.rar, foo.r00, foo.r01 etc., which unall tries to detect and skip, but multipart archives are very rare these days, so I can't say how good it even is.
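For illustration, multipart pieces could be recognized by name with something like the sketch below. The patterns and the multipart_piece? helper are assumptions made for this example, not what unall does; the script shown later uses a much cruder check and skips any path containing "part".

```ruby
# Hypothetical patterns for a few common multipart naming schemes.
MULTIPART_PATTERNS = [
  /\.r\d{2}\z/i,           # foo.r00, foo.r01, ...
  /\.part\d+\.rar\z/i,     # foo.part1.rar, foo.part2.rar, ...
  /\.(7z|zip)\.\d{3}\z/i,  # foo.7z.001, foo.zip.002, ...
]

# True if the file name looks like one piece of a multipart archive.
def multipart_piece?(filename)
  MULTIPART_PATTERNS.any? { |pattern| filename.match?(pattern) }
end
```

Note that a bare foo.rar can't be classified by name alone; deciding whether it's the first volume of a set would need a look at its sibling files.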
Once we've identified the format, the next problem is remembering command line options for that format's unpacker, which are of course all completely different.
Anyway, there are tools which can handle tasks up to this point.
Now comes the hardest part, where every other tool fails:
- half the archives have everything in an extra directory, so foo/1.txt, foo/2.txt and foo/3.txt etc.
- half the archives have loose files, like 1.txt, 2.txt, 3.txt
If we unpack without wrapping in an extra directory, we'll end up with a total mess from which it will be very hard to recover. So we absolutely must wrap archives of the second kind in an extra directory.
But doing it every time is also annoying, as half the archives will be double-wrapped now. It would still be a better default, as un-double-wrapping is at least relatively easy.
And if that wasn't enough, often the directory name is already used. Like if you do unall backups*.7z, every archive might contain the same production/db.sql etc. We don't want to mix them together, or even worse ask the user what to do.
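The wrap-or-not decision boils down to looking at the top-level names in the archive listing. Here's a minimal sketch of that idea; top_level_names, needs_wrapper? and the existing parameter (names already present in the target directory) are hypothetical, chosen so the example doesn't touch the filesystem.

```ruby
# Top-level names in an archive listing, e.g. "foo/1.txt" -> "foo".
def top_level_names(paths)
  paths.map { |path| path.sub(%r{/.*}m, "") }.reject(&:empty?).uniq
end

# True when unpacking in place would scatter loose files around,
# or would collide with something that is already there.
def needs_wrapper?(paths, existing = [])
  names = top_level_names(paths)
  return true if names.size != 1
  existing.include?(names.first)
end
```

So needs_wrapper?(%w[foo/1.txt foo/2.txt]) is false, needs_wrapper?(%w[1.txt 2.txt]) is true, and an archive holding production/db.sql needs a wrapper as soon as a production entry already exists.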
unall handles all that correctly. If the archive contains one file, or is already wrapped, and that target doesn't conflict with any existing name, it does direct unpacking. Otherwise, it finds the most obvious free name (archive name without extension, then with -1, -2 etc. appended), and unpacks there.
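The name search is just a counter loop. A minimal sketch, where free_name and its dir parameter are names invented for this example:

```ruby
# First unused name among "base", "base-1", "base-2", ... inside dir.
def free_name(base, dir = ".")
  candidate = base
  counter = 1
  while File.exist?(File.join(dir, candidate))
    candidate = "#{base}-#{counter}"
    counter += 1
  end
  candidate
end
```

For backups.7z the base would be backups, so repeated runs in the same directory would end up in backups, backups-1, backups-2 and so on.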
And finally, unall moves the original archive to trash, with the trash command that works appropriately on every operating system. If there were any problems with a specific archive, it will not be moved to trash, so even if you're unalling a hundred archives, and one in the middle has issues, you'll know the one that remains is the problematic one.
It then prints a summary with any errors encountered. This mostly matters if you're unpacking a lot of archives, as unarchivers tend to be very verbose, and otherwise you'd miss errors from an earlier archive due to all the text from later archives.
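Both the summary and the "only trash what worked" rule fall out of grouping results by status. A minimal sketch, assuming results arrive as [path, status] pairs; summarize and trashable are hypothetical helpers, not functions from the script:

```ruby
# results: array of [path, status] pairs, in processing order.
# Returns one summary line per status, e.g. "OK a.zip c.7z".
def summarize(results)
  results.group_by { |_, status| status }
         .map { |status, pairs| [status, *pairs.map(&:first)].join(" ") }
end

# Paths that are safe to move to trash: only the ones that unpacked fine,
# and none at all if the user asked to keep originals.
def trashable(results, keep: false)
  return [] if keep
  results.select { |_, status| status == "OK" }.map(&:first)
end
```

Anything with a status other than "OK" never shows up in trashable, which is why the problematic archive is the one left sitting in the directory.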
How to use unall
It couldn't be simpler:
- unall archive.zip - unpack one
- unall archive* - unpack any number of them
- unall -k archive.7z - unpack one, don't move packaged file to trash
- unall -d archive.7z - force wrapping directory
That's all. It just works, every time. It's the simplest unarchiving has ever been.
Code
It really isn't much code:
#!/usr/bin/env ruby

require "fileutils"
require "shellwords"
require "optimist"

class UnarchiveFile
  Formats = {
    :rar => %w[.rar .cbr],
    :"7z" => %w[.7z .zip .cbz .jar .civ5mod],
    :tgz => %w[.tgz .tar.gz .gem],
    :tbz2 => %w[.tbz2 .tar.bz2],
    :tar => %w[.tar],
    :txz => %w[.tar.xz],
    :single_file => %w[.gz .bz2 .xz],
  }

  def formats
    @formats ||= Formats.map{|fmt, exts| exts.map{|ext| [fmt, ext]}}.flatten(1)
  end

  def initialize(path, force_separate_dir)
    @path = File.expand_path(path)
    @force_separate_dir = force_separate_dir
  end

  def mime_type
    `file -b --mime-type #{@path.shellescape}`.chomp
  end

  def basename
    File.basename(@path)
  end

  def call
    return "Looking like multipart, skipping" if @path =~ /part/i
    fmt_ext = detect_format or return "Not supported"
    fmt, ext = fmt_ext
    if needs_directory?(fmt)
      dnx = create_directory(basename[0...-ext.size])
      Dir.chdir(dnx){ send("unpack_#{fmt}") ? "OK" : "FAIL" }
    else
      send("unpack_#{fmt}") ? "OK" : "FAIL"
    end
  end

  def create_directory(dn)
    counter = 1
    dnx = dn
    while File.exist?(dnx)
      dnx = "#{dn}-#{counter}"
      counter += 1
    end
    FileUtils.mkdir_p dnx
    return dnx
  end

  def needs_directory?(fmt)
    return true if @force_separate_dir
    prefixes = send("files_#{fmt}").map{|f| f.sub(/\/.*/, "")}.uniq.select{|f| f != ""}
    return true if prefixes.size > 1
    return true if File.exist?(prefixes[0])
    false
  end

  def detect_format
    formats.each do |fmt, ext|
      if basename.downcase[-ext.size..-1] == ext
        return [fmt, ext]
      end
    end
    if mime_type == "application/zip"
      return [:"7z", File.extname(@path)]
    end
    return nil
  end

  def files_rar
    `unrar vb #{@path.shellescape}`.split("\n")
  end

  def files_7z
    # First is archive name
    `7za l -slt #{@path.shellescape}`.scan(/^Path = (.*)/).flatten[1..-1]
  end

  def files_tgz
    `tar -tzf #{@path.shellescape}`.split("\n")
  end

  def files_tbz2
    `tar -tjf #{@path.shellescape}`.split("\n")
  end

  def files_tar
    `tar -tf #{@path.shellescape}`.split("\n")
  end

  def files_txz
    `tar -tf #{@path.shellescape}`.split("\n")
  end

  def files_single_file
    [File.basename(@path, File.extname(@path))]
  end

  def unpack_rar
    system "unrar", "x", @path
  end

  def unpack_7z
    system "7za", "x", @path
  end

  def unpack_tgz
    system "tar", "-xzf", @path
  end

  def unpack_tbz2
    system "tar", "-xjf", @path
  end

  def unpack_txz
    system "tar", "-xf", @path
  end

  def unpack_tar
    system "tar", "-xf", @path
  end

  def unpack_single_file
    system "7za", "x", @path
  end
end

class UnarchiveCommand
  def initialize
    @opts = Optimist::options do
      opt :keep, "Keep original archive even if unpacking was successful"
      opt :dir, "Force unpacking into new directory even when all files are in one directory already"
    end

    if ARGV.empty?
      STDERR.puts "Usage:\n #{$0} [--keep] [--dir] archive1.zip archive2.rar archive3.7z"
      exit 1
    end

    @paths = ARGV
  end

  def call
    statuses = Hash.new{|ht,k| ht[k] = []}
    @paths.each do |path|
      ua = UnarchiveFile.new(path, @opts[:dir])
      status = ua.call
      statuses[status] << path
    end
    statuses.each do |status, files|
      puts [status, *files].join(" ")
      system "trash", *files if status == "OK" and not @opts[:keep]
    end
  end
end

UnarchiveCommand.new.call
Dependencies
You need to install the appropriate format handlers; on OSX that would be at minimum brew install p7zip to handle the most common ones. You'll also need gem install optimist. Other than that, it's just one file you can get from my unix-utilities repo.
Issues
The most common problem I have is on Windows with cygwin, as the format handlers it uses tend to not set the +x bit, so packaged Windows programs won't run without chmod -R +xing them. Arguably unall could handle that automatically as well. But I don't think many people even use cygwin anymore.
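If unall ever did that, it could look roughly like this sketch. It's not part of the script; add_execute_bits is a hypothetical helper, and unlike chmod -R +x it only touches regular files and ignores the umask.

```ruby
require "find"

# Adds execute permission for user, group and others
# to every regular file under dir, keeping the other permission bits.
def add_execute_bits(dir)
  Find.find(dir) do |path|
    next unless File.file?(path)
    mode = File.stat(path).mode & 0o7777
    File.chmod(mode | 0o111, path)
  end
end
```

It would make sense to call something like this on the unpacked directory only, and probably only when running under cygwin.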
Should you use unall?
I have no idea how people handle not having unall. Like who has time for doing all this manually?
Coming next
That's enough showcasing for now. Over the next few episodes we'll take a look at a few interesting technologies that didn't quite fit in my previous two series.