Open Source Adventures: Episode 17: Universal Command Line Unpacker unall

taw

Tomasz Wegrzanowski

Posted on March 16, 2022

Another small showcase. unall is a command line tool I use a lot, and I'm truly baffled that nothing like that exists, on any operating system, command line or GUI or whatever.

Unall is a universal unpacker. You download some archive, you do unall whatever.7z, and it unpacks it and, if that worked correctly, moves the original to trash.

It's available in my unix-utilities repo, and I think it's the most useful tool in the whole repo.

It sounds stupidly simple, but there's really nothing like that.

Why All Other Unpacking Tools Are Bad

The first problem is formats. There's a stupid number of them, and it's very common for the file extension to not correspond to the format - for example .jar is really a .zip archive as far as unpacking is concerned, as are a million other extensions. So step one of unpacking is figuring out the format.
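To illustrate what "figuring out the format" can look like when the extension is no help, here's a minimal sketch that checks the first four bytes of a file for a zip signature. This is not how unall does it (unall asks the file command for a MIME type), and zip_content? and ZIP_SIGNATURES are made-up names for this example. It also won't recognize things like self-extracting archives, where the zip data doesn't start at byte zero.

```ruby
# Hypothetical helper, not part of unall: sniff the first bytes of a file.
ZIP_SIGNATURES = [
  "PK\x03\x04".b, # regular archive (starts with a local file header)
  "PK\x05\x06".b, # empty archive (only an end-of-central-directory record)
].freeze

def zip_content?(path)
  # binread returns nil for an empty file, which simply fails the check
  head = File.binread(path, 4)
  ZIP_SIGNATURES.include?(head)
end
```

With something like this, zip_content?("whatever.jar") would not care what the extension says.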

There are also double-packed formats, like .tar.gz and .tar.bz2, which should be treated as if they were a single archive.
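One way to treat double-packed names as a single format is to try the longest known extension first, so .tar.gz wins over .gz. A minimal sketch, assuming a short illustrative extension list (KNOWN_EXTENSIONS and split_archive_name are hypothetical names, not taken from unall):

```ruby
# Illustrative list only; a real tool would know many more extensions.
KNOWN_EXTENSIONS = %w[.tar.gz .tar.bz2 .tar.xz .tgz .zip .7z .gz .bz2 .xz].freeze

# Returns [stem, extension], or nil if no known extension matches.
def split_archive_name(filename)
  name = File.basename(filename)
  # Longest extensions first, so ".tar.gz" is tried before ".gz"
  ext = KNOWN_EXTENSIONS
    .sort_by { |e| -e.size }
    .find { |e| name.downcase.end_with?(e) }
  ext && [name[0...-ext.size], name[-ext.size..-1]]
end
```

So backup.tar.gz splits into backup and .tar.gz, rather than backup.tar and .gz, and the stem is what you'd use as a directory name.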

A small additional concern with format detection is multipart archives, like foo.rar, foo.r00, foo.r01 etc., which unall tries to detect and skip, but multipart archives are very rare these days, so I can't say how good it even is.
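For a sense of what multipart detection by name could look like, here's a minimal sketch based on common volume naming schemes. It is an assumption-laden illustration, not the check unall performs, and multipart_name? and MULTIPART_PATTERNS are invented names. Note that with the old RAR scheme the first volume is just foo.rar, so it can't be told apart from a regular archive by name alone.

```ruby
# Hypothetical patterns for common multi-volume naming schemes.
MULTIPART_PATTERNS = [
  /\.r\d{2}\z/i,          # foo.r00, foo.r01, ...
  /\.part\d+\.rar\z/i,    # foo.part1.rar, foo.part2.rar, ...
  /\.(7z|zip)\.\d{3}\z/i, # foo.7z.001, foo.zip.002, ...
].freeze

def multipart_name?(filename)
  # Only look at the file name, so directory names can't trigger a match
  name = File.basename(filename)
  MULTIPART_PATTERNS.any? { |re| name.match?(re) }
end
```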

Once we've identified the format, the next problem is remembering command line options for that format's unpacker, which are of course all completely different.

Anyway, there are tools which can handle tasks up to this point.

Now comes the hardest part, where every other tool fails:

  • half the archives have everything in an extra directory, so foo/1.txt, foo/2.txt and foo/3.txt etc.
  • half the archives have loose files, like 1.txt, 2.txt, 3.txt

If we unpack without wrapping with an extra directory, we'll end up with a total mess from which it will be very hard to recover. So we absolutely must wrap archives of the second kind in an extra directory.

But doing it every time is also annoying, as half the archives will be double-wrapped now. It would still be a better default, as un-double-wrapping is at least relatively easy.

And if that wasn't enough, often the directory name is already used. Like if you do unall backups*.7z, every archive might be wrapped in the same production/db.sql etc. We don't want to mix them together, or even worse ask the user what to do.

unall handles all that correctly. If the archive contains one file, or is already wrapped, and that target doesn't conflict with any existing name, it does direct unpacking. Otherwise, it finds the most obvious free name (archive name without extension, then with -1, -2 etc. appended), and unpacks there.
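The decision described here can be sketched as a few pure functions working on a list of archive entries. This is a simplified illustration with hypothetical names (top_level_names, needs_wrapper?, free_name); it takes a callable for the "does this name already exist?" question so it can be tried without touching the filesystem, and unlike the real script it treats an empty listing as needing a wrapper.

```ruby
# Top-level names in an archive listing, e.g. "foo/1.txt" -> "foo".
def top_level_names(entries)
  entries.map { |e| e.sub(%r{/.*}, "") }.reject(&:empty?).uniq
end

# Wrap unless there is exactly one top-level name and it is still free.
# taken: a callable answering "does this name already exist?"
def needs_wrapper?(entries, taken)
  roots = top_level_names(entries)
  roots.size != 1 || taken.call(roots.first)
end

# First free name among base, base-1, base-2, ...
def free_name(base, taken)
  candidate = base
  counter = 0
  while taken.call(candidate)
    counter += 1
    candidate = "#{base}-#{counter}"
  end
  candidate
end

existing = ["production", "backups", "backups-1"]
taken = ->(name) { existing.include?(name) }

needs_wrapper?(["foo/1.txt", "foo/2.txt"], taken) # => false, already wrapped
needs_wrapper?(["1.txt", "2.txt"], taken)         # => true, loose files
needs_wrapper?(["production/db.sql"], taken)      # => true, name conflict
free_name("backups", taken)                       # => "backups-2"
```

Against a real directory you'd pass something like File.method(:exist?) as taken.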

And finally, unall moves the original archive to trash, with the trash command that works appropriately on every operating system. If there were any problems with a specific archive, it will not be moved to trash, so even if you're unalling a hundred archives, and one in the middle has issues, you'll know the one that remains is the problematic one.

And it then prints a summary with any errors encountered. It mostly matters if you're unpacking a lot of archives, as unarchivers tend to be very verbose, and otherwise you'd miss errors from an earlier archive due to all the text from later archives.
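The summary idea boils down to collecting one status per archive and printing them grouped at the very end, after all the unarchiver noise. A minimal sketch, assuming results arrive as [path, status] pairs (summarize is a hypothetical name for this example):

```ruby
# results: array of [path, status] pairs, in processing order.
# Returns one summary line per distinct status.
def summarize(results)
  results
    .group_by { |_path, status| status }
    .map { |status, pairs| [status, *pairs.map(&:first)].join(" ") }
end

puts summarize([["a.zip", "OK"], ["b.rar", "FAIL"], ["c.7z", "OK"]])
# OK a.zip c.7z
# FAIL b.rar
```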

How to use unall

It couldn't be simpler:

  • unall archive.zip - unpack one
  • unall archive* - unpack any number of them
  • unall -k archive.7z - unpack one, don't move packaged file to trash
  • unall -d archive.7z - force wrapping directory

That's all. It just works, every time. It's the simplest unarchiving has ever been.

Code

It really isn't much code:

#!/usr/bin/env ruby

require "fileutils"
require "shellwords"
require "optimist"

class UnarchiveFile
  Formats = {
    :rar  => %w[.rar .cbr],
    :"7z" => %w[.7z .zip .cbz .jar .civ5mod],
    :tgz  => %w[.tgz .tar.gz .gem],
    :tbz2 => %w[.tbz2 .tar.bz2],
    :tar  => %w[.tar],
    :txz  => %w[.tar.xz],
    :single_file => %w[.gz .bz2 .xz],
  }

  def formats
    @formats ||= Formats.map{|fmt, exts| exts.map{|ext| [fmt, ext]}}.flatten(1)
  end

  def initialize(path, force_separate_dir)
    @path = File.expand_path(path)
    @force_separate_dir = force_separate_dir
  end

  def mime_type
    `file -b --mime-type #{@path.shellescape}`.chomp
  end

  def basename
    File.basename(@path)
  end

  def call
    return "Looking like multipart, skipping" if @path =~ /part/i
    fmt_ext = detect_format or return "Not supported"
    fmt, ext = fmt_ext
    if needs_directory?(fmt)
      dnx = create_directory(basename[0...-ext.size])
      Dir.chdir(dnx){ send("unpack_#{fmt}") ? "OK" : "FAIL" }
    else
      send("unpack_#{fmt}") ? "OK" : "FAIL"
    end
  end

  def create_directory(dn)
    counter = 1
    dnx = dn
    while File.exist?(dnx)
      dnx = "#{dn}-#{counter}"
      counter += 1
    end
    FileUtils.mkdir_p dnx
    return dnx
  end

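  def needs_directory?(fmt)
    return true if @force_separate_dir
    # Top-level names in the archive listing ("foo/1.txt" -> "foo")
    prefixes = send("files_#{fmt}").map{|f| f.sub(/\/.*/, "")}.uniq.select{|f| f != ""}
    # Empty listing: wrap, and avoid File.exist?(nil) raising TypeError
    return true if prefixes.empty?
    return true if prefixes.size > 1
    return true if File.exist?(prefixes[0])
    false
  end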

  def detect_format
    formats.each do |fmt, ext|
      if basename.downcase[-ext.size..-1] == ext
        return [fmt, ext]
      end
    end
    if mime_type == "application/zip"
      return [:"7z", File.extname(@path)]
    end
    return nil
  end

  def files_rar
    `unrar vb #{@path.shellescape}`.split("\n")
  end
  def files_7z
    # First is archive name
    `7za l -slt #{@path.shellescape}`.scan(/^Path = (.*)/).flatten[1..-1]
  end
  def files_tgz
    `tar -tzf #{@path.shellescape}`.split("\n")
  end
  def files_tbz2
    `tar -tjf #{@path.shellescape}`.split("\n")
  end
  def files_tar
    `tar -tf #{@path.shellescape}`.split("\n")
  end
  def files_txz
    `tar -tf #{@path.shellescape}`.split("\n")
  end
  def files_single_file
    [File.basename(@path, File.extname(@path))]
  end

  def unpack_rar
    system "unrar", "x", @path
  end
  def unpack_7z
    system "7za", "x", @path
  end
  def unpack_tgz
    system "tar", "-xzf", @path
  end
  def unpack_tbz2
    system "tar", "-xjf", @path
  end
  def unpack_txz
    system "tar", "-xf", @path
  end
  def unpack_tar
    system "tar", "-xf", @path
  end
  def unpack_single_file
    system "7za", "x", @path
  end
end

class UnarchiveCommand
  def initialize
    @opts = Optimist::options do
      opt :keep, "Keep original archive even if unpacking was successful"
      opt :dir, "Force unpacking into new directory even when all files are in one directory already"
    end

    if ARGV.empty?
      STDERR.puts "Usage:\n  #{$0} [--keep] [--dir] archive1.zip archive2.rar archive3.7z"
      exit 1
    end
    @paths = ARGV
  end

  def call
    statuses = Hash.new{|ht,k| ht[k] = []}

    @paths.each do |path|
      ua = UnarchiveFile.new(path, @opts[:dir])
      status = ua.call
      statuses[status] << path
    end

    statuses.each do |status, files|
      puts [status, *files].join(" ")
      system "trash", *files if status == "OK" and not @opts[:keep]
    end
  end
end

UnarchiveCommand.new.call

Dependencies

You need to install the appropriate format handlers. On OSX that would be at minimum brew install p7zip to handle the most common ones. You'll also need gem install optimist. Other than that, it's just one file you can get from my unix-utilities repo.

Issues

The most common problem I have is on Windows with cygwin: the format handlers it uses tend to not set the +x bit, so packaged Windows programs won't run until you chmod -R +x them. Arguably unall could handle that automatically as well. But I don't think many people even use cygwin anymore.
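If unall were to handle this, one possible shape would be a post-unpack step that mirrors chmod -R +x on the unpacked directory. This is only a sketch of that idea, not something unall does today, and restore_executable_bits is a made-up name:

```ruby
require "fileutils"

# Hypothetical post-unpack step, not part of unall.
# Adds the executable bit for everyone, recursively, like `chmod -R a+x`.
def restore_executable_bits(dir)
  FileUtils.chmod_R("a+x", dir)
end
```

It's as blunt as the manual command, marking every file executable, so a real version might want to limit itself to cygwin and to files like .exe or .bat.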

Should you use unall?

I have no idea how people handle not having unall. Like who has time for doing all this manually?

Coming next

That's enough showcasing for now. Over the next few episodes we'll take a look at a few interesting technologies that didn't quite fit in my previous two series.
