Compression and Archival in Linux

laxmanvijay

laxman

Posted on December 3, 2020

File archiving is used when one or more files need to be stored or transmitted as efficiently as possible. Linux supports many archival and compression tools. This article describes the most popular ones.

Before looking into those, here's a quick definition of compression and archival:

Compression: Makes the files smaller by removing redundant information.

Archival: Combines multiple files into one, which makes them easier to transmit and avoids the overhead of handling each file separately.

Simply put, compression reduces size and archival combines files.

Compression algorithms:

  • gzip
  • bzip2
  • xz

gzip:

gzip (GNU zip) is a compression tool. It uses the DEFLATE algorithm, a combination of LZ77 and Huffman coding. It is quite fast, but the resulting files are usually larger than those produced by bzip2 or xz.

gzip {filename}

The original file is deleted and replaced by the compressed file, which has .gz appended to its name.

Decompression of gzipped files:

Decompression is done using the gunzip command.

gunzip {gzipped filename}
  • Passing the -l flag (for example, gzip -l {gzipped filename}) shows the compressed size, uncompressed size and compression ratio without compressing or decompressing anything.
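The gzip steps above can be tried end to end with a small sketch like the one below. It works in a scratch directory so nothing important is touched; the file names notes.txt and notes.orig are made up for illustration.

```shell
#!/bin/sh
# Sketch of a gzip round trip. All file names here are hypothetical.
workdir=$(mktemp -d)
printf 'hello hello hello hello\n' > "$workdir/notes.txt"
cp "$workdir/notes.txt" "$workdir/notes.orig"   # reference copy for comparison

gzip "$workdir/notes.txt"          # notes.txt is replaced by notes.txt.gz
ls "$workdir"                      # shows notes.orig and notes.txt.gz only

gzip -l "$workdir/notes.txt.gz"    # prints sizes and ratio, changes nothing

gunzip "$workdir/notes.txt.gz"     # notes.txt is back, the .gz file is gone

# cmp -s is silent and succeeds only if both files are byte-identical.
if cmp -s "$workdir/notes.txt" "$workdir/notes.orig"; then
    echo "round trip ok, files left in $workdir"
else
    echo "round trip changed the file"
fi
```

For a file this small the .gz file can end up larger than the original, because the gzip header and trailer cost more than the compression saves.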

bzip2:

bzip2 uses a different compression algorithm based on Burrows-Wheeler block sorting. It usually produces smaller files than gzip, at the expense of more CPU time.

bzip2 {filename}

Decompression of bzip2'ed files:

Decompression is done using the bunzip2 command.

bunzip2 {bzip2'ed filename}

xz:

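xz uses the LZMA2 algorithm. It usually produces smaller files than both gzip and bzip2. Compression is slower than gzip and needs more memory, especially at higher compression levels, but decompression is typically faster than with bzip2.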

xz {filename}

Decompression of xzed files:

Decompression is done using the unxz command.

unxz {xzed filename}
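To see how the three tools compare on your own data, a rough sketch like the following compresses the same file with each of them and prints the resulting sizes. The sample file and its contents are invented for the example, and the results depend heavily on the input, so treat the numbers as a quick impression rather than a benchmark. Tools that are not installed are skipped.

```shell
#!/bin/sh
# Sketch: compare output sizes of gzip, bzip2 and xz on one sample file.
# The sample data and file names are hypothetical.
workdir=$(mktemp -d)

# Build a repetitive text file so there is something to compress.
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > "$workdir/sample.txt"

# Print a file's size in bytes (tr strips padding some wc versions add).
size() { wc -c < "$1" | tr -d ' '; }

printf '%-8s %s bytes\n' "original" "$(size "$workdir/sample.txt")"

for tool in gzip bzip2 xz; do
    if command -v "$tool" >/dev/null 2>&1; then
        # -c writes the compressed data to stdout, so sample.txt is kept.
        "$tool" -c "$workdir/sample.txt" > "$workdir/sample.$tool"
        printf '%-8s %s bytes\n' "$tool" "$(size "$workdir/sample.$tool")"
    else
        printf '%-8s not installed, skipped\n' "$tool"
    fi
done
```

The -c flag is accepted by all three tools and is a convenient way to keep the original file, since by default each tool deletes it after compressing.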

Archival:

  • tar
  • zip

tar:

tar is short for Tape ARchive. The tar command takes several files and creates a single output file that can later be split back into the original files. A tar archive is often called a tarball.

tar {options} -f {archive filename} {files to archive}

The tar command has three commonly used modes (pass the appropriate flag to select the mode):

  • Create: Make a new archive out of a series of files. (-c)
  • Extract: Extract files out of an archive. (-x)
  • List: Show the contents without extracting. (-t)

Tar can also compress the resulting archive using the compression tools described above.

Pass one of the following flags to select the compression algorithm:

  • gzip (-z)
  • bzip2 (-j)
  • xz (-J)

The -v flag produces verbose output, printing the name of each file as it is processed.

Example:

tar -cvJf backup.tar.xz projects/

This creates a tarball of the projects folder, compressed with xz, named backup.tar.xz.

The extension can be anything, but the following names are conventional:

  • for gzip: .tar.gz, or the short forms .tgz and .taz
  • for bzip2: .tar.bz2, or the short forms .tb2, .tbz, .tbz2 and .tz2
  • for xz: .tar.xz, or the short form .txz

Listing:

You can list the contents of an archive without extracting it by using the -t flag.

Example:

tar -tvf backup.tar.xz

This command lists the contents of backup.tar.xz.
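Creating and listing can be tried together with a small sketch like this one. The projects tree and its files are made up for the example, and gzip (-z) is used instead of xz (-J) only so that the sketch also runs on systems where xz is not installed; swapping -z for -J and .gz for .xz gives the xz variant used in this article.

```shell
#!/bin/sh
# Sketch: create a compressed tarball, then list it without extracting.
# Directory and file names are hypothetical.
workdir=$(mktemp -d)
mkdir -p "$workdir/projects/project1" "$workdir/projects/project2"
echo 'one' > "$workdir/projects/project1/a.txt"
echo 'two' > "$workdir/projects/project2/b.txt"

# Run tar from inside $workdir so the stored paths start with "projects/".
# c = create, z = gzip, f = archive file name follows.
(cd "$workdir" && tar -czf backup.tar.gz projects)

# t = list; the archive is read but nothing is written to disk.
listing=$(tar -tzf "$workdir/backup.tar.gz")
echo "$listing"
echo "archive left in $workdir"
```

The paths printed by -t are exactly the names stored in the archive, which is what tar expects when you later ask it to extract a single file or folder.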

Unarchival:

Unarchival is done by passing the -x flag.

Example:

tar -xvJf backup.tar.xz

This command extracts the archive into the current directory. To extract somewhere else, pass the -C flag, which makes tar change to the given directory before extracting. The directory must already exist.

tar -C backups -xvJf backup.tar.xz

This will extract into the backups folder.

To extract a specific file or folder from the archive, provide its path as it is stored in the archive (the path shown by -t).

tar -xvJf backup.tar.xz projects/project1

This will extract only projects/project1 from the archive.

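It is important to note that the -f flag takes the archive filename as its argument, so the filename must come directly after it. When flags are bundled, as in -xvJf, f has to be the last letter.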
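The two extraction variants described above, extracting everything into another directory and extracting a single folder, can be sketched as follows. The directory names all and one are invented for the example, and gzip (-z) again stands in for xz (-J) so that the sketch does not depend on xz being installed.

```shell
#!/bin/sh
# Sketch: extract a whole archive, then only one folder from it.
# Directory and file names are hypothetical.
workdir=$(mktemp -d)
mkdir -p "$workdir/projects/project1" "$workdir/projects/project2"
echo 'one' > "$workdir/projects/project1/a.txt"
echo 'two' > "$workdir/projects/project2/b.txt"
(cd "$workdir" && tar -czf backup.tar.gz projects)

# -C needs existing target directories, so create them first.
mkdir "$workdir/all" "$workdir/one"

# Extract everything into all/.
tar -C "$workdir/all" -xzf "$workdir/backup.tar.gz"

# Extract only projects/project1 (and what is below it) into one/.
tar -C "$workdir/one" -xzf "$workdir/backup.tar.gz" projects/project1

ls -R "$workdir/all" "$workdir/one"
```

Note that f is the last letter in -xzf in both commands, so the archive name follows it directly.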

Zip:

Zip is both an archival and a compression mechanism. The compression is lossless, and the default compression algorithm is DEFLATE. Zip files can be opened with built-in tools on Windows and macOS, so zip is often preferred over tar when an archive has to be shared with users of those systems.

zip {options} {output filename} {files to compress}

By default, zip does not descend into folders. To include the files and subfolders inside a folder, use the -r flag.

Example:

zip files.zip temp*

This will add all files whose names start with temp to files.zip.

zip -r projects.zip projects

This will recursively compress all files inside the projects folder.

Unzip:

The unzip command is used to extract the contents of a zip archive.

Example:

unzip projects.zip

To just list without extracting,

unzip -l projects.zip

To extract specific files or folders from the zip, pass their paths as stored in the archive. For a folder, quote a pattern ending in /* so that everything beneath it is matched,

unzip projects.zip 'projects/project1/*'

This is similar to tar.
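The zip and unzip commands above can be combined into one small sketch. The projects tree and the out directory are made up for the example. The -q flag (quiet) and unzip's -d flag (directory to extract into) are not covered in this article but are standard options of the Info-ZIP tools. Since zip and unzip are not installed on every system, the sketch checks for them first.

```shell
#!/bin/sh
# Sketch: zip a folder recursively, list it, extract one subfolder.
# Directory and file names are hypothetical.
workdir=$(mktemp -d)
mkdir -p "$workdir/projects/project1" "$workdir/projects/project2" "$workdir/out"
echo 'one' > "$workdir/projects/project1/a.txt"
echo 'two' > "$workdir/projects/project2/b.txt"

if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
    # -r descends into projects/, -q suppresses the per-file output.
    (cd "$workdir" && zip -rq projects.zip projects)

    # List the archive contents without extracting.
    unzip -l "$workdir/projects.zip"

    # Extract only what is under projects/project1 into out/.
    # The pattern is quoted so the shell does not expand the *.
    unzip -q "$workdir/projects.zip" 'projects/project1/*' -d "$workdir/out"
    ls -R "$workdir/out"
else
    echo "zip and/or unzip not installed, skipping"
fi
```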

Thanks for reading :)
