Bash command tips : tar

Introduction

(Copied from Wikipedia)

Initially developed to be written directly to sequential I/O devices for tape backup purposes, it is now commonly used to collect many files into one larger file for distribution or archiving, while preserving file system information such as user and group permissions, dates, and directory structures.
This page provides theoretical knowledge as well as examples for everyday use.

Command information

  • Purpose : create an archive, compressed or not
  • Required privileges  : user
  • Default path (RHEL-based distros) : /bin/tar
  • Used version in this post : tar (GNU tar) 1.15.1
  • Exit code : 0=Success, non-zero=Failure

 

 

1) Generalities

tar is not itself a compression program, but it is often used together with a compression program such as gzip (<file>.tar.gz or <file>.tgz) or bzip2 (<file>.tar.bz2, <file>.tbz or <file>.tb2).

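Keep in mind that tar records each file under the path you gave on the CLI, and that extracting an archive overwrites existing files with the same path WITHOUT WARNING.

e.g: if you create an archive of /home (tar cvzf home.tgz /home), GNU tar strips the leading “/” (it prints a warning) and stores the members as home/..., so they are extracted relative to the current directory. If the absolute names were kept at creation (-P option, or a tar implementation that does not strip them) and the archive is extracted with -P as well, the /home of the machine on which you untar this archive will be overwritten (if it exists).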

To gain more control (and safety), you may “cd” to the directory that contains the file(s) you want to archive; this way you minimize the impact of a possible mistake when extracting. In other words: avoid using an absolute path for archive creation.
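A minimal sketch of this advice, assuming GNU tar and a throwaway directory created with mktemp (the names work, project, abs.tgz and rel.tgz are made up for the illustration). Listing each archive shows which paths would be recreated at extraction time.

```shell
#!/bin/sh
# Throwaway sandbox; every name below is hypothetical.
work=$(mktemp -d)
mkdir -p "$work/project"
echo "hello" > "$work/project/file1"

# Archive created from an absolute path: GNU tar strips the leading "/"
# and prints a warning on stderr (silenced here).
tar czf "$work/abs.tgz" "$work/project" 2>/dev/null

# Archive created from inside the parent directory: only the relative
# path "project/..." is stored.
(cd "$work" && tar czf rel.tgz project)

# Compare the member names stored in each archive.
tar tzf "$work/abs.tgz"
tar tzf "$work/rel.tgz"
```

The second archive can be extracted anywhere and only ever creates a project/ directory under the current directory.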

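You can use globbing (pathname expansion, e.g. the wildcards “*” and “?”, see this page for more info and examples) with tar. When creating an archive, the shell expands an unquoted pattern before tar runs, so no special option is needed. When extracting or listing, quote the pattern and let tar match it against the names stored in the archive using this option:

--wildcards

See the related sections for more info (extracting using wildcards and creating using wildcards).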
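A small sketch of both cases, assuming GNU tar (which provides --wildcards) and a scratch directory with made-up file names. On creation the shell expands the unquoted pattern before tar runs; on extraction the quoted pattern is handed to tar and matched against member names.

```shell
#!/bin/sh
# Scratch directory with hypothetical files.
work=$(mktemp -d)
cd "$work" || exit 1
touch main.cpp util.cpp notes.txt app.log

# Creation: the shell expands *.cpp, so tar receives main.cpp util.cpp.
tar czf sources.tgz *.cpp

# A second archive holding everything, used to show pattern extraction.
tar czf all.tgz main.cpp util.cpp notes.txt app.log

# Extraction: the quoted pattern is matched by tar against member names.
mkdir out
tar xzf all.tgz -C out --wildcards '*.log'

tar tzf sources.tgz
ls out
```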

 

2) Syntax

2.1 tar command general syntax

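The general syntax of the tar command is as follows (with the “f” option, the archive name comes right after the options, followed by the files or directories to process).

tar <options> <archive_file> <file(s)_or_dir(s)>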

 

2.2 Classic use

I usually use gzip as the compression algorithm, which is why I only give gzip examples. If you want to use bzip2 instead, replace each occurrence of the letter “z” in the option field of the following examples with the letter “j”.

  • Create a gzipped archive
    tar cvzf <archive_name>.tar.gz </path/to/dir>
  • Create an archive based on pattern matching (using wildcards / shell globbing; the pattern is left unquoted so that the shell expands it)
    tar cvzf archive.tgz *.cpp
  • Extract a gzipped archive (to a given location using “-C” option)
    tar xvzf <archive_name>.tar.gz -C </path/to/destination>

    Note : you can always extract a given file or dir (or multiple files / dirs) contained in an archive by explicitly specifying its path inside the archive, written exactly as it appears when listing the archive. Put the “-C” option before the member names:

    tar xvzf <archive_name>.tar.gz -C </path/to/destination> path/within/archive/to/file path/within/archive/to/another/file
  • Extract only files matching a given pattern from an archive (using wildcards / shell globbing)
    tar xvzf <archive_name>.tar.gz --wildcards '*.log'
  • List content of a gzipped archive
    tar tvzf <archive.tar.gz>

    Note : When listing the content of an archive, the “-v” option formats the output in ls -l style. Without it, only the file names are shown, with no other information.

  • Create an archive with a chosen directory tree from anywhere!
    In other words, you do not have to go inside the parent directory to create your archive. Let’s say you want to archive a directory named ToBeArchived located inside your $HOME dir, you could do :

    cd /home/yourHome/ ; tar cvzf ToBeArchived.tar.gz ToBeArchived/
    ToBeArchived/
    ToBeArchived/file1
    ToBeArchived/file2
    [...]

    But you may also use the -C <dir> option, which tells tar to change directory to <dir> before adding the files that follow, which makes it look like :

    tar cvzf ToBeArchived.tar.gz -C /home/yourHome/ ToBeArchived/
    ToBeArchived/
    ToBeArchived/file1
    ToBeArchived/file2
    [...]
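The create / list / extract forms above can be tried end to end without touching real data. This is only a sketch, assuming GNU tar; the sandbox layout (home, restore) and file contents are invented for the demo.

```shell
#!/bin/sh
# Sandbox standing in for /home/yourHome (hypothetical layout).
work=$(mktemp -d)
mkdir -p "$work/home/ToBeArchived" "$work/restore"
echo "one" > "$work/home/ToBeArchived/file1"
echo "two" > "$work/home/ToBeArchived/file2"

# Create: -C switches to the parent directory first, so members are
# stored as ToBeArchived/... whatever the current directory is.
tar czf "$work/ToBeArchived.tar.gz" -C "$work/home" ToBeArchived

# List names only (add v for an ls -l style listing).
tar tzf "$work/ToBeArchived.tar.gz"

# Extract a single member into another directory; -C comes before
# the member name.
tar xzf "$work/ToBeArchived.tar.gz" -C "$work/restore" ToBeArchived/file1

ls "$work/restore/ToBeArchived"
```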

 

2.3 Remote copying with tar

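Compressing the data before sending it over the network reduces the amount of data transferred, which usually speeds up the copy on slow links (and it is classy, isn’t it?).

In the following, the “-” character given to the “f” option tells tar to write the archive to standard output (STDOUT) when creating, or to read it from standard input (STDIN) when extracting, so that it can go through a pipe.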

  • copy data to remote host using gzip compression only for the transfer (from a <dir> to <dir>)
    tar czf - /path/to/dir |ssh <user>@<remote_host> tar xvzf - -C /path/to/destination/dir

    Note : We omit the “v” on the first tar to avoid double output (once at archive creation, once when untarring on the remote host).

  • copy data to remote host and store it (on remote host) as a gzipped archive (from a <dir> to <archive.tgz>)
    tar cvzf - /path/to/dir |ssh <user>@<remote_host> 'cat > /path/to/destination/dir/<archive.tgz>'

    Note : Do not forget the single quotes surrounding the “cat” command launched on the remote host, otherwise the redirection (“>”) will be evaluated by your local shell, and the archive will be created on localhost.

  • copy data from remote host and use one of the previous forms to either create a gzipped archive (as shown in this example) or copy data to <dir>
    ssh <user>@<remote_host> 'tar cvzf - /path/to/dir' | cat > /path/to/destination/dir/<archive.tgz>
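The pipelines above need a reachable remote host, so here is a local stand-in that keeps the same shape. It is a sketch under one loud assumption: sh -c replaces ssh <user>@<remote_host>, which lets you watch the tar-to-tar pipe work on a single machine. The directory names are made up.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/dest"
echo "payload" > "$work/src/sub/data.txt"

# First form: the left tar writes a gzipped stream to stdout ("-"),
# the right tar reads it from stdin and unpacks it.
# "sh -c" stands in for "ssh <user>@<remote_host>" (assumption for this demo).
tar czf - -C "$work/src" . | sh -c "tar xzf - -C '$work/dest'"

# Second form: store the stream as an archive instead of unpacking it.
# The quoted command (including ">") is run by the "remote" shell.
tar czf - -C "$work/src" . | sh -c "cat > '$work/dest/copy.tgz'"

ls -R "$work/dest"
```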

 

2.4 Misc examples

  • You may need to add the date to your archive name; in this case use the following simple syntax
    date +%Y%m%d
  • included in the basic archive creation syntax
    tar cvzf archive$(date +%Y%m%d).tgz /path/to/dir
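A runnable variant of the date-stamped name, as a sketch: the stamp is computed once and stored in a variable (stamp and archive are names chosen for this example), so the script can refer to the same file name later even if it runs across midnight.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/data"
echo "backup me" > "$work/data/file1"

# Compute the stamp once and reuse it.
stamp=$(date +%Y%m%d)
archive="$work/archive$stamp.tgz"

tar czf "$archive" -C "$work" data
ls "$work"
```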

 

2.5 Extract a multi-part / split tar archive (archive.tar001, archive.tar002 etc)

In this case you first need to concatenate all the parts, in order, into one archive. This can be done using cat, as :

cat archive.tar001 archive.tar002 > archive_total.tar

Then you may just extract this one archive as usual :

tar xvf archive_total.tar
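To try the whole cycle on test data, the sketch below first produces the parts with split and then rejoins them. It assumes the standard split utility, whose default part names end in aa, ab, ... rather than 001, 002; the file names and the 4 KiB part size are arbitrary choices for the demo.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p data out
echo "part of a larger backup" > data/file1

# Build an uncompressed archive, then cut it into 4 KiB pieces named
# archive.tar.aa, archive.tar.ab, ...
tar cf archive.tar data
split -b 4096 archive.tar archive.tar.
rm archive.tar

# Reassemble the pieces in order (the glob sorts them), then extract.
cat archive.tar.?? > archive_total.tar
tar xvf archive_total.tar -C out
```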

 

3) Options summary

 

Option  Effect                                                                       Comments
c       create an archive
x       extract an archive
z       gzip option, to create or extract a gzipped archive
j       bzip2 option, to create or extract a bzipped archive                         A bzipped archive usually needs more CPU time and has a better compression ratio than a gzipped archive.
-       use stdout as the output if creating, or stdin as the input if extracting    Given as the file name to the f option
-C      change to the given directory before creating or extracting                  e.g. the /path/to/destination/dir when extracting
t       list the content of an archive
f       file mode, to create or extract an archive to or from the specified <file>
d       compare an archive to the file system
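The compare option is the only one in this summary without an example elsewhere on the page, so here is a short sketch of it, assuming GNU tar. It also shows the exit code convention (0 for success, non-zero otherwise). The file names and the variables same and changed are made up for the demo.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/data"
echo "v1" > "$work/data/file1"
tar cf "$work/snap.tar" -C "$work" data

# Compare the archive with the files on disk: nothing has changed yet.
tar df "$work/snap.tar" -C "$work"
same=$?
echo "unchanged: exit code $same"

# Change the file (different size), then compare again: tar reports
# the difference and returns a non-zero exit code.
echo "v2 with more bytes" > "$work/data/file1"
tar df "$work/snap.tar" -C "$work"
changed=$?
echo "modified: exit code $changed"
```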

 
