
./spacesaver df

About

The ./spacesaver executable is composed of several inter-related sub commands. Please see ./spacesaver -h for all available options. This part of the documentation describes the options, concepts, and output of the ./spacesaver df sub command in more detail.

./spacesaver df reports duplicated disk usage. Its output is similar to that of the Unix df -h command. Internally, this command calls the ./spacesaver ls command to determine the extent of duplication in a given directory. It also accepts standard input, so the output of the ls sub command can be piped into the df command.

A duplication rate is calculated for each provided path to assess the amount of redundant data in that location. A weighted score, ranging from 0 to 100, is also assigned to each provided path; the higher the score, the better!

[Figure: Scoring system]

./spacesaver df only has one required input, a path or set of paths.

Synopsis

$ spacesaver df [-h] DIRECTORY [DIRECTORY ...]

The synopsis for each command shows its parameters and their usage. Optional parameters are shown in square brackets.

The df sub command takes one or more directories as input. For each directory, its disk space usage is reported and assessed. This command can be used to summarize the output from the ls sub command into a few easy-to-interpret metrics. These metrics can be used to identify directories that may be candidates for deduplication or for archiving in deeper storage. Please note that symlinks and repeated occurrences of hard links are skipped, as these files do not take up any appreciable disk space. Only one instance of a set of hard links pointing to the same inode is reported.
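
To illustrate why repeated hard links are skipped, here is a minimal sketch that uses only standard tools and does not call spacesaver. The temporary directory and file names are made up for illustration, and this is not necessarily how spacesaver detects links internally. It shows that two hard-linked names share one inode, while a symlink is not a regular file at all.

```shell
# Throwaway directory with hypothetical file names.
demo="$(mktemp -d)"
printf 'some data\n' > "$demo/original.txt"
ln "$demo/original.txt" "$demo/hardlink.txt"     # second name for the same inode
ln -s "$demo/original.txt" "$demo/symlink.txt"   # symlink, skipped by -type f

# List the inode number of every regular file, then count the distinct inodes.
distinct_inodes="$(find "$demo" -type f -exec ls -i {} + \
  | awk '{ print $1 }' | sort -u | grep -c '')"

echo "regular file names: $(find "$demo" -type f | grep -c '')"
echo "distinct inodes:    $distinct_inodes"
```

Two names are found but only one inode backs them, so counting both would overstate the disk usage.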

You can always use the -h option for information on a specific command.

Required Arguments

Each of the following arguments is required. Failure to provide a required argument will result in a non-zero exit code.

DIRECTORY [DIRECTORY ...]

Input directories in which to find duplicates.
type: path

One or more directories can be provided as positional arguments. On the command line, each directory should be separated by a space. Globbing is supported, which makes selecting multiple paths easier.

Example: /data/CCBR/rawdata/ccbr123/
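
As a rough sketch of how globbing selects several directories at once, the snippet below builds a throwaway tree and prints what an unquoted pattern expands to. It assumes the expansion is done by the shell before the command starts, as is usual for unquoted patterns. The directory names are hypothetical, and spacesaver itself is not called.

```shell
# Throwaway tree with made-up project directories.
demo="$(mktemp -d)"
mkdir -p "$demo/ccbr123" "$demo/ccbr456" "$demo/other"

# The trailing slash restricts the match to directories.
# printf prints one expanded argument per line, which mirrors the list of
# DIRECTORY arguments a command would receive.
matched="$(printf '%s\n' "$demo"/ccbr*/)"
printf '%s\n' "$matched"

# The equivalent call (not executed here) would be:
#   ./spacesaver df "$demo"/ccbr*/
```

Only the two ccbr directories are printed; the directory named other is left out.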

Optional Arguments

Each of the following arguments is optional and does not need to be provided.

-h, --help

Display Help.
type: boolean

Shows the command's synopsis, help message, and an example command.

Example: --help

Output

The output of the df sub command is similar to that of the Unix df (display free disk space) command, with more information. Just like the output of the ls sub command, it is written to standard output.

Here is a description of each column's output:

Column  Name              Example Value
1       Path              /data/CCBR/rawdata/ccbr123/
2       FolderOwner       Guido van Rossum
3       FileCoOwners      finneyr[96.297%]|maggiec[3.703%]
4       Duplicated        177.316 GiB
5       Duplicated_Bytes  190391226983
6       Used              1.456 TiB
7       %Duplicated       0.0%
8       wAgeS             12.4
9       wDupS             0.0
10      wOccS             5.1
11      Score             82.5

Please note: The output is delimited by tabs (\t). %Duplicated is reported as Duplicated_Bytes/TotalBytes. A Score is also assigned to each path; the higher the score, the better. A Score of 0 would indicate that all the files are older than 2.7 years AND all the files are duplicates AND the files make up more than 10% of our shared group area. wAgeS, wDupS, and wOccS are the weighted individual components that make up the Score listed in column 11.
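
Because the output is tab-delimited, it can be filtered with standard tools. The following is a minimal sketch, not part of spacesaver: low_score_paths and sample_rows are hypothetical helper names, the sample rows are made up, and the column positions are assumed to match the list above (Path in column 1, Score in column 11). Lines whose 11th field is not numeric, such as a possible header line, are ignored.

```shell
# Print "Score<TAB>Path" for every row whose Score is below a threshold,
# lowest score first. Reads df-style, tab-delimited rows from standard input.
low_score_paths() {
  local threshold="$1"
  awk -F '\t' -v t="$threshold" '
    $11 ~ /^[0-9]+(\.[0-9]+)?$/ && $11 + 0 < t + 0 { print $11 "\t" $1 }
  ' | LC_ALL=C sort -n
}

# Made-up rows in the assumed 11-column layout (not real spacesaver output).
sample_rows() {
  printf '/data/projA/\towner1\tuser1[100.0%%]\t1.0 GiB\t1073741824\t4.0 GiB\t25.0%%\t10.0\t20.0\t5.0\t65.0\n'
  printf '/data/projB/\towner2\tuser2[100.0%%]\t0.0 GiB\t0\t2.0 GiB\t0.0%%\t0.0\t0.0\t1.5\t98.5\n'
  printf '/data/projC/\towner3\tuser3[100.0%%]\t3.0 GiB\t3221225472\t4.0 GiB\t75.0%%\t30.0\t25.0\t10.0\t35.0\n'
}

# Show the sample paths scoring below 70.
sample_rows | low_score_paths 70
```

With real data, the same function could be fed directly, for example ./spacesaver df /data/CCBR/rawdata/ccbr123/ | low_score_paths 70, to surface the paths most in need of cleanup.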

Example

Please note that this sub command may take a while to run, depending on the sheer number and size of the files in a given subtree. As such, this command should not be run on the head node! Please allocate an interactive node before running this command, or submit it as a job via sbatch.

# Step 0.) Grab an interactive node
# Do not run this on the head node!
srun -N 1 -n 1 --time=12:00:00 -p interactive --mem=16gb  --cpus-per-task=4 --pty bash
module purge

# Option 1.) Report disk space usage of a directory
./spacesaver df /data/CCBR/rawdata/ccbr123/

# Option 2.) Read ls sub command output from standard input
./spacesaver ls /data/CCBR/rawdata/ccbr123/ > ccbr123_ls.tsv
cat ccbr123_ls.tsv | ./spacesaver df /data/CCBR/rawdata/ccbr123/
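
For the sbatch route mentioned above, a job script could look roughly like the sketch below. The job name, resource requests, and output file names are assumptions that mirror the interactive example, and your cluster may need a partition or account option as well. The snippet only writes the script and syntax-checks it; the submission itself is shown as a comment because it needs a Slurm cluster.

```shell
# Write a hypothetical job script to a temporary file.
job="$(mktemp)"
cat > "$job" <<'EOF'
#!/usr/bin/env bash
#SBATCH --job-name=spacesaver_df
#SBATCH --time=12:00:00
#SBATCH --mem=16g
#SBATCH --cpus-per-task=4
#SBATCH --output=spacesaver_df_%j.log
module purge
# Save the tab-delimited report to a file (name is an assumption).
./spacesaver df /data/CCBR/rawdata/ccbr123/ > ccbr123_df.tsv
EOF

# Check that the script parses before submitting it.
bash -n "$job" && echo "syntax ok: $job"

# Submit it (not executed here):
#   sbatch "$job"
```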

Last update: 2022-06-08