Jimmy Yeung
Posted on January 22, 2022
Scenario
I need to check, inside a CI pipeline, whether one file is a subset of another. I chose a bash script because it's performant and we don't need to install extra dependencies in the CI pipeline.
diff
The first command that came to mind was diff, which is a really powerful command for reporting the differences between two files. However, it is too powerful for this job: diff works out which lines need to be changed to make the two files identical, which is unnecessary for my use case. E.g. (example from GeeksforGeeks):
$ cat a.txt
Gujarat
Uttar Pradesh
Kolkata
Bihar
Jammu and Kashmir
$ cat b.txt
Tamil Nadu
Gujarat
Andhra Pradesh
Bihar
Uttar pradesh
$ diff a.txt b.txt
0a1
> Tamil Nadu
2,3c3
< Uttar Pradesh
< Kolkata
---
> Andhra Pradesh
5c5
< Jammu and Kashmir
---
> Uttar pradesh
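To see why diff answers a different question, here is a small sketch with two made-up files (the names subset.txt and superset.txt and the temporary directory are assumptions for illustration only) where the first really is a subset of the second. diff still reports a difference, because it exits with 0 only when the two files are identical.

```shell
# Hypothetical demo files in a throwaway directory.
demo_dir=$(mktemp -d)
printf 'Bihar\nGujarat\n' > "$demo_dir/subset.txt"
printf 'Bihar\nGujarat\nKolkata\n' > "$demo_dir/superset.txt"

# diff exits with 0 for identical files and 1 when they differ,
# so a strict subset is still reported as "different".
diff_output=$(diff "$demo_dir/subset.txt" "$demo_dir/superset.txt")
diff_status=$?

echo "diff exit status: $diff_status"
echo "$diff_output"

rm -rf "$demo_dir"
```

The exit status alone therefore cannot tell "subset" apart from "unrelated"; you would have to parse the edit script to find out.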
comm
Without digging further into diff, I found another command, comm, which is simple and fits my use case. comm compares two sorted files line by line and returns 3 columns:
- the first column contains lines present only in the 1st file
- the second column contains lines present only in the 2nd file
- the third column contains lines common to both files
E.g. (example from GeeksforGeeks):
# displaying contents of file1
$ cat file1.txt
Apaar
Ayush Rajput
Deepak
Hemant
# displaying contents of file2
$ cat file2.txt
Apaar
Hemant
Lucky
Pranjal Thakral
$ comm file1.txt file2.txt
                Apaar
Ayush Rajput
Deepak
                Hemant
        Lucky
        Pranjal Thakral
To check whether one file is a subset of another, we only need the 1st column: if it is empty, every line of the first file also appears in the second. Passing -23 suppresses the 2nd and 3rd columns, i.e.
comm -23 file1.txt file2.txt
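As a quick illustration of the column flags, the sketch below recreates the two example files in a temporary directory (the directory and the variable names are made up for this demo) and pulls out each column on its own. Each digit passed to comm suppresses the column with that number.

```shell
# Recreate the example files (already sorted) in a throwaway directory.
demo_dir=$(mktemp -d)
printf 'Apaar\nAyush Rajput\nDeepak\nHemant\n' > "$demo_dir/file1.txt"
printf 'Apaar\nHemant\nLucky\nPranjal Thakral\n' > "$demo_dir/file2.txt"

# Suppress two columns at a time so only one remains, printed without indentation.
only_in_file1=$(comm -23 "$demo_dir/file1.txt" "$demo_dir/file2.txt")
only_in_file2=$(comm -13 "$demo_dir/file1.txt" "$demo_dir/file2.txt")
in_both=$(comm -12 "$demo_dir/file1.txt" "$demo_dir/file2.txt")

printf 'Only in file1.txt:\n%s\n' "$only_in_file1"
printf 'Only in file2.txt:\n%s\n' "$only_in_file2"
printf 'In both files:\n%s\n' "$in_both"

rm -rf "$demo_dir"
```

Because Ayush Rajput and Deepak show up in the first column, file1.txt is not a subset of file2.txt here.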
Conclusion
In the end, I settled on this simple bash script to check the subset condition:
#!/bin/bash
SUBSET="<subset_file_path>"
SUPERSET="<superset_file_path>"
# Keep only the first line that is in SUBSET but not in SUPERSET (empty if there is none).
CHECK=$(comm -23 <(sort "$SUBSET" | uniq) <(sort "$SUPERSET" | uniq) | head -n 1)
if [[ -n "$CHECK" ]]; then
    echo "Detected extra line in $SUBSET and not in $SUPERSET."
    echo "$CHECK"
    exit 1
fi
I added the extra sort and uniq commands to make sure we're comparing two sorted and deduplicated files, since comm expects its input to be sorted.
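If the check is needed for more than one pair of files, the same idea can be wrapped in a function. The following is only a sketch under a few assumptions of mine, not part of the original script: the name is_subset, the demo files, and the choice to list every extra line instead of just the first are all hypothetical. It also uses a temporary file and comm's "-" (standard input) operand instead of process substitution, so it should run under plain sh as well as bash, and it pins LC_ALL=C so that sort and comm agree on the ordering.

```shell
# is_subset SUBSET_FILE SUPERSET_FILE
# Prints the lines found only in SUBSET_FILE; returns 0 if there are none, 1 otherwise.
is_subset() (
    # The body runs in a subshell, so the locale override and variables stay local.
    LC_ALL=C
    export LC_ALL
    sorted_subset=$(mktemp) || return 2
    sort -u "$1" > "$sorted_subset"
    # "-" makes comm read the sorted, deduplicated superset from standard input.
    extra=$(sort -u "$2" | comm -23 "$sorted_subset" -)
    rm -f "$sorted_subset"
    if [ -n "$extra" ]; then
        printf '%s\n' "$extra"
        return 1
    fi
    return 0
)

# Hypothetical demo files: unsorted and with duplicates on purpose.
demo_dir=$(mktemp -d)
printf 'b\na\n' > "$demo_dir/subset.txt"
printf 'c\na\nb\na\n' > "$demo_dir/superset.txt"
printf 'a\nz\n' > "$demo_dir/not_subset.txt"

is_subset "$demo_dir/subset.txt" "$demo_dir/superset.txt"
subset_status=$?
extra_lines=$(is_subset "$demo_dir/not_subset.txt" "$demo_dir/superset.txt")
not_subset_status=$?

echo "subset.txt status: $subset_status"
echo "not_subset.txt status: $not_subset_status, extra lines: $extra_lines"

rm -rf "$demo_dir"
```

In a CI step, the function's return status could be used directly, for example by calling it and exiting with 1 when it fails.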