Sourjaya Das
Posted on April 9, 2024
In this article, we will look at how to build a CLI tool in Go, relying mostly on standard packages (plus one small third-party package for flag parsing). We will build our own version of xxd. This project challenge was originally posted here.
So, what exactly is xxd?
xxd is a CLI tool for Linux that creates a hex dump of a given file or of standard input. It can also convert a hex dump back to its original binary form.
To demonstrate what xxd does:
- Create a file, or take any existing file from your system, and run the following command:
# xxd <filename>
# xxd <filepath>
# To dump the hex dump to another file
# xxd <filename/filepath> > <filename/filepath>
xxd file.txt
- Suppose the file contents were:
Hey! How you doing?
I hope you are doing fine.
Believe in yourself.
- When we run the command mentioned above, we get the following output:
To understand what was printed, let's look at the first line of the output:
00000000: 4865 7921 2048 6f77 2079 6f75 2064 6f69 Hey! How you doi
- The first part, 00000000, is the file offset, that is, the position of the first byte printed on this line.
- The second part, 4865 7921 2048 6f77 2079 6f75 2064 6f69, is the actual bytes of data in hexadecimal form, with two octets (bytes) grouped together (48 is the hex value of one byte, 65 is the hex value of the next byte in the group, and so on).
- The third part, Hey! How you doi, is the actual content (text) of the bytes printed on the line.
📝
Any byte with a hexadecimal value less than 20 or greater than 7e is shown as '.' when the original text is printed in the third part.
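To make the three-part layout concrete, here is a minimal sketch that formats a single row. The function name formatLine and its parameters are hypothetical and are not taken from the project code; the sketch assumes the default big-endian layout with two spaces between the hex columns and the text.

```go
package main

import (
	"fmt"
	"strings"
)

// formatLine renders one row of a dump: offset, grouped hex octets, text.
// cols is the number of octets per row, group the number of octets per group.
// (Hypothetical helper, for illustration only.)
func formatLine(offset int, data []byte, cols, group int) string {
	if group <= 0 {
		group = cols // assumption: a non-positive group size means "no grouping"
	}
	var sb strings.Builder
	// Part 1: the offset as eight hex digits followed by a colon.
	fmt.Fprintf(&sb, "%08x:", offset)
	// Part 2: the octets, with a space in front of every group.
	for i := 0; i < cols; i++ {
		if i%group == 0 {
			sb.WriteByte(' ')
		}
		if i < len(data) {
			fmt.Fprintf(&sb, "%02x", data[i])
		} else {
			sb.WriteString("  ") // pad a short last row so the text column lines up
		}
	}
	// Part 3: the text, with bytes outside 0x20..0x7e shown as '.'.
	sb.WriteString("  ")
	for _, b := range data {
		if b < 0x20 || b > 0x7e {
			sb.WriteByte('.')
		} else {
			sb.WriteByte(b)
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(formatLine(0, []byte("Hey! How you doi"), 16, 2))
	// 00000000: 4865 7921 2048 6f77 2079 6f75 2064 6f69  Hey! How you doi
}
```

The padding branch matters only for the last row of a dump, where fewer than cols octets remain.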
Now, why do we need such a tool? A hex dump has many use cases, including binary file analysis and comparison, data transformation, digital forensics and security.
At first glance it seems easy to code such a tool: read the file or the standard input, convert each byte to its hexadecimal form, and print the hexadecimal numbers alongside the original text in the proper arrangement. But the challenge lies in the different edge cases that pop up due to the use of additional options.
xxd has its own flags that help to manipulate the output. We will look at the behaviour of these flags when we discuss the edge cases.
Now that we have a brief overview of what we need to build, let's dive deep into the intricacies of this tool.
Table of contents
1. Prerequisites
2. Understanding the behaviour of the flags
3. Writing and Understanding the code
4. What's next
1. Prerequisites
As we will be using Go for our project, we should make sure that:
1.1. Go is installed on our system.
To check whether Go is installed on your system, run the command go version in your terminal. If it is not installed, follow the official installation steps.
1.2. Use a code editor of your choice.
And that's about all that we need to start coding.
2. Understanding the behaviour of the flags
Before we start writing our code, we have to make sure we have a good understanding of what each flag does and how the output changes as the values of these flags change.
As mentioned in the challenge, we will be focusing on the functionality of six flags:
Flags | Description |
---|---|
-e | Output in little-endian format. |
-g | Specify the number of bytes per group. |
-l | Specify the number of octets to dump from the input. |
-c | Specify the number of octets to print per line. |
-s | Specify the offset to start printing from. |
-r | Revert from hexadecimal dump to original binary form. |
2.1. -e flag
When we enter the command with the -e flag:
xxd -e file.txt
we get an output like:
If we look at each group, it holds 4 octets in reversed order. So both the default group size and the byte order within each group change.
2.2. -g flag
When we use the -g flag:
xxd -g 5 file.txt
we get:
In the output we see 5 octets grouped together until the columns are filled for each row.
Then again, if we use both -e and -g together:
xxd -e -g 5 file.txt
we get:
xxd: number of octets per group must be a power of 2 with -e.
In this case, the number of octets per group must be a power of 2 for -g to work together with the -e flag.
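The power-of-2 rule can be checked with a single bit operation. The following is a minimal sketch; the names isPowerOfTwo and validateGroup are hypothetical, and the error text simply mirrors the message shown above.

```go
package main

import (
	"errors"
	"fmt"
)

// isPowerOfTwo reports whether n is a positive power of two (1, 2, 4, 8, ...).
// A power of two has exactly one bit set, so clearing the lowest set bit
// with n&(n-1) leaves zero.
func isPowerOfTwo(n int) bool {
	return n > 0 && n&(n-1) == 0
}

// validateGroup rejects group sizes that cannot be combined with little-endian
// output. (Hypothetical helper, for illustration only.)
func validateGroup(group int, littleEndian bool) error {
	if littleEndian && !isPowerOfTwo(group) {
		return errors.New("number of octets per group must be a power of 2 with -e")
	}
	return nil
}

func main() {
	fmt.Println(validateGroup(4, true)) // <nil>
	fmt.Println(validateGroup(5, true)) // the error above
	fmt.Println(validateGroup(5, false)) // <nil>: the rule only applies with -e
}
```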
2.3. -c flag
When we use the -c flag:
xxd -c 5 file.txt
the result is:
Notice that there are at most 5 octets per row (line).
2.4. -l flag
For the command:
xxd -l 20 f.txt
the output will be:
The total number of octets displayed is 20.
2.5. -s flag
If we write:
xxd -s 3 file.txt
we get:
The dump starts at offset 3, that is, the first 3 bytes of the input file are skipped. But if the value of the -s flag is negative, the offset is set relative to the end of the file.
Another edge case to consider: if the value of the -s flag is negative and no file name is given, the seek won't happen.
xxd -s -3
#Output
xxd: Sorry, cannot seek.
This is also true for inputs like:
xxd -s -0
and xxd -s +-5
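The two seek directions map naturally onto Go's io.Seeker interface. Below is a minimal sketch, not the project's code: seekTo and startOffset are hypothetical names, and an in-memory reader stands in for the file.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// seekTo positions r according to an -s style value: non-negative values
// count from the start of the input, negative values from its end.
// It returns the resulting absolute offset.
func seekTo(r io.Seeker, s int64) (int64, error) {
	if s < 0 {
		return r.Seek(s, io.SeekEnd)
	}
	return r.Seek(s, io.SeekStart)
}

// startOffset applies seekTo to an in-memory copy of data.
// (Hypothetical helper so the sketch can run without a real file.)
func startOffset(data string, s int64) (int64, error) {
	return seekTo(strings.NewReader(data), s)
}

func main() {
	const content = "Hey! How you doing?" // 19 bytes
	fmt.Println(startOffset(content, 3))  // 3 <nil>
	fmt.Println(startOffset(content, -3)) // 16 <nil>
	_, err := startOffset(content, -100)  // before the start of the input
	fmt.Println(err != nil)               // true
}
```

An *os.File satisfies io.Seeker as well, but seeking generally fails when standard input is a pipe or a terminal, which is consistent with the "cannot seek" message above.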
2.6. -r flag
This flag is used to revert from a hex dump to its original binary form. Suppose file.hex contains the hex dump of file.txt. If we want to get the text content back, we do:
xxd -r file.hex
#or
xxd -r file.hex > file2.txt
The output will be:
Hey! How you doing?
I hope you are doing fine.
Believe in yourself.
📝
We can use decimal, octal or hexadecimal values for the flags. Octal values are written with a leading 0, like 014, and hexadecimal values are written like 0x0c or 0x0C.
It is important to mention that if we pass a non-numeric value like abcd as a flag value when the file name is not provided, the default flag values will be used. Also, if a value like 5jkl is given as a flag value, it will be read as 5.
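One way to picture these rules is to parse the longest leading part of the value that is a valid number. The sketch below is an illustration under that assumption, not the approach used later in this article (which relies on a regular expression); parseLeadingNumber is a hypothetical name. It leans on strconv.ParseInt with base 0, which infers the base from the prefix (0x for hexadecimal, a leading 0 for octal).

```go
package main

import (
	"fmt"
	"strconv"
)

// parseLeadingNumber returns the number formed by the longest prefix of s
// that strconv.ParseInt accepts with base 0. The boolean is false when no
// prefix is numeric, in which case the caller would fall back to a default.
func parseLeadingNumber(s string) (int64, bool) {
	for end := len(s); end > 0; end-- {
		if n, err := strconv.ParseInt(s[:end], 0, 64); err == nil {
			return n, true
		}
	}
	return 0, false
}

func main() {
	for _, v := range []string{"12", "014", "0x0c", "0x0C", "5jkl", "abcd"} {
		n, ok := parseLeadingNumber(v)
		fmt.Printf("%-5s -> %d (numeric: %t)\n", v, n, ok)
	}
}
```

With these inputs, 12, 014, 0x0c and 0x0C all yield 12, 5jkl yields 5, and abcd is reported as non-numeric.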
The return values are as follows:
Value | Description |
---|---|
0 | no errors encountered. |
-1 | operation not supported. |
1 | error while parsing options. |
2 | problems with input file. |
3 | problems with output file. |
4,5 | desired seek position is unreachable. |
3. Writing and Understanding the code
Before starting with the code, it's important to have an idea of how we will tackle the problem. In my first attempt at building this tool, I took a naive path: I read the file a few bytes at a time and stored them in a slice of bytes. Then I printed out each byte in its hex format, one by one. This solution worked fine when there were no flags involved and the output format did not depend on flag inputs. But when I started to build the logic for all the edge cases, the code started to become messy and unreadable.
That's when I had to switch the way I was processing those bytes. Instead of directly converting each individual byte to its hex representation, I converted the whole chunk of bytes to a string of hex values. This change helped in tackling most of the edge cases I talked about earlier.
3.1. Folder Structure
└── 📁sdxxd
└── 📁xxd
└── xxd.go
└── main.go
└── go.mod
└── go.sum
3.2. Create your Go project
Before writing your code, you need to set up the directory that will house your project. Then open a terminal in that directory and enter the following command to initialize your project.
# go mod init <your_module_path>
go mod init github.com/Sourjaya/sdxxd
The go mod init command creates a go.mod file to track your code's dependencies. Using your own GitHub repository provides a unique module path for the project.
Now, in main.go, write the following code:
package main

import "github.com/Sourjaya/sdxxd/xxd"

func main() {
	xxd.Driver()
}
Here we call the Driver function from the xxd package.
3.3. Utility functions and the structs in use
In the xxd folder, create a new Go file, xxd.go:
Here we declare three structs: Flags, ParsedFlags and IsSetFlags. In the function NewFlags(), we initialize the flags and check whether certain flag values have been provided or not.
📝
To parse the flags from the terminal, we are not going to use Go's standard flag package, because it does not support input of the form xxd -s5 -g3, where there is no gap between a flag and its value. Instead, we are using the pflag package.
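To see the limitation concretely, here is a standard-library-only sketch. It is not the approach this article takes: splitAttached and parseWithStdlib are hypothetical helpers that show the flag package rejecting -s5, and then accepting the same arguments once the attached values have been split off by hand.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"strings"
)

// splitAttached rewrites arguments such as "-s5" into "-s", "5" for the
// single-letter flags listed in valueFlags. It is deliberately simplistic:
// for example, it would also split a file name that happens to look like "-s5".
func splitAttached(args []string, valueFlags string) []string {
	out := make([]string, 0, len(args))
	for _, a := range args {
		if len(a) > 2 && a[0] == '-' && a[1] != '-' && strings.IndexByte(valueFlags, a[1]) >= 0 {
			out = append(out, a[:2], a[2:])
			continue
		}
		out = append(out, a)
	}
	return out
}

// parseWithStdlib parses -s and -g as strings with the standard flag package.
func parseWithStdlib(args []string) (s, g string, rest []string, err error) {
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors off the terminal in this demo
	fs.StringVar(&s, "s", "", "offset to start from")
	fs.StringVar(&g, "g", "", "octets per group")
	err = fs.Parse(args)
	return s, g, fs.Args(), err
}

func main() {
	raw := []string{"-s5", "-g3", "file.txt"}

	// The standard package reads "-s5" as a flag named "s5" and fails.
	_, _, _, err := parseWithStdlib(raw)
	fmt.Println("attached form rejected:", err != nil)

	// After splitting, the same arguments parse as intended.
	s, g, rest, err := parseWithStdlib(splitAttached(raw, "sg"))
	fmt.Println(s, g, rest, err)
}
```

A package that understands the attached shorthand form removes the need for this kind of pre-processing.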
Now, let's look at some of the helper functions we are going to need and why we need them.
numberParse()
This function parses a flag value and, with the help of a regular expression, extracts the numerical value from it.
// Function to parse number from a string using regular expression
func numberParse(input string) (res int64, err error) {
	// regular expression
	re := regexp.MustCompile(`-?0[xX][0-9a-fA-F]+|-\b0[0-7]*\b|-\b[1-9][0-9]*\b|0[xX][0-9a-fA-F]+|\b0[0-7]*\b|\b[1-9][0-9]*\b`)
	// Find the match
	s := re.FindString(input)
	// if a certain match is found convert into decimal, octal or hexadecimal and return. else return 0.
	if s != "" {
		return strconv.ParseInt(s, 0, 64)
	}
	return 0, nil
}
reverseString()
This function reverses the byte order of a hex string input. It is used exclusively when the output should be in little-endian format.
// Function to reverse the byte order of a hex string
// input: The input hex string to be reversed (it may contain spaces).
// Returns the reversed hex string, left-padded with spaces to the input's length.
func reverseString(input string) string {
	// Strip the spaces and decode the hex string to a byte slice
	// (the decode error is ignored here)
	hexStr := strings.ReplaceAll(input, " ", "")
	bytes, _ := hex.DecodeString(hexStr)
	// Reverse the byte slice
	for i, j := 0, len(bytes)-1; i < j; i, j = i+1, j-1 {
		bytes[i], bytes[j] = bytes[j], bytes[i]
	}
	// Encode the reversed byte slice back to hex string
	reversed := hex.EncodeToString(bytes)
	// Pad on the left so the result is as wide as the input
	whitespace := strings.Repeat(" ", len(input)-len(reversed))
	return whitespace + reversed
}
byteToHex()
Before printing the result we will need to convert the slice of bytes to a hex string. This function is for this purpose.
// Function to convert a byte slice to a hex string with specified grouping.
// byteBuffer: The input byte slice to be converted.
// count: The number of bytes per group.
// Returns the hex string representation of the byte slice.
func byteToHex(byteBuffer []byte, count int) string {
	// encode byte slice to string
	encodedString := hex.EncodeToString(byteBuffer)
	// add extra whitespaces
	for i := 0; i < (count-(len(byteBuffer)%count))*2; i++ {
		encodedString = fmt.Sprint(encodedString, " ")
	}
	return encodedString
}
bytesToString()
To display the third section of the result, we need to convert the byte slice to its text form. This function does exactly that.
// Function to convert a byte slice to its printable text form.
// input: The input byte slice to be converted.
// Returns the string representation of the byte slice.
func bytesToString(input []byte) string {
	output := make([]byte, len(input))
	// convert ASCII byte slice to its equivalent character string,
	// replacing non-printable bytes with '.'
	for i, b := range input {
		if b < 0x20 || b > 0x7e {
			output[i] = '.'
		} else {
			output[i] = b
		}
	}
	return string(output)
}
size()
The size of the chunk of bytes to read depends on the columns value: the chunk is rounded up to a multiple of the column count, so every chunk ends on a row boundary. Any base size would work; I used an arbitrary value of 2048 (the SIZE constant). It's essential to read the bytes in chunks, because for large files this is comparatively faster than reading the whole file at once.
// calculate size of chunk to read for each iteration
func size(cols int) int {
	div := SIZE / cols
	if SIZE%cols != 0 {
		return (div + 1) * cols
	}
	return div * cols
}
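As a worked example of this rounding, the sketch below uses the equivalent "round up to a multiple" integer formula. The names baseSize and chunkSize are hypothetical stand-ins for SIZE and size(), and the sketch assumes a positive column count.

```go
package main

import "fmt"

// baseSize stands in for the SIZE constant (2048 in this article).
const baseSize = 2048

// chunkSize rounds baseSize up to the nearest multiple of cols.
// Assumption: cols is positive; zero would cause a division by zero.
func chunkSize(cols int) int {
	return (baseSize + cols - 1) / cols * cols
}

func main() {
	fmt.Println(chunkSize(16)) // 2048: already a multiple of 16
	fmt.Println(chunkSize(5))  // 2050: 410 rows of 5 octets
	fmt.Println(chunkSize(3))  // 2049: 683 rows of 3 octets
}
```

Because every chunk holds a whole number of rows, a row never straddles two reads.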
trimBytes()
This function will be needed when the reverse conversion takes place, that is from a hex dump to the original content.
// Helper function to trim the spaces from a line
func trimBytes(s string) string {
	words := strings.Fields(s)
	return strings.Join(words, "")
}
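To show where this kind of space trimming fits into the reverse conversion, here is a minimal sketch that turns a single dump line back into bytes. It is not the project's revert() function: revertLine is a hypothetical name, and the sketch assumes the default (non -e) layout, with a colon after the offset and two spaces between the hex columns and the text.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// revertLine converts one line of a default-format hex dump back to bytes.
func revertLine(line string) ([]byte, error) {
	// Drop the offset: everything up to the first colon.
	_, rest, found := strings.Cut(line, ":")
	if !found {
		return nil, fmt.Errorf("no offset separator in line %q", line)
	}
	rest = strings.TrimLeft(rest, " ")
	// The hex columns end at the first run of two spaces.
	hexPart, _, _ := strings.Cut(rest, "  ")
	// Remove the spaces between groups, then decode the hex digits.
	return hex.DecodeString(strings.Join(strings.Fields(hexPart), ""))
}

func main() {
	b, err := revertLine("00000000: 4865 7921 2048 6f77 2079 6f75 2064 6f69  Hey! How you doi")
	fmt.Printf("%q %v\n", b, err) // "Hey! How you doi" <nil>
}
```

Only the hex columns are decoded; the text column is ignored, since non-printable bytes are shown there as '.' and cannot be recovered from it.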
3.4. Structuring the code
After we have written the helper functions, it's time to put them to use. We will start with the Driver() function.
// Driver function to use the functionalities of this package
func Driver() int {
	f, setFlags, args := NewFlags()
	// if no file name is provided read from standard input
	if len(args) == 0 || args[0] == "-" {
		return f.processStdIn(setFlags)
	}
	return f.processFile(args[0], setFlags)
}
Here, the flag structs are set, and the first thing that is checked is whether there is a file name in the list of arguments.
📝
args is the list of arguments that come after all the flag inputs.
If the user has given a file name, we call the (*Flags).processFile() method. Otherwise, if the file name is absent or is given as -, we call (*Flags).processStdIn().
(f *Flags).processFile()
In this method, we first open the file. If the -r flag is set, we call the revert() function. We will look at what revert() does in a few minutes. If the flag is not present, we read a set number of bytes at a time from the file and pass them to InputParse().
(f *Flags).processStdIn()
Here, we check whether the -r flag is set and call revert() accordingly. Otherwise, we scan the standard input and print the resulting hex dump. Here we have to consider additional edge cases: the result is displayed only up to the rows whose columns have been filled completely, and the prompt then waits for additional input to read. Unless we interrupt the program, it will continue to run until the -l value is reached (only when -l is set).
The code for these two functions is given below:
Now if you look at the code, you will see three functions:
- revert()
This function is used to convert from a hexadecimal dump to the original binary form. It can receive two types of input: *os.File when a file is given, and *bufio.Scanner when reading from standard input.
- (f *Flags).checkFlags()
This method parses each flag value (originally a string) into a numerical value, which can then be used by the InputParse() method. It is also responsible for terminating the program if there is any error while parsing the flags.
The code for these two functions:
- (flags *ParsedFlags).InputParse()
All that is left is to use the helper functions appropriately to generate the proper hex dump. To do that, we call this function.
func (flags *ParsedFlags) InputParse(s []byte, offset int, length int) string {
	// convert byte slice to hex string
	buffer := byteToHex(s, flags.C)
	// function to generate hex dump output string
	return flags.dumpHex(offset, length, buffer, s)
}
Here, we first convert the slice of bytes to a hexadecimal string and then call dumpHex(), passing in the offset (this helps in proper indexing of lines), the flag values, the original slice of bytes and the buffer (hex string).
So, finally we reach a point where only the conversion is left. To convert from the original input to its hex dump, we use the dumpHex() method.
Since each octet is represented by two hexadecimal characters, we loop up to twice the length of the input bytes. First we print the offset. The next step is to print the grouped octets. The number of octets depends on the value of the -g flag as well as the -c value. We have to make sure that we reverse each group before printing if little-endian mode is set.
Once the octets are printed, the text equivalent of the octets is displayed beside them. This three-part process is repeated for each row (line) until the end of the file or input.
📝
Make sure that if the -l flag is set, the number of octets printed is equal to its value.
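The grouping step, including the per-group reversal for little-endian mode, can be sketched as follows. This is not the project's dumpHex() method: groupHex and reverseOctets are hypothetical names, the input is assumed to be the unpadded hex string of one row, and a partial last group is reversed without the left padding that reverseString() adds.

```go
package main

import (
	"fmt"
	"strings"
)

// reverseOctets reverses a hex string two characters (one octet) at a time.
func reverseOctets(g string) string {
	var b strings.Builder
	for i := len(g); i >= 2; i -= 2 {
		b.WriteString(g[i-2 : i])
	}
	return b.String()
}

// groupHex splits the hex string of one row into groups of groupBytes octets
// and, in little-endian mode, reverses the octets inside each group.
func groupHex(hexStr string, groupBytes int, littleEndian bool) []string {
	width := groupBytes * 2 // two hex characters per octet
	if width <= 0 {
		width = len(hexStr) // assumption: a non-positive size means one single group
	}
	var groups []string
	for start := 0; start < len(hexStr); start += width {
		end := start + width
		if end > len(hexStr) {
			end = len(hexStr)
		}
		g := hexStr[start:end]
		if littleEndian {
			g = reverseOctets(g)
		}
		groups = append(groups, g)
	}
	return groups
}

func main() {
	row := "4865792120486f77" // hex of "Hey! How"
	fmt.Println(strings.Join(groupHex(row, 2, false), " ")) // 4865 7921 2048 6f77
	fmt.Println(strings.Join(groupHex(row, 4, true), " "))  // 21796548 776f4820
}
```

Working on the hex string rather than on individual bytes keeps the slicing arithmetic simple: a group of n octets is always a substring of 2n characters.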
The complete code can be found in this repo.
3.5. Building the Go binary and testing the tool.
Once we have finished writing our code, we will run the command go mod tidy to make sure all the dependencies are in order. Now, let's build the binary executable:
go build
The binary is successfully generated. We can finally test our tool. To test it, we will first create a tar file:
echo "File 1 contents" >> file1.txt
echo "File 2 contents" >> file2.txt
echo "File 3 contents" >> file3.txt
tar -cf files.tar file1.txt file2.txt file3.txt
Now, we will use files.tar to check it out:
4. What's next
As you may have noticed, this code reads and processes the file (or the input) sequentially, with no parallel or concurrent processing involved. For the sake of simplicity, I have not used concurrency. Therefore, this tool will work, but it may struggle when large files are involved.
Also, when it comes to the options that the original xxd tool has, we have implemented only 6 of them. There are other options as well that we haven't looked at yet.
So there is always room to improve and optimize the code and to add to its list of functionalities.
If you would like to share your feedback, please feel free to drop a comment below.