A path to paths

lizmat

Elizabeth Mattijsen

Posted on November 11, 2024

The other day, not one but two people tried to use my rak module to create a custom file system search utility. And had problems getting it to work.

Now, truth be told: they were the first people to use that module, apart from my own use of it for the App::Rak command line interface (as described in It's time to rak!). And apparently, using the plumbing of App::Rak was less straightforward than I expected, specifically with regard to the way results are returned.

It wasn't until a bit later that I realized they were reaching for the wrong tool. What they really wanted was to apply some search criteria to a list of files, as determined by some simple rules that weren't covered (yet) by App::Rak.

paths

Well, there's a module for that: paths, a fast recursive file / directory finder. Which is one of the dependencies of rak. And thus of App::Rak.

All of the code examples here assume you have also added a use paths; to your code. And if the paths module is not installed yet, you should install it with zef install paths in your shell.

So how do you use it?

.say for paths;

will produce a list of all files from the current directory (recursively down), except the ones that reside in a directory whose name starts with a period (so, e.g. all the files and subdirectories in a .git directory would be skipped).

So, what if you would like to get all JSON files (as defined by their .json extension)?

.say for paths(:file(*.ends-with(".json")));

What if you'd like to list all files in ".git" directories?

.say for paths(:dir(".git"));

Or you're just interested in directory names, not in the files inside directories?

.say for paths(:!file);

The :!file indicates that you're not interested in files. This is Raku's way of specifying the named argument "file" with a False value. Some would write this as file => False, which would also work in Raku.
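
To see that the two spellings are interchangeable, here is a minimal sketch that does not need the paths module. The wants-files subroutine is a hypothetical stand-in (not part of paths) that just hands back the value of its "file" named argument:

```raku
# Hypothetical stand-in: returns whatever was passed as the "file" named argument
sub wants-files(:$file = True) { $file }

say wants-files(:!file);          # False
say wants-files(file => False);   # False
say wants-files(:file);           # True
say wants-files();                # True (the default chosen for this sketch)
```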

All of the above examples assumed the current directory as a starting place. The paths subroutine also takes an optional positional parameter: the directory from which to start. So if you want to know all of the directories on your computer, you could start from the root directory:

.say for paths("/", :!file);

This may take a while!

Paths as strings

The Raku Programming Language has the IO::Path object, which conceptually consists of a volume, a directory, and a basename. It supports both purely textual operations, and operations that access the filesystem, e.g. to resolve a path, or to read all the content of a file.

Unfortunately, creating such an object is relatively expensive, so paths has chosen to just provide absolute paths as strings. If you want to work with IO::Path objects, all you need to do is call the .IO method on the path.
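
As a small sketch of that conversion, using only purely textual operations (so nothing has to exist on disk). The path string below is a made-up example, in the shape that paths would hand you:

```raku
# Hypothetical path string, in the shape that paths would return
my $path = "/tmp/example/data.json";

# .IO turns the string into an IO::Path object
my $io = $path.IO;

say $io.^name;      # IO::Path
say $io.basename;   # data.json
say $io.extension;  # json
```

The cost of creating the object is only paid for the paths you actually call .IO on.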

For instance, if you would like to know the name of each file that contains the string "frobnicate", you could do:

.say for paths.grep: *.IO.slurp.contains("frobnicate");

The .IO method call turns the path string into an IO::Path object, the .slurp method call reads the whole contents of the file into memory as a string assuming UTF-8 encoding, and the .contains returns True if the given string was found in its invocant.

If you're surprised by the *.IO... syntax: that is called Whatever priming. In this case, the syntax is short for { .IO.slurp.contains("frobnicate") }.
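
A minimal sketch of that equivalence, on plain strings rather than files so it runs without touching the filesystem (the variable names and sample lines are made up for illustration):

```raku
# Two ways of writing the same check
my &primed   = *.contains("frobnicate");
my &explicit = { .contains("frobnicate") };

my @lines = "please frobnicate this", "nothing to see here";

say @lines.grep(&primed);    # (please frobnicate this)
say @lines.grep(&explicit);  # (please frobnicate this)
```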

Now, if you do that, there's a good chance that this will end in an execution error, something like Malformed UTF-8 near byte 8b at line 1 col 2. That's because there's a good chance that at least one of the files is a binary file. Which is generally not valid UTF-8.

You could just ignore those cases with:

.say for paths.grep: { .contains("frobnicate") with .IO.slurp }

The slurp method will return a Failure if it couldn't complete the reading of the file. The with then will only topicalize the value if it got something defined (and Failures are considered to not be defined in this context). Then the contains method is called as before and we get either True or False from that.
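
The same with / Failure interplay can be sketched without any files. The read-text subroutine here is a hypothetical stand-in for .IO.slurp: it returns a string for names ending in ".txt" and a Failure for anything else:

```raku
# Hypothetical stand-in for .IO.slurp: a string for ".txt" names, a Failure otherwise
sub read-text(Str $name) {
    $name.ends-with(".txt")
      ?? "please frobnicate $name"
      !! fail "cannot decode $name";
}

for "notes.txt", "image.png" -> $name {
    with read-text($name) {
        # Only reached for a defined value, which is now the topic
        say $name, " -> ", .contains("frobnicate");
    }
    else {
        # Reached for the Failure, which counts as undefined
        say $name, " -> skipped";
    }
}
```

This prints "notes.txt -> True" followed by "image.png -> skipped".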

But doing it this way may be a little expensive resource-wise. If resource usage is an issue for your program, then maybe there's a better way to find out whether something contains text or binary information. And there is: with the sister module path-utils.

Path utilities

The path-utils module contains 41 subroutines that take a path string and then perform some check on that path. Let's look at path-is-text: "Returns 1 if path looks like it contains text, 0 if not".

use path-utils <path-is-text>;

.say for paths.grep: { path-is-text($_) && .IO.slurp.contains("frobnicate") }

But what if you'd only like to look up texts in PDF files? Well, the selection part can be done efficiently by path-utils as well, with the path-is-pdf subroutine.

use path-utils <path-is-pdf>;

.say for paths.grep: { path-is-pdf($_) }

But that would only show the files that appear to be PDF files. To actually search in them, you could for instance use Steve Roe's PDF::Extract module.

use path-utils <path-is-pdf>;
use PDF::Extract;

.say for paths.grep: { path-is-pdf($_) && Extract.new(:file($_)).text.contains("frobnicate") }

Conclusion

It is always important to really understand the question, and to ask further if you don't understand the question. And make sure that the question askers understand your reply. And keep repeating that until you and the question asker are on the same page.

In this case, pointing these two Raku developers to the paths module made their project suddenly (almost) a piece of cake.

And for me, it was a fine reason to highlight these cool modules in the Raku ecosystem.

If you like what I'm doing, committing to a small sponsorship would mean a great deal to me!
