Setting up your haystack

lizmat

Elizabeth Mattijsen

Posted on November 18, 2022

This blog post will discuss the ways you can specify where to look for matches with rak as part 4 of the It's time to rak! blog series.

From here on down

As we've seen in the earlier instalments, you can very easily search for a string using a literal string or a Raku regex in all files that look like they contain text.

# look for "foo" in all files
$ rak foo

# Search for "foo" in files of the "lib" directory
$ rak foo lib

And we've seen we can also limit the search to a single file:

# Look for "ve" anywhere on any line in file "twenty"
$ rak --type=contains ve twenty

These are three of the very basic ways to specify where to search: the first by not specifying anything, which implies all files in the current directory and any subdirectories of which the name does not start with a period.

The second is basically the same as the first, but starts from the "lib" directory on down, rather than from the current directory.

The third is the specification of a single file to look in. And that single file does not need to exist on the local filesystem! It can also be a URL. Let's look for the word "reading" in the first blog post of this series:

$ rak §reading https://dev.to/lizmat/its-time-to-rak-part-1-30ji
https://dev.to/lizmat/its-time-to-rak-part-1-30ji
598:aria-label="Add to 𝐫𝐞𝐚𝐝𝐢𝐧𝐠 list"
954:<p>Thank you for 𝐫𝐞𝐚𝐝𝐢𝐧𝐠 all the way to the end!</p>

What this basically does is to download the indicated resource (courtesy of curl) into a temporary file, search in that while keeping the original URL as "the filename", and remove the file automatically when it's done.
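
The flow described above can be sketched in shell. This is only an illustration of the idea, not rak's actual code: search_url is a hypothetical helper, and grep stands in for the search step.

```shell
# Hypothetical helper illustrating the download / search / clean-up flow.
# Assumptions: curl and mktemp are available; grep stands in for rak's search.
search_url() {
    pattern=$1
    url=$2
    tmp=$(mktemp) || return 1

    # Download the resource into the temporary file.
    if curl -fsSL "$url" -o "$tmp"; then
        # Report the URL as "the filename", then the matching lines.
        printf '%s\n' "$url"
        grep -n -- "$pattern" "$tmp"
    fi

    # Always remove the temporary file afterwards.
    rm -f "$tmp"
}
```

With this sketch, search_url reading https://dev.to/lizmat/its-time-to-rak-part-1-30ji would print the URL followed by the numbered matching lines, without leaving a temporary file behind.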

Actually only two

If you look at the above, then you realize that there are actually only two types of specification: a directory or a file (which could be local or remote). And that a directory will be recursed into to look for files to include in the search.

The search for files in a directory, and its subdirectories, can be influenced by 40 different arguments. This blog post will not mention all of them. You can do:

# produce extensive help on filesystem filters
$ rak --help=filesystem --pager=less

to get a more in-depth description of the logic for each of the options should you need a feature that is not covered in this blog post. The --pager argument is to let you more easily scroll the extensive text, but is of course not necessary.

What should be noted here is that these filesystem filters are only applied on subdirectories and files in those subdirectories. So not on any directory or file that you specify directly.

But beware! Many shells auto-expand anything you specify on the command line if they can: and these will be considered to be directly specified by rak, as it does not have a way to distinguish between what you typed and what the shell expanded it to. For example:

# Search all files and all subdirectories
$ rak foo *

The * in the shell will effectively do ls -d *. In practice, this is almost the same as not specifying anything at all. But with one subtle difference: none of the filesystem filters will be applied to what the shell expanded to.

Whereas if you do not specify anything (or specify . to indicate the current directory), the filesystem filters would be applied, because you (implicitly) specified only the current directory. So only the current directory would be exempt from filesystem filters.
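
To see what a command actually receives when you type *, you can let the shell show the expansion itself. A minimal sketch in a scratch directory (show_args is a hypothetical helper, and all file and directory names are made up for illustration):

```shell
# Throwaway directory; all names below are made up for illustration.
demo=$(mktemp -d)
mkdir "$demo/lib" "$demo/.git"
touch "$demo/notes.txt" "$demo/lib/Foo.rakumod" "$demo/.git/config"

# Print each argument on its own line, the way a command would receive them.
show_args() { printf '%s\n' "$@"; }

# The shell expands * before the command starts; each match becomes a
# separate argument. Names starting with a period are skipped by default.
(cd "$demo" && show_args *)

# Clean up the scratch directory.
rm -rf "$demo"
```

In this sketch the command receives lib and notes.txt as two separate arguments. Passed to rak, both would count as directly specified, so no filesystem filter would be applied to those two entries themselves.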

At the base

The two most important filesystem filters are --file and --dir. They expect a piece of code that will be given the basename of a file or a directory, and which should return a trueish value to allow the file / directory to be accepted. And they can also be specified as a flag: --file for unconditional acceptance, and --/dir for unconditional denial (which can be handy if you do not want recursion into subdirectories).

By default, --file and --dir='!.starts-with(".")' are assumed. This effectively means: don't recurse into directories whose name starts with a period, and accept all files in any other directory.
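
If you are more familiar with the Unix find command, these defaults can be approximated as follows. This is only an analogy, not how rak is implemented: default_haystack is a hypothetical name, and the sketch leaves out the check whether a file looks like it contains text.

```shell
# Rough find(1) analogue of the defaults described above (an approximation,
# not rak's implementation): skip directories whose name starts with a
# period, accept every file elsewhere. The pattern '.?*' rather than '.*'
# keeps the starting directory "." itself from being pruned.
default_haystack() {
    find . -type d -name '.?*' -prune -o -type f -print
}
```

Run in a directory that contains a .git subdirectory, default_haystack lists the files outside of .git, but nothing inside it.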

To make it easier for you to specify files given by one or more extensions, you can use the --extensions argument:

# Only accept files with the .bat extension
$ rak foo --extensions=bat

As the name of the argument implies, you can specify multiple extensions, separated by commas:

# Only accept files with the .bat or the .ps1 extension
$ rak foo --extensions=bat,ps1

It's also possible to only accept files without extension with the --extensions argument by just not specifying any actual extension:

# Only accept files without extension
$ rak foo --extensions=

You can also specify one of the predefined groups of extensions. For instance, if you would like to only include Raku and Markdown files in your search, you can do:

# Only accept Raku and Markdown files
$ rak foo --extensions=#raku,#markdown

Note that the groups of extensions are prefixed with #. To get an up-to-date list of extension groups:

# List all known extensions
$ rak --list-known-extensions

If there is no argument specified related to the basename of the file (any of the above here), then the content of each file will be checked to see if it looks like it contains text. Only if it looks like that, will it actually be included.

More peripherally

The rest of the filesystem filter arguments can be roughly divided into the following groups: by content, epoch, owner / group, numeric meta value, external program and by filesystem attribute. Again, you can see all of the needed information about these by doing:

# produce extensive help on filesystem filters
$ rak --help=filesystem

In any case, the end result of all of these filters is an internal list of files that will be checked for the pattern. You could think of this list as the haystack, and the pattern as the proverbial needle, as it were.

More on the haystack

Apart from specifying paths after the pattern, there is also a --paths argument. It takes a comma-separated list of paths. So these two invocations are equivalent:

# Search in the "lib" and "doc" directories
$ rak foo lib doc
$ rak foo --paths=lib,doc

The --paths argument allows you to save a set of paths with a shortcut (as we've seen in Customizing your options).

You can also store filenames and/or paths in a file, and specify that file to be taken as the haystack specification: the --paths-from=filename and --files-from=filename arguments. Each line of the specified file will be taken as either a file or path specification. The difference in handling is that if a file is specified on a line with --paths-from, it is accepted, whereas if a directory is specified on a line with --files-from, it will be ignored as not being a file. Either of these takes "-" (a single hyphen) to mean: read from STDIN.
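
Reading from STDIN makes it possible to let another program build the list of files. A minimal sketch, assuming a hypothetical helper called rak_in_modules, a "lib" directory as the default starting point, and the .rakumod extension as the files of interest:

```shell
# Hypothetical helper: collect file names with find and hand them to rak
# on STDIN, using the "-" convention described above.
# $1 is the pattern, $2 an optional directory (assumed default: "lib").
rak_in_modules() {
    find "${2:-lib}" -type f -name '*.rakumod' | rak "$1" --files-from=-
}
```

Calling rak_in_modules foo would then search for "foo" in just the files that find produced.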

For open source developers, the --under-version-control argument may be of use. When used in a git repository, it will set up the haystack with all the files that are under version control.

More extensive help on these and other haystack arguments can be obtained by doing:

# produce extensive help on haystack specification
$ rak --help=haystack

Twisting the haystack

There is one argument that converts the haystack into a list with the paths of all the files in the haystack: --find. It changes the list of targets into a target itself, if you will. So instead of looking for the pattern in the contents of the files of the haystack, you'd be looking in the names of the files instead.

# Show all filenames that have "lib" in their name
$ rak --find lib

And if you just want a list of filenames, you can omit the pattern altogether:

# Show all filenames from current directory on down
$ rak --find

And what if you would just like to see the names of directories instead of files? Well, that'd be the only legal way to use the --file argument as a negator:

# Show all directory names from current directory down
$ rak --find --/file

In any case, the --find argument is named after the Unix find command. I thought I'd mention that, if that wasn't clear just yet.

Conclusion

This concludes part 4 of a series of blog posts about rak.

It shows how you can instruct rak where to look for matches (to create a haystack, if you will) by applying different acceptance rules for files and subdirectories, for instance by looking at extensions. It also shows how you can twist the haystack to just show filenames or the names of directories.

If you have any comments, find bugs, have recommendations / ideas, please submit them as issues at the App::Rak repository. If you would like to have a more direct interaction, you can visit the #raku-rak channel on Libera.chat.

Thank you (again) for reading all the way to the end!
