PWC 234 Same Circus, Different Clowns

boblied

Bob Lied

Posted on September 15, 2023

PWC 234 Same Circus, Different Clowns

Perl Weekly Challenge 234 is with us. This time we have two complementary challenges: the first is to find things that are the same, and the second is to count things that are different.

Task 1 Common Characters

You are given an array of words made up of alphabetic
characters only.

Write a script to return all alphabetic characters that
show up in all words including duplicates.
Enter fullscreen mode Exit fullscreen mode

Example 1

Input: @words = ("java", "javascript", "julia")
Output: ("j", "a")

Example 2

Input: @words = ("bella", "label", "roller")
Output: ("e", "l", "l")

Example 3

Input: @words = ("cool", "lock", "cook")
Output: ("c", "o")

Thoughts

This turned out to be a surprisingly rich problem.

At first glance, it's set intersection. A language with native support for sets would be keen for this. Perl is not such a language, but it's easy to do set-like things with hashes, and there are modules, of course. However, the repetition of letters and the requirement to show the letters in the order they're encountered will be at odds with the usual definition of sets, where elements are unique and unordered.

We note that the answer depends on the order of words in the input. In example 2, the result should be qw(e l l) because we start with "bella". If the list of words started with "label", the result should be qw(l e l).

Cute, but probably not

There is a cute way to do set intersection using a cross product of logical operations. For instance, to find the common letters between "perl" and "rasp", we can treat each as a vector and make a matrix that shows whether each pair of letters is the same. Any row that has a sum of 1 will be a common letter.

r a s p Ʃ
p 0 0 0 1 1
e 0 0 0 0 0
r 1 0 0 0 1
l 0 0 0 0 0

This approach is going to be rejected for two reasons: first, it won't work for repeated letters -- deal breaker -- and second, it's O(N^2) for every pair, which has a bad smell.

String world

There's an implementation which follows the problem statement literally. For each letter in the first word, check if it exists in each subsequent word. If it does, stash it aside and delete it from the subsequent words.

Let's sketch this out in code. First, let's pick off our start word and reduce the @words list to what's left.

my $start = shift @words;
Enter fullscreen mode Exit fullscreen mode

Then, we loop over the letters in $start with a common idiom that uses split. (Javascript and Python stole split from Perl, but I'm not bitter.)

for my $letter ( split(//, $start) ) { . . . }
Enter fullscreen mode Exit fullscreen mode

Now, let's take the description literally and see if the letter occurs in all the remaining words. List::Util::all is right there for us, and it has the nice effect that it stops once the condition fails, so it won't necessarily scan the entire list every time.

    if ( all { index($_, $letter) >= 0 } @words )
Enter fullscreen mode Exit fullscreen mode

I've used the index function here, because it's fast and simple. I briefly considered using a regular expression match, but that's using a tank where a bicycle would do.

So, now we know whether the letter is common. If it isn't, we can go on to the next letter. If it is, then we've used it up in every word, so let's delete it. The substr has an optional fourth argument that does string substitution, which I for one found surprising when I first learned of it.

    push @result, $letter;
    for my $w ( 0 .. $#words)
    {
        my $pos = index($words[$w], $letter);
        substr($words[$w], $pos, 1, ""); # Remove the letter
    }
Enter fullscreen mode Exit fullscreen mode

There's another little optimization we can add: if we've used up all the letters in any word, then we can stop looking for any more common letters. I'm going to do that by labeling the for loop and breaking out with a last statement. The whole function comes together like this:

sub commonCharacters(@words)
{
    use List::Util qw/all/;

    return [] unless @words;
    my $start = shift @words;
    my @result;
  LETTER:
    for my $letter ( split(//, $start) )
    {
        if ( all { index($_, $letter) >= 0 } @words )
        {
            # This letter occurs in all words.
            push @result, $letter;

            # Remove the letter from each word
            for my $w ( 0 .. $#words)
            {
                my $pos = index($words[$w], $letter);
                substr($words[$w], $pos, 1, "");
                last LETTER if $words[$w] eq "";
            }
        }
    }
    return \@result;
}
Enter fullscreen mode Exit fullscreen mode

I'm a little annoyed here that index gets called again in the deletion step -- we just did that! But trying to cache that somehow seems like it will add more overhead than just making the function call.

Counting common characters

Another approach is to check the frequency of characters in each word. If our start word has two 'a' characters, then to be common, every other word must have at least one, but no more than two. It answers the question of commonality, but the ordering will have to be figured out.

For this solution, we'll need to determine letter frequency. Last week, I used List::MoreUtils::frequency, but the problem really is almost trivial, so this week, let's write our own subroutine for a frequency hash:

sub letterFreq($word)
{
    my %freq;
    $freq{$_}++ for split(//, $word);
    return \%freq;
}
Enter fullscreen mode Exit fullscreen mode

Our solution will have a similar structure to the previous one. We'll use the first element of @words to find the initial set of possible letters. Then we'll use each subsequent word to retain or delete the letter. We'll either run out of letters or out of words, and what's left will be the answer.

The starting point will be a little different, though. We'll put the letters into a frequency hash, and the possible letters will be the keys of the hash.

    my $start = shift @words;
    my $letters = letterFreq($start);
    while ( @words and keys %{$letters} )
Enter fullscreen mode Exit fullscreen mode

Now we'll iterate over the words. We'll convert each one to a frequency hash of its own, and then compare the letter counts in the start word to the letter counts in the word. If the letter doesn't even exist in the word, it can be removed from the possibilities. If it does exist, the number of times it's common will be the minimum of the two words.

    my $start = shift @words;
    my $letters = letterFreq($start);
    while ( @words and keys %{$letters} )
    {
        # Use up words, make frequency hash
        my $wfreq = letterFreq(shift @words);

        for my $char ( keys %{$letters} )
        {
            if ( exists $wfreq->{$char} )
            {
                $letters->{$char} = min($letters->{$char}, $wfreq->{$char});
            }
            else
            {
                delete $letters->{$char};
            }
        }
    }
Enter fullscreen mode Exit fullscreen mode

I feel better about this solution. We touch every word only once, and we reduce the problem to a smaller set of letters and words with every loop iteration; performance should be linear in proportion to the number of words.

C++ programmers will sniff a bad smell here: we've deleted from the $letters hash while iterating over it. In C++, that would invalidate the iterator, and lead to an afternoon or evening of frustrating debugging (ask me how I know). Perl, however, has no such problem when keys is used to set up a list of things to iterate over.

There is a remaining problem, though. When we finish the while loop, the common characters are in the $letters hash, but they're unordered. To solve the problem, we need to put the common characters in the same order as the start word. Good thing we saved $start at the beginning.

To mix it up, let's use array operations instead of for loops this time. We're going to start by mapping each character in $start to whether or not it exists in $letters. Then we'll select (think grep) the ones that exist. In code, that will read right to left.

    return [ grep { defined }
        map { ( exists $letters->{$_} && $letters->{$_} > 0 )
                  ? do { $letters->{$_}--; $_ }
                  : undef  }
            split(//, $start) ]
Enter fullscreen mode Exit fullscreen mode

The repeated letters raise their complicated little heads here. The $letters frequency hash has the number of recurrences that can happen. We can only accept that many from the $start word. Decrementing the $letters value to zero takes care of that.

The somewhat-rare do BLOCK statement shows up here because we want to execute two statements in a place where ordinarily only a single expression could appear. Don't confuse the do in Perl with the do-while in other languages.

Task 2 Unequal Triplets

You are given an array of positive integers.

Write a script to find the number of triplets
(i, j, k) that satisfies num[i] != num[j],
num[j] != num[k] and num[k] != num[i].
Enter fullscreen mode Exit fullscreen mode

Example 1

Input: @ints = (4, 4, 2, 4, 3)
Output: 3

(0, 2, 4) because 4 != 2 != 3
(1, 2, 4) because 4 != 2 != 3
(2, 3, 4) because 2 != 4 != 3

Example 2

Input: @ints = (1, 1, 1, 1, 1)
Output: 0

Example 3

Input: @ints = (4, 7, 1, 10, 7, 4, 1, 1)
Output: 28

triplets of 1, 4, 7 = 3x2×2 = 12 combinations
triplets of 1, 4, 10 = 3×2×1 = 6 combinations
triplets of 4, 7, 10 = 2×2×1 = 4 combinations
triplets of 1, 7, 10 = 3x2x1 = 6 combinations

Thoughts

The first thought is brute force: let's make every combination of three things and see if they're different. For arrays as small as the example, that would be adequate, but that's going to be an O(N^3) algorithm.

Example 3 gives the game away. We don't need to enumerate the combinations, we just have to calculate the number of possibilities. This just got a lot easier.

Again we're going to have a frequency hash. Not only will that help with the calculation, the keys of the hash will all be different, trivially giving us n[i] ≠ n[j] ≠ n[k].

Let's go to the code, Bob. First, we already saw above how to make a frequency hash. Let's do that, and then make an array of the keys.

    my %freq; $freq{$_}++ for @num;
    my @n = keys %freq;
Enter fullscreen mode Exit fullscreen mode

Getting triplets out of @n can be done with an idiomatic triple loop that covers every combination of indices once.

      0 | 1 | 2 | 3 | 4 | 5 | . . . | n-3 | n-2 | n-1
   i: --------i----------------------->:
   j:             j------------------------->:
   k:                 k--------------------------->;
Enter fullscreen mode Exit fullscreen mode

The i index starts at zero and goes not quite up to the end, leaving one for j and one for k. The j index starts to the right of i, and ends where it leaves one element for k. The k index moves to the right of j and ends at the end of the array.

With i, j, and k in hand, we merely need to multiply the counts from the frequency hash.

    my $count = 0;
    for ( my $i = 0; $i <= $#n - 2 ; $i++ )
    {
        for ( my $j = $i+1; $j <= $#n-1; $j++ )
        {
            for ( my $k = $j+1 ; $k <= $#n; $k++ )
            {
                $count += $freq{$n[$i]} * $freq{$n[$j]} * $freq{$n[$k]};
            }
        }
    }
    return $count;
Enter fullscreen mode Exit fullscreen mode
💖 💪 🙅 🚩
boblied
Bob Lied

Posted on September 15, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related