Justifying a paragraph of text in Purescript: Part 1
druchan
Posted on July 18, 2023
Following-up on the leetcode stuff I was doing recently, I
decided to pick up another Leetcode problem. This time,
text justification algorithm.
Quick rundown of the problem:
- you're given an array of words (eg
["This", "is", "an", "example", "of", "text", "justification."]
) - and a max-length per line (eg
16
) - and you have to space the words in such a way that the resulting text/paragraph is "justified".
The rule for spacing is simple:
- one space between words
- if that does not fit the line (length of line = 16), introduce more spaces between words
- more spaces get distributed left to right so spaces on the left would get filled / incremented first and then the right.
- the last line can be left-aligned so just one space between the words is fine.
I spent a few minutes trying to find out how to extract the collection of words that would fit a line.
That is, if this is the text:
["This", "is", "an", "example", "of", "text", "justification."]
and the max length per line is 16,
the first line can only have ["This", "is", "an"]
.
This is because if I include the word "example", then the length of this line exceeds 16. Remember that I have to include spaces between the words as well in the calculation of line length.
I decided to park this and assume there's a function that gives me a "valid" line already and work on calculating the number of spaces to distribute to justify a "valid" line.
(A valid line is basically an array of words that can fit within the max-length per line including at least 1 space between each word).
In this example, some valid lines would be:
valid lines:
["This", "is", "an"]
["example", "of", "text"]
I decided to use a newtype
to differentiate between any random array of words and a valid line:
newtype ValidLine = ValidLine (Array String)
What I have to get to, as a first step, is this function:
validLineToParaLine :: Int -> ValidLine -> String
validLineToParaLine maxLen validLine = ? -- to be implemented
After a while of thinking about this, here's one logic I came up with:
- the actual length of the line is length of each word in the valid line, and spaces between them.
- the total number of spaces (single-space) in a valid line array is simply one less than the total number of words in the valid line array.
- if I subtract the actual length of the valid line from the deficit, I get the number of extra spaces I have to distribute.
This gives me the number of spaces to distribute.
Here's an example of the logic at work:
valid line -> ["This", "is", "an"]
spaces -> total words in valid line (3) minus 1 = 3-1 = 2
total characters -> sum of length of each word in valid line (4+2+2=7) + spaces(2) = 8+2 = 10
deficit -> max length (16) minus total chars (10) = 16-10 = 6
Okay, so now I have "extra number of spaces to distribute" in the line.
But it got tricky here as I had to find out how to distribute x
number of spaces across y
number of space-slots.
That is:
deficit = 6
6 spaces have to be distributed in this line: "This is an".
Visually, it looks easy. There are just 2 space-slots. Divide 6 by 2 = 3. So each slot gets 3 extra spaces. (Will replace space with hyphen to be more clear in the representation)
This----is----an
But what is the general logic here?
After much thought, I came up with what looks like a slightly-complicated solution.
Here's the logic:
- First off, instead of using the "deficit" (that is, how many more spaces need to be distributed besides the usual number of spaces), I combined the deficit with existing spaces so that now, I just have to worry about how many spaces to distribute in total between the words.
- Using the "total spaces to distribute" number, I construct another array that represents the number of spaces to actually put between each word in the valid line.
Here's an example of point #2:
valid line -> ["This", "is", "an"]
total spaces to -> usual spaces (2) + deficit (6) = 8
distribute
space distribution -> ["----","----"]
array
If the total spaces to distribute was 9, then the array would look like this:
["-----","----"]
That is, the first item will have 5 spaces and the second will have 4.
If I have this array, I could simply merge these two arrays in some way to get the final result:
valid line -> ["This", "is", "an"]
spaces array -> ["----","----"]
justified -> ["This","----","is","----","an"]
The trick is to find out how to go from a "number of spaces to distribute" to a "special spaces array".
I'll post about that in the second part.
Posted on July 18, 2023
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.