German Robayo
Posted on July 26, 2020
Introduction
Hi folks! I haven't posted here since two weeks ago, so I'd like to give you an update on the progress of my #GSoC project.
The main subject of this post is the recovering of prefix whitespace on Record keys, which will allow users to document record types and literals' keys.
First problem
Haskell's Dhall's parser treats both block and single-line comments as whitespace, which ignores since Dhall's is whitespace insensitive.
A limitation of this implementation is that we cannot fully reconstruct a Dhall file with the consumed whitespace and therefore the comments. This has been reported in some old issues on the repository. In particular, dhall format
removes comments after formatting the files
dhall-format removes comments #145
Example file:
--- This is dhall file
{ foo = "jax" }
The Output it gives:
{ foo = "jax" }
Due to the size of the project, some simple solutions for this have been made. The parser preserves:
- The Header whitespace before the first non-whitespace character of the file. This is what
dhall-docs
supports at this moment. - Whitespaces in-between let-bindings components, as highlighted by this haddock comment
... but no more expressions have been handled yet because it's a little onerous to code and it introduces a lot of backward-incompatible changes.
Two weeks ago, I started to add support to Record keys prefix on both record literals and types. The whole work is on the following PR
feat(comments): support prefix comments on Record's key-value pairs. #1908
This PR tries to add support to record field comments both on formatting and dhall-docs documentation extension. Specifically the following ones:
{
-- before each record field
a : Text
-- after each record field
, ...
}
I updated the Record
's Expr
constructor from:
Record (Map Text (Expr s a))
to
Record (Map Text (RecordField s a))
where RecordField
stores the last of the expression on the right side of the :
and the prefix and suffix comments. Although the prefix comment is annotating the label field, the best place to save that comment is there.
However, I'm having a problem with unsafeSubExpressions
, because it can't traverse the Maybe s
I put on RecordField
. A quick solution for that is to change those RecordField
's Maybe s
' to Nothing
, although that means we loose that information when traversing.
I'll keep this PR as a draft to get your opinion in that matter.
I made this because:
- It will be another element that
dhall-docs
will document for - It will help to fix the issue with
dhall format
removing comments - To take more knowledge of the whole
dhall
codebase.
In the following sections, I'll describe how the task was made.
New data constructors
The Dhall's AST constructors for Record literals and types are the following:
data Expr s a
= ...
-- | > Record [(k1, t1), (k2, t2)] ~ { k1 : t1, k2 : t1 }
| Record (Map Text (Expr s a))
-- | > RecordLit [(k1, v1), (k2, v2)] ~ { k1 = v1, k2 = v2 }
| RecordLit (Map Text (Expr s a))
...
They don't contain any property that will allow us to recover whitespace. So the first thing we need to do is to modify the constructors to include them. I ended up writing the following data-type:
data RecordField s a = RecordField
{ recordFieldSrc :: Maybe s
, recordFieldValue :: Expr s a
} deriving (Data, Eq, Foldable, Functor, Generic, Lift, NFData, Ord, Show, Traversable)
And modify the Expr s a
constructors like this:
data Expr s a
=
-- | > Record [ (k1, RecordField _ t1) ~ { k1 : t1, k2 : t1 }
-- > , (k2, RecordField _ t2)
-- > ]
| Record (Map Text (RecordField s a))
-- | > RecordLit [ (k1, RecordField _ v1) ~ { k1 = v1, k2 = v2 }
-- > , (k2, RecordField _ v2)
-- > ]
| RecordLit (Map Text (RecordField s a))
...
The captured whitespace is the prefix of the label. Although the Map
on the constructor maps labels to values, the less cumbersome place to save that was on the RecordField
data-type. This will allow us to map the following:
{ -- my single-line before `a`
a = 1
}
To something (roughly) like this:
RecordLit [("a", RecordField "-- my single-line before `a`" (NaturalLit 1)]
Note that we don't save suffix whitespace right after each record value because the Dhall's parser consumes them and there is no pretty way that doesn't involve modifying a lot the parser to fix that. Thankfully, places like the following:
a {- A -} : {- B -} B
will be easy to recover in the future and add value in the context of #145. I didn't add support for those yet since I feel that people won't write a lot of useful documentation on that place, and I wanted to finish this as quickly as possible to start other tasks more directly related to my project.
After modifying those records, the cumbersome part of the task came in: tracking the type-checking errors on the whole project was exhausting. The good thing is that it wasn't difficult, and it helped me learn more about the whole project, which is good :)
Parser modifications
Dhall's parser is located at the Dhall.Parser.Expressions
module. It is built using megaparsec
parser combinators. The relevant sections of the code are here on alternative04
:
alternative04 = (do
_openBrace
src0 <- src whitespace
mComma <- optional _comma
-- `src1` corresponds to the prefix whitespace of the first key-value
-- pair. This is done to avoid using `try` to recover the consumed
-- whitespace when the comma is not consumed
src1 <- case mComma of
Nothing -> return src0
Just _ -> src whitespace
a <- recordTypeOrLiteral src1
_closeBrace
return a ) <?> "literal"
and here:
recordTypeOrLiteral firstSrc0 =
choice
[ emptyRecordLiteral
, nonEmptyRecordTypeOrLiteral firstSrc0
, emptyRecordType
]
On the first snippet note that we parse the prefix whitespace of the first key-value pair of the record, and pass it to the recordTypeOrLiteral
combinators. That allows us to share that whitespace over all possible choice
s on recordTypeOrLiteral
efficiently. If we didn't do that, we would recur to the try
function from megaparsec
that backtracks on parse failures, which increases the benchmark numbers.
By doing all this work, the prefix whitespace of the record was preserved. You can check out that by using the following example:
{ -- I'm recovered
a = 1,
{- me too! -}
b = 2,
}
...on dhall haskell-syntax-tree --noted
To end
This was a hard job. I feel great that I'm contributing not only to my particular project but the whole Dhall's Haskell's implementation.
I'd like to thank my mentors Simon and Gabriel for their help with this PR not only by giving me suggestions and guidelines but on the exhausting job of reviewing that huge PR.
Thanks for reading!
Posted on July 26, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.