Release 0.9.0 of `@xmldom/xmldom`
Christian Bewernitz
Posted on August 29, 2024
Context
xmldom is a javascript ponyfill to provide the following APIs that are present in modern browsers to other runtimes:
- convert an XML string into a DOM tree
new DOMParser().parseFromString(xml, mimeType) => Document
- create, access and modify a DOM tree
new DOMImplementation().createDocument(...) => Document
- serialize a DOM tree back into an XML string
new XMLSerializer().serializeToString(node) => string
Source: xmldom readme
History
Since I started contributing to the forked `xmldom` library in June 2020, there have been 40 releases.
It is a very interesting and challenging project and will most likely stay that way for quite a while.
According to GitHub, over 50 people have contributed to it since it was forked.
Thank you again to all contributors.
And this doesn't count all the people who managed to make the move from the original unscoped `xmldom` package to the scoped `@xmldom/xmldom` package version 0.7.0 to get all security fixes.
The most recent version released under the `lts` tag is 0.7.13.
The last version with breaking changes was 0.8.0, which was released on Dec 22, 2021, almost 3 years ago.
The most recent version released as `latest` is 0.8.10.
0.9.0 (2024-08-29)
But what I want to talk about today is all the stuff that has been released under the `next` tag since October 2022.
I'm really excited about those changes since they are providing a clear foundation for potential future changes.
TLDR: More alignment with the specs, and differences are made as explicit as possible.
1. Enforcing `mimeType` to give back control
One aspect that makes the implementation complex is that there are different rules for parsing XML vs HTML.
`xmldom` (to some degree) "supported" both flavors from the beginning. It was not even required to pass a `mimeType` at all: which rules to apply was decided based on the current default namespace of the XML string/node being parsed.
This ends with 0.9.0: From now on, the `mimeType` in `DOMParser.parseFromString(xml, mimeType)` is mandatory and is the only thing that is ever checked to decide whether to apply XML or HTML rules. Basta.
And that information is preserved in the resulting `Document` (new `type` property), so when serializing it, the proper rules are applied again.
This was a massive (and potentially breaking) change, but I'm really excited it is ready, since it made tons of related bug fixes possible/way simpler to implement and also reduces the complexity of the API and the implementation.
Additionally, it now only accepts the mime types specified, and throws a `TypeError` in any other case.
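To make that rule a bit more tangible, here is a minimal sketch of such a check. It is not xmldom's actual code: `rulesForMimeType` is a hypothetical helper, and the list of accepted values is an assumption taken from the `SupportedType` enum of the DOMParser specification.

```javascript
// Illustrative sketch only, NOT xmldom's actual implementation.
// Assumption: the accepted values mirror the `SupportedType` enum
// of the DOMParser specification.
const SUPPORTED_MIME_TYPES = new Set([
  'text/html',
  'text/xml',
  'application/xml',
  'application/xhtml+xml',
  'image/svg+xml',
]);

// Hypothetical helper: returns which rule set a parser would apply.
function rulesForMimeType(mimeType) {
  if (!SUPPORTED_MIME_TYPES.has(mimeType)) {
    // Missing or unknown mime types are rejected instead of guessed.
    throw new TypeError(`unsupported mimeType: ${String(mimeType)}`);
  }
  // Only `text/html` selects HTML rules; every other supported type is XML.
  return mimeType === 'text/html' ? 'html' : 'xml';
}
```

The point of the sketch is that the decision depends on nothing but the `mimeType` argument, so the content of the string being parsed can no longer change which rules apply.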
Strictness and Error handling
An aspect that personally confuses me about the error handling of the native browser API is that it always returns a `Document` and, if something went wrong, a `parsererror` node will be the first child of the `body`.
Since error handling never worked this way in `xmldom`, but the existing error handling was very complex, confusing and badly documented, 0.9.0 simplifies it and now has a (way more) consistent behavior towards any potential error that happens during parsing:
It throws a `ParseError` 🎉, e.g. in one of the following cases:
- In previous versions it was possible for some non well-formed XML strings that the returned `Document` would not have a `documentElement`, which will most likely lead to `TypeError`s later in the code.
- Several non well-formed XML strings will now properly be reported as `fatalError`, which now always prevents any further processing.
- Several things that have previously not been reported as an error, or have only been reported as a `warning`, are now also reported as a `fatalError`.
There are still cases left which are reported as a `warning` (especially when parsing HTML) or as an `error` which do not stop the data from being processed, but the new error handling makes it very easy to decide how strict the code that uses `xmldom` needs to be.
The (non spec compliant) option that can be passed to the `DOMParser` constructor is called `onError`.
It takes a function with the following signature:
function onError(level:ErrorLevel, message:string, context: DOMHandler):void;
- `ErrorLevel` is either `warning`, `error` or `fatalError`
- xmldom already provides an implementation for the two most common use cases:
  - `onErrorStopParsing` to throw a `ParseError` also for all `error` level issues
  - `onWarningStopParsing` to throw a `ParseError` also for all `warning` level issues

It is recommended to apply one of them to stop processing XML at the first sign of anything unexpected:
// prevent parsing of XML that has `error`s
new DOMParser({onError: onErrorStopParsing}).parseFromString(...)
// prevent parsing of XML that has `warning`s
new DOMParser({onError: onWarningStopParsing}).parseFromString(...)
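If neither of the two provided functions fits, a custom handler with the same signature is an option. The following is only a sketch: `createCollectingOnError` is a hypothetical name, and it assumes that throwing from the handler aborts parsing, which is how the two provided functions are described above.

```javascript
// Hypothetical handler factory matching the
// `onError(level, message, context)` signature described above.
function createCollectingOnError(warnings) {
  return function onError(level, message, context) {
    if (level === 'warning') {
      // Keep parsing, but remember the message for later inspection.
      warnings.push(message);
      return;
    }
    // Anything more severe than a warning: abort by throwing.
    // Assumption: throwing here stops the parser.
    throw new Error(`[${level}] ${message}`);
  };
}

// Usage (not executed here, it requires the package to be installed):
//   const { DOMParser } = require('@xmldom/xmldom');
//   const warnings = [];
//   new DOMParser({ onError: createCollectingOnError(warnings) })
//     .parseFromString(xml, 'text/xml');
```

This keeps lax input usable while still making every warning visible to the calling code.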
`compareDocumentPosition`, extended HTML entities, `null` instead of `undefined`, ...
Another fork of the original `xmldom` repository made its way back into our repo by extending the HTML entities to the complete set (also available in 0.8.x) and porting over the implementation of the `compareDocumentPosition` API. Thank you, and welcome @zorkow!
Along the way, several places where xmldom so far returned `undefined` instead of `null` have been fixed to adhere to the spec.
And I discovered that the former author seems to have preferred iterating from the end of a list in so many places, that attributes were processed in the reverse order in multiple places, which is now fixed.
The implementation of the `removeChild` API changed quite a bit to comply with the spec, and it now throws a `DOMException` when it should.
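For readers who don't have the spec in mind: the DOM standard requires `removeChild` to throw a `NotFoundError` `DOMException` when the given node is not a child of the node it is called on. The sketch below shows only that one rule on a minimal stand-in class. `SketchNode` is hypothetical and not xmldom's code, and the sketch assumes a runtime with a global `DOMException` (e.g. Node 17 or later).

```javascript
// Minimal stand-in for a DOM node, for illustration only.
class SketchNode {
  constructor(name) {
    this.name = name;
    this.parentNode = null;
    this.childNodes = [];
  }

  appendChild(child) {
    // Detach from a previous parent first, so a node has one parent at most.
    if (child.parentNode) child.parentNode.removeChild(child);
    child.parentNode = this;
    this.childNodes.push(child);
    return child;
  }

  removeChild(child) {
    // Spec rule: the node to remove has to be a child of this node.
    if (child.parentNode !== this) {
      throw new DOMException('The node to be removed is not a child of this node.', 'NotFoundError');
    }
    this.childNodes.splice(this.childNodes.indexOf(child), 1);
    child.parentNode = null;
    return child;
  }
}
```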
And 3 related bugs were fixed in a way that clearly states what the future direction of `xmldom` is:
Support for lax HTML parsing rules will only be provided if proper strict XML parsing doesn't suffer from it.
The former (broken) "support" for automatic self closing tags in HTML is gone.
doctype `internalSubset`
More recently, @shunkica invested a huge amount of time and effort to fix tons of issues in the former handling of the `internalSubset` part of the `!DOCTYPE`.
It is now preserved as part of the `internalSubset` property of the `doctype` of a `Document`, and many wrong doctype declarations are now correctly detected as such and reported as a `fatalError`.
Also thanks to @kboshold for the latest bug fix in this area.
Along the way we created a new module containing regular expressions for the relevant grammar, and correctness checks are based on those and they are properly covered by tests.
It is not the goal of xmldom to become a validating parser, but this is a great step to support those documents that come with more complex DTDs.
And there is even more
Up to now development was done using Node v10, since this is also the lowest version xmldom currently supports. As part of the work on the upcoming version, I decided to switch to v18 for development, since more and more devDependencies also made this a minimum requirement. This will be the new minimum runtime version for the time being starting with this release.
I initiated a public poll / discussion to ask people which version of Node or other runtimes they need support for.
The next breaking release will most likely drop support for some older Node versions, if there is no feedback indicating something different.
Along the way plenty of APIs have received jsdoc comments with proper types.
Thank you
for taking the time to read through all of this.
Those are quite some changes, and I'm very excited to be able to ship those.
I hope you are as excited as I am :)
If you need more details you can go through the very detailed changelog, or head over to the repository and join or start a discussion or file an issue.