How to visualize timeline of a Wiki article?
Sanjaya Kumar Saxena
Posted on April 18, 2023
Automatically generating a timeline (a graphical representation of a time period on which important events are marked) from a Wikipedia article is a fascinating idea, and such a timeline is very useful for quickly grasping the historical perspective. This post outlines an approach to creating a well-formatted timeline from any Wikipedia article using winkNLP's API and its Named Entity Recognition (NER) feature:
- Fetch the article's contents and convert them into a winkNLP document.
- Iterate through the detected entities and keep only those of type DATE.
- Use the token shapes of each date to keep only the dates that new Date() can parse, and convert them into standard Unix time.
- Use the parentSentence() API to extract the sentence containing the date; also markup() the date to highlight it within that sentence.
- Collect each Unix time and sentence pair in an array and sort the array by Unix time.
- Convert this array into a well-formatted timeline using Observable's capabilities along with some CSS.
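The shape-based filter and the conversion to Unix time can be tried out on their own, without winkNLP. The following is a minimal sketch under stated assumptions: the names `hasConvertibleShape`, `toUnixTime` and `ACCEPTED_SHAPE_PATTERNS` are hypothetical, and the shapes are passed in as plain string arrays of the kind `its.shape` produces (the original filter suggests, for example, `'Xxxxx'` for a long month name such as "November" and `'dddd'` for a year such as "2022").

```javascript
// Hypothetical list of token-shape prefixes accepted as "parseable by new Date()".
// It mirrors the shape checks in the timeline code of this post.
const ACCEPTED_SHAPE_PATTERNS = [
  ['dddd'],                 // "2022"
  ['Xxxxx', 'dddd'],        // "November 2022"
  ['Xxxx', 'dddd'],         // "June 2022"
  ['dd', 'Xxxxx', 'dddd'],  // "18 November 2022"
  ['dd', 'Xxxx', 'dddd'],   // "18 June 2022"
  ['d', 'Xxxxx', 'dddd'],   // "6 November 2022"
  ['d', 'Xxxx', 'dddd'],    // "6 June 2022"
];

// True when the leading shapes match one accepted pattern.
// Like the original filter, only a prefix is compared; extra trailing tokens are ignored.
function hasConvertibleShape(shapes) {
  return ACCEPTED_SHAPE_PATTERNS.some((pattern) =>
    pattern.every((shape, i) => shapes[i] === shape)
  );
}

// Converts a date string to Unix time in seconds.
// A month-year string such as "November 2022" has no day, so "1 " is
// prepended to place it on the first day of that month.
function toUnixTime(dateText) {
  const text = /^\d/.test(dateText) ? dateText : '1 ' + dateText;
  return new Date(text).getTime() / 1000;
}
```

For example, `toUnixTime('November 2022')` and `toUnixTime('1 November 2022')` yield the same value, which lets month-year dates sort alongside full dates. A three-letter month such as "May" would presumably have a different shape and is not covered by this pattern list.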
The above approach is realized in about 30 lines of code:
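```javascript
// Observable cell. Assumes other cells define `nlp` (e.g. winkNLP( model )),
// `its` (nlp.its) and `WikiArticleTitle` (e.g. a text input).
timeLine = {
  const title = WikiArticleTitle || '2022 United Nations Climate Change Conference';
  // encodeURIComponent keeps titles containing '&', '?' or '#' from breaking the query string
  const response = await fetch( `https://en.wikipedia.org/w/api.php?action=query&prop=extracts&titles=${encodeURIComponent( title )}&explaintext=1&formatversion=2&format=json&origin=*` );
  const body = await response.json();
  // formatversion=2 returns `pages` as an array; explaintext=1 yields plain text
  const text = body.query.pages[ 0 ].extract;
  const doc = nlp.readDoc( text || '' );
  const timeline = [];
  doc
    .entities()
    .filter( ( e ) => {
      const shapes = e.tokens().out( its.shape );
      // We only want dates that can be converted to an actual
      // time using new Date(), e.g. "2022", "November 2022", "6 November 2022"
      return (
        e.out( its.type ) === 'DATE' &&
        (
          shapes[ 0 ] === 'dddd' ||
          ( shapes[ 0 ] === 'Xxxxx' && shapes[ 1 ] === 'dddd' ) ||
          ( shapes[ 0 ] === 'Xxxx' && shapes[ 1 ] === 'dddd' ) ||
          ( shapes[ 0 ] === 'dd' && shapes[ 1 ] === 'Xxxxx' && shapes[ 2 ] === 'dddd' ) ||
          ( shapes[ 0 ] === 'dd' && shapes[ 1 ] === 'Xxxx' && shapes[ 2 ] === 'dddd' ) ||
          ( shapes[ 0 ] === 'd' && shapes[ 1 ] === 'Xxxxx' && shapes[ 2 ] === 'dddd' ) ||
          ( shapes[ 0 ] === 'd' && shapes[ 1 ] === 'Xxxx' && shapes[ 2 ] === 'dddd' )
        )
      );
    } )
    .each( ( e ) => {
      // Highlight the date inside its sentence (wrapped in <mark> tags by default)
      e.markup();
      const date = e.out();
      // A month-year date has no day, so prepend "1 " to parse it as the 1st of the month
      const eventDate = isNaN( date[ 0 ] ) ? '1 ' + date : date;
      const unixTime = new Date( eventDate ).getTime() / 1000;
      // Skip anything new Date() still could not parse (an Invalid Date gives NaN)
      if ( Number.isNaN( unixTime ) ) return;
      timeline.push( {
        date,
        unixTime,
        sentence: e.parentSentence().out( its.markedUpText )
      } );
    } );
  return timeline.sort( ( a, b ) => a.unixTime - b.unixTime );
}
```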
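The final step, turning the sorted array into a formatted timeline, is done in the notebook with Observable and some CSS that this post does not reproduce. A rough, hypothetical stand-in is a function that builds an HTML list from the `{ date, unixTime, sentence }` objects returned above; the name `renderTimeline` and all class names are illustrative assumptions, not the notebook's actual code.

```javascript
// Escapes text so it can be placed safely inside HTML markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical renderer: builds an ordered HTML list, one item per event.
// Class names are illustrative; style them with whatever CSS you prefer.
function renderTimeline(events) {
  const items = events
    .map(
      (ev) =>
        '<li class="timeline-item">' +
        `<span class="timeline-date">${escapeHtml(ev.date)}</span>` +
        // The sentence already contains <mark> tags added by winkNLP's markup(),
        // so it is inserted as-is rather than escaped.
        `<p class="timeline-text">${ev.sentence}</p>` +
        '</li>'
    )
    .join('\n');
  return `<ol class="timeline">\n${items}\n</ol>`;
}
```

In a browser, the returned string could be assigned to a container element's innerHTML; a CSS rule on `.timeline` (for example, a left border with each `.timeline-item` offset from it) would then give the familiar vertical timeline look.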
You can see it in action in the interactive Observable notebook "How to visualize timeline of a Wiki article?".
About winkNLP
WinkNLP is a developer-friendly JavaScript library for Natural Language Processing (NLP). It can process large amounts of raw text at speeds of over 650,000 tokens/second on an M1 MacBook Pro, in both browser and Node.js environments. It even runs smoothly in a low-end smartphone's browser.
It is built from the ground up with a lean code base that has no external dependencies. Test coverage of ~100% and compliance with the Open Source Security Foundation best practices make winkNLP the ideal tool for building production-grade systems with confidence.