Using a Python Markdown ast to Find All Paragraphs

waylonwalker

Waylon Walker

Posted on February 5, 2022


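While looking for a way to automatically generate descriptions for pages, I stumbled onto commonmark, a markdown parser for Python that exposes the document as an AST. It lets me walk over the markdown page and pull out only the paragraph text, skipping headings and code fences. One caveat: a paragraph nested inside a blockquote or list item is still a paragraph node, so its text is picked up as well unless you filter on the node's parent.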

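import commonmark

# p.content holds the raw markdown of the page
parser = commonmark.Parser()
ast = parser.parse(p.content)

paragraphs = ''
# the walker yields (node, entering) and visits each container twice
for node, entering in ast.walker():
    if entering and node.t == "paragraph":
        paragraphs += " "
        # literal is None when the first child is not plain text
        paragraphs += node.first_child.literal or ""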

It's also super fast. Previously I was rendering to HTML and using BeautifulSoup to get only the paragraphs. Using the commonmark AST was about 5x faster on my site.
