Working with Markdown in Python

honeybadger_staff

Honeybadger Staff

Posted on February 20, 2023

This article was originally written by Ravgeet Dhillon on the Honeybadger Developer Blog.

If you use the Internet, you have surely come across the term Markdown. Markdown is a lightweight markup language that makes it very easy to write formatted content. It was created by John Gruber and Aaron Swartz in 2004. It uses very easy-to-remember syntax and is therefore used by many bloggers and content writers around the world. Even this blog that you are reading is written and formatted using Markdown.

Markdown is one of the most widely used formats for writing and storing formatted text. It integrates easily with web technologies because Markdown can be converted to HTML, and HTML back to Markdown, with conversion tools. It lets you produce HTML elements, such as headings, lists, images, links, and tables, without much effort or code. It is used in blogs, content management systems, wikis, documentation, and many other places.

In this article, you'll learn how to work with Markdown in a Python application using different Python packages, including markdown, python-frontmatter, and markdownify.

Prerequisites

To follow along with this tutorial, you’ll need the following:

  • Python v3.x
  • Basic understanding of HTML and Markdown

Setting Up a Project

Before proceeding with the project, you’ll need to set up a project directory to work in.

First, open your terminal, navigate to a path of your choice, and create a project directory (python-markdown) by running the following commands:

mkdir python-markdown
cd python-markdown

Next, create and activate a virtual environment (venv) for your Python project by running the following commands:

python3 -m venv venv
source venv/bin/activate

That’s it. The project setup is complete.

Converting Markdown to HTML in Python

One of the most common operations related to Markdown is converting it to HTML. By doing so, you can write your content in Markdown and then compile it to HTML, which you can then deploy to a CDN or server.

First, install the Python-Markdown package (published on PyPI as markdown) by running the following command in the terminal:

pip install markdown

Next, at your project’s root directory, create a main.py file and add the following code to it:

# 1
import markdown

markdown_string = '# Hello World'

# 2
html_string = markdown.markdown(markdown_string)
print(html_string)

In the above code, you are doing the following:

  1. Importing the markdown module.
  2. Converting the Markdown (markdown_string) to HTML (html_string) using the markdown function from the markdown package.

Finally, save your code and run the main.py file by running the following command in the terminal:

python main.py

Once the code execution is complete, you’ll get the HTML output as follows:

Markdown to HTML.

You can try a more complex Markdown string like the one in the code below and use it to create HTML:

markdown_string = '''
# Hello World

This is a **great** tutorial about using Markdown in [Python](https://python.org).
'''

In this example, you make use of headings, bold text, and links in Markdown.

Markdown to HTML.
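As a rough guide to what to expect, the sketch below runs the same string through markdown.markdown and notes in comments how each piece of syntax is rendered. Exact whitespace in the output may vary between package versions, so treat the comments as a description rather than a byte-for-byte promise.

```python
import markdown

markdown_string = '''
# Hello World

This is a **great** tutorial about using Markdown in [Python](https://python.org).
'''

html_string = markdown.markdown(markdown_string)
print(html_string)

# What to look for in the printed HTML:
#   "# Hello World"                -> an <h1> element
#   "**great**"                    -> <strong>great</strong>
#   "[Python](https://python.org)" -> <a href="https://python.org">Python</a>
```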

Converting a Markdown File to HTML in Python

Most of the time, you’ll be working with Markdown files rather than Markdown strings. Therefore, it makes sense to learn how to convert a Markdown file to an HTML file.

To do so, first, create a sample.md file and add the following code to it:

# Hello World

This is a **Markdown** file.

Next, replace the existing code in the main.py file with the following:

import markdown

# 1
with open('sample.md', 'r') as f:
    markdown_string = f.read()

# 2
html_string = markdown.markdown(markdown_string)

# 3
with open('sample.html', 'w') as f:
    f.write(html_string)

In the above code, you are doing the following:

  1. Reading the sample.md and storing its content in the markdown_string variable.
  2. Converting the Markdown (markdown_string) to HTML (html_string) using the markdown function from the markdown package.
  3. Creating a sample.html file and writing the HTML (html_string) to it.

Finally, save your code and run the main.py file by running the following command in the terminal:

python main.py

Once the code execution is complete, you’ll see a sample.html file in your project’s root directory:

Markdown file to HTML file.
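Note that markdown.markdown returns only an HTML fragment (the heading and paragraph), not a complete page with <html> and <body> tags. If you want a standalone page, one option is to wrap the fragment yourself. The following is a minimal sketch; build_page is a hypothetical helper written for this illustration, not part of the markdown package, and the literal html_string stands in for the converter's output.

```python
import html


def build_page(fragment, title):
    """Wrap an HTML fragment in a minimal, standalone HTML document."""
    return (
        '<!DOCTYPE html>\n'
        '<html lang="en">\n'
        '<head>\n'
        '    <meta charset="utf-8">\n'
        # Escape the title so characters such as & or < stay valid HTML.
        f'    <title>{html.escape(title)}</title>\n'
        '</head>\n'
        '<body>\n'
        f'{fragment}\n'
        '</body>\n'
        '</html>\n'
    )


# Stand-in for the string returned by markdown.markdown(markdown_string).
html_string = '<h1>Hello World</h1>\n<p>This is a <strong>Markdown</strong> file.</p>'

page = build_page(html_string, title='Hello World')
print(page)
```

When you write the page to disk, passing encoding='utf-8' to open() keeps the file consistent with the charset declared in the <meta> tag, since the default encoding depends on the platform.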

Converting HTML to Markdown in Python

Sometimes you need to go the other way and convert HTML to Markdown. For this purpose, you can use the markdownify package in Python.

First, install the package by running the following command in the terminal:

pip install markdownify

Next, replace the existing code in the main.py file with the following:

# 1
import markdownify

html_string = '''
<h1>Hello World</h1>
<p>This is a great tutorial about using Markdown in Python.</p>
'''

# 2
markdown_string = markdownify.markdownify(html_string)
print(markdown_string)

In the above code, you are doing the following:

  1. Importing the markdownify module.
  2. Converting the HTML (html_string) to Markdown (markdown_string) using the markdownify function from the markdownify package.

Finally, save your code and run the main.py file by running the following command in the terminal:

python main.py

Once the code execution is complete, you’ll get the Markdown output:

HTML to Markdown.

If you look at the output above, you'll see that the heading (<h1>) is "underlined" with equal signs (=) instead of starting with a hash sign (#). This is because Markdown has two heading styles, Setext and ATX, and by default markdownify emits Setext-style (underlined) headings. You can configure markdownify to use ATX-style headings by passing the heading_style='ATX' parameter to the markdownify function.

Markdownify also supports a number of options, including HTML tag stripping, HTML tag conversion, Markdown heading styles, and more.

Converting an HTML File to Markdown in Python

Previously, you converted a Markdown file to an HTML file. However, sometimes you might need to convert an HTML file to a Markdown file.

To do so, first, create a sample.html file and add the following code to it:

<!DOCTYPE html>
<html lang="en">
<body>
    <h1>Hello World</h1>
    <p>This is a <strong>HTML</strong> file.</p>
    <a href="https://honeybadger.io/">Visit Honeybadger</a>
</body>
</html>

Next, replace the existing code in the main.py file with the following:

import markdownify

# 1
with open('sample.html', 'r') as f:
    html_string = f.read()

# 2
markdown_string = markdownify.markdownify(html_string, heading_style='ATX')

# 3
with open('sample.md', 'w') as f:
    f.write(markdown_string)

In the above code, you’re doing the following:

  1. Reading the sample.html and storing its content in the html_string variable.
  2. Converting the HTML (html_string) to Markdown (markdown_string) using the markdownify function from the markdownify package, with ATX-style headings.
  3. Creating a sample.md file and writing the Markdown (markdown_string) to it.

Finally, save your code and run the main.py file by running the following command in the terminal:

python main.py

Once the code execution is complete, you’ll see a sample.md file in your project’s root directory as follows:

HTML file to Markdown file.

Reading Markdown Front Matter in Python

Markdown files often carry variables or metadata that describe the file itself. This is known as front matter. Front matter variables are a great way to store extra information about a Markdown file. For example, a blog's Markdown files can have front matter variables like Title, Author, Image, Published At, and more.

You can specify front matter at the beginning of a Markdown file by placing the YAML data variables between triple-dashed lines. For example:

---
title: "Hello World"
author: John Doe
published: 2020-01-20
---

In Python, you can parse Markdown front matter with the python-frontmatter package.

To see this package in action, first, install the package by running the following command in the terminal:

pip install python-frontmatter

Next, add the following front matter to the sample.md file:

---
title: Hello World
date: 2022-01-20
---

Next, replace the existing code in the main.py file with the following:

# 1
import frontmatter

# 2
data = frontmatter.load('sample.md')

# 3
print(data.keys())
print(data['title'])
print(data['date'])

In the above code, you are doing the following:

  1. Importing the frontmatter module.
  2. Reading the sample.md file using the load function from the frontmatter package and storing the result in the data variable.
  3. Accessing the front matter variables with the help of data.keys(). Since the object returned by load behaves like a dictionary, you can also access the individual keys (data['title'] or data['date']).

Finally, save your code and run the main.py file by running the following command in the terminal:

python main.py

Once the code execution is complete, you’ll get the output of the front matter variables as follows:

Markdown front matter data.

Updating Markdown Front Matter in Python

You can also update the existing front matter variables or add new ones using the python-frontmatter package.

To do so, first, replace the existing code in the main.py file with the following:

import frontmatter

# 1
data = frontmatter.load('sample.md')

# 2
data['author'] = 'John Doe'

# 3
data['title'] = 'Bye World'

# 4
updated_data = frontmatter.dumps(data)

# 5
with open('sample.md', 'w') as f:
    f.write(updated_data)

In the above code, you are doing the following:

  1. Reading (frontmatter.load()) the sample.md file.
  2. Adding a new key (author) to the front matter data variable and assigning it a value (John Doe).
  3. Updating the existing key (title) and assigning it a new value (Bye World).
  4. Serializing (frontmatter.dumps()) the data variable to a string and storing the result in the updated_data variable.
  5. Updating the sample.md file by writing the updated Markdown (updated_data) to it.

Finally, save your code and run the main.py file by running the following command in the terminal:

python main.py

Once the code execution is complete, check the sample.md file for the updated front matter data, as follows:

Updated Markdown front matter data.

Using Python Markdown Extensions

The python-markdown package also supports extensions that allow you to modify and/or extend the default behavior of the Markdown parser. For example, to generate a table of contents (TOC), you can use the toc extension. There are other extensions, as well, which you can make use of based on your requirements.

To create a TOC for your Markdown content, first, replace the existing code in the main.py file with the following:

import markdown

# 1
markdown_string = '''
[TOC]

# Hello World

This is a **great** tutorial about using Markdown in [Python](https://python.org).

# Bye World
'''

# 2
html_string = markdown.markdown(markdown_string, extensions=['toc'])
print(html_string)

In the above code, you are doing the following:

  1. Specifying the [TOC] string in your Markdown (markdown_string) where you want to add the table of contents.
  2. Passing the extensions parameter to the markdown function from the markdown package and specifying the extensions (['toc']) you want to use.

Finally, save your code and run the main.py file by running the following command in the terminal:

python main.py

Once the code execution is complete, you’ll get the HTML output with the Table of Contents as a list:

Table of Contents.
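Extensions can be combined, and some of them expose extra data on the converter object. The sketch below uses the markdown.Markdown class instead of the markdown.markdown shortcut so that the generated table of contents is also available on its own through the toc attribute. It enables the built-in tables extension alongside toc; the sample table is made up for this illustration.

```python
import markdown

markdown_string = '''
[TOC]

# Hello World

| Package | Purpose |
| --- | --- |
| markdown | Markdown to HTML |

# Bye World
'''

# Create a reusable converter with two built-in extensions enabled.
md = markdown.Markdown(extensions=['toc', 'tables'])
html_string = md.convert(markdown_string)

print(html_string)   # full HTML, with the TOC in place of the [TOC] marker
print(md.toc)        # just the table of contents, as an HTML string
```

The toc extension also adds an id attribute to each heading, which is what the links in the table of contents point to.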

Conclusion

Learning to work with Markdown can help you in lots of ways. Using Python, you can automate many tasks, including maintaining and manipulating Markdown files. For example, you can write a script that creates an index of all the Markdown files in your blog, or one that organizes your Markdown files into different directories based on front matter variables such as tags or categories.

Honeybadger, which is a cloud-based system for real-time monitoring, error tracking, and exception-catching, also uses Markdown to maintain our documentation. In case you are interested, we wrote a blog post in which we talk about how we built a documentation workflow in Rails.
