It's Easy to Build: Custom docs project | Part 2 - HTML to MD

cdadityang

Aditya Nagla

Posted on April 22, 2020


Hello there, welcome to part 2 of my series on building a custom docs project. Since I'm kind of a DRY person, I won't repeat myself here: see part 1 of this series for the introduction and problem statement of this project.

In this tutorial, we'll learn how to convert HTML code to Markdown text using the reverse_markdown gem. We'll also learn how to define a custom HTML tag (<bold></bold>) and parse it with this gem.

Note: This part is not something that we'll implement in our docs project. I wrote it to show that you can do something like this in your future projects.

Getting Started


For this tutorial, you'll need Ruby installed on your machine. You can use this GoRails guide to set up Ruby.

First, create a new folder and cd into it:

$ mkdir htmltomd
$ cd htmltomd

Install the reverse_markdown gem:

  • First, you need to create a file named Gemfile
$ touch Gemfile
  • Then add reverse_markdown gem to this file:
# Gemfile
source 'https://rubygems.org'
git_source(:github) { |repo| "https://github.com/#{repo}.git" }

# Your ruby version
## It's a good practice to mention it here!
ruby '2.6.5'

gem 'reverse_markdown', '~> 2.0'

Install the dependencies:

$ bundle install

Now create four files:

  • config.rb: We'll write our ruby code logic in this file
  • bold_converter.rb: Our example uses a custom tag, <bold></bold>, which reverse_markdown does not know about, so we'll add the logic for it here.
  • index.html: We'll write our HTML code in this file
  • hello.md: The output of the converted HTML code to markdown will be injected in this file.
$ touch config.rb
$ touch bold_converter.rb
$ touch index.html
$ touch hello.md

Codebase:

Let's start writing the code:

First, open up your favorite editor and write some sample HTML in the index.html file:

<h1>Hello world</h1>

<p>Hey there</p>

<hr />

<blockquote>Let there be the end</blockquote>

<table>
  <tr>
    <td>Hello 1</td>
    <td>Hello 2</td>
  </tr>
  <tr>
    <td>Body 1</td>
    <td>Body 2</td>
  </tr>
  <tr>
    <td>Body 3</td>
    <td>Body 4</td>
  </tr>
</table>

<bold>This is bold</bold>

Now, open the config.rb file and write our logic:

# First, we need to import `reverse_markdown` library
require 'reverse_markdown'

# Then import the `bold_converter.rb` file (we'll write it further below)
require './bold_converter'

# Now let us open our `index.html` and `hello.md` files:
## `index.html` is opened read-only ('r')
## `hello.md` is opened in read-write mode with truncation ('w+')
### This means that every time we run the script,
### `hello.md` is emptied and then rewritten.
index_html_file = File.open('./index.html', 'r')
hello_md_text_file = File.open('./hello.md', 'w+')

# Read the contents of `index.html` file
html_code = index_html_file.read

# Register our custom `<bold></bold>` tag (the converter class lives in `bold_converter.rb`)
ReverseMarkdown::Converters.register :bold, ReverseMarkdown::Converters::Bold.new

# Config for reverse_markdown
ReverseMarkdown.config do |config|
  ## If there is an unknown tag, raise an error
  config.unknown_tags = :raise
  config.github_flavored = true
end

# Now, convert the HTML code to MD
md_text = ReverseMarkdown.convert(html_code)

# Now, inject the MD text in `hello.md` file
hello_md_text_file.puts md_text

# Now, close the files
index_html_file.close
hello_md_text_file.close
  • This was simple; everything is explained in the code comments above, and to summarize:
  • We first import the reverse_markdown gem.
  • Then we import our bold_converter.rb file, which we'll write in the next step.
  • Then we import our index.html and hello.md files.
  • We then register our new Bold tag, add config to reverse_markdown and then convert our HTML code to MD.
  • We inject the converted MD text to hello.md file.
  • Finally, we close both the files as a good practice.
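  • If you run the script now, it will fail, because the Bold converter class is not defined yet (bold_converter.rb is still empty). Note also that if no converter were registered for <bold></bold> at all, reverse_markdown would raise an error on that tag, because we've set config.unknown_tags = :raise to raise errors on unknown tags. For details on the other options, see the docs. We'll define the converter now!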

Now, open the bold_converter.rb file and write our custom tag logic:

# bold_converter.rb

module ReverseMarkdown
  module Converters
    class Bold < Base
      # This is the main convert logic
      def convert(node, state = {})
        content = treat_children(node, state.merge(already_bolded_out: true))
        if content.strip.empty? || state[:already_bolded_out]
          content
        else
          " **#{content}**"
        end
      end
    end
  end
end
  • The treat_children method converts all child nodes. We pass already_bolded_out: true down to them, so a <bold></bold> tag nested inside another <bold></bold> tag returns its content without adding a second pair of markers.
  • If the content is empty or only whitespace, we also return it unchanged.
  • Otherwise, we wrap the content in **.
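
To see why the state flag matters without installing anything, here is a toy model in plain Ruby. It is not reverse_markdown's own code: the Node struct and the convert_bold helper are hypothetical stand-ins for a parsed element and the converter, written only to show how passing already_bolded_out: true down to the children keeps nested <bold> tags from producing doubled markers.

```ruby
# Hypothetical stand-in for a parsed element: a tag name plus children,
# where each child is either a String (text) or another Node.
Node = Struct.new(:name, :children)

# Toy version of the converter logic above, using a plain hash as state.
def convert_bold(node, state = {})
  # Convert the children first, telling them they are already inside bold.
  content = node.children.map do |child|
    if child.is_a?(Node)
      convert_bold(child, state.merge(already_bolded_out: true))
    else
      child
    end
  end.join

  # Skip the markers for blank content or when an ancestor adds them.
  if content.strip.empty? || state[:already_bolded_out]
    content
  else
    " **#{content}**"
  end
end

nested = Node.new('bold', ['outer ', Node.new('bold', ['inner'])])
blank  = Node.new('bold', ['   '])

nested_result = convert_bold(nested)
blank_result  = convert_bold(blank)

p nested_result # prints " **outer inner**"
p blank_result  # prints "   "
```

Only the outermost tag adds the markers; the inner one sees the flag in its state and returns its text untouched.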

Running the script:

  • Run the script by using this command:
$ ruby config.rb

Now, in the hello.md file, you'll see the HTML to MD converted text.
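
One possible refinement of config.rb, sketched here with only Ruby's standard library: File.read and File.write open and close the file for you, so the explicit close calls become unnecessary. To keep the sketch runnable on its own, it works in a temporary directory and uses an upcased string as a placeholder where the real script would call ReverseMarkdown.convert.

```ruby
require 'tmpdir'

# Dir.mktmpdir returns the value of its block and deletes the directory after.
result = Dir.mktmpdir do |dir|
  input_path  = File.join(dir, 'index.html')
  output_path = File.join(dir, 'hello.md')

  # Create a sample input file (in the tutorial this file already exists)
  File.write(input_path, "<h1>Hello world</h1>\n")

  # File.read opens the file, returns its contents, and closes it again
  html_code = File.read(input_path)

  # Placeholder for the real step: ReverseMarkdown.convert(html_code)
  md_text = html_code.upcase

  # File.write creates or truncates the file, writes to it, and closes it
  File.write(output_path, md_text)

  File.read(output_path)
end

puts result
```

The behaviour matches the 'r' and 'w+' handles used in the script, minus the bookkeeping: the output file is still overwritten on every run.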

Summary and conclusion


Let's take a quick look at what we learned today:

  1. First of all, I introduced myself, so don't forget to connect with me on Twitter or elsewhere.
  2. Since I'm a DRY person, I referred to my part 1 article where you can see the introduction and problem statement.
  3. Then we saw what the prerequisites are, and we also set up and understood our file structure for this tutorial.
  4. We then wrote our Ruby code to solve our problem, i.e., to convert HTML code to Markdown text and inject it into a hello.md file.
  5. Finally, we took a quick look at this very summary...

That's it, and let there be the end. 🙏
