Sulman Baig
Posted on October 31, 2024
As developers, we often face the challenge of understanding a large codebase, whether it's a project we haven't touched in years or someone else's code. With complex folder structures and numerous files, it can be overwhelming to locate specific components or simply grasp the bigger picture. This is especially true when using AI tools like Perplexity or Claude, where attaching an entire codebase for reference isn't practical. What if you could easily convert your entire codebase into a readable Markdown file? Enter my Ruby script that documents a project by turning all of its files into a single Markdown file.
The Idea Behind the Script
The main goal of this script was to create something lightweight that helps navigate a codebase, making it easier to reference files without manually piecing together content. AI tools are great at generating insights, but asking for help often involves attaching large chunks of code. By converting your project into a Markdown file with an organized table of contents, you can easily share a high-level overview of the project—along with specific snippets—to get effective help without overwhelming anyone.
The Script
Gist: Generating a Markdown Documentation for Your Codebase with Ruby
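```ruby
# Folders that are always skipped, whether or not .gitignore lists them
ALWAYS_IGNORE = ['.git', 'tmp', 'log', '.ruby-lsp', '.github', '.devcontainer'].freeze

# Read ignore patterns from the project's .gitignore, skipping blank lines,
# comments, and negation ("!") rules, which this script does not support
def read_gitignore(directory_path)
  gitignore_path = File.join(directory_path, '.gitignore')
  return [] unless File.exist?(gitignore_path)

  File.readlines(gitignore_path, chomp: true)
      .map(&:strip)
      .reject { |line| line.empty? || line.start_with?('#', '!') }
end

def ignored?(path, base_path, ignore_patterns)
  relative_path = path.delete_prefix("#{base_path}/")

  # Check if the path is, or lives inside, one of the ALWAYS_IGNORE directories
  return true if ALWAYS_IGNORE.any? { |dir| relative_path == dir || relative_path.start_with?("#{dir}/") }

  # Test the file and each of its parent directories, so a directory pattern
  # such as "node_modules/" or "/tmp" also excludes every file inside it
  parts = relative_path.split('/')
  candidates = (1..parts.length).map { |n| parts.first(n).join('/') }
  flags = File::FNM_PATHNAME | File::FNM_DOTMATCH

  ignore_patterns.any? do |raw_pattern|
    anchored = raw_pattern.start_with?('/') # a leading "/" anchors the pattern to the project root
    pattern = raw_pattern.delete_prefix('/').delete_suffix('/')
    candidates.any? do |candidate|
      File.fnmatch?(pattern, candidate, flags) ||
        (!anchored && File.fnmatch?(File.join('**', pattern), candidate, flags))
    end
  end
end

def convert_to_markdown(file_path)
  # Use the file extension as the fence's language hint, or "text" if there is none
  extension = File.extname(file_path).downcase[1..]
  format = extension.nil? || extension.empty? ? 'text' : extension
  heading = "# #{File.basename(file_path)}\n\n"

  begin
    content = File.read(file_path, encoding: 'UTF-8')
    # File.read doesn't raise on invalid bytes, so skip files that aren't valid UTF-8 (typically binaries)
    unless content.valid_encoding?
      return "#{heading}[File content not displayed: not valid UTF-8, likely a binary file]\n\n"
    end

    # Make the fence longer than any backtick run in the file, so code fences
    # inside embedded Markdown files (e.g. README.md) can't close it early
    longest_run = content.scan(/`+/).map(&:length).max || 0
    fence = '`' * [3, longest_run + 1].max
    "#{heading}#{fence}#{format}\n#{content.chomp}\n#{fence}\n\n"
  rescue StandardError => e
    "#{heading}[File content not displayed: #{e.message}]\n\n"
  end
end

def sanitize_anchor(text)
  text.gsub(/[^a-zA-Z0-9\-_]/, '-').gsub(/-+/, '-').downcase
end

def process_directory(directory_path, output_file)
  ignore_patterns = read_gitignore(directory_path)
  output_path = File.expand_path(output_file)
  toc_entries = []
  file_contents = []

  Dir.glob("#{directory_path}/**/*", File::FNM_DOTMATCH).each do |file_path|
    next if File.directory?(file_path)
    next if ['.', '..'].include?(File.basename(file_path))
    next if ignored?(file_path, directory_path, ignore_patterns)
    # Skip the output file itself when it is written inside the project
    next if File.expand_path(file_path) == output_path

    relative_path = file_path.delete_prefix("#{directory_path}/")
    anchor = sanitize_anchor(relative_path)
    toc_entries << "- [#{relative_path}](##{anchor})"
    file_contents << "## #{relative_path}\n\n#{convert_to_markdown(file_path)}"
  end

  markdown_content = "# File Documentation\n\n## Table of Contents\n\n"
  markdown_content += toc_entries.join("\n") + "\n\n---\n\n" + file_contents.join("\n---\n\n")
  File.write(output_file, markdown_content)
  puts "Markdown file created: #{output_file}"
end

# Check if correct number of arguments are provided
abort 'Usage: ruby ruby_to_md.rb <input_directory> <output_file>' if ARGV.length != 2

input_directory, output_file = ARGV
abort "Not a directory: #{input_directory}" unless File.directory?(input_directory)

process_directory(input_directory, output_file)
```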
What Does the Script Do?
The script reads all the files in a given project folder, creates a structured table of contents, and converts each file's content into a Markdown-formatted section. The generated Markdown file gives you:
- An Easy-to-Navigate Table of Contents: Every file is listed with clickable links, allowing quick access to the contents of each one.
- Readable File Content: Each file's content is wrapped in a fenced code block labeled with its extension, so Markdown renderers can apply syntax highlighting.
- Filtering of Unnecessary Files: Folders like `.git`, temporary and log directories (`tmp`, `log`), tooling folders (`.ruby-lsp`, `.github`, `.devcontainer`), and files listed in `.gitignore` are automatically skipped.
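Since the fence's language hint comes straight from the file extension, extension-less files such as `Gemfile`, `Rakefile`, or `Dockerfile` end up tagged `text` and lose highlighting. Here is a minimal sketch of one way to handle that, assuming a small hand-maintained lookup table. `LANGUAGE_BY_FILENAME`, `LANGUAGE_BY_EXTENSION`, and `fence_language` are hypothetical names, not part of the original script, and the language names are ones that common highlighters (GitHub's included) typically recognize:

```ruby
# Hypothetical lookup tables (not part of the original script)
LANGUAGE_BY_FILENAME = {
  'Gemfile' => 'ruby',
  'Rakefile' => 'ruby',
  'Dockerfile' => 'dockerfile',
  'Makefile' => 'makefile'
}.freeze

# Optional: normalize short extensions to full language names
LANGUAGE_BY_EXTENSION = {
  'rb' => 'ruby',
  'yml' => 'yaml',
  'md' => 'markdown',
  'js' => 'javascript'
}.freeze

# Hypothetical helper: pick a fence language hint for a file path
def fence_language(file_path)
  basename = File.basename(file_path)
  return LANGUAGE_BY_FILENAME[basename] if LANGUAGE_BY_FILENAME.key?(basename)

  extension = File.extname(file_path).downcase.delete_prefix('.')
  return 'text' if extension.empty?

  # Fall back to the raw extension, as the original script does
  LANGUAGE_BY_EXTENSION.fetch(extension, extension)
end
```

To try it, you could replace the two lines that compute `extension` and `format` in `convert_to_markdown` with `format = fence_language(file_path)`; any extension missing from the table still falls back to the raw extension, as before.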
Here's how you can use the script:
```shell
ruby ruby_to_md.rb /path/to/your/project output.md
```

This generates an `output.md` file containing every file in your project that isn't ignored, so you can browse and share it easily.
Breaking Down the Script
Let's walk through the key parts of the script:
- Ignoring Unwanted Files: The script skips irrelevant folders, like `.git`, and anything listed in `.gitignore`, so only the necessary parts of your codebase are documented:

```ruby
ALWAYS_IGNORE = ['.git', 'tmp', 'log', '.ruby-lsp', '.github', '.devcontainer'].freeze
```
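- Markdown Conversion: The script reads each file and wraps it in a fenced code block whose language hint comes from the file's extension. This lets Markdown renderers (like GitHub or VS Code) display the code with proper syntax highlighting:

```ruby
def convert_to_markdown(file_path)
  # Use the file extension as the fence's language hint, or "text" if there is none
  extension = File.extname(file_path).downcase[1..]
  format = extension.nil? || extension.empty? ? 'text' : extension
  heading = "# #{File.basename(file_path)}\n\n"

  begin
    content = File.read(file_path, encoding: 'UTF-8')
    # File.read doesn't raise on invalid bytes, so skip files that aren't valid UTF-8 (typically binaries)
    unless content.valid_encoding?
      return "#{heading}[File content not displayed: not valid UTF-8, likely a binary file]\n\n"
    end

    # Make the fence longer than any backtick run in the file, so code fences
    # inside embedded Markdown files (e.g. README.md) can't close it early
    longest_run = content.scan(/`+/).map(&:length).max || 0
    fence = '`' * [3, longest_run + 1].max
    "#{heading}#{fence}#{format}\n#{content.chomp}\n#{fence}\n\n"
  rescue StandardError => e
    "#{heading}[File content not displayed: #{e.message}]\n\n"
  end
end
```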
- Organizing Everything with a Table of Contents: To make navigation easy, the script generates a table of contents at the beginning of the Markdown file:

```ruby
def sanitize_anchor(text)
  text.gsub(/[^a-zA-Z0-9\-_]/, '-').gsub(/-+/, '-').downcase
end

markdown_content = "# File Documentation\n\n## Table of Contents\n\n"
```
Every file is listed here with an anchor link, so you can quickly jump to the specific part of the Markdown file.
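One caveat: `sanitize_anchor` turns `/` and `.` into hyphens (`app/models/user.rb` becomes `app-models-user-rb`), while GitHub, for example, builds heading IDs by dropping that punctuation (`appmodelsuserrb`), so the table-of-contents links may not jump anywhere there. Below is a minimal sketch of a GitHub-style slug, assuming GitHub's usual rules (lowercase, strip punctuation other than hyphens and underscores, turn spaces into hyphens, add `-1`, `-2`, … for repeats). `github_slug` is a hypothetical helper that only approximates the real algorithm:

```ruby
# Hypothetical helper approximating GitHub's heading-ID rules (not the
# official implementation): lowercase, keep only letters, digits, whitespace,
# underscores and hyphens, then turn spaces into hyphens.
def github_slug(text, seen = Hash.new(0))
  base = text.downcase.gsub(/[^\p{L}\p{N}\s_-]/, '').gsub(' ', '-')
  # Repeated headings get -1, -2, ... appended, so count what we've seen
  count = seen[base]
  seen[base] += 1
  count.zero? ? base : "#{base}-#{count}"
end

seen = Hash.new(0)
puts github_slug('app/models/user.rb', seen) # prints appmodelsuserrb
puts github_slug('README.md', seen)          # prints readmemd
puts github_slug('README.md', seen)          # prints readmemd-1
```

Because GitHub de-duplicates across every heading in the document, including the `# basename` heading that `convert_to_markdown` adds inside each section, you would need to pass one shared `seen` hash through both places for the links to line up.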
Use Cases for This Markdown Documentation
- Sharing Code for Review: When collaborating with others, sharing a single Markdown file that documents the entire codebase can be extremely helpful for reviews and discussions.
- Getting Help from AI Tools: Some AI tools don't have the ability to analyze an entire project directory, but they do allow you to attach files. Instead of attaching dozens of files individually, you can attach this single Markdown file that documents everything.
- Better Understanding of a New Project: When working with an unfamiliar codebase, this documentation can serve as an effective way to explore it without getting lost in the file structure.
Next Steps
If this script sounds useful to you, give it a try on your own projects! You can tweak the `ALWAYS_IGNORE` list or the `.gitignore` handling to suit your specific needs. I'd love to hear how you use this tool, and I'm open to suggestions for improvements.
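As one example of such a tweak, here is a minimal sketch that extends the ignore list without editing the script, assuming a comma-separated environment variable. `MD_EXTRA_IGNORE`, `DEFAULT_IGNORE`, and `ignore_list` are hypothetical names, not part of the original script:

```ruby
# Hypothetical: the script's built-in list, renamed so it can be extended
DEFAULT_IGNORE = ['.git', 'tmp', 'log', '.ruby-lsp', '.github', '.devcontainer'].freeze

# Merge extra directory names from a comma-separated MD_EXTRA_IGNORE
# environment variable (a hypothetical name) into the default list
def ignore_list(env = ENV)
  extra = env.fetch('MD_EXTRA_IGNORE', '').split(',').map(&:strip).reject(&:empty?)
  (DEFAULT_IGNORE + extra).uniq.freeze
end

ALWAYS_IGNORE = ignore_list
```

With this in place, a run such as `MD_EXTRA_IGNORE=node_modules,coverage ruby ruby_to_md.rb /path/to/your/project output.md` would also skip those two folders.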
Feel free to share your thoughts or even fork it to add new features. Let's make code navigation and understanding easier for everyone!