Git Rebase: Learn2Blog Refactor

This week, I embarked on a journey to enhance my ongoing open-source project, Learn2Blog. This is a command-line tool I developed to convert plain text and markdown files into HTML, making it a handy utility for bloggers and content creators. In this article, we'll explore the recent refactoring efforts undertaken to improve the codebase's maintainability, extensibility, and overall code quality.

The Need for Refactoring

Refactoring is a crucial practice in software development that involves restructuring and optimizing code without changing its external behaviour. The primary motivations behind the refactoring of Learn2Blog were as follows:

Modularity for Future Development: The original code was functional but lacked the modularity required for future feature additions and maintenance. By refactoring, we aimed to make the codebase more extensible and easier to work with.
Code Quality: Cleaner code makes the project easier to maintain and makes bugs and inefficiencies easier to spot. This matters for an open-source project meant for public use.
Better Readability: Code readability is essential for collaboration. More readable, better-organized code is more accessible to potential contributors.
Bug Fixing: Addressing longstanding bugs and issues to ensure a smoother user experience was another vital goal.

Refactoring: Step by Step

Let's dive into the step-by-step process of refactoring Learn2Blog. We'll discuss the improvements made to the project's structure, as well as changes to the FileProcessor class.

Improved Project Structure

In the refactored code, we introduced a cleaner project structure to improve organization and separation of concerns. We created a Learn2Blog.cs file whose Run method acts as the application's top-level dispatcher: it parses the command-line arguments and then shows the version, shows the help text, or processes files. Keeping this logic in one place leaves Main with a single call.

public class Learn2Blog
{
    public static void Run(string[] args)
    {
        // Fall back to empty options if the parser returns null.
        CommandLineOptions options = CommandLineParser.ParseCommandLineArgs(args) ?? new CommandLineOptions { InputPath = "", OutputPath = "" };

        // Version and help requests take precedence over file processing.
        if (options.ShowVersion) CommandLineUtils.ShowVersion();
        else if (options.ShowHelp) CommandLineUtils.ShowHelp();
        else FileProcessor.ProcessFiles(options);
    }
}

This new structure makes the code easier to understand and navigate, which is crucial when working on a collaborative, open-source project. Now, the Main function in Program.cs contains the following code:

class Program
{
    static void Main(string[] args)
    {
        Learn2Blog.Run(args);
    }
}
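
Run depends on CommandLineOptions and CommandLineParser, which this post does not show. As a rough illustration only, here is a minimal sketch of what they might look like. The four property names come from the Run method above; the class name CommandLineParserSketch, the flags (-v, -h, -o), and the rule that any other argument is the input path are all assumptions, not the project's actual interface.

```csharp
#nullable enable
using System;

// Hypothetical shape of the options object. Only the property names
// (InputPath, OutputPath, ShowVersion, ShowHelp) appear in the post.
public class CommandLineOptions
{
    public string InputPath { get; set; } = "";
    public string OutputPath { get; set; } = "";
    public bool ShowVersion { get; set; }
    public bool ShowHelp { get; set; }
}

public static class CommandLineParserSketch
{
    // Assumed flags: -v/--version, -h/--help, -o/--output <dir>.
    // Any other argument is treated as the input path.
    // Returns null when no arguments are given, so a caller's ?? fallback applies.
    public static CommandLineOptions? Parse(string[] args)
    {
        if (args.Length == 0) return null;

        var options = new CommandLineOptions();
        for (int i = 0; i < args.Length; i++)
        {
            switch (args[i])
            {
                case "-v":
                case "--version":
                    options.ShowVersion = true;
                    break;
                case "-h":
                case "--help":
                    options.ShowHelp = true;
                    break;
                case "-o":
                case "--output":
                    // Consume the next argument as the output directory, if present.
                    if (i + 1 < args.Length) options.OutputPath = args[++i];
                    break;
                default:
                    options.InputPath = args[i];
                    break;
            }
        }
        return options;
    }
}
```

Returning null for an empty argument list is just one way to make the null-coalescing fallback in Run meaningful; the real parser may signal errors differently.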

FileProcessor - Breaking Down the Complexity

The most significant improvement during this refactoring process is the new FileProcessor class, which is responsible for processing input files. In the original code, file-processing logic was coupled with command-line argument parsing, which made the codebase hard to manage.

The old code had no FileProcessor class, and the file-processing logic was scattered throughout the application. In the refactored code, FileProcessor holds all of the file-processing logic and keeps it separate from command-line argument parsing. This separation makes the code more modular and easier to maintain and extend.

Handling Single File Processing

One of the key changes in the refactored code is the introduction of the ProcessFile method. This method is responsible for processing a single file. Here's how it works:

// Converts a single input file to HTML and saves it in the outputPath directory.
private static void ProcessFile(string inputPath, string outputPath)
{
    string ext = Path.GetExtension(inputPath);
    string html;
    try
    {
        string text = File.ReadAllText(inputPath);
        string body = "";

        // .txt files are treated as plain text; everything else as Markdown.
        if (ext == ".txt")
        {
            body = ProcessText(text);
        }
        else
        {
            body = ProcessMarkdown(text);
        }

        // The file name (without extension) becomes the page title.
        html = HtmlGenerator.GenerateHtmlFromText(Path.GetFileNameWithoutExtension(inputPath), body);
    }
    catch (Exception ex)
    {
        // Log the error and skip this file.
        CommandLineUtils.Logger($"Error processing file {inputPath}: {ex.Message}");
        return;
    }

    // Pick a name that does not clash with an existing file, then save.
    string outputFileName = GetUniqueOutputFileName(inputPath, outputPath);
    SaveHtmlFile(outputFileName, html);

    CommandLineUtils.Logger($"File converted: {outputFileName}");
}

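The ProcessFile method handles the conversion of an individual input file. It checks the extension (.txt files are processed as plain text, anything else as Markdown), generates the HTML, and saves it under a unique filename to avoid overwriting existing files. If reading or converting the file fails, the error is logged and the file is skipped.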
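
ProcessText and ProcessMarkdown are called above but not shown in this post. To give a feel for the plain-text path, here is a minimal sketch of what a ProcessText-style helper could do: split the input on blank lines and wrap each paragraph in a p element. The class name TextProcessorSketch, the paragraph rule, and the HTML-encoding step are assumptions for illustration, not the project's actual implementation.

```csharp
using System;
using System.Linq;
using System.Net;

public static class TextProcessorSketch
{
    // Hypothetical stand-in for ProcessText: blank lines separate paragraphs,
    // and each paragraph is HTML-encoded and wrapped in <p>...</p>.
    public static string ProcessText(string text)
    {
        // Normalize Windows line endings so the blank-line split is uniform.
        string normalized = text.Replace("\r\n", "\n");

        var paragraphs = normalized
            .Split(new[] { "\n\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(p => p.Trim())
            .Where(p => p.Length > 0)
            .Select(p => $"<p>{WebUtility.HtmlEncode(p)}</p>");

        return string.Join("\n", paragraphs);
    }
}
```

For example, the input "Hello\n\nA & B" would produce two paragraphs, with the ampersand encoded as &amp;amp;.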

Processing Files in a Directory

In the refactored code, we introduced the ProcessFilesInDirectory method. This function is responsible for processing all files in a specified directory. Here's how it works:

private static void ProcessFilesInDirectory(string inputDirectory, string outputDirectory)
{
    // Union and ToArray come from System.Linq; together they merge both result sets into one array.
    string[] files = Directory.GetFiles(inputDirectory, "*.txt").Union(Directory.GetFiles(inputDirectory, "*.md")).ToArray();

    if (files.Length == 0)
    {
        CommandLineUtils.Logger($"No .txt or .md files found in directory {inputDirectory}");
        return;
    }

    CommandLineUtils.CreateOutputDirectory(outputDirectory);

    foreach (string file in files)
    {
        ProcessFile(file, outputDirectory);
    }
}

The ProcessFilesInDirectory method scans the specified directory for .txt and .md files, and then, for each file found, it calls the ProcessFile method for individual file processing. This separation of responsibilities enhances code modularity and maintainability.
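
Run calls FileProcessor.ProcessFiles(options), which is not shown in this post. One plausible way for it to choose between ProcessFile and ProcessFilesInDirectory is to check whether the input path is a file or a directory. The sketch below is an assumption, not the project's actual code: InputKind, Classify, and the two callback parameters are hypothetical names introduced so the example is self-contained.

```csharp
using System;
using System.IO;

public enum InputKind { File, Directory, Missing }

public static class InputDispatchSketch
{
    // Decide how an input path should be handled.
    public static InputKind Classify(string inputPath)
    {
        if (File.Exists(inputPath)) return InputKind.File;
        if (Directory.Exists(inputPath)) return InputKind.Directory;
        return InputKind.Missing;
    }

    // Hypothetical dispatcher: the callbacks stand in for
    // ProcessFile and ProcessFilesInDirectory.
    public static void ProcessFiles(
        string inputPath,
        string outputPath,
        Action<string, string> processFile,
        Action<string, string> processDirectory)
    {
        switch (Classify(inputPath))
        {
            case InputKind.File:
                processFile(inputPath, outputPath);
                break;
            case InputKind.Directory:
                processDirectory(inputPath, outputPath);
                break;
            default:
                Console.WriteLine($"Input path not found: {inputPath}");
                break;
        }
    }
}
```

Separating the classification from the dispatch keeps the decision easy to test without touching the conversion code.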

Ensuring Unique Output Filenames

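Another noteworthy addition in the refactored code is the GetUniqueOutputFileName method. This method generates a unique output filename when saving converted HTML files, so that existing files are not inadvertently overwritten. For example, if hello.html already exists in the output directory, the method tries hello_1.html, then hello_2.html, and so on until it finds an unused name.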

private static string GetUniqueOutputFileName(string inputPath, string outputPath)
{
    string fileName = Path.GetFileNameWithoutExtension(inputPath);
    string outputFileName = Path.Combine(outputPath, fileName + ".html");

    int fileNumber = 1;
    while (File.Exists(outputFileName))
    {
        outputFileName = Path.Combine(outputPath, $"{fileName}_{fileNumber}.html");
        fileNumber++;
    }

    return outputFileName;
}
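
To see the numbering in action, the snippet below copies the method into a hypothetical demo class (UniqueNameDemo is not part of the project) and calls it three times against a temporary directory, creating an empty file after each call.

```csharp
using System;
using System.IO;

public static class UniqueNameDemo
{
    // Same logic as GetUniqueOutputFileName above, copied so the demo is self-contained.
    public static string GetUniqueOutputFileName(string inputPath, string outputPath)
    {
        string fileName = Path.GetFileNameWithoutExtension(inputPath);
        string outputFileName = Path.Combine(outputPath, fileName + ".html");

        int fileNumber = 1;
        while (File.Exists(outputFileName))
        {
            outputFileName = Path.Combine(outputPath, $"{fileName}_{fileNumber}.html");
            fileNumber++;
        }

        return outputFileName;
    }

    // Converts the "same" input three times and returns the chosen file names.
    public static string[] Run()
    {
        string dir = Path.Combine(Path.GetTempPath(), "l2b_" + Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(dir);
        try
        {
            var names = new string[3];
            for (int i = 0; i < names.Length; i++)
            {
                string path = GetUniqueOutputFileName("posts/hello.md", dir);
                File.WriteAllText(path, ""); // Occupy the name so the next call must skip it.
                names[i] = Path.GetFileName(path);
            }
            return names; // hello.html, hello_1.html, hello_2.html
        }
        finally
        {
            Directory.Delete(dir, recursive: true);
        }
    }
}
```

Note that the check and the later write are separate steps, so two processes converting into the same directory at the same moment could still pick the same name. For a single-user command-line tool that is usually an acceptable trade-off.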

Improved HTML Generation

In the refactored code, the HTML generation logic has been moved to a separate class called HtmlGenerator, which keeps page layout out of the file-processing code. Instead of calling AppendLine repeatedly, the HtmlGenerator class builds the HTML content as a single interpolated string that embeds both the title and the body.

Here's how the HtmlGenerator class works:

public static class HtmlGenerator
{
    public static string GenerateHtmlFromText(string title, string body)
    {
        string html = $@"
<!DOCTYPE html>
<html>
<head>
    <title>{title}</title>
</head>
<body>
    {body}
</body>
</html>";

        return html;
    }
}

This approach simplifies HTML generation, making the code more readable and maintainable.
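
One thing to keep in mind with string interpolation is that the title is inserted as-is, so a file name containing characters such as & or < would end up unescaped in the title element. The variant below is a possible hardening, not the project's code: HtmlGeneratorSketch is a hypothetical name, and it uses WebUtility.HtmlEncode from System.Net to escape the title. The body is left untouched because it is already HTML. This version also starts the string directly at the doctype, so the output has no leading blank line.

```csharp
using System;
using System.Net;

public static class HtmlGeneratorSketch
{
    public static string GenerateHtmlFromText(string title, string body)
    {
        // Escape characters that are special in HTML (&, <, >, quotes).
        string safeTitle = WebUtility.HtmlEncode(title);

        return $@"<!DOCTYPE html>
<html>
<head>
    <title>{safeTitle}</title>
</head>
<body>
    {body}
</body>
</html>";
    }
}
```

Calling GenerateHtmlFromText("Tom & Jerry", "<p>Hi</p>") would yield a title element containing Tom &amp;amp; Jerry, while the paragraph markup in the body is passed through unchanged.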

Git Rebase for a Cleaner History

Maintaining a clean and organized Git history is essential for open-source projects. During this refactoring process, I used Git's interactive rebase feature to squash and clean up my commit history. This approach ensures that the project's history is coherent and easier to review for potential contributors.

$ git rebase main -i
$ git push origin main

I also recommend using the GitLens extension for Visual Studio Code, which provides a convenient GUI for Git rebase operations.

Rigorous Testing

Throughout the refactoring process, I tested the tool after each step to confirm that all existing functionality was preserved. This was manual testing only, as I have not written unit tests for this application yet. But those will be coming in the next few weeks, so stay tuned! Testing at each refactoring step let me quickly identify and fix any issues before moving on to the next phase.

A Few Bumps in the Road

As with any significant code refactoring, some issues surfaced along the way, but in this case nothing major. There was a minor hiccup with my .gitconfig file, which did not point to my external editor correctly. After I configured Visual Studio Code as Git's editor (typically done by setting core.editor to "code --wait"), the rebase process proceeded smoothly.

The Final Commit

After completing the refactor, I squashed all 14 commits into a single one to create a clean and cohesive history. Using git commit --amend, I updated the commit message for this final commit, and then I merged it into the main branch and pushed the changes to GitHub. The final commit can be reviewed here.

I also pushed all the changes made during the refactor to a dedicated branch called refactor. This branch contains the entire commit history related to this refactoring effort.

Conclusion

Code refactoring is a vital part of maintaining a healthy open-source project. It improves code quality, enhances modularity, and makes the project more inviting to contributors. By following best practices and rigorous testing, we can ensure that existing functionalities remain intact while improving the codebase.

In the case of Learn2Blog, this refactoring effort has laid the groundwork for future enhancements and a more user-friendly experience. If you're interested in contributing to this project or exploring the changes in detail, please check out the Learn2Blog repository on GitHub. Your contributions are always welcome!
