Dear AI, can you translate the Rails Guide for me?

kevinluo201

Kevin Luo

Posted on July 30, 2023


TL;DR

I used the ChatGPT API to translate the Rails Guide into several different languages.

Update on 2023/08/12

I added 3 more languages.

What's the Rails Guide?

I guess people reading this article already know Rails. Just in case, I'll briefly introduce Ruby on Rails and the Rails Guide. Feel free to skip this section if you already know them.

Ruby on Rails is a full-stack web application framework. With Rails, you can build a website that reads data from your database and either returns it as an API payload or renders it in the user's browser, easily and safely. The Rails Guide is the manual developers use to learn Rails. It is a community effort and lives in the same GitHub repository as Rails itself. Its quality is very high because it has been reviewed and revised again and again by numerous seasoned Rails developers. To anyone who wants to learn Ruby on Rails, I definitely recommend reading the guide first.

Why translate the Rails Guide?

Translating the Rails Guide is not about diversity. The Rails Guide is written exclusively in English, and that is totally fine. However, there are many talented developers all around the world who cannot read English well. It is a real pity that they don't get a chance to use this wonderful and powerful web framework just because the documentation is not available in their languages. I believe that translating the Rails Guide gives people all over the world a better chance to learn Rails.

Why use generative AI to translate Rails Guide?

First of all, generative AI can produce more human-sounding text. Moreover, with more context, it can generate more accurate and suitable translations. You have probably read articles that you could immediately tell were translated by Google Translate because they felt very unnatural.

Second, there are already many repositories of the Rails Guide in different languages (see https://guides.rubyonrails.org/contributing_to_ruby_on_rails.html#translating-rails-guides), but most of them are out of date. Those repositories depend on volunteers' efforts. The Rails community used to have enthusiastic fans who were willing to help translate the guide. Unfortunately, since the popularity of Rails plummeted, there haven't been enough volunteers to continue the work. Using generative AI to translate documents saves time and human effort. One person can easily refine the translation results alone. It also means that we can update the translations more frequently, which could make this a more sustainable method.

Proposed Workflow

My original plan was simple.

  1. Write a script to read the Rails Guide files and send their content to ChatGPT to be translated into a specified language.
  2. Then use the existing Rails Guide script to generate HTML files, just like the current translation workflow.

I may wrap the code into a class, AiTranslator, so it should look like this:

[Image: Original Idea]

However, it was not as simple as I imagined 😅

Challenges

There are many challenges in this simple task. I picked some more significant ones here.

Tokens

ChatGPT and other generative AI models can only accept a limited number of tokens per request, and the limit covers the input and the output combined. A token count is not the same as the number of characters or words; it is only correlated with them. OpenAI also bills you by the number of tokens used.

gpt-3.5-turbo, currently the most popular model, only allows 4097 tokens per request. Remember, that covers both input and output. That means I cannot just upload a whole file to ChatGPT; I need to process each file piece by piece.
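
To make the limit concrete, here is a rough sketch of a budget check. It assumes the commonly quoted rule of thumb of about 4 characters per token for English text; real counts come from the model's tokenizer, and other languages often need more tokens per word. The names estimate_tokens and fits_in_one_request? are made up for this illustration. Because a translation is roughly as long as its source, the sketch reserves half of the limit for the output.

```ruby
# Rough token budgeting. Assumption: about 4 characters per token for
# English text. This is only a rule of thumb, not the real tokenizer.
MODEL_TOKEN_LIMIT = 4097 # gpt-3.5-turbo limit quoted in this post (input + output)
CHARS_PER_TOKEN = 4.0

# Hypothetical helper: estimates how many tokens a piece of text uses.
def estimate_tokens(text)
  (text.length / CHARS_PER_TOKEN).ceil
end

# Hypothetical helper: true when the text plus an equally long
# translation is expected to fit into a single request.
def fits_in_one_request?(text, limit: MODEL_TOKEN_LIMIT)
  estimate_tokens(text) * 2 <= limit
end

puts fits_in_one_request?("Rails is a web application framework.")
```

A whole guide file is far longer than this budget allows, which is why the file has to be split into pieces.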

Maybe you think: that's easy, just send one or two sentences per ChatGPT API call and you'll never exceed the limit.

You're right. However, each ChatGPT request is independent; requests don't share any context. Here is an example using the ChatGPT web page. If I ask ChatGPT "Do you know NBA?" and then ask "Who's the champion of 2019?", it will answer that it's the Toronto Raptors.

[Image: context ex1]

However, if I ask "Who's the champion of 2019?" directly in a new session, ChatGPT cannot answer because it lacks the context.

[Image: context ex2]

Google Translate works like a strengthened dictionary. A generative AI model is different: we'd better treat it like a very smart student. The more input you give it, the better the result it returns. As a result, I want to feed ChatGPT as much text as possible so it has appropriate context to translate the Rails Guide properly.

My approach is like the code block below.



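buffer = []
result = ''
File.readlines(file).each do |line|
  # A blank line ends a paragraph; translate once the buffer is large enough.
  if line == "\n" && buffer.join.split.length > @buffer_size
    translated_text = ai_translate(buffer.join)[:text]
    result += translated_text + "\n"
    buffer = []
  else
    buffer << line
  end
end
# Translate whatever is left in the buffer after the last line.
result += ai_translate(buffer.join)[:text] unless buffer.empty?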


  1. I declare buffer = [] at the beginning.
  2. I iterate over the file line by line and put each line into the buffer.
  3. When the number of words in the buffer exceeds a threshold, I send the buffer's content to the ChatGPT API. The threshold, @buffer_size, defaults to 700. It's just an empirical magic number.
  4. Paragraphs in markdown are separated by blank lines, and I want each paragraph translated whole in a single ChatGPT request, so a request is only sent when the current line is blank.

Prompt phrase

The prompt for a generative AI model affects the result drastically. I tried a lot of different combinations, and eventually I settled on this:



LANGUAGES = {
  'zh-TW' => "Traditional Chinese used in Taiwan(台灣繁體中文).",
  'lt' => 'Lithuanian',
  'fr' => 'French',
  'pt-BR' => 'Brazilian Portuguese',
  'th' => 'Thai',
  'zh-CN' => 'Simplified Chinese',
}
system_prompt ||= "Translate the technical document to #{LANGUAGES[@target_language]} without adding any new content."


  • Translate the technical document: this points out that we are translating an excerpt of a technical document, so the model knows it does not need to translate some elements, such as code blocks.
  • LANGUAGES[@target_language]: I don't know whether this problem is unique to Traditional Chinese. Although both are Chinese, the terminology, writing style and tone of Traditional Chinese in Taiwan are very different from those of Simplified Chinese. I need to specify the language precisely to get the desired result.
  • without adding any new content.: It is also important to tell ChatGPT not to add extra information, because we're translating an article. Otherwise, it behaves like those annoying students in your classroom who keep talking and add a lot of needless knowledge.
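
As a sketch of how this prompt could travel to the model, the snippet below builds a request body in the shape the OpenAI Chat Completions API expects: a model name plus a list of messages, with the prompt in the "system" role and the text to translate in the "user" role. The helper name build_request_body and the shortened LANGUAGE_NAMES table are made up for this example; it only builds the hash and sends nothing.

```ruby
require "json"

# Shortened, illustrative language table (not the full list from the post).
LANGUAGE_NAMES = {
  "fr" => "French",
  "th" => "Thai"
}.freeze

# Hypothetical helper: builds a Chat Completions style request body.
# It raises KeyError when the language code is unknown.
def build_request_body(text, target_language, model: "gpt-3.5-turbo")
  language = LANGUAGE_NAMES.fetch(target_language)
  system_prompt = "Translate the technical document to #{language} without adding any new content."
  {
    model: model,
    messages: [
      { role: "system", content: system_prompt }, # the instruction
      { role: "user", content: text }             # the chunk to translate
    ]
  }
end

body = build_request_body("Rails is a web application framework.", "fr")
puts JSON.pretty_generate(body)
```

Keeping the instruction in the system message and the document excerpt in the user message means the excerpt itself never has to be mixed with the instruction text.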

Markdown parsing

The Rails Guide is full of code blocks showing code examples. A code block should not be split across requests, so I turned the line reader into a simple state machine. It changes the state to :codeblock when it starts parsing a code block, and it won't call the ChatGPT API until it finishes that block.



state = :readline
buffer = []
result = ''
File.readlines(file).each do |line|
  # The spaces between the backticks are only here so Dev.to can render this
  # post; the real code matches three consecutive backticks.
  if line.include?("` ` `")
    buffer << line
    state = state == :codeblock ? :readline : :codeblock
  elsif line == "\n" && state == :readline && buffer.join.split.length > @buffer_size
    translated_text = ai_translate(buffer.join)[:text]
    result += translated_text + "\n"
    buffer = []
  else
    buffer << line
  end
end
# Translate whatever is left in the buffer after the last line.
result += ai_translate(buffer.join)[:text] unless buffer.empty?


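
The same idea can be tried without calling any API. The sketch below only does the splitting: it returns the chunks that would each become one request. The method name chunk_markdown and its min_words parameter are made up for this example, and unlike the snippet above it keeps the blank line inside each chunk so that joining the chunks reproduces the input.

```ruby
FENCE = "`" * 3 # built this way so the example contains no literal fence

# Splits markdown lines into chunks at blank lines, never inside a fenced
# code block. A chunk is closed once it holds more than min_words words.
def chunk_markdown(lines, min_words: 700)
  chunks = []
  buffer = []
  in_code_block = false

  lines.each do |line|
    in_code_block = !in_code_block if line.lstrip.start_with?(FENCE)
    buffer << line
    at_boundary = line.strip.empty? && !in_code_block
    if at_boundary && buffer.join.split.length > min_words
      chunks << buffer.join
      buffer = []
    end
  end

  chunks << buffer.join unless buffer.empty? # flush the remainder
  chunks
end

sample = [
  "Intro text here.\n", "\n",
  "#{FENCE}ruby\n", "puts 1\n", "\n", "puts 2\n", "#{FENCE}\n", "\n",
  "Outro.\n"
]
chunks = chunk_markdown(sample, min_words: 2)
puts chunks.length
```

With the tiny threshold used here, the blank line inside the code block does not end a chunk, so the whole block stays in one piece.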

Anchors

When you open any page of the Rails Guide, you can see a Chapters block on the right serving as a table of contents.

[Image: Chapters]

That table is generated automatically by a script. The headings, such as <h1>, <h2>, <h3>, etc., are assigned an id derived from the heading's text. For example, if the title is "Guide Assumptions" in the markdown,



### Guide Assumptions



it will be rendered like this in the final HTML:



<h3 id="guide-assumptions">...</h3>



Links in the table of contents can then point to the elements with those id values.
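
To illustrate how a heading's text can turn into an id, here is a simplified sketch. It is not the actual Rails Guides generator code; heading_id and render_heading are hypothetical names, and the rules (lowercase, drop punctuation, join words with hyphens) are an assumption that merely reproduces the example above. It also shows that a translated title produces an id in the target language.

```ruby
# Simplified, hypothetical id generation. NOT the real Rails Guides code.
def heading_id(title)
  title
    .downcase
    .gsub(/[^\p{L}\p{N}\s-]/, "") # drop punctuation, keep letters and digits of any script
    .strip
    .gsub(/\s+/, "-")             # join words with hyphens
end

# Renders a heading tag with the generated id (no HTML escaping, for brevity).
def render_heading(level, title)
  %(<h#{level} id="#{heading_id(title)}">#{title}</h#{level}>)
end

puts render_heading(3, "Guide Assumptions")
# prints <h3 id="guide-assumptions">Guide Assumptions</h3>
puts heading_id("指南假設")
# prints 指南假設
```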

It works fine in the original Rails Guide: when you click a link in the Chapters block, the browser jumps to the corresponding section. However, this breaks once all titles are translated. After some investigation, I found that it's related to Turbo. I guess it's a bug in Turbo. My current solution is to disable Turbo for the links in the Chapters block.



<ol class="chapters" data-turbo="false">
...
</ol>



Code

Repository: https://github.com/kevinluo201/rails-guide-ai
This repo is forked from the Rails repo so that it can pull updates to the guide's files. It only has 2 new files:

  • guides/rails_guides/ai_translator.rb: it's the main program.
  • guides/ai_translate.rb: it's the starting point

You can do the following steps if you want to play around with it.

  1. Set a new environment variable called OPENAI_ACCESS_TOKEN to your personal access token on OpenAI.
  2. Add a new language in RailsGuide::AiTranslator, for example, 'jp' => 'Japanese'.
  3. Open the terminal, go to guides/ and start translating by executing:

ruby ./ai_translate.rb jp

  4. You can also translate a single file by adding a filename after the command:

ruby ./ai_translate.rb jp getting_started.md

  5. After all files are translated, you can run the existing Rails script to generate the HTML, CSS and JS:

bundle exec rake guides:generate:html GUIDES_LANGUAGE=jp

Unfortunately, this step is likely to fail. Usually that is because duplicated titles lead to duplicated ids in the HTML. You can fix it by finding the offending title and changing it slightly. Other problems can show up depending on the target language; work through them until the process finishes.
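
Finding the duplicated titles by hand is tedious, so a small helper can list them. This is a sketch under the assumption that two headings with identical text in one file are what collide; the real generator may derive ids differently, and duplicate_headings is a made-up name. Lines inside fenced code blocks are skipped because Ruby comments there also start with #.

```ruby
FENCE_MARKER = "`" * 3 # avoids writing a literal fence inside this example

# Hypothetical helper: returns heading texts that appear more than once
# in a markdown string, ignoring anything inside fenced code blocks.
def duplicate_headings(markdown)
  in_code_block = false
  titles = []

  markdown.each_line do |line|
    if line.start_with?(FENCE_MARKER)
      in_code_block = !in_code_block
    elsif !in_code_block && (match = line.match(/\A[#]{1,6}\s+(.+?)\s*\z/))
      titles << match[1]
    end
  end

  titles.tally.select { |_title, count| count > 1 }.keys
end

sample = <<~MARKDOWN
  Guide
  =====

  ## Setup

  Some text.

  ### Setup

  ## Usage
MARKDOWN

p duplicate_headings(sample) # prints ["Setup"]
```

Running it over a translated file with File.read before generating the HTML should point to the titles that need a small rewording.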




Help Wanted

It is just an experimental project now. There are several issues that can be improved. If you think it is an interesting topic, feel free to discuss it with me.

Current Issues

Anchor links

The table of contents problem is solved by disabling Turbo. However, there are anchor links spread throughout the articles. They cannot be converted to the correct URLs smoothly, especially when they refer to an anchor on another page.

Versioning

The Rails Guide has versions. A version is kind of a snapshot of the guide at a particular time. I haven't thought of a good way to manage them.

Different models

I'm now using gpt-3.5-turbo. I live in Canada, so I cannot use Google's Bard. Feel free to change the code so it can switch between different models, such as GPT-4 or Llama 2.

EPUB

EPUB files can be generated by the Rails Guide script. However, I get errors when I try to import them into EPUB reader software, such as "Books" on macOS. I think it may be related to the broken anchor links.

Other stuff

If you have any ideas that can make this project more sustainable, please discuss it with me. For example, it's a guide for Rails, why not build it as a Rails app?

Conclusion

The quality of the AI translation is not perfect but acceptable, and I'm not concerned about it. As far as I can see, the token limit and the trained model are the most significant factors. I believe the quality will improve by swapping the current model (gpt-3.5-turbo) for a more advanced one in the future. The result shows that this workflow really works, and that's the most important lesson for me.

As for the cost, I ran many experiments for this idea and translated the Rails Guide into 6 different languages. It cost me about $27 in total, so each translation cost less than $5 on average. The actual price should be lower than that because many of the experiments simply failed.

[Image: usage chart]

Due to its good quality and low cost, generative AI might be a good solution for translating the technical documents of open-source projects.

Buy me a coffee

Finally, if you like what I'm doing, you can buy me a coffee 😉☕️
Buy Me A Coffee: https://www.buymeacoffee.com/kevinluo
