Kevin Luo
Posted on July 30, 2023
TL;DR;
I used the ChatGPT API to translate the Rails Guide into different languages:
- Taiwan's Traditional Chinese🇹🇼 https://ai.rails-guide.com/zh-TW
- French🇫🇷 https://ai.rails-guide.com/fr
- Lithuanian🇱🇹 https://ai.rails-guide.com/lt
- Brazilian Portuguese🇧🇷 https://ai.rails-guide.com/pt-BR
- Thai🇹🇭 https://ai.rails-guide.com/th
- Simplified Chinese🇨🇳 https://ai.rails-guide.com/zh-CN
Update on 2023/08/12
I added 3 more languages:
- Japanese🇯🇵 https://ai.rails-guide.com/jp
- Korean🇰🇷 https://ai.rails-guide.com/ko
- Español🇪🇸 https://ai.rails-guide.com/es
What's the Rails Guide?
I guess people who read this article already know Rails. Just in case, though, I'll briefly introduce Ruby on Rails and the Rails Guide. Feel free to skip this section if you already know them.
Ruby on Rails is a full-stack web application framework. With Rails, you can build a website that reads data from your database and returns it as an API payload or renders it in the user's browser, easily and safely. The Rails Guide is the user manual for developers to learn how to use Rails. It is also a community effort and lives in the same GitHub repository as Rails itself. Its quality is very high because it has been reviewed and revised again and again by numerous seasoned Rails developers. For anyone who wants to learn Ruby on Rails, I definitely recommend reading the guide first.
Why translate the Rails Guide?
Translating the Rails Guide is not for diversity. The Ruby on Rails guide is written exclusively in English and that is totally fine. However, there are many talented developers all around the world who just cannot read English well. It is really a pity that they don't have a chance to get in touch with this wonderful and powerful web framework just because the information isn't available in their languages. I believe that translating the Rails Guide gives people all over the world a better chance to learn Rails.
Why use generative AI to translate Rails Guide?
First of all, generative AI can produce more human-sounding text. Moreover, with more context, it can generate more accurate and suitable translations. You must have read articles that you could immediately tell were translated by Google Translate because they felt very unnatural.
Second, there are already many repositories of the Rails Guide in different languages (https://guides.rubyonrails.org/contributing_to_ruby_on_rails.html#translating-rails-guides). However, most of them are out of date because they depend on volunteers' efforts. The Rails community used to have enthusiastic fans who were willing to help translate the guide. Unfortunately, since the popularity of Rails plummeted, there haven't been enough volunteers to continue the work. Using generative AI to translate documents saves time and human effort. One person can easily refine the translation result on their own. It also means that we can update the translations more frequently. It could be a more sustainable method.
Proposed Workflow
My original plan was simple.
- Write a script to read the Rails guide files and send their content to ChatGPT to translate to a specified language.
- Then use the existing Rails Guide script to generate HTML files just like the current translation workflow
I planned to wrap the code in a class, `AiTranslator`.
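A minimal sketch of what such a class could look like, under stated assumptions: the constructor arguments, the `translate_all` method and the `translate_text` callable are all hypothetical names chosen for illustration, not the interface of the real `AiTranslator`. The callable stands in for the ChatGPT call so the sketch runs without network access.

```ruby
require "fileutils"

# Hypothetical sketch: the real AiTranslator's interface may differ.
class AiTranslator
  # translate_text is any callable taking (text, language) and returning the
  # translated text; in practice it would wrap a ChatGPT API request.
  def initialize(target_language:, translate_text:)
    @target_language = target_language
    @translate_text = translate_text
  end

  # Translate every Markdown file in source_dir and write the results to
  # output_dir under the same file names. Returns the written paths.
  def translate_all(source_dir:, output_dir:)
    FileUtils.mkdir_p(output_dir)
    Dir.glob(File.join(source_dir, "*.md")).sort.map do |path|
      translated = @translate_text.call(File.read(path), @target_language)
      target = File.join(output_dir, File.basename(path))
      File.write(target, translated)
      target
    end
  end
end
```

With the translated Markdown files in place, the existing Rails Guide script can then generate the HTML, as in the second step of the plan.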
However, it was not as simple as I imagined 😅
Challenges
There are many challenges in this simple task. I picked some more significant ones here.
Tokens
ChatGPT and other generative AI models can only accept a limited number of tokens. The token count covers both the input and the output strings. It isn't the number of characters or words, only correlated with them. OpenAI also bills you by tokens.
The current most popular model, `gpt-3.5-turbo`, only allows 4097 tokens per request. Remember, that budget is shared by input and output. That means I cannot just upload a whole file to ChatGPT; I need to process a file piece by piece.
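To get a feeling for the budget, here is a rough sketch of a token estimate. It assumes OpenAI's rule of thumb of about 4 characters of English text per token, which is only a heuristic and not a real tokenizer. The helper names and the `output_ratio` parameter are made up for illustration; translations into other languages can need noticeably more tokens than the English input.

```ruby
# Rough heuristic, not a real tokenizer: about 4 characters of English
# text per token. Other languages often need more tokens per character.
CHARS_PER_TOKEN = 4.0
CONTEXT_LIMIT = 4097 # gpt-3.5-turbo limit, shared by input and output

# Estimate how many tokens a piece of text will consume.
def estimate_tokens(text)
  (text.length / CHARS_PER_TOKEN).ceil
end

# Returns true when the prompt, the chunk, and a translated output of
# roughly output_ratio times the chunk size should fit in one request.
def fits_in_context?(system_prompt, chunk, output_ratio: 1.5)
  input = estimate_tokens(system_prompt) + estimate_tokens(chunk)
  expected_output = (estimate_tokens(chunk) * output_ratio).ceil
  input + expected_output <= CONTEXT_LIMIT
end
```

Because the output counts against the same limit, a chunk has to stay well below 4097 tokens on its own.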
You might think: that's easy, just send one or two sentences per ChatGPT API call and you'll never exceed the limit.
You're right. However, each ChatGPT request is independent; requests don't share any context. Here is an example from the ChatGPT web page. If I ask ChatGPT "Do you know NBA?" and then ask it "Who's the champion of 2019?", it will answer that it's the Toronto Raptors.
However, if I ask "Who's the champion of 2019?" directly in a new session, ChatGPT cannot answer because it lacks the context.
Google Translate is like a strengthened dictionary, but we'd better treat a generative AI model like a very smart student: the more input you give it, the better the result it returns. As a result, I want to feed ChatGPT as much text as possible so it has appropriate context to translate the Rails Guide properly.
My approach is like the code block below.
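buffer = []
result = ''
File.readlines(file).each do |line|
  # Flush only at a blank line (paragraph boundary) once the buffer is big enough.
  if line == "\n" && buffer.join.split.length > @buffer_size
    translated_text = ai_translate(buffer.join)[:text]
    result += translated_text + "\n"
    buffer = []
  else
    buffer << line
  end
end
# Translate whatever remains after the last blank line.
result += ai_translate(buffer.join)[:text] unless buffer.empty?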
- I declare a `buffer = []` at the beginning.
- I iterate over the file line by line. Each iteration puts one line into `buffer`.
- When the number of words exceeds a threshold, I send the content of `buffer` to the ChatGPT API. The threshold, `@buffer_size`, defaults to `700`. It's just an empirical magic number.
- Plus, we know paragraphs in Markdown are separated by blank lines, so I only send the buffer at a blank line. That way a whole paragraph is always translated in one ChatGPT request.
Prompt phrase
The prompt phrase for the Generative AI model affects the result drastically. I tried a lot of different combinations. And eventually, I made it this way:
LANGUAGES = {
  'zh-TW' => "Traditional Chinese used in Taiwan(台灣繁體中文).",
  'lt' => 'Lithuanian',
  'fr' => 'French',
  'pt-BR' => 'Brazilian Portuguese',
  'th' => 'Thai',
  'zh-CN' => 'Simplified Chinese',
}
system_prompt ||= "Translate the technical document to #{LANGUAGES[@target_language]} without adding any new content."
- `Translate the technical document`: points out that we are translating an excerpt of a technical document, so the model knows it doesn't need to translate some elements like code blocks.
- `LANGUAGES[@target_language]`: I don't know whether this problem is unique to Traditional Chinese. Although both are written Chinese, the terminology, writing style and tone of Traditional Chinese in Taiwan are very different from those of Simplified Chinese. I need to specify the language more precisely to get the desired result.
- `without adding any new content.`: It is also important to tell ChatGPT not to add extra information because we're translating an article. Otherwise, it will be like those annoying students in your classroom who keep talking and adding needless knowledge.
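For illustration, this is roughly how the prompt could be placed in a chat completion request, with the instruction as the `system` message and the text to translate as the `user` message. The helper name `build_chat_request` and the shortened `LANGUAGE_NAMES` table are assumptions made for this sketch; it only builds the request body and does not send anything.

```ruby
require "json"

# Shortened, illustrative copy of the language table.
LANGUAGE_NAMES = {
  "zh-TW" => "Traditional Chinese used in Taiwan(台灣繁體中文).",
  "fr" => "French"
}.freeze

# Build the body for a chat completion request. The system message carries
# the translation instruction; the user message carries the chunk of text.
def build_chat_request(text, target_language, model: "gpt-3.5-turbo")
  language = LANGUAGE_NAMES.fetch(target_language) # raises KeyError if unknown
  system_prompt = "Translate the technical document to #{language} without adding any new content."
  {
    model: model,
    messages: [
      { role: "system", content: system_prompt },
      { role: "user", content: text }
    ]
  }
end

# JSON string that would be POSTed to OpenAI's chat completions endpoint.
body = JSON.generate(build_chat_request("Rails is a web framework.\n", "fr"))
```

Keeping the model name as a keyword argument also makes it easy to try a different model later.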
Markdown parsing
The Rails Guide is full of code blocks showing code examples. A code block should not be split across requests or sent without its surrounding text, so I made the line reader a simple state machine. It changes the state to `:codeblock` when it starts parsing a code block and it won't call the ChatGPT API until it finishes that block.
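state = :readline
buffer = []
result = ''
fence = "`" * 3 # three backticks; built this way because Dev.to has problems with a literal fence
File.readlines(file).each do |line|
  if line.include?(fence)
    # Entering or leaving a code block: keep the fence line and toggle the state.
    buffer << line
    state = state == :codeblock ? :readline : :codeblock
  elsif line == "\n" && state == :readline && buffer.join.split.length > @buffer_size
    # Paragraph boundary outside a code block with enough words: translate the chunk.
    translated_text = ai_translate(buffer.join)[:text]
    result += translated_text + "\n"
    buffer = []
  else
    buffer << line
  end
end
# Translate whatever remains after the last blank line.
result += ai_translate(buffer.join)[:text] unless buffer.empty?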
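The same idea can be tried without calling any API. The sketch below is a simplified variant, not the code from the repository: `chunk_markdown` and `min_words` are names invented here, and unlike the loop above it keeps the blank separator line inside each chunk so that joining the chunks gives back the original text.

```ruby
FENCE = "`" * 3 # three backticks open or close a Markdown code block

# Split Markdown lines into chunks of more than min_words words, cutting
# only at blank lines that are outside fenced code blocks.
def chunk_markdown(lines, min_words:)
  chunks = []
  buffer = []
  state = :readline

  lines.each do |line|
    buffer << line
    if line.include?(FENCE)
      # Toggle between normal text and code block on every fence line.
      state = state == :codeblock ? :readline : :codeblock
    elsif line == "\n" && state == :readline && buffer.join.split.length > min_words
      chunks << buffer.join
      buffer = []
    end
  end

  # Keep the trailing text that never reached another blank line.
  chunks << buffer.join unless buffer.empty?
  chunks
end

sample = [
  "Intro paragraph here.\n",
  "\n",
  "#{FENCE}ruby\n",
  "puts 1\n",
  "\n",
  "puts 2\n",
  "#{FENCE}\n",
  "\n",
  "Closing words.\n"
]
chunks = chunk_markdown(sample, min_words: 2)
```

In this sample the blank line between `puts 1` and `puts 2` does not cause a cut, so the whole code block ends up in a single chunk.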
Anchors
When you open any page of the Rails Guide, you can see a Chapters block on the right that serves as a table of contents.
That table is generated automatically by a script. Headings such as `<h1>`, `<h2>`, `<h3>`, etc. are assigned an `id` derived from the heading's text. For example, if the heading is "Guide Assumptions" in the Markdown,
### Guide Assumptions
it will be rendered in the final HTML as
<h3 id="guide-assumptions">...</h3>
The links in the table of contents can then refer to the elements with those `id` values.
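To make the mapping concrete, here is a simplified stand-in for the id generation. It is an assumption for illustration only and is not the Rails Guide generator's actual code, which handles more cases. The method name `heading_id` and the French title are made up.

```ruby
# Simplified stand-in: lower-case the title and join its words with hyphens.
# Letters and digits of any script are kept; everything else is a separator.
def heading_id(title)
  title.downcase
       .gsub(/[^\p{L}\p{N}]+/, "-") # runs of other characters become one hyphen
       .gsub(/\A-+|-+\z/, "")       # trim hyphens at both ends
end

english_id = heading_id("Guide Assumptions")
translated_id = heading_id("Hypothèses du guide") # illustrative translated title
```

Since the id follows the heading text, a translated heading gets a different id from the English one, and any link that still points at the English anchor no longer finds its target.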
It works fine in the original Rails Guide. When you click a link in the Chapters block, the browser jumps to the corresponding section. However, once all the titles are translated, those links stop working properly. After some investigation, I found that it's related to Turbo. I suspect it's a bug in Turbo. My current solution is to disable Turbo for the links in the Chapters block.
<ol class="chapters" data-turbo="false">
...
</ol>
Code
Repository: https://github.com/kevinluo201/rails-guide-ai
This repo is forked from the Rails repo so that it can pull the updates of the guide's files. It only has 2 new files:
- `guides/rails_guides/ai_translator.rb`: the main program.
- `guides/ai_translate.rb`: the starting point.
You can do the following steps if you want to play around with it.
1. Set a new environment variable called `OPENAI_ACCESS_TOKEN` and set its value to your personal access token on OpenAI.
2. Add a new language in `RailsGuide::AiTranslator`, for example, `'jp' => 'Japanese'`.
3. Open the terminal, go to `guides/` and start translating by executing

```bash
ruby ./ai_translate.rb jp
```

4. You can also translate a single file; just add a filename after the command

```bash
ruby ./ai_translate.rb jp getting_started.md
```

5. After all files are translated, you can execute the existing Rails script to generate the HTML, CSS and JS. Unfortunately, it is likely to fail when you do that. Usually, it is because there are duplicated titles which lead to duplicated `id` values in the HTML. You can fix it by finding out which title has the problem and changing that title a bit. Other problems can also show up when translating into different languages. Just try solving them so the process can finish.

```bash
bundle exec rake guides:generate:html GUIDES_LANGUAGE=jp
```
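Hunting for duplicated titles by hand is tedious, so a small helper can list candidates. This is a sketch under assumptions: `duplicate_headings` is a made-up name, it only compares the heading text within one file, and it may flag titles that the real generator would tell apart. It recognises both `#`-style headings and headings underlined with `=` or `-`, and skips fenced code blocks.

```ruby
FENCE_MARK = "`" * 3 # three backticks open or close a code block

# Return the heading titles that occur more than once in a Markdown text.
def duplicate_headings(markdown)
  in_code = false
  previous = ""
  titles = []

  markdown.each_line do |raw|
    line = raw.chomp
    if line.start_with?(FENCE_MARK)
      in_code = !in_code
    elsif !in_code
      if (atx = line.match(/\A[#]{1,6}\s+(.+?)\s*\z/))
        titles << atx[1] # heading written as "### Title"
      elsif line.match?(/\A(=+|-+)\s*\z/) && !previous.strip.empty?
        titles << previous.strip # heading underlined with === or ---
      end
    end
    previous = line
  end

  titles.tally.select { |_title, count| count > 1 }.keys
end
```

For example, `duplicate_headings(File.read("getting_started.md"))` returns an array of the repeated titles in that translated file, which are the ones worth rewording before running the generator again.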
Help Wanted
It is just an experimental project now. There are several issues that can be improved. If you think it is an interesting topic, feel free to discuss it with me.
Current Issues
Anchor links
The table of contents problem is solved by disabling Turbo. However, there are also anchor links spread throughout the articles. They cannot be converted to the correct URLs smoothly, especially when a link refers to an anchor on another page.
Versioning
The Rails Guide has versions. A version is kind of a snapshot of the guide at a particular time. I haven't thought of a good way to manage them.
Different models
I'm now using `gpt-3.5-turbo`. I live in Canada so I cannot use Google's Bard. Feel free to change the code so it can switch between different models, like GPT-4 or Llama 2.
EPUB
EPUB files can be generated by the Rails Guide script. However, I get errors when I try to import them into EPUB reader software, such as "Books" on macOS. I think it may be related to the broken anchor links.
Other stuff
If you have any ideas that can make this project more sustainable, please discuss it with me. For example, it's a guide for Rails, why not build it as a Rails app?
Conclusion
The quality of the AI translation is not perfect but acceptable. I'm not concerned about the quality. As far as I can see, the token limit and the trained model are the most significant factors. I believe this problem will be solved by swapping the current model (`gpt-3.5-turbo`) for a more advanced model in the future. The result shows that this workflow really works and that's the most important lesson for me.
As for the cost, I ran many experiments for this idea and translated the Rails Guide into 6 different languages. It cost me about $27 in total, so each translation cost less than $5 on average ($27 / 6 = $4.50). The actual price per translation should be lower than that because many of the experiments simply failed.
*Due to its good quality and low cost, generative AI might be a good solution for the technical documentation of open-source projects.*
Buy me a coffee
Finally, if you like what I'm doing, you can buy me a coffee 😉☕️
https://www.buymeacoffee.com/kevinluo