lukaszkuczynski
Posted on April 16, 2018
Intro
I have been learning Chinese since 2016. Progress is slow, but from time to time I am given an assignment to prepare, which means working through some material in the language. The text comes in my native language (Polish) or in English, alongside Chinese characters. I can recognize barely 20 to 50 of them, so Pinyin is a must. And because I use Python every day, I thought: let's see how it can automate this work.
Python is great
It has tools for everything. It took me only minutes to find good libraries for segmentation, that is, grouping characters into words (the leader is jieba), and for pinyin transliteration (xpinyin is one of many examples). This is how easily it can be done in Python:
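import jieba
from xpinyin import Pinyin
sentence = "我想说更好的中文,但很难,因为我是波兰人"
print(sentence)
# Segment the sentence into words and join them with spaces.
segments = jieba.cut(sentence)
output = " ".join(segments)
print(output)
# Transliterate to pinyin with tone marks; the empty splitter keeps the
# syllables of each word together.
# Note: recent xpinyin releases may expect tone_marks="marks" here instead.
p = Pinyin()
pinyined = p.get_pinyin(output, splitter='', show_tone_marks=True)
print(pinyined)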
It produces the following output (the lines in the middle are jieba's startup log):
我想说更好的中文,但很难,因为我是波兰人
Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.899 seconds.
Prefix dict has been built succesfully.
我 想 说 更好 的 中文 , 但 很 难 , 因为 我 是 波兰人
wǒ xiǎng shuō gènghǎo de zhōngwén , dàn hěn nán , yīnwèi wǒ shì bōlánrén
Cool, isn't it?
Now I can use it for document processing, which makes my work much faster.
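As a minimal sketch of what such document processing could look like, the helper below walks over the lines of a text and puts the pinyin right under each one. The names `annotate_lines` and `jieba_xpinyin_converter` are made up for this illustration and are not part of jieba or xpinyin. The conversion step is passed in as a function, so the loop itself does not depend on either library.

```python
from typing import Callable, Iterable, List


def annotate_lines(lines: Iterable[str], to_pinyin: Callable[[str], str]) -> List[str]:
    """Return every non-blank line followed by its pinyin rendering."""
    annotated: List[str] = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank lines carry nothing to transliterate
        annotated.append(text)
        annotated.append(to_pinyin(text))
    return annotated


def jieba_xpinyin_converter() -> Callable[[str], str]:
    """Build a converter that repeats the two steps from the post.

    Hypothetical helper: the libraries are imported lazily, so
    annotate_lines stays usable without them.
    """
    import jieba
    from xpinyin import Pinyin

    p = Pinyin()

    def convert(text: str) -> str:
        # Segment into words first, then transliterate word by word.
        segmented = " ".join(jieba.cut(text))
        # Same call as in the post; recent xpinyin releases may expect
        # tone_marks="marks" instead of show_tone_marks=True.
        return p.get_pinyin(segmented, splitter="", show_tone_marks=True)

    return convert
```

With both libraries installed, something like `annotate_lines(open("lesson.txt", encoding="utf-8"), jieba_xpinyin_converter())` should give the annotated lines, where `lesson.txt` is only a placeholder file name.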