Implement Language Detection — Thought and Practice
Jackson
Posted on June 8, 2022
Background
Quick question: How many languages are there in the world? Before you rush off to search for the answer, read on.
There are over 7000 languages — astonishing, right? Such diversity highlights the importance of translation, which is valuable to us on so many levels because it opens us up to a rich range of cultures. Psycholinguist Frank Smith said that, "one language sets you in a corridor for life. Two languages open every door along the way."
These days, it is very easy for someone to pick up their phone, download a translation app, and start communicating in another language without having a sound understanding of it. It has taken away the need to really master a foreign language. AI technologies such as natural language processing (NLP) not only simplify translation, but also open up opportunities for people to learn and use a foreign language.
Modern translation apps are capable of translating text into another language in just a tap. That's not to say that developing translation at a tap is as easy as it sounds. An integral and initial step of it is language detection, which tells the software what the language is.
Below is a walkthrough of how I implemented language detection for my demo app, using this service from HMS Core ML Kit. It automatically detects the language of input text, and then returns all the codes and the confidence levels of the detected languages, or returns only the code of the language with the highest confidence level. This is ideal for creating a translation app.
Implementation Procedure
Preparations
1) Configure the Maven repository address.
repositories {
maven {
url'https://cmc.centralrepo.rnd.huawei.com/artifactory/product_maven/' }
}
2) Integrate the SDK of the language detection capability.
dependencies{
implementation 'com.huawei.hms:ml-computer-language-detection:3.4.0.301'
}
Project Configuration
1) Set the app authentication information by setting either an access token or an API key.
- Call the setAccessToken method to set an access token. Note that this needs to be set only once during app initialization.
MLApplication.getInstance().setAccessToken("your access token");
- Or, call the setApiKey method to set an API key, which is also required only once during app initialization.
MLApplication.getInstance().setApiKey("your ApiKey");
2) Create a language detector using either of these two methods.
// Method 1: Use the default parameter settings.
MLRemoteLangDetector mlRemoteLangDetector = MLLangDetectorFactory.getInstance()
.getRemoteLangDetector();
// Method 2: Use the customized parameter settings.
MLRemoteLangDetectorSetting setting = new MLRemoteLangDetectorSetting.Factory()
// Set the minimum confidence level for language detection.
.setTrustedThreshold(0.01f)
.create();
MLRemoteLangDetector mlRemoteLangDetector = MLLangDetectorFactory.getInstance()
.getRemoteLangDetector(setting);
3) Detect the text language.
- Asynchronous method
// Method 1: Return detection results that contain language codes and confidence levels of multiple languages. In the code, **sourceText** indicates the text of which the language is to be detected. The maximum character count of the text is 5000.
Task<List<MLDetectedLang>> probabilityDetectTask = mlRemoteLangDetector.probabilityDetect(sourceText);
probabilityDetectTask.addOnSuccessListener(new OnSuccessListener<List<MLDetectedLang>>() {
@Override
public void onSuccess(List<MLDetectedLang> result) {
// Callback when the detection is successful.
}
}).addOnFailureListener(new OnFailureListener() {
@Override
public void onFailure(Exception e) {
// Callback when the detection failed.
try {
MLException mlException = (MLException)e;
// Result code for the failure. The result code can be customized with different popups on the UI.
int errorCode = mlException.getErrCode();
// Description for the failure. Used together with the result code, the description facilitates troubleshooting.
String errorMessage = mlException.getMessage();
} catch (Exception error) {
// Handle the conversion error.
}
}
});
// Method 2: Return only the code of the language with the highest confidence level. In the code, **sourceText** indicates the text of which the language is to be detected. The maximum character count of the text is 5000.
Task<String> firstBestDetectTask = mlRemoteLangDetector.firstBestDetect(sourceText);
firstBestDetectTask.addOnSuccessListener(new OnSuccessListener<String>() {
@Override
public void onSuccess(String s) {
// Callback when the detection is successful.
}
}).addOnFailureListener(new OnFailureListener() {
@Override
public void onFailure(Exception e) {
// Callback when the detection failed.
try {
MLException mlException = (MLException)e;
// Result code for the failure. The result code can be customized with different popups on the UI.
int errorCode = mlException.getErrCode();
// Description for the failure. Used together with the result code, the description facilitates troubleshooting.
String errorMessage = mlException.getMessage();
} catch (Exception error) {
// Handle the conversion error.
}
}
});
- Synchronous method
// Method 1: Return detection results that contain language codes and confidence levels of multiple languages. In the code, sourceText indicates the text of which the language is to be detected. The maximum character count of the text is 5000.
try {
List<MLDetectedLang> result= mlRemoteLangDetector.syncProbabilityDetect(sourceText);
} catch (MLException mlException) {
// Callback when the detection failed.
// Result code for the failure. The result code can be customized with different popups on the UI.
int errorCode = mlException.getErrCode();
// Description for the failure. Used together with the result code, the description facilitates troubleshooting.
String errorMessage = mlException.getMessage();
}
// Method 2: Return only the code of the language with the highest confidence level. In the code, sourceText indicates the text of which the language is to be detected. The maximum character count of the text is 5000.
try {
String language = mlRemoteLangDetector.syncFirstBestDetect(sourceText);
} catch (MLException mlException) {
// Callback when the detection failed.
// Result code for the failure. The result code can be customized with different popups on the UI.
int errorCode = mlException.getErrCode();
// Description for the failure. Used together with the result code, the description facilitates troubleshooting.
String errorMessage = mlException.getMessage();
}
4) Stop the language detector when the detection is complete, to release the resources occupied by the detector.
if (mlRemoteLangDetector != null) {
mlRemoteLangDetector.stop();
}
And once you've done this, your app will have implemented the language detection function, which works as shown in the demo below.
Conclusion
Translation apps are vital to helping people communicate across cultures, and play an important role in all aspects of our life, from study to business, and particularly travel. Without such apps, communication across different languages would be limited to people who have become proficient in another language.
In order to translate text for users, a translation app must first be able to identify the language of text. One way of doing this is to integrate a language detection service, which detects the language — or languages — of text and then returns either all language codes and their confidence levels or the code of the language with the highest confidence level. This capability improves the efficiency of such apps to build user confidence in translations offered by translation apps.
Posted on June 8, 2022
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.