Implement Language Detection — Thought and Practice

jacksssson

Jackson

Posted on June 8, 2022

Implement Language Detection — Thought and Practice

Background

Quick question: How many languages are there in the world? Before you rush off to search for the answer, read on.

There are over 7000 languages — astonishing, right? Such diversity highlights the importance of translation, which is valuable to us on so many levels because it opens us up to a rich range of cultures. Psycholinguist Frank Smith said that, "one language sets you in a corridor for life. Two languages open every door along the way."

These days, it is very easy for someone to pick up their phone, download a translation app, and start communicating in another language without having a sound understanding of it. It has taken away the need to really master a foreign language. AI technologies such as natural language processing (NLP) not only simplify translation, but also open up opportunities for people to learn and use a foreign language.

Modern translation apps are capable of translating text into another language in just a tap. That's not to say that developing translation at a tap is as easy as it sounds. An integral and initial step of it is language detection, which tells the software what the language is.

Below is a walkthrough of how I implemented language detection for my demo app, using this service from HMS Core ML Kit. It automatically detects the language of input text, and then returns all the codes and the confidence levels of the detected languages, or returns only the code of the language with the highest confidence level. This is ideal for creating a translation app.

Implementation Procedure

Preparations

1) Configure the Maven repository address.

repositories {
    maven { 
url'https://cmc.centralrepo.rnd.huawei.com/artifactory/product_maven/' }
}
Enter fullscreen mode Exit fullscreen mode

2) Integrate the SDK of the language detection capability.

dependencies{
    implementation 'com.huawei.hms:ml-computer-language-detection:3.4.0.301'
}
Enter fullscreen mode Exit fullscreen mode

Project Configuration

1) Set the app authentication information by setting either an access token or an API key.

  • Call the setAccessToken method to set an access token. Note that this needs to be set only once during app initialization.
MLApplication.getInstance().setAccessToken("your access token");
Enter fullscreen mode Exit fullscreen mode
  • Or, call the setApiKey method to set an API key, which is also required only once during app initialization.
MLApplication.getInstance().setApiKey("your ApiKey");
Enter fullscreen mode Exit fullscreen mode

2) Create a language detector using either of these two methods.

// Method 1: Use the default parameter settings.
MLRemoteLangDetector mlRemoteLangDetector = MLLangDetectorFactory.getInstance()
    .getRemoteLangDetector();
// Method 2: Use the customized parameter settings.
MLRemoteLangDetectorSetting setting = new MLRemoteLangDetectorSetting.Factory()
    // Set the minimum confidence level for language detection.
    .setTrustedThreshold(0.01f)
    .create();
MLRemoteLangDetector mlRemoteLangDetector = MLLangDetectorFactory.getInstance()
    .getRemoteLangDetector(setting);
Enter fullscreen mode Exit fullscreen mode

3) Detect the text language.

  • Asynchronous method
// Method 1: Return detection results that contain language codes and confidence levels of multiple languages. In the code, **sourceText** indicates the text of which the language is to be detected. The maximum character count of the text is 5000.
Task<List<MLDetectedLang>> probabilityDetectTask = mlRemoteLangDetector.probabilityDetect(sourceText);
probabilityDetectTask.addOnSuccessListener(new OnSuccessListener<List<MLDetectedLang>>() {
    @Override
    public void onSuccess(List<MLDetectedLang> result) {
        // Callback when the detection is successful.
    }
}).addOnFailureListener(new OnFailureListener() {
    @Override
    public void onFailure(Exception e) {
        // Callback when the detection failed.
        try {
            MLException mlException = (MLException)e;
            // Result code for the failure. The result code can be customized with different popups on the UI.
            int errorCode = mlException.getErrCode();
            // Description for the failure. Used together with the result code, the description facilitates troubleshooting.
            String errorMessage = mlException.getMessage();
        } catch (Exception error) {
           // Handle the conversion error.
        }
    }
});
// Method 2: Return only the code of the language with the highest confidence level. In the code, **sourceText** indicates the text of which the language is to be detected. The maximum character count of the text is 5000.
Task<String> firstBestDetectTask = mlRemoteLangDetector.firstBestDetect(sourceText);
firstBestDetectTask.addOnSuccessListener(new OnSuccessListener<String>() {
    @Override
    public void onSuccess(String s) {
        // Callback when the detection is successful.
    }
}).addOnFailureListener(new OnFailureListener() {
    @Override
    public void onFailure(Exception e) {
        // Callback when the detection failed.
        try {
            MLException mlException = (MLException)e;
            // Result code for the failure. The result code can be customized with different popups on the UI.
            int errorCode = mlException.getErrCode();
            // Description for the failure. Used together with the result code, the description facilitates troubleshooting.
            String errorMessage = mlException.getMessage();
        } catch (Exception error) {
            // Handle the conversion error.
        }
    }
});
Enter fullscreen mode Exit fullscreen mode
  • Synchronous method
// Method 1: Return detection results that contain language codes and confidence levels of multiple languages. In the code, sourceText indicates the text of which the language is to be detected. The maximum character count of the text is 5000.
try {
    List<MLDetectedLang> result= mlRemoteLangDetector.syncProbabilityDetect(sourceText);
} catch (MLException mlException) {
    // Callback when the detection failed.
    // Result code for the failure. The result code can be customized with different popups on the UI.
    int errorCode = mlException.getErrCode();
    // Description for the failure. Used together with the result code, the description facilitates troubleshooting.
    String errorMessage = mlException.getMessage();
}
// Method 2: Return only the code of the language with the highest confidence level. In the code, sourceText indicates the text of which the language is to be detected. The maximum character count of the text is 5000.
try {
    String language = mlRemoteLangDetector.syncFirstBestDetect(sourceText);
} catch (MLException mlException) {
    // Callback when the detection failed.
    // Result code for the failure. The result code can be customized with different popups on the UI.
    int errorCode = mlException.getErrCode();
    // Description for the failure. Used together with the result code, the description facilitates troubleshooting.
    String errorMessage = mlException.getMessage();
}
Enter fullscreen mode Exit fullscreen mode

4) Stop the language detector when the detection is complete, to release the resources occupied by the detector.

if (mlRemoteLangDetector != null) {
    mlRemoteLangDetector.stop();
}
Enter fullscreen mode Exit fullscreen mode

And once you've done this, your app will have implemented the language detection function, which works as shown in the demo below.
Image description

Conclusion

Translation apps are vital to helping people communicate across cultures, and play an important role in all aspects of our life, from study to business, and particularly travel. Without such apps, communication across different languages would be limited to people who have become proficient in another language.
In order to translate text for users, a translation app must first be able to identify the language of text. One way of doing this is to integrate a language detection service, which detects the language — or languages — of text and then returns either all language codes and their confidence levels or the code of the language with the highest confidence level. This capability improves the efficiency of such apps to build user confidence in translations offered by translation apps.

💖 💪 🙅 🚩
jacksssson
Jackson

Posted on June 8, 2022

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related