Plug & Play Machine Learning Models in GoLang
Michael Taylor
Posted on March 12, 2018
Problem
"No one model works best for all possible situations." (No Free Lunch theorem, D. H. Wolpert)
The Solution
Unit testable, dependency injectable, and backtestable models in under 200 lines of code.
Dependency Injection
Why? Dependency injection lets us hot swap both whole models and the classifiers inside them anywhere in our machine learning pipeline.
Here is what the end result looks like:
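func main() {
	// SpamHamModel with a Naive Bayes classifier plugged in
	spamNb := SpamHamModel{Classifier: &NBClassifier{}}
	// Exact same SpamHamModel with an SVM classifier plugged in
	// (SVMClassifier and NNClassifier are illustrative; only NBClassifier is implemented in this post)
	spamSVM := SpamHamModel{Classifier: &SVMClassifier{}}
	// Exact same SpamHamModel with a neural network classifier plugged in
	spamNN := SpamHamModel{Classifier: &NNClassifier{}}
	// Blank assignment so the compiler does not reject the unused variables.
	_, _, _ = spamNb, spamSVM, spamNN
}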
Let's expand on how this works.
Data structures and interfaces
// Step 1 - Define the structure of the input data
// You got mail!
type Email struct {
	Author string
	Body   string
	Flag   string // Spam/Ham
}

// Step 2 - Define an ML classifier contract
// Binary classifier interface - examples: SVM, NN, NB
type Classifier interface {
	Learn(emails []Email)
	Predict(email Email) string
}

// Step 3 - Create a model with a "Plug & Play" Classifier field
// You got mail! - Is it Spam or Ham? (model example)
type SpamHamModel struct {
	Classifier Classifier
}

// Learn delegates training to whichever classifier is plugged in.
func (model *SpamHamModel) Learn(emails []Email) {
	model.Classifier.Learn(emails)
}

// Predict delegates prediction to whichever classifier is plugged in.
func (model *SpamHamModel) Predict(email Email) string {
	return model.Classifier.Predict(email)
}
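To make the "plug & play" idea concrete, here is a minimal, self-contained sketch that plugs two different classifiers into the same SpamHamModel. KeywordClassifier and AlwaysHamClassifier are hypothetical toy implementations invented for this illustration (they are not in the original code); the only thing that matters is that both satisfy the Classifier contract.

```go
package main

import (
	"fmt"
	"strings"
)

// Email, Classifier and SpamHamModel mirror the definitions from this post.
type Email struct {
	Author string
	Body   string
	Flag   string // Spam/Ham
}

type Classifier interface {
	Learn(emails []Email)
	Predict(email Email) string
}

type SpamHamModel struct {
	Classifier Classifier
}

func (m *SpamHamModel) Learn(emails []Email)       { m.Classifier.Learn(emails) }
func (m *SpamHamModel) Predict(email Email) string { return m.Classifier.Predict(email) }

// KeywordClassifier is hypothetical: it remembers every word seen in Spam
// training emails and flags any email that contains one of those words.
type KeywordClassifier struct {
	spamWords map[string]bool
}

func (c *KeywordClassifier) Learn(emails []Email) {
	c.spamWords = map[string]bool{}
	for _, e := range emails {
		if e.Flag != "Spam" {
			continue
		}
		for _, w := range strings.Fields(strings.ToLower(e.Body)) {
			c.spamWords[w] = true
		}
	}
}

func (c *KeywordClassifier) Predict(email Email) string {
	for _, w := range strings.Fields(strings.ToLower(email.Body)) {
		if c.spamWords[w] {
			return "Spam"
		}
	}
	return "Ham"
}

// AlwaysHamClassifier is a second hypothetical implementation: a trivial baseline.
type AlwaysHamClassifier struct{}

func (AlwaysHamClassifier) Learn([]Email)        {}
func (AlwaysHamClassifier) Predict(Email) string { return "Ham" }

func main() {
	training := []Email{
		{Body: "opportunity to earn extra money", Flag: "Spam"},
		{Body: "lunch at noon?", Flag: "Ham"},
	}
	// Same model type, different classifiers injected.
	models := []SpamHamModel{
		{Classifier: &KeywordClassifier{}},
		{Classifier: AlwaysHamClassifier{}},
	}
	for i := range models {
		models[i].Learn(training)
		fmt.Printf("%T -> %s\n", models[i].Classifier, models[i].Predict(Email{Body: "earn money fast"}))
	}
}
```

The calling code never changes when the classifier does, which is the property the rest of this post relies on. The keyword approach itself is deliberately naive (it would flag any email containing "to") and is only there to show the wiring.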
Naive Bayes Classifier
// Step 4 - Implement classifier(s), per the contract's terms.
// You got mail! - Is it Spam or Ham? (the model's brain/classifier)
//
// This file needs the standard "sort" and "strings" packages, a bayesian
// package (the calls used here match github.com/jbrukh/bayesian), and the
// models package that holds the types from steps 1-3.
type NBClassifier struct {
	classifier *bayesian.Classifier
	output     []bayesian.Class
}

// Learn trains a TF-IDF Naive Bayes classifier on the email bodies.
func (c *NBClassifier) Learn(emails []models.Email) {
	c.output = distinctFlags(emails)
	c.classifier = bayesian.NewClassifierTfIdf(c.output...)
	for i := 0; i < len(emails); i++ {
		c.classifier.Learn(strings.Split(emails[i].Body, " "), bayesian.Class(emails[i].Flag))
	}
	c.classifier.ConvertTermsFreqToTfIdf()
}

// Predict scores the email against every class and returns the best-scoring flag.
func (c *NBClassifier) Predict(email models.Email) string {
	scores, _, _ := c.classifier.LogScores(strings.Split(email.Body, " "))
	results := models.Results{}
	for i := 0; i < len(scores); i++ {
		results = append(results, models.Result{ID: i, Score: scores[i]})
	}
	// Highest score first.
	sort.Sort(sort.Reverse(results))
	flags := []string{}
	for i := 0; i < len(results); i++ {
		flags = append(flags, string(c.output[results[i].ID]))
	}
	return flags[0]
}

// distinctFlags returns each flag seen in the emails once, in first-seen order.
func distinctFlags(emails []models.Email) []bayesian.Class {
	result := []bayesian.Class{}
	j := 0
	for i := 0; i < len(emails); i++ {
		for j = 0; j < len(result); j++ {
			if emails[i].Flag == string(result[j]) {
				break
			}
		}
		// j reaches len(result) only when the flag was not found.
		if j == len(result) {
			result = append(result, bayesian.Class(emails[i].Flag))
		}
	}
	return result
}
Unit Testing
Putting crucial pieces of production code behind interfaces like this makes it easier to follow development approaches such as TDD.
Example:
func CreateTrainingEmails() []models.Email {
	return []models.Email{
		{Body: "opportunity to earn extra money", Flag: "Spam"},
		{Body: "druggists blame classy gentry Aladdin", Flag: "Spam"},
		{Body: "please take a look at this report", Flag: "Ham"},
		{Body: "lunch at noon?", Flag: "Ham"},
	}
}

// Note: these are the same emails as the training set, so the test below
// checks that the model recalls what it learned, not that it generalizes.
func CreateValidationEmails() []models.Email {
	return []models.Email{
		{Body: "opportunity to earn extra money", Flag: "Spam"},
		{Body: "druggists blame classy gentry Aladdin", Flag: "Spam"},
		{Body: "please take a look at this report", Flag: "Ham"},
		{Body: "lunch at noon?", Flag: "Ham"},
	}
}

func TestLearn(t *testing.T) {
	nbModel := models.SpamHamModel{Classifier: &NBClassifier{}}
	trainingSet := CreateTrainingEmails()
	validationSet := CreateValidationEmails()
	nbModel.Learn(trainingSet)
	for i := 0; i < len(validationSet); i++ {
		input := validationSet[i].Body
		expected := validationSet[i].Flag
		actual := nbModel.Predict(validationSet[i])
		Assert(t, expected, actual, input)
	}
}

// Assert reports a test failure when the predicted flag differs from the expected one.
func Assert(t *testing.T, expected string, actual string, input string) {
	t.Helper() // attribute failures to the calling test line
	if actual != expected {
		t.Error(
			"\nFOR: ", input,
			"\nEXPECTED: ", expected,
			"\nACTUAL: ", actual,
		)
	}
}
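The same interface also makes it possible to test SpamHamModel on its own, without training a real classifier. The sketch below uses a test double for that. SpyClassifier and checkDelegation are hypothetical names introduced for this illustration; in a real test file the checks would be t.Errorf calls inside a TestXxx function rather than a main program.

```go
package main

import "fmt"

// Email, Classifier and SpamHamModel mirror the definitions from this post.
type Email struct {
	Author string
	Body   string
	Flag   string // Spam/Ham
}

type Classifier interface {
	Learn(emails []Email)
	Predict(email Email) string
}

type SpamHamModel struct {
	Classifier Classifier
}

func (m *SpamHamModel) Learn(emails []Email)       { m.Classifier.Learn(emails) }
func (m *SpamHamModel) Predict(email Email) string { return m.Classifier.Predict(email) }

// SpyClassifier is a hypothetical test double: it records how it was called
// and returns a canned answer instead of doing any learning.
type SpyClassifier struct {
	Learned   int      // number of emails passed to Learn
	Predicted []string // bodies passed to Predict, in call order
	Answer    string   // canned value returned by Predict
}

func (s *SpyClassifier) Learn(emails []Email) { s.Learned += len(emails) }

func (s *SpyClassifier) Predict(email Email) string {
	s.Predicted = append(s.Predicted, email.Body)
	return s.Answer
}

// checkDelegation verifies that the model forwards calls to its classifier.
// It returns an error describing the first mismatch, or nil.
func checkDelegation() error {
	spy := &SpyClassifier{Answer: "Ham"}
	model := SpamHamModel{Classifier: spy}

	model.Learn([]Email{{Body: "a", Flag: "Spam"}, {Body: "b", Flag: "Ham"}})
	if spy.Learned != 2 {
		return fmt.Errorf("Learn saw %d emails, want 2", spy.Learned)
	}
	if got := model.Predict(Email{Body: "lunch at noon?"}); got != "Ham" {
		return fmt.Errorf("Predict returned %q, want %q", got, "Ham")
	}
	if len(spy.Predicted) != 1 || spy.Predicted[0] != "lunch at noon?" {
		return fmt.Errorf("Predict calls recorded as %q", spy.Predicted)
	}
	return nil
}

func main() {
	if err := checkDelegation(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```

A test like this runs instantly and fails only when the model's wiring is wrong, which keeps it separate from tests such as TestLearn that exercise a real classifier's accuracy.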
Backtesting (stay tuned for part two...)
The code is also available on GitHub if you want to test it locally: https://github.com/heupr/resources/tree/master/plugnplay