Building a machine learning model with SQL Server, ML.NET and C#
Luis Beltran
Posted on December 1, 2020
This article is part of The Fourth Annual C# Advent Calendar initiative by Matthew D. Groves. You'll find other helpful articles and tutorials published daily by community members and experts there, so make sure to check it out every day.
ML.NET is a free, open-source, and cross-platform machine learning framework designed for .NET developers. ML.NET lets you re-use all the knowledge, skills, code, and libraries you already have as a .NET developer so that you can easily integrate machine learning into your web, mobile, desktop, games, and IoT apps.
You can apply it to classification, regression, time series, and even computer vision (deep learning, image classification) scenarios, with more than 40 trainers (task-based ML algorithms) at your disposal.
From version 1.4-preview onwards, ML.NET supports the DatabaseLoader class, which means that we can now train and build models directly against relational databases, including SQL Server, Oracle, PostgreSQL, SQLite, and others.
For this example, I am going to build a model that helps to predict whether a woman is likely to develop diabetes, based on historical data from other patients. I am using a Kaggle dataset which you can download from here.
After that, create a Patient table to store the information. The only requirement is to use the real data type for numeric fields, because the model class shown later reads them as float values. Another option is to perform a CAST operation when you retrieve the data and convert the fields to real on the fly.
CREATE TABLE Patient(
    Id int identity(1,1) primary key,
    Pregnancies real not null,
    Glucose real not null,
    BloodPressure real not null,
    SkinThickness real not null,
    Insulin real not null,
    BMI real not null,
    DiabetesPedigreeFunction real not null,
    Age real not null,
    Output varchar(1) not null
)
And of course, you need to insert all the data from the CSV file into the table.
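One possible way to load the file is SqlBulkCopy from System.Data.SqlClient. The sketch below is only an illustration: it assumes the CSV has a header row followed by nine comma-separated columns in the same order as the table (the eight features, then the outcome), and the CsvImporter class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Globalization;
using System.IO;

public static class CsvImporter
{
    // Assumed column order of the CSV file (Id is generated by the database).
    private static readonly string[] Columns =
    {
        "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
        "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Output"
    };

    // Turns CSV lines (header first) into a DataTable shaped like the Patient table.
    public static DataTable ToTable(IEnumerable<string> lines)
    {
        var table = new DataTable("Patient");
        for (int i = 0; i < Columns.Length; i++)
        {
            // Output is varchar(1) in the table; every other column is real (float).
            bool isOutput = i == Columns.Length - 1;
            table.Columns.Add(Columns[i], isOutput ? typeof(string) : typeof(float));
        }

        bool isHeader = true;
        foreach (var line in lines)
        {
            if (isHeader) { isHeader = false; continue; }   // skip the header row
            if (string.IsNullOrWhiteSpace(line)) continue;  // skip blank lines

            var parts = line.Split(',');
            if (parts.Length != Columns.Length)
                throw new FormatException(
                    $"Expected {Columns.Length} fields but found {parts.Length}: {line}");

            var row = table.NewRow();
            for (int i = 0; i < Columns.Length - 1; i++)
                row[i] = float.Parse(parts[i], CultureInfo.InvariantCulture);
            row[Columns.Length - 1] = parts[Columns.Length - 1].Trim();
            table.Rows.Add(row);
        }
        return table;
    }

    // Sends all rows to SQL Server in one bulk operation.
    public static void Import(string csvPath, string connectionString)
    {
        var table = ToTable(File.ReadLines(csvPath));
        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "Patient";
            foreach (var name in Columns)
                bulk.ColumnMappings.Add(name, name);  // map by name so Id is left to the identity
            bulk.WriteToServer(table);
        }
    }
}
```

Mapping the columns by name keeps the identity column out of the copy, so SQL Server assigns the Id values itself.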
Now, let's write some code!
Step 1. Create a new C# Console application project.
Step 2. Add the following Nuget packages to your project:
- Microsoft.ML
- System.Data.SqlClient
- Microsoft.Extensions.Configuration
- Microsoft.Extensions.Configuration.Json
- Microsoft.Extensions.Configuration.FileExtensions
Step 3. Add an App Settings File to your project.
In this file, add a ConnectionStrings collection with a DbConnection element. The value, of course, is the connection string to the database where your data resides.
For instance, I will connect to an Azure SQL database:
{
  "Logging": {
    "LogLevel": {
      "Default": "Warning"
    }
  },
  "ConnectionStrings": {
    "DbConnection": "Server=tcp:myserver.database.windows.net,1433;Initial Catalog=mydatabase;Persist Security Info=False;User ID=myadmin;Password=MYadm1n;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;"
  }
}
NOTE: Set the Copy to Output Directory property of this file to "Copy if newer" (or "Copy always"); otherwise the file won't be in the output folder and the program won't be able to read it later.
Step 4. Add a Models folder to your project. Inside, create a new class named Patient, which includes several properties that match the table structure. Each property is decorated with the LoadColumnAttribute, whose zero-based index identifies the column that will be mapped from the database table.
using Microsoft.ML.Data;

namespace DiabetesPrediction.Models
{
    public class Patient
    {
        [LoadColumn(0)]
        public float Id { get; set; }
        [LoadColumn(1)]
        public float Pregnancies { get; set; }
        [LoadColumn(2)]
        public float Glucose { get; set; }
        [LoadColumn(3)]
        public float BloodPressure { get; set; }
        [LoadColumn(4)]
        public float SkinThickness { get; set; }
        [LoadColumn(5)]
        public float Insulin { get; set; }
        [LoadColumn(6)]
        public float BMI { get; set; }
        [LoadColumn(7)]
        public float DiabetesPedigreeFunction { get; set; }
        [LoadColumn(8)]
        public float Age { get; set; }
        [LoadColumn(9)]
        public float Output { get; set; }
    }
}
Step 5. Add a DiabetesMLPrediction class that inherits from Patient and includes additional properties. This will be used after the machine learning model is built, to show predicted data:
using Microsoft.ML.Data;

namespace DiabetesPrediction.Models
{
    public class DiabetesMLPrediction : Patient
    {
        [ColumnName("PredictedLabel")]
        public float Prediction { get; set; }
        public float Probability { get; set; }
        public float[] Score { get; set; }
    }
}
Step 6. In the Program.cs file:
a. Add these namespaces:
using System;
using System.IO;
using System.Linq;
using System.Data.SqlClient;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.Extensions.Configuration;
using DiabetesPrediction.Models;
b. Inside the class, add a GetDbConnection method that extracts the connection string from the appsettings.json file:
private static string GetDbConnection()
{
    var builder = new ConfigurationBuilder()
        .SetBasePath(Directory.GetCurrentDirectory())
        .AddJsonFile("appsettings.json", optional: true, reloadOnChange: true);
    return builder.Build().GetConnectionString("DbConnection");
}
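GetConnectionString("DbConnection") is shorthand for reading the configuration key ConnectionStrings:DbConnection, and because the JSON file is registered as optional, a missing file simply yields null. The following sketch shows that lookup with an in-memory provider standing in for appsettings.json, plus a guard that fails early with a readable message. The ConnectionStringDemo class and its methods are hypothetical helpers, not part of the original project.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Configuration;

public static class ConnectionStringDemo
{
    // Returns the named connection string, or fails with a clear message when it is missing.
    public static string RequireConnectionString(IConfiguration config, string name)
    {
        var value = config.GetConnectionString(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Connection string '{name}' was not found. " +
                "Check that appsettings.json is copied to the output directory.");
        }
        return value;
    }

    // Stand-in for appsettings.json: the same flattened key the JSON file would produce.
    public static IConfiguration BuildInMemory(string connectionString)
    {
        var settings = new Dictionary<string, string>
        {
            ["ConnectionStrings:DbConnection"] = connectionString
        };
        return new ConfigurationBuilder().AddInMemoryCollection(settings).Build();
    }
}
```

A guard like this surfaces a configuration problem at startup instead of later, when the database loader tries to open a connection.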
c. In the Main method:
- Create an MLContext instance
- Create a DatabaseLoader instance based on the Patient class
- Call the GetDbConnection method
- Prepare a SQL statement that reads all data (and converts the Id and Output columns to real)
- Prepare a DatabaseSource instance that uses the connection string and statement.
var context = new MLContext();
var loader = context.Data.CreateDatabaseLoader<Patient>();
var connectionString = GetDbConnection();
var sqlCommand = "Select CAST(Id as REAL) as Id, Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age, CAST(Output as REAL) as Output From Patient";
var dbSource = new DatabaseSource(SqlClientFactory.Instance, connectionString, sqlCommand);
- Load the data from the table into an IDataView object and split it into two other IDataViews, one for training and another one for evaluation:
Console.WriteLine("Loading data from database...");
var data = loader.Load(dbSource);
var set = context.Data.TrainTestSplit(data, testFraction: 0.2);
var trainingData = set.TrainSet;
var testData = set.TestSet;
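The split is random, so the rows that end up in each set (and therefore the metrics) can change between runs. If you want repeatable results, one option is to pass a seed, as in this minimal sketch. It uses a few in-memory rows in place of the Patient table, and the Row and SplitDemo names are illustrative only.

```csharp
using System;
using System.Linq;
using Microsoft.ML;

// Illustrative row type standing in for Patient.
public class Row
{
    public float Value { get; set; }
}

public static class SplitDemo
{
    // Splits rowCount in-memory rows 80/20 and returns how many rows landed in each set.
    public static (long Train, long Test) Split(int rowCount, int seed)
    {
        // Seeding the context and the split makes the random assignment repeatable.
        var context = new MLContext(seed: seed);
        var rows = Enumerable.Range(0, rowCount).Select(i => new Row { Value = i });
        var data = context.Data.LoadFromEnumerable(rows);

        var split = context.Data.TrainTestSplit(data, testFraction: 0.2, seed: seed);

        long train = context.Data
            .CreateEnumerable<Row>(split.TrainSet, reuseRowObject: false).LongCount();
        long test = context.Data
            .CreateEnumerable<Row>(split.TestSet, reuseRowObject: false).LongCount();
        return (train, test);
    }
}
```

Note that testFraction is approximate: each row is assigned at random, so the test set holds roughly, not exactly, 20% of the rows.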
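- Prepare a training pipeline (an estimator chain) that maps the Output column to the Label, concatenates the input columns into Features, and trains a one-versus-all classifier built on the AveragedPerceptron binary trainer. The ITransformer (the trained model) is produced later, when Fit is called. The column that will be predicted is Output: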
Console.WriteLine("Preparing training operations...");
var pipeline = context.Transforms
    .Conversion.MapValueToKey(outputColumnName: "Label", inputColumnName: "Output")
    .Append(context.Transforms.Concatenate("Features", "Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"))
    .Append(context.MulticlassClassification.Trainers.OneVersusAll(
            context.BinaryClassification.Trainers.AveragedPerceptron("Label", "Features", numberOfIterations: 10))
        .Append(context.Transforms.Conversion.MapKeyToValue("PredictedLabel")));
- Now, divide the training dataset into 10 folds. In each round, 9 folds are used for training and the remaining fold is used for testing. The process is repeated 10 times, so that every fold serves as the test set once. This is known as 10-fold cross-validation (of course, you can change the number of folds). The averaged metrics are displayed as well:
Console.WriteLine("=============== Starting 10 fold cross validation ===============");
var crossValResults = context.MulticlassClassification.CrossValidate(data: trainingData, estimator: pipeline, numberOfFolds: 10, labelColumnName: "Label");
var metricsInMultipleFolds = crossValResults.Select(r => r.Metrics);
var microAccuracyValues = metricsInMultipleFolds.Select(m => m.MicroAccuracy);
var microAccuracyAverage = microAccuracyValues.Average();
var macroAccuracyValues = metricsInMultipleFolds.Select(m => m.MacroAccuracy);
var macroAccuracyAverage = macroAccuracyValues.Average();
var logLossValues = metricsInMultipleFolds.Select(m => m.LogLoss);
var logLossAverage = logLossValues.Average();
var logLossReductionValues = metricsInMultipleFolds.Select(m => m.LogLossReduction);
var logLossReductionAverage = logLossReductionValues.Average();
Console.WriteLine($"*************************************************************************************************************");
Console.WriteLine($"* Metrics Multi-class Classification model ");
Console.WriteLine($"*------------------------------------------------------------------------------------------------------------");
Console.WriteLine($"* Average MicroAccuracy: {microAccuracyAverage:0.###} ");
Console.WriteLine($"* Average MacroAccuracy: {macroAccuracyAverage:0.###} ");
Console.WriteLine($"* Average LogLoss: {logLossAverage:0.###} ");
Console.WriteLine($"* Average LogLossReduction: {logLossReductionAverage:0.###} ");
Console.WriteLine($"*************************************************************************************************************");
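To make the idea of folds concrete, here is a small stand-alone sketch of how row indices could be partitioned for k-fold cross-validation. It is not how ML.NET implements CrossValidate internally (the library handles the partitioning for you); the KFoldSketch class is a hypothetical illustration that uses only the base class library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KFoldSketch
{
    // Returns k (train, test) index pairs; every row index appears in exactly one test fold.
    public static List<(int[] Train, int[] Test)> Split(int rowCount, int k, int seed = 0)
    {
        if (k < 2 || k > rowCount) throw new ArgumentOutOfRangeException(nameof(k));

        // Shuffle the row indices (Fisher-Yates) so folds do not depend on table order.
        var rng = new Random(seed);
        var indices = Enumerable.Range(0, rowCount).ToArray();
        for (int i = indices.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (indices[i], indices[j]) = (indices[j], indices[i]);
        }

        var folds = new List<(int[] Train, int[] Test)>();
        for (int fold = 0; fold < k; fold++)
        {
            // The row at shuffled position p belongs to test fold p % k.
            var test = indices.Where((_, p) => p % k == fold).ToArray();
            var train = indices.Where((_, p) => p % k != fold).ToArray();
            folds.Add((train, test));
        }
        return folds;
    }
}
```

Calling KFoldSketch.Split(rowCount, 10) yields ten rounds; in each one a model would be trained on the Train indices and scored on the Test indices, and the per-round metrics are then averaged, which is what the code above does with the CrossValidate results.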
- Next up, you can train the model by calling the Fit method:
Console.WriteLine($"Training process is starting. {DateTime.Now.ToLongTimeString()}");
var model = pipeline.Fit(trainingData);
Console.WriteLine($"Training process has finished. {DateTime.Now.ToLongTimeString()}");
This process takes some time.
- After the model is created, you can start making predictions by building a PredictionEngine and passing a Patient object to the Predict method:
var predictionEngine = context.Model.CreatePredictionEngine<Patient, DiabetesMLPrediction>(model);
var patient = new Patient()
{
    Age = 42,
    BloodPressure = 81,
    BMI = 30.1f,
    DiabetesPedigreeFunction = 0.987f,
    Glucose = 120,
    Insulin = 100,
    Pregnancies = 1,
    SkinThickness = 26,
    Id = 0,
    Output = 0
};
var prediction = predictionEngine.Predict(patient);
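// Note: the one-versus-all trainer emits one Score per class and, as far as I can tell, no Probability column,
// so Probability would stay at its default of 0. The per-class scores are printed instead.
Console.WriteLine($"Diabetes? {prediction.Output} | Prediction: {(Convert.ToBoolean(prediction.Prediction) ? "Yes" : "No")} | Scores: {string.Join(", ", prediction.Score)} ");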
- Finally, you can save the model to use it in other projects (Web API, Azure Functions, etc.):
Console.WriteLine("Saving the model");
context.Model.Save(model, trainingData.Schema, "MLModel.zip");
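In another project, the saved file can be loaded back with context.Model.Load. The helper below is a minimal sketch of that step; ModelConsumer is a hypothetical name, and the type parameters are generic so the block stands on its own (in this article they would be Patient and DiabetesMLPrediction).

```csharp
using System;
using Microsoft.ML;

public static class ModelConsumer
{
    // Loads a saved model file and scores a single input object with it.
    public static TOut Predict<TIn, TOut>(string modelPath, TIn input)
        where TIn : class
        where TOut : class, new()
    {
        var context = new MLContext();

        // Load returns the trained transformer chain plus the input schema saved with it.
        ITransformer model = context.Model.Load(modelPath, out DataViewSchema inputSchema);

        // PredictionEngine is not thread-safe, so this sketch creates one per call.
        using (var engine = context.Model.CreatePredictionEngine<TIn, TOut>(model))
        {
            return engine.Predict(input);
        }
    }
}
```

For example, ModelConsumer.Predict<Patient, DiabetesMLPrediction>("MLModel.zip", patient) would score one patient. Creating the engine on every call keeps the sketch simple but is relatively expensive, so a long-running service would normally reuse or pool engines.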
Step 7. Run the program. You'll get the results and an ML model ready for some predictions:
The code is available on GitHub.
I hope that this blog post was interesting and useful for you. I invite you to visit my blog for more technical posts about Xamarin, Azure, and the .NET ecosystem. I write in Spanish =)
Thanks for your time, and enjoy the rest of the C# Advent Calendar publications!
See you next time,
Luis