Building a machine learning model with SQL Server, ML.NET and C#

icebeam7

Luis Beltran

Posted on December 1, 2020

Building a machine learning model with SQL Server, ML.NET and C#

This article is part of The Fourth Annual C# Advent Calendar initiative by Matthew D. Groves. You'll find other helpful articles and tutorials published daily by community members and experts there, so make sure to check it out every day.

ML.NET is a free, open-source, and cross-platform machine learning framework designed for .NET developers. ML.NET lets you re-use all the knowledge, skills, code, and libraries you already have as a .NET developer so that you can easily integrate machine learning into your web, mobile, desktop, games, and IoT apps.

ML.NET

You can apply it to for classification, regression, time series and even computer vision (deep learning, image classification) scenarios with more than 40 trainers (task-based ML algorithms) at your disposal.

From version 1.4-preview onwards, the DatabaseLoader class is supported, which means that now we can train and build models directly against relational databases, including SQL Server, Oracle, PostgreSQL, SQLite, and others.

For this example, I am going to build a model that helps to identify if a woman can develop diabetes based on historical data from other patients. I am using a Kaggle dataset which you can download from here.

Alt Text

After that, create a Patient table to store the information. The only requirement is to use a real data type for numeric fields, as ML.NET will only understand this type. Another option is to perform a CAST operation when you retrieve the data and convert the fields to real on the fly.



CREATE TABLE Patient(
  Id int identity(1,1) primary key,
  Pregnancies real not null,
  Glucose real not null,
  BloodPressure real not null,
  SkinThickness real not null,
  Insulin real not null,
  BMI real not null,
  DiabetesPedigreeFunction real not null,
  Age real not null,
  Output varchar(1) not null
)


Enter fullscreen mode Exit fullscreen mode

And of course, you need to insert all the data from the csv file into the table.

Now, let's write some code!

Step 1. Create a new C# Console application project:

Alt Text

Step 2. Add the following Nuget packages to your project:

  • Microsoft.ML
  • System.Data.SqlClient
  • Microsoft.Extensions.Configuration
  • Microsoft.Extensions.Configuration.Json
  • Microsoft.Extensions.Configuration.FileExtensions

Alt Text

Step 3. Add an App Settings File to your project.

Alt Text

In this file, add a ConnectionStrings collection with a DbConnection element. The value, of course, is the connection string to the database where your data resides.

For instance, I will connect to an Azure SQL database:



{
  "Logging": {
    "LogLevel": {
      "Default": "Warning"
    }
  },
  "ConnectionStrings": {
    "DbConnection": "Server=tcp:myserver.database.windows.net,1433;Initial Catalog=mydatabase;Persist Security Info=False;User ID=myadmin;Password=MYadm1n;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;"
  }
}


Enter fullscreen mode Exit fullscreen mode

NOTE: Set the Copy to Output directory property for this file, otherwise it won't be read by the program later.

Step 4. Add a Models folder to your project. Inside, create a new class named Patient, which includes several properties that match the Table structure. Also, each property is decorated with the LoadColumnAttribute with an zero-based index that represents the column that will be mapped from the database table.



using Microsoft.ML.Data;

namespace DiabetesPrediction.Models
{
    public class Patient
    {
        [LoadColumn(0)]
        public float Id { get; set; }

        [LoadColumn(1)]
        public float Pregnancies { get; set; }

        [LoadColumn(2)]
        public float Glucose { get; set; }

        [LoadColumn(3)]
        public float BloodPressure { get; set; }

        [LoadColumn(4)]
        public float SkinThickness { get; set; }

        [LoadColumn(5)]
        public float Insulin { get; set; }

        [LoadColumn(6)]
        public float BMI { get; set; }

        [LoadColumn(7)]
        public float DiabetesPedigreeFunction { get; set; }

        [LoadColumn(8)]
        public float Age { get; set; }

        [LoadColumn(9)]
        public float Output { get; set; }
    }
}



Enter fullscreen mode Exit fullscreen mode

Step 5. Add a DiabetesMLPrediction class that inherits from Patient and includes additional properties. This will be used after the machine learning model is built, to show predicted data:



using Microsoft.ML.Data;

namespace DiabetesPrediction.Models
{
    public class DiabetesMLPrediction : Patient
    {
        [ColumnName("PredictedLabel")]
        public float Prediction { get; set; }

        public float Probability { get; set; }

        public float[] Score { get; set; }
    }
}


Enter fullscreen mode Exit fullscreen mode

Step 6. In the Program.cs file:

a. Add these namespaces:



using System;
using System.IO;
using System.Linq;
using System.Data.SqlClient;

using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.Extensions.Configuration;

using DiabetesPrediction.Models;


Enter fullscreen mode Exit fullscreen mode

b. Inside the class, add a GetDbConnection method that extracts the connection string from the appsettings.json file:



private static string GetDbConnection()
{
   var builder = new ConfigurationBuilder()
      .SetBasePath(Directory.GetCurrentDirectory())
      .AddJsonFile("appsettings.json", optional: true, reloadOnChange: true);

   return builder.Build().GetConnectionString("DbConnection");
}


Enter fullscreen mode Exit fullscreen mode

c. In the Main method:

  • Create an MLContext instance
  • Create a DatabaseLoader instance based on the Patient class
  • Call the GetDbConnection method
  • Prepare a SQL statement that reads all data (and converts the id into a real field)
  • Prepare a DatabaseSource instance that uses the connection string and statement.


var context = new MLContext();

var loader = context.Data.CreateDatabaseLoader<Patient>();

var connectionString = GetDbConnection();

var sqlCommand = "Select CAST(Id as REAL) as Id, Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age, CAST(Output as REAL) as Output From Patient";

var dbSource = new DatabaseSource(SqlClientFactory.Instance, connectionString, sqlCommand);


Enter fullscreen mode Exit fullscreen mode
  • Load the data from the table into a IDataView object and split it into two other IDataViews, one for training and another one for evaluation:


Console.WriteLine("Loading data from database...");
var data = loader.Load(dbSource);

var set = context.Data.TrainTestSplit(data, testFraction: 0.2);
var trainingData = set.TrainSet;
var testData = set.TestSet;


Enter fullscreen mode Exit fullscreen mode
  • Create an ITransformer by preparing a training pipeline that will build a BinaryClassification machine learning model. Specify the column that will be predicted (Output):


Console.WriteLine("Preparing training operations...");
var pipeline = context.Transforms
       .Conversion.MapValueToKey(outputColumnName: "Label", inputColumnName: "Output")
       .Append(context.Transforms.Concatenate("Features", "Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"))
       .Append(context.MulticlassClassification.Trainers.OneVersusAll(context.BinaryClassification.Trainers.AveragedPerceptron("Label", "Features", numberOfIterations: 10))
       .Append(context.Transforms.Conversion.MapKeyToValue("PredictedLabel")));


Enter fullscreen mode Exit fullscreen mode
  • Now, divide the training dataset into 10 folds. 9 folds are used in training and the remaining fold is used for testing. This process is repeated 10 times changing the train and test datasets. This process is known as 10-fold cross validation (of course, you can change the number). Metrics are displayed as well:


Console.WriteLine("=============== Starting 10 fold cross validation ===============");
var crossValResults = context.MulticlassClassification.CrossValidate(data: trainingData, estimator: pipeline, numberOfFolds: 10, labelColumnName: "Label");
var metricsInMultipleFolds = crossValResults.Select(r => r.Metrics);
var microAccuracyValues = metricsInMultipleFolds.Select(m => m.MicroAccuracy);
var microAccuracyAverage = microAccuracyValues.Average();
var macroAccuracyValues = metricsInMultipleFolds.Select(m => m.MacroAccuracy);
var macroAccuracyAverage = macroAccuracyValues.Average();
var logLossValues = metricsInMultipleFolds.Select(m => m.LogLoss);
var logLossAverage = logLossValues.Average();
var logLossReductionValues = metricsInMultipleFolds.Select(m => m.LogLossReduction);
var logLossReductionAverage = logLossReductionValues.Average(); Console.WriteLine($"*************************************************************************************************************");

Console.WriteLine($"*       Metrics Multi-class Classification model      ");
Console.WriteLine($"*------------------------------------------------------------------------------------------------------------");
Console.WriteLine($"*       Average MicroAccuracy:   {microAccuracyAverage:0.###} ");
Console.WriteLine($"*       Average MacroAccuracy:    {macroAccuracyAverage:0.###} ");
Console.WriteLine($"*       Average LogLoss:          {logLossAverage:#.###} ");
Console.WriteLine($"*       Average LogLossReduction: {logLossReductionAverage:#.###} ");
Console.WriteLine($"*************************************************************************************************************");



Enter fullscreen mode Exit fullscreen mode
  • Next up, you can train the model by calling the Fit method:


Console.WriteLine($"Training process is starting. {DateTime.Now.ToLongTimeString()}");
var model = pipeline.Fit(trainingData);
Console.WriteLine($"Training process has finished. {DateTime.Now.ToLongTimeString()}");


Enter fullscreen mode Exit fullscreen mode

This process takes some time.

  • After the model is created, you can start making predictions by building a PredictionEngine and passing a Patient object to the Predict method:


var predictionEngine = context.Model.CreatePredictionEngine<Patient, DiabetesMLPrediction>(model);

var patient = new Patient()
{
  Age = 42,
  BloodPressure = 81,
  BMI = 30.1f,
  DiabetesPedigreeFunction = 0.987f,
  Glucose = 120,
  Insulin = 100,
  Pregnancies = 1,
  SkinThickness = 26,
  Id = 0,
  Output = 0
};

var prediction = predictionEngine.Predict(patient);
Console.WriteLine($"Diabetes? {prediction.Output} | Prediction: {(Convert.ToBoolean(prediction.Prediction) ? "Yes" : "No")} | Probability: {prediction.Probability} ");



Enter fullscreen mode Exit fullscreen mode
  • Finally, you can save the model to use it in other projects (Web Api, Azure Functions, etc.)


Console.WriteLine("Saving the model");
context.Model.Save(model, trainingData.Schema, "MLModel.zip");


Enter fullscreen mode Exit fullscreen mode

Step 7. Run the program, you'll get the results and an ML Model ready for some predictions:

Alt Text

Alt Text

The code is available on GitHub.

I hope that this blog post was interesting and useful for you. I invite you to visit my blog for more technical posts about Xamarin, Azure, and the .NET ecosystem. I write in Spanish language =)

Thanks for your time, and enjoy the rest of the C# Advent Calendar publications!

See you next time,
Luis

💖 💪 🙅 🚩
icebeam7
Luis Beltran

Posted on December 1, 2020

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related