Learn Python by building investment AI for fintech - Lesson 3: How to build and train AI
Duomly
Posted on June 16, 2020
This article was originally published at:
https://www.blog.duomly.com/python-course-with-building-a-fintech-investment-ai-lesson-3-how-to-build-and-train-ai
Intro
In today's episode of the Python with AI course, we will learn how to build AI and how to train AI.
You can find the previous episodes here:
Lesson 2 - How to find financial data and how to use Pandas:
Python course with building a fintech investment AI – Lesson 2: Pandas and getting financial data
Lesson 1 - Prepare the AI project:
Python course with building a fintech investment AI – Lesson 1: Prepare the AI project
In this lesson, we can leave the boring preparation behind and focus on the super exciting things!
Today we will build our first AI model, which will be an RNN (Recurrent Neural Network).
We will use Long Short-Term Memory (LSTM) layers and train our artificial intelligence.
Are you ready for that awesome journey?
I hope you are because I cannot wait to show you all of those powerful things!
Let's start!
Create module AI
As the first step, we will create a module for the AI logic.
First, you need to create a directory named "AI" at the root of the project.
Next, inside it, we need to create a file with the same name and the ".py" extension: ai.py.
Import dependencies
The second step should be to import dependencies that we will use later.
Go into the ai.py file and import these dependencies.
from datetime import date, timedelta
import numpy as np
import pandas as pd
import yfinance as yf
from pandas_datareader import data as pdr
import matplotlib.pyplot as plt
Install and import Keras
Next, we should install a library named "Keras".
Of course, if you have not installed them yet, you should also install sklearn, pandas, NumPy, and matplotlib.
If you use conda, you probably have them installed already.
First, open a terminal and type:
pip install keras
Next, below the previous imports, you need to import "layers", and "Sequential" from Keras.
As the third line, import MinMaxScaler from sklearn.preprocessing.
from keras import layers
from keras import Sequential
from sklearn.preprocessing import MinMaxScaler
Get stock prices from API
As the fourth step, we need to focus on the prices.
We need to define a function named "getStock" with "stock" as a param.
Inside the function, we need to define today's date.
def getStock(stock):
    today = date.today()
Take data for the last 5 years
The next step is to take the pricing data for the previous five years.
We should use a method similar to the one we used in Lesson 2.
The only change here is the number of days: 1850.

    # despite the variable name, 1850 days is roughly five years back
    monthAgo = today - timedelta(days=1850)
    data = pdr.get_data_yahoo(stock, start=monthAgo, end=today)
    return data
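As a quick sanity check on the window size (pure datetime arithmetic, no API call; the fixed date below is just the article's publication date standing in for "today"), 1850 days reaches back a little over five calendar years:

```python
from datetime import date, timedelta

today = date(2020, 6, 16)            # fixed example date instead of date.today()
start = today - timedelta(days=1850)

print(start)  # 2015-05-24
```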
Split data
Now, we can start working on the data we have got.
As the first step, we need to create a function named "splitData" that takes "stockPrices" as a param.
Next, we need to split the data into two parts.
The first one contains the prices from the last month (20 trading days).
The second part is all of the data without those last 20 days.
Both parts should be returned.
def splitData(stockPrices):
    stockPricesLastMonth = stockPrices[-20:]
    stockPricesTest = stockPrices[:-20]
    return stockPricesLastMonth, stockPricesTest
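The slicing can be verified on a plain list standing in for the price DataFrame:

```python
prices = list(range(100))   # 100 toy "prices": 0..99
lastMonth = prices[-20:]    # the final 20 entries
rest = prices[:-20]         # everything before them

print(len(lastMonth), len(rest), lastMonth[0])  # 20 80 80
```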
Prepare data
In the next step, we can go into preparing that data.
Let's create the function named "prepareData", and take "prices", as the param.
def prepareData(prices):
    stockPricesLastMonth, stockPricesTest = splitData(prices)
Get only the "Open" price value as the important one
Now, we should continue preparing the data inside the "prepareData" function.
We need to take only the "Open" prices from each row and assign them to a variable named "cleanData".

    # select the "Open" column by name, so it works regardless of column order
    cleanData = stockPricesTest[['Open']].values
Scale data to speed up the algorithm
Next, we need to scale the data, which will speed up our calculations.
We will use the "fit_transform" function of the scaler (defined near the end of the file) on the data that we cleaned before.
All of these variables should be returned in the last line of the "prepareData" function.

    scaledData = scaler.fit_transform(cleanData)
    return scaledData, stockPricesTest, stockPricesLastMonth
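With feature_range=(0, 1), fit_transform maps the minimum price to 0 and the maximum to 1; here is the same arithmetic done by hand on a toy array (plain NumPy, no sklearn needed for the demonstration):

```python
import numpy as np

raw = np.array([10.0, 15.0, 20.0])                    # toy "Open" prices
scaled = (raw - raw.min()) / (raw.max() - raw.min())  # min-max scaling to [0, 1]

print(scaled)  # [0.  0.5 1. ]
```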
Split train and result
In this step, we need to define the function named "splitTrainTest".
Next, we should add "scaledData" as a param.
def splitTrainTest(scaledData):
Build 120-day training windows and real prices
Inside the function "splitTrainTest", we will slide a 120-day window (roughly six months of trading days) over the data and return two variables.
One of them will be "inputs" (the windows), and the second one "realPrice" (the price that follows each window).
    inputs = []
    realPrice = []
    for i in range(120, len(scaledData)):
        inputs.append(scaledData[i-120:i, 0])
        realPrice.append(scaledData[i, 0])
    return inputs, realPrice
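On toy data, the windowing loop is easy to inspect: with 125 scaled values and a 120-step window, it produces 5 training samples (the numbers below are synthetic, the logic mirrors the function above):

```python
import numpy as np

scaledData = np.arange(125).reshape(-1, 1)   # toy scaled prices, shape (125, 1)

inputs = []
realPrice = []
for i in range(120, len(scaledData)):
    inputs.append(scaledData[i-120:i, 0])    # the 120 values before position i
    realPrice.append(scaledData[i, 0])       # the value at position i

print(len(inputs), len(inputs[0]), int(realPrice[0]))  # 5 120 120
```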
Convert to arrays and reshape for the network
Next, we should convert the lists to NumPy arrays and reshape the variable named "inputs" into a 3D array.
To do that, we need to create a function named "reshapeData" that takes "inputs" and "realPrice" as params.
Inside the function, we should use the "np.array" and "np.reshape" methods.
As the last step, we will return the reshaped inputs.
def reshapeData(inputs, realPrice):
    inputs, realPrice = np.array(inputs), np.array(realPrice)
    inputs = np.reshape(inputs, (inputs.shape[0], inputs.shape[1], 1))
    return inputs
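With the toy sizes from the previous example, the reshape turns a (5, 120) array into the (5, 120, 1) shape that Keras LSTM layers expect, i.e. (samples, timesteps, features):

```python
import numpy as np

inputs = np.zeros((5, 120))   # 5 samples, 120 timesteps each
inputs = np.reshape(inputs, (inputs.shape[0], inputs.shape[1], 1))

print(inputs.shape)  # (5, 120, 1)
```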
Create a model
Great! Now we will go into the real thing. We will create an AI model.
As the first step, we need to create the function named "createModel", and pass "inputs", as a param.
def createModel(inputs):
Create 3 LSTM layers
In the second step of creating a model, we should add long short-term memory (LSTM) layers and set up their units and dropout.
You can experiment with that, because a different setup will give different results.
I've found that 3 LSTM layers with 30 units and 0.2 dropout give good results.
    model.add(layers.LSTM(units=30, return_sequences=True, input_shape=(inputs.shape[1], 1)))
    model.add(layers.Dropout(0.2))
    model.add(layers.LSTM(units=30, return_sequences=True))
    model.add(layers.Dropout(0.2))
    model.add(layers.LSTM(units=30))
    model.add(layers.Dropout(0.2))
Add the output layer and compile
The last part of the "createModel" function should be to add a Dense output layer and compile the model.
I've used "adam" as the optimizer and "mse" (mean squared error) as the loss.

    model.add(layers.Dense(units=1))
    model.compile(optimizer='adam', loss='mse')
Train model
Awesome!
Now we can train the model.
We need to create the function named "trainModel" and pass it four params: "inputs", "realPrice", "epochs", and "batch".
Next, we need to add "model.fit", which is responsible for the training.

def trainModel(inputs, realPrice, epochs, batch):
    model.fit(inputs, realPrice, epochs=epochs, batch_size=batch)
Define the price, model, and scaler
All the logic is ready; in this step, we should wire it together.
The first thing is to define the variable responsible for getting the prices.
Next, we need to assign a "Sequential" model to the "model" variable.
And we need to add the "scaler".
prices = getStock('MSFT')
model = Sequential()
scaler = MinMaxScaler(feature_range=(0,1))
Train AI
In the almost last step, we need to combine all of these functions in one named "trainAI".
Add "prices", "epochs", and "batch" as params.
def trainAI(prices, epochs, batch):
    scaled, stockPricesTest, stockPricesLastMonth = prepareData(prices)
    inputs, realPrice = splitTrainTest(scaled)
    reshapedInputs = reshapeData(inputs, realPrice)
    createModel(reshapedInputs)
    # np.array so Keras receives the targets as an array, not a plain list
    trainModel(reshapedInputs, np.array(realPrice), epochs, batch)
Call train AI
Now comes the last step, and the fun begins!
Call the method "trainAI" and pass the necessary data.
You can experiment with the number of epochs and the batch size: I've used 50 epochs, but some people recommend even 1000, with a batch size as small as 4-8.
Remember that more epochs and a smaller batch size mean many more iterations, and training can take long hours.
trainAI(prices, 50, 64)
Conclusion
Congratulations!
Your project now has its own trained artificial intelligence that you can use to see what the next pricing trends for stocks could be.
Of course, you need to remember it's only AI: it can make a lot of mistakes, and the real stock prices can be different (that depends on many factors).
Code repository for the Python course Lesson 3 is here:
https://github.com/Duomly/python-ai-investment-fintech/tree/Python-AI-course-Lesson-3
In the next lesson, we will test our AI and predict stock prices.
We will compare these predictions with the real prices to see how smart our AI is.
Keep learning with us; I cannot wait to show you the next episode, where we will see the real results of how our AI works!
Thanks for reading,
Radek from Duomly