Running a random forest

Random forest is a commonly used machine learning algorithm, trademarked by Leo Breiman and Adele Cutler, that combines the output of multiple decision trees to reach a single result. Its ease of use and flexibility have fueled its adoption, as it handles both classification and regression problems.
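
To make "combines the output of multiple decision trees" concrete, here is a minimal sketch using scikit-learn's RandomForestRegressor on synthetic data. The dataset, the number of trees, and the variable names are assumptions chosen for illustration. It checks that, for regression, the forest's prediction matches the mean of the individual trees' predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, used only for illustration.
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)

forest = RandomForestRegressor(n_estimators=25, random_state=0)
forest.fit(X, y)

# Each fitted decision tree is available in forest.estimators_.
per_tree = np.array([tree.predict(X[:5]) for tree in forest.estimators_])

# For regression, the forest's output is the mean of the per-tree outputs.
manual_average = per_tree.mean(axis=0)
forest_prediction = forest.predict(X[:5])

matches = np.allclose(manual_average, forest_prediction)
print(matches)
```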

Random forest algorithm

The random forest algorithm is an extension of the bagging method: it uses both bagging and feature randomness to create an uncorrelated forest of decision trees. Feature randomness, also known as feature bagging or "the random subspace method", generates a random subset of features for the trees to work with, which keeps the correlation among decision trees low.
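
The two ingredients can be sketched by hand. The example below is an illustration, not how any particular library implements random forests: it draws a bootstrap sample of rows (bagging) and one random subset of columns per tree (random subspace style), then averages the trees. The ensemble size, the subset size, and the helper predict_ensemble are hypothetical. Note that scikit-learn's RandomForestRegressor instead draws a feature subset at every split, controlled by max_features.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic data, used only for illustration.
X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)
n_samples, n_features = X.shape

n_trees = 20      # hypothetical ensemble size
subset_size = 3   # hypothetical number of features given to each tree

ensemble = []
for _ in range(n_trees):
    # Bagging: sample row indices with replacement.
    rows = rng.integers(0, n_samples, size=n_samples)
    # Feature randomness: pick a random subset of columns without replacement.
    cols = rng.choice(n_features, size=subset_size, replace=False)
    tree = DecisionTreeRegressor(random_state=0)
    tree.fit(X[rows][:, cols], y[rows])
    ensemble.append((tree, cols))


def predict_ensemble(X_new):
    """Average the predictions of all trees, each on its own feature subset."""
    return np.mean([tree.predict(X_new[:, cols]) for tree, cols in ensemble], axis=0)


print(predict_ensemble(X[:3]))
```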

If we go back to the "Should I surf?" example, the questions that I may ask to determine the prediction may not be as comprehensive as someone else's set of questions. By accounting for all the potential variability in the data, we can reduce the risk of overfitting, bias, and overall variance, resulting in more precise predictions.
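
One way to see the overfitting point is to compare a single fully grown decision tree with a forest on held-out data. This is a sketch on synthetic data. The dataset, split ratio, and seeds are assumptions, and the exact scores will vary with them. A fully grown tree usually fits its training set almost perfectly, so the gap between its training and test scores gives a rough picture of overfitting.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic, noisy data, used only for illustration.
X, y = make_regression(n_samples=500, n_features=10, noise=25.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "single tree": DecisionTreeRegressor(random_state=0),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # score() returns the R^2 coefficient of determination.
    scores[name] = {
        "train": model.score(X_train, y_train),
        "test": model.score(X_test, y_test),
    }
    print(name, scores[name])
```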

How it works

Random forest application

The random forest algorithm has been applied across a number of industries, allowing them to make better business decisions. Some use cases include:

Finance

It is preferred over other algorithms because it reduces time spent on data management and pre-processing tasks. It can be used to evaluate customers with high credit risk, to detect fraud, and to solve option pricing problems.

Healthcare

The random forest algorithm has applications in computational biology, allowing doctors to tackle problems such as gene expression classification, biomarker discovery, and sequence annotation. As a result, doctors can make estimates around drug responses to specific medications.

E-commerce

It can be used for recommendation engines for cross-sell purposes.

Implementation

Step 1

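# Import the libraries used in the following steps
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Load the dataset and print it for a quick look
data = pd.read_csv('Salaries.csv')
print(data)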

Step 2
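
The later steps call regressor.fit(x, y) without showing how x and y are built. One possible way to prepare them is sketched below. The column names ("Position", "Level", "Salary") and all values are hypothetical stand-ins, since the actual layout of Salaries.csv is not shown here. The point is only that x should be a 2-D array of features and y a 1-D array of targets.

```python
import pandas as pd

# Stand-in for pd.read_csv('Salaries.csv'); column names and values are hypothetical.
data = pd.DataFrame(
    {
        "Position": ["Analyst", "Manager", "Director"],
        "Level": [1, 2, 3],
        "Salary": [45000, 60000, 90000],
    }
)

# Double brackets keep x two-dimensional: shape (n_samples, 1).
x = data[["Level"]].values
# Single brackets give a one-dimensional target: shape (n_samples,).
y = data["Salary"].values

print(x.shape, y.shape)
```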

Step 3

Step 4

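from sklearn.ensemble import RandomForestRegressor

# Build a forest of 100 trees; random_state fixes the seed so results are repeatable
regressor = RandomForestRegressor(n_estimators = 100, random_state = 0)
# x is the 2-D feature array and y the target values prepared in the earlier steps
regressor.fit(x, y)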
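
The two constructor arguments can be checked directly. In this sketch, the ten position levels and salary figures are made-up stand-ins for the real dataset. It shows that n_estimators sets how many trees are fitted and that fixing random_state makes two separate fits produce the same predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in data: ten position levels with made-up salaries.
x = np.arange(1, 11).reshape(-1, 1)
y = np.array([45, 50, 60, 80, 110, 150, 200, 300, 500, 1000], dtype=float) * 1000

first = RandomForestRegressor(n_estimators=100, random_state=0).fit(x, y)
second = RandomForestRegressor(n_estimators=100, random_state=0).fit(x, y)

# One fitted tree per estimator.
n_trees = len(first.estimators_)
# Same seed and same data give the same forest.
same_predictions = np.allclose(first.predict(x), second.predict(x))

print(n_trees, same_predictions)
```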

Step 5

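# Predict the salary for position level 6.5; predict expects a 2-D array of shape (n_samples, n_features)
Y_pred = regressor.predict(np.array([6.5]).reshape(1, 1))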
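
The reshape matters because predict takes one row per sample and one column per feature. The sketch below uses made-up stand-in data rather than the real Salaries.csv. It shows the input and output shapes for a single query. Because a regression forest averages training targets, the prediction falls within the range of salaries seen during training.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in data: ten position levels with made-up salaries.
x = np.arange(1, 11).reshape(-1, 1)
y = np.array([45, 50, 60, 80, 110, 150, 200, 300, 500, 1000], dtype=float) * 1000

regressor = RandomForestRegressor(n_estimators=100, random_state=0).fit(x, y)

# A single sample with a single feature: shape (1, 1).
query = np.array([[6.5]])
y_pred = regressor.predict(query)

# One prediction per input row: shape (1,).
print(query.shape, y_pred.shape)
print(y_pred[0])
```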

Step 6

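# Build a fine grid over the feature range so the fitted curve is drawn at high resolution
X_grid = np.arange(np.min(x), np.max(x), 0.01)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(x, y, color = 'blue')  # training points
plt.plot(X_grid, regressor.predict(X_grid),
         color = 'green')  # model predictions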
plt.title('Random Forest Regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
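
The fine grid is useful because tree-based models predict in steps rather than along a smooth curve. The sketch below again uses made-up stand-in data and skips the plotting. It counts how many distinct values the forest predicts across the grid: the count is far smaller than the number of grid points, which is why the plotted line looks like a staircase.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-in data: ten position levels with made-up salaries.
x = np.arange(1, 11).reshape(-1, 1)
y = np.array([45, 50, 60, 80, 110, 150, 200, 300, 500, 1000], dtype=float) * 1000

regressor = RandomForestRegressor(n_estimators=100, random_state=0).fit(x, y)

# Same kind of grid as in the plotting step: steps of 0.01 across the feature range.
X_grid = np.arange(x.min(), x.max(), 0.01).reshape(-1, 1)
grid_pred = regressor.predict(X_grid)

# Each tree is piecewise constant, so their average is piecewise constant too.
distinct_values = np.unique(grid_pred)
print(len(X_grid), len(distinct_values))
```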
