Linear Regression Program in Python from Scratch

Linear regression is a machine learning algorithm in which a best scalar response is established according to the variables. This scalar has the least error and can be used for prediction tasks.

Linear regression works by finding the best values for the variables for equation y = w0 + w1*x, where w0, and w1 are variables. This can be extended to any number of features.
For example: y = w0 + w1*x1 + w2*x2 + … + wn*xn. The best values are found using an algorithm known as gradient descend. But in this post, we will covariance, variance, and mean to find the values for the variables.

We will write the program for the linear regression implementation. We will use python to write this program and we will not use any libraries.

Input

We have the dataset given below. It consists of 2D points.

data = [
        [0,1],
        [1,2],
        [2,4],
        [3,6],
        [4,7],
        [5,8],
        [6,9],
        [7,9],
        [8,10],
        [9,12]
        ]

x = [i[0] for i in data]
y = [i[1] for i in data]

import matplotlib.pyplot as plt
plt.scatter(x,y)
plt.show()
Plotted dataset

Mean, covariance, and variance function

Now, we will define functions to calculate mean, covariance and variance.

import math

def mean(n):
	return sum(n) / float(len(n))

def covariance(x, mx, y, my):
	covar = 0.0
	for i in range(len(x)):
		covar += (x[i] - mx) * (y[i] - my)
	return covar

def variance(values, mean):
	return sum([(x-mean)**2 for x in values])

Algorithm

Now, we will find the variables using the following code:

x_mean, y_mean = mean(x), mean(y)
w1 = covariance(x, x_mean, y, y_mean) / variance(x, x_mean)
w0 = y_mean - w1 * x_mean
print(w0)
print(w1)
1.6181818181818173
1.1515151515151516

Plotting the line

After applying the algorithm, we will plot the predicted line.

import numpy as np
x = np.array(x)
plt.scatter(x,y)
plt.plot(x, w0 + w1*x, linestyle='solid')
plt.show()
Plotted line

Calculating error

We will now find the RMSE error for the predicted line.

def rmse(actual, predicted):
	sum_error = 0.0
	for i in range(len(actual)):
		prediction_error = predicted[i] - actual[i]
		sum_error += (prediction_error ** 2)
	mean_error = sum_error / float(len(actual))
	return (mean_error)**0.5

yPred = [w0 + q*w1 for q in x]
print(rmse(y,yPred))
0.6485414871895709

Predicting the input

Now, we can predict the y value for any input x.

print("Enter the x to classify")
test = [int(i) for i in input().split()]
yNew = w0 + w1 * test[0]
print(yNew)
Enter the x to classify
5
7.375757575757575

That’s it. This is the complete implementation of linear regression.

Complete code

data = [
        [0,1],
        [1,2],
        [2,4],
        [3,6],
        [4,7],
        [5,8],
        [6,9],
        [7,9],
        [8,10],
        [9,12]
        ]

x = [i[0] for i in data]
y = [i[1] for i in data]

import matplotlib.pyplot as plt
plt.scatter(x,y)
plt.show()

import math

def mean(n):
	return sum(n) / float(len(n))

def covariance(x, mx, y, my):
	covar = 0.0
	for i in range(len(x)):
		covar += (x[i] - mx) * (y[i] - my)
	return covar

def variance(values, mean):
	return sum([(x-mean)**2 for x in values])

x_mean, y_mean = mean(x), mean(y)
w1 = covariance(x, x_mean, y, y_mean) / variance(x, x_mean)
w0 = y_mean - w1 * x_mean
print(w0)
print(w1)

import numpy as np
x = np.array(x)
plt.scatter(x,y)
plt.plot(x, w0 + w1*x, linestyle='solid')
plt.show()

def rmse(actual, predicted):
	sum_error = 0.0
	for i in range(len(actual)):
		prediction_error = predicted[i] - actual[i]
		sum_error += (prediction_error ** 2)
	mean_error = sum_error / float(len(actual))
	return (mean_error)**0.5

yPred = [w0 + q*w1 for q in x]
print(rmse(y,yPred))

print("Enter the x to classify")
test = [int(i) for i in input().split()]
yNew = w0 + w1 * test[0]
print(yNew)

Other Machine Learning algorithms:

Let us know in the comments if you are having any questions regarding this machine learning algorithm.

And if you found this post helpful, then please help us by sharing this post with your friends. Thank You

Leave a Reply