
Linear regression using scikit-learn

Overview:

Linear regression fits a model that describes how a change in one variable is associated with a change in another variable. It does not establish causation, that is, it does not show that one variable causes the change in the other. It is a technique for measuring how two variables are related.

The first variable is plotted on the X axis and the second on the Y axis. Each variable goes by several names. The X variable is known as the predictor variable, the explanatory variable, the independent variable or the feature. The Y variable is known as the dependent variable, the response variable or the target variable.

The equation of a regression line for a multiple regression analysis is given by

y_i = β_0 + β_1 x_{1i} + β_2 x_{2i} + ... + β_p x_{pi} + ε_i

In multiple regression analysis, more than one variable is used to predict the response variable. In the equation above, p is the number of predictor variables, i indexes the observations and ε is the error term.
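As a rough illustration of the multiple regression equation, the sketch below fits two predictors with scikit-learn. The data is synthetic and noise-free, generated from y = 1 + 2*x1 + 3*x2 purely for this illustration, so the fitted intercept and coefficients should come out close to 1, 2 and 3. The variable names are assumptions and not part of the original example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data for illustration only: two predictors, five observations
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0]])

# Response generated without noise from y = 1 + 2*x1 + 3*x2
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Ordinary least squares fit with more than one predictor
multi_reg = LinearRegression().fit(X, y)

print("Intercept:", multi_reg.intercept_)
print("Coefficients:", multi_reg.coef_)
```

With real, noisy data the recovered values would only approximate the underlying relationship.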

For a simple regression analysis, which has a single predictor variable, the equation of the regression line fitted by the ordinary least squares method is given by

y_i = β_0 + β_1 x_{1i} + ε_i

For a given data set, this line is the best fit in the sense that it minimizes the sum of the squared differences between the predicted values and the actual values. β0 is the Y-intercept and β1 is the slope of the line, which scikit-learn reports as the coefficient.
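For a single predictor, the least squares intercept and slope have a well-known closed form: the slope is the sum of the products of the deviations of x and y from their means divided by the sum of the squared deviations of x, and the fitted line passes through the point of means. The sketch below computes both with NumPy on a small made-up sample; the data and the variable names are assumptions chosen only for illustration.

```python
import numpy as np

# Made-up sample, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

x_mean, y_mean = x.mean(), y.mean()

# Slope: sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

# Intercept: the fitted line passes through (x_mean, y_mean)
beta0 = y_mean - beta1 * x_mean

# Sum of squared residuals, the quantity that least squares minimizes
residuals = y - (beta0 + beta1 * x)
sse = np.sum(residuals ** 2)

print("Intercept:", beta0)
print("Slope:", beta1)
print("Sum of squared residuals:", sse)
```

Any other choice of intercept and slope for this sample would give a larger sum of squared residuals.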

Linear regression using scikit-learn:

The scikit-learn class LinearRegression finds the parameters of a linear regression using ordinary least squares.

The fit() method fits a regression line to the given data and returns the fitted LinearRegression object. The intercept and the coefficient values are available in the intercept_ and coef_ attributes of that object.
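The behaviour of fit() and the two attributes can be checked on a tiny sample. The sketch below uses made-up, noise-free data generated from y = 1 + 2x, so the expected intercept is about 1 and the single coefficient about 2. The names X, model and fitted are chosen only for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up sample: one feature, four observations, y = 1 + 2x
X = np.array([[1.0], [2.0], [3.0], [4.0]])  # shape (n_samples, n_features)
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()
fitted = model.fit(X, y)

# fit() returns the estimator it was called on
print(fitted is model)

# For a one-dimensional target, intercept_ is a single number
print(model.intercept_)

# coef_ holds one coefficient per feature
print(model.coef_.shape)
print(model.coef_)
```

Because fit() returns the estimator itself, the construction and the fitting can be chained in one expression, as in LinearRegression().fit(X, y).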

Linear regression using a single predictor variable:

The Python example uses the kerb weight of a car as X, the independent variable, and the price of the car as Y, the target variable. The data points are for sedans priced between roughly $21,500 and $27,500.

The Python example predicts the price of a car whose kerb weight is 2976 lbs.

 

No.  Car company  Model    Price (USD)  Weight (lb)
1    Toyota       Corolla  22325        2955
2    Hyundai      Elantra  22125        2725
3    Kia          K4       21990        2932
4    Kia          K5       27390        3230
5    Volkswagen   Jetta    22995        3012
6    Nissan       Sentra   21590        3036
7    Nissan       Altima   27000        3212

Data Courtesy:

The data points were collected from the websites of the companies that own the brands Toyota, Hyundai, Kia, Volkswagen, Nissan and Honda. The weight of a car is not a reliable indicator of its price; the two variables are used here for pedagogical purposes only.

Example:

# Example Python program that fits a linear regression
# line using scikit-learn, with the car weight as the
# independent variable and the car price as the
# target variable
import numpy as np
from sklearn.linear_model import LinearRegression

# Kerb weight in lbs
x = np.array([[2955], [2725], [2932], [3230], [3012], [3036], [3212]])  

# Price of the cars in dollars
y = np.array([22325, 22125, 21990, 27390, 22995, 21590, 27000])

# Fit the regression line using ordinary least squares
reg = LinearRegression().fit(x,y)
print(x.shape)

# Print the intercept and the coefficient
print("Intercept:")
print(reg.intercept_)

print("Coefficient:")
print(reg.coef_)

# Predict the price of a car given its weight
kerb_weight = 2976

# predict() expects a 2-D input: one row per car, one column per feature
car_price = reg.predict([[kerb_weight]])
print("Predicted price of a car weighing {} lbs:".format(kerb_weight))
print(car_price)

Output:

(7, 1)
Intercept:
-11227.744620793932
Coefficient:
[11.5633216]
Predicted price of a car weighing 2976 lbs:
[23184.70045268]
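The predicted price can be cross-checked by evaluating the fitted line by hand, that is, intercept plus coefficient times weight. The sketch below repeats the fit on the same data and compares the manual value with the output of predict(). It also prints the R² value returned by the score() method as a rough indication of how well a straight line describes these seven points; this extra check is an addition for illustration and not part of the original example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same data as the example above
x = np.array([[2955], [2725], [2932], [3230], [3012], [3036], [3212]])
y = np.array([22325, 22125, 21990, 27390, 22995, 21590, 27000])
reg = LinearRegression().fit(x, y)

kerb_weight = 2976

# Evaluate the regression line manually: intercept + slope * weight
manual_price = reg.intercept_ + reg.coef_[0] * kerb_weight

# The same prediction through the scikit-learn API
predicted_price = reg.predict([[kerb_weight]])[0]

print("Manual:", manual_price)
print("predict():", predicted_price)

# Coefficient of determination (R squared) on the data used for fitting
r_squared = reg.score(x, y)
print("R squared:", r_squared)
```

With only seven data points, both the fitted parameters and the R² value should be read as illustrative rather than as a reliable pricing model.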

 


Copyright 2025 © pythontic.com