Overview:
Linear regression fits a linear model that describes how a response variable changes as another variable changes. It is a technique for measuring how two variables are related; it does not by itself establish that one variable causes the change in the other.
The first variable is plotted on the X axis and the second on the Y axis. Each variable goes by several names. The X variable is known as the predictor variable, the explanatory variable, the independent variable, or the feature. The Y variable is known as the dependent variable or the response variable.
The equation of the regression line for a multiple regression analysis is given by
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi} + \varepsilon_i$$
In multiple regression analysis, more than one variable predicts the response variable.
The equation of the regression line for a simple regression analysis, which has a single predictor, is given by
$$y_i = \beta_0 + \beta_1 x_{1i} + \varepsilon_i$$
When the parameters are estimated with the ordinary least squares method, the resulting line is the best fit for the given data in the sense that it minimizes the sum of the squared differences between the predicted values and the actual values. Here $\beta_0$ is the Y-intercept, $\beta_1$ is the coefficient (the slope) of the predictor, and $\varepsilon_i$ is the error term.
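For the single-predictor case, minimizing the sum of squared differences has a well-known closed-form solution. The following is a sketch of that result; the bars denote sample means, the hats denote estimates, and $x_i$ is shorthand for the single predictor value of observation $i$ (this notation is introduced here and does not appear in the text above).

```latex
\begin{aligned}
S(\beta_0, \beta_1) &= \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2 \\
\hat{\beta}_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \\
\hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x}
\end{aligned}
```

The last line says that the fitted line always passes through the point of means $(\bar{x}, \bar{y})$.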
Linear regression using scikit-learn:
The scikit-learn class LinearRegression estimates the linear regression parameters using ordinary least squares.
The fit() method fits the regression line to the given data and returns the fitted LinearRegression object. The intercept and the coefficient values are then available in the intercept_ and coef_ attributes of that object.
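As a minimal sketch of this API, the following fits LinearRegression to a small made-up dataset with two predictors. The data and the names X, y and model are assumptions chosen for illustration: y is constructed without noise as 1 + 2*x1 + 3*x2, so the recovered parameters are easy to check by eye.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: five observations, two predictor columns (x1, x2).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0]])

# Response built without noise as y = 1 + 2*x1 + 3*x2.
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# fit() returns the fitted estimator itself, so the call can be chained.
model = LinearRegression().fit(X, y)

print("Intercept:", model.intercept_)   # a scalar when y is one-dimensional
print("Coefficients:", model.coef_)     # one entry per predictor column
```

Note that fit() expects X to be two-dimensional (rows are observations, columns are predictors), even when there is only one predictor.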
Linear regression using single predictor variable:
The Python example uses the kerb weight of a car as X, the independent variable, and the price of the car as Y, the dependent variable. The data points are for sedans priced roughly between $21,500 and $27,500.
The Python example predicts the price of a car whose kerb weight is 2976 lbs.
| No. | Brand      | Model   | Price in USD | Weight in lb |
|-----|------------|---------|--------------|--------------|
| 1   | Toyota     | Corolla | 22325        | 2955         |
| 2   | Hyundai    | Elantra | 22125        | 2725         |
| 3   | Kia        | K4      | 21990        | 2932         |
| 4   | Kia        | K5      | 27390        | 3230         |
| 5   | Volkswagen | Jetta   | 22995        | 3012         |
| 6   | Nissan     | Sentra  | 21590        | 3036         |
| 7   | Nissan     | Altima  | 27000        | 3212         |
Data Courtesy:
The data points were collected from the websites of the companies that own the brands Toyota, Hyundai, Kia, Volkswagen, Nissan and Honda. The weight of a car is not a reliable indicator of its price; the two variables are used here for pedagogical purposes only.
Example:
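# Example Python program that fits a regression line
# to car weights and prices using scikit-learn
import numpy as np
from sklearn.linear_model import LinearRegression

# Kerb weight in lbs, reshaped to a single-column 2-D array
X = np.array([2955, 2725, 2932, 3230, 3012, 3036, 3212]).reshape(-1, 1)
print(X.shape)

# Price of the cars in dollars
y = np.array([22325, 22125, 21990, 27390, 22995, 21590, 27000])

# Fit the regression line using ordinary least squares
reg = LinearRegression().fit(X, y)

# Print the intercept and the coefficient
print("Intercept:")
print(reg.intercept_)
print("Coefficient:")
print(reg.coef_)

# Predict the price of a car given its weight
kerbWeight = 2976
carPrice = reg.predict([[kerbWeight]])
print("Predicted price of a car weighing %d lbs:" % kerbWeight)
print(carPrice)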
Output:
(7, 1)
Intercept:
-11227.744620793932
Coefficient:
[11.5633216]
Predicted price of a car weighing 2976 lbs:
[23184.70045268]
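As a cross-check, the intercept and the coefficient can be reproduced without scikit-learn. For a single predictor, the least squares slope is the sum of the products of the X and Y deviations from their means divided by the sum of the squared X deviations, and the intercept follows from the fact that the fitted line passes through the point of means. The sketch below is not part of the original program; the variable names are chosen for illustration, and the data are the seven rows of the table above.

```python
import numpy as np

# Kerb weights (lb) and prices (USD) of the seven cars in the table
weight = np.array([2955, 2725, 2932, 3230, 3012, 3036, 3212], dtype=float)
price = np.array([22325, 22125, 21990, 27390, 22995, 21590, 27000], dtype=float)

# Deviations of each variable from its mean
dx = weight - weight.mean()
dy = price - price.mean()

# Slope: sum of cross-deviations over sum of squared weight deviations
slope = (dx * dy).sum() / (dx ** 2).sum()

# Intercept: the fitted line passes through (mean weight, mean price)
intercept = price.mean() - slope * weight.mean()

# Price predicted for a kerb weight of 2976 lb
predicted = intercept + slope * 2976

print("Intercept:", intercept)
print("Coefficient:", slope)
print("Predicted price:", predicted)
```

The printed values should agree with the scikit-learn output above up to floating-point rounding.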