Linear Regression

KDAG IIT KGP
8 min read · Jul 5, 2021


Introduction

Do you remember drawing a straight line V-I characteristic in your cherished Electrical Lab? Do you remember how you drew it? Let’s brush it up a bit.

We first set a voltage (the independent variable) and measure the current in the circuit (the dependent variable). We keep a tab of this pair (V1, I1), then change the voltage and repeat the process. Once we have enough data points ('enough' is an ambiguous term in Machine Learning; you never know how much data you need, as you probably experienced when you and the professor did not see eye-to-eye about how many measurements were required :p), we are asked to draw the 'best fit' line through these points. Well, what do we mean by 'best fit'?

That is going to be the topic of today’s blog. We will look at the method of ‘fitting a line’ to our data points to finally obtain a model that is capable of making predictions given our independent variable.

In this article, we are going to cover–

  1. Hypothesis function of Linear Regression
  2. Types of Linear Regression
  3. Underlying Assumptions
  4. Cost function & Gradient Descent
  5. Evaluation metrics
  6. Finally, we will code our model from scratch and apply it to the height-weight dataset.

Without further ado, let’s get started!

Hypothesis Function

The hypothesis function is the approximated relationship between the dependent variable and the independent variables. We try to fit a hypothesis function that estimates the dependent variable as a function of the independent variables.

Regression: Regression is a method of modeling a target value based on independent predictors. This method is mostly used for forecasting and finding out cause and effect relationships between variables. Regression techniques mostly differ based on the number of independent variables and the type of relationship between the independent and dependent variables.

Linear Regression

In simple words, Linear Regression is a supervised Machine Learning model that finds the best-fit straight line relating the independent and dependent variables, i.e., it finds the linear relationship between them.

Linear Regression is of two types: Simple and Multiple.

In Simple Linear Regression, only one independent variable is present and the model has to find its linear relationship with the dependent variable:

y = b0 + b1*x

where b0 is the intercept, b1 is the coefficient or slope, x is the independent variable, and y is the dependent variable. In this article, we will focus on this type of regression only.

In Multiple Linear Regression, there is more than one independent variable, and the model has to find the linear relationship between all of them and the dependent variable:

y = b0 + b1*x1 + b2*x2 + … + bn*xn

where b0 is the intercept, b1, b2, b3, …, bn are the coefficients or slopes of the independent variables x1, x2, x3, …, xn, and y is the dependent variable.
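To make the hypothesis concrete, here is a minimal sketch of both forms in Python/NumPy. The function and variable names are our own and simply mirror the equations above.

```python
import numpy as np

def predict_simple(x, b0, b1):
    """Simple linear regression hypothesis: y_hat = b0 + b1 * x."""
    return b0 + b1 * x

def predict_multiple(X, b0, b):
    """Multiple linear regression hypothesis: y_hat = b0 + b1*x1 + ... + bn*xn.

    X is an (m, n) array of m samples with n features, b is a length-n
    vector of coefficients, and b0 is the scalar intercept.
    """
    return b0 + X @ b
```

For example, predict_simple(np.array([1.0, 2.0]), 10, 0.5) returns array([10.5, 11.]).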

“A Linear Regression model’s main aim is to find the best fit linear line and the optimal values of intercept and coefficients such that the error is minimized.”

In the above diagram, x is our independent variable (plotted on the x-axis) and y is the dependent variable (plotted on the y-axis). The black dots are the data points, i.e., the actual values. b0 is the intercept (10 in the diagram) and b1 is the slope of the x variable. The blue line is the best-fit line predicted by the model, i.e., the predicted values lie on the blue line. The vertical distance between a data point and the regression line is known as the error or residual.

Assumptions

So, can we apply linear regression everywhere? Well, not quite. There are a few things we need to keep in mind before applying it. We have summarized most of them for you.

Linearity: The dependent variable Y should be linearly related to the independent variables. This assumption can be checked by plotting a scatter plot of each independent variable against Y.

Normality: The X and Y variables should be approximately normally distributed (strictly speaking, it is the error terms whose normality matters, as noted below). Histograms, KDE plots, and Q-Q plots can be used to check the Normality assumption.

Homoscedasticity: The variance of the error terms should be constant, i.e., the spread of the residuals should be the same for all values of X. This assumption can be checked by plotting a residual plot. If the assumption is violated, the points will form a funnel shape; otherwise, the spread will look roughly constant.

Normality of errors: The error terms should be normally distributed. Q-Q plots and histograms of the residuals can be used to check their distribution.
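As a quick illustration, here is one way these checks could look in code. This is a sketch, not part of the original notebook: it assumes you already have the observed x, y, and the model's predictions y_pred as NumPy arrays, and it uses matplotlib and SciPy for the plots.

```python
import matplotlib.pyplot as plt
from scipy import stats

def check_assumptions(x, y, y_pred):
    """Visual checks for linearity, homoscedasticity and normality of errors."""
    residuals = y - y_pred
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # Linearity: the scatter of x against y should look roughly linear.
    axes[0].scatter(x, y, s=10)
    axes[0].set_title("x vs y (linearity)")

    # Homoscedasticity: residuals vs predictions should show no funnel shape.
    axes[1].scatter(y_pred, residuals, s=10)
    axes[1].axhline(0, color="red", linewidth=1)
    axes[1].set_title("Residuals vs predicted (homoscedasticity)")

    # Normality of errors: points should hug the straight line in the Q-Q plot.
    stats.probplot(residuals, dist="norm", plot=axes[2])
    axes[2].set_title("Q-Q plot of residuals")

    plt.tight_layout()
    plt.show()
```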

Cost Function

The cost function helps us figure out the best possible values of the intercept b and the slope w (the b0 and b1 from earlier) that give the best-fit line for the data points. Since we want the best values for b and w, we convert this search into a minimization problem in which we minimize the error between the predicted values and the actual values.

We choose the following function to minimize:

J(w, b) = (1/n) * Σ (y_i - (w*x_i + b))²

The difference between a predicted value and the ground truth is the error for that data point. We square each error, sum over all data points, and divide by the total number of data points, which gives the average squared error. This cost function is therefore known as the Mean Squared Error (MSE) function. Using this MSE function, we are going to change the values of b and w so that the MSE settles at its minimum.
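In code, the MSE cost is almost a one-liner. A minimal NumPy sketch (the function name is our own, and x and y are assumed to be 1-D arrays):

```python
import numpy as np

def mse_cost(x, y, w, b):
    """Mean Squared Error of the line w*x + b against the true values y."""
    y_pred = w * x + b
    return np.mean((y - y_pred) ** 2)
```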

Gradient Descent

Gradient Descent is a method of updating b and w to reduce the cost function (MSE). The idea is that we start with some values for b and w and then change these values iteratively to reduce the cost. Gradient descent tells us how to change the values.

To draw an analogy, imagine a pit in the shape of a U. You are standing at the topmost point of the pit and your objective is to reach the bottom. There is a catch: you can only take a discrete number of steps. If you take small steps, you will eventually reach the bottom, but it will take a long time. If you take longer steps, you will get there sooner, but you may overshoot the bottom and not land exactly on it. In the gradient descent algorithm, the size of the steps you take is set by the learning rate, which decides how fast the algorithm converges to the minimum.

Sometimes the cost function can be non-convex, in which case you could settle at a local minimum, but for linear regression with the MSE cost it is always convex.
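A sketch of a single gradient descent update for this cost follows. The gradient expressions come from differentiating the MSE above with respect to w and b; the function name is our own.

```python
import numpy as np

def gradient_step(x, y, w, b, learning_rate):
    """One gradient descent update for the MSE cost of the line w*x + b."""
    n = len(x)
    error = y - (w * x + b)

    # Partial derivatives of the MSE with respect to w and b.
    dw = (-2.0 / n) * np.sum(x * error)
    db = (-2.0 / n) * np.sum(error)

    # Step against the gradient; the learning rate controls the step size.
    return w - learning_rate * dw, b - learning_rate * db
```

Repeating this step many times (or until the cost stops decreasing) drives w and b towards the values that minimize the MSE.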

Evaluation Metrics

Well done! We have fitted our line to our data points! We have done a good job, haven't we? Did we actually? How do we know it was a 'good' job and not a 'so-so' job? Yes, you are thinking right. We need some metrics, some scores that will tell us how good a job we have done. Let's briefly look at some metrics that are commonly used to measure our performance.

R squared or Coefficient of Determination: The most commonly used metric for model evaluation in regression analysis is R squared. It can be defined as the proportion of the total variation in the dependent variable that is explained by the model. The value of R squared lies between 0 and 1; the closer it is to 1, the better the model.

R² = 1 - SS_RES / SS_TOT

where SS_RES = Σ (y_i - ŷ_i)² is the Residual Sum of Squares and SS_TOT = Σ (y_i - ȳ)² is the Total Sum of Squares.

Adjusted R squared: It is an improvement over R squared. The drawback with R² is that it never decreases as more features are added, even if they do not actually help, which can give the illusion of a better model. Adjusted R² solves this by penalizing additional features: it increases only when a new feature improves the model more than would be expected by chance, so it reflects real improvement. Adjusted R² is always lower than or equal to R²:

Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)

where n is the number of observations and p is the number of independent variables.

Mean Squared Error (MSE): Another common evaluation metric is the Mean Squared Error, the mean of the squared differences between actual and predicted values (the same quantity we minimized as the cost function).

Root Mean Squared Error (RMSE): It is the square root of the MSE, i.e., the root of the mean of the squared differences between actual and predicted values. Because of the square root, RMSE is in the same units as the target variable, which makes it easier to interpret than MSE; like MSE, it still penalizes large errors heavily.
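All four metrics can be computed with a few lines of NumPy. The sketch below assumes y_true and y_pred are 1-D arrays and p is the number of independent variables (1 for simple linear regression); the names are ours.

```python
import numpy as np

def regression_metrics(y_true, y_pred, p=1):
    """R squared, adjusted R squared, MSE and RMSE for a fitted model."""
    n = len(y_true)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares

    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    mse = ss_res / n
    rmse = np.sqrt(mse)
    return {"R2": r2, "Adjusted R2": adj_r2, "MSE": mse, "RMSE": rmse}
```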

Let’s Code!

Now we will code this algorithm from scratch using the Gradient Descent algorithm.

GitHub link for Code and Dataset is available here.
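For readers who want the gist without opening the notebook, here is a minimal from-scratch training loop in the same spirit. It is a sketch, not the exact notebook code: the file name height_weight.csv and the column names Height and Weight are assumptions, so adjust them to match the actual dataset.

```python
import numpy as np
import pandas as pd

# Assumed file and column names; change these to match the dataset you use.
df = pd.read_csv("height_weight.csv")
x = df["Height"].to_numpy(dtype=float)
y = df["Weight"].to_numpy(dtype=float)

# Standardise x so that a single learning rate behaves well.
x = (x - x.mean()) / x.std()

w, b = 0.0, 0.0           # initial slope and intercept
learning_rate = 0.01
epochs = 1000

for epoch in range(epochs):
    error = y - (w * x + b)
    dw = (-2.0 / len(x)) * np.sum(x * error)
    db = (-2.0 / len(x)) * np.sum(error)
    w -= learning_rate * dw
    b -= learning_rate * db

print(f"slope = {w:.3f}, intercept = {b:.3f}")
print(f"final MSE = {np.mean((y - (w * x + b)) ** 2):.3f}")
```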

Conclusion

Linear Regression is a very simple algorithm that every Machine Learning enthusiast must know, and it is also the right place to start for people who want to learn Machine Learning. What if we have multiple variables? We can easily add them to the equations: we have been working with lines so far, and adding another variable takes us to planes, and so on. In our next article, we will cover more advanced cases with multiple variables, along with modifications to the method shown here, to build a more robust and accurate model.

So, revise, code, and watch out for our next article!

Follow us on:

  1. Facebook
  2. Instagram
  3. Linkedin


