# Decoding R squared: back to basics


Quite often I have seen data scientists with a decent amount of experience struggle to explain “R squared” for a regression model. The idea for this story came from one such experience recently. My intention is to make “R squared” absolutely clear for readers.


Let's start with a basic question and answer:

1. What is a regression model? A regression model estimates a dependent variable from one or more independent variables. In a regression model, the dependent variable is continuous.

Let's understand this with a simple example: estimating a person's height (the dependent variable) from their weight (the independent variable). The demo data set is shown below.

| Height | Weight |
|--------|--------|
| 175    | 78     |
| 172    | 82     |
| 155    | 62     |

# Training model:

I will use RStudio for the demo and create the dummy data above in it:

`mydata <- data.frame(Height = c(175, 172, 155), Weight = c(78, 82, 62)) # create dummy data`

Let's train the linear regression model using the `lm` function:

`linearMod <- lm(Height ~ Weight, data = mydata) # train linear model`

Check the details of the model using the `summary` function:

`summary(linearMod) #Check model summary`

The summary output reports “Multiple R-squared” as 0.8952 for this model.

We will try to reach this number, 0.8952, by manual calculation.
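If you prefer to read the value from code rather than from the printed summary, a minimal sketch is shown here. It rebuilds the same dummy data and model so it can run on its own; the variable names `model_summary` and `r_squared` are only illustrative.

```r
# Rebuild the article's dummy data and model
mydata <- data.frame(Height = c(175, 172, 155), Weight = c(78, 82, 62))
linearMod <- lm(Height ~ Weight, data = mydata)

# summary() returns an object; r.squared is one of its components
model_summary <- summary(linearMod)
r_squared <- model_summary$r.squared

round(r_squared, 4)  # rounds to 0.8952
```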

# Testing model:

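Let us create a similar data set for the independent feature, Weight in this case, as test data. `predict` expects a data frame whose column name matches the predictor used in training:

`mytestdata <- data.frame(Weight = c(78, 82, 62)) # create test data`

Let us predict the dependent feature, Height in this case, to obtain the predicted values. Note that the argument is called `newdata`; an argument named `data` is silently ignored, and `predict` then falls back to the training data:

`predict(linearMod, newdata = mytestdata) # predict values for test`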

The output of the command above is a vector of predicted values, one for each of the 3 data points in the test set.

So far, so good. We now have two sets of values: actual values and predicted values.
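To see the two sets of values side by side, here is a small sketch that is self-contained and rebuilds the data and model first. The `comparison` data frame and its column names are my own choice for illustration. Because R rounds rather than truncates, the last digit can differ slightly from figures rounded by hand.

```r
# Rebuild the article's dummy data, model, and test data
mydata <- data.frame(Height = c(175, 172, 155), Weight = c(78, 82, 62))
linearMod <- lm(Height ~ Weight, data = mydata)
mytestdata <- data.frame(Weight = c(78, 82, 62))

# Put actual and predicted heights next to each other
comparison <- data.frame(
  Weight    = mytestdata$Weight,
  Actual    = mydata$Height,
  Predicted = round(predict(linearMod, newdata = mytestdata), 2)
)

print(comparison)
```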

# Calculate R square:

Actual values = [175,172,155]

Predicted values = [171.19,175.04,155.76]

Mean of actual values = (175 + 172 + 155)/3 = 167.33

To calculate the total sum of squares (TSS), we sum the squared differences between each actual value and the mean of the actual values. Hence TSS can be calculated as:

Total sum of squares = (175 - 167.33)² + (172 - 167.33)² + (155 - 167.33)² = 232.66
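The same arithmetic can be checked in R. This is a minimal sketch; `actual` and `tss` are illustrative names, and R keeps the mean at full precision instead of rounding it to 167.33.

```r
# Actual heights from the dummy data
actual <- c(175, 172, 155)

# Total sum of squares: squared distance of each value from the mean, summed
tss <- sum((actual - mean(actual))^2)

round(tss, 2)
```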

To calculate the residual sum of squares (RSS), we sum the squared differences between the actual values and the predicted values. Hence RSS can be calculated as:

Residual sum of squares = (175 - 171.19)² + (172 - 175.04)² + (155 - 155.76)² = 24.33
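A sketch of the same calculation in R, assuming the model is refit on the dummy data so that the block runs on its own. Since it uses unrounded predictions, the result is about 24.38 rather than the 24.33 obtained from the two-decimal figures.

```r
# Rebuild the article's dummy data and model
mydata <- data.frame(Height = c(175, 172, 155), Weight = c(78, 82, 62))
linearMod <- lm(Height ~ Weight, data = mydata)

# Predictions for the same three weights, at full precision
predicted <- predict(linearMod, newdata = mydata)

# Residual sum of squares: squared gap between actual and predicted, summed
rss <- sum((mydata$Height - predicted)^2)

# The same quantity from the model's stored residuals
rss_from_residuals <- sum(residuals(linearMod)^2)

round(rss, 2)
```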

The mathematical formula for R-squared is 1 - (RSS/TSS), which is derived from the formula (TSS - RSS)/TSS. The meaning of this formula is explained below.

R-squared is 100% minus the percentage left unexplained by the model. In other words, the percentage of TSS that is explained by the model is known as the R-squared of the model.

The residuals (summarised by RSS) are the errors of the model, and hence they are the part the model leaves unexplained. This unexplained share, RSS/TSS, is subtracted from 100% to get R-squared.
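For readers who prefer notation, the definitions above can be written compactly. The symbols are my own assumption and do not appear in the original text: y_i is an actual value, ŷ_i the corresponding predicted value, ȳ the mean of the actual values, and n the number of data points.

```latex
\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
R^2 = \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}
```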

Putting the values into the equation above, R square = 1 - (24.33/232.66) = 0.8954
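The hand calculation can be wrapped in a small function. This is only a sketch: `manual_r_squared` is a hypothetical helper, not something from R or from the original demo, and it is fed the rounded predicted values quoted in this story.

```r
# Hypothetical helper (not a built-in): R-squared from two numeric vectors
manual_r_squared <- function(actual, predicted) {
  tss <- sum((actual - mean(actual))^2)  # total sum of squares
  rss <- sum((actual - predicted)^2)     # residual sum of squares
  1 - rss / tss
}

actual    <- c(175, 172, 155)
predicted <- c(171.19, 175.04, 155.76)   # rounded predictions quoted in the story

r2_rounded <- manual_r_squared(actual, predicted)
round(r2_rounded, 4)  # rounds to 0.8954
```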

# Comparison:

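The `summary` function in R gave R-squared as 0.895 (to three decimal places).

Manual calculation also gave us R-squared as 0.895 (to three decimal places).

Hence we could calculate R square by hand and match it with the RStudio output. The small difference in the fourth decimal place (0.8952 versus 0.8954) comes from using predicted values rounded to two decimals in the manual calculation.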
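To confirm that the remaining gap is only a rounding effect, one option is to repeat the manual formula with unrounded fitted values and compare it with what `summary` reports. A minimal sketch, with illustrative variable names:

```r
# Rebuild the article's dummy data and model
mydata <- data.frame(Height = c(175, 172, 155), Weight = c(78, 82, 62))
linearMod <- lm(Height ~ Weight, data = mydata)

actual    <- mydata$Height
predicted <- fitted(linearMod)  # unrounded predictions on the training data

# Manual R-squared: 1 - RSS / TSS
r2_manual <- 1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)

# R-squared as reported by summary()
r2_summary <- summary(linearMod)$r.squared

# all.equal() compares numbers with a small floating-point tolerance
isTRUE(all.equal(r2_manual, r2_summary))
```

With full-precision predictions the two values should agree up to floating-point tolerance.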

# Conclusion:

This story was intended to give you a clear-cut idea of what R squared is for a regression model and to help you be more confident in data science model building.

You can also join my Facebook group, “Unfold Data Science”, here, where fellow data scientists and I discuss data science concepts and other queries or doubts related to the data science industry. This group is useful for data science aspirants as well.

Connect with me on LinkedIn here.

Anyone looking for guidance in data science can reach me at the email mentioned below.

Thank you

Aman (amanrai77@gmail.com)

I am a data scientist continuously helping businesses grow by machine learning consulting along with data science initiatives like mentoring and training.
