Simple Linear Regression explained so simply that even a 5yo can understand

Linear regression is an algorithm used to predict or visualise the relationship between two features (variables). A linear regression task involves two kinds of variables: the independent variable, which we use as the input, and the dependent variable, which we want to predict.
Let us build our first simple linear regression model and learn along the way.
This particular model is called simple because it has only one independent variable.
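To make "only one independent variable" concrete: a simple linear regression fits a straight line of the form y = intercept + slope * x, chosen by least squares. Here is a minimal sketch of that idea in plain Python. The helper name `fit_line` and the toy numbers are made up for illustration and are not taken from the salary dataset.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data lying exactly on y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```

In the rest of the post, scikit-learn does this fitting for us, so we never have to write the formula by hand.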
Let's start by importing the modules
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
  • matplotlib: used to plot the data in a graphical manner
  • pandas: used for working with the dataset
  • sklearn: used to split the dataset and then apply the linear regression class onto the data.
Importing the dataset
    dataset = pd.read_csv("Salary_Data.csv")
    X = dataset.iloc[:, :-1].values
    y = dataset.iloc[:, -1].values
    # X takes every column except the last one,
    # whereas y takes only the last column
    Here we are using a dataset of people's salaries and years of working experience to predict someone's salary based on their experience.
    This is what the dataset looks like:
    [image: preview of the dataset]
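If the `iloc` slicing feels abstract, here is a small sketch on an in-memory table. The column names and values are assumptions standing in for Salary_Data.csv, whose exact contents are not reproduced here.

```python
import pandas as pd

# Hypothetical stand-in for Salary_Data.csv (column names and values are assumed)
dataset = pd.DataFrame({
    "YearsExperience": [1.0, 2.0, 3.0],
    "Salary": [40000.0, 45000.0, 50000.0],
})

X = dataset.iloc[:, :-1].values  # every column except the last -> 2-D array
y = dataset.iloc[:, -1].values   # only the last column -> 1-D array

print(X.shape)  # (3, 1)
print(y.shape)  # (3,)
```

Note that X stays two-dimensional (rows by features), which is the shape scikit-learn expects for its input features, while y is a flat array of target values.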
    Splitting the dataset into training and test set
    We need to split the dataset into two parts, i.e. the training set and the test set.
    X_train, X_test, y_train, y_test = train_test_split(
            X,y, test_size = 0.2, random_state = 0)
    # X_train and X_test contain the independent variable (experience)
    # y_train and y_test contain the dependent variable (salary)
    Here we have used the train_test_split function that we imported from sklearn.model_selection.
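    X and y are the independent and dependent variables; test_size tells the function what fraction of the data goes into the test set (0.2 means 20%), and random_state fixes the shuffling so that the split is the same on every run.
    So, if there are 100 rows of data, they will be split into the following segments: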
  • Training: 80 lines
  • Testing: 20 lines
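The 80/20 arithmetic above can be checked with a quick sketch. The 100 rows here are placeholder numbers generated with NumPy, not the salary data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 placeholder rows (assumed data, only the row count matters here)
X = np.arange(100).reshape(-1, 1)
y = np.arange(100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

print(len(X_train), len(X_test))  # 80 20

# Using the same random_state again reproduces exactly the same split
X_train_again, X_test_again, _, _ = train_test_split(
    X, y, test_size=0.2, random_state=0)
print(np.array_equal(X_test, X_test_again))  # True
```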
Training the model
    After we are done with splitting the dataset, it is time to actually train the model with the training set.
    regressor = LinearRegression()
    regressor.fit(X_train, y_train)
    We simply create an instance of the LinearRegression class and then pass our training set to its fit() method.
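To see what fit() actually learns, here is a small sketch on made-up numbers that lie exactly on a line (salary = 30000 + 5000 * years). These values are assumptions for illustration, not the real dataset. After fitting, the slope is stored in `coef_` and the intercept in `intercept_`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data on the line: salary = 30000 + 5000 * years
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([35000.0, 40000.0, 45000.0, 50000.0])

regressor = LinearRegression()
regressor.fit(X_train, y_train)

slope = regressor.coef_[0]          # salary gained per extra year
intercept = regressor.intercept_    # predicted salary at 0 years

# Predict the salary for someone with 5 years of experience
predicted = regressor.predict([[5.0]])[0]
print(slope, intercept, predicted)
```

On real data the points will not sit exactly on a line, so the fitted slope and intercept describe the line that comes closest to them in the least-squares sense.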
    Visualising the training set results
    # Plot the training data as red points
    plt.scatter(X_train, y_train, color="red")
    
    # Plot the regression line predicted by the model
    plt.plot(X_train, regressor.predict(X_train), color="blue")
    
    # Add title to the plot
    plt.title("Salary vs Experience(train)")
    
    # Labels on x and y axis
    plt.xlabel("Years of Experience")
    plt.ylabel("Salary")
    
    # Finally, display the plot
    plt.show()
    The output of the code block above is a scatter plot of the training data (red points) with the fitted regression line (blue).
    Visualising the test set results
    # Plot the test data as red points
    plt.scatter(X_test, y_test, color="red")
    
    # We keep X_train here on purpose: the regression line is the one fitted on the
    # training set, and the plot shows how close the test points fall to that line
    plt.plot(X_train, regressor.predict(X_train), color="blue")
    
    # Add title and labels
    plt.title("Salary vs Experience (test)")
    
    plt.xlabel("Years of Experience")
    plt.ylabel("Salary")
    
    # Finally, display the plot
    plt.show()
    The output of the code block above is a scatter plot of the test data (red points) against the same regression line (blue).
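Why is it fine to draw the line with X_train in the test plot? Once the model is fitted, its predictions for any input lie on one and the same line. A small sketch with made-up numbers (assumed values, not the salary dataset) shows this:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training and test inputs
X_train = np.array([[1.0], [2.0], [4.0], [5.0]])
y_train = np.array([36000.0, 39000.0, 52000.0, 54000.0])
X_test = np.array([[3.0], [6.0]])

regressor = LinearRegression().fit(X_train, y_train)

# Evaluate the fitted line by hand at the test inputs
line_at_test = regressor.intercept_ + regressor.coef_[0] * X_test[:, 0]

# The model's predictions for unseen inputs fall on that same line
same_line = np.allclose(regressor.predict(X_test), line_at_test)
print(same_line)  # True
```

So plotting the line with X_train or with X_test draws the same straight line; only the stretch of the x-axis it covers can differ.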
    That's it! We have successfully built a fully functional simple linear regression model.
    If you liked this, don't forget to like the post and share it with your friends.
    Do follow me for more blog posts like this one.
