Linear regression can be performed using several approaches, including Ordinary Least Squares (OLS) with a single variable, OLS with multiple variables (the closed-form solution), and gradient descent.
In this post, we discuss Ordinary Least Squares (OLS) for performing simple linear regression. The topics include:
- Simple Linear regression
- Objective of Simple Linear Regression
- Example
- Mathematical Intuition
- Interpretation of Coefficients
- Applications of Simple Linear Regression
- Numerical Example Using Ordinary Least Squares (OLS)
- Model Evaluation
- Steps and Python code for Implementing Simple Linear Regression using OLS
- Limitations of Ordinary Least Squares (OLS)
1. Simple Linear Regression
Simple Linear Regression is a type of supervised learning algorithm in machine learning that predicts a continuous outcome based on one input feature. It models the relationship between two variables by fitting a straight line (regression line) through the data points. The idea is to understand how a change in the independent variable (X) leads to a change in the dependent variable (Y).
The model assumes that the relationship between X and Y is linear, meaning it can be described by a straight line. This can be expressed mathematically as:

$$Y = \beta_0 + \beta_1 X$$

Where:
- $Y$ is the predicted output (dependent variable).
- $X$ is the input feature (independent variable).
- $\beta_0$ is the intercept (the value of Y when X = 0).
- $\beta_1$ is the slope (the rate at which Y changes with X).
2. Objective of Simple Linear Regression
The objective of simple linear regression is to find the best-fit line by estimating the values of $\beta_0$ (intercept) and $\beta_1$ (slope). The "best-fit" line minimizes the error between the actual observed values and the predicted values.
In machine learning, this is achieved by minimizing a loss function, typically the Mean Squared Error (MSE), which calculates the average squared difference between the actual output and the predicted output:

$$MSE = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$$

Where:
- $Y_i$ is the actual value.
- $\hat{Y}_i$ is the predicted value based on the current model.
- $n$ is the number of data points.
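As a quick illustration, here is a minimal NumPy sketch of this loss. The `Y` values anticipate the study-hours example introduced below, while `Y_hat` comes from an arbitrary candidate line (not the fitted model), chosen only to show the computation:

```python
import numpy as np

# Actual values and the predictions of some arbitrary candidate line
Y = np.array([50, 55, 65, 70, 80])
Y_hat = np.array([48, 56, 64, 72, 80])

# Mean Squared Error: average of the squared differences
mse = np.mean((Y - Y_hat) ** 2)
print(mse)  # 2.0
```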
3. Example
Imagine you want to predict the test score of students based on how many hours they studied. You collect data about 5 students:
Hours Studied (X) | Test Score (Y) |
---|---|
1 | 50 |
2 | 55 |
3 | 65 |
4 | 70 |
5 | 80 |
4. Mathematical Intuition
In simple linear regression, the key is to determine the line of best fit for the data. This line is determined by two parameters: the slope $\beta_1$ and the intercept $\beta_0$.
- The slope $\beta_1$ tells us how much Y (output) changes for each unit change in X (input).
- If $\beta_1$ is positive, it indicates that as X increases, Y also increases.
- If $\beta_1$ is negative, it indicates that as X increases, Y decreases.
- The intercept $\beta_0$ represents the value of Y when X = 0.
The task of the machine learning algorithm is to minimize the prediction error by adjusting $\beta_0$ and $\beta_1$. The ordinary least squares (OLS) method, commonly used in linear regression, minimizes the Mean Squared Error (MSE), which calculates the average squared difference between the actual output and the predicted output.
5. Interpretation of Coefficients
Once the model is trained and the best-fit line is determined, the coefficients $\beta_0$ and $\beta_1$ provide key insights:
Intercept ($\beta_0$):
The intercept tells us the predicted value of Y when the input X is zero. For example, if $\beta_0 = 45$ in a study-hours model, it means that a student who doesn't study at all (X = 0) is expected to score 45.
Slope ($\beta_1$):
The slope represents the change in Y for a one-unit increase in X. For example, if $\beta_1 = 7$, it means that for each additional hour of studying, the test score is expected to increase by 7 points.
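Putting these two illustrative numbers together (the hypothetical $\beta_0 = 45$ and $\beta_1 = 7$ above, not the coefficients fitted later in this post), the prediction for a student who studies 3 hours would be:

$$\hat{Y} = 45 + 7 \times 3 = 66$$

i.e., a predicted score of 66.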
6. Applications of Simple Linear Regression
- Predicting Prices: Predict house prices based on the size of the house (e.g., square footage).
- Sales Forecasting: Predict future sales based on past sales or marketing spend.
- Risk Assessment: Estimate insurance premiums based on factors like age or driving history.
- Medical Applications: Predict health outcomes based on variables like age, blood pressure, or cholesterol levels.
- Advertising Effectiveness: Understand how changes in advertising budget impact revenue or brand awareness.
7. Numerical Example Using Ordinary Least Squares (OLS)
Let's walk through the Ordinary Least Squares (OLS) method to compute the best-fit line for the data:
Hours Studied (X) | Test Score (Y) |
---|---|
1 | 50 |
2 | 55 |
3 | 65 |
4 | 70 |
5 | 80 |
Step 1: Calculate the Mean of X and Y
First, we calculate the means ($\bar{X}$ and $\bar{Y}$) of X and Y:
$$\bar{X} = \frac{1+2+3+4+5}{5} = 3, \qquad \bar{Y} = \frac{50+55+65+70+80}{5} = 64$$
Step 2: Calculate the Slope ($\beta_1$)
The slope is calculated as:
$$\beta_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
We calculate each part:
$$(X_i - \bar{X}) = (-2, -1, 0, 1, 2), \qquad (Y_i - \bar{Y}) = (-14, -9, 1, 6, 16)$$
Now, calculate the products:
$$(X_i - \bar{X})(Y_i - \bar{Y}) = (28, 9, 0, 6, 32)$$
Sum these values:
$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = 28 + 9 + 0 + 6 + 32 = 75$$
Now, calculate $\sum (X_i - \bar{X})^2$:
$$\sum (X_i - \bar{X})^2 = 4 + 1 + 0 + 1 + 4 = 10$$
Finally, calculate the slope:
$$\beta_1 = \frac{75}{10} = 7.5$$
Step 3: Calculate the Intercept ($\beta_0$)
The intercept is calculated as:
$$\beta_0 = \bar{Y} - \beta_1 \bar{X}$$
Substitute the known values:
$$\beta_0 = 64 - 7.5 \times 3 = 64 - 22.5 = 41.5$$
Step 4: Write the Final Equation
The final regression line equation is:
$$\hat{Y} = 41.5 + 7.5X$$
Using the regression equation, we can now predict test scores for any given number of hours studied. For example, if a student studies for 6 hours:
$$\hat{Y} = 41.5 + 7.5 \times 6 = 41.5 + 45 = 86.5$$
So, the predicted test score for 6 hours of study is 86.5.
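The same coefficients can be reproduced programmatically. Below is a minimal NumPy sketch of the OLS formulas above; the variable names are illustrative:

```python
import numpy as np

X = np.array([1, 2, 3, 4, 5])
Y = np.array([50, 55, 65, 70, 80])

# OLS formulas for simple linear regression
slope = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
intercept = Y.mean() - slope * X.mean()

print(slope, intercept)        # 7.5 41.5
print(intercept + slope * 6)   # 86.5
```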
8. Model Evaluation
Evaluating a simple linear regression model involves checking how well the model fits the data, how accurate its predictions are, and whether the assumptions of the linear regression model are satisfied. Let's go step-by-step to evaluate the model derived from the above example.
Model Equation: $\hat{Y} = 41.5 + 7.5X$
Evaluation Metrics:
We can use various metrics and tests to evaluate the performance of the model. Some common methods include:
- Residual Analysis (Checking Errors)
- R-squared ($R^2$) (Goodness-of-Fit)
- Mean Squared Error (MSE) (Error Magnitude)
- Adjusted $R^2$
- Residual Standard Error (RSE)
- Assumptions Validation (Linear Regression Assumptions)
1. Residual Analysis
The residuals are the differences between the actual values ($Y_i$) and the predicted values ($\hat{Y}_i$). By analyzing the residuals, we can assess whether the model is underfitting or overfitting, and whether the assumptions of linearity and homoscedasticity (constant variance of residuals) are valid.
For each data point:

| Hours Studied (X) | Actual Score (Y) | Predicted Score Ŷ (41.5 + 7.5X) | Residual (Y - Ŷ) |
|---|---|---|---|
| 1 | 50 | 49 | 1 |
| 2 | 55 | 56.5 | -1.5 |
| 3 | 65 | 64 | 1 |
| 4 | 70 | 71.5 | -1.5 |
| 5 | 80 | 79 | 1 |
Residual Interpretation: The residuals are small, and there's no clear pattern of bias, suggesting the model fits the data well.
2. R-squared ($R^2$)
The R-squared value indicates the proportion of the variance in the dependent variable (Y) that is predictable from the independent variable (X). It ranges between 0 and 1:
- $R^2 = 1$ indicates a perfect fit.
- $R^2 = 0$ indicates the model explains none of the variability of the data.

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}$$

Where:
- $Y_i$ is the actual value.
- $\hat{Y}_i$ is the predicted value.
- $\bar{Y}$ is the mean of the actual values.
Let's calculate $R^2$:
- Total Sum of Squares (TSS): Measures the total variability in the dataset.
  $$TSS = \sum (Y_i - \bar{Y})^2 = 196 + 81 + 1 + 36 + 256 = 570$$
- Residual Sum of Squares (RSS): Measures the unexplained variability (errors) from the model.
  $$RSS = \sum (Y_i - \hat{Y}_i)^2 = 1 + 2.25 + 1 + 2.25 + 1 = 7.5$$
- R-squared:
  $$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{7.5}{570} \approx 0.9868$$
Interpretation:
$R^2 \approx 0.9868$ means that 98.68% of the variance in test scores is explained by the hours studied. This indicates an excellent fit.
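A minimal NumPy check of this calculation (variable names are illustrative):

```python
import numpy as np

Y = np.array([50, 55, 65, 70, 80])
Y_hat = 41.5 + 7.5 * np.array([1, 2, 3, 4, 5])

tss = np.sum((Y - Y.mean()) ** 2)   # 570.0
rss = np.sum((Y - Y_hat) ** 2)      # 7.5
r_squared = 1 - rss / tss
print(round(r_squared, 4))          # 0.9868
```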
3. Mean Squared Error (MSE)
The Mean Squared Error (MSE) quantifies the average squared difference between the actual and predicted values. The formula is:
$$MSE = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$$
We already calculated the sum of squared residuals as $RSS = 7.5$, so the MSE is:
$$MSE = \frac{7.5}{5} = 1.5$$
Interpretation:
The MSE of 1.5 means that, on average, the squared prediction error is 1.5 (in squared score units). In real-world terms, this means the predictions are generally quite close to the actual values.
4. Adjusted $R^2$
Adjusted $R^2$ is a modified version of the $R^2$ metric that accounts for the number of predictors in the model. Unlike $R^2$, which can increase simply by adding more predictors (regardless of their relevance), Adjusted $R^2$ only increases if the new predictor improves the model fit. It penalizes the model for adding predictors that do not meaningfully contribute to explaining the variation in the dependent variable.
The formula for Adjusted $R^2$ is:
$$\text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$$
where:
- $R^2$ is the regular coefficient of determination,
- $n$ is the number of observations,
- $p$ is the number of predictors.
Adjusted $R^2$ provides a more reliable measure of model performance when comparing models with different numbers of predictors. A higher Adjusted $R^2$ value suggests that the model explains a larger proportion of the variance in the dependent variable, taking into account model complexity.
With $R^2 \approx 0.9868$, $n = 5$, and $p = 1$, we can calculate Adjusted $R^2$:
$$\text{Adjusted } R^2 = 1 - \frac{(1 - 0.9868)(5 - 1)}{5 - 1 - 1} \approx 0.9825$$
Interpretation:
An Adjusted $R^2$ of about 0.98 means that roughly 98% of the variability in the dependent variable (the test scores) is explained by the model, even after considering model complexity.
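A quick sanity check of this formula in Python, using the rounded values from above:

```python
r_squared = 0.9868  # from the R-squared calculation above
n, p = 5, 1         # 5 observations, 1 predictor

adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
print(round(adjusted_r_squared, 4))  # 0.9824 (≈ 0.98)
```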
5. Residual Standard Error (RSE)
Residual Standard Error (RSE) is a measure of the average amount by which the predicted values (from a regression model) differ from the actual observed values. It indicates the spread of the residuals, or errors, and provides insight into how well the regression model fits the data.
Mathematically, RSE is calculated as:
$$RSE = \sqrt{\frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{n - 2}}$$
where:
- $Y_i$ is the actual observed value,
- $\hat{Y}_i$ is the predicted value,
- $n$ is the number of observations.
A lower RSE indicates that the model has a better fit, with residuals (differences between observed and predicted values) closer to zero. RSE is expressed in the same units as the dependent variable, making it easier to interpret in context.
The residuals are: $1, -1.5, 1, -1.5, 1$.
Step 1: Calculate the Sum of Squared Residuals
$$\sum (Y_i - \hat{Y}_i)^2 = 1^2 + (-1.5)^2 + 1^2 + (-1.5)^2 + 1^2 = 7.5$$
Step 2: Calculate RSE
$$RSE = \sqrt{\frac{7.5}{5 - 2}} = \sqrt{2.5} \approx 1.58$$
Interpretation:
The dependent variable here is the test score and the RSE is about 1.58, which means that, on average, the predictions are about 1.58 points off from the actual scores.
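A minimal sketch of the same computation in NumPy:

```python
import numpy as np

residuals = np.array([1, -1.5, 1, -1.5, 1])
n = len(residuals)

# Residual Standard Error: sqrt of RSS divided by (n - 2)
rse = np.sqrt(np.sum(residuals ** 2) / (n - 2))
print(round(rse, 2))  # 1.58
```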
6. Assumptions Validation (Linear Regression Assumptions)
To properly evaluate the model, it is essential to check the key assumptions underlying linear regression:
- Linearity
- Independence
- Homoscedasticity
- Normality of Residuals
Linearity
Definition: The relationship between the independent variable X (hours studied) and the dependent variable Y (test score) should be linear.
Numerical Example: Let's say we have the following data points:
Hours Studied (X) | Test Score (Y) |
---|---|
1 | 50 |
2 | 55 |
3 | 65 |
4 | 70 |
5 | 80 |
Scatter Plot
If we were to create a scatter plot of this data, we would plot the hours studied on the x-axis and the test scores on the y-axis.
- The points would roughly form a straight line, suggesting that as study hours increase, test scores also increase linearly.
Residuals Calculation
To confirm linearity with residuals, we calculate the predicted scores using the model $\hat{Y} = 41.5 + 7.5X$.
Now, calculate residuals (actual - predicted):
| Hours Studied (X) | Actual Score (Y) | Predicted Score Ŷ | Residual (Y - Ŷ) |
|---|---|---|---|
| 1 | 50 | 49 | 1 |
| 2 | 55 | 56.5 | -1.5 |
| 3 | 65 | 64 | 1 |
| 4 | 70 | 71.5 | -1.5 |
| 5 | 80 | 79 | 1 |
Independence
Definition: The residuals should not display any pattern when plotted against the predicted values or any other variable.
Residuals Check
Using the residuals calculated above:
| Residual (Y - Ŷ) |
|---|
| 1 |
| -1.5 |
| 1 |
| -1.5 |
| 1 |
- The plot should show a random scatter without any clear pattern.
Interpretation: If the residuals appear randomly scattered, we can assume that they are independent of each other.
Homoscedasticity
Definition: The variance of residuals should be constant across all levels of X.
Residuals Calculation
Using the same residuals:
| Hours Studied (X) | Residual (Y - Ŷ) |
|---|---|
| 1 | 1 |
| 2 | -1.5 |
| 3 | 1 |
| 4 | -1.5 |
| 5 | 1 |
- Absolute Residuals: $|1|, |-1.5|, |1|, |-1.5|, |1|$, which are $1, 1.5, 1, 1.5, 1$.
Interpretation: The residuals fluctuate between $-1.5$ and $1$, indicating that their variance does not change significantly with the value of $X$. This supports the assumption of homoscedasticity.
Normality of Residuals
Definition: The residuals should be approximately normally distributed.
Residuals Check
Using the same residuals from above:
| Hours Studied (X) | Residual (Y - Ŷ) |
|---|---|
| 1 | 1 |
| 2 | -1.5 |
| 3 | 1 |
| 4 | -1.5 |
| 5 | 1 |
Summary Statistics of Residuals:
Mean of Residuals
The mean is calculated by adding up all the residuals and dividing by the number of residuals:
$$\bar{e} = \frac{1 + (-1.5) + 1 + (-1.5) + 1}{5} = \frac{0}{5} = 0$$
Variance of Residuals
The variance is the average of the squared deviations from the mean. Since the mean is 0, we just square each residual and find the average:
$$\sigma^2 = \frac{1^2 + (-1.5)^2 + 1^2 + (-1.5)^2 + 1^2}{5} = \frac{7.5}{5} = 1.5$$
Standard Deviation of Residuals
The standard deviation is the square root of the variance:
$$\sigma = \sqrt{1.5} \approx 1.22$$
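A minimal NumPy sketch of these summary statistics (note that `var()` here uses the population variance, dividing by n, to match the hand calculation):

```python
import numpy as np

residuals = np.array([1, -1.5, 1, -1.5, 1])

print(residuals.mean())  # 0.0
print(residuals.var())   # 1.5
print(residuals.std())   # ~1.22
```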
To check the normality of residuals, you can use either a Q-Q (Quantile-Quantile) plot or a histogram.
Q-Q Plot
A Q-Q plot compares the quantiles of your residuals with the quantiles of a standard normal distribution. If the residuals are normally distributed, they should fall approximately along the 45-degree reference line. Deviations from this line suggest non-normality.
- Interpretation: Points that deviate from the line, especially in the tails, indicate potential departures from normality.
Histogram
A histogram provides a simple visualization of the distribution of residuals. By overlaying a normal distribution curve, you can visually inspect if the residuals have a bell-shaped, symmetric distribution around zero, which indicates normality.
- Interpretation: A symmetric, bell-shaped histogram centered around zero suggests normality. Skewness or kurtosis in the distribution indicates departures from normality.
9. Steps and Python code for Implementing Simple Linear Regression using OLS
Step 1: Create the dataset

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from sklearn.metrics import mean_squared_error

data = {
    'Hours_Studied': [1, 2, 3, 4, 5],
    'Test_Score': [50, 55, 65, 70, 80]
}
df = pd.DataFrame(data)
Output: This creates a DataFrame df that looks like this:

   Hours_Studied  Test_Score
0              1          50
1              2          55
2              3          65
3              4          70
4              5          80

Inference: A DataFrame df is created containing two columns: Hours_Studied and Test_Score. This dataset can be used to analyze the relationship between hours studied and corresponding test scores.
Step 2: EDA - Histograms of Hours_Studied and Test_Score
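The plotting code for this step is not shown in the original post; a minimal sketch using seaborn histograms (assuming the DataFrame df created above):

```python
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
sns.histplot(df['Hours_Studied'], bins=5, ax=axes[0], color='skyblue')
axes[0].set_title('Distribution of Hours Studied')
sns.histplot(df['Test_Score'], bins=5, ax=axes[1], color='salmon')
axes[1].set_title('Distribution of Test Scores')
plt.tight_layout()
plt.show()
```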
Inference: The first histogram shows that the hours studied are evenly distributed between 1 and 5. The second histogram indicates that test scores are normally distributed, peaking towards the higher scores, which suggests that more hours of study correlate with better scores.
Step 2: EDA - Scatter plot to visualize the relationship
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Hours_Studied', y='Test_Score', data=df, color='purple', s=100)
plt.title('Scatter Plot: Hours Studied vs Test Score')
plt.xlabel('Hours Studied')
plt.ylabel('Test Score')
plt.show()
Output: A scatter plot showing points for each student.
Inference: The scatter plot illustrates a clear positive linear relationship between Hours_Studied
and Test_Score
, indicating that as the number of hours studied increases, test scores tend to increase as well.
Step 2: EDA - Correlation analysis
correlation = df.corr()
print(f"Correlation between Hours Studied and Test Score: {correlation.loc['Hours_Studied', 'Test_Score']}")
Output:
Correlation between Hours Studied and Test Score: 0.993399267798783
Inference: The correlation coefficient of approximately 0.993 indicates a very strong positive relationship between hours studied and test scores, confirming the trend observed in the scatter plot.
Step 3: Prepare the data for OLS
X = df['Hours_Studied']
Y = df['Test_Score']
X_with_const = sm.add_constant(X)
Output:
   const  Hours_Studied
0    1.0              1
1    1.0              2
2    1.0              3
3    1.0              4
4    1.0              5
Inference: The data is prepared for Ordinary Least Squares (OLS) regression by separating the independent variable (X
) from the dependent variable (Y
) and adding a constant to X
to account for the intercept in the model.
Step 4: Fit the OLS model
model = sm.OLS(Y, X_with_const).fit()
Inference: The OLS regression model is fitted to the data, establishing the relationship between hours studied and test scores.
Step 5: Get the summary of the regression
print(model.summary())
Output: A summary table of the OLS regression results, including the estimated coefficients, their standard errors, t-statistics, p-values, and fit statistics such as R-squared.
Inference: The coefficient for Hours_Studied is 7.5, which means that for every additional hour studied, the test score increases by an average of 7.5 points. The low p-values (less than 0.05) indicate that the coefficients are statistically significant.
Step 6: Make predictions with the fitted model
Output: No immediate output, but predicted_scores contains the predicted values.
Inference: The model predicts test scores for each student based on the number of hours studied, allowing us to evaluate model performance.
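The prediction call itself is not shown in the original post; a minimal sketch, assuming the standard statsmodels predict method applied to the same design matrix used for fitting:

```python
# Predict test scores for the observed hours studied
predicted_scores = model.predict(X_with_const)
print(predicted_scores)  # 49.0, 56.5, 64.0, 71.5, 79.0
```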
Step 8: Calculate residuals (actual - predicted values).
We calculate the residuals (differences between actual and predicted scores).
residuals = Y - predicted_scores
Inference: Analyzing residuals helps assess the model's accuracy and identify any systematic errors.
Step 9: Evaluation Metrics
mse = mean_squared_error(Y, predicted_scores)
rmse = np.sqrt(mse)
r_squared = model.rsquared
adjusted_r_squared = model.rsquared_adj
rse = np.sqrt(sum(residuals**2) / (len(Y) - 2))
print(f"Mean Squared Error (MSE): {mse}")
print(f"Root Mean Squared Error (RMSE): {rmse}")
print(f"R-squared: {r_squared}")
print(f"Adjusted R-squared: {adjusted_r_squared}")
print(f"Residual Standard Error (RSE): {rse}")
Output
Mean Squared Error (MSE): 1.5
Root Mean Squared Error (RMSE): 1.224744871391589
R-squared: 0.9868421052631579
Adjusted R-squared: 0.9824561403508772
Residual Standard Error (RSE): 1.5811388300841898
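The plotting code for the residual check below is not included in the original post; a minimal sketch of a residuals-vs-predicted-values plot, assuming matplotlib and the residuals and predicted_scores computed above:

```python
plt.figure(figsize=(8, 6))
plt.scatter(predicted_scores, residuals, color='blue')
plt.axhline(y=0, color='red', linestyle='--')
plt.title('Residuals vs Predicted Values')
plt.xlabel('Predicted Test Score')
plt.ylabel('Residual')
plt.show()
```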
Inference: The plot should show residuals randomly distributed around zero, suggesting that the linearity assumption is met.
Step 13: Residuals vs Hours Studied (Homoscedasticity Check)
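The code for this step is missing from the original post; a minimal sketch, assuming matplotlib and the df and residuals defined earlier:

```python
plt.figure(figsize=(8, 6))
plt.scatter(df['Hours_Studied'], residuals, color='green')
plt.axhline(y=0, color='red', linestyle='--')
plt.title('Residuals vs Hours Studied')
plt.xlabel('Hours Studied')
plt.ylabel('Residual')
plt.show()
```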
Inference: The residuals should not display a pattern or funnel shape, indicating homoscedasticity (constant variance of residuals). A pattern might suggest issues with the model.
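The final check referenced below is a Q-Q plot of the residuals; its code is also missing, so here is a minimal sketch using scipy.stats (an assumed additional import):

```python
from scipy import stats

plt.figure(figsize=(8, 6))
stats.probplot(residuals, dist="norm", plot=plt)
plt.title('Q-Q Plot of Residuals')
plt.show()
```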
Inference: If the points fall along the reference line, it indicates that the residuals are normally distributed. Deviations from this line suggest non-normality, which could affect hypothesis testing in the regression model.
10. Limitations of Ordinary Least Squares (OLS)
While Ordinary Least Squares (OLS) is a popular and widely used method for linear regression, it has several limitations that make it less suitable for certain types of data or scenarios. Here are some of the key limitations:
1. Assumption of Linearity
- Limitation: OLS assumes a linear relationship between the independent and dependent variables. However, many real-world relationships are not strictly linear.
- Impact: If the true relationship is non-linear, OLS will provide a poor fit and inaccurate predictions.
2. Sensitivity to Outliers
- Limitation: OLS is sensitive to outliers, as it minimizes the sum of squared residuals, which disproportionately weights larger errors.
- Impact: A few outliers can heavily influence the regression line, leading to misleading results.
3. Assumption of Homoscedasticity
- Limitation: OLS assumes that the variance of the errors (residuals) is constant across all levels of the independent variable (homoscedasticity).
- Impact: If the errors exhibit heteroscedasticity (i.e., variance changes with different values of the independent variable), the standard errors of the coefficients may be incorrect, leading to unreliable hypothesis tests and confidence intervals.
4. Normality of Errors
- Limitation: OLS assumes that the residuals are normally distributed, which is required for making valid inferences about the coefficients.
- Impact: When the residuals deviate from normality, statistical tests (like t-tests for coefficients) may become invalid, affecting the reliability of the model.
In the next post, we will dive into multiple linear regression using the closed-form solution. This method is a powerful extension of linear regression that allows us to model the relationship between a dependent variable and multiple independent variables.
The closed-form solution provides a direct way to compute the coefficients for all variables by solving the normal equations, making it an efficient approach for fitting linear models.
Stay tuned as we break down the mathematics behind the closed-form solution and explore its application in real-world scenarios.