July 24, 2023
Are you looking for a computationally cheap, easy-to-explain linear estimator that’s based on simple mathematics? Look no further than OLS!
OLS stands for ordinary least squares. It is heavily used in econometrics, a branch of economics that applies statistical methods to find insights in economic data.
As we know, the simplest linear regression algorithm assumes that the relationship between an independent variable (x) and a dependent variable (y) is of the form `y = mx + c`, which is the equation of a line with slope m and intercept c.
In line with that, OLS is an estimator in which the values of m and c (from the above equation) are chosen in such a way as to minimize the sum of the squares of the differences between the observed dependent variable and predicted dependent variable. That’s why it’s named ordinary least squares.
Note that the sum of the squared differences is itself the loss that OLS minimizes, so the fitted line is the one with the smallest total squared error on the observed data.
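To make the idea concrete, here is a minimal sketch that computes m and c for a single independent variable with the standard closed-form OLS formulas and checks that moving away from them increases the sum of squares. The data points and the helper name `sum_of_squares` are made up for illustration.

```python
import numpy as np

# Hypothetical sample data: five observations lying roughly on y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])


def sum_of_squares(m, c):
    """Sum of squared differences between observed and predicted y."""
    residuals = y - (m * x + c)
    return float(np.sum(residuals ** 2))


# Closed-form OLS estimates for one independent variable:
# slope = covariance(x, y) / variance(x), intercept from the means.
x_mean, y_mean = x.mean(), y.mean()
m_hat = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
c_hat = y_mean - m_hat * x_mean

best = sum_of_squares(m_hat, c_hat)
print(f"m = {m_hat:.3f}, c = {c_hat:.3f}, sum of squares = {best:.4f}")

# Nudging either parameter away from the OLS estimate increases the loss.
print(sum_of_squares(m_hat + 0.1, c_hat) > best)
print(sum_of_squares(m_hat, c_hat - 0.1) > best)
```

Both comparisons print `True`: any other choice of m and c gives a larger sum of squares on this data.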
Please find here the video on Multiple Linear Regression in Python and sklearn.
Statsmodels is a package in the scientific Python ecosystem that's geared towards data analysis, data science, and statistics. It's built on top of the numeric library NumPy and the scientific library SciPy.
The Statsmodels package provides different classes for linear regression, including OLS. Linear regression with the OLS class is simple to set up and easy to interpret. We can perform regression using the `sm.OLS` class, where `sm` is the conventional alias for `statsmodels.api`.
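The `sm.OLS` constructor takes two array-like objects, `a` and `b`, as input. `a` holds the independent variables and is generally a Pandas dataframe or a NumPy array. The shape of `a` is `o*c`, where `o` is the number of observations and `c` is the number of columns. `b` holds the dependent variable and is generally a Pandas series of length `o` or a one-dimensional NumPy array. Note that the constructor expects the dependent variable first, so the call is written `sm.OLS(b, a)`.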
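As a quick illustration of those shapes, the sketch below builds an `a` and a `b` with Pandas and NumPy only. The column names and values are placeholders, not data from this article.

```python
import numpy as np
import pandas as pd

o, c = 6, 2  # number of observations and number of columns

# a: the independent variables, one row per observation (shape o x c).
a = pd.DataFrame(
    np.arange(o * c, dtype=float).reshape(o, c),
    columns=["x1", "x2"],  # hypothetical column names
)

# b: the dependent variable, one value per observation (length o).
b = pd.Series(np.linspace(0.0, 1.0, o), name="y")

print(a.shape)  # (6, 2)
print(b.shape)  # (6,)

# The row counts must agree before the pair can be used in a regression.
assert a.shape[0] == b.shape[0]
```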
In the code below, OLS is implemented using the Statsmodels package:
Here we worked through a quick overview of OLS using Statsmodels and its implementation in a Jupyter Notebook with sample data. I hope you liked it and will give OLS a try for your regression problems.
You can find the code and the data here.
Happy Machine Learning :)