Regression algorithms are used to predict numerical values based on input data. These algorithms are commonly used in applications such as forecasting stock prices, predicting home values, and estimating the likelihood of a customer making a purchase.
There are several different types of regression algorithms, including linear regression, polynomial regression, and support vector regression.
Linear Regression
Linear regression is a simple and widely used method for predicting a numerical value from one or more input variables. With a single input variable, the model is a straight line fitted to the data by minimizing the sum of the squared differences between the predicted values and the actual values (the least-squares criterion).
Here is an example of how to create a linear regression model using the scikit-learn library in Python:
from sklearn import datasets
from sklearn.linear_model import LinearRegression
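# Load the diabetes dataset with raw feature values so that age is in years
# (the scaled parameter requires scikit-learn 1.1 or later)
diabetes = datasets.load_diabetes(scaled=False)
# Keep only the age column (index 0) as a 2D array, as scikit-learn expects
X = diabetes.data[:, [0]]
y = diabetes.target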
# Create an instance of the LinearRegression class
model = LinearRegression()
# Fit the model to the training data
model.fit(X, y)
# Predict the target values for a given set of input values
X_pred = [[40], [50], [60]]
y_pred = model.predict(X_pred)
print(y_pred)
This will print the predicted target values for ages 40, 50, and 60.
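To make the least-squares criterion described above more concrete, the following sketch fits a line with NumPy instead of scikit-learn. The data values are made up for illustration, and the variable names are assumptions of this sketch rather than part of any library API.

```python
import numpy as np

# Illustrative toy data (made up for this sketch): y is roughly 2*x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

# Design matrix with a column of ones so the model includes an intercept.
A = np.column_stack([x, np.ones_like(x)])

# Solve for the slope and intercept that minimize the sum of squared residuals.
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

# The quantity being minimized: the sum of squared prediction errors.
residuals = y - (slope * x + intercept)
sse = float(np.sum(residuals ** 2))
print(f"slope={slope:.3f}, intercept={intercept:.3f}, SSE={sse:.4f}")
```

Any other choice of slope and intercept gives a larger sum of squared errors on this data, which is the sense in which the fitted line is the "best" one.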
Polynomial Regression
Polynomial regression is a type of regression that can be used to model relationships between variables that are not linear. It does this by fitting a polynomial function to the data.
Here is an example of how to create a polynomial regression model using the scikit-learn library in Python:
from sklearn import datasets
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
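# Load the diabetes dataset with raw feature values so that age is in years
# (the scaled parameter requires scikit-learn 1.1 or later)
diabetes = datasets.load_diabetes(scaled=False)
# Keep only the age column (index 0) as a 2D array, as scikit-learn expects
X = diabetes.data[:, [0]]
y = diabetes.target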
# Create a polynomial feature transformer
poly_transformer = PolynomialFeatures(degree=2)
# Transform the input data
X_poly = poly_transformer.fit_transform(X)
# Create an instance of the LinearRegression class
model = LinearRegression()
# Fit the model to the transformed data
model.fit(X_poly, y)
# Predict the target values for a given set of input values
X_pred = [[40], [50], [60]]
X_pred_poly = poly_transformer.transform(X_pred)
y_pred = model.predict(X_pred_poly)
print(y_pred)
This will print the predicted target values for ages 40, 50, and 60 using a polynomial function of degree 2.
Support Vector Regression
Support vector regression (SVR) applies the support vector machine approach to predicting numerical values. With a non-linear kernel, such as the RBF kernel that scikit-learn uses by default, it can model non-linear relationships between variables.
Here is an example of how to create an SVR model using the scikit-learn library in Python:
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
# Load the diabetes dataset with raw feature values
# (the scaled parameter requires scikit-learn 1.1 or later)
diabetes = datasets.load_diabetes(scaled=False)
X = diabetes.data
y = diabetes.target
# Standardize the features, since SVR is sensitive to feature scale
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Create an instance of the SVR class
model = SVR()
# Fit the model to the training data
model.fit(X_scaled, y)
# Predict the target for one hypothetical patient (illustrative values only),
# given in the dataset's column order: age, sex, bmi, bp, s1, s2, s3, s4, s5, s6
X_pred = [[40, 1, 25.0, 87.0, 180, 110.0, 50.0, 4.0, 4.5, 90]]
X_pred_scaled = scaler.transform(X_pred)
y_pred = model.predict(X_pred_scaled)
print(y_pred)
This will print the predicted target value for a hypothetical patient described by age, sex, BMI, blood pressure, and six blood serum measurements.
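To illustrate why SVR is useful for non-linear relationships, the sketch below fits both a straight line and an RBF-kernel SVR to one period of a sine wave and compares their R^2 scores. The synthetic data and the choice of C=100 are assumptions made for this illustration, not recommended settings. The scores are computed on the training data, so they show how well each model can follow the curve, not how well it generalizes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

# Synthetic data (an assumption for illustration): one full period of a sine wave.
X = np.linspace(0, 2 * np.pi, 100).reshape(-1, 1)
y = np.sin(X).ravel()

# A straight line cannot follow the curve; an RBF-kernel SVR can.
linear_model = LinearRegression().fit(X, y)
svr_model = SVR(kernel="rbf", C=100).fit(X, y)

# score() returns the coefficient of determination (R^2) on the given data.
linear_r2 = linear_model.score(X, y)
svr_r2 = svr_model.score(X, y)
print(f"Linear regression R^2: {linear_r2:.3f}")
print(f"SVR (RBF kernel) R^2:  {svr_r2:.3f}")
```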
Exercises
To review these concepts, we will go through a series of exercises designed to test your understanding and apply what you have learned.
Create a linear regression model using the scikit-learn library to predict the housing prices in the California Housing dataset.
from sklearn import datasets
from sklearn.linear_model import LinearRegression
# Load the California Housing dataset
california_housing = datasets.fetch_california_housing()
X = california_housing.data
y = california_housing.target
# Create an instance of the LinearRegression class
model = LinearRegression()
# Fit the model to the training data
model.fit(X, y)
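# Predict the median house value for one illustrative block group.
# The model expects all 8 features, in this order:
# MedInc, HouseAge, AveRooms, AveBedrms, Population, AveOccup, Latitude, Longitude
X_pred = [[3.0, 20.0, 5.0, 1.0, 800.0, 3.0, 34.0, -118.0]]
y_pred = model.predict(X_pred)
print(y_pred)
This will print the predicted median house value, in units of $100,000, for a block group with a median income of 3.0 (in tens of thousands of dollars), a median house age of 20 years, an average of 5 rooms and 1 bedroom per household, a population of 800, an average occupancy of 3, and a location at latitude 34.0, longitude -118.0.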
Create a polynomial regression model using the scikit-learn library to predict the stock prices of a company based on its quarterly earnings.
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
# Quarterly earnings (X) and the corresponding stock prices (y); illustrative values
X = [[1], [2], [3], [4]]
y = [10, 15, 20, 25]
# Create a polynomial feature transformer
poly_transformer = PolynomialFeatures(degree=2)
# Transform the input data
X_poly = poly_transformer.fit_transform(X)
# Create an instance of the LinearRegression class
model = LinearRegression()
# Fit the model to the transformed data
model.fit(X_poly, y)
# Predict the stock prices for earnings values of 1 through 5
X_pred = [[1], [2], [3], [4], [5]]
X_pred_poly = poly_transformer.transform(X_pred)
y_pred = model.predict(X_pred_poly)
print(y_pred)
This will print the predicted stock prices for earnings values of 1 through 5. Because the sample data is exactly linear, the fitted quadratic term is close to zero, and the prediction for an earnings value of 5 is approximately 30.
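Create a decision tree classifier using the scikit-learn library to predict whether a patient's diabetes progression is above or below the median, based on their age, BMI, and blood pressure. The scikit-learn diabetes dataset contains a continuous progression score rather than a diabetic/non-diabetic label, so the target is converted into two classes first.
import numpy as np
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
# Load the diabetes dataset with raw feature values
# (the scaled parameter requires scikit-learn 1.1 or later)
diabetes = datasets.load_diabetes(scaled=False)
# Select the age, bmi, and bp columns (indices 0, 2, and 3)
X = diabetes.data[:, [0, 2, 3]]
# Convert the continuous progression score into two classes:
# 1 if the score is above the median, 0 otherwise
y = (diabetes.target > np.median(diabetes.target)).astype(int)
# Create an instance of the DecisionTreeClassifier class
model = DecisionTreeClassifier(random_state=0)
# Fit the model to the training data
model.fit(X, y)
# Predict the class label for a given set of input values
X_pred = [[40, 25.0, 87.0]]
y_pred = model.predict(X_pred)
print(y_pred)
This will print the predicted class label (either 0 or 1) for a patient with age 40, BMI 25, and average blood pressure 87.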
Create a K-Means clustering model using the scikit-learn library to group customers into different clusters based on their annual income and spending score.
from sklearn.cluster import KMeans
# Load the customer data
X = [[50000, 50], [60000, 60], [70000, 70], [80000, 80], [90000, 90], [100000, 100]]
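# Create an instance of the KMeans class
# (n_init and random_state are set explicitly so the result is reproducible)
model = KMeans(n_clusters=3, n_init=10, random_state=0)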
# Fit the model to the data
model.fit(X)
# Predict the cluster labels for a given set of input values
X_pred = [[50000, 50], [60000, 60], [70000, 70], [80000, 80], [90000, 90], [100000, 100]]
y_pred = model.predict(X_pred)
print(y_pred)
This will print the cluster labels (either 0, 1, or 2) for each customer based on their annual income and spending score.
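K-Means groups points by Euclidean distance, so a feature with a large numeric range, such as income in dollars, can dominate a feature with a small range, such as a 0 to 100 spending score. One common way to handle this, sketched below, is to standardize the features before clustering. The pipeline and the parameter choices are assumptions made for illustration. On this toy data the two features rise together, so the grouping may not change, but the same pattern applies to real data.

```python
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same illustrative customer data: [annual income, spending score].
X = [[50000, 50], [60000, 60], [70000, 70],
     [80000, 80], [90000, 90], [100000, 100]]

# Standardize both features, then cluster the standardized values.
pipeline = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=3, n_init=10, random_state=0),
)

# fit_predict fits the scaler and the clusterer, then returns one label per row.
labels = pipeline.fit_predict(X)
print(labels)
```

The label numbers themselves are arbitrary: they identify which customers were grouped together, not an ordering of the groups.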
Create a random forest classifier using the scikit-learn library to predict whether a patient has heart disease or not based on their age, sex, cholesterol, and blood pressure.
from sklearn.ensemble import RandomForestClassifier
# scikit-learn does not include a heart disease dataset, so this example uses
# a small made-up sample for illustration; replace it with real data in practice.
# Columns: age, sex (1 = male, 0 = female), cholesterol, blood pressure
X = [[45, 1, 210, 130], [62, 0, 280, 150], [38, 1, 190, 120], [70, 1, 300, 160], [52, 0, 230, 135], [66, 0, 270, 155]]
# Labels: 1 = heart disease, 0 = no heart disease (also made up)
y = [0, 1, 0, 1, 0, 1]
# Create an instance of the RandomForestClassifier class
model = RandomForestClassifier(random_state=0)
# Fit the model to the training data
model.fit(X, y)
# Predict the class label for a given set of input values
X_pred = [[40, 1, 180, 120]]
y_pred = model.predict(X_pred)
print(y_pred)
This will print the predicted class label (either 0 or 1) for a patient with age 40, sex 1, cholesterol 180, and blood pressure 120.