Demystifying Logistic Regression: A Beginner’s Guide to Binary Classification

Utsav Desai
11 min read · Feb 12, 2023


What is Logistic Regression?

Logistic regression is a type of regression analysis used to model the relationship between a binary response variable and one or more predictor variables. It is a statistical technique that is commonly used in machine learning for classification tasks, where the goal is to predict the probability of an observation belonging to a certain class.

Comparison Between Linear Regression And Logistic Regression

The main difference between logistic regression and linear regression is the type of response variable. Linear regression is used when the response variable is continuous, while logistic regression is used when the response variable is binary (i.e., yes or no, 1 or 0). Linear regression attempts to find the line of best fit that minimizes the sum of the squared errors between the predicted and actual values, whereas logistic regression uses the logistic function to model the probability of a binary outcome.

Another difference is how the two model the relationship between the response variable and the predictor variables. Linear regression models the relationship as a straight line, while logistic regression passes the linear combination of the predictors through the logistic function, producing an S-shaped curve whose output is always between 0 and 1 and can be interpreted as a probability.

In summary, while linear regression is used to model the relationship between a continuous outcome variable and predictor variables, logistic regression is used to model the relationship between a binary outcome variable and predictor variables by producing probability values.

Types Of Logistic Regression

Here are the three main types of logistic regression:

1. Binary Logistic Regression: This is the most common type of logistic regression, where the response variable has only two possible outcomes, often denoted as 0 or 1. It is used when the objective is to predict a binary outcome, such as whether a customer will buy a product or not.

For example, we can use binary logistic regression to predict whether a person has diabetes based on their age and gender. The response variable is binary (0 or 1), indicating whether the person has diabetes.
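A minimal sketch of this with scikit-learn, using made-up data (the ages, gender codes, and labels below are purely illustrative):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up training data: each row is [age, gender (0=female, 1=male)]
X = np.array([[25, 0], [34, 1], [52, 0], [46, 1], [61, 0], [58, 1]])
y = np.array([0, 0, 1, 0, 1, 1])  # 1 = has diabetes, 0 = does not

model = LogisticRegression()
model.fit(X, y)

print(model.predict([[40, 1]]))        # predicted class (0 or 1)
print(model.predict_proba([[40, 1]]))  # probabilities for class 0 and 1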

2. Multinomial Logistic Regression: This is used when the response variable has more than two categories but is still nominal (i.e., has no natural ordering). For example, it could be used to predict the color of a car, which could be red, blue, or green.

In this example, we use multinomial logistic regression to predict the color of a car based on the age and gender of the owner. The response variable is categorical with three levels: red, blue, and green.
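A sketch of this with scikit-learn, again on made-up data (the multi_class='multinomial' option fits a single softmax over all three colors):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up training data: [age, gender]; labels 0=red, 1=blue, 2=green
X = np.array([[22, 0], [35, 1], [47, 0], [29, 1], [54, 0], [41, 1]])
y = np.array([0, 1, 2, 0, 2, 1])

model = LogisticRegression(multi_class='multinomial', solver='lbfgs')
model.fit(X, y)

print(model.predict([[30, 0]]))        # predicted color class
print(model.predict_proba([[30, 0]]))  # one probability per color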

3. Ordinal Logistic Regression: This is used when the response variable is ordinal, which means that it has a natural ordering. For example, it could be used to predict a person’s level of education, which could be high school, college, or graduate school.

In this example, we use ordinal logistic regression to predict education level based on the age and gender of the individual. The response variable is ordinal with three levels: high school, college, and graduate school.
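Scikit-Learn does not ship an ordinal model, but the statsmodels package does; assuming statsmodels is installed, a sketch on made-up data could look like this:

import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Made-up data: age, gender, and an ordered education level
rng = np.random.default_rng(0)
levels = ['high school', 'college', 'graduate school']
X = pd.DataFrame({'age': rng.integers(18, 60, 200),
                  'gender': rng.integers(0, 2, 200)})
edu = pd.Categorical(rng.choice(levels, 200), categories=levels, ordered=True)

model = OrderedModel(edu, X, distr='logit')  # proportional-odds model
result = model.fit(method='bfgs', disp=False)
print(result.params)  # one coefficient per predictor, plus thresholds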

Sigmoid Activation

The sigmoid activation function is used in binary logistic regression to transform the output of the linear regression model to a probability value between 0 and 1.

The equation for the sigmoid function is:

f(x) = 1 / (1 + e^(-x))

where x is the input to the function.

In binary logistic regression, the input to the sigmoid function is the linear combination of the predictor variables, which is modeled as:

y = b0 + b1 * x1 + b2 * x2 + ... + bn * xn

where y is the linear combination (also called the log-odds), x1, x2, ..., xn are the predictor variables, and b0, b1, b2, ..., bn are the coefficients or weights of the model.

The output y is then passed through the sigmoid function to obtain a probability value between 0 and 1:

p = 1 / (1 + e^(-y))

where p is the probability of the binary outcome.
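In code, the sigmoid is a one-liner; a quick sketch with NumPy:

import numpy as np

def sigmoid(z):
    # Map any real number to a probability strictly between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))   # 0.5 -- the midpoint
print(sigmoid(4))   # ~0.982 -- large positive inputs approach 1
print(sigmoid(-4))  # ~0.018 -- large negative inputs approach 0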

Here is an example to illustrate how the sigmoid activation function is used in binary logistic regression:

Suppose we have a dataset with one predictor variable, x, and a binary response variable, y (0 or 1), and we want to model the probability of y based on x. We fit a binary logistic regression model of the form:

y = b0 + b1 * x

where b0 and b1 are the coefficients of the model. The output of the model is passed through the sigmoid function:

p = 1 / (1 + e^(-(b0 + b1 * x)))

The resulting probability value, p, represents the likelihood of the binary outcome being 1 for a given value of x. For example, if p is 0.8, it means that the model predicts an 80% chance of the binary outcome being 1 for the given value of x.

Decision boundary

The decision boundary is the boundary that separates the two classes, typically represented as a straight line in two-dimensional space or a hyperplane in higher-dimensional space. The decision boundary is determined by the coefficients or weights of the logistic regression model.

The decision boundary is obtained by setting the predicted probability of the binary outcome to 0.5 and solving for the value(s) of the predictor variable(s) that satisfy the equation. In other words, the decision boundary is the set of points where the model is equally likely to predict a positive or negative outcome.

For example, suppose we have a binary logistic regression model of the form:

p = 1 / (1 + e^(-(b0 + b1 * x1 + b2 * x2)))

where p is the predicted probability of a positive outcome (e.g., y=1), x1 and x2 are the predictor variables, and b0, b1, and b2 are the coefficients of the model.

The decision boundary is found by setting p = 0.5. Because the sigmoid function equals 0.5 exactly when its input is 0, this is the same as setting the linear combination to zero and solving for x1 and x2:

0 = b0 + b1 * x1 + b2 * x2

The resulting equation represents a line (in two-dimensional space) or a hyperplane (in higher-dimensional space) that separates the positive and negative outcomes.

Data points that fall on one side of the decision boundary are classified as one outcome (e.g., y=1), while data points that fall on the other side of the boundary are classified as the other outcome (e.g., y=0).

The decision boundary is a critical aspect of binary logistic regression, as it determines the accuracy of the model’s predictions and the types of errors that it is likely to make. The location and shape of the decision boundary depend on the specific values of the coefficients or weights of the model and the distribution of the predictor variables in the data.
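To make this concrete, here is a small sketch that fits a model on made-up two-dimensional data and recovers the boundary line from the fitted coefficients:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up 2-D data: class 1 roughly where x1 + x2 > 0
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = LogisticRegression().fit(X, y)
b0 = clf.intercept_[0]
b1, b2 = clf.coef_[0]

# Points on the decision boundary satisfy 0 = b0 + b1*x1 + b2*x2
x1 = np.linspace(-2, 2, 5)
x2 = -(b0 + b1 * x1) / b2
print(np.column_stack([x1, x2]).round(2))  # coordinates along the boundary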

Making Predictions

In binary logistic regression, predictions are made by estimating the probability of the binary outcome for a given set of predictor variables. The predicted probability is then compared to a threshold value (usually 0.5) to determine the predicted class.

The steps for making predictions in binary logistic regression are as follows:

1. Fit a binary logistic regression model to the training data, estimating the coefficients or weights of the model.

2. For a new observation with predictor variables x1, x2, ..., xn, calculate the linear combination of the coefficients and predictor variables:

y = b0 + b1*x1 + b2*x2 + ... + bn*xn

where b0, b1, b2, ..., bn are the coefficients or weights of the model.

3. Pass the linear combination y through the sigmoid activation function to obtain the estimated probability of the binary outcome:

p = 1 / (1 + e^(-y))

4. Compare the estimated probability p to a threshold value (usually 0.5). If p >= 0.5, predict a positive outcome (e.g., 1). If p < 0.5, predict a negative outcome (e.g., 0).

For example, suppose we have a binary logistic regression model that predicts the likelihood of a person being admitted to a university based on two predictor variables, their GRE score and their GPA. The model has the following coefficients:

  • Intercept (b0) = -4.0777
  • GRE score (b1) = 0.0023
  • GPA (b2) = 0.8040

We want to use the model to predict whether a new student with a GRE score of 680 and a GPA of 3.5 will be admitted to the university.

To make the prediction, we first compute the linear combination of the predictor variables using the formula:

y = b0 + b1 * x1 + b2 * x2

where y is the predicted log-odds of being admitted, x1 is the GRE score, and x2 is the GPA.

Plugging in the values for the new student, we get:

y = -4.0777 + 0.0023 * 680 + 0.8040 * 3.5
= -4.0777 + 1.5640 + 2.8140
= 0.3003

Next, we apply the sigmoid activation function to the predicted log-odds to get the predicted probability of being admitted:

p = 1 / (1 + e^(-y))
= 1 / (1 + e^(-0.3003))
= 0.5745

The predicted probability of the new student being admitted is about 0.5745.

Finally, we compare the predicted probability to the threshold value (0.5) to make the prediction. Since the predicted probability is greater than 0.5, we predict that the new student will be admitted to the university.
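The whole calculation fits in a few lines of Python, which is a handy way to check the arithmetic above:

import math

# Coefficients and inputs from the example above
b0, b1, b2 = -4.0777, 0.0023, 0.8040
gre, gpa = 680, 3.5

y = b0 + b1 * gre + b2 * gpa   # log-odds, ~0.3003
p = 1 / (1 + math.exp(-y))     # sigmoid, ~0.5745
print(p, 'admit' if p >= 0.5 else 'reject')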

Cost function

In binary logistic regression, the cost function (also called the objective or loss function) is used to measure the error or mismatch between the predicted probabilities and the actual binary outcomes in the training data. The goal of logistic regression is to minimize the cost function, which can be accomplished by adjusting the coefficients or weights of the model.

The most commonly used cost function in binary logistic regression is the binary cross-entropy loss function, which is defined as:

J = -1/m * sum(yi*log(pi) + (1-yi)*log(1-pi))

where J is the cost or loss, m is the number of training examples, yi is the actual binary outcome (0 or 1) for the ith training example, pi is the predicted probability of the binary outcome for the ith training example, and log is the natural logarithm.

To see why this cost function can be minimized by adjusting the weights, imagine plotting the cost against a single coefficient of the model:

The x-axis represents the value of the coefficient or weight for the predictor variable, and the y-axis represents the value of the cost function. The curve shows how the cost changes as the coefficient or weight is updated. The minimum point on the curve represents the optimal value of the coefficient or weight that minimizes the cost function and produces the best fit to the training data.
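The binary cross-entropy loss itself is straightforward to compute; a short NumPy sketch (the clipping guards against taking log(0)):

import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 1])
print(binary_cross_entropy(y_true, np.array([0.9, 0.2, 0.8, 0.7])))  # small loss
print(binary_cross_entropy(y_true, np.array([0.2, 0.9, 0.3, 0.4])))  # large loss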

Gradient descent for Logistic Cost function

The gradient descent algorithm can be used to update the coefficients or weights of a logistic regression model for the purpose of minimizing the cost function. In logistic regression, the cost function is typically the binary cross-entropy or log loss function.

Gradient descent follows these steps to drive the cost function lower:

Let's have a look at the logistic (sigmoid) function:

p = 1 / (1 + e^(-z))

Here, z is the linear combination of the input, z = b0 + b1 * x (the familiar mx + b of a straight line).

  • Initially, the coefficients b0 and b1 are set to 0, and a learning rate (α) is introduced. The learning rate is kept very small, typically between 0.0001 and 0.01.
  • Then the partial derivatives of the cost function with respect to each coefficient are calculated. For the binary cross-entropy loss, they work out to:

dJ/db0 = (1/m) * sum(pi - yi)
dJ/db1 = (1/m) * sum((pi - yi) * xi)

  • After the derivatives are calculated, the weights are updated using the learning rate:

b0 = b0 - α * dJ/db0
b1 = b1 - α * dJ/db1

  • The process of updating the weights continues until the cost function stops decreasing, i.e., it converges to its minimum.

The following is a summary of the gradient descent algorithm for logistic regression:

  1. Initialize the coefficients or weights with small random values.
  2. Compute the predicted probability pi for each observation in the training data using the sigmoid function with the current coefficients or weights.
  3. Compute the gradient of the cost function with respect to each coefficient or weight using the predicted probabilities and the actual binary outcomes in the training data.
  4. Update the coefficients or weights using the update rule.
  5. Repeat steps 2–4 until the cost function is minimized or a stopping criterion is met.
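Putting these steps together, a minimal gradient-descent sketch in NumPy (toy data; a real implementation would add a convergence check):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, n_iters=5000):
    m, n = X.shape
    b0, b = 0.0, np.zeros(n)        # step 1: initialize (zeros here; small random values also work)
    for _ in range(n_iters):
        p = sigmoid(b0 + X @ b)     # step 2: predicted probabilities
        error = p - y               # step 3: (pi - yi) drives the gradients
        b0 -= alpha * error.mean()  # step 4: update the intercept...
        b -= alpha * (X.T @ error) / m  # ...and the coefficients
    return b0, b

# Toy data: label is 1 when the single feature is positive
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])
print(fit_logistic(X, y))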

Multiclass logistic regression

Multiclass logistic regression is a type of machine learning algorithm used for classification problems where there are more than two classes.

The procedure for multiclass logistic regression is as follows:

  1. Collect and preprocess data: The first step is to collect data and preprocess it by removing any outliers, normalizing the data, and encoding the target variable.
  2. Split the data: The next step is to split the data into training and testing sets.
  3. Fit the model: The third step is to fit the logistic regression model to the training data using the softmax activation function. The softmax function converts the output of the model into a probability distribution across all classes.
  4. Predict on test data: The fourth step is to use the trained model to make predictions on the test data.
  5. Evaluate the model: The final step is to evaluate the performance of the model using metrics such as accuracy, precision, recall, and F1-score.

Here’s an example of how to implement multiclass logistic regression using Scikit-Learn in Python:

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Fit the logistic regression model to the training data
clf = LogisticRegression(multi_class='multinomial', solver='lbfgs')
clf.fit(X_train, y_train)

# Make predictions on the test data
y_pred = clf.predict(X_test)

# Evaluate the performance of the model
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)

In this example, we use the load_iris function to load the iris dataset, which is a popular dataset for classification problems. We split the data into training and testing sets using the train_test_split function. We then fit a logistic regression model to the training data using the LogisticRegression class from Scikit-Learn, with the multi_class parameter set to 'multinomial' and the solver parameter set to 'lbfgs'. We make predictions on the test data using the predict method of the logistic regression model, and evaluate the performance of the model using the accuracy_score function from Scikit-Learn's metrics module.
