Logistic Regression
The name says "regression," but it is actually a classification algorithm: it uses a line to separate two classes of data.
01 Core Principles (Plain English)
The linear regression we covered earlier is essentially about fitting a function: finding a line that passes as close as possible to all the data points.
Logistic regression is the exact opposite: we don't care about the specific values of the data points, we just want to find a line that separates two classes of data.
Why Not Use a Step Function Directly?
You might naturally think: just check which side of the line a data point is on — one side is 0, the other is 1. This is exactly the idea behind a step function.
Step Function
Jumps directly from 0 to 1 at the boundary, with zero gradient everywhere (undefined at the boundary). No gradient means gradient descent can't work.
Sigmoid Function
Also squeezes the output into 0~1, but with a smooth transition that is differentiable everywhere. Gradient descent can use it to guide parameter updates.
Logistic regression uses Sigmoid to smooth out the step, enabling gradient descent training: σ(z) = 1 / (1 + e⁻ᶻ), z = w·x + b. The decision boundary is the line where z = 0, with different classes on each side.
Building Logistic Regression Step by Step
We'll break the complete code into pieces and understand what each step does.
Compared to linear regression, the data has an additional dimension: the label y, which only takes the values 0 and 1. To separate the two classes, it's natural to use a discriminant z = w·x + b: classify as class 1 if z > 0, and class 0 if z < 0. Note that x here is a vector; with two-dimensional features it is actually (x₁, x₂), so written out in full: z = w₁x₁ + w₂x₂ + b.
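As a concrete sketch of this setup, here is one way to generate two labeled clusters of 2D points and evaluate the discriminant on them (the cluster centers, variable names, and NumPy usage are my assumptions, not from the original):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian clusters in 2D: class 0 centered at (1, 1), class 1 at (3, 3)
X0 = rng.normal(loc=1.0, scale=0.6, size=(50, 2))
X1 = rng.normal(loc=3.0, scale=0.6, size=(50, 2))
X = np.vstack([X0, X1])            # features, shape (100, 2)
y = np.array([0] * 50 + [1] * 50)  # labels: 0 or 1

# Discriminant z = w1*x1 + w2*x2 + b, here with untrained parameters
w = np.zeros(2)
b = 0.0
z = X @ w + b                      # one z value per data point
```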
z is the result of plugging the data point into the line equation: positive on one side, negative on the other. But using z directly for classification is a step function: output 1 if z > 0, otherwise 0. Its gradient is zero everywhere (and undefined at the jump), so gradient descent can't work.
The solution: wrap it in a Sigmoid, which squeezes z into a probability between (0,1) while remaining smooth and differentiable everywhere:
One line of code compresses any real number into (0,1) while guaranteeing differentiability everywhere — this is the core of logistic regression.
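That one line, as a sketch in NumPy:

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into (0, 1); smooth and differentiable everywhere."""
    return 1.0 / (1.0 + np.exp(-z))
```

For example, sigmoid(0) is exactly 0.5, and large positive or negative z saturates toward 1 or 0.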
Using mean squared error to measure classification performance is problematic: combined with the Sigmoid it yields a non-convex loss with flat regions where gradients vanish. Cross-entropy is a loss function designed specifically for probability outputs: L = −[y·log(ŷ) + (1−y)·log(1−ŷ)].
ŷ is the probability output by Sigmoid, and y is the true label (0 or 1). The more accurate the prediction, the closer the log value is to 0, and the smaller the loss. When the prediction is completely wrong, log approaches negative infinity and the loss approaches infinity — the more wrong the prediction, the heavier the penalty.
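A minimal sketch of binary cross-entropy in NumPy (the eps clipping is an implementation detail I'm adding to avoid log(0) when predictions saturate; it is not part of the formula itself):

```python
import numpy as np

def cross_entropy(y_hat, y, eps=1e-12):
    """Average binary cross-entropy between predicted probabilities and 0/1 labels."""
    y_hat = np.clip(y_hat, eps, 1 - eps)  # keep log() finite
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
```

An accurate prediction (ŷ = 0.99 for a true label of 1) gives a loss near 0, while a confidently wrong one (ŷ = 0.01) gives a large loss, exactly the penalty behavior described above.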
Taking partial derivatives of the cross-entropy loss and expanding by the chain rule gives, for a single sample, ∂L/∂w = (ŷ − y)·x and ∂L/∂b = ŷ − y.
Let err = ŷ − y. The form is exactly the same as the gradient for linear regression. This is a nice property of differentiating cross-entropy and Sigmoid together: the derivative of the Sigmoid and the derivative of cross-entropy cancel perfectly, leaving only the prediction error err. Extending to n samples by taking the average: ∂L/∂w = (1/n)·Σᵢ errᵢ·xᵢ and ∂L/∂b = (1/n)·Σᵢ errᵢ.
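The averaged gradient as a sketch in NumPy (function and variable names are my choices):

```python
import numpy as np

def gradients(X, y, w, b):
    """Averaged cross-entropy gradients for logistic regression over n samples."""
    y_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid of the linear score
    err = y_hat - y                             # prediction error, shape (n,)
    grad_w = X.T @ err / len(y)                 # (1/n) * sum of err_i * x_i
    grad_b = err.mean()                         # (1/n) * sum of err_i
    return grad_w, grad_b
```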
Put the four parts together with visualization, and you get the complete demo code — see below.
02 Code
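The original demo code is not reproduced here. A minimal self-contained sketch that assembles the four pieces described above (data, sigmoid, cross-entropy gradient, training loop) might look like this; the toy data, hyperparameters, and names are my assumptions, and the plotting mentioned above is omitted:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, lr=0.5, epochs=500):
    """Logistic regression trained by batch gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        y_hat = sigmoid(X @ w + b)
        err = y_hat - y              # err = y_hat - y, as in the derivation
        w -= lr * (X.T @ err) / n    # averaged gradient step for w
        b -= lr * err.mean()         # averaged gradient step for b
    return w, b

# Toy data: two Gaussian clusters, class 0 near (1, 1), class 1 near (3, 3)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 0.6, (50, 2)), rng.normal(3.0, 0.6, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

w, b = train(X, y)
pred = (sigmoid(X @ w + b) > 0.5).astype(int)  # 0.5 threshold on the probability
print("accuracy:", (pred == y).mean())
# The learned decision boundary is the line w[0]*x1 + w[1]*x2 + b = 0
```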
03 Academic Explanation
Although logistic regression has "regression" in its name, it is actually a classification algorithm. It is primarily used for binary classification problems: determining whether the result is "0" or "1".
Sigmoid Function
The core of logistic regression is the Sigmoid function, which maps any real number to the (0, 1) interval: σ(z) = 1 / (1 + e⁻ᶻ).
Logistic Regression Model
Pass the linear combination into the Sigmoid function: ŷ = σ(z), where z = w·x + b.
The model outputs a probability value between 0 and 1. We typically use 0.5 as the threshold for classification.
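A sketch of that thresholding step (parameter values here are illustrative only):

```python
import numpy as np

def predict(X, w, b, threshold=0.5):
    """Classify as 1 when the predicted probability reaches the threshold."""
    prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid of the linear score
    return (prob >= threshold).astype(int)

labels = predict(np.array([[0.0, 0.0], [2.0, 2.0]]), np.array([1.0, 1.0]), -2.0)
```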
Loss Function: Binary Cross-Entropy
Logistic regression uses Binary Cross-Entropy as the loss function: L = −[y·log(ŷ) + (1−y)·log(1−ŷ)],
where ŷ is the predicted probability and y is the true label (0 or 1).
Decision Boundary
The decision boundary is the dividing line where the model classifies. For linear logistic regression, the decision boundary is a straight line: w·x + b = 0.
Each side of the line belongs to a different class.
Summary
Task: binary classification
Activation function: Sigmoid
Loss function: Cross-Entropy
Output: a probability value