Each factoring pattern can be proven by expanding the right side and verifying it equals the left. These aren't formulas to memorize on faith — they are algebraic identities with clean proofs.
This is the distributive property read in reverse. The distributive property states:

$$a(b + c) = ab + ac$$
This is an axiom — one of the foundational rules of arithmetic accepted without proof. Factoring out the GCF is simply reading this axiom from right to left: if every term shares the factor $a$, we can "undistribute" it.
Example: $6x^2 + 3x = 3x(2x + 1)$. The GCF of $6x^2$ and $3x$ is $3x$. Verify: $3x \cdot 2x + 3x \cdot 1 = 6x^2 + 3x$. ✓
GCF factoring is the first step in every factoring problem. Always look for a common factor before trying any other technique. It simplifies everything that follows.
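As a quick sanity check (a sketch; the helper names are ours, not from the text), the GCF example above can be verified numerically:

```python
import math

# Spot-check that 6x^2 + 3x == 3x(2x + 1) at several points.
def lhs(x):
    return 6 * x**2 + 3 * x

def rhs(x):
    return 3 * x * (2 * x + 1)

for x in [-3, -1, 0, 2, 7]:
    assert lhs(x) == rhs(x)

# The numeric part of the GCF comes from the coefficients:
print(math.gcd(6, 3))  # → 3
```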
Multiply $(a + b)(a - b)$ using distribution (FOIL):

$$(a + b)(a - b) = a^2 - ab + ab - b^2$$
The middle terms $-ab$ and $+ab$ cancel:

$$(a + b)(a - b) = a^2 - b^2$$
$a^2 + b^2$ does not factor over the real numbers. The middle terms would need to cancel, but $(a + bi)(a - bi) = a^2 + b^2$ — the factorization requires complex numbers. Over the reals, $a^2 + b^2$ is irreducible.
$(a + b)^2 = (a + b)(a + b)$. Distribute:

$$(a + b)(a + b) = a^2 + ab + ab + b^2 = a^2 + 2ab + b^2$$
The proof for $(a - b)^2 = a^2 - 2ab + b^2$ is identical but with $b$ replaced by $-b$:

$$(a + (-b))^2 = a^2 + 2a(-b) + (-b)^2 = a^2 - 2ab + b^2$$
A trinomial $x^2 + bx + c$ is a perfect square if and only if $c = (b/2)^2$. Check: is the constant term the square of half the linear coefficient? If yes, it factors as $(x + b/2)^2$. This recognition is the key to completing the square.
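This recognition check is mechanical enough to script. A minimal sketch (the function name is hypothetical):

```python
# Recognize x^2 + bx + c as a perfect square by testing c == (b/2)^2.
def is_perfect_square_trinomial(b, c):
    return c == (b / 2) ** 2

assert is_perfect_square_trinomial(6, 9)       # x^2 + 6x + 9   = (x + 3)^2
assert is_perfect_square_trinomial(-10, 25)    # x^2 - 10x + 25 = (x - 5)^2
assert not is_perfect_square_trinomial(4, 5)   # x^2 + 4x + 5 is not
```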
Distribute $a$ and $b$ separately across the trinomial:

$$(a + b)(a^2 - ab + b^2) = a^3 - a^2b + ab^2 + a^2b - ab^2 + b^3$$
Cancel the middle terms: $-a^2b + a^2b = 0$ and $+ab^2 - ab^2 = 0$:

$$(a + b)(a^2 - ab + b^2) = a^3 + b^3$$
Distribute $a$ and $(-b)$:

$$(a - b)(a^2 + ab + b^2) = a^3 + a^2b + ab^2 - a^2b - ab^2 - b^3$$
Again, the middle terms cancel in pairs:

$$(a - b)(a^2 + ab + b^2) = a^3 - b^3$$
Both cube formulas have the same structure: a binomial times a trinomial, where the trinomial is engineered so the cross terms cancel. The sign pattern is: SOAP — Same, Opposite, Always Positive. The binomial has the Same sign as the original, the first term of the trinomial has the Opposite sign, and the last term is Always Positive.
This generalizes: $a^n - b^n = (a-b)(a^{n-1} + a^{n-2}b + \cdots + ab^{n-2} + b^{n-1})$ for any positive integer $n$.
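The generalized identity can be spot-checked by brute force over small integers (a sketch; `factored` is our own helper name):

```python
# (a - b) * (a^(n-1) + a^(n-2) b + ... + a b^(n-2) + b^(n-1))
def factored(a, b, n):
    return (a - b) * sum(a ** (n - 1 - k) * b ** k for k in range(n))

# Verify a^n - b^n == factored form for many small cases.
for n in range(1, 8):
    for a in range(-4, 5):
        for b in range(-4, 5):
            assert factored(a, b, n) == a ** n - b ** n
```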
The proof constructs $q(x)$ and $r(x)$ through the long division algorithm:
Base case: If $\deg(f) < \deg(d)$, then $q(x) = 0$ and $r(x) = f(x)$. Done.
Inductive step: If $\deg(f) \geq \deg(d)$, let $f(x) = a_n x^n + \cdots$ and $d(x) = b_m x^m + \cdots$. Form the term $\frac{a_n}{b_m} x^{n-m}$ and compute:

$$f_1(x) = f(x) - \frac{a_n}{b_m} x^{n-m}\, d(x)$$
This cancels the leading term of $f$, so $\deg(f_1) < \deg(f)$. Repeat this process with $f_1$ in place of $f$. Each step reduces the degree by at least 1, so the process must terminate when the degree drops below $\deg(d)$. The accumulated terms form $q(x)$; the leftover is $r(x)$.
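The proof's loop translates directly into code. A minimal sketch of polynomial long division, assuming coefficient lists ordered highest degree first and a divisor with nonzero leading coefficient:

```python
def poly_divmod(f, d):
    """Polynomial long division: f = d*q + r with deg(r) < deg(d).

    Each loop iteration mirrors one step of the proof: cancel the leading
    term of f with (a_n / b_m) * x^(n-m) * d(x), strictly reducing the degree.
    """
    f = list(f)
    q = [0.0] * max(len(f) - len(d) + 1, 1)
    while len(f) >= len(d):
        shift = len(f) - len(d)           # n - m
        t = f[0] / d[0]                   # a_n / b_m
        q[len(q) - 1 - shift] = t         # accumulate this term of q(x)
        for i, coef in enumerate(d):
            f[i] -= t * coef              # subtract t * x^shift * d(x)
        f.pop(0)                          # leading coefficient is now 0
    return q, f                           # quotient, remainder

# (x^3 - 1) / (x - 1) = x^2 + x + 1, remainder 0
assert poly_divmod([1, 0, 0, -1], [1, -1]) == ([1, 1, 1], [0])
```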
Suppose $f = dq_1 + r_1$ and $f = dq_2 + r_2$. Then:

$$d(q_1 - q_2) = r_2 - r_1$$
The left side has degree $\geq \deg(d)$ (unless $q_1 = q_2$). The right side has degree $< \deg(d)$ (since both remainders do). The only way these can be equal is if both sides are $0$. Therefore $q_1 = q_2$ and $r_1 = r_2$.
This is exactly the polynomial version of integer division: $17 = 5 \times 3 + 2$. The dividend ($17$) equals the divisor ($5$) times the quotient ($3$) plus the remainder ($2$), and the remainder is strictly smaller than the divisor.
By the Division Algorithm, we can write:

$$f(x) = (x - c)\,q(x) + r$$
Since we're dividing by $(x - c)$ (degree 1), the remainder $r$ must be a constant (degree 0). Now evaluate at $x = c$:

$$f(c) = (c - c)\,q(c) + r = 0 + r = r$$
Therefore $r = f(c)$.
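Synthetic division makes the theorem concrete: the running Horner accumulator builds the quotient and ends at exactly $f(c)$. A sketch (the helper name is ours):

```python
def synthetic_division(coeffs, c):
    """Divide by (x - c), coefficients highest degree first.

    Returns (quotient coefficients, remainder). The remainder equals f(c),
    as the Remainder Theorem predicts.
    """
    q = []
    acc = 0
    for a in coeffs:
        acc = acc * c + a   # Horner step
        q.append(acc)
    return q[:-1], q[-1]

# f(x) = x^3 - 2x + 5 at c = 2: remainder should be f(2) = 8 - 4 + 5 = 9
quot, rem = synthetic_division([1, 0, -2, 5], 2)
assert quot == [1, 2, 2]   # quotient x^2 + 2x + 2
assert rem == 9
```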
If $(x - c)$ is a factor, then $f(x) = (x - c)\,q(x)$ for some polynomial $q(x)$.
Evaluate at $x = c$: $f(c) = (c - c)\,q(c) = 0$.
If $f(c) = 0$, then by the Remainder Theorem, the remainder when dividing $f(x)$ by $(x-c)$ is $0$.
So $f(x) = (x - c)\,q(x) + 0 = (x-c)\,q(x)$, which means $(x-c)$ is a factor.
Since $\frac{p}{q}$ is a root, we have $f\!\left(\frac{p}{q}\right) = 0$:

$$a_n \frac{p^n}{q^n} + a_{n-1} \frac{p^{n-1}}{q^{n-1}} + \cdots + a_1 \frac{p}{q} + a_0 = 0$$
Multiply both sides by $q^n$ to clear all denominators:

$$a_n p^n + a_{n-1} p^{n-1} q + \cdots + a_1 p\,q^{n-1} + a_0 q^n = 0$$
Move $a_0 q^n$ to the right side:

$$a_n p^n + a_{n-1} p^{n-1} q + \cdots + a_1 p\,q^{n-1} = -a_0 q^n$$
Every term on the left has $p$ as a factor. Therefore $p$ divides the right side $a_0 q^n$. Since $\gcd(p, q) = 1$ (the fraction is in lowest terms), $p$ doesn't share any factors with $q^n$, so $p$ must divide $a_0$.
Similarly, move $a_n p^n$ to the right side:

$$a_{n-1} p^{n-1} q + \cdots + a_1 p\,q^{n-1} + a_0 q^n = -a_n p^n$$
Every term on the left has $q$ as a factor. So $q$ divides $a_n p^n$. Since $\gcd(p, q) = 1$, $q$ must divide $a_n$.
A finite list of candidates to test: all fractions $\pm\frac{p}{q}$ where $p$ divides the constant term and $q$ divides the leading coefficient. Test each by evaluating $f(p/q)$. If it equals zero, you've found a root. This is the "test and factor" strategy for polynomials.
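The strategy can be sketched in a few lines using exact rational arithmetic (names here are illustrative, not a standard API):

```python
from fractions import Fraction

def rational_root_candidates(coeffs):
    """All ±p/q with p | constant term, q | leading coeff (coeffs highest degree first)."""
    a_n, a_0 = coeffs[0], coeffs[-1]
    ps = [p for p in range(1, abs(a_0) + 1) if abs(a_0) % p == 0]
    qs = [q for q in range(1, abs(a_n) + 1) if abs(a_n) % q == 0]
    return {s * Fraction(p, q) for p in ps for q in qs for s in (1, -1)}

def evaluate(coeffs, x):
    acc = Fraction(0)
    for a in coeffs:        # Horner's method, exact arithmetic
        acc = acc * x + a
    return acc

# f(x) = 2x^3 - 3x^2 - 8x - 3 = (x - 3)(x + 1)(2x + 1)
coeffs = [2, -3, -8, -3]
roots = sorted(c for c in rational_root_candidates(coeffs) if evaluate(coeffs, c) == 0)
assert roots == [Fraction(-1), Fraction(-1, 2), Fraction(3)]
```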
A rigorous proof requires complex analysis (Liouville's Theorem or winding numbers), but we can build intuition:
Degree 1: $ax + b = 0$ always has root $x = -b/a$. ✓
Degree 2: The quadratic formula always produces two roots (real or complex), since $\sqrt{b^2 - 4ac}$ is always defined in $\mathbb{C}$. ✓
General pattern: Once we find one root $c$ (guaranteed to exist by the theorem), the Factor Theorem gives us $f(x) = (x-c)\,q(x)$ where $q(x)$ has degree $n-1$. Apply the theorem again to $q(x)$, and repeat. After $n$ steps:

$$f(x) = a_n (x - c_1)(x - c_2) \cdots (x - c_n)$$
This gives exactly $n$ roots $c_1, c_2, \ldots, c_n$ (some may repeat).
The deepest part is proving that at least one root exists. This requires the completeness of $\mathbb{C}$ — the complex numbers have "no holes." Real numbers alone aren't enough ($x^2 + 1 = 0$ has no real roots). The complex numbers were essentially invented to make this theorem true.
Add $\left(\frac{b}{2a}\right)^2 = \frac{b^2}{4a^2}$ to both sides:

$$x^2 + \frac{b}{a}x + \frac{b^2}{4a^2} = \frac{b^2}{4a^2} - \frac{c}{a} \quad\Longrightarrow\quad \left(x + \frac{b}{2a}\right)^2 = \frac{b^2 - 4ac}{4a^2} \quad\Longrightarrow\quad x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
The expression under the square root, $\Delta = b^2 - 4ac$, is called the discriminant. It determines the nature of the roots before you solve:
| Discriminant | Meaning | Roots |
|---|---|---|
| $b^2 - 4ac > 0$ | Positive under the radical | Two distinct real roots |
| $b^2 - 4ac = 0$ | Zero under the radical | One repeated real root |
| $b^2 - 4ac < 0$ | Negative under the radical | Two complex conjugate roots |
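The table can be demonstrated with a small solver; `cmath.sqrt` returns a complex square root, so all three discriminant cases fall out of one formula (a sketch, not a library API):

```python
import cmath

def quadratic_roots(a, b, c):
    """Both roots of ax^2 + bx + c = 0, plus the discriminant."""
    disc = b * b - 4 * a * c
    sq = cmath.sqrt(disc)   # defined even when disc < 0
    return (-b + sq) / (2 * a), (-b - sq) / (2 * a), disc

assert quadratic_roots(1, -3, 2)[:2] == (2, 1)     # Δ = 1 > 0: two real roots
assert quadratic_roots(1, -2, 1)[:2] == (1, 1)     # Δ = 0: repeated root
assert quadratic_roots(1, 0, 1)[:2] == (1j, -1j)   # Δ = -4: conjugate pair
```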
The midpoint $M$ is equidistant from both endpoints. On a number line, the midpoint of $a$ and $b$ is their average: $(a+b)/2$. Coordinates are independent, so:

$$M = \left(\frac{x_1 + x_2}{2},\ \frac{y_1 + y_2}{2}\right)$$
Verification: The distance from $(x_1, y_1)$ to $M$ equals the distance from $M$ to $(x_2, y_2)$:

$$\sqrt{\left(\frac{x_2 - x_1}{2}\right)^2 + \left(\frac{y_2 - y_1}{2}\right)^2} = \frac{1}{2}\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

Both distances reduce to this same value.
Consider two lines through the origin with slopes $m_1$ and $m_2$. Pick a point on each at $x = 1$: $A = (1, m_1)$ and $B = (1, m_2)$.
The lines are perpendicular when triangle $OAB$ has a right angle at $O$. By the Pythagorean Theorem:

$$|OA|^2 + |OB|^2 = |AB|^2$$
Compute: $|OA|^2 = 1 + m_1^2$, $|OB|^2 = 1 + m_2^2$, $|AB|^2 = (m_1 - m_2)^2$.
Substituting into the Pythagorean relation, the $m_1^2$ and $m_2^2$ terms cancel:

$$(1 + m_1^2) + (1 + m_2^2) = m_1^2 - 2m_1 m_2 + m_2^2 \;\Longrightarrow\; 2 = -2m_1 m_2 \;\Longrightarrow\; m_1 m_2 = -1$$
Place the parallelogram with vertices at $A = (0, 0)$, $B = (a, 0)$, $C = (a + b, c)$, $D = (b, c)$. Diagonal $AC$ has midpoint $\left(\frac{a+b}{2}, \frac{c}{2}\right)$, and diagonal $BD$ has midpoint $\left(\frac{a+b}{2}, \frac{c}{2}\right)$.
The midpoints are identical — the diagonals bisect each other.
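The three lines of algebra are equally short in code. A sketch with sample values for $a$, $b$, $c$:

```python
def midpoint(P, Q):
    return ((P[0] + Q[0]) / 2, (P[1] + Q[1]) / 2)

# Generic parallelogram from the proof, with sample values a=4, b=1, c=3
a, b, c = 4, 1, 3
A, B, C, D = (0, 0), (a, 0), (a + b, c), (b, c)
assert midpoint(A, C) == midpoint(B, D)   # diagonals bisect each other
```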
This proof is three lines of algebra. With traditional Euclidean geometry (congruent triangles, alternate interior angles), it takes a full page. This is why Descartes' invention of the coordinate plane was revolutionary.
Place $F = (0, p)$ and directrix at $y = -p$.
Equidistance condition: $\sqrt{x^2 + (y-p)^2} = |y + p|$. Square both sides:

$$x^2 + (y - p)^2 = (y + p)^2$$
Cancel and simplify:

$$x^2 + y^2 - 2py + p^2 = y^2 + 2py + p^2 \;\Longrightarrow\; x^2 = 4py \;\Longrightarrow\; y = \frac{1}{4p}x^2$$
Setting $a = 1/(4p)$ gives $y = ax^2$. So $a$ determines the focal distance: $p = 1/(4a)$. A wider parabola (smaller $a$) has its focus farther from the vertex.
Foci at $(\pm c, 0)$. The defining condition $\sqrt{(x+c)^2 + y^2} + \sqrt{(x-c)^2 + y^2} = 2a$, after isolating and squaring twice, simplifies to:

$$\frac{x^2}{a^2} + \frac{y^2}{a^2 - c^2} = 1$$
Define $b^2 = a^2 - c^2$ (positive since $a > c$):

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$
Same technique as the ellipse, but with subtraction and $c > a$:

$$\frac{x^2}{a^2} - \frac{y^2}{c^2 - a^2} = 1$$
Define $b^2 = c^2 - a^2$ (positive since $c > a$):

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1$$
Ellipse: $b^2 = a^2 - c^2$ (plus sign, sum of distances). Hyperbola: $b^2 = c^2 - a^2$ (minus sign, difference of distances). Both are slices of a cone — hence "conic sections."
If the graph is symmetric about the y-axis, then for every point $(x, y)$ on the graph, $(-x, y)$ is also on the graph. This means $f(x) = y = f(-x)$.
If $f(-x) = f(x)$, then for any point $(x, f(x))$ on the graph, the point $(-x, f(-x)) = (-x, f(x))$ is also on the graph — the mirror image across the y-axis.
For $f(x) = x^n$ where $n$ is even: $f(-x) = (-x)^n = (-1)^n x^n = x^n = f(x)$. ✓
For odd $n$: $f(-x) = (-1)^n x^n = -x^n = -f(x)$ — odd function (origin symmetry).
Define $E(x) = \frac{f(x) + f(-x)}{2}$ and $O(x) = \frac{f(x) - f(-x)}{2}$. Then:

$E(-x) = \frac{f(-x) + f(x)}{2} = E(x)$ ✓ (even)
$O(-x) = \frac{f(-x) - f(x)}{2} = -\frac{f(x) - f(-x)}{2} = -O(x)$ ✓ (odd)
$E(x) + O(x) = \frac{f(x)+f(-x)}{2} + \frac{f(x)-f(-x)}{2} = f(x)$ ✓
This is used in signal processing: any signal splits into symmetric and antisymmetric components. Example: $e^x = \cosh(x) + \sinh(x)$, where $\cosh$ is even and $\sinh$ is odd.
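The decomposition is easy to compute pointwise. A sketch verifying $e^x = \cosh(x) + \sinh(x)$ numerically (helper names are ours):

```python
import math

def even_part(f, x):
    return (f(x) + f(-x)) / 2   # E(x): symmetric component

def odd_part(f, x):
    return (f(x) - f(-x)) / 2   # O(x): antisymmetric component

for x in [-2.0, -0.5, 0.0, 1.3]:
    assert math.isclose(even_part(math.exp, x), math.cosh(x))
    assert math.isclose(odd_part(math.exp, x), math.sinh(x), abs_tol=1e-12)
    assert math.isclose(even_part(math.exp, x) + odd_part(math.exp, x), math.exp(x))
```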
Let $x = \log_b a$, so $b^x = a$. Take $\ln$ of both sides:

$$\ln(b^x) = \ln a \;\Longrightarrow\; x \ln b = \ln a \;\Longrightarrow\; x = \frac{\ln a}{\ln b}$$
Since $x = \log_b a$: $\quad \log_b a = \frac{\ln a}{\ln b}$. ✓
Invest $\$1$ at 100% annual interest. If compounded $n$ times per year, the amount after one year is:

$$\left(1 + \frac{1}{n}\right)^n$$
| $n$ | Compounding | $\left(1 + \frac{1}{n}\right)^n$ |
|---|---|---|
| $1$ | Annually | $2.000000$ |
| $12$ | Monthly | $2.613035$ |
| $365$ | Daily | $2.714567$ |
| $10{,}000$ | ~Every 53 min | $2.718146$ |
| $\to \infty$ | Continuously | $e = 2.71828\ldots$ |
As compounding becomes continuous, the result converges to $e$. This is the natural growth constant.
$e$ can also be defined as the sum of an infinite series:

$$e = \sum_{k=0}^{\infty} \frac{1}{k!} = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots$$
This converges rapidly: just 10 terms already give about six correct decimal places.
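The convergence claim is easy to check (a sketch):

```python
import math

# Sum the first 10 terms of Σ 1/k! and compare against math.e.
partial = sum(1 / math.factorial(k) for k in range(10))
assert abs(partial - math.e) < 1e-6   # truncation error is already ~3e-7
```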
$e^x$ is the only function (up to a constant multiple) that is its own derivative: $\frac{d}{dx}e^x = e^x$. This makes $e$ the natural base for calculus, differential equations, and the Kalman filter state-transition matrix $e^{At}$.
Suppose $e$ is rational: $e = p/q$ with $q \geq 2$ (since $2 < e < 3$, it's not an integer). Multiply both sides by $q!$ and split the series for $e$ at $k = q$:

$$q! \cdot e = \underbrace{\sum_{k=0}^{q} \frac{q!}{k!}}_{A} + \underbrace{\sum_{k=q+1}^{\infty} \frac{q!}{k!}}_{T}$$
$q! \cdot e = (q-1)! \cdot p$ (an integer). $A$ is an integer because each $q!/k!$ is an integer when $k \leq q$. Therefore $T = q! \cdot e - A$ must also be an integer.
Each term of $T$ satisfies $\frac{q!}{k!} \leq \frac{1}{(q+1)^{k-q}}$, so $0 < T < \sum_{j=1}^{\infty} \frac{1}{(q+1)^j} = \frac{1}{q} \leq \frac{1}{2}$. So $T$ is positive and less than $1$. No integer exists between $0$ and $1$. Contradiction.
Hermite later proved (1873) that $e$ is transcendental — not the root of any polynomial with integer coefficients. This is even stronger than irrationality.
The three elementary row operations are:
(i) Swap two rows. (ii) Multiply a row by a nonzero scalar. (iii) Add a multiple of one row to another.
Each operation corresponds to a reversible algebraic manipulation of the system of equations:
(i) Reordering equations doesn't change their solutions.
(ii) Multiplying both sides of an equation by $c \neq 0$ doesn't change its solution set — divide by $c$ to reverse.
(iii) If $E_1$ and $E_2$ are both true, then $E_2 + kE_1$ is also true, and $E_2 = (E_2 + kE_1) - kE_1$ recovers the original. The solution set is unchanged.
Since every step is reversible, the transformed system has exactly the same solution set as the original. ✓
The staircase structure of REF means the last nonzero equation has only one variable (solve directly), the second-to-last has two (substitute the value just found, solve), and so on up the staircase. This is back-substitution, and it works precisely because each row introduces exactly one new variable relative to the row below it.
The algorithm is constructive: (1) Find the leftmost nonzero column. (2) Use row swaps to place a nonzero entry (pivot) at the top. (3) Use row operations to eliminate all entries below the pivot. (4) Repeat on the submatrix below and to the right of the pivot.
Each iteration reduces the submatrix size, so the process terminates in at most $\min(m, n)$ steps for an $m \times n$ matrix. The result is in REF.
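The constructive algorithm can be sketched directly (pivoting on the first usable entry; the function name and zero tolerance are our choices, not a standard API):

```python
def row_echelon(M):
    """Forward elimination to row echelon form, following steps (1)-(4) above."""
    A = [list(map(float, row)) for row in M]
    m, n = len(A), len(A[0])
    r = 0                                     # next pivot row
    for col in range(n):                      # (1) scan columns left to right
        piv = next((i for i in range(r, m) if abs(A[i][col]) > 1e-12), None)
        if piv is None:
            continue                          # no pivot in this column
        A[r], A[piv] = A[piv], A[r]           # (2) swap pivot into place
        for i in range(r + 1, m):             # (3) eliminate below the pivot
            k = A[i][col] / A[r][col]
            for j in range(col, n):
                A[i][j] -= k * A[r][j]
        r += 1                                # (4) recurse on the submatrix
        if r == m:
            break
    return A

# Augmented system: 2x + y - z = 8, -3x - y + 2z = -11, -2x + y + 2z = -3
R = row_echelon([[2, 1, -1, 8], [-3, -1, 2, -11], [-2, 1, 2, -3]])
assert R[1][0] == 0 and R[2][0] == 0 and R[2][1] == 0   # staircase of zeros
```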
Different sequences of row operations can produce different REF matrices for the same system. The solution set is always the same, but the specific pivot values and non-pivot entries may differ. This is why RREF (below) was invented — it is unique.
Starting from REF, apply two more operations:
Scale: Divide each row by its pivot to make every pivot equal to $1$.
Eliminate upward: Use row operations to zero out all entries above each pivot, not just below.
The result is a matrix where each pivot column has exactly one $1$ and the rest zeros — the solution is read off directly without back-substitution.
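Gauss-Jordan elimination (forward elimination plus the two extra operations) can be sketched as one routine; the function name and zero tolerance are our choices:

```python
def rref(M):
    """Reduced row echelon form: scale each pivot to 1, eliminate above and below."""
    A = [list(map(float, row)) for row in M]
    m, n = len(A), len(A[0])
    r = 0
    for col in range(n):
        piv = next((i for i in range(r, m) if abs(A[i][col]) > 1e-12), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        A[r] = [v / A[r][col] for v in A[r]]          # scale pivot to 1
        for i in range(m):                            # eliminate above AND below
            if i != r and abs(A[i][col]) > 1e-12:
                A[i] = [a - A[i][col] * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == m:
            break
    return A

# 2x + y - z = 8, -3x - y + 2z = -11, -2x + y + 2z = -3
# has unique solution (x, y, z) = (2, 3, -1), read off the last column.
R = rref([[2, 1, -1, 8], [-3, -1, 2, -11], [-2, 1, 2, -3]])
assert [row[3] for row in R] == [2, 3, -1]
```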
Unlike REF, the RREF of a matrix is unique — regardless of which sequence of row operations you use, you always arrive at the same RREF. Here's why:
The pivot columns of the RREF correspond to the linearly independent columns of the original matrix — this is a property of the column space, not of the reduction path. The values in the non-pivot columns are determined by expressing those columns as linear combinations of the pivot columns — and these relationships are inherent to the matrix, not the reduction method.
Formally: if $R_1$ and $R_2$ are both in RREF and row-equivalent (same solution set), then $R_1 = R_2$. The proof proceeds column by column, showing that the pivot positions must be identical and the non-pivot entries are forced by the requirement that the same linear combinations hold in both forms.
One solution: Every column is a pivot column → the last column gives the unique solution.
No solution: A row of the form $[0 \; 0 \; \cdots \; 0 \;|\; c]$ with $c \neq 0$ → inconsistent system.
Infinitely many solutions: Non-pivot columns correspond to free variables — you can set them to any value, and the pivot variables are determined in terms of them. The solution set is a parametric family.
Row reduction and RREF are the foundation of linear algebra, which is the primary mathematical tool in Kalman filters. The state estimation equations $\hat{\mathbf{x}} = \mathbf{K}\mathbf{z}$ involve solving systems of linear equations at every time step. Understanding how and why these solutions are found (and when they're unique) is essential groundwork.