Orthogonality

Orthogonal subspaces

Definition 1: two subspaces \(S\) and \(T\) of an inner product space \(V\) are orthogonal if

\[ \langle \mathbf{u}, \mathbf{v} \rangle = 0, \]

for all \(\mathbf{u} \in S\) and \(\mathbf{v} \in T\). Orthogonality of \(S\) and \(T\) may be denoted by \(S \perp T\).

The notion of orthogonality is only valid in vector spaces with a defined inner product.

Definition 2: let \(S\) be a subspace of an inner product space \(V\). The set of all vectors in \(V\) that are orthogonal to every vector in \(S\) will be denoted by \(S^\perp\). Which implies

\[ S^\perp = \{\mathbf{v} \in V \;|\; \langle \mathbf{v}, \mathbf{u} \rangle = 0 \; \forall \mathbf{u} \in S \}. \]

The set \(S^\perp\) is called the orthogonal complement of \(S\).

For example the subspaces \(X = \mathrm{span}(\mathbf{e}_1)\) and \(Y = \mathrm{span}(\mathbf{e}_2)\) of \(\mathbb{R}^3\) are orthogonal, but they are not orthogonal complements. Indeed,

\[ X^\perp = \mathrm{span}(\mathbf{e}_2, \mathbf{e}_3) \quad \text{and} \quad Y^\perp = \mathrm{span}(\mathbf{e}_1, \mathbf{e}_3). \]

We may observe that if \(S\) and \(T\) are orthogonal subspaces of an inner product space \(V\), then \(S \cap T = \{\mathbf{0}\}\). Since for \(\mathbf{v} \in S \cap T\) and \(S \perp T\) then \(\langle \mathbf{v}, \mathbf{v} \rangle = 0\) and hence \(\mathbf{v} = \mathbf{0}\).

Additionally, we may also observe that if \(S\) is a subspace of an inner product space \(V\), then \(S^\perp\) is also a subspace of \(V\). Since for \(\mathbf{u} \in S^\perp\) and \(a \in \mathbb{K}\) then

\[ \langle a \mathbf{u}, \mathbf{v} \rangle = a \cdot 0 = 0 \]

for all \(\mathbf{v} \in S\), therefore \(a \mathbf{u} \in S^\perp\).

If \(\mathbf{u}_1, \mathbf{u}_2 \in S^\perp\) then

\[ \langle \mathbf{u}_1 + \mathbf{u}_2, \mathbf{v} \rangle = \langle \mathbf{u}_1, \mathbf{v} \rangle + \langle \mathbf{u}_2, \mathbf{v} \rangle = 0 + 0 = 0, \]

for all \(\mathbf{v} \in S\), and hence \(\mathbf{u}_1 + \mathbf{u}_2 \in S^\perp\). Therefore \(S^\perp\) is a subspace of \(V\).

Fundamental subspaces

Let \(V\) be an Euclidean inner product space \(V = \mathbb{R}^n\) with its inner product defined by the scalar product. With this definition of the inner product on \(V\) the following theorem may be posed.

Theorem 1: let \(A\) be an \(m \times n\) matrix, then

\[ N(A) = R(A^T)^\perp, \]

and

\[ N(A^T) = R(A)^\perp, \]

for all \(A \in \mathbb{R}^{m \times n}\) with \(R(A)\) denoting the column space of \(A\) and \(R(A^T)\) denoting the row space of \(A\).

Proof:

Let \(A \in \mathbb{R}^{m \times n}\) with \(R(A) = \mathrm{span}(\mathbf{\vec{a}}_i^T)\) for \(i \in \mathbb{N}[i \leq n]\) denoting the column space of \(A\) and \(R(A^T) = \mathrm{span}(\mathbf{a}_i)\) for \(i \in \mathbb{N}[i \leq m]\) denoting the row space of \(A\).

For the first equation, let \(\mathbf{v} \in R(A^T)^\perp\) then \(\mathbf{v}^T \mathbf{\vec{a}}_i^T = \mathbf{0}\) which obtains

\[ \mathbf{0} = \mathbf{v}^T \mathbf{\vec{a}}_i^T = \big(\mathbf{v}^T \mathbf{\vec{a}}_i^T \big)^T = \mathbf{\vec{a}}_i \mathbf{v}, \]

so \(A \mathbf{v} = \mathbf{0}\) and hence \(\mathbf{v} \in N(A)\). Which implies that \(R(A^T)^\perp \subseteq N(A)\). Similarly, let \(\mathbf{w} \in N(A)\) then \(A \mathbf{w} = \mathbf{0}\) which obtains

\[ \mathbf{0} = \mathbf{\vec{a}}_i \mathbf{v} = \big(\mathbf{v}^T \mathbf{\vec{a}}_i^T \big)^T = \mathbf{v}^T \mathbf{\vec{a}}_i^T, \]

and hence \(\mathbf{w} \in R(A^T)^\perp\) which implies that \(N(A) \subseteq R(A^T)^\perp\). Therefore \(N(A) = R(A^T)^\perp\).

For the second equation, let \(\mathbf{v} \in R(A)^\perp\) then \(\mathbf{v}^T \mathbf{a}_i = \mathbf{0}\) which obtains

\[ \mathbf{0} = \mathbf{v}^T \mathbf{a}_i = \big(\mathbf{v}^T \mathbf{a}_i \big)^T = \mathbf{a}_i^T \mathbf{v}, \]

so \(A^T \mathbf{v} = \mathbf{0}\) and hence \(\mathbf{v} \in N(A^T)\). Which implies that \(R(A)^\perp \subseteq N(A^T)\). Similarly, let \(\mathbf{w} \in N(A^T)\) then \(A^T \mathbf{w} = \mathbf{0}\) which obtains

\[ \mathbf{0} = \mathbf{a}_i^T \mathbf{w} = \big(\mathbf{a}_i^T \mathbf{w} \big)^T = \mathbf{w}^T \mathbf{a}_i, \]

and hence \(\mathbf{w} \in R(A)^\perp\) which implies that \(N(A^T) \subseteq R(A)^\perp\). Therefore \(N(A^T) = R(A)^\perp\).

Known as the fundamental theorem of linear algebra. Which can be used to prove the following theorem.

Theorem 2: if \(S\) is a subspace of the inner product space \(V = \mathbb{R}^n\), then

\[ \dim S + \dim S^\perp = n. \]

Furthermore, if \(\{\mathbf{v}_i\}_{i=1}^r\) is a basis of \(S\) and \(\{\mathbf{v}_i\}_{i=r+1}^n\) is a basis of \(S^\perp\) then \(\{\mathbf{v}_i\}_{i=1}^n\) is a basis of \(V\).

Proof:

If \(S = \{\mathbf{0}\}\), then \(S^\perp = V\) and

\[ \dim S + \dim S^\perp = 0 + n = n. \]

If \(S \neq \{\mathbf{0}\}\), then let \(\{\mathbf{x}_i\}_{i=1}^r\) be a basis of \(S\) and define \(X \in \mathbb{R}^{r \times m}\) whose \(i\)th row is \(\mathbf{x}_i^T\) for each \(i\). Matrix \(X\) has rank \(r\) and \(R(X^T) = S\). Then by theorem 2

\[ S^\perp = R(X^T)^\perp = N(X), \]

from the rank nullity theorem it follows that

\[ \dim S^\perp = \dim N(X) = n - r. \]

and therefore

\[ \dim S + \dim S^\perp = r + n - r = n. \]

Let \(\{\mathbf{v}_i\}_{i=1}^r\) be a basis of \(S\) and \(\{\mathbf{v}_i\}_{i=r+1}^n\) be a basis of \(S^\perp\). Suppose that

\[ c_1 \mathbf{v}_1 + \dots + c_r \mathbf{v}_r + c_{r+1} \mathbf{v}_{r+1} + \dots + c_n \mathbf{v}_n = \mathbf{0}. \]

Let \(\mathbf{u} = c_1 \mathbf{v}_1 + \dots + c_r \mathbf{v}_r\) and let \(\mathbf{w} = c_{r+1} \mathbf{v}_{r+1} + \dots + c_n \mathbf{v}_n\). Then we have

\[ \mathbf{u} + \mathbf{w} = \mathbf{0}, \]

implies \(\mathbf{u} = - \mathbf{w}\) and thus both elements must be in \(S \cap S^\perp\). However, \(S \cap S^\perp = \{\mathbf{0}\}\), therefore

\[ \begin{align*} c_1 \mathbf{v}_1 + \dots + c_r \mathbf{v}_r &= \mathbf{0}, \\ c_{r+1} \mathbf{v}_{r+1} + \dots + c_n \mathbf{v}_n &= \mathbf{0}, \end{align*} \]

since \(\{\mathbf{v}_i\}_{i=1}^r\) and \(\{\mathbf{v}_i\}_{i=r+1}^n\) are linearly independent, we must also have that \(\{\mathbf{v}_i\}_{i=1}^n\) are linearly independent and therefore form a basis of \(V\).

We may further extend this with the notion of a direct sum.

Definition 3: if \(U\) and \(V\) are subspaces of a vector space \(W\) and each \(\mathbf{w} \in W\) can be written uniquely as

\[ \mathbf{w} = \mathbf{u} + \mathbf{v}, \]

with \(\mathbf{u} \in U\) and \(\mathbf{v} \in V\) then \(W\) is a direct sum of U and \(V\) denoted by \(W = U \oplus V\).

In the following theorem it will be posed that the direct sum of a subspace and its orthogonal complement make up the whole vector space, which extends the notion of theorem 2.

Theorem 3: if \(S\) is a subspace of the inner product space \(V = \mathbb{R}^n\), then

\[ V = S \oplus S^\perp. \]

Proof:

Will be added later.

The following results emerge from these posed theorems.

Proposition 1: let \(S\) be a subspace of \(V\), then \((S^\perp)^\perp = S\).

Proof:

Will be added later.

Recall that the system \(A \mathbf{x} = \mathbf{b}\) is consistent if and only if \(\mathbf{b} \in R(A)\) since \(R(A) = N(A^T)^\perp\) we have the following result.

Proposition 2: let \(A \in \mathbb{R}^{m \times n}\) and \(\mathbf{b} \in \mathbb{R}^m\), then either there is a vector \(\mathbf{x} \in \mathbb{R}^n\) such that

\[ A \mathbf{x} = \mathbf{b}, \]

or there is a vector \(\mathbf{y} \in \mathbb{R}^m\) such that

\[ A^T \mathbf{y} = \mathbf{0} \;\land\; \mathbf{y}^T \mathbf{b} \neq 0 . \]

Proof:

Will be added later.

Orthonormal sets

In working with an inner product space \(V\), it is generally desirable to have a basis of mutually orthogonal unit vectors.

Definition 4: the set of vectors \(\{\mathbf{v}_i\}_{i=1}^n\) in an inner product space \(V\) is orthogonal if

\[ \langle \mathbf{v}_i, \mathbf{v}_j \rangle = 0, \]

whenever \(i \neq j\). Then \(\{\mathbf{v}_i\}_{i=1}^n\) is said to be an orthogonal set of vectors.

For example the trivial set \(\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\) is an orthogonal set in \(\mathbb{R}^3\).

Theorem 4: if \(\{\mathbf{v}_i\}_{i=1}^n\) is an orthogonal set of nonzero vectors in an inner product space \(V\), then \(\{\mathbf{v}_i\}_{i=1}^n\) are linearly independent.

Proof:

Suppose that \(\{\mathbf{v}_i\}_{i=1}^n\) is an orthogonal set of nonzero vectors in an inner product space \(V\) and

\[ c_1 \mathbf{v}_1 + \dots + c_n \mathbf{v}_n = \mathbf{0}, \]

then

\[ c_1 \langle \mathbf{v}_j, \mathbf{v}_1 \rangle + \dots + c_n \langle \mathbf{v}_j, \mathbf{v}_n \rangle = 0, \]

for \(j \in \mathbb{N}[j \leq n]\) obtains \(c_j \|\mathbf{v}_j\| = 0\) and hence \(c_j = 0\) for all \(j \in \mathbb{N}[j \leq n]\).

We may even go further and define a set of vectors that are orthogonal and have a length of \(1\), a unit vector by definition.

Definition 5: an orthonormal set of vectors is an orthogonal set of unit vectors.

For example the set \(\{\mathbf{u}_i\}_{i=1}^n\) will be orthonormal if and only if

\[ \langle \mathbf{u}_i, \mathbf{u}_j \rangle = \delta_{ij}, \]

where

\[ \delta_{ij} = \begin{cases} 1 &\text{ for } i = j, \\ 0 &\text{ for } i \neq j.\end{cases} \]

Theorem 5: let \(\{\mathbf{u}_i\}_{i=1}^n\) be an orthonormal basis of an inner product space \(V\). If

\[ \mathbf{v} = \sum_{i=1}^n c_i \mathbf{u}_i, \]

then \(c_i = \langle \mathbf{v}, \mathbf{u}_i \rangle\) for all \(i \in \mathbb{N}[i \leq n]\).

Proof:

Let \(\{\mathbf{u}_i\}_{i=1}^n\) be an orthonormal basis of an inner product space \(V\) and let

\[ \mathbf{v} = \sum_{i=1}^n c_i \mathbf{u}_i, \]

we have

\[ \langle \mathbf{v}, \mathbf{u}_i \rangle = \Big\langle \sum_{j=1}^n c_j \mathbf{u}_j, \mathbf{u}_i \Big\rangle = \sum_{j=1}^n c_j \langle \mathbf{u}_j, \mathbf{u}_i \rangle = \sum_{j=1}^n c_j \delta_{ij} = c_i. \]

Implying that it is much easier to calculate the coordinates of a given vector with respect to an orthonormal basis.

Corollary 1: let \(\{\mathbf{u}_i\}_{i=1}^n\) be an orthonormal basis of an inner product space \(V\). If

\[ \mathbf{v} = \sum_{i=1}^n a_i \mathbf{u}_i, \]

and

\[ \mathbf{w} = \sum_{i=1}^n b_i \mathbf{u}_i, \]

then \(\langle \mathbf{v}, \mathbf{w} \rangle = \sum_{i=1}^n a_i b_i\).

Proof:

Let \(\{\mathbf{u}_i\}_{i=1}^n\) be an orthonormal basis of an inner product space \(V\) and let

\[ \mathbf{v} = \sum_{i=1}^n a_i \mathbf{u}_i, \]

and

\[ \mathbf{w} = \sum_{i=1}^n b_i \mathbf{u}_i, \]

by theorem 5 we have

\[ \langle \mathbf{v}, \mathbf{w} \rangle = \Big\langle \sum_{i=1}^n a_i \mathbf{u}_i, \mathbf{w} \Big\rangle = \sum_{i=1}^n a_i \langle \mathbf{w}, \mathbf{u}_i \rangle = \sum_{i=1}^n a_i b_i. \]

Corollary 2: let \(\{\mathbf{u}_i\}_{i=1}^n\) be an orthonormal basis of an inner product space \(V\) and

\[ \mathbf{v} = \sum_{i=1}^n c_i \mathbf{u}_i, \]

then

\[ \|\mathbf{v}\|^2 = \sum_{i=1}^n c_i^2. \]

Proof:

Let \(\{\mathbf{u}_i\}_{i=1}^n\) be an orthonormal basis of an inner product space \(V\) and let

\[ \mathbf{v} = \sum_{i=1}^n c_i \mathbf{u}_i, \]

then by corollary 1 we have

\[ \|\mathbf{v}\|^2 = \langle \mathbf{v}, \mathbf{v} \rangle = \sum_{i=1}^n c_i \mathbf{u}_i. \]

Orthogonal matrices

Definition 6: an \(n \times n\) matrix \(Q\) is an orthogonal matrix if

\[ Q^T Q = I. \]

Orthogonal matrices have column vectors that form an orthonormal set in \(V\), as may be posed in the following theorem.

Theorem 6: let \(Q = (\mathbf{q}_1, \dots, \mathbf{q}_n)\) be an orthogonal matrix, then \(\{\mathbf{q}_i\}_{i=1}^n\) is an orthonormal set.

Proof:

Let \(Q = (\mathbf{q}_1, \dots, \mathbf{q}_n)\) be an orthogonal matrix. Then

\[ Q^T Q = I, \]

and hence \(\mathbf{q}_i^T \mathbf{q}_j = \delta_{ij}\) such that for an inner product space with a scalar product we have

\[ \langle \mathbf{q}_i, \mathbf{q}_j \rangle = 0, \]

for \(i \neq j\).

It follows then that if \(Q\) is an orthogonal matrix, then \(Q\) is nonsingular and \(Q^{-1} = Q^T\).

In general scalar products are preserved under multiplication by an orthogonal matrix since

\[ \langle Q \mathbf{u}, Q \mathbf{v} \rangle = (Q \mathbf{v})^T Q \mathbf{u} = \mathbf{v}^T Q^T Q \mathbf{u} = \langle \mathbf{u}, \mathbf{v} \rangle. \]

In particular, if \(\mathbf{u} = \mathbf{v}\) then \(\|Q \mathbf{u}\|^2 = \|\mathbf{u}\|^2\) and hence \(\|Q \mathbf{u}\| = \|\mathbf{u}\|\). Multiplication by an orthogonal matrix preserves the lengths of vectors.

Orthogonalization process

Let \(\{\mathbf{a}_i\}_{i=1}^n\) be a basis of an inner product space \(V\). We may use the modified method of Gram-Schmidt to determine the orthonormal basis \(\{\mathbf{q}_i\}_{i=1}^n\) of \(V\).

Let \(\mathbf{q}_1 = \frac{1}{\|\mathbf{a}_1\|} \mathbf{a}_1\) be the first step.

Then we may induce the following step for \(i \in \mathrm{range}(2,n)\):

\[ \begin{align*} \mathbf{w} &= \mathbf{a}_i - \langle \mathbf{a}_i, \mathbf{q}_1 \rangle \mathbf{q}_1 - \dots - \langle \mathbf{a}_i, \mathbf{q}_{i-1} \rangle \mathbf{q}_{i-1}, \\ \mathbf{q}_i &= \frac{1}{\|\mathbf{w}\|} \mathbf{w}. \end{align*} \]

Proof:

Will be added later.

Least squares solutions of overdetermined systems

A standard technique in mathematical and statistical modeling is to find a least squares fit to a set of data points. This implies that the sum of squares fo errors between the model and the data points are minimized. A least squares problem can generally be formulated as an overdetermined linear system of equations.

For a system of equations \(A \mathbf{x} = \mathbf{b}\) with \(A \in \mathbb{R}^{m \times n}\) with \(m, n \in \mathbb{N}[m>n]\) and \(\mathbf{b} \in \mathbb{R}^m\) then for each \(\mathbf{x} \in \mathbb{R}^n\) a residual \(\mathbf{r}: \mathbb{R}^n \to \mathbb{R}^m\) can be formed

\[ \mathbf{r}(\mathbf{x}) = \mathbf{b} - A \mathbf{x}. \]

The distance between \(\mathbf{b}\) and \(A \mathbf{x}\) is given by

\[ \| \mathbf{b} - A \mathbf{x} \| = \|\mathbf{r}(\mathbf{x})\|, \]

We wish to find a vector \(\mathbf{x} \in \mathbb{R}^n\) for which \(\|\mathbf{r}(\mathbf{x})\|\) will be a minimum. A solution \(\mathbf{\hat x}\) that minimizes \(\|\mathbf{r}(\mathbf{x})\|\) is a least squares solution of the system \(A \mathbf{x} = \mathbf{b}\). Do note that minimizing \(\|\mathbf{r}(\mathbf{x})\|\) is equivalent to minimizing \(\|\mathbf{r}(\mathbf{x})\|^2\).

Theorem 7: let \(S\) be a subspace of \(\mathbb{R}^m\). For each \(b \in \mathbb{R}^m\), there exists a unique \(\mathbf{p} \in S\) that suffices

\[ \|\mathbf{b} - \mathbf{s}\| > \|\mathbf{b} - \mathbf{p}\|, \]

for all \(\mathbf{s} \in S\backslash\{\mathbf{p}\}\) and \(\mathbf{b} - \mathbf{p} \in S^\perp\).

Proof:

Will be added later.

If \(\mathbf{p} = A \mathbf{\hat x}\) in \(R(A)\) that is closest to \(\mathbf{b}\) then it follows that

\[ \mathbf{b} - \mathbf{p} = \mathbf{b} - A \mathbf{x} = \mathbf{r}(\mathbf{\hat x}), \]

must be an element of \(R(A)^\perp\). Thus, \(\mathbf{\hat x}\) is a solution to the least squares problem if and only if

\[ \mathbf{r}(\mathbf{\hat x}) \in R(A)^\perp = N(A^T). \]

Thus to solve for \(\mathbf{\hat x}\) we have the normal equations given by

\[ A^T A \mathbf{x} = A^T \mathbf{b}. \]

Uniqueness of \(\mathbf{\hat x}\) can be obtained if \(A^T A\) is nonsingular which will be posed in the following theorem.

Theorem 8: let \(A \in \mathbb{R}^{m \times n}\) be an \(m \times n\) matrix with rank \(n\), then \(A^T A\) is nonsingular.

Proof:

Let \(A \in \mathbb{R}^{m \times n}\) be an \(m \times n\) matrix with rank \(n\). Let \(\mathbf{v}\) be a solution of

\[ A^T A \mathbf{x} = \mathbf{0}, \]

then \(A \mathbf{v} \in N(A^T)\), but we also have that \(A \mathbf{v} \in R(A) = N(A^T)^\perp\). Since \(N(A^T) \cap N(A^T)^\perp = \{\mathbf{0}\}\) it follows that

\[ A\mathbf{v} = \mathbf{0}, \]

so \(\mathbf{v} = \mathbf{0}\) by the nonsingularity of \(A\).

It follows that

\[ \mathbf{\hat x} = (A^T A)^{-1} A^T \mathbf{b}, \]

is the unique solution of the normal equations for \(A\) nonsingular and consequently, the unique least squares solution of the system \(A \mathbf{x} = \mathbf{b}\).