Differentiation

Generalization of derivatives to higher dimensions:

limit of difference quotient: partial derivatives,
linearization: total derivative.

Partial derivatives

Definition: let \(D \subseteq \mathbb{R}^n\) (\(n=2\) for simplicity), let \(f: D \to \mathbb{R}\) and \(\mathbf{a} \in D\). The partial derivatives of \(f\) at \(\mathbf{a}\) are the following limits, provided they exist:

\[ \begin{align*} &\partial_1 f(\mathbf{a}) := \lim_{h \to 0} \frac{f(a_1 + h, a_2) - f(\mathbf{a})}{h}, \\ &\partial_2 f(\mathbf{a}) := \lim_{h \to 0} \frac{f(a_1, a_2 + h) - f(\mathbf{a})}{h}. \end{align*} \]
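The difference quotients in the definition can be evaluated numerically with a small fixed step \(h\). The sketch below is a minimal illustration, not a robust differentiation routine; the example function \(f(x, y) = x^2 y + \sin y\), the helper name `partial` and the step size are assumptions chosen for illustration. Python indices start at 0, so `i = 0` corresponds to \(\partial_1\) and `i = 1` to \(\partial_2\).

```python
import math


def partial(f, a, i, h=1e-6):
    """Forward difference quotient (f(a + h e_i) - f(a)) / h approximating
    the partial derivative of f at the point a with respect to coordinate i."""
    shifted = list(a)
    shifted[i] += h
    return (f(*shifted) - f(*a)) / h


def f(x, y):
    # hypothetical example function f(x, y) = x^2 y + sin(y)
    return x**2 * y + math.sin(y)


a = (1.0, 2.0)
d1 = partial(f, a, 0)  # close to 2 x y = 4
d2 = partial(f, a, 1)  # close to x^2 + cos(y) = 1 + cos(2)
print(d1, d2)
```

A smaller \(h\) reduces the truncation error of the quotient but amplifies floating-point cancellation, so in practice \(h\) cannot be sent to zero the way it is in the limit.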

Theorem: suppose that two mixed \(n\)th order partial derivatives of a function \(f\) involve the same differentiations but in different orders. If those partials are continuous at a point \(\mathbf{a}\) and if \(f\) and all partials of \(f\) of order less than \(n\) are continuous in a neighbourhood of \(\mathbf{a}\), then the two mixed partials are equal at \(\mathbf{a}\). For \(n=2\) this gives

\[ \partial_{12} f(\mathbf{a}) = \partial_{21} f(\mathbf{a}). \]

Proof:

Will be added later.
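The theorem can be probed numerically by differentiating in the two orders with nested central differences. The sketch below, assuming a hypothetical helper `mixed` and illustrative step sizes, compares both orders for the smooth polynomial \(x^3 y^2 + x y\), where they agree, and for the standard counterexample \(f(x, y) = \frac{xy(x^2 - y^2)}{x^2 + y^2}\) with \(f(0, 0) = 0\). For the latter the second-order mixed partials exist at the origin but are not continuous there, so the theorem does not apply, and the two orders give \(-1\) and \(1\).

```python
def mixed(f, a, first, second, h=1e-3, k=1e-6):
    """Approximate d/dx_second (d/dx_first f) at a with nested central differences.

    The inner step k is much smaller than the outer step h, so the inner quotient
    approximates the first partial derivative at the two outer sample points."""
    def d_first(p):
        plus, minus = list(p), list(p)
        plus[first] += k
        minus[first] -= k
        return (f(*plus) - f(*minus)) / (2 * k)

    plus, minus = list(a), list(a)
    plus[second] += h
    minus[second] -= h
    return (d_first(plus) - d_first(minus)) / (2 * h)


def smooth(x, y):
    # hypothetical smooth example; both mixed partials equal 6 x^2 y + 1
    return x**3 * y**2 + x * y


def counterexample(x, y):
    # mixed partials exist at the origin but are not continuous there
    if x == 0 and y == 0:
        return 0.0
    return x * y * (x**2 - y**2) / (x**2 + y**2)


s_xy = mixed(smooth, (1.0, 2.0), 0, 1)          # x first, then y: close to 13
s_yx = mixed(smooth, (1.0, 2.0), 1, 0)          # y first, then x: close to 13
c_xy = mixed(counterexample, (0.0, 0.0), 0, 1)  # close to -1
c_yx = mixed(counterexample, (0.0, 0.0), 1, 0)  # close to 1
print(s_xy, s_yx, c_xy, c_yx)
```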

Total derivatives

Definition: let \(D \subseteq \mathbb{R}^n\) (\(n=2\) for simplicity), let \(f: D \to \mathbb{R}\) and \(\mathbf{a} \in D\). We seek an affine linear approximation of \(f\) around \(\mathbf{a}\) of the form

\[ p(\mathbf{x}) = f(\mathbf{a}) + \big\langle L,\; \mathbf{x} - \mathbf{a} \big\rangle, \]

and, writing \(f(\mathbf{x}) = p(\mathbf{x}) + r(\mathbf{x})\), demand that the remainder satisfies \(\frac{r(\mathbf{x})}{\|\mathbf{x} - \mathbf{a}\|} \to 0\) as \(\mathbf{x} \to \mathbf{a}\).

If some \(L \in \mathbb{R}^2\) satisfies this, then \(f\) is called totally differentiable at \(\mathbf{a}\).

Theorem: if \(f\) is totally differentiable at \(\mathbf{a}\), then \(f\) is partially differentiable at \(\mathbf{a}\) and the partial derivatives are

\[ \partial_1 f(\mathbf{a}) = L_1, \qquad \partial_2 f(\mathbf{a}) = L_2, \]

so the affine approximation becomes

\[ p(\mathbf{x}) = f(\mathbf{a}) + \big\langle \nabla f(\mathbf{a}),\; \mathbf{x} - \mathbf{a} \big\rangle, \]

with \(\nabla f(\mathbf{a})\) the gradient of \(f\) at \(\mathbf{a}\).

Proof:

Will be added later.
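The linearization and the remainder condition can be checked numerically. The sketch below assumes the example \(f(x, y) = e^x \cos y\) at \(\mathbf{a} = (0, 0)\), with the gradient computed by hand, and evaluates \(|r(\mathbf{x})| / \|\mathbf{x} - \mathbf{a}\|\) along one straight line into \(\mathbf{a}\). Following a single line only illustrates the definition; it does not prove total differentiability, which requires the ratio to vanish however \(\mathbf{x}\) approaches \(\mathbf{a}\).

```python
import math


def f(x, y):
    # hypothetical example function f(x, y) = exp(x) cos(y)
    return math.exp(x) * math.cos(y)


def grad_f(x, y):
    # gradient computed by hand: (exp(x) cos(y), -exp(x) sin(y))
    return (math.exp(x) * math.cos(y), -math.exp(x) * math.sin(y))


def p(x, y, a):
    """Affine approximation p(x) = f(a) + <grad f(a), x - a>."""
    g = grad_f(*a)
    return f(*a) + g[0] * (x - a[0]) + g[1] * (y - a[1])


a = (0.0, 0.0)
v = (0.6, 0.8)  # unit vector: x = a + t v, so ||x - a|| = t
ratios = []
for t in (1e-1, 1e-2, 1e-3):
    x, y = a[0] + t * v[0], a[1] + t * v[1]
    r = f(x, y) - p(x, y, a)   # remainder r(x) = f(x) - p(x)
    ratios.append(abs(r) / t)  # |r(x)| / ||x - a||
print(ratios)  # the ratio shrinks roughly in proportion to t
```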

Chain rule

Theorem: let \(D \subseteq \mathbb{R}^n\) (\(n=2\) for simplicity), let \(f: D \to \mathbb{R}\), let \(\mathbf{x}: I \to D\) be a differentiable curve on an interval \(I \subseteq \mathbb{R}\), and let \(g: I \to \mathbb{R}\) be given by

\[ g(t) = f\big(\mathbf{x}(t)\big). \]

If \(f\) is continuously differentiable, then \(g\) is differentiable with

\[ g'(t) = \big\langle \nabla f\big(\mathbf{x}(t)\big),\; \mathbf{\dot x}(t) \big\rangle. \]
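A quick consistency check of the chain rule compares \(\langle \nabla f(\mathbf{x}(t)), \dot{\mathbf{x}}(t) \rangle\) with a central difference quotient of \(g\). The function \(f(x, y) = x^2 + 3xy\), the curve \(\mathbf{x}(t) = (\cos t, \sin t)\) and the point \(t_0 = 0.7\) are illustrative assumptions; for this choice \(g(t) = \cos^2 t + \tfrac{3}{2}\sin 2t\), so \(g'(t) = -\sin 2t + 3\cos 2t\) in closed form.

```python
import math


def f(x, y):
    # hypothetical example function f(x, y) = x^2 + 3 x y
    return x**2 + 3 * x * y


def grad_f(x, y):
    return (2 * x + 3 * y, 3 * x)


def curve(t):
    # hypothetical curve x(t) = (cos t, sin t)
    return (math.cos(t), math.sin(t))


def curve_dot(t):
    return (-math.sin(t), math.cos(t))


def g(t):
    return f(*curve(t))


t0 = 0.7
gr = grad_f(*curve(t0))
v = curve_dot(t0)
chain = gr[0] * v[0] + gr[1] * v[1]          # <grad f(x(t0)), x'(t0)>
h = 1e-6
numeric = (g(t0 + h) - g(t0 - h)) / (2 * h)  # central difference of g at t0
exact = -math.sin(2 * t0) + 3 * math.cos(2 * t0)
print(chain, numeric, exact)
```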

Gradients

Definition: at any point \(\mathbf{x} \in D\) where the first partial derivatives of \(f\) exist, we define the gradient vector \(\nabla f(\mathbf{x})\) by

\[ \nabla f(\mathbf{x}) = \begin{pmatrix} \partial_1 f(\mathbf{x}) \\ \partial_2 f(\mathbf{x}) \end{pmatrix}. \]

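If \(f\) is totally differentiable at \(\mathbf{x}\) and \(\nabla f(\mathbf{x}) \neq \mathbf{0}\), the gradient points in the direction of steepest increase of \(f\) at \(\mathbf{x}\), and \(\|\nabla f(\mathbf{x})\|\) is the rate of that increase.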
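The claim follows from the Cauchy-Schwarz inequality: for a unit vector \(\mathbf{v}\), \(\langle \nabla f(\mathbf{x}), \mathbf{v} \rangle \le \|\nabla f(\mathbf{x})\|\), with equality exactly when \(\mathbf{v} = \nabla f(\mathbf{x}) / \|\nabla f(\mathbf{x})\|\). The sketch below checks this by brute force, scanning unit vectors \(\mathbf{v} = (\cos\varphi, \sin\varphi)\) and using the rate of change \(\langle \nabla f(\mathbf{x}), \mathbf{v} \rangle\) of a differentiable \(f\) in direction \(\mathbf{v}\); the function \(f(x, y) = x^2 + 2y^2\), the point \((1, 1)\) and the 0.1 degree grid are assumptions for illustration.

```python
import math


def grad_f(x, y):
    # gradient of the hypothetical example f(x, y) = x^2 + 2 y^2, computed by hand
    return (2 * x, 4 * y)


point = (1.0, 1.0)
gx, gy = grad_f(*point)

steps = 3600  # one direction every 0.1 degree
best_phi, best_rate = 0.0, -math.inf
for k in range(steps):
    phi = 2 * math.pi * k / steps
    rate = gx * math.cos(phi) + gy * math.sin(phi)  # <grad f, v> for v = (cos phi, sin phi)
    if rate > best_rate:
        best_phi, best_rate = phi, rate

print(best_phi, math.atan2(gy, gx))   # angle of the best direction vs angle of the gradient
print(best_rate, math.hypot(gx, gy))  # maximal rate vs ||grad f||
```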

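Theorem: if \(f\) is continuously differentiable, the gradient \(\nabla f(\mathbf{a})\) is orthogonal to the level curve (for \(n = 2\)) or level surface (for \(n = 3\)) of \(f\) through \(\mathbf{a}\).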

Proof:

Let \(\mathbf{r}(t) = \big(x(t),\; y(t) \big)^T\) be a differentiable parameterization of the level curve of \(f\) through \(\mathbf{a}\), with \(\mathbf{r}(0) = \mathbf{a}\). Then \(f(\mathbf{r}(t)) = f(\mathbf{a})\) for all \(t\) near \(0\). Differentiating this equation with respect to \(t\) using the chain rule gives

\[ \partial_1 f\big(\mathbf{r}(t)\big)\, \dot x(t) + \partial_2 f\big(\mathbf{r}(t)\big)\, \dot y(t) = 0, \]

which at \(t = 0\) can be written as

\[ \big\langle \nabla f(\mathbf{a}),\; \mathbf{\dot r}(0) \big\rangle = 0, \]

so \(\nabla f(\mathbf{a})\) is orthogonal to the tangent vector \(\mathbf{\dot r}(0)\) of the level curve at \(\mathbf{a}\).
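As a concrete check of the argument, the sketch below takes the assumed example \(f(x, y) = x^2 + 4y^2\), whose level curve \(f = 4\) is the ellipse \(\mathbf{r}(t) = (2\cos t, \sin t)^T\), and evaluates \(f(\mathbf{r}(t))\) and \(\langle \nabla f(\mathbf{r}(t)), \dot{\mathbf{r}}(t) \rangle\) at a few parameter values. By hand, \(\langle (4\cos t, 8\sin t), (-2\sin t, \cos t) \rangle = -8\sin t\cos t + 8\sin t\cos t = 0\).

```python
import math


def f(x, y):
    # hypothetical example f(x, y) = x^2 + 4 y^2
    return x**2 + 4 * y**2


def grad_f(x, y):
    return (2 * x, 8 * y)


def r(t):
    # parameterization of the level curve f = 4 (an ellipse)
    return (2 * math.cos(t), math.sin(t))


def r_dot(t):
    return (-2 * math.sin(t), math.cos(t))


checks = []
for t in (0.0, 0.5, 1.0, 2.0, 3.0):
    x, y = r(t)
    gx, gy = grad_f(x, y)
    vx, vy = r_dot(t)
    checks.append((f(x, y), gx * vx + gy * vy))  # (level value, <grad f, r'>)

for level, inner in checks:
    print(level, inner)  # level stays 4, inner product is 0 up to rounding
```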

Directional derivatives

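Definition: let \(D \subseteq \mathbb{R}^n\), let \(f: D \to \mathbb{R}\) and let \(\mathbf{v} \in \mathbb{R}^n\) be a unit vector, \(\|\mathbf{v}\| = 1\). The directional derivative of \(f\) at \(\mathbf{a} \in D\) in the direction \(\mathbf{v}\) is the rate of change \(D_\mathbf{v} f(\mathbf{a}) := \lim_{h \to 0} \frac{f(\mathbf{a} + h\mathbf{v}) - f(\mathbf{a})}{h}\). If \(f\) is totally differentiable at \(\mathbf{a}\), it is given by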

\[ D_\mathbf{v} f(\mathbf{a}) = \big\langle \mathbf{v},\; \nabla f(\mathbf{a}) \big\rangle. \]
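The inner-product formula can be compared with the difference quotient \(\big(f(\mathbf{a} + h\mathbf{v}) - f(\mathbf{a})\big)/h\) for a small \(h\). In the sketch below, the function \(f(x, y) = x e^y\), the point \(\mathbf{a} = (2, 0)\) and the unit vector \(\mathbf{v} = (3/5, 4/5)\) are assumptions; here \(\nabla f(\mathbf{a}) = (1, 2)\), so \(D_\mathbf{v} f(\mathbf{a}) = 3/5 + 8/5 = 11/5\).

```python
import math


def f(x, y):
    # hypothetical example f(x, y) = x exp(y)
    return x * math.exp(y)


def grad_f(x, y):
    return (math.exp(y), x * math.exp(y))


def directional_quotient(f, a, v, h=1e-6):
    """Difference quotient (f(a + h v) - f(a)) / h for a small fixed h."""
    return (f(a[0] + h * v[0], a[1] + h * v[1]) - f(*a)) / h


a = (2.0, 0.0)
v = (3 / 5, 4 / 5)  # unit vector
gx, gy = grad_f(*a)
inner = v[0] * gx + v[1] * gy  # <v, grad f(a)>
quotient = directional_quotient(f, a, v)
print(inner, quotient)         # both close to 11/5 = 2.2
```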

The general case

Definition: let \(D \subseteq \mathbb{R}^n\) and let \(\mathbf{f}: D \to \mathbb{R}^m\) with components \(f_i: D \to \mathbb{R}\), \(i = 1, \dotsc, m\). Then

\(\mathbf{f}\) is continuous at \(\mathbf{a} \in D\) \(\iff\) all \(f_i\) are continuous at \(\mathbf{a}\),
\(\mathbf{f}\) is partially/totally differentiable at \(\mathbf{a}\) \(\iff\) all \(f_i\) are partially/totally differentiable at \(\mathbf{a}\).

Linearizing each component \(f_i\) at \(\mathbf{a}\) gives

\[ f_i(\mathbf{x}) = f_i(\mathbf{a}) + \big\langle \nabla f_i(\mathbf{a}),\; \mathbf{x} - \mathbf{a} \big\rangle + r_i(\mathbf{x}), \]

so, stacking the \(m\) components, we have

\[ \mathbf{f}(\mathbf{x}) = \mathbf{f}(\mathbf{a}) + D\mathbf{f}(\mathbf{a}) \big(\mathbf{x} - \mathbf{a}\big) + \mathbf{r}(\mathbf{x}), \]

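with \(D\mathbf{f}(\mathbf{a}) \in \mathbb{R}^{m \times n}\) the Jacobian of \(\mathbf{f}\) and a remainder satisfying \(\frac{\|\mathbf{r}(\mathbf{x})\|}{\|\mathbf{x} - \mathbf{a}\|} \to 0\) as \(\mathbf{x} \to \mathbf{a}\).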

Definition: the Jacobian is given by \(\big[D\mathbf{f}(\mathbf{a}) \big]_{i,\;j} = \partial_j f_i(\mathbf{a}).\)
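Column \(j\) of the Jacobian collects the partial derivatives \(\partial_j f_i\) of all components, so it can be approximated by perturbing one input coordinate at a time. The sketch below uses central differences; the helper `jacobian` and the example \(\mathbf{f}(x, y) = (x^2 y,\ \sin x + y,\ x y^3)\), with \(m = 3\) and \(n = 2\), are assumptions for illustration.

```python
import math


def jacobian(F, a, h=1e-6):
    """Approximate the m x n Jacobian of F: R^n -> R^m at the point a.

    Entry [i][j] is the central difference quotient for d f_i / d x_j."""
    n = len(a)
    m = len(F(a))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus, minus = list(a), list(a)
        plus[j] += h
        minus[j] -= h
        F_plus, F_minus = F(plus), F(minus)
        for i in range(m):
            J[i][j] = (F_plus[i] - F_minus[i]) / (2 * h)
    return J


def F(p):
    # hypothetical example F(x, y) = (x^2 y, sin(x) + y, x y^3)
    x, y = p
    return [x**2 * y, math.sin(x) + y, x * y**3]


J = jacobian(F, [1.0, 2.0])
exact = [[4.0, 1.0], [math.cos(1.0), 1.0], [8.0, 12.0]]  # computed by hand at (1, 2)
for row in J:
    print(row)
```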

Chain rule

Theorem: let \(D \subseteq \mathbb{R}^n\) and \(E \subseteq \mathbb{R}^m\), let \(\mathbf{f}: D \to \mathbb{R}^m\) with \(\mathbf{f}(D) \subseteq E\), and let \(\mathbf{g}: E \to \mathbb{R}^k\). Suppose \(\mathbf{f}\) is totally differentiable at \(\mathbf{x} \in D\) and \(\mathbf{g}\) is totally differentiable at \(\mathbf{f}(\mathbf{x})\), so that \(D\mathbf{f}(\mathbf{x}) \in \mathbb{R}^{m \times n}\) and \(D\mathbf{g}\big(\mathbf{f}(\mathbf{x})\big) \in \mathbb{R}^{k \times m}\).

Then \(\mathbf{g} \circ \mathbf{f}\) is totally differentiable at \(\mathbf{x}\) with

\[ D(\mathbf{g} \circ \mathbf{f})(\mathbf{x}) = D\mathbf{g}\big(\mathbf{f}(\mathbf{x})\big) D\mathbf{f}(\mathbf{x}). \]

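This admits two interpretations:

the derivative of the composition is the composition of the linear maps \(D\mathbf{g}\big(\mathbf{f}(\mathbf{x})\big)\) and \(D\mathbf{f}(\mathbf{x})\),
in coordinates, the Jacobian of \(\mathbf{g} \circ \mathbf{f}\) is the matrix product of the Jacobians of \(\mathbf{g}\) and \(\mathbf{f}\).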

Proof:

Will be added later.
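The matrix form of the chain rule can be checked by multiplying hand-computed Jacobians and comparing the product with a finite-difference Jacobian of the composition. The maps \(\mathbf{f}(x, y) = (xy,\ x + y)\) and \(\mathbf{g}(u, v) = (u^2,\ uv,\ \sin v)\), so \(n = m = 2\) and \(k = 3\), as well as the helper names `matmul` and `numeric_jacobian`, are assumptions for this sketch.

```python
import math


def matmul(A, B):
    """Plain-Python matrix product of A (k x m) and B (m x n)."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def f(p):
    # hypothetical f: R^2 -> R^2, f(x, y) = (x y, x + y)
    x, y = p
    return [x * y, x + y]


def Df(p):
    x, y = p
    return [[y, x],
            [1.0, 1.0]]


def g(q):
    # hypothetical g: R^2 -> R^3, g(u, v) = (u^2, u v, sin v)
    u, v = q
    return [u**2, u * v, math.sin(v)]


def Dg(q):
    u, v = q
    return [[2 * u, 0.0],
            [v, u],
            [0.0, math.cos(v)]]


def numeric_jacobian(F, a, h=1e-6):
    """Central-difference Jacobian; column j holds the partials with respect to x_j."""
    columns = []
    for j in range(len(a)):
        plus, minus = list(a), list(a)
        plus[j] += h
        minus[j] -= h
        columns.append([(fp - fm) / (2 * h) for fp, fm in zip(F(plus), F(minus))])
    return [list(row) for row in zip(*columns)]  # transpose columns into rows


x = [1.5, -0.5]
chain = matmul(Dg(f(x)), Df(x))                  # Dg(f(x)) Df(x), a 3 x 2 matrix
direct = numeric_jacobian(lambda p: g(f(p)), x)  # D(g o f)(x) by finite differences
for row_c, row_d in zip(chain, direct):
    print(row_c, row_d)
```

Multiplying the two small Jacobians is the coordinate version of composing the linear maps; the finite-difference side never uses the chain rule, which is what makes the comparison a check rather than a restatement.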