Variational calculus


A function space, \mathfrak F, is a topological vector space whose elements are functions with a common domain. We assume that the functions in the space are differentiable to any order as needed.

A functional is a function \mathcal F: \mathfrak F \to \mathbb R.

It should be noted that a functional eats a function and returns a real value, rather than eating a value of a function. So, if \mathcal F is a functional and f\in \mathfrak F a function, we write \mathcal F[f], which is a real number. It does not matter what the value of f is at any particular x; a functional sees the whole function f:D \to C, i.e. the rule.

Remark: A functional is not a composition of functions like h(x):=f\circ g(x), because f acts on the value of g(x), not on the rule g.

Example: The integral F[f]=\int_x f(x)p(x)\mathrm d x is a functional; so is the sum F[f]=\sum_i f(x_i)p(x_i). Note that f can be a constant function, f=c\in \mathbb R, or the identity function f(x)=x. To emphasize that a functional eats a function, we can also write F[\cdot]=\int_x (\cdot)p(x)\mathrm d x.
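As a minimal sketch of this idea, a functional can be modeled as a higher-order function: it receives the rule f itself, not a value f(x). The quadrature below approximates F[f]=\int_0^1 f(x)p(x)\mathrm dx with a Riemann sum; the name `functional_F`, the default weight p=1, and the grid size are illustrative choices.

```python
# A functional eats the rule f and returns a real number.
def functional_F(f, p=lambda x: 1.0, n=1000):
    """Approximate F[f] = integral of f(x) p(x) over [0, 1] (Riemann sum)."""
    h = 1.0 / n
    return sum(f(i * h) * p(i * h) * h for i in range(n))

print(functional_F(lambda x: 1.0))   # constant function f = 1   -> about 1
print(functional_F(lambda x: x))     # identity function f(x)=x  -> about 0.5
```

Note that `functional_F` is called with a function object, never with a number, mirroring the notation \mathcal F[f].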

Adding two functions in a function vector space can be interpreted in two ways. Let f,g \in \mathfrak F and \varepsilon\in \mathbb R. Then f + \varepsilon g can be read as 1) perturbing f with \varepsilon g, or 2) h:=f + \varepsilon g is the function reached by moving from f along the direction of g in the function space. Since the functions are members of a vector space, f + \varepsilon g = f + \varepsilon (g/\|g\|)\|g\| = f + \varepsilon \|g\| \hat g, where the norm is induced by an inner product on the function space. The term \hat g is the unit vector of g, and its direction relative to \hat f = f/\|f\| can be measured with the space's inner product.
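The decomposition f + \varepsilon g = f + \varepsilon\|g\|\hat g can be checked numerically. The sketch below assumes the L^2([0,1]) inner product \langle f,g\rangle=\int_0^1 fg\,\mathrm dx, approximated by the trapezoid rule; the choices f=\sin, g=\cos are illustrative.

```python
import math

n = 2000
h = 1.0 / n
xs = [i * h for i in range(n + 1)]

def inner(f, g):                      # <f, g> = integral of f g on [0, 1]
    vals = [f(x) * g(x) for x in xs]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def norm(f):                          # norm induced by the inner product
    return math.sqrt(inner(f, f))

f, g, eps = math.sin, math.cos, 0.1

g_hat = lambda x, c=norm(g): g(x) / c           # unit vector of g
h1 = lambda x: f(x) + eps * g(x)                # f + eps g
h2 = lambda x: f(x) + eps * norm(g) * g_hat(x)  # f + eps ||g|| g_hat
print(norm(lambda x: h1(x) - h2(x)))            # ~0: the same function
```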

Variation of a function

Definition: For functions f, \eta \in \mathfrak F and \varepsilon \in \mathbb R, the term \delta f(x):=\varepsilon \eta (x) is called a variation of f, for an arbitrary function \eta(x).

Let u:D\subset \mathbb R \to \mathbb R and F:\mathbb R \to \mathbb R be two (fixed) functions. Then, the variation of the composition F(u) is defined as,

    \[\delta F := F(u(x)+\delta u(x)) - F(u(x)) \quad \forall x \in D\]

where \delta u = \varepsilon \eta. Note that \delta F is a function of x and \varepsilon on the domain D of u.

Observing that \delta F is a function of \varepsilon for a fixed \eta, the variation of F can be linearized for small variations of u. Expanding about \varepsilon_0 =0 for \varepsilon close to zero, the Taylor series gives,

    \[\delta F := F(u(x)+\varepsilon \eta(x)) - F(u(x)) = F(u) + \frac{\mathrm d F(u+\varepsilon \eta)}{\mathrm d \varepsilon}\bigg|_{\varepsilon =\varepsilon_0=0}\varepsilon + \mathcal O(\varepsilon^2) - F(u(x)) \approx  \frac{\mathrm d F(u+\varepsilon \eta)}{\mathrm d \varepsilon}\bigg|_{\varepsilon =\varepsilon_0=0}\varepsilon \]

Letting y(\varepsilon):=u+\varepsilon \eta leads to,

    \[\delta F =\frac{\mathrm d F(y)}{\mathrm d y}\bigg|_{y(0)}\frac{\mathrm dy}{\mathrm d \varepsilon}\bigg|_{\varepsilon=0}\varepsilon=\frac{\mathrm d F(y)}{\mathrm d y}\bigg|_{y(0)}\eta\varepsilon=\frac{\mathrm d F(y)}{\mathrm d y}\bigg|_{u(x)}\delta u = \frac{\mathrm d F(u)}{\mathrm d u}\delta u\quad \forall x \in D \qquad (1)\]

which is called the linearized or the first variation of the function F(u) due to variation in its argument u.
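Eq. (1) can be sanity-checked numerically: the exact difference F(u+\varepsilon\eta)-F(u) should agree with (\mathrm dF/\mathrm du)\,\delta u up to \mathcal O(\varepsilon^2). The choices F(u)=u^2, u=\sin, \eta=\cos, and the sample point are illustrative.

```python
import math

def F(u):  return u * u        # F(u) = u^2
def dF(u): return 2.0 * u      # dF/du

x = 0.7
u, eta = math.sin(x), math.cos(x)   # u(x), eta(x) at a fixed x
eps = 1e-6

delta_F_exact = F(u + eps * eta) - F(u)   # delta F by definition
delta_F_lin = dF(u) * eps * eta           # Eq. (1): (dF/du) delta u
print(abs(delta_F_exact - delta_F_lin))   # O(eps^2), tiny
```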

Similarly if F:\mathbb R^n \to \mathbb R with F=F(u_1, u_2,\cdots, u_n) and u_i:\mathbb R\to \mathbb R, then,

    \[\delta F =\sum_{i=1}^n \frac{\partial F}{\partial u_i}\delta u_i \quad \forall x\in \mathbb R\]

Lemma 1: If F,G:\mathbb R\to \mathbb R and u:D\subset \mathbb R \to \mathbb R, then \delta (FG)=F\delta G + G\delta F. Proof is as follows.

    \[\delta (FG)=\frac{\mathrm d (FG)}{\mathrm d u}\delta u =F\frac{\mathrm dG}{\mathrm d u}\delta u + G\frac{\mathrm dF}{\mathrm d u}\delta u = F\delta G + G\delta F\]
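Lemma 1 can be verified at a sample point: F\,\delta G + G\,\delta F should match the defining difference \delta(FG)=F(u+\delta u)G(u+\delta u)-F(u)G(u) up to \mathcal O(\varepsilon^2). The choices F(u)=u^2, G(u)=\sin u, and the sampled values are illustrative.

```python
import math

F, dF = (lambda u: u * u), (lambda u: 2.0 * u)   # F and dF/du
G, dG = math.sin, math.cos                       # G and dG/du

u0, eta0, eps = 0.8, 1.3, 1e-6                   # u(x), eta(x) at a fixed x
du = eps * eta0                                  # delta u = eps * eta

lhs = F(u0) * dG(u0) * du + G(u0) * dF(u0) * du  # F dG + G dF
rhs = F(u0 + du) * G(u0 + du) - F(u0) * G(u0)    # delta(FG) by definition
print(abs(lhs - rhs))                            # agree to O(eps^2)
```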

Lemma 2: If u:D\to \mathbb R, then \delta \frac{\mathrm d u}{\mathrm dx}=\frac{\mathrm d }{\mathrm dx} \delta u, because \text {LHS}=\frac{\mathrm d }{\mathrm d \varepsilon}\frac{\mathrm d (u+\varepsilon \eta)}{\mathrm d x}\bigg|_{\varepsilon=0}\varepsilon= \frac{\mathrm d \eta(x)}{\mathrm d x}\varepsilon=\frac{\mathrm d }{\mathrm d x}(\varepsilon \eta)=\frac{\mathrm d }{\mathrm d x}\delta u=\text{RHS}
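That \delta and \mathrm d/\mathrm dx commute can also be seen numerically with a finite-difference derivative; the choices u=\sin, \eta=\cos, and the sample point are illustrative.

```python
import math

u, eta = math.sin, math.cos      # u(x) and test function eta(x)
eps, h, x0 = 1e-4, 1e-5, 0.3

def ddx(f, x):                   # central finite difference for d/dx
    return (f(x + h) - f(x - h)) / (2 * h)

# LHS: delta(du/dx) = d(u + eps*eta)/dx - du/dx
lhs = ddx(lambda x: u(x) + eps * eta(x), x0) - ddx(u, x0)
# RHS: d(delta u)/dx with delta u = eps*eta
rhs = ddx(lambda x: eps * eta(x), x0)
print(abs(lhs - rhs))            # agree to rounding error
```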

Variation of a functional

The variation of a functional should capture its instantaneous change when there is an infinitesimal change in its argument, i.e. a function.

Definition: For a functional \mathcal F, the term \delta \mathcal F := \mathcal F[f + \delta f] - \mathcal F[f] is called the variation of \mathcal F in the direction of the variation of f.

To evaluate \delta \mathcal F, we observe that \mathcal F[f+\varepsilon \eta] is a function of \varepsilon for a fixed f and \eta, i.e. \mathcal F_{f,\eta}(\varepsilon):=\mathcal F[f+\varepsilon \eta] for fixed f and \eta is now a real-valued function on \mathbb R.

It should be noted that \mathcal F[f(x)+\varepsilon \eta(x)] is a function of \varepsilon when f and \eta are fixed. Therefore, the rule \mathcal F_{f,\eta} is no longer a functional; it is a function. The change in its argument, f+\varepsilon \eta, is caused by the change in \varepsilon. Hence, we can write the following limit (if it exists).

    \[\lim_{h \to 0} \frac{\mathcal F[f +(\varepsilon_0 + h) \eta]-\mathcal F[f +\varepsilon_0 \eta]}{h}=\frac{\mathrm d \mathcal F[f+\varepsilon \eta]}{\mathrm d \varepsilon}\bigg|_{\varepsilon=\varepsilon_0}\]

With that limit defined, the Taylor expansion of the function \mathcal F_{f,\eta}(\varepsilon) about zero is,

    \[\mathcal F_{f,\eta}(\varepsilon) = \mathcal F_{f,\eta}(\varepsilon)\bigg|_{\varepsilon = 0} + \frac{\mathrm d \mathcal F_{f,\eta}(\varepsilon)}{\mathrm d \varepsilon}\bigg|_{\varepsilon =0} \varepsilon +\mathcal O(\varepsilon^2)\]

or, keeping in mind that f and \eta are fixed and not functions of \varepsilon, the above can be written as,

    \[\mathcal F[f + \varepsilon \eta] = \mathcal F[f + \varepsilon \eta]\bigg|_{\varepsilon = 0} + \frac{\mathrm d\mathcal F[f+\varepsilon \eta]}{\mathrm d \varepsilon}\bigg|_{\varepsilon =0} \varepsilon +\mathcal O(\varepsilon^2)\]

With this regard, the variation of the functional for small \varepsilon becomes,

    \[\delta \mathcal F= \frac{\mathrm d\mathcal F[f+\varepsilon \eta]}{\mathrm d \varepsilon}\bigg|_{\varepsilon =0} \varepsilon \qquad (2)\]

which is called the first variation of the functional, or the Gâteaux derivative of the functional. The function \eta is referred to as a test function.
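The Gâteaux derivative can be computed numerically by differencing in \varepsilon. The sketch below uses the illustrative functional \mathcal F[f]=\int_0^1 f(x)^2\mathrm dx, for which \mathrm d\mathcal F[f+\varepsilon\eta]/\mathrm d\varepsilon|_0=\int 2f\eta\,\mathrm dx in closed form; f=\sin and \eta=\cos are arbitrary choices.

```python
import math

n = 2000
h = 1.0 / n
xs = [i * h for i in range(n + 1)]

def trap(vals):                       # trapezoid rule on [0, 1]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def F(f):                             # F[f] = integral of f(x)^2
    return trap([f(x) ** 2 for x in xs])

f, eta = math.sin, math.cos

def gateaux(eps=1e-6):                # central difference in eps
    return (F(lambda x: f(x) + eps * eta(x))
            - F(lambda x: f(x) - eps * eta(x))) / (2 * eps)

analytic = trap([2 * f(x) * eta(x) for x in xs])   # integral of 2 f eta
print(abs(gateaux() - analytic))                   # small
```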

The formula for the variation of a functional cannot be extended with partial derivatives as in Eq. 1, because \mathrm d\mathcal F/\mathrm du for u:=f+\varepsilon \eta is not defined in the function space. However, for a functional in the form of an integral (or sum) operator, the variation passes to the function inside the operator. In other words, the variation operator moves inside the integral.

Example: Let \mathcal F[y] =\int_I \left[\left(\frac{\mathrm dy}{\mathrm dx}\right)^2 -wy\right] \mathrm dx, where I=[a,b] and y,w:\mathbb R \to \mathbb R, be a functional. Then the variation of \mathcal F is as follows (\delta y = \varepsilon \eta).

    \[\begin{split} \delta \mathcal F[y] &=\delta \int_I \left[\left(\frac{\mathrm dy}{\mathrm dx}\right)^2 -wy\right] \mathrm dx =\frac{\mathrm d}{\mathrm d\varepsilon}\bigg|_{\varepsilon=0} \int_I \left[\left(\frac{\mathrm d(y+\delta y)}{\mathrm dx}\right)^2 -w(y+\delta y)\right] \mathrm dx\,\varepsilon\\&=\int_I \left[\frac{\mathrm d}{\mathrm d\varepsilon}\bigg|_{\varepsilon=0}\left(\frac{\mathrm d(y+\delta y)}{\mathrm dx}\right)^2\varepsilon -w\frac{\mathrm d}{\mathrm d\varepsilon}\bigg|_{\varepsilon=0}(y+\delta y)\varepsilon\right] \mathrm dx =\int_I \left[\delta\left(\frac{\mathrm d y}{\mathrm dx}\right)^2 -w\,\delta y\right] \mathrm dx\\&\overset{\text{by Eq. 1}}{=} \int_I \left[2\frac{\mathrm d y}{\mathrm dx}\,\delta\!\left(\frac{\mathrm d y}{\mathrm dx}\right) -w\,\delta y\right] \mathrm dx\end{split}\]

As noted in the example the variation of the functional is now transferred to the variations of functions.
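The result of the example can be verified numerically: the defining difference \mathcal F[y+\delta y]-\mathcal F[y] should match \int_I [2y'\,\delta y' - w\,\delta y]\,\mathrm dx up to \mathcal O(\varepsilon^2). The choices I=[0,1], y=\sin, w(x)=x, and \eta(x)=x(1-x) are illustrative.

```python
import math

n = 2000
h = 1.0 / n
xs = [i * h for i in range(n + 1)]
eps = 1e-6

y, w, eta = math.sin, (lambda x: x), (lambda x: x * (1 - x))

def dydx(f, x, d=1e-5):               # central finite difference
    return (f(x + d) - f(x - d)) / (2 * d)

def trap(vals):                       # trapezoid rule on [0, 1]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def F(yf):                            # F[y] = integral of (y')^2 - w y
    return trap([dydx(yf, x) ** 2 - w(x) * yf(x) for x in xs])

delta_def = F(lambda x: y(x) + eps * eta(x)) - F(y)   # delta F by definition
delta_formula = trap([2 * dydx(y, x) * eps * dydx(eta, x)
                      - w(x) * eps * eta(x) for x in xs])
print(abs(delta_def - delta_formula))                 # O(eps^2)
```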

The basic problem of variational calculus

Let \mathcal F: \mathfrak F \to \mathbb R be a real-valued functional. Then the basic problem of variational calculus is to find y^* \in \mathfrak F for which \mathcal F attains a minimum (or -\mathcal F attains a maximum), i.e.

    \[y^* :=\arg \min_{y\in \mathfrak F}(\mathcal F[y]) = \arg \max_{y\in \mathfrak F}(-\mathcal F[y]) \]

To solve this problem, a necessary condition can be established. Let y^* be a minimum of \mathcal F and v\in \mathfrak F be another function. Considering functions of the form y^*+\varepsilon v, we can write

    \[\mathcal F[y^*+\varepsilon v] \ge \mathcal F[y^*] \ \forall v\in\mathfrak F\]

For any fixed v, the term \mathcal F[y^*+\varepsilon v] becomes a function of \varepsilon. This function has a minimum at \varepsilon = 0, because \mathcal F[y^*+0\cdot v]=\mathcal F[y^*]\le \mathcal F[y^*+\varepsilon v]. This means

    \[\frac{\mathrm d \mathcal F[y^*+\varepsilon v]}{\mathrm d \varepsilon} \bigg|_{\varepsilon=0}=0 \iff \frac{\mathrm d \mathcal F[y^*+\varepsilon v]}{\mathrm d \varepsilon} \bigg|_{\varepsilon=0} \varepsilon =0 \ \forall \varepsilon \in \mathbb R\]


Because v is an arbitrary function and \varepsilon v=:\delta y^* is a variation of y^* (by definition), we can write,

Proposition: If a functional \mathcal F[y] attains a minimum at y^* \in \mathfrak F, then the variation of \mathcal F at y^* is zero for all variations of y^*. In notations,

    \[ y^* =\arg \min_{y\in \mathfrak F}(\mathcal F[y]) \implies \delta \mathcal F[y^*]=0 \ \forall \delta y^*\]

Definition: For a functional \mathcal F[y], a function y_s such that \delta \mathcal F[y_s]=0 is called a stationary point of the functional. Here, \delta \mathcal F[y_s]=0 means,

    \[\delta \mathcal F[y_s]=\frac{\mathrm d \mathcal F[y_s+\varepsilon \eta]}{\mathrm d \varepsilon} \bigg|_{\varepsilon=0} \varepsilon =0 \quad \forall \varepsilon \in \mathbb R \text{ and } \eta\in\mathfrak F \]

The proposition says that if y^* minimizes a functional, then it is a stationary point as well.
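The necessary condition can be illustrated numerically: at a minimizer, the Gâteaux derivative vanishes for every test function. The sketch below uses the illustrative functional \mathcal F[y]=\int_0^1 (y(x)-x)^2\mathrm dx, whose minimizer is clearly y^*(x)=x; the test functions are arbitrary choices.

```python
import math

n = 1000
h = 1.0 / n
xs = [i * h for i in range(n + 1)]

def F(y):   # F[y] = integral of (y(x) - x)^2, minimized by y*(x) = x
    vals = [(y(x) - x) ** 2 for x in xs]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

y_star = lambda x: x

def gateaux(F, y, eta, eps=1e-6):     # dF[y + eps*eta]/deps at eps = 0
    return (F(lambda x: y(x) + eps * eta(x))
            - F(lambda x: y(x) - eps * eta(x))) / (2 * eps)

for eta in (math.sin, math.cos, lambda x: x ** 2):
    print(gateaux(F, y_star, eta))    # each close to 0: delta F[y*] = 0
```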