Table of Contents
- Complex Real Representation
- Complex Derivative
- CR Derivatives
- CR Multivariate Derivatives
- CR Gradient
- Lemma
- Summary
- References
Complex Real Representation
Consider a complex-valued function of a complex variable $f : \setC \rightarrow \setC$ $$ f(z) \in \setC . $$ The complex variable $z$ can be represented in rectangular form as a linear combination of two real variables $x,y \in \setR$ $$ \boxed{z = x + i y} . $$ Likewise the function $f$ can be represented in rectangular form as a linear combination of two real-valued functions $u, v : \setR^2 \rightarrow \setR$ of two real variables $x, y \in \setR$ $$ \label{eqn:fxy} \boxed{f(x, y) = u(x, y) + i v(x, y)} $$ so $f$ can be represented as a complex-valued function of two real variables $f: \setR^2 \rightarrow \setC$ $$ f(z) = f(x, y) \in \setC . $$
Complex Derivative
The complex derivative is defined analogously to the real derivative $$ \label{eqn:dfdz} \boxed{\od{f}{z} \equiv \lim_{\Delta z \rightarrow 0} \q{f(z + \Delta z) - f(z)}{\Delta z}} . $$ Complex differentiability imposes strong constraints on $f$. For the complex derivative to exist, the limit must evaluate to the same number for every direction from which $\Delta z$ approaches zero in the complex plane. This is similar to the condition that for a real derivative to exist the left and right limits must be equal. Represent $\Delta z$ in rectangular form as $$ \Delta z = \Delta x + i \Delta y . $$ One way to evaluate the complex derivative is along the real axis $$ \label{eqn:complexdfdx} \od{f}{z} = \lim_{\Delta x \rightarrow 0} \q{f(x + \Delta x, y) - f(x, y)}{\Delta x} = \pd{f}{x} . $$ Another way to evaluate the complex derivative is along the imaginary axis $$ \label{eqn:complexdfdy} \od{f}{z} = \lim_{\Delta y \rightarrow 0} \q{f(x, y + \Delta y) - f(x, y)}{i \Delta y} = \q{1}{i} \pd{f}{y} . $$ For the complex derivative to exist, Eq. $\eqref{eqn:complexdfdx}$ and Eq. $\eqref{eqn:complexdfdy}$ must be equal $$ \pd{f}{x} = \q{1}{i} \pd{f}{y} . $$ Insert the rectangular form of $f$ into this equation $$ \bb{\pd{u}{x} + i \pd{v}{x}}= \q{1}{i} \bb{\pd{u}{y} + i \pd{v}{y}} , $$ use $1/i = -i$ so that the right-hand side becomes $\pd{v}{y} - i \pd{u}{y}$, and equate the real and imaginary parts to form a system of equations $$ \label{eqn:cr} \boxed{\begin{cases}\pd{u}{x} &= \pd{v}{y} \\ \pd{u}{y} &= -\pd{v}{x} \end{cases}} . $$ These are the Cauchy-Riemann equations. A function's real and imaginary parts must satisfy these equations for the function to be complex-differentiable, or holomorphic.
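The Cauchy-Riemann equations can be checked numerically. The following is a minimal sketch, assuming central finite differences are accurate enough for smooth test functions; the helper names `partials` and `satisfies_cr`, the step size, and the tolerance are illustrative choices, not part of the original derivation.

```python
def partials(f, x, y, h=1e-6):
    """Central-difference estimates of u_x, u_y, v_x, v_y for f(x + iy) = u + iv."""
    fx = (f(complex(x + h, y)) - f(complex(x - h, y))) / (2 * h)
    fy = (f(complex(x, y + h)) - f(complex(x, y - h))) / (2 * h)
    return fx.real, fy.real, fx.imag, fy.imag

def satisfies_cr(f, x, y, tol=1e-6):
    """True when u_x = v_y and u_y = -v_x hold to within tol at (x, y)."""
    ux, uy, vx, vy = partials(f, x, y)
    return abs(ux - vy) < tol and abs(uy + vx) < tol

x0, y0 = 0.7, -0.3  # arbitrary test point (assumption)

cr_square = satisfies_cr(lambda z: z**2, x0, y0)          # z^2 is holomorphic
cr_conj = satisfies_cr(lambda z: z.conjugate(), x0, y0)   # conj(z) is not

print(cr_square, cr_conj)
```

For $f(z) = z^2$ the parts are $u = x^2 - y^2$ and $v = 2xy$, so both equations hold; for $f(z) = \conj{z}$ the first equation fails because $\partial u / \partial x = 1$ while $\partial v / \partial y = -1$.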
Some important functions are not holomorphic, such as complex conjugation $$ f(x, y) = x - i y = \conj{z} $$ and the squared complex modulus $$ f(x, y) = x^2 + y^2 = [x - i y] [x + i y] = \conj{z} z = \nn{z}^2. $$ These functions are used in the complex least squares cost function via the norm. Thus the complex derivative can't be applied to complex least squares optimization. In fact, the only holomorphic real-valued functions are constant functions. However, the rectangular form of a non-holomorphic complex function can still have well-defined real derivatives, which can be exploited. This is the key idea of CR calculus.
CR Derivatives
Consider the complex differential $dz$ and its conjugate $d\conj{z}$ in rectangular form $$ \boxed{dz = dx + i dy \quad\quad\quad d\conj{z} = dx - i dy} . $$ Rewrite the complex differentials as the real part and imaginary part differentials $$ \label{eqn:dz} dx = \q{1}{2}[dz + d\conj{z}] \quad\quad\quad dy = \q{1}{2i}[dz - d\conj{z}] . $$ The total derivative of function $f$ in rectangular form is $$ df = \pd{f}{x} dx + \pd{f}{y} dy . $$ Use Eq. $\eqref{eqn:dz}$ to express the total derivative in terms of the complex differentials $$ df = \q{1}{2} \bb{\pd{f}{x} + \q{1}{i} \pd{f}{y}} dz + \q{1}{2} \bb{\pd{f}{x} - \q{1}{i} \pd{f}{y}} d\conj{z} . $$ Define the CR (Wirtinger) derivatives as $$ \label{eqn:wirt} \boxed{\pd{}{z} \equiv \q{1}{2} \bb{\pd{}{x} + \q{1}{i}\pd{}{y}}\quad\quad\quad\pd{}{\conj{z}} \equiv \q{1}{2} \bb{\pd{}{x} - \q{1}{i}\pd{}{y}}} $$ so the total derivative becomes $$ \boxed{df = \pd{f}{z} dz + \pd{f}{\conj{z}} d\conj{z}} . $$ These definitions suggest that $f(z)$ can be parameterized in terms of $f(z,\conj{z})$ $$ f(z) = f(x, y) = f(z, \conj{z}) \in \setC $$ which is further supported by $z$ and $\conj{z}$ behaving like independent variables with respect to the CR derivatives $$ \pd{z}{z} = \q{1}{2}\bb{\pd{}{x} + \q{1}{i}\pd{}{y}}[x + i y] = 1\quad\quad\quad\pd{\conj{z}}{z} = \q{1}{2}\bb{\pd{}{x} + \q{1}{i}\pd{}{y}}[x - i y] = 0 $$ $$ \pd{z}{\conj{z}} = \q{1}{2}\bb{\pd{}{x} - \q{1}{i}\pd{}{y}}[x + i y] = 0\quad\quad\quad\pd{\conj{z}}{\conj{z}} = \q{1}{2}\bb{\pd{}{x} - \q{1}{i}\pd{}{y}}[x - i y] = 1 . $$ In addition, a function $f(z, \conj{z})$ is holomorphic exactly when $\partial f / \partial \conj{z} = 0$ $$ \pd{f}{\conj{z}} = \q{1}{2}\bb{\pd{}{x} - \q{1}{i}\pd{}{y}}[u + i v] = \q{1}{2}\bb{\pd{u}{x} - \pd{v}{y}} + \q{i}{2} \bb{\pd{u}{y} + \pd{v}{x}} = 0 . $$ The last step follows from the Cauchy-Riemann equations, Eq. $\eqref{eqn:cr}$. 
Therefore a concise statement of the Cauchy-Riemann equations is $$ \boxed{\pd{f}{\conj{z}} = 0 \quad\Leftrightarrow\quad \text{holomorphic}} . $$ In the language of CR calculus it is now clear why the least squares cost function is not holomorphic; it depends on conjugated variables via the norm, so $\partial f / \partial \conj{z} \neq 0$.
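As a quick numerical illustration of the CR derivative definitions, the sketch below builds both operators from central differences in $x$ and $y$. The helper name `wirtinger`, the step size, and the test point are assumptions made for this example.

```python
def wirtinger(f, z, h=1e-6):
    """Central-difference estimates of (df/dz, df/dconj(z)) at the point z."""
    df_dx = (f(z + h) - f(z - h)) / (2 * h)            # partial along the real axis
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # partial along the imaginary axis
    return 0.5 * (df_dx + df_dy / 1j), 0.5 * (df_dx - df_dy / 1j)

z0 = 0.7 - 0.3j  # arbitrary test point (assumption)

# Holomorphic example: f(z) = z^2, expecting df/dz = 2z and df/dconj(z) = 0.
dz_sq, dzbar_sq = wirtinger(lambda z: z**2, z0)

# Non-holomorphic example: f(z) = conj(z) z, expecting df/dz = conj(z), df/dconj(z) = z.
dz_mod, dzbar_mod = wirtinger(lambda z: z.conjugate() * z, z0)

print(dz_sq, dzbar_sq)
print(dz_mod, dzbar_mod)
```

The squared modulus has a nonzero derivative with respect to $\conj{z}$, which is the numerical counterpart of the statement that it is not holomorphic.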
The CR derivatives behave similarly to real derivatives, in that they are linear operators with a product rule and chain rule. Linearity follows from the CR derivative definitions Eq. $\eqref{eqn:wirt}$ because they are linear combinations of linear operators, thus $$ \boxed{\pd{}{z}[\alpha f(z, \conj{z}) + \beta g(z, \conj{z})] = \alpha \pd{f}{z} + \beta \pd{g}{z}} . $$ The CR product rule follows the same pattern as the real product rule $$ \boxed{\pd{}{z}[f(z, \conj{z}) g(z, \conj{z})] = \pd{f}{z} g + f \pd{g}{z}} . $$
The CR chain rule is more complicated than the real chain rule, but the pattern follows the derivative of a function of two variables that depend on the same variable $$ \boxed{\pd{}{z} g(f(z, \conj{z}), \conj{f}(z, \conj{z})) = \pd{g}{f} \pd{f}{z} + \pd{g}{\conj{f}} \pd{\conj{f}}{z}} . $$
In addition, conjugation rules follow directly from the definitions Eq. $\eqref{eqn:wirt}$ $$ \boxed{ \conj{\bb{\pd{f}{z}}} = \pd{\conj{f}}{\conj{z}} \quad\quad\quad \conj{\bb{\pd{f}{\conj{z}}}} = \pd{\conj{f}}{z} } . $$
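The product, chain, and conjugation rules can be spot-checked numerically. This is a sketch under stated assumptions: the test functions `f`, `k`, and `g` are hypothetical choices, and the derivatives are finite-difference estimates rather than exact values.

```python
def wirtinger(f, z, h=1e-6):
    """Central-difference estimates of (df/dz, df/dconj(z)) at the point z."""
    df_dx = (f(z + h) - f(z - h)) / (2 * h)
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (df_dx + df_dy / 1j), 0.5 * (df_dx - df_dy / 1j)

def f(z):
    return z**2 * z.conjugate()   # non-holomorphic inner function

def k(z):
    return z + z.conjugate()      # second factor for the product rule

def g(w):
    return w * w.conjugate()      # outer function: dg/dw = conj(w), dg/dconj(w) = w

z0 = 0.7 - 0.3j
df_dz, df_dzbar = wirtinger(f, z0)
dk_dz, _ = wirtinger(k, z0)
dfbar_dz, dfbar_dzbar = wirtinger(lambda z: f(z).conjugate(), z0)

# Product rule: d(f k)/dz = (df/dz) k + f (dk/dz)
prod_lhs, _ = wirtinger(lambda z: f(z) * k(z), z0)
prod_rhs = df_dz * k(z0) + f(z0) * dk_dz

# Chain rule: d g(f, conj f)/dz = (dg/df)(df/dz) + (dg/dconj f)(dconj f/dz)
chain_lhs, _ = wirtinger(lambda z: g(f(z)), z0)
chain_rhs = f(z0).conjugate() * df_dz + f(z0) * dfbar_dz

# Conjugation rules
conj_err_1 = abs(df_dz.conjugate() - dfbar_dzbar)
conj_err_2 = abs(df_dzbar.conjugate() - dfbar_dz)

print(abs(prod_lhs - prod_rhs), abs(chain_lhs - chain_rhs), conj_err_1, conj_err_2)
```

All four printed discrepancies should be at the level of finite-difference error.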
CR Multivariate Derivatives
Extending to multivariate functions is straightforward. Consider a complex-valued vector function $\vec{f} : \setC^n \rightarrow \setC^m$ of a complex-valued vector $\vec{z} \in \setC^n$ $$ \vec{f}(\vec{z}) = \vec{f}(\vec{z}, \conj{\vec{z}}) = \vec{f}(\vec{x}, \vec{y}) \in \setC^m . $$ The total derivative written in matrix notation is $$ \boxed{d \vec{f} = \pd{\vec{f}}{\vec{z}} d \vec{z} + \pd{\vec{f}}{\conj{\vec{z}}} d \conj{\vec{z}}} $$ where the differential vectors $$ \boxed{d\vec{z} \equiv \begin{bmatrix} dz_1 \\ \vdots \\ dz_n \\ \end{bmatrix} \quad\quad\quad d\conj{\vec{z}} \equiv \begin{bmatrix} d\conj{z}_1 \\ \vdots \\ d\conj{z}_n \\ \end{bmatrix}} $$ are transformed by the matrices $$ \boxed{\pd{\vec{f}}{\vec{z}} \equiv \begin{bmatrix} \pd{f_1}{z_1} &\dots &\pd{f_1}{z_n} \\ \vdots& \ddots & \vdots \\ \pd{f_m}{z_1} &\dots &\pd{f_m}{z_n} \\ \end{bmatrix} \quad\quad\quad \pd{\vec{f}}{\conj{\vec{z}}} \equiv \begin{bmatrix} \pd{f_1}{\conj{z}_1} &\dots &\pd{f_1}{\conj{z}_n} \\ \vdots& \ddots & \vdots \\ \pd{f_m}{\conj{z}_1} &\dots &\pd{f_m}{\conj{z}_n} \\ \end{bmatrix}} . $$ This is simply a system of equations describing the total derivative of each component of $\vec{f}$ with respect to complex differential vectors $d\vec{z}$ and $d\conj{\vec{z}}$. Conjugation generalizes in the expected way $$ \boxed{ \conj{\bb{\pd{\vec{f}}{\vec{z}}}} = \pd{\conj{\vec{f}}}{\conj{\vec{z}}} \quad\quad\quad \conj{\bb{\pd{\vec{f}}{\conj{\vec{z}}}}} = \pd{\conj{\vec{f}}}{\vec{z}} } . $$ The chain rule also generalizes in the expected way $$ \label{eqn:cr_chain} \boxed{ \pd{}{\vec{z}} \vec{g}(\vec{f}(\vec{z})) =\pd{\vec{g}}{\vec{f}} \pd{\vec{f}}{\vec{z}} + \pd{\vec{g}}{\conj{\vec{f}}} \pd{\conj{\vec{f}}}{\vec{z}} \quad\quad\quad \pd{}{\conj{\vec{z}}} \vec{g}(\vec{f}(\vec{z})) =\pd{\vec{g}}{\vec{f}} \pd{\vec{f}}{\conj{\vec{z}}} + \pd{\vec{g}}{\conj{\vec{f}}} \pd{\conj{\vec{f}}}{\conj{\vec{z}}} } . $$ There is no higher-dimensional analog of the product rule.
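To make the two Jacobian matrices concrete, the sketch below estimates them column by column with finite differences. The function `cr_jacobians` and the test map $\vec{f}(\vec{z}) = A \vec{z} + B \conj{\vec{z}}$, with random matrices `A` and `B`, are assumptions chosen because the expected answers are simply $A$ and $B$.

```python
import numpy as np

def cr_jacobians(f, z, h=1e-6):
    """Finite-difference estimates of the m-by-n matrices df/dz and df/dconj(z)."""
    n = z.size
    m = f(z).size
    jac = np.zeros((m, n), dtype=complex)
    jac_conj = np.zeros((m, n), dtype=complex)
    for col in range(n):
        e = np.zeros(n, dtype=complex)
        e[col] = 1.0
        df_dx = (f(z + h * e) - f(z - h * e)) / (2 * h)            # real-axis partial
        df_dy = (f(z + 1j * h * e) - f(z - 1j * h * e)) / (2 * h)  # imaginary-axis partial
        jac[:, col] = 0.5 * (df_dx + df_dy / 1j)
        jac_conj[:, col] = 0.5 * (df_dx - df_dy / 1j)
    return jac, jac_conj

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3)) + 1j * rng.normal(size=(2, 3))
B = rng.normal(size=(2, 3)) + 1j * rng.normal(size=(2, 3))
z0 = rng.normal(size=3) + 1j * rng.normal(size=3)

def f(z):
    return A @ z + B @ z.conj()   # depends on both z and conj(z)

J, J_conj = cr_jacobians(f, z0)
print(np.allclose(J, A, atol=1e-6), np.allclose(J_conj, B, atol=1e-6))
```

Here each column of the estimated matrices corresponds to one component of $\vec{z}$, matching the layout of the boxed definitions.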
CR Gradient
Now consider the special case of the total derivative of a real-valued function of complex variables $f : \setC^n \rightarrow \setR$. Because $f$ is real, the total derivative can be written in a special form $$ \label{eqn:cr_df} df = \pd{f}{\vec{z}} d \vec{z} + \pd{f}{\conj{\vec{z}}} d \conj{\vec{z}} = \pd{f}{\vec{z}} d \vec{z} + \conj{\bb{\pd{f}{\vec{z}} d \vec{z}}} = 2 \Re\pp{\pd{f}{\vec{z}} d \vec{z}} . $$ As when defining the real gradient, write Eq. $\eqref{eqn:cr_df}$ using a (standard complex) inner product $$ \pd{f}{\vec{z}} d \vec{z} = \bigg[ \bb{\pd{f}{\vec{z}}}^H \bigg]^H d \vec{z} = \grad f^H d \vec{z} = \aa{\grad f, d \vec{z}} $$ such that $$ \label{eqn:cr_df2} \boxed{df = 2 \Re(\aa{\grad f, d \vec{z}})} $$ where the CR gradient is defined as $$ \boxed{\grad f(\vec{z}) \equiv \bb{\pd{f}{\vec{z}}}^H} . $$ The factor of $2$ from Eq. $\eqref{eqn:cr_df2}$ is not included in the definition of the CR gradient to make it compatible with the complex gradient. Annoyingly this makes the CR gradient $1/2$ the real gradient when applied to real-valued functions of real variables, so there isn't perfect symmetry between real and complex gradient theories. However both theories will agree on $df$.
To verify that the CR gradient shares qualities with the real gradient, consider the Cauchy-Schwarz inequality applied to the CR gradient of function $f$ and a unit vector $\un \in \setC^n$ $$\nn{\aa{\grad f, \un}} \leq \mm{\grad f} \mm{\un} = \mm{\grad f} .$$ The inequality is an equality either when $\grad f = \vec{0}$, or when the vectors are collinear such that $\grad f = \alpha \un$ for a scalar $\alpha \in \setC - \cc{0}$. To maximize $df$ in Eq. $\eqref{eqn:cr_df2}$ the real part of $\aa{\grad f, \un}$ must be maximized, which happens when $\un = \grad f / \mm{\grad f}$. In other words, the CR gradient points in the direction of maximal change like the real gradient, so the gradient terminology is warranted. It follows then that extrema for function $f$ are necessarily at critical points where the CR gradient is zero $$\grad f(\vec{z}_s) = \vec{0} .$$ These facts taken together solidify the interpretation of the CR gradient as a gradient.
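The CR gradient can be illustrated on a least squares cost. The sketch below assumes the cost has the form $f(\vec{z}) = \mm{A \vec{z} - \vec{b}}^2$ with hypothetical random data `A` and `b`. Applying the definitions above to this assumed cost gives $\partial f / \partial \vec{z} = [A \vec{z} - \vec{b}]^H A$, so the CR gradient is $A^H [A \vec{z} - \vec{b}]$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3)) + 1j * rng.normal(size=(4, 3))  # hypothetical data
b = rng.normal(size=4) + 1j * rng.normal(size=4)            # hypothetical data

def cost(z):
    """Real-valued least squares cost ||A z - b||^2."""
    r = A @ z - b
    return np.vdot(r, r).real

def cr_gradient(z):
    """CR gradient (df/dz)^H = A^H (A z - b) for the cost above."""
    return A.conj().T @ (A @ z - b)

z0 = rng.normal(size=3) + 1j * rng.normal(size=3)
dz = 1e-6 * (rng.normal(size=3) + 1j * rng.normal(size=3))
grad = cr_gradient(z0)

# df = 2 Re <grad f, dz>; np.vdot conjugates its first argument.
df_predicted = 2 * np.vdot(grad, dz).real
# Central difference of the cost along dz (the second-order terms cancel).
df_measured = 0.5 * (cost(z0 + dz) - cost(z0 - dz))

# A step against the gradient lowers the cost when the step size is
# at most 1 / sigma_max(A)^2 and the gradient is nonzero.
mu = 1.0 / np.linalg.norm(A, 2) ** 2
z1 = z0 - mu * grad

print(df_predicted, df_measured)
print(cost(z0), cost(z1))
```

The step-size bound is a conservative choice for this quadratic cost, not a general rule for other functions.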
Lemma
In addition to basic CR calculus, an identity for the CR derivative of the squared modulus of a holomorphic function $f : \setC \rightarrow \setC$ will be theoretically useful $$ \label{eqn:cr_id} \pd{}{z} \nn{f(z; \conj{z})}^2 = \pd{}{z} [\conj{f} f] = \pd{\conj{f}}{z} f + \conj{f} \pd{f}{z} = \conj{\bb{\pd{f}{\conj{z}} \conj{f}}} + \conj{f} \pd{f}{z} = \conj{f} \pd{f}{z} $$ where the semicolon in $f(z; \conj{z})$ indicates the function is technically parameterized in terms of $z$ and $\conj{z}$ but is independent of $\conj{z}$. The steps use the product rule, then the conjugation rule, and finally $\partial f / \partial \conj{z} = 0$ for holomorphic $f$.
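The identity can be checked numerically for a specific holomorphic function. This is a sketch, assuming the hypothetical test function $f(z) = z^3 + 2z$ with known derivative $3z^2 + 2$, and a finite-difference estimate of the CR derivative.

```python
def wirtinger(f, z, h=1e-6):
    """Central-difference estimates of (df/dz, df/dconj(z)) at the point z."""
    df_dx = (f(z + h) - f(z - h)) / (2 * h)
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (df_dx + df_dy / 1j), 0.5 * (df_dx - df_dy / 1j)

def f(z):
    return z**3 + 2 * z      # holomorphic test function (assumption)

def df(z):
    return 3 * z**2 + 2      # its ordinary complex derivative

z0 = 0.7 - 0.3j

# Left-hand side: d/dz of conj(f) f, estimated numerically.
lhs, _ = wirtinger(lambda z: f(z).conjugate() * f(z), z0)
# Right-hand side: conj(f) df/dz, evaluated in closed form.
rhs = f(z0).conjugate() * df(z0)

print(lhs, rhs)
```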
Summary
[Reiterate main equations here]
References
- Kreutz-Delgado, K. (2009). "The Complex Gradient Operator and the CR-Calculus". arXiv.org. DOI:10.48550/arXiv.0906.4835