Let $X = (x_{ij})_{m \times n}$ and let $f(X) = f(x_{11}, x_{12}, \ldots, x_{1n}, x_{21}, \ldots, x_{mn})$ be a function of the $mn$ variables $x_{ij}$ whose partial derivatives

$$\frac{\partial f}{\partial x_{ij}} \quad (i=1,2,\ldots,m,\ j=1,2,\ldots,n)$$

all exist. The derivative of $f(X)$ with respect to the matrix $X$ is defined as

$$\frac{df(X)}{dX} = \left( \frac{\partial f}{\partial x_{ij}} \right)_{m \times n} = \begin{bmatrix} \frac{\partial f}{\partial x_{11}} & \cdots & \frac{\partial f}{\partial x_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial x_{m1}} & \cdots & \frac{\partial f}{\partial x_{mn}} \end{bmatrix}$$

so $\frac{df}{dX}$ has the same shape as $X$, with $\frac{\partial f}{\partial x_{ij}}$ in position $(i, j)$.
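The definition can be checked numerically by perturbing one entry $x_{ij}$ at a time. The sketch below uses central differences (an approximation technique, not part of the definition itself); the helper name `numerical_matrix_grad` and the step size `h` are illustrative assumptions.

```python
import numpy as np

def numerical_matrix_grad(f, X, h=1e-5):
    """Hypothetical helper: approximate df/dX = (df/dx_ij)_{m x n} by central differences.

    f maps an m x n array to a scalar; the result has the same shape as X,
    matching the layout in the definition above.
    """
    X = np.asarray(X, dtype=float)
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h                      # perturb only the entry x_ij
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

# Example: f(X) = sum of x_ij^2, whose derivative is 2X (same m x n shape).
X = np.arange(6.0).reshape(2, 3)
G = numerical_matrix_grad(lambda M: np.sum(M**2), X)
print(G.shape)                 # (2, 3)
print(np.allclose(G, 2 * X))   # True
```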
(1) Let $\mathbf{x} = (\xi_1, \xi_2, \cdots, \xi_n)^\top$ and let $f(\mathbf{x})$ be a function of $n$ variables. Find $\frac{df}{d\mathbf{x}^\top}$, $\frac{df}{d\mathbf{x}}$, and $\frac{d^2f}{d\mathbf{x}^2}$.
$$\frac{df}{d\mathbf{x}^\top} = \left( \frac{\partial f}{\partial \xi_1}, \frac{\partial f}{\partial \xi_2}, \cdots, \frac{\partial f}{\partial \xi_n} \right)$$

$$\nabla f(\mathbf{x}) = \frac{df}{d\mathbf{x}} = \begin{pmatrix} \frac{\partial f}{\partial \xi_1} \\ \frac{\partial f}{\partial \xi_2} \\ \vdots \\ \frac{\partial f}{\partial \xi_n} \end{pmatrix}$$

This column vector is the gradient of $f$.
$$H(\mathbf{x}) = \nabla^2 f(\mathbf{x}) = \frac{d^2 f}{d\mathbf{x}^2} = \frac{\partial^2 f}{\partial \mathbf{x}\, \partial \mathbf{x}^\top} = \begin{bmatrix} \frac{\partial^2 f}{\partial \xi_1^2} & \frac{\partial^2 f}{\partial \xi_1 \partial \xi_2} & \cdots & \frac{\partial^2 f}{\partial \xi_1 \partial \xi_n} \\ \frac{\partial^2 f}{\partial \xi_2 \partial \xi_1} & \frac{\partial^2 f}{\partial \xi_2^2} & \cdots & \frac{\partial^2 f}{\partial \xi_2 \partial \xi_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial \xi_n \partial \xi_1} & \frac{\partial^2 f}{\partial \xi_n \partial \xi_2} & \cdots & \frac{\partial^2 f}{\partial \xi_n^2} \end{bmatrix}$$

This is the Hessian matrix. It is symmetric whenever the second-order partial derivatives of $f$ are continuous, since then $\frac{\partial^2 f}{\partial \xi_i \partial \xi_j} = \frac{\partial^2 f}{\partial \xi_j \partial \xi_i}$.
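As a concrete check of the gradient and Hessian layouts, the sketch below approximates both by central differences and compares them with hand-computed derivatives. The test function $f(\xi_1, \xi_2, \xi_3) = \xi_1^2 \xi_2 + \xi_2 \xi_3^3$, the evaluation point, and the step sizes are hypothetical choices for illustration.

```python
import numpy as np

def f(x):
    # Hypothetical test function f = xi1^2 * xi2 + xi2 * xi3^3
    return x[0]**2 * x[1] + x[1] * x[2]**3

def grad_fd(f, x, h=1e-5):
    """Gradient df/dx (returned as a 1-D array) via central differences."""
    n = x.size
    g = np.zeros(n)
    for k in range(n):
        e = np.zeros(n)
        e[k] = h
        g[k] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hessian_fd(f, x, h=1e-4):
    """Hessian H_ij = d^2 f / (d xi_i d xi_j) via the four-point central stencil."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ei[i] = h
            ej = np.zeros(n)
            ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

x = np.array([1.0, 2.0, -1.0])
g = grad_fd(f, x)
H = hessian_fd(f, x)

# Hand-computed gradient and Hessian of the test function
g_exact = np.array([2 * x[0] * x[1], x[0]**2 + x[2]**3, 3 * x[1] * x[2]**2])
H_exact = np.array([[2 * x[1], 2 * x[0], 0.0],
                    [2 * x[0], 0.0, 3 * x[2]**2],
                    [0.0, 3 * x[2]**2, 6 * x[1] * x[2]]])
print(np.allclose(g, g_exact, atol=1e-6))  # True
print(np.allclose(H, H_exact, atol=1e-4))  # True
print(np.allclose(H, H.T, atol=1e-4))      # True: smooth f gives a symmetric Hessian
```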
(2) Let $\mathbf{a} = (a_1, a_2, \cdots, a_n)^\top$ be a constant vector, let $\mathbf{x}$ be as in (1), and let $f(\mathbf{x}) = (\mathbf{x}, \mathbf{a}) = \mathbf{a}^\top \mathbf{x}$ be their inner product. Find $\frac{\partial f}{\partial \mathbf{x}}$.
Solution: Since $f(\mathbf{x}) = \sum_{i=1}^{n} a_i \xi_i$, we have $\frac{\partial f}{\partial \xi_j} = a_j$ $(j = 1,2,\cdots, n)$, so
$$\frac{\partial f}{\partial \mathbf{x}} = \begin{pmatrix} \frac{\partial f}{\partial \xi_1} \\ \frac{\partial f}{\partial \xi_2} \\ \vdots \\ \frac{\partial f}{\partial \xi_n} \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = \mathbf{a}$$
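A quick numerical sanity check of $\frac{\partial}{\partial \mathbf{x}}(\mathbf{a}^\top \mathbf{x}) = \mathbf{a}$, assuming random test data for $\mathbf{a}$ and $\mathbf{x}$ and a central-difference step `h` chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(4)   # constant vector a (arbitrary test data)
x = rng.standard_normal(4)   # evaluation point

def f(v):
    return a @ v             # f(x) = a^T x

h = 1e-5
# One central difference per coordinate direction e_j (rows of the identity)
grad = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(4)])
print(np.allclose(grad, a))  # True
```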
(3) Let $A = (a_{ij})_{m \times n}$ be a constant matrix, let $X = (x_{ij})_{n \times m}$ be a matrix variable, and let $f(X) = \operatorname{tr}(AX)$. Find $\frac{\partial f}{\partial X}$.
Analysis: write $C = AX = (c_{ij})_{m \times m}$:
$$\begin{pmatrix} c_{11} & \cdots & c_{1m} \\ \vdots & \ddots & \vdots \\ c_{m1} & \cdots & c_{mm} \end{pmatrix}=\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}\begin{pmatrix} x_{11} & \cdots & x_{1m} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nm} \end{pmatrix}$$
Expanding the diagonal entries, which are the only ones that enter the trace:
$$\begin{aligned} c_{11} &= a_{11}x_{11} + a_{12}x_{21} + \cdots + a_{1n}x_{n1}, \\ c_{22} &= a_{21}x_{12} + a_{22}x_{22} + \cdots + a_{2n}x_{n2}, \\ &\;\;\vdots \\ c_{mm} &= a_{m1}x_{1m} + a_{m2}x_{2m} + \cdots + a_{mn}x_{nm} \end{aligned}$$
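Pattern: each entry $x_{ij}$ of $X$ appears exactly once in $c_{11} + c_{22} + \cdots + c_{mm}$, and its coefficient is $a_{ji}$, i.e. the indices of $a$ are those of $x$ reversed.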
Solution: Since $AX = \left(\sum_{k=1}^n a_{ik}x_{kj}\right)_{m \times m}$,
we have $f(X) = \operatorname{tr}(AX) = \sum_{s=1}^{m} \sum_{k=1}^n a_{sk} x_{ks}$.
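The double-sum form of the trace is easy to confirm on random matrices. This sketch assumes arbitrary sizes $m = 3$, $n = 4$; it also shows that pairing each $x_{ks}$ with $a_{sk}$ is the same as an elementwise product of $A$ with $X^\top$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4
A = rng.standard_normal((m, n))   # A is m x n
X = rng.standard_normal((n, m))   # X is n x m, so AX is m x m

# Double sum from the text: sum over s and k of a_{sk} * x_{ks}
double_sum = sum(A[s, k] * X[k, s] for s in range(m) for k in range(n))

print(np.isclose(np.trace(A @ X), double_sum))        # True
# Same pairing written as an elementwise product with X^T
print(np.isclose(np.trace(A @ X), np.sum(A * X.T)))   # True
```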
Differentiating term by term gives
$$\left( \frac{\partial f}{\partial x_{ij}} \right)_{n \times m} = (a_{ji})_{n \times m} \quad (i=1,2,\cdots,n,\ j = 1,2,\cdots,m)$$
Therefore:
$$\frac{\partial f}{\partial X} = \left( \frac{\partial f}{\partial x_{ij}} \right)_{n \times m} = (a_{ji})_{n \times m} = A^\top$$
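To see $\frac{\partial \operatorname{tr}(AX)}{\partial X} = A^\top$ numerically, the sketch below perturbs each $x_{ij}$ in turn with a central difference. The sizes $m = 2$, $n = 3$, the random data, and the step `h` are illustrative assumptions; note the result has the $n \times m$ shape of $X$, as the definition requires.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 2, 3
A = rng.standard_normal((m, n))   # constant m x n matrix
X = rng.standard_normal((n, m))   # n x m matrix variable

def f(M):
    return np.trace(A @ M)        # f(X) = tr(AX)

h = 1e-5
G = np.zeros_like(X)
for i in range(n):
    for j in range(m):
        E = np.zeros_like(X)
        E[i, j] = h
        G[i, j] = (f(X + E) - f(X - E)) / (2 * h)   # approximates df/dx_ij

print(G.shape)               # (3, 2): same n x m shape as X
print(np.allclose(G, A.T))   # True
```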
(4) Let $\mathbf{x} = (\xi_1, \xi_2, \cdots, \xi_n)^\top$, let $A = (a_{ij})_{n \times n}$, and let $f(\mathbf{x}) = \mathbf{x}^\top A \mathbf{x}$ be a function of $n$ variables. Find the derivative $\dfrac{d f}{d \mathbf{x}}$.
Solution: Since

$$\begin{aligned} f(\mathbf{x}) &= \mathbf{x}^\top A \mathbf{x} \\ &= (\xi_1, \xi_2, \cdots, \xi_n) \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} \xi_1 \\ \xi_2 \\ \vdots \\ \xi_n \end{pmatrix} \\ &= (\xi_1, \xi_2, \cdots, \xi_k, \cdots, \xi_n) \begin{pmatrix} \displaystyle \sum_{i=1}^n a_{1i} \xi_i \\ \displaystyle \sum_{i=1}^n a_{2i} \xi_i \\ \vdots \\ \displaystyle \sum_{i=1}^n a_{ki} \xi_i \\ \vdots \\ \displaystyle \sum_{i=1}^n a_{ni} \xi_i \end{pmatrix} \\ &= \xi_1\sum_{j=1}^{n}a_{1j}\xi_j + \cdots + \xi_k\sum_{j=1}^{n}a_{kj}\xi_j + \cdots + \xi_n\sum_{j=1}^{n}a_{nj}\xi_j \end{aligned}$$
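The last line of the expansion can be checked against the matrix product directly. This sketch assumes a random, deliberately non-symmetric $A$ and a random $\mathbf{x}$ with $n = 4$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n))   # general (not necessarily symmetric) A
x = rng.standard_normal(n)

quad = x @ A @ x                  # x^T A x
# sum over k of xi_k * (sum over j of a_{kj} * xi_j)
expanded = sum(x[k] * sum(A[k, j] * x[j] for j in range(n)) for k in range(n))
print(np.isclose(quad, expanded))  # True
```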
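Only the $k$-th term contains $\xi_k$ both as a factor and inside its sum; every other term $\xi_i \sum_{j} a_{ij}\xi_j$ ($i \neq k$) contributes just $\xi_i a_{ik}$. Therefore: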
$$\begin{aligned} \frac{\partial f(\mathbf{x})}{\partial \xi_k} &= \xi_1 a_{1k} + \cdots + \xi_{k-1} a_{k-1,k} + \left( \sum_{j=1}^{n} a_{kj} \xi_j + \xi_k a_{kk}\right) + \xi_{k+1} a_{k+1,k} + \cdots + \xi_n a_{nk} \\ &= \sum_{i=1}^n a_{ik} \xi_i + \sum_{j=1}^n a_{kj} \xi_j, \quad k=1,2,\cdots,n \end{aligned}$$
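The componentwise formula (a column sum $\sum_i a_{ik}\xi_i$ plus a row sum $\sum_j a_{kj}\xi_j$) can be compared with central differences. The hypothetical helper `partial_k`, the random data, and the step `h` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

def f(v):
    return v @ A @ v              # f(x) = x^T A x

def partial_k(k):
    # Hypothetical helper: column-sum term plus row-sum term from the derived formula
    return sum(A[i, k] * x[i] for i in range(n)) + sum(A[k, j] * x[j] for j in range(n))

formula = np.array([partial_k(k) for k in range(n)])

h = 1e-5
I = np.eye(n)
fd = np.array([(f(x + h * I[k]) - f(x - h * I[k])) / (2 * h) for k in range(n)])
print(np.allclose(formula, fd))  # True
```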
Hence:
$$\begin{aligned} \frac{d f}{d \mathbf{x}} &= \begin{pmatrix} \frac{\partial f}{\partial \xi_1} \\ \frac{\partial f}{\partial \xi_2} \\ \vdots \\ \frac{\partial f}{\partial \xi_n} \end{pmatrix} = \begin{pmatrix} \displaystyle \sum_{j=1}^n a_{1j} \xi_j \\ \displaystyle \sum_{j=1}^n a_{2j} \xi_j \\ \vdots \\ \displaystyle \sum_{j=1}^n a_{nj} \xi_j \end{pmatrix} + \begin{pmatrix} \displaystyle \sum_{i=1}^n a_{i1} \xi_i \\ \displaystyle \sum_{i=1}^n a_{i2} \xi_i \\ \vdots \\ \displaystyle \sum_{i=1}^n a_{in} \xi_i \end{pmatrix} \\ &= A\mathbf{x} + A^\top \mathbf{x} = (A + A^\top)\mathbf{x} \end{aligned}$$
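For a non-symmetric $A$ both terms matter: the sketch below, assuming random test data, shows that central differences agree with $(A + A^\top)\mathbf{x}$ but not with $A\mathbf{x}$ alone.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
A = rng.standard_normal((n, n))   # deliberately non-symmetric
x = rng.standard_normal(n)

def f(v):
    return v @ A @ v              # f(x) = x^T A x

h = 1e-5
fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(n)])

print(np.allclose(fd, (A + A.T) @ x))  # True
print(np.allclose(fd, A @ x))          # False for a generic non-symmetric A
```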
In particular, when $A$ is symmetric ($A^\top = A$),
$$\dfrac{d f}{d \mathbf{x}} = 2A\mathbf{x}$$
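A short check of the symmetric case, assuming a symmetric $A$ built from a random $B$ as $A = (B + B^\top)/2$. The last line also confirms the standard fact that a quadratic form depends only on the symmetric part of its matrix, which is why $\mathbf{x}^\top B \mathbf{x}$ and $\mathbf{x}^\top A \mathbf{x}$ coincide.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                 # symmetric A
x = rng.standard_normal(n)

def f(v):
    return v @ A @ v              # f(x) = x^T A x

h = 1e-5
fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(n)])
print(np.allclose(fd, 2 * A @ x))        # True

# x^T B x equals x^T A x, since A is the symmetric part of B
print(np.isclose(x @ B @ x, x @ A @ x))  # True
```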