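Contents
Preface
Matrix differentiation
1. Trace of a matrix
1.1 Definition
1.2 Properties of the trace
2. Cases of the matrix differential
2.1 Real-valued scalar function of a vector variable
2.2 Real-valued scalar function of a matrix variable
2.3 Real matrix function of a matrix variable
3. Matrix differential
3.1 Meaning of the matrix differential
3.2 Matrix differential examples
References

Preface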
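This note is based on the articles listed in the references. The formulas largely follow the original article, with some of my own thoughts and annotations added. If you find my write-up lacking, please refer to the original article~
This post covers more advanced techniques for differentiating real-valued scalar functions of a vector variable and of a matrix variable: the trace of a matrix and the first-order real matrix differential. The derivations make use of real matrix functions of a matrix variable, but differentiating such matrix-valued functions is not covered here. To follow this post you need the material from the earlier posts in this series, plus matrix multiplication and vector inner products from linear algebra.
There is a website for computing matrix derivatives that can be used to check whether a result is correct: Matrix Calculus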
Matrix differentiation
1. Trace of a matrix
1.1 Definition
The sum of the main-diagonal entries of an $n\times n$ square matrix $\boldsymbol{A}_{n\times n}$ is called the trace of $\boldsymbol{A}$, written $\operatorname{tr}(\boldsymbol{A})$. That is,
$$
\operatorname{tr}(\boldsymbol{A})=a_{11}+a_{22}+\cdots+a_{nn}=\sum_{i=1}^n a_{ii}
$$
Note: by this definition, only square matrices have a trace.
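1.2 Properties of the trace
The following properties of the trace are stated without proof. They are given for two matrices, but they extend to any number of matrices.
(1) Trace of a scalar
A scalar $x$ can be regarded as a $1\times 1$ matrix, so the trace of a scalar is the scalar itself:
$$
\operatorname{tr}(x)=x
$$
(2) Linearity
The trace is linear, so sums and scalar factors inside $\operatorname{tr}$ can be moved outside:
$$
\operatorname{tr}(c_1\boldsymbol{A}+c_2\boldsymbol{B})=c_1\operatorname{tr}(\boldsymbol{A})+c_2\operatorname{tr}(\boldsymbol{B}) \tag{3}
$$
where $c_1$ and $c_2$ are scalars.
(3) Transpose rule
Transposing a matrix does not change its trace:
$$
\operatorname{tr}(\boldsymbol{A})=\operatorname{tr}(\boldsymbol{A}^T)
$$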
(4) Exchange rule
For two matrices $\boldsymbol{A}_{m\times n}$ and $\boldsymbol{B}_{m\times n}$ of the same size $m\times n$, the trace of one multiplied (on the left or on the right) by the transpose of the other equals the sum of the products of their corresponding entries. This trace can therefore be understood as the generalization of the vector dot product to matrices:
$$
\begin{aligned}
\operatorname{tr}(\boldsymbol{A}\boldsymbol{B}^T) ={}& a_{11}b_{11}+a_{12}b_{12}+\cdots+a_{1n}b_{1n}\\
&+a_{21}b_{21}+a_{22}b_{22}+\cdots+a_{2n}b_{2n}\\
&+\cdots\\
&+a_{m1}b_{m1}+a_{m2}b_{m2}+\cdots+a_{mn}b_{mn}
\end{aligned}
\tag{6}
$$
This holds because
$$
\begin{aligned}
\operatorname{tr}(\boldsymbol{A}\boldsymbol{B}^T) &= \operatorname{tr}\left(
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \vdots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix}
b_{11} & b_{21} & \cdots & b_{m1} \\
b_{12} & b_{22} & \cdots & b_{m2} \\
\vdots & \vdots & \vdots & \vdots \\
b_{1n} & b_{2n} & \cdots & b_{mn}
\end{bmatrix}
\right) \\
&= \operatorname{tr}
\begin{bmatrix}
a_{11}b_{11}+a_{12}b_{12}+\cdots+a_{1n}b_{1n} & * & \cdots & * \\
* & a_{21}b_{21}+a_{22}b_{22}+\cdots+a_{2n}b_{2n} & \cdots & * \\
\vdots & \vdots & \ddots & \vdots \\
* & * & \cdots & a_{m1}b_{m1}+a_{m2}b_{m2}+\cdots+a_{mn}b_{mn}
\end{bmatrix}_{m\times m} \\
&= a_{11}b_{11}+a_{12}b_{12}+\cdots+a_{1n}b_{1n}\\
&\quad+a_{21}b_{21}+a_{22}b_{22}+\cdots+a_{2n}b_{2n}\\
&\quad+\cdots\\
&\quad+a_{m1}b_{m1}+a_{m2}b_{m2}+\cdots+a_{mn}b_{mn}
\end{aligned}
\tag{7}
$$
From this computation, together with how the products $\boldsymbol{A}\boldsymbol{B}$ and $\boldsymbol{B}\boldsymbol{A}$ are formed, it follows that the trace obeys an exchange law:
$$
\operatorname{tr}(\boldsymbol{A}\boldsymbol{B})=\operatorname{tr}(\boldsymbol{B}\boldsymbol{A})
$$
where $\boldsymbol{A}$ is $m\times n$ and $\boldsymbol{B}$ is $n\times m$.
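The four trace properties above can be checked numerically on small integer matrices. The following is only a minimal sketch in plain Python: the helper names `matmul`, `transpose`, and `trace` and the sample matrices are illustrative assumptions, not something taken from the referenced article.

```python
# Numerical check of the trace properties on small integer matrices.
# Helper names and sample values are illustrative assumptions.

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    """Return the transpose of A as a new list of rows."""
    return [list(col) for col in zip(*A)]

def trace(A):
    """Sum of the main-diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

P = [[1, 2], [3, 4]]             # 2 x 2
Q = [[0, -1], [5, 2]]            # 2 x 2
A = [[1, 2, 3], [4, 5, 6]]       # 2 x 3
B = [[7, 8, 9], [10, 11, 12]]    # 2 x 3
C = [[1, 0], [2, -1], [3, 5]]    # 3 x 2

# (2) Linearity: tr(2P + 3Q) = 2 tr(P) + 3 tr(Q)
combo = [[2 * p + 3 * q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]
lin_lhs, lin_rhs = trace(combo), 2 * trace(P) + 3 * trace(Q)

# (3) Transpose rule: tr(P) = tr(P^T)
tr_p, tr_pt = trace(P), trace(transpose(P))

# (4) tr(A B^T) is the sum of entrywise products of A and B
dot_lhs = trace(matmul(A, transpose(B)))
dot_rhs = sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# (4) Exchange law: tr(AC) = tr(CA), although AC is 2 x 2 and CA is 3 x 3
ex_lhs, ex_rhs = trace(matmul(A, C)), trace(matmul(C, A))

print(lin_lhs == lin_rhs, tr_p == tr_pt, dot_lhs == dot_rhs, ex_lhs == ex_rhs)
```

Integer entries are used so that every comparison is exact rather than subject to floating-point rounding.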
2. Cases of the matrix differential
2.1 Real-valued scalar function of a vector variable
Consider $f(\vec{x})$ with $\vec{x}=[x_1,x_2,\cdots,x_n]^T$. This is simply a function of several variables. If $f(\vec{x})$ is differentiable, its total differential is
$$
\begin{aligned}
\mathrm{d}f(\vec{x}) &= \frac{\partial f}{\partial x_1}\mathrm{d}x_1+\frac{\partial f}{\partial x_2}\mathrm{d}x_2+\cdots+\frac{\partial f}{\partial x_n}\mathrm{d}x_n \\
&= \left(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n}\right)
\begin{bmatrix}\mathrm{d}x_1\\ \mathrm{d}x_2\\ \vdots\\ \mathrm{d}x_n\end{bmatrix} \\
&\overset{\text{trace of a scalar}}{=} \operatorname{tr}\left(\left(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n}\right)
\begin{bmatrix}\mathrm{d}x_1\\ \mathrm{d}x_2\\ \vdots\\ \mathrm{d}x_n\end{bmatrix}\right)
\end{aligned}
$$
2.2 Real-valued scalar function of a matrix variable
Consider $f(\boldsymbol{X})$ with $\boldsymbol{X}_{m\times n}=(x_{ij})_{i=1,j=1}^{m,n}$. This is still a function of several variables. Assuming it is differentiable, its total differential is
$$
\begin{aligned}
\mathrm{d}f(\boldsymbol{X}) ={}& \frac{\partial f}{\partial x_{11}}\mathrm{d}x_{11}+\frac{\partial f}{\partial x_{12}}\mathrm{d}x_{12}+\cdots+\frac{\partial f}{\partial x_{1n}}\mathrm{d}x_{1n}\\
&+\frac{\partial f}{\partial x_{21}}\mathrm{d}x_{21}+\frac{\partial f}{\partial x_{22}}\mathrm{d}x_{22}+\cdots+\frac{\partial f}{\partial x_{2n}}\mathrm{d}x_{2n}\\
&+\cdots\\
&+\frac{\partial f}{\partial x_{m1}}\mathrm{d}x_{m1}+\frac{\partial f}{\partial x_{m2}}\mathrm{d}x_{m2}+\cdots+\frac{\partial f}{\partial x_{mn}}\mathrm{d}x_{mn}
\end{aligned}
\tag{19}
$$
This result is exactly the sum of the products of corresponding entries of the matrix $\left(\frac{\partial f}{\partial x_{ij}}\right)_{i=1,j=1}^{m,n}$ and the matrix $(\mathrm{d}x_{ij})_{i=1,j=1}^{m,n}$. Using the exchange rule of the trace, formula (6), it can be written in trace form:
$$
\begin{aligned}
\mathrm{d}f(\boldsymbol{X}) ={}& \frac{\partial f}{\partial x_{11}}\mathrm{d}x_{11}+\frac{\partial f}{\partial x_{12}}\mathrm{d}x_{12}+\cdots+\frac{\partial f}{\partial x_{1n}}\mathrm{d}x_{1n}\\
&+\frac{\partial f}{\partial x_{21}}\mathrm{d}x_{21}+\frac{\partial f}{\partial x_{22}}\mathrm{d}x_{22}+\cdots+\frac{\partial f}{\partial x_{2n}}\mathrm{d}x_{2n}\\
&+\cdots\\
&+\frac{\partial f}{\partial x_{m1}}\mathrm{d}x_{m1}+\frac{\partial f}{\partial x_{m2}}\mathrm{d}x_{m2}+\cdots+\frac{\partial f}{\partial x_{mn}}\mathrm{d}x_{mn}\\
={}& \operatorname{tr}\left(
\begin{bmatrix}
\frac{\partial f}{\partial x_{11}} & \frac{\partial f}{\partial x_{21}} & \cdots & \frac{\partial f}{\partial x_{m1}} \\
\frac{\partial f}{\partial x_{12}} & \frac{\partial f}{\partial x_{22}} & \cdots & \frac{\partial f}{\partial x_{m2}} \\
\vdots & \vdots & \vdots & \vdots \\
\frac{\partial f}{\partial x_{1n}} & \frac{\partial f}{\partial x_{2n}} & \cdots & \frac{\partial f}{\partial x_{mn}}
\end{bmatrix}_{n\times m}
\begin{bmatrix}
\mathrm{d}x_{11} & \mathrm{d}x_{12} & \cdots & \mathrm{d}x_{1n} \\
\mathrm{d}x_{21} & \mathrm{d}x_{22} & \cdots & \mathrm{d}x_{2n} \\
\vdots & \vdots & \vdots & \vdots \\
\mathrm{d}x_{m1} & \mathrm{d}x_{m2} & \cdots & \mathrm{d}x_{mn}
\end{bmatrix}_{m\times n}
\right)
\end{aligned}
\tag{20}
$$
2.3 Real matrix function of a matrix variable
Each entry of a real matrix function of a matrix variable, $\boldsymbol{F}_{p\times q}(\boldsymbol{X})$, is itself a real-valued scalar function of a matrix variable, $f_{ij}(\boldsymbol{X})$.
Its differential is defined as follows. Assuming every $f_{ij}(\boldsymbol{X})$ is differentiable, the matrix differential of $\boldsymbol{F}(\boldsymbol{X})$ is obtained by taking the total differential of each entry, with the layout unchanged:
$$
\mathrm{d}\boldsymbol{F}_{p\times q}(\boldsymbol{X}) =
\begin{bmatrix}
\mathrm{d}f_{11}(\boldsymbol{X}) & \mathrm{d}f_{12}(\boldsymbol{X}) & \cdots & \mathrm{d}f_{1q}(\boldsymbol{X}) \\
\mathrm{d}f_{21}(\boldsymbol{X}) & \mathrm{d}f_{22}(\boldsymbol{X}) & \cdots & \mathrm{d}f_{2q}(\boldsymbol{X}) \\
\vdots & \vdots & \vdots & \vdots \\
\mathrm{d}f_{p1}(\boldsymbol{X}) & \mathrm{d}f_{p2}(\boldsymbol{X}) & \cdots & \mathrm{d}f_{pq}(\boldsymbol{X})
\end{bmatrix}_{p\times q}
\tag{21}
$$
As with real-valued scalar functions of a matrix variable or a vector variable, the differential of a real matrix function of a matrix variable obeys four rules:
Differential of a constant matrix
$$
\mathrm{d}\boldsymbol{A}_{m\times n}=\boldsymbol{0}_{m\times n}
$$
Linearity rule
Differentiating a sum equals summing the differentials, and constant scalars can be pulled outside the differential:
$$
\mathrm{d}(c_1\boldsymbol{F}(\boldsymbol{X})+c_2\boldsymbol{G}(\boldsymbol{X}))=c_1\mathrm{d}\boldsymbol{F}(\boldsymbol{X})+c_2\mathrm{d}\boldsymbol{G}(\boldsymbol{X})
$$
where $c_1$ and $c_2$ are constants (scalars).
Product rule
$$
\mathrm{d}(\boldsymbol{F}(\boldsymbol{X})\boldsymbol{G}(\boldsymbol{X}))=\mathrm{d}(\boldsymbol{F}(\boldsymbol{X}))\boldsymbol{G}(\boldsymbol{X})+\boldsymbol{F}(\boldsymbol{X})\mathrm{d}\boldsymbol{G}(\boldsymbol{X})
$$
where $\boldsymbol{F}(\boldsymbol{X})$ is $p\times q$ and $\boldsymbol{G}(\boldsymbol{X})$ is $q\times s$.
Note: the differentials here are matrices, and matrix multiplication is not commutative, so the left-right order of the factors in each product must not be swapped.
Proof: the $(i,j)$ entry of $\boldsymbol{F}(\boldsymbol{X})\boldsymbol{G}(\boldsymbol{X})$ is $\sum_{k=1}^q[f_{ik}(\boldsymbol{X})g_{kj}(\boldsymbol{X})]$, and the total differential of this entry is
$$
\begin{aligned}
\mathrm{d}\left(\sum_{k=1}^q[f_{ik}(\boldsymbol{X})g_{kj}(\boldsymbol{X})]\right) &= \sum_{k=1}^q\mathrm{d}(f_{ik}(\boldsymbol{X})g_{kj}(\boldsymbol{X})) \\
&= \sum_{k=1}^q[\mathrm{d}(f_{ik}(\boldsymbol{X}))g_{kj}(\boldsymbol{X})+f_{ik}(\boldsymbol{X})\mathrm{d}g_{kj}(\boldsymbol{X})] \\
&= \sum_{k=1}^q[\mathrm{d}(f_{ik}(\boldsymbol{X}))g_{kj}(\boldsymbol{X})]+\sum_{k=1}^q[f_{ik}(\boldsymbol{X})\mathrm{d}g_{kj}(\boldsymbol{X})]
\end{aligned}
$$
In the last line, the left sum is the $(i,j)$ entry of $\mathrm{d}(\boldsymbol{F}(\boldsymbol{X}))\boldsymbol{G}(\boldsymbol{X})$ and the right sum is the $(i,j)$ entry of $\boldsymbol{F}(\boldsymbol{X})\mathrm{d}\boldsymbol{G}(\boldsymbol{X})$.
Transpose rule
The matrix differential of a transpose equals the transpose of the matrix differential:
$$
\mathrm{d}\boldsymbol{F}_{p\times q}^T(\boldsymbol{X})=(\mathrm{d}\boldsymbol{F}_{p\times q}(\boldsymbol{X}))^T
$$
Proof:
$$
\begin{aligned}
\mathrm{d}\boldsymbol{F}^T_{p\times q}(\boldsymbol{X}) &= \mathrm{d}
\begin{bmatrix}
f_{11}(\boldsymbol{X}) & f_{21}(\boldsymbol{X}) & \cdots & f_{p1}(\boldsymbol{X}) \\
f_{12}(\boldsymbol{X}) & f_{22}(\boldsymbol{X}) & \cdots & f_{p2}(\boldsymbol{X}) \\
\vdots & \vdots & \vdots & \vdots \\
f_{1q}(\boldsymbol{X}) & f_{2q}(\boldsymbol{X}) & \cdots & f_{pq}(\boldsymbol{X})
\end{bmatrix}_{q\times p} \\
&=
\begin{bmatrix}
\mathrm{d}f_{11}(\boldsymbol{X}) & \mathrm{d}f_{21}(\boldsymbol{X}) & \cdots & \mathrm{d}f_{p1}(\boldsymbol{X}) \\
\mathrm{d}f_{12}(\boldsymbol{X}) & \mathrm{d}f_{22}(\boldsymbol{X}) & \cdots & \mathrm{d}f_{p2}(\boldsymbol{X}) \\
\vdots & \vdots & \vdots & \vdots \\
\mathrm{d}f_{1q}(\boldsymbol{X}) & \mathrm{d}f_{2q}(\boldsymbol{X}) & \cdots & \mathrm{d}f_{pq}(\boldsymbol{X})
\end{bmatrix}_{q\times p} \\
&=
\begin{bmatrix}
\mathrm{d}f_{11}(\boldsymbol{X}) & \mathrm{d}f_{12}(\boldsymbol{X}) & \cdots & \mathrm{d}f_{1q}(\boldsymbol{X}) \\
\mathrm{d}f_{21}(\boldsymbol{X}) & \mathrm{d}f_{22}(\boldsymbol{X}) & \cdots & \mathrm{d}f_{2q}(\boldsymbol{X}) \\
\vdots & \vdots & \vdots & \vdots \\
\mathrm{d}f_{p1}(\boldsymbol{X}) & \mathrm{d}f_{p2}(\boldsymbol{X}) & \cdots & \mathrm{d}f_{pq}(\boldsymbol{X})
\end{bmatrix}_{p\times q}^T \\
&= (\mathrm{d}\boldsymbol{F}_{p\times q}(\boldsymbol{X}))^T
\end{aligned}
$$
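The product rule and the transpose rule can be sanity-checked with a small finite perturbation. The sketch below assumes the particular choice $\boldsymbol{F}(\boldsymbol{X})=\boldsymbol{X}$ and $\boldsymbol{G}(\boldsymbol{X})=\boldsymbol{X}^T$; the helper names, the sample matrix, and the step size `eps` are all illustrative assumptions. The gap between the exact increment and the first-order prediction should be of second order in `eps`.

```python
# Finite-increment check of the product rule and the transpose rule,
# using F(X) = X and G(X) = X^T. Names and values are illustrative assumptions.

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def max_abs(A):
    """Largest absolute entry of a matrix."""
    return max(abs(x) for row in A for x in row)

X = [[1.0, 2.0, -1.0], [0.5, 3.0, 2.0]]      # 2 x 3
D = [[1.0, -2.0, 0.5], [3.0, 1.0, -1.0]]     # perturbation direction
eps = 1e-6
dX = [[eps * d for d in row] for row in D]
X_new = add(X, dX)

# Exact increment of F(X) G(X) = X X^T
increment = sub(matmul(X_new, transpose(X_new)), matmul(X, transpose(X)))

# First-order prediction d(F) G + F d(G) = dX X^T + X (dX)^T
predicted = add(matmul(dX, transpose(X)), matmul(X, transpose(dX)))
product_rule_error = max_abs(sub(increment, predicted))

# Transpose rule: the increment of X^T equals (dX)^T
transpose_rule_error = max_abs(
    sub(sub(transpose(X_new), transpose(X)), transpose(dX)))

print(product_rule_error, transpose_rule_error)
```

Here the leftover term in the product rule is exactly $\mathrm{d}\boldsymbol{X}(\mathrm{d}\boldsymbol{X})^T$, which is why the error shrinks quadratically as `eps` decreases.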
3. Matrix differential
3.1 Meaning of the matrix differential
$\boldsymbol{X}_{m\times n}$ can be regarded as a real matrix function whose matrix variable is itself. Each of its entries is $x_{ij}$, and the total differential of each entry is $\mathrm{d}x_{ij}$, so the total differential of the matrix $\boldsymbol{X}_{m\times n}$ is
$$
\mathrm{d}\boldsymbol{X}_{m\times n}=
\begin{bmatrix}
\mathrm{d}x_{11} & \mathrm{d}x_{12} & \cdots & \mathrm{d}x_{1n}\\
\mathrm{d}x_{21} & \mathrm{d}x_{22} & \cdots & \mathrm{d}x_{2n}\\
\vdots & \vdots & \vdots & \vdots\\
\mathrm{d}x_{m1} & \mathrm{d}x_{m2} & \cdots & \mathrm{d}x_{mn}
\end{bmatrix}_{m\times n}
$$
As noted earlier, a vector can be seen as a special matrix, so the matrix differential of the vector $\vec{x}=[x_1,x_2,\cdots,x_n]^T$ is
$$
\mathrm{d}\vec{x}=
\begin{bmatrix}\mathrm{d}x_1\\ \mathrm{d}x_2\\ \vdots\\ \mathrm{d}x_n\end{bmatrix}_{n\times 1}
$$
The basic rules of the matrix differential given above also apply to the differentials of the matrix $\boldsymbol{X}_{m\times n}$ and the vector $\vec{x}=[x_1,x_2,\cdots,x_n]^T$.
Now return to the total differential of a real-valued scalar function of a matrix variable:
$$
\mathrm{d}f(\boldsymbol{X}) = \sum_{i=1}^m\sum_{j=1}^n\frac{\partial f}{\partial x_{ij}}\mathrm{d}x_{ij}
= \operatorname{tr}\left(
\begin{bmatrix}
\frac{\partial f}{\partial x_{11}} & \frac{\partial f}{\partial x_{21}} & \cdots & \frac{\partial f}{\partial x_{m1}} \\
\frac{\partial f}{\partial x_{12}} & \frac{\partial f}{\partial x_{22}} & \cdots & \frac{\partial f}{\partial x_{m2}} \\
\vdots & \vdots & \vdots & \vdots \\
\frac{\partial f}{\partial x_{1n}} & \frac{\partial f}{\partial x_{2n}} & \cdots & \frac{\partial f}{\partial x_{mn}}
\end{bmatrix}_{n\times m}
\begin{bmatrix}
\mathrm{d}x_{11} & \mathrm{d}x_{12} & \cdots & \mathrm{d}x_{1n} \\
\mathrm{d}x_{21} & \mathrm{d}x_{22} & \cdots & \mathrm{d}x_{2n} \\
\vdots & \vdots & \vdots & \vdots \\
\mathrm{d}x_{m1} & \mathrm{d}x_{m2} & \cdots & \mathrm{d}x_{mn}
\end{bmatrix}_{m\times n}
\right)
\tag{20}
$$
Looking at this result, the first factor inside $\operatorname{tr}$ is the Jacobian matrix introduced earlier:
$$
\begin{aligned}
\mathrm{D}_{\boldsymbol{X}}f(\boldsymbol{X}) &= \frac{\partial f(\boldsymbol{X})}{\partial \boldsymbol{X}^T_{m\times n}} \\
&=
\begin{bmatrix}
\frac{\partial f}{\partial x_{11}} & \frac{\partial f}{\partial x_{21}} & \cdots & \frac{\partial f}{\partial x_{m1}} \\
\frac{\partial f}{\partial x_{12}} & \frac{\partial f}{\partial x_{22}} & \cdots & \frac{\partial f}{\partial x_{m2}} \\
\vdots & \vdots & \vdots & \vdots \\
\frac{\partial f}{\partial x_{1n}} & \frac{\partial f}{\partial x_{2n}} & \cdots & \frac{\partial f}{\partial x_{mn}}
\end{bmatrix}_{n\times m}
\end{aligned}
$$
and the second factor is the total differential of the matrix $\boldsymbol{X}_{m\times n}$:
$$
\mathrm{d}\boldsymbol{X}_{m\times n}=
\begin{bmatrix}
\mathrm{d}x_{11} & \mathrm{d}x_{12} & \cdots & \mathrm{d}x_{1n}\\
\mathrm{d}x_{21} & \mathrm{d}x_{22} & \cdots & \mathrm{d}x_{2n}\\
\vdots & \vdots & \vdots & \vdots\\
\mathrm{d}x_{m1} & \mathrm{d}x_{m2} & \cdots & \mathrm{d}x_{mn}
\end{bmatrix}_{m\times n}
$$
The total differential of a real-valued scalar function of a matrix variable can therefore be written as
$$
\mathrm{d}f(\boldsymbol{X})=\operatorname{tr}\left(\frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}^T}\mathrm{d}\boldsymbol{X}\right)
$$
So for a real-valued scalar function of a matrix variable, computing the derivative comes down to finding $\frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}^T}$: once $\mathrm{d}f(\boldsymbol{X})$ has been written in the form above, the computation is done. You may wonder whether this decomposition is unique. It is: if
$$
\mathrm{d}f(\boldsymbol{X})=\operatorname{tr}(\boldsymbol{A}_1\mathrm{d}\boldsymbol{X})=\operatorname{tr}(\boldsymbol{A}_2\mathrm{d}\boldsymbol{X})
$$
then $\boldsymbol{A}_1=\boldsymbol{A}_2$.
Since a vector can be seen as a special matrix, the total differential of a real-valued scalar function of a vector variable can likewise be written as
$$
\mathrm{d}f(\vec{x})=\operatorname{tr}\left(\frac{\partial f(\vec{x})}{\partial\vec{x}^T}\mathrm{d}\vec{x}\right)
$$
When the matrix variable $\boldsymbol{X}$ degenerates into a column vector $\vec{x}$, we have
$$
\frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}^T}=\frac{\partial f(\vec{x})}{\partial\vec{x}^T},\qquad \mathrm{d}\boldsymbol{X}=\mathrm{d}\vec{x}
$$
So for a real-valued scalar function, whether the variable is a vector or a matrix, the differential can always be handled in the form
$$
\mathrm{d}f(\boldsymbol{X})=\operatorname{tr}\left(\frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}^T}\mathrm{d}\boldsymbol{X}\right)
$$
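The identity $\mathrm{d}f(\boldsymbol{X})=\operatorname{tr}\left(\frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}^T}\mathrm{d}\boldsymbol{X}\right)$ can be illustrated numerically. The sketch below assumes a test function chosen purely for illustration, $f(\boldsymbol{X})=\sum_{i,j}x_{ij}^3$, whose partial derivatives are $3x_{ij}^2$; the helper names and sample values are also assumptions.

```python
# Compare the actual change of f with the trace formula tr((df/dX)^T dX)
# for the illustrative function f(X) = sum of cubes of the entries.

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def f(X):
    """Illustrative scalar function of a matrix: sum of x_ij ** 3."""
    return sum(x ** 3 for row in X for x in row)

def grad_f(X):
    """df/dX, laid out like X: entry (i, j) is 3 * x_ij ** 2."""
    return [[3 * x ** 2 for x in row] for row in X]

X = [[1.0, 2.0, -1.0], [0.5, 3.0, 2.0]]      # 2 x 3
D = [[1.0, -2.0, 0.5], [3.0, 1.0, -1.0]]     # perturbation direction
eps = 1e-6
dX = [[eps * d for d in row] for row in D]
X_new = [[x + d for x, d in zip(rx, rd)] for rx, rd in zip(X, dX)]

actual = f(X_new) - f(X)
# df/dX^T is the transpose of grad_f(X), an n x m matrix
predicted = trace(matmul(transpose(grad_f(X)), dX))
gap = abs(actual - predicted)

print(actual, predicted, gap)
```

The two numbers agree up to a term of second order in `eps`; passing an $n\times 1$ matrix as `X` gives the vector-variable case.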
Below are a few commonly used formulas. They are worth memorizing, since they appear frequently in related papers.
Sandwich formula
$$
\mathrm{d}(\boldsymbol{A}\boldsymbol{X}\boldsymbol{B})=\boldsymbol{A}\,\mathrm{d}(\boldsymbol{X})\,\boldsymbol{B}
$$
where $\boldsymbol{A}_{p\times m}$ and $\boldsymbol{B}_{n\times q}$ are constant matrices.
Proof: by the product rule,
$$
\mathrm{d}(\boldsymbol{A}\boldsymbol{X}\boldsymbol{B})=\mathrm{d}(\boldsymbol{A})\boldsymbol{X}\boldsymbol{B}+\boldsymbol{A}\,\mathrm{d}(\boldsymbol{X})\,\boldsymbol{B}+\boldsymbol{A}\boldsymbol{X}\,\mathrm{d}\boldsymbol{B}
$$
Since the differential of a constant matrix is zero, $\mathrm{d}\boldsymbol{A}=\boldsymbol{0}_{p\times m}$ and $\mathrm{d}\boldsymbol{B}=\boldsymbol{0}_{n\times q}$. Substituting these into the right-hand side leaves only $\boldsymbol{A}\,\mathrm{d}(\boldsymbol{X})\,\boldsymbol{B}$, which proves the formula.
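Because $\boldsymbol{A}\boldsymbol{X}\boldsymbol{B}$ is linear in $\boldsymbol{X}$, the sandwich formula holds exactly even for a finite increment, which makes it easy to check with integer matrices. The following is a minimal sketch; the helper names and sample matrices are illustrative assumptions.

```python
# Exact check of d(AXB) = A dX B using integer matrices.
# Names and sample values are illustrative assumptions.

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 2], [0, -1]]                 # constant, 2 x 2
B = [[1, 0], [2, 1], [0, -3]]         # constant, 3 x 2
X = [[1, 0, 2], [3, 1, -1]]           # variable, 2 x 3
dX = [[2, -1, 0], [1, 4, 3]]          # increment of X, 2 x 3

# Increment of AXB when X changes to X + dX
increment = sub(matmul(matmul(A, add(X, dX)), B), matmul(matmul(A, X), B))

# Right-hand side of the sandwich formula
predicted = matmul(matmul(A, dX), B)

print(increment == predicted)
```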
Determinant
$$
\mathrm{d}|\boldsymbol{X}|=|\boldsymbol{X}|\operatorname{tr}(\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X})=\operatorname{tr}(|\boldsymbol{X}|\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X})
$$
where $\boldsymbol{X}_{n\times n}$ is an invertible $n\times n$ matrix.
Proof: the determinant is a real-valued scalar function, so the formula for a real-valued scalar function of a matrix variable applies:
$$
\mathrm{d}f(\boldsymbol{X})=\operatorname{tr}\left(\frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}^T}\mathrm{d}\boldsymbol{X}\right)
$$
From linear algebra, a determinant can be expanded along a row: multiply each entry of the row by its cofactor and add up. Expanding along row $i$ of $\boldsymbol{X}$ gives
$$
|\boldsymbol{X}|=x_{i1}A_{i1}+x_{i2}A_{i2}+\cdots+x_{in}A_{in}
$$
None of the cofactors $A_{i1},\cdots,A_{in}$ of row $i$ contains $x_{ij}$, so the partial derivative of the determinant with respect to the entry $x_{ij}$ is the cofactor of that entry:
$$
\frac{\partial|\boldsymbol{X}|}{\partial x_{ij}}=A_{ij}
$$
The derivative of the determinant with respect to the matrix is therefore
$$
\frac{\partial|\boldsymbol{X}|}{\partial\boldsymbol{X}^T}=
\begin{bmatrix}
A_{11} & A_{21} & \cdots & A_{n1}\\
A_{12} & A_{22} & \cdots & A_{n2}\\
\vdots & \vdots & \ddots & \vdots\\
A_{1n} & A_{2n} & \cdots & A_{nn}
\end{bmatrix}
$$
This is exactly the adjugate matrix $\boldsymbol{X}^*$. By the relation between the adjugate, the determinant, and the inverse,
$$
\boldsymbol{X}^{-1}=\frac{\boldsymbol{X}^*}{|\boldsymbol{X}|}
$$
so $\frac{\partial|\boldsymbol{X}|}{\partial\boldsymbol{X}^T}=|\boldsymbol{X}|\boldsymbol{X}^{-1}$. Substituting this into $\mathrm{d}f(\boldsymbol{X})=\operatorname{tr}\left(\frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}^T}\mathrm{d}\boldsymbol{X}\right)$ gives
$$
\begin{aligned}
\mathrm{d}|\boldsymbol{X}| &= \operatorname{tr}\left(\frac{\partial|\boldsymbol{X}|}{\partial\boldsymbol{X}^T}\mathrm{d}\boldsymbol{X}\right) \\
&= \operatorname{tr}(|\boldsymbol{X}|\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X})
\end{aligned}
$$
Because the determinant is a scalar, the linearity of the trace lets us pull it outside the trace:
$$
\mathrm{d}|\boldsymbol{X}|=|\boldsymbol{X}|\operatorname{tr}(\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X})=\operatorname{tr}(|\boldsymbol{X}|\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X})
$$
This completes the proof.
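The determinant formula can be checked on a $2\times 2$ matrix, where the determinant and inverse have simple closed forms. This is only a sketch under that $2\times 2$ assumption; the function names `det2` and `inv2` and the sample values are illustrative.

```python
# Check d|X| = |X| tr(X^{-1} dX) on a 2 x 2 example with a small increment.
# det2 / inv2 are illustrative helpers valid only for 2 x 2 matrices.

def det2(X):
    """Determinant of a 2 x 2 matrix."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inv2(X):
    """Inverse of a 2 x 2 matrix: adjugate divided by the determinant."""
    d = det2(X)
    return [[X[1][1] / d, -X[0][1] / d],
            [-X[1][0] / d, X[0][0] / d]]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

X = [[2.0, 1.0], [0.5, 3.0]]
D = [[1.0, -2.0], [0.5, 3.0]]        # perturbation direction
eps = 1e-6
dX = [[eps * d for d in row] for row in D]
X_new = [[x + d for x, d in zip(rx, rd)] for rx, rd in zip(X, dX)]

actual = det2(X_new) - det2(X)
predicted = det2(X) * trace(matmul(inv2(X), dX))
gap = abs(actual - predicted)

print(actual, predicted, gap)
```

For a $2\times 2$ matrix the gap is exactly $|\mathrm{d}\boldsymbol{X}|$ up to rounding, a second-order quantity in `eps`.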
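Inverse matrix
$$
\mathrm{d}(\boldsymbol{X}^{-1})=-\boldsymbol{X}^{-1}\mathrm{d}(\boldsymbol{X})\boldsymbol{X}^{-1}
$$
where $\boldsymbol{X}_{n\times n}$ is an invertible $n\times n$ matrix.
Proof: we have $\boldsymbol{X}\boldsymbol{X}^{-1}=\boldsymbol{E}$, where $\boldsymbol{E}$ is the identity matrix.
The differential of a constant matrix is zero, so taking the matrix differential of both sides and applying the product rule gives
$$
\mathrm{d}(\boldsymbol{X})\boldsymbol{X}^{-1}+\boldsymbol{X}\mathrm{d}(\boldsymbol{X}^{-1})=\boldsymbol{0}
$$
Left-multiplying both sides by $\boldsymbol{X}^{-1}$ gives $\boldsymbol{X}^{-1}\mathrm{d}(\boldsymbol{X})\boldsymbol{X}^{-1}+\mathrm{d}(\boldsymbol{X}^{-1})=\boldsymbol{0}$, which is the result.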
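The inverse-matrix formula can be checked the same way on a $2\times 2$ example. The sketch below is restricted to $2\times 2$ matrices so that the inverse has a closed form; `inv2` and the sample values are illustrative assumptions.

```python
# Check d(X^{-1}) = -X^{-1} dX X^{-1} on a 2 x 2 example.
# inv2 is an illustrative helper valid only for 2 x 2 matrices.

def inv2(X):
    """Inverse of a 2 x 2 matrix via the adjugate."""
    d = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / d, -X[0][1] / d],
            [-X[1][0] / d, X[0][0] / d]]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def max_abs(A):
    return max(abs(x) for row in A for x in row)

X = [[2.0, 1.0], [0.5, 3.0]]
D = [[1.0, -2.0], [0.5, 3.0]]        # perturbation direction
eps = 1e-6
dX = [[eps * d for d in row] for row in D]
X_new = [[x + d for x, d in zip(rx, rd)] for rx, rd in zip(X, dX)]

# Actual change of the inverse
actual = sub(inv2(X_new), inv2(X))

# First-order prediction -X^{-1} dX X^{-1}
X_inv = inv2(X)
predicted = [[-v for v in row] for row in matmul(matmul(X_inv, dX), X_inv)]

error = max_abs(sub(actual, predicted))
print(error)
```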
3.2 Matrix differential examples
A real-valued scalar function $f(\boldsymbol{X})$ satisfies $\operatorname{tr}(f(\boldsymbol{X}))=f(\boldsymbol{X})$ and $\mathrm{d}f(\boldsymbol{X})=\operatorname{tr}(\mathrm{d}f(\boldsymbol{X}))$, so
$$
\mathrm{d}f(\boldsymbol{X})=\mathrm{d}(\operatorname{tr}f(\boldsymbol{X}))=\operatorname{tr}(\mathrm{d}f(\boldsymbol{X}))
$$
If the real-valued scalar function is itself the trace of some matrix function $\boldsymbol{F}_{p\times p}(\boldsymbol{X})$, that is $\operatorname{tr}\boldsymbol{F}(\boldsymbol{X})$, then by the linearity of the total differential
$$
\mathrm{d}(\operatorname{tr}\boldsymbol{F}_{p\times p}(\boldsymbol{X}))=\mathrm{d}\left(\sum_{i=1}^p f_{ii}(\boldsymbol{X})\right)=\sum_{i=1}^p\mathrm{d}(f_{ii}(\boldsymbol{X}))=\operatorname{tr}(\mathrm{d}\boldsymbol{F}_{p\times p}(\boldsymbol{X}))
$$
The following examples show how to use the matrix differential to compute derivatives.
Example 1
$$
\frac{\partial(\vec{a}^T\boldsymbol{X}\boldsymbol{X}^T\vec{b})}{\partial\boldsymbol{X}}=\vec{a}\vec{b}^T\boldsymbol{X}+\vec{b}\vec{a}^T\boldsymbol{X}
$$
where $\vec{a}_{m\times 1}$ and $\vec{b}_{m\times 1}$ are constant vectors, $\vec{a}=(a_1,a_2,\cdots,a_m)^T$, $\vec{b}=(b_1,b_2,\cdots,b_m)^T$, and the matrix $\boldsymbol{X}$ is $m\times n$.
Proof: from note (1) of this series, this is the case of a scalar numerator and a matrix denominator, so the result is $m\times n$.
$$
\begin{aligned}
\mathrm{d}(\vec{a}^T\boldsymbol{X}\boldsymbol{X}^T\vec{b}) &\overset{\text{tr of a scalar differential}}{=} \operatorname{tr}(\mathrm{d}(\vec{a}^T\boldsymbol{X}\boldsymbol{X}^T\vec{b}))\\
&\overset{\text{sandwich formula}}{=} \operatorname{tr}(\vec{a}^T\mathrm{d}(\boldsymbol{X}\boldsymbol{X}^T)\vec{b})\\
&\overset{\text{product rule}}{=} \operatorname{tr}[\vec{a}^T(\mathrm{d}(\boldsymbol{X})\boldsymbol{X}^T+\boldsymbol{X}\mathrm{d}\boldsymbol{X}^T)\vec{b}]\\
&\overset{\text{linearity of tr}}{=} \operatorname{tr}(\vec{a}^T\mathrm{d}(\boldsymbol{X})\boldsymbol{X}^T\vec{b})+\operatorname{tr}(\vec{a}^T\boldsymbol{X}\mathrm{d}(\boldsymbol{X}^T)\vec{b})\\
&\overset{\text{transpose rule of d}}{=} \operatorname{tr}(\vec{a}^T\mathrm{d}(\boldsymbol{X})\boldsymbol{X}^T\vec{b})+\operatorname{tr}(\vec{a}^T\boldsymbol{X}(\mathrm{d}\boldsymbol{X})^T\vec{b})\\
&\overset{\text{transpose and exchange rules of tr}}{=} \operatorname{tr}(\boldsymbol{X}^T\vec{b}\vec{a}^T\mathrm{d}\boldsymbol{X})+\operatorname{tr}(\boldsymbol{X}^T\vec{a}\vec{b}^T\mathrm{d}\boldsymbol{X})\\
&\overset{\text{linearity of tr}}{=} \operatorname{tr}((\boldsymbol{X}^T\vec{b}\vec{a}^T+\boldsymbol{X}^T\vec{a}\vec{b}^T)\mathrm{d}\boldsymbol{X})
\end{aligned}
$$
Comparing this with $\mathrm{d}f(\boldsymbol{X})=\operatorname{tr}\left(\frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}^T}\mathrm{d}\boldsymbol{X}\right)$ for $f(\boldsymbol{X})=\vec{a}^T\boldsymbol{X}\boldsymbol{X}^T\vec{b}$ gives
$$
\begin{aligned}
\frac{\partial(\vec{a}^T\boldsymbol{X}\boldsymbol{X}^T\vec{b})}{\partial\boldsymbol{X}^T} &= \boldsymbol{X}^T\vec{b}\vec{a}^T+\boldsymbol{X}^T\vec{a}\vec{b}^T\\
\frac{\partial(\vec{a}^T\boldsymbol{X}\boldsymbol{X}^T\vec{b})}{\partial\boldsymbol{X}} &= \vec{a}\vec{b}^T\boldsymbol{X}+\vec{b}\vec{a}^T\boldsymbol{X}
\end{aligned}
$$
This completes the proof.
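The result $\vec{a}\vec{b}^T\boldsymbol{X}+\vec{b}\vec{a}^T\boldsymbol{X}$ for $f(\boldsymbol{X})=\vec{a}^T\boldsymbol{X}\boldsymbol{X}^T\vec{b}$ can be compared against a central-difference gradient. The sketch below is illustrative only: the helper names (`xt_vec`, `grad_formula`, `grad_numeric`), the step `h`, and the sample values are assumptions.

```python
# Compare the closed-form gradient a b^T X + b a^T X of f(X) = a^T X X^T b
# with a central-difference estimate. Names and values are illustrative.

def xt_vec(X, v):
    """Compute X^T v for an m x n matrix X and a length-m vector v."""
    return [sum(v[i] * X[i][j] for i in range(len(X)))
            for j in range(len(X[0]))]

def f(X, a, b):
    """f(X) = a^T X X^T b, computed as the dot product (X^T a) . (X^T b)."""
    return sum(u * v for u, v in zip(xt_vec(X, a), xt_vec(X, b)))

def grad_formula(X, a, b):
    """Entry (i, j) of a b^T X + b a^T X is a_i (X^T b)_j + b_i (X^T a)_j."""
    u, v = xt_vec(X, a), xt_vec(X, b)
    return [[a[i] * v[j] + b[i] * u[j] for j in range(len(X[0]))]
            for i in range(len(X))]

def grad_numeric(X, a, b, h=1e-5):
    """Central differences, one entry of X at a time."""
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp, a, b) - f(Xm, a, b)) / (2 * h)
    return G

a = [1.0, -2.0]                               # m = 2
b = [0.5, 3.0]                                # m = 2
X = [[1.0, 2.0, -1.0], [0.5, 3.0, 2.0]]       # 2 x 3

G_formula = grad_formula(X, a, b)
G_numeric = grad_numeric(X, a, b)
max_diff = max(abs(p - q) for rp, rq in zip(G_formula, G_numeric)
               for p, q in zip(rp, rq))
print(max_diff)
```

Since $f$ is quadratic in the entries of $\boldsymbol{X}$, the central difference has no truncation error here, so the remaining difference is floating-point rounding only.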
Example 2
$$
\frac{\partial\operatorname{tr}(\boldsymbol{X}^T\boldsymbol{X})}{\partial\boldsymbol{X}}=2\boldsymbol{X}
$$
Proof:
$$
\begin{aligned}
\mathrm{d}(\operatorname{tr}(\boldsymbol{X}^T\boldsymbol{X})) &\overset{\text{d and tr interchange}}{=} \operatorname{tr}(\mathrm{d}(\boldsymbol{X}^T\boldsymbol{X}))\\
&\overset{\text{product rule}}{=} \operatorname{tr}(\mathrm{d}(\boldsymbol{X}^T)\boldsymbol{X}+\boldsymbol{X}^T\mathrm{d}(\boldsymbol{X}))\\
&\overset{\text{linearity of tr}}{=} \operatorname{tr}(\mathrm{d}(\boldsymbol{X}^T)\boldsymbol{X})+\operatorname{tr}(\boldsymbol{X}^T\mathrm{d}(\boldsymbol{X}))\\
&\overset{\text{transpose rule of d}}{=} \operatorname{tr}((\mathrm{d}(\boldsymbol{X}))^T\boldsymbol{X})+\operatorname{tr}(\boldsymbol{X}^T\mathrm{d}(\boldsymbol{X}))\\
&\overset{\text{transpose rule of tr}}{=} 2\operatorname{tr}(\boldsymbol{X}^T\mathrm{d}\boldsymbol{X})\\
&\overset{\text{linearity of tr}}{=} \operatorname{tr}(2\boldsymbol{X}^T\mathrm{d}\boldsymbol{X})
\end{aligned}
$$
Comparing this with $\mathrm{d}f(\boldsymbol{X})=\operatorname{tr}\left(\frac{\partial f(\boldsymbol{X})}{\partial\boldsymbol{X}^T}\mathrm{d}\boldsymbol{X}\right)$ for $f(\boldsymbol{X})=\operatorname{tr}(\boldsymbol{X}^T\boldsymbol{X})$ gives
$$
\begin{aligned}
\frac{\partial\operatorname{tr}(\boldsymbol{X}^T\boldsymbol{X})}{\partial\boldsymbol{X}^T} &= 2\boldsymbol{X}^T\\
\frac{\partial\operatorname{tr}(\boldsymbol{X}^T\boldsymbol{X})}{\partial\boldsymbol{X}} &= 2\boldsymbol{X}
\end{aligned}
$$
This completes the proof.
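The result of Example 2 can be checked numerically as well. The sketch below evaluates $\operatorname{tr}(\boldsymbol{X}^T\boldsymbol{X})$ directly from the matrix product and compares a central-difference gradient with $2\boldsymbol{X}$; the helper names, step `h`, and sample matrix are illustrative assumptions.

```python
# Compare a central-difference gradient of f(X) = tr(X^T X) with 2X.
# Names and sample values are illustrative assumptions.

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def f(X):
    """f(X) = tr(X^T X), i.e. the sum of squared entries of X."""
    return trace(matmul(transpose(X), X))

def grad_numeric(f, X, h=1e-5):
    """Central differences, one entry of X at a time."""
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

X = [[1.0, 2.0, -1.0], [0.5, 3.0, 2.0]]       # 2 x 3
G_numeric = grad_numeric(f, X)
G_formula = [[2 * x for x in row] for row in X]

max_diff = max(abs(p - q) for rp, rq in zip(G_formula, G_numeric)
               for p, q in zip(rp, rq))
print(max_diff)
```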
Example 3
$$
\frac{\partial\log|\boldsymbol{X}|}{\partial\boldsymbol{X}}=(\boldsymbol{X}^{-1})^T
$$
where $\boldsymbol{X}$ is $n\times n$.
Emmm, I won't go through this one here; please allow me to be a little lazy - -
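For readers who want to see the skipped step, one plausible route (a sketch, assuming $|\boldsymbol{X}|>0$ so that the logarithm is defined) is the scalar chain rule combined with the determinant formula: $\mathrm{d}\log|\boldsymbol{X}|=\frac{1}{|\boldsymbol{X}|}\mathrm{d}|\boldsymbol{X}|=\operatorname{tr}(\boldsymbol{X}^{-1}\mathrm{d}\boldsymbol{X})$, hence $\frac{\partial\log|\boldsymbol{X}|}{\partial\boldsymbol{X}^T}=\boldsymbol{X}^{-1}$ and $\frac{\partial\log|\boldsymbol{X}|}{\partial\boldsymbol{X}}=(\boldsymbol{X}^{-1})^T$. The code below checks this numerically on a non-symmetric $2\times 2$ matrix; `det2`, `inv2`, and the sample values are illustrative assumptions.

```python
# Compare a central-difference gradient of f(X) = log|X| with (X^{-1})^T
# on a 2 x 2 matrix with positive determinant. Names are illustrative.
import math

def det2(X):
    """Determinant of a 2 x 2 matrix."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inv2(X):
    """Inverse of a 2 x 2 matrix via the adjugate."""
    d = det2(X)
    return [[X[1][1] / d, -X[0][1] / d],
            [-X[1][0] / d, X[0][0] / d]]

def transpose(A):
    return [list(col) for col in zip(*A)]

def f(X):
    """f(X) = log|X|; requires det2(X) > 0."""
    return math.log(det2(X))

def grad_numeric(f, X, h=1e-6):
    """Central differences, one entry of X at a time."""
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

X = [[2.0, 1.0], [0.5, 3.0]]          # non-symmetric, det = 5.5
G_numeric = grad_numeric(f, X)
G_formula = transpose(inv2(X))

max_diff = max(abs(p - q) for rp, rq in zip(G_formula, G_numeric)
               for p, q in zip(rp, rq))
print(max_diff)
```

A non-symmetric matrix is used on purpose, so that confusing $\boldsymbol{X}^{-1}$ with $(\boldsymbol{X}^{-1})^T$ would show up as a visible mismatch.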
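At this point, the first-order matrix derivative of any real-valued scalar function of a matrix or vector variable can be computed with the method in this post. We have only considered real-valued functions and first-order derivatives. Higher-order matrix differentials can also be defined, and the number field can be extended to the complex numbers, but that is a topic for later; I will study it if the need arises.
That's all for now. If you want more examples, you can find them in the original article listed in the references. After all, I'm only here to learn.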
References
矩阵求导公式的数学推导（矩阵求导——进阶篇）
张贤达 (Zhang Xianda), 《矩阵分析与应用（第二版）》 (Matrix Analysis and Applications, 2nd edition)