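Contents
- 1. Overview
- 2. Taylor's formula
- 3. The Jacobian matrix
- 4. Classical Newton's method
- 4.1 Theory of the classical Newton's method
- 4.2 Finding the roots of an equation with Newton's iteration
- 4.3 Newton's iteration for root finding in Python
- 5. Gradient descent and the classical Newton's method
- 5.1 Line search method
- 5.2 Classical Newton's method
- 6. Convex optimization
- 6.1 Constrained problems
- 6.2 Combining convex sets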
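These notes follow a lecture video by an MIT professor on minimizing a function step by step.
1. Overview
The topic is finding the minimum of a function in the unconstrained case. It involves:
- Matrix calculus
- Taylor's formula, carried over from scalar arguments to vector arguments
- Gradient descent
- Newton's method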
2. Taylor's formula
Recall from calculus the Taylor expansion of $f(x)$ about a point $a$.
 Definition: the remainder coefficient $h_k(x)$ satisfies $\lim\limits_{x\to a}h_k(x)=0$, and
 \begin{equation} f(x)=f(a)+f'(a)(x-a)+\frac{f''(a)}{2!}(x-a)^2+\cdots+\frac{f^{(k)}(a)}{k!}(x-a)^k+h_k(x)(x-a)^k \end{equation}
- Keeping terms up to second order and substituting $x \rightarrow x+\Delta x$, $a \rightarrow x$ gives:
 \begin{equation} f(x+\Delta x)\approx f(x)+f'(x)\Delta x+\frac{f''(x)}{2!}\Delta x^2 \end{equation}
- In the formula above $x$ is a scalar; we now need the version where $x$ is a vector.
- When $a,b$ are column vectors and $S$ is a symmetric matrix, we have:
 \begin{equation} a^Tb=c,\quad x^TSx=d,\quad \text{where } c,d \text{ are scalars} \end{equation}
- Define:
 \begin{equation} x=\begin{bmatrix}x_1&x_2&\cdots&x_n\end{bmatrix}^T,\quad f=\begin{bmatrix}f_1&f_2&\cdots&f_n\end{bmatrix}^T \end{equation}
 \begin{equation} f'(x)=\nabla F=\begin{bmatrix}\frac{\partial F}{\partial x_1}&\frac{\partial F}{\partial x_2}&\cdots&\frac{\partial F}{\partial x_n}\end{bmatrix}^T \rightarrow f'(x)\Delta x=(\Delta x)^T \nabla F(x) \end{equation}
- $H_{jk}$ is the Hessian matrix, which is symmetric when the second partial derivatives are continuous:
 \begin{equation} f''(x)=H_{jk}=\frac{\partial^2F}{\partial x_j\,\partial x_k}\rightarrow \frac{f''(x)}{2!}\Delta x^2=\frac{1}{2}(\Delta x)^T H_{jk}(\Delta x) \end{equation}
- Collecting the terms above gives:
 \begin{equation} F(x+\Delta x)\approx F(x)+(\Delta x)^T \nabla F(x)+\frac{1}{2}(\Delta x)^T H_{jk}(\Delta x) \end{equation}
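The vector form of the second-order expansion can be checked numerically. The sketch below is only an illustration: the test function `F`, its hand-derived gradient `grad_F` and Hessian `hess_F`, and the helper `taylor2` are all assumptions made for this example and do not come from the lecture. If the expansion is right, the approximation error should shrink roughly like the cube of the step size.

```python
import numpy as np

def F(x):
    # Hypothetical test function F(x1, x2) = x1^2 + 3*x1*x2 + exp(x2)
    return x[0] ** 2 + 3.0 * x[0] * x[1] + np.exp(x[1])

def grad_F(x):
    # Gradient of F, derived by hand
    return np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0] + np.exp(x[1])])

def hess_F(x):
    # Hessian of F, derived by hand; note that it is symmetric
    return np.array([[2.0, 3.0], [3.0, np.exp(x[1])]])

def taylor2(x, dx):
    # F(x) + dx^T grad F(x) + 1/2 dx^T H dx
    return F(x) + dx @ grad_F(x) + 0.5 * dx @ hess_F(x) @ dx

x = np.array([1.0, 0.5])
direction = np.array([1.0, -2.0])
errors = []
for t in (1e-1, 1e-2, 1e-3):
    dx = t * direction
    errors.append(abs(F(x + dx) - taylor2(x, dx)))
    print(f"step scale {t:g}: error = {errors[-1]:.3e}")
```

Each tenfold reduction of the step should cut the error by a factor of roughly one thousand, which is what one expects when the first neglected term is of third order.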
3. The Jacobian matrix
Suppose we have an $m$-dimensional vector-valued function $f(x)=\begin{bmatrix}f_1(x)&f_2(x)&\cdots&f_m(x)\end{bmatrix}^T$ (a column vector), where $x=\begin{bmatrix}x_1&x_2&\cdots&x_n\end{bmatrix}^T$ is an $n$-dimensional input vector. The Jacobian matrix $\mathrm{J}$ is an $m\times n$ matrix whose entries are the partial derivatives of the function: the entry in row $i$, column $j$ is the partial derivative of $f_i(x)$ with respect to $x_j$:
 \begin{equation} J_{ij}=\frac{\partial f_i(x)}{\partial x_j} \end{equation}
- In essence, each component $f_i(x)$ is differentiated with respect to every element of $x$:
- Step 1: treat $f_i(x)$ as a scalar. Taking $\frac{\partial f_i(x)}{\partial X}$ in numerator layout follows the rule "scalars stay unchanged, vectors are stretched".
- By the XY stretching rule in numerator layout, $X$ (the denominator) is stretched horizontally and $Y$ (the numerator) is stretched vertically, which gives:
 \begin{equation} \frac{\partial f_i(x)}{\partial X}= \begin{bmatrix} \frac{\partial f_i(x)}{\partial x_1}& \frac{\partial f_i(x)}{\partial x_2}& \cdots& \frac{\partial f_i(x)}{\partial x_n} \end{bmatrix} \end{equation}
- Step 2: treat $f(x)$ as a vector. Taking $\frac{\partial f(x)}{\partial X}$ in numerator layout follows the same rule.
- Applying the XY stretching rule again, with $X$ stretched horizontally and $Y$ stretched vertically, gives:
 \begin{equation} \mathrm{J}= \begin{bmatrix} \frac{\partial f_1(x)}{\partial x_1}&\frac{\partial f_1(x)}{\partial x_2}&\cdots&\frac{\partial f_1(x)}{\partial x_n}\\ \frac{\partial f_2(x)}{\partial x_1}&\frac{\partial f_2(x)}{\partial x_2}&\cdots&\frac{\partial f_2(x)}{\partial x_n}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial f_m(x)}{\partial x_1}&\frac{\partial f_m(x)}{\partial x_2}&\cdots&\frac{\partial f_m(x)}{\partial x_n} \end{bmatrix} \end{equation}
- The first-order Taylor expansion is:
 \begin{equation} f(x+\Delta x)\approx f(x)+f'(x)\Delta x \end{equation}
- Written with the Jacobian matrix:
 \begin{equation} f(x+\Delta x)\approx f(x)+\mathrm{J}_{jk}\Delta x;\quad \mathrm{J}_{jk}=\frac{\partial f_j(x)}{\partial x_k} \end{equation}
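To make the row and column convention concrete, here is a small sketch that builds a $3\times 2$ Jacobian by hand and compares it with a central-difference approximation. The map `f`, the helper names `jacobian_exact` and `jacobian_fd`, and the step `h` are assumptions chosen for illustration only.

```python
import numpy as np

def f(x):
    # Hypothetical map from R^2 to R^3, chosen only for illustration
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def jacobian_exact(x):
    # Row i holds the partial derivatives of f_i; column j is d/dx_j
    return np.array([
        [x[1], x[0]],
        [np.cos(x[0]), 0.0],
        [0.0, 2.0 * x[1]],
    ])

def jacobian_fd(func, x, h=1e-6):
    """Approximate the m x n Jacobian with central differences."""
    x = np.asarray(x, dtype=float)
    m = func(x).size
    J = np.zeros((m, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        # Column j: how every output changes when only x_j moves
        J[:, j] = (func(x + step) - func(x - step)) / (2.0 * h)
    return J

x = np.array([1.0, 2.0])
J = jacobian_exact(x)
J_fd = jacobian_fd(f, x)
dx = np.array([1e-3, -2e-3])
# Error of the first-order model f(x) + J dx
linear_error = np.linalg.norm(f(x + dx) - (f(x) + J @ dx))

print("shape:", J.shape)
print("max |J - J_fd|:", np.abs(J - J_fd).max())
print("first-order error:", linear_error)
```

The shape comes out as $m\times n$ (three outputs, two inputs), and the first-order model is accurate up to terms of order $\|\Delta x\|^2$.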
4. Classical Newton's method
4.1 Theory of the classical Newton's method
We already know the second-order Taylor expansion of the function:
 \begin{equation} F(x+\Delta x)\approx F(x)+(\Delta x)^T \nabla F(x)+\frac{1}{2}(\Delta x)^T H_{jk}(\Delta x) \end{equation}
- If the minimum is attained at $x^*$, the derivative there is zero. Differentiating each term of the expansion with respect to $\Delta x$, with $x$ held fixed, gives:
 \begin{equation} \frac{\mathrm{d}F(x)}{\mathrm{d}\Delta x}=0;\quad \frac{\mathrm{d}\left[(\Delta x)^T \nabla F(x)\right]}{\mathrm{d}\Delta x}=\nabla F(x);\quad \frac{\mathrm{d}\left[\frac{1}{2}(\Delta x)^T H_{jk}(\Delta x)\right]}{\mathrm{d}\Delta x}=H_{jk}\Delta x \end{equation}
 \begin{equation} \frac{\mathrm{d}F(x+\Delta x)}{\mathrm{d}\Delta x}=0+\nabla F(x)+H_{jk}\Delta x=0 \end{equation}
- The Hessian of $F$ is the Jacobian of the gradient $\nabla F$, so $H_{jk}=\mathrm{J}_{jk}$. When it is invertible, setting $\Delta x=x_{k+1}-x_k$ gives:
 \begin{equation} -[H_{jk}]^{-1}\nabla F(x)=x_{k+1}-x_k\rightarrow x_{k+1}=x_k-[\mathrm{J}_{jk}]^{-1}\nabla F(x) \end{equation}
- Writing $\nabla F(x)=f(x_k)$ and $\mathrm{J}_{jk}=\mathrm{J}_{x_k}$, both evaluated at the current iterate $x_k$, the iteration becomes:
 \begin{equation} x_{k+1}=x_k-[\mathrm{J}_{x_k}]^{-1}f(x_k) \end{equation}
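The update $x_{k+1}=x_k-[\mathrm{J}_{x_k}]^{-1}f(x_k)$ can be sketched for a function of two variables as follows. Everything specific here is an assumption for illustration: the convex test function $F(x)=e^{x_1+x_2}+x_1^2+x_2^2$, the helper names `grad_F`, `hess_F` and `newton_minimize`, the starting point, and the tolerance. The sketch solves the linear system $\mathrm{J}\,\Delta x=-f$ with `numpy.linalg.solve` instead of forming the inverse, which gives the same step.

```python
import numpy as np

def grad_F(x):
    # f(x) in the text: gradient of the hypothetical F = exp(x1 + x2) + x1^2 + x2^2
    s = np.exp(x[0] + x[1])
    return np.array([s + 2.0 * x[0], s + 2.0 * x[1]])

def hess_F(x):
    # J in the text: Jacobian of the gradient, i.e. the Hessian of F
    s = np.exp(x[0] + x[1])
    return np.array([[s + 2.0, s], [s, s + 2.0]])

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        # Newton step: solve J dx = -f rather than computing J^{-1}
        dx = np.linalg.solve(hess(x), -grad(x))
        x = x + dx
        if np.linalg.norm(dx) < tol:
            return x, k + 1
    raise RuntimeError("Newton's method did not converge")

x_star, n_iter = newton_minimize(grad_F, hess_F, [1.0, 1.0])
print("minimizer:", x_star)
print("iterations:", n_iter)
print("gradient norm:", np.linalg.norm(grad_F(x_star)))
```

At the returned point the gradient is numerically zero, which is the stationarity condition derived above.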
4.2 Finding the roots of an equation with Newton's iteration
- Given $f(x)=x^2-9=0$, find the roots of the equation with Newton's iteration.
- From the iteration formula, with $f'(x_k)=\mathrm{J}_{x_k}=2x_k$ and $f(x_k)=x_k^2-9$:
 \begin{equation} x_{k+1}=x_k-[\mathrm{J}_{x_k}]^{-1}f(x_k)\rightarrow x_{k+1}=x_k-\frac{f(x_k)}{\mathrm{J}_{x_k}} \end{equation}
- Simplifying:
 \begin{equation} x_{k+1}=x_k-\frac{x_k^2-9}{2x_k}=\frac{1}{2}x_k+\frac{9}{2x_k} \end{equation}
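- Convergence criterion:
 Check whether the difference between the new approximation $x_{k+1}$ and the current value $x_k$ is smaller than a tolerance $\epsilon=10^{-10}$. If it is, the iteration is considered converged; otherwise keep iterating.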
- Starting from the initial value $x_0=2$, we get $x_1$:
 \begin{equation} x_{1}=\frac{1}{2}x_0+\frac{9}{2x_0}=3.25 \end{equation}
- Iterating again gives $x_2$:
 \begin{equation} x_{2}=\frac{1}{2}x_1+\frac{9}{2x_1}=3.0096153846153846 \end{equation}
- Iterating again gives $x_3$:
 \begin{equation} x_{3}=\frac{1}{2}x_2+\frac{9}{2x_2}=3.000015360039322 \end{equation}
- Iterating again gives $x_4$:
 \begin{equation} x_{4}=\frac{1}{2}x_3+\frac{9}{2x_3}=3.0000000000393214 \end{equation}
- This gives the root $x_1^*=3$ of $x^2-9=0$. Similarly, starting from $x_0=-2$ gives the other root $x_2^*=-3$.
4.3 Newton's iteration for root finding in Python
- Code:
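def newton_raphson(f, f_prime, x0, tol=1e-10, max_iter=100):
    """Find a root of f with Newton's iteration, starting from x0."""
    x = x0
    for i in range(max_iter):
        fx = f(x)
        fpx = f_prime(x)
        # A zero derivative makes the Newton step undefined
        if fpx == 0:
            raise ZeroDivisionError(f"f'(x) is zero at x = {x}")
        # Newton-Raphson iteration
        x_new = x - fx / fpx
        print(f"Iteration {i + 1}: x = {x_new}")
        # Converged once successive iterates differ by less than tol
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise ValueError("Newton-Raphson method did not converge")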
# Define the function and its first derivative
f = lambda x: x ** 2 - 9
f_prime = lambda x: 2 * x
# Initial guesses
initial_guesses = [2, -2]
# Find the roots
for x0 in initial_guesses:
    root = newton_raphson(f, f_prime, x0)
    print(f"The root starting from {x0} is: {root}")
- Output:
Iteration 1: x = 3.25
Iteration 2: x = 3.0096153846153846
Iteration 3: x = 3.000015360039322
Iteration 4: x = 3.0000000000393214
Iteration 5: x = 3.0
The root starting from 2 is: 3.0
Iteration 1: x = -3.25
Iteration 2: x = -3.0096153846153846
Iteration 3: x = -3.000015360039322
Iteration 4: x = -3.0000000000393214
Iteration 5: x = -3.0
The root starting from -2 is: -3.0
5. Gradient descent and the classical Newton's method
For unconstrained minimization there are generally two approaches:
5.1 Line search method
This uses first-order Taylor information; the search direction is the negative gradient:
- Iteration:
 \begin{equation} x_{k+1}=x_k +\alpha_k p_k \end{equation}
- Direction $p_k$: the negative gradient $-\nabla F$
- Step size: $\alpha_k=s_k$, called the learning rate in deep learning
- The update then becomes:
 \begin{equation} x_{k+1}=x_k -s_k \nabla F \end{equation}
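A minimal sketch of the update $x_{k+1}=x_k-s_k\nabla F$ with a constant step size is given below. The quadratic test function, the helper name `gradient_descent`, the step `0.15`, and the stopping rule on the gradient are all assumptions picked for illustration; a real line search would choose $s_k$ anew at every iteration.

```python
def grad_F(x):
    # Gradient of the hypothetical function F(x1, x2) = 0.5 * (x1^2 + 10 * x2^2)
    return (x[0], 10.0 * x[1])

def gradient_descent(grad, x0, step=0.15, tol=1e-8, max_iter=10_000):
    x = tuple(float(v) for v in x0)
    for k in range(max_iter):
        g = grad(x)
        # Stop once every gradient component is close to zero
        if max(abs(gi) for gi in g) < tol:
            return x, k
        # Move against the gradient: x_{k+1} = x_k - s * grad F(x_k)
        x = tuple(xi - step * gi for xi, gi in zip(x, g))
    raise RuntimeError("gradient descent did not converge")

x_min, n_steps = gradient_descent(grad_F, (4.0, 1.0))
print("minimizer:", x_min)
print("steps:", n_steps)
```

For this quadratic the fixed step has to stay below $2/10$, the reciprocal scale set by the largest curvature, or the $x_2$ component diverges. With a step that small the flat $x_1$ direction shrinks only by a factor of $0.85$ per iteration, so many steps are needed.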
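5.2 Classical Newton's method
This uses second-order Taylor information; the search direction is the Newton direction and the step size is fixed at $\alpha_k=1$.
- The iteration, assuming the Hessian matrix $H_{jk}$ is invertible:
 \begin{equation} x_{k+1}=x_k-[H_{jk}]^{-1}\nabla F(x) \end{equation}
- The classical Newton's method converges quadratically near the solution, which is very fast. For a detailed analysis, see the following blog post:
 [优化算法]经典牛顿法 ([Optimization Algorithms] Classical Newton's Method)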
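Quadratic convergence can be observed directly on the scalar example $f(x)=x^2-9$ from section 4.2. For that function the update satisfies $x_{k+1}-3=\frac{(x_k-3)^2}{2x_k}$, so the ratio of each error to the square of the previous error should approach $\frac{1}{6}$. The sketch below, with the hypothetical helper `newton_step`, prints that ratio.

```python
def newton_step(x):
    # One Newton update for f(x) = x^2 - 9, whose derivative is 2x
    return x - (x * x - 9.0) / (2.0 * x)

x = 2.0
errors = [abs(x - 3.0)]
for _ in range(4):
    x = newton_step(x)
    errors.append(abs(x - 3.0))

for k in range(1, len(errors)):
    # For quadratic convergence this ratio settles to a constant
    ratio = errors[k] / errors[k - 1] ** 2
    print(f"k={k}: error={errors[k]:.3e}, error/prev_error^2={ratio:.4f}")
```

The number of correct digits roughly doubles at every step, which matches the iterates listed in section 4.2.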
6. Convex optimization
6.1 Constrained problems
Let $f(x)$ be a convex function and $\mathrm{K}$ a convex set. Our goal is to find the minimum of the convex function $f(x)$ over $\mathrm{K}$:
 \begin{equation} \min\limits_{x\in \mathrm{K}} f(x),\quad \mathrm{K}:Ax=b \end{equation}
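- Picture $f(x)$ as a bowl: it stands for all the points inside the bowl and on its inner surface.
- We look for the minimum on the inner surface of the bowl, restricted to the points that satisfy the constraint $Ax=b$.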
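One way to see the effect of the constraint is to walk only along the feasible set. The sketch below is an assumption-laden illustration, not the lecture's method: it takes the hypothetical bowl $f(x)=\frac{1}{2}(2x_1^2+x_2^2)$ and the single constraint $x_1+x_2=3$, writes every feasible point as $(t,\,3-t)$, and applies the scalar Newton update from section 4 to the restricted function $g(t)=f(t,3-t)$.

```python
def f(x1, x2):
    # Hypothetical convex bowl: f(x) = 0.5 * (2*x1^2 + x2^2)
    return 0.5 * (2.0 * x1 ** 2 + x2 ** 2)

def feasible_point(t):
    # Every point with x1 + x2 = 3 can be written as (t, 3 - t)
    return t, 3.0 - t

def g_prime(t):
    # d/dt f(t, 3 - t) = 2t - (3 - t)
    return 3.0 * t - 3.0

def g_second(t):
    # Second derivative of the restricted function; constant for this bowl
    return 3.0

t = 0.0
for _ in range(20):
    # Newton update on the restricted one-dimensional problem
    step = g_prime(t) / g_second(t)
    t -= step
    if abs(step) < 1e-12:
        break

x_opt = feasible_point(t)
print("constrained minimizer:", x_opt)
print("f at minimizer:", f(*x_opt))
```

The unconstrained minimum of this bowl is at the origin, which violates $x_1+x_2=3$; the constrained minimum is the lowest point of the bowl along the feasible line.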
  
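6.2 Combining convex sets
- If $x_1,x_2$ both lie in the same convex set, the line segment $L$ joining them lies in that set.
- If $x_1,x_2$ lie in two different convex sets, the line segment $L$ joining them need not lie in either set.
- Summary: the union of convex sets is in general not convex.
- If $x_1,x_2$ both lie in the intersection of several convex sets, the line segment $L$ joining them lies in the intersection, so the intersection of convex sets is convex.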
- Suppose we have two convex functions $F_1(x),F_2(x)$, and define:
 \begin{equation} \min(x)=\min[F_1(x),F_2(x)];\quad \max(x)=\max[F_1(x),F_2(x)] \end{equation}
- The region above $\max(x)$ is the intersection of the regions above $F_1$ and $F_2$, while the region above $\min(x)$ is their union, so:
 \begin{equation} \min(x)=\min[F_1(x),F_2(x)]\rightarrow \text{not convex in general};\quad \max(x)=\max[F_1(x),F_2(x)]\rightarrow \text{convex} \end{equation}
- Convexity test for a twice-differentiable function of one variable:
 \begin{equation} \frac{\mathrm{d}^2f(x)}{\mathrm{d}x^2}\ge 0 \end{equation}
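The claim about the pointwise minimum and maximum can be spot-checked with the midpoint inequality $f\left(\frac{a+b}{2}\right)\le\frac{f(a)+f(b)}{2}$, which every convex function satisfies. The two parabolas, the grid, and the helper `violates_midpoint_convexity` below are assumptions chosen for illustration; passing the check on a grid is evidence, not a proof of convexity.

```python
def F1(x):
    # Hypothetical convex function with its minimum at x = 1
    return (x - 1.0) ** 2

def F2(x):
    # Hypothetical convex function with its minimum at x = -1
    return (x + 1.0) ** 2

def pointwise_min(x):
    return min(F1(x), F2(x))

def pointwise_max(x):
    return max(F1(x), F2(x))

def violates_midpoint_convexity(func, points, slack=1e-12):
    """Return a pair (a, b) with func((a+b)/2) > (func(a)+func(b))/2, or None."""
    for a in points:
        for b in points:
            mid = 0.5 * (a + b)
            if func(mid) > 0.5 * (func(a) + func(b)) + slack:
                return a, b
    return None

grid = [i / 4.0 for i in range(-12, 13)]  # sample points in [-3, 3]
print("min violates at:", violates_midpoint_convexity(pointwise_min, grid))
print("max violates at:", violates_midpoint_convexity(pointwise_max, grid))
```

For the minimum, the points $a=-1$ and $b=1$ already break the inequality: both have value $0$, yet the midpoint $0$ has value $1$. No such pair turns up for the maximum.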



















