Contents
- Introduction
- The Geometry of the Poincaré Ball
- Hyperbolic space: the Poincaré ball
- Gyrovector spaces
- Möbius addition
- Möbius scalar multiplication
- Distance
- Hyperbolic trigonometry
 
- Connecting Gyrovector spaces and Riemannian geometry of the Poincaré ball
 
- Hyperbolic Neural Networks
- Möbius version
- Hyperbolic multiclass logistic regression (MLR) (softmax regression)
- Hyperbolic feed-forward layers
- Hyperbolic RNN
 
- Experiments
- References
Introduction
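- The authors argue that the representational power of hyperbolic geometry still lags behind that of Euclidean geometry because there are no corresponding hyperbolic neural network layers, which makes it hard to use hyperbolic embeddings in downstream tasks. To address this, they combine Möbius gyrovector spaces with the Poincaré model and derive hyperbolic versions of several neural network components: multinomial logistic regression (MLR), feed-forward networks (FFNN), and recurrent networks (RNN) such as the GRU. This makes it possible to embed and classify data directly in hyperbolic space.
- This work lets us embed and classify data in hyperbolic space more effectively. It also shows how to combine Euclidean and hyperbolic models, which should inspire better use of hyperbolic embeddings. Some questions about the experiments remain. First, the embedding dimensions used are quite small, whereas many current models use 512 or 1024 dimensions. A low-dimensional setting favors hyperbolic models, so it is unclear whether they keep their advantage in high dimensions. Second, the results seem to suggest that hyperbolic models help only when the data is strongly tree-like, and otherwise may well underperform Euclidean models. Finally, the paper notes that the “highly non-convex spectrum of hyperbolic neural networks sometimes results in convergence to poor local minima, suggesting that initialization is very important”. Does this mean that training hyperbolic models is relatively unstable?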
The Geometry of the Poincaré Ball
Hyperbolic space: the Poincaré ball
- The Poincaré ball can be written as $(\mathbb D^n,g^{\mathbb D})$, where $\mathbb D^n=\{x\in\mathbb R^n:\|x\|<1\}$ and $g^{\mathbb D}$ is the Riemannian metric
  $$g_x^{\mathbb D}=\lambda_x^2\,g^E,\qquad \lambda_x:=\frac{2}{1-\|x\|^2},$$
  where $g^E=I_n$ is the Euclidean metric tensor. The induced distance is
  $$d_{\mathbb D}(x,y)=\cosh^{-1}\!\left(1+2\,\frac{\|x-y\|^2}{(1-\|x\|^2)(1-\|y\|^2)}\right).$$
  The Poincaré ball model is also conformal, i.e. it preserves angles.
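As a quick numerical illustration of the distance on the unit ball, here is a minimal sketch. It assumes the standard closed form $d(x,y)=\cosh^{-1}\!\big(1+2\|x-y\|^2/((1-\|x\|^2)(1-\|y\|^2))\big)$ and the conformal factor $\lambda_x=2/(1-\|x\|^2)$. The helper names `dot`, `conformal_factor`, and `poincare_distance` are illustrative and not taken from the paper's code.

```python
import math

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def conformal_factor(x):
    """lambda_x = 2 / (1 - ||x||^2); only valid for ||x|| < 1."""
    return 2.0 / (1.0 - dot(x, x))

def poincare_distance(x, y):
    """Distance on the unit Poincaré ball (assumed closed form, c = 1)."""
    diff = [a - b for a, b in zip(x, y)]
    denom = (1.0 - dot(x, x)) * (1.0 - dot(y, y))
    return math.acosh(1.0 + 2.0 * dot(diff, diff) / denom)

origin = [0.0, 0.0]
near = [0.5, 0.0]
far = [0.99, 0.0]

# Distances from the origin grow quickly as a point approaches the boundary.
print(poincare_distance(origin, near))
print(poincare_distance(origin, far))
```

From the origin, a point at Euclidean radius $r$ lies at hyperbolic distance $\ln\frac{1+r}{1-r}$, which diverges as $r\to 1$. This is why the ball has room for tree-like data near its boundary.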
  
Gyrovector spaces
- In Euclidean geometry, vector spaces supply algebraic operations such as vector addition, subtraction, and scalar multiplication. In hyperbolic geometry, gyrovector spaces provide the same operations. They have been used in special relativity, where velocity vectors are added inside a Poincaré ball of radius $c$ (the celerity, i.e. the speed of light), which guarantees that the resulting speed never exceeds the speed of light. We can define the gyrovector space $\mathbb D_c^n:=\{x\in\mathbb R^n \mid c\|x\|^2<1\}$ with $c\geq 0$. When $c=0$, $\mathbb D_c^n=\mathbb R^n$; when $c>0$, $\mathbb D_c^n$ is the open ball of radius $1/\sqrt c$; when $c=1$, $\mathbb D_c^n$ is the open unit ball.
Möbius addition
- Möbius addition. The Möbius addition of $x$ and $y$ in $\mathbb D_c^n$ is defined as
  $$x\oplus_c y:=\frac{\left(1+2c\langle x,y\rangle+c\|y\|^2\right)x+\left(1-c\|x\|^2\right)y}{1+2c\langle x,y\rangle+c^2\|x\|^2\|y\|^2}.$$
  When $c=0$, Möbius addition reduces to ordinary Euclidean vector addition. When $c>0$, it is neither commutative nor associative, but for every $x\in\mathbb D_c^n$ the zero vector acts as the identity and $-x$ as the inverse: $x \oplus_c \mathbf{0}=\mathbf{0} \oplus_c x=x$ and $(-x) \oplus_c x=x \oplus_c(-x)=\mathbf{0}$. It also satisfies the left-cancellation law $(-x) \oplus_c\left(x \oplus_c y\right)=y$. In what follows, the authors write $\oplus$ for $\oplus_1$.
  
- Möbius subtraction. $x\ominus_c y:=x\oplus_c(-y)$.
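The algebraic properties of Möbius addition can be checked numerically. The sketch below assumes the standard formula $x\oplus_c y=\frac{(1+2c\langle x,y\rangle+c\|y\|^2)x+(1-c\|x\|^2)y}{1+2c\langle x,y\rangle+c^2\|x\|^2\|y\|^2}$ and defines subtraction as $x\oplus_c(-y)$. The function names and the sample points are hypothetical choices for illustration.

```python
import math

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def mobius_add(x, y, c=1.0):
    """Möbius addition on the ball {x : c * ||x||^2 < 1} (assumed standard formula)."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    coef_x = 1.0 + 2.0 * c * xy + c * y2
    coef_y = 1.0 - c * x2
    denom = 1.0 + 2.0 * c * xy + c * c * x2 * y2
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]

def mobius_sub(x, y, c=1.0):
    """Möbius subtraction, defined here as x (+)_c (-y)."""
    return mobius_add(x, [-b for b in y], c)

def close(u, v, tol=1e-12):
    """Component-wise comparison up to an absolute tolerance."""
    return all(abs(a - b) <= tol for a, b in zip(u, v))

x, y = [0.3, 0.1], [-0.2, 0.4]
neg_x = [-a for a in x]

# c = 0 should reduce to ordinary vector addition.
print(close(mobius_add(x, y, c=0.0), [0.1, 0.5]))
# For c > 0 the two orders give different points (same norm, different direction).
print(mobius_add(x, y), mobius_add(y, x))
# Left cancellation: (-x) (+) (x (+) y) should return y.
print(close(mobius_add(neg_x, mobius_add(x, y)), y))
# x (-) x should be the zero vector.
print(close(mobius_sub(x, x), [0.0, 0.0]))
```

The result of `mobius_add` always stays inside the ball, which is the property that makes it usable as a replacement for `+` in network layers.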
  
Möbius scalar multiplication
- Möbius scalar multiplication. For $c > 0$, the Möbius scalar multiplication of $x\in \mathbb D^n_c\setminus\{\mathbf 0\}$ by $r \in \mathbb R$ is defined as
  $$r\otimes_c x:=\frac{1}{\sqrt c}\tanh\!\left(r\tanh^{-1}\!\left(\sqrt c\,\|x\|\right)\right)\frac{x}{\|x\|},$$
  with $r \otimes_c \mathbf{0}:=\mathbf{0}$. As $c\rightarrow 0$, one recovers Euclidean scalar multiplication: $\lim _{c \rightarrow 0} r \otimes_c x=r x$. Möbius scalar multiplication satisfies the following properties: (1) $n$ additions. $n \otimes_c x=x \oplus_c \cdots \oplus_c x$ ($n$ additions); (2) scalar distributivity. $\left(r+r^{\prime}\right) \otimes_c x=r \otimes_c x \oplus_c r^{\prime} \otimes_c x$; (3) scalar associativity. $\left(r r^{\prime}\right) \otimes_c x=r \otimes_c\left(r^{\prime} \otimes_c x\right)$; (4) scaling property. $|r| \otimes_c x /\left\|r \otimes_c x\right\|=x /\|x\|$.
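The listed properties are easy to verify numerically. The following sketch assumes the standard definitions $r\otimes_c x=\frac{1}{\sqrt c}\tanh\!\big(r\tanh^{-1}(\sqrt c\|x\|)\big)\frac{x}{\|x\|}$ and the usual Möbius addition formula. All function names and test values are illustrative assumptions.

```python
import math

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def mobius_add(x, y, c=1.0):
    """Möbius addition (assumed standard formula)."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    coef_x = 1.0 + 2.0 * c * xy + c * y2
    coef_y = 1.0 - c * x2
    denom = 1.0 + 2.0 * c * xy + c * c * x2 * y2
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]

def mobius_scalar_mul(r, x, c=1.0):
    """Möbius scalar multiplication for c > 0, with r (x)_c 0 := 0."""
    norm = math.sqrt(dot(x, x))
    if norm == 0.0:
        return [0.0 for _ in x]
    sqrt_c = math.sqrt(c)
    scale = math.tanh(r * math.atanh(sqrt_c * norm)) / (sqrt_c * norm)
    return [scale * a for a in x]

def close(u, v, tol=1e-9):
    """Component-wise comparison up to an absolute tolerance."""
    return all(abs(a - b) <= tol for a, b in zip(u, v))

x = [0.3, -0.2]
r, r2 = 0.7, -1.5

# (1) n additions: 3 (x) x equals x (+) x (+) x.
print(close(mobius_scalar_mul(3, x), mobius_add(x, mobius_add(x, x))))
# (2) scalar distributivity.
print(close(mobius_scalar_mul(r + r2, x),
            mobius_add(mobius_scalar_mul(r, x), mobius_scalar_mul(r2, x))))
# (3) scalar associativity: the two scalars multiply as ordinary reals.
print(close(mobius_scalar_mul(r * r2, x),
            mobius_scalar_mul(r, mobius_scalar_mul(r2, x))))
# Small c approaches ordinary scalar multiplication r * x.
print(close(mobius_scalar_mul(2.5, x, c=1e-12), [2.5 * a for a in x]))
```

Because all multiples of $x$ are collinear, Möbius addition among them reduces to the addition rule for $\tanh$, which is why properties (1) to (3) hold exactly.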
Distance
- Distance. If one defines the generalized hyperbolic metric tensor $g^c$ as the metric conformal to the Euclidean one, with conformal factor $\lambda_x^c:=2 /\left(1-c\|x\|^2\right)$, then the induced distance function on $(\mathbb D^n_c, g^c)$ is given by
  $$d_c(x,y)=\frac{2}{\sqrt c}\tanh^{-1}\!\left(\sqrt c\,\left\|-x\oplus_c y\right\|\right).$$
  As $c\rightarrow 0$, this recovers the Euclidean distance up to a factor of 2, $\lim _{c \rightarrow 0} d_c(x, y)=2\|x-y\|$, and for $c=1$ it gives the distance on the Poincaré ball:
  $$d_1(x,y)=\cosh^{-1}\!\left(1+2\,\frac{\|x-y\|^2}{(1-\|x\|^2)(1-\|y\|^2)}\right).$$
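Both limiting cases of the distance can be checked with a few lines of code. This sketch assumes $d_c(x,y)=\frac{2}{\sqrt c}\tanh^{-1}\!\big(\sqrt c\,\|-x\oplus_c y\|\big)$ together with the standard Möbius addition formula, and compares it against the $\cosh^{-1}$ form for $c=1$ and against $2\|x-y\|$ for very small $c$. Names and sample points are hypothetical.

```python
import math

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def mobius_add(x, y, c=1.0):
    """Möbius addition (assumed standard formula)."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    coef_x = 1.0 + 2.0 * c * xy + c * y2
    coef_y = 1.0 - c * x2
    denom = 1.0 + 2.0 * c * xy + c * c * x2 * y2
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]

def dist_c(x, y, c=1.0):
    """Induced distance on the ball of curvature parameter c > 0."""
    sqrt_c = math.sqrt(c)
    w = mobius_add([-a for a in x], y, c)
    return 2.0 / sqrt_c * math.atanh(sqrt_c * math.sqrt(dot(w, w)))

def poincare_distance(x, y):
    """arccosh form of the distance on the unit ball (c = 1)."""
    diff = [a - b for a, b in zip(x, y)]
    denom = (1.0 - dot(x, x)) * (1.0 - dot(y, y))
    return math.acosh(1.0 + 2.0 * dot(diff, diff) / denom)

x, y = [0.3, 0.1], [-0.2, 0.4]
euclid = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# c = 1: the two closed forms should agree.
print(dist_c(x, y, c=1.0), poincare_distance(x, y))
# c -> 0: the distance approaches twice the Euclidean distance.
print(dist_c(x, y, c=1e-10), 2.0 * euclid)
```

The factor of 2 in the Euclidean limit comes from the conformal factor, which equals 2 everywhere when $c=0$.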
Hyperbolic trigonometry
- Hyperbolic trigonometry. The paper also covers hyperbolic angles (gyroangles) and the hyperbolic law of sines in the generalized Poincaré ball $(\mathbb D_c^n, g^c)$. See Appendix B of the paper for details.
Connecting Gyrovector spaces and Riemannian geometry of the Poincaré ball
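- Geodesics. The unique geodesic connecting $x, y\in\mathbb D_c^n$ is
  $$\gamma_{x\to y}(t):=x\oplus_c\left(\left(-x\oplus_c y\right)\otimes_c t\right),\qquad \gamma_{x\to y}:\mathbb R\to\mathbb D_c^n,\quad \gamma_{x\to y}(0)=x,\quad \gamma_{x\to y}(1)=y.$$
  As $c\rightarrow0$, geodesics become the straight lines of Euclidean geometry.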
- Lemma 1. For any $x \in\mathbb D_c^n$ and $v \in T_x\mathbb D_c^n$ s.t. $g^c_x(v, v) = 1$, the unit-speed geodesic starting from $x$ with direction $v$ is given by:
  $$\gamma_{x,v}(t)=x\oplus_c\left(\tanh\!\left(\sqrt c\,\frac{t}{2}\right)\frac{v}{\sqrt c\,\|v\|}\right),$$
  where $\gamma_{x,v}:\mathbb R\to\mathbb D_c^n$ satisfies $\gamma_{x,v}(0)=x$ and $\dot\gamma_{x,v}(0)=v$. One can sanity-check that $d_c(\gamma(0),\gamma(t))=t,\ \forall t\in[0,1]$.
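The sanity check in Lemma 1, and the endpoint conditions of the geodesic between two points, can be reproduced numerically. The sketch below assumes the formulas $\gamma_{x\to y}(t)=x\oplus_c((-x\oplus_c y)\otimes_c t)$ and $\gamma_{x,v}(t)=x\oplus_c\big(\tanh(\sqrt c\,t/2)\,v/(\sqrt c\|v\|)\big)$, plus the standard definitions of $\oplus_c$, $\otimes_c$ and $d_c$. Function names, the value $c=0.5$ and the sample points are assumptions made for illustration.

```python
import math

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def mobius_add(x, y, c):
    """Möbius addition (assumed standard formula)."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    coef_x = 1.0 + 2.0 * c * xy + c * y2
    coef_y = 1.0 - c * x2
    denom = 1.0 + 2.0 * c * xy + c * c * x2 * y2
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]

def mobius_scalar_mul(r, x, c):
    """Möbius scalar multiplication for c > 0, with r (x)_c 0 := 0."""
    norm = math.sqrt(dot(x, x))
    if norm == 0.0:
        return [0.0 for _ in x]
    sqrt_c = math.sqrt(c)
    scale = math.tanh(r * math.atanh(sqrt_c * norm)) / (sqrt_c * norm)
    return [scale * a for a in x]

def dist_c(x, y, c):
    """Induced distance d_c(x, y)."""
    sqrt_c = math.sqrt(c)
    w = mobius_add([-a for a in x], y, c)
    return 2.0 / sqrt_c * math.atanh(sqrt_c * math.sqrt(dot(w, w)))

def geodesic(x, y, t, c):
    """Point at parameter t on the geodesic from x (t = 0) to y (t = 1)."""
    w = mobius_add([-a for a in x], y, c)
    return mobius_add(x, mobius_scalar_mul(t, w, c), c)

def unit_speed_geodesic(x, v, t, c):
    """Unit-speed geodesic from x in direction v (Lemma 1, assumed form)."""
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * t / 2.0) / (sqrt_c * math.sqrt(dot(v, v)))
    return mobius_add(x, [scale * a for a in v], c)

c = 0.5
x, y = [0.3, 0.1], [-0.2, 0.4]

# Endpoints of the connecting geodesic.
print(geodesic(x, y, 0.0, c), geodesic(x, y, 1.0, c))

# Scale a Euclidean unit vector so that g_x^c(v, v) = (lambda_x^c)^2 ||v||^2 = 1.
lam = 2.0 / (1.0 - c * dot(x, x))
v = [0.6 / lam, 0.8 / lam]

# Lemma 1 sanity check: d_c(gamma(0), gamma(t)) should equal t.
for t in (0.25, 0.5, 1.0):
    print(t, dist_c(x, unit_speed_geodesic(x, v, t, c), c))
```

The check works because left cancellation strips the leading $x$ inside $d_c$, leaving $\frac{2}{\sqrt c}\tanh^{-1}(\tanh(\sqrt c\,t/2))=t$.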
- Exponential and logarithmic maps. The exponential map takes a small perturbation $v\in T_p\mathbb D_c^n$ applied at $p\in\mathbb D_c^n$ (which can be viewed as a velocity vector) and maps this point of the tangent space back onto the gyrovector space, such that $t\in[0,1]\mapsto\exp_p^c(tv)$ is the geodesic connecting $p$ and $\exp_p^c(v)$. In other words, a geodesic $\gamma$ starting from $\gamma(0) := x \in M$ with unit-norm direction $\dot \gamma(0) := v \in T_xM$ can be written as $t \mapsto \exp_x(tv)$. In Euclidean space, the exponential map is $\exp_p(v)=p+v$. The logarithmic map is the inverse of the exponential map: it returns the velocity vector in the tangent space at $p\in \mathbb D_c^n$ that leads from $p$ to $r\in \mathbb D_c^n$. In Euclidean space, the logarithmic map is $\log_p(r)=r-p$. (Figure from Angulo, Jesus. “Structure tensor image filtering using Riemannian L1 and L∞ center-of-mass.” Image Analysis & Stereology 33.2 (2014): 95-105.)
  For any point $x \in \mathbb D_c^n$, the exponential map $\exp^c_x : T_x\mathbb D_c^n\rightarrow \mathbb D_c^n$ and the logarithmic map $\log^c_x : \mathbb D_c^n\rightarrow T_x\mathbb D_c^n$ are given for $v \neq \mathbf 0$ and $y \neq x$ by:
  As $c\rightarrow 0$, we recover the Euclidean exponential and logarithmic maps. When $x=\mathbf 0$, for any $v \in T_{\mathbf{0}} \mathbb{D}_c^n \backslash\{\mathbf{0}\}$ and $y \in \mathbb{D}_c^n \backslash\{\mathbf{0}\}$, we have
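The exponential and logarithmic maps discussed above can be sketched in a few lines of plain Python. This is a minimal sketch, not the authors' implementation: the closed forms in the docstrings are the ones I recall from Lemma 2 of the paper, $\lambda_x^c = 2/(1-c\|x\|^2)$ is assumed to be the conformal factor, and all function names (`mobius_add`, `conformal_factor`, `exp_map`, `log_map`) are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def mobius_add(x, y, c):
    """Möbius addition x ⊕_c y on the ball {x : c‖x‖² < 1}."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    den = 1 + 2 * c * xy + c * c * x2 * y2
    return [((1 + 2 * c * xy + c * y2) * a + (1 - c * x2) * b) / den
            for a, b in zip(x, y)]

def conformal_factor(x, c):
    """λ_x^c = 2 / (1 - c‖x‖²) (assumed definition)."""
    return 2.0 / (1.0 - c * dot(x, x))

def exp_map(x, v, c):
    """exp_x^c(v) = x ⊕_c (tanh(√c λ_x^c ‖v‖ / 2) · v / (√c ‖v‖)), for v ≠ 0."""
    sqrt_c, nv = math.sqrt(c), norm(v)
    scale = math.tanh(sqrt_c * conformal_factor(x, c) * nv / 2) / (sqrt_c * nv)
    return mobius_add(x, [scale * a for a in v], c)

def log_map(x, y, c):
    """log_x^c(y) = 2/(√c λ_x^c) · artanh(√c ‖u‖) · u/‖u‖, u = -x ⊕_c y, for y ≠ x."""
    sqrt_c = math.sqrt(c)
    u = mobius_add([-a for a in x], y, c)
    nu = norm(u)
    scale = 2.0 / (sqrt_c * conformal_factor(x, c)) * math.atanh(sqrt_c * nu) / nu
    return [scale * a for a in u]

if __name__ == "__main__":
    c, x, v = 1.0, [0.1, -0.2], [0.3, 0.4]
    y = exp_map(x, v, c)
    print("exp_x(v)        =", y)
    print("log_x(exp_x(v)) =", log_map(x, y, c))   # reproduces v up to rounding
    print("tiny c          =", exp_map(x, v, 1e-12))  # close to x + v
```

At $x=\mathbf 0$ the conformal factor is 2 and Möbius addition with the origin is the identity, so `exp_map` reduces to $\tanh(\sqrt c\|v\|)\,v/(\sqrt c\|v\|)$; with a very small `c` the result approaches the Euclidean `x + v`.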
  
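- Möbius scalar multiplication using exponential and logarithmic maps. Because the tangent space is Euclidean and therefore convenient for computation, Möbius scalar multiplication can be re-derived in terms of the exponential and logarithmic maps.
  Applying the formulas above also yields the relation between the geodesic connecting two points and the exponential map.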
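The re-derivation of Möbius scalar multiplication can be checked numerically. The sketch below assumes the identity $r \otimes_c x = \exp_{\mathbf 0}^c(r \log_{\mathbf 0}^c(x))$ as I recall it from the paper, together with the gyrovector definition $r \otimes_c x = \frac{1}{\sqrt c}\tanh\!\big(r\,\operatorname{artanh}(\sqrt c\|x\|)\big)\frac{x}{\|x\|}$; the function names are hypothetical.

```python
import math

def norm(u):
    return math.sqrt(sum(a * a for a in u))

def exp0(v, c):
    """exp_0^c(v) = tanh(√c‖v‖) · v / (√c‖v‖); the zero vector maps to the origin."""
    n = norm(v)
    if n == 0.0:
        return list(v)
    s = math.sqrt(c) * n
    return [math.tanh(s) * a / s for a in v]

def log0(y, c):
    """log_0^c(y) = artanh(√c‖y‖) · y / (√c‖y‖); the origin maps to zero."""
    n = norm(y)
    if n == 0.0:
        return list(y)
    s = math.sqrt(c) * n
    return [math.atanh(s) * a / s for a in y]

def scalar_mul_direct(r, x, c):
    """Gyrovector definition: (1/√c) tanh(r · artanh(√c‖x‖)) · x/‖x‖."""
    n = norm(x)
    if n == 0.0:
        return list(x)
    sqrt_c = math.sqrt(c)
    scale = math.tanh(r * math.atanh(sqrt_c * n)) / (sqrt_c * n)
    return [scale * a for a in x]

def scalar_mul_via_maps(r, x, c):
    """Same operation routed through the tangent space at the origin."""
    return exp0([r * a for a in log0(x, c)], c)

if __name__ == "__main__":
    x, c = [0.3, -0.1, 0.2], 0.5
    for r in (2.5, -0.7, 0.0):
        print(r, scalar_mul_direct(r, x, c), scalar_mul_via_maps(r, x, c))
```

Both routes agree up to floating-point rounding, which is the practical appeal of the tangent-space view: the scaling itself is an ordinary Euclidean multiplication.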
  
- Parallel transport. Parallel transport $P^c_{x\rightarrow y}:T_x\mathbb D^n_c\rightarrow T_y\mathbb D^n_c$ defines a linear isometry between two tangent spaces: it moves a tangent vector at $x$ along the geodesic between $x$ and $y$, yielding a tangent vector at $y$. Parallel transport therefore lets us relate two different tangent spaces. In the manifold $(\mathbb D^n_c, g^c)$, the parallel transport w.r.t. the Levi-Civita connection of a vector $v\in T_{\mathbf 0}\mathbb D^n_c$ to another tangent space $T_x\mathbb D^n_c$ is given by the following isometry:
  This result matters when defining and optimizing parameters shared across different tangent spaces, such as biases in hyperbolic neural layers or the parameters of hyperbolic MLR.
For the detailed derivation, see the original paper and another article by the same authors:
Octavian-Eugen Ganea, Gary Bécigneul, and Thomas Hofmann. Hyperbolic entailment cones for learning hierarchical embeddings. In Proceedings of the thirty-fifth international conference on machine learning (ICML), 2018.
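The parallel-transport isometry from the origin can be illustrated numerically. This sketch assumes the form I recall from the paper, $P^c_{\mathbf 0\rightarrow x}(v) = \log_x^c(x \oplus_c \exp_{\mathbf 0}^c(v)) = \frac{\lambda_{\mathbf 0}^c}{\lambda_x^c}v$ with $\lambda_x^c = 2/(1-c\|x\|^2)$; the helper names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def mobius_add(x, y, c):
    """Möbius addition x ⊕_c y."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    den = 1 + 2 * c * xy + c * c * x2 * y2
    return [((1 + 2 * c * xy + c * y2) * a + (1 - c * x2) * b) / den
            for a, b in zip(x, y)]

def conformal_factor(x, c):
    """λ_x^c = 2 / (1 - c‖x‖²) (assumed definition)."""
    return 2.0 / (1.0 - c * dot(x, x))

def exp0(v, c):
    """Exponential map at the origin, for v ≠ 0."""
    s = math.sqrt(c) * norm(v)
    return [math.tanh(s) * a / s for a in v]

def log_map(x, y, c):
    """Logarithmic map at x, for y ≠ x."""
    sqrt_c = math.sqrt(c)
    u = mobius_add([-a for a in x], y, c)
    nu = norm(u)
    scale = 2.0 / (sqrt_c * conformal_factor(x, c)) * math.atanh(sqrt_c * nu) / nu
    return [scale * a for a in u]

def transport_from_origin(x, v, c):
    """Closed form: rescale v by λ_0^c / λ_x^c (λ_0^c equals 2)."""
    ratio = conformal_factor([0.0] * len(x), c) / conformal_factor(x, c)
    return [ratio * a for a in v]

def transport_via_maps(x, v, c):
    """Equivalent form log_x^c(x ⊕_c exp_0^c(v))."""
    return log_map(x, mobius_add(x, exp0(v, c), c), c)

if __name__ == "__main__":
    c, x, v = 1.0, [0.4, 0.1], [0.2, -0.5]
    print(transport_from_origin(x, v, c))
    print(transport_via_maps(x, v, c))
```

Because the transport is a plain rescaling, a bias stored once in $T_{\mathbf 0}\mathbb D_c^n$ can be moved to any other tangent space at negligible cost, which is why the result is convenient for shared parameters.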
Hyperbolic Neural Networks
Möbius version
- Analogously to Möbius scalar multiplication, we can define the Möbius version of a map $f:\mathbb R^n\rightarrow\mathbb R^m$: (1) project the vector to the tangent space with the logarithmic map; (2) transform it there with the Euclidean operator; (3) project the result back to the gyrovector space with the exponential map. That is, $f^{\otimes_c}(x) := \exp_{\mathbf 0}^c\left(f\left(\log_{\mathbf 0}^c(x)\right)\right)$.
  When $f$ is continuous, $\lim _{c \rightarrow 0} f^{\otimes_c}(x)=f(x)$. The definition satisfies the following properties: (1) morphism property: $(f \circ g)^{\otimes_c}=f^{\otimes_c} \circ g^{\otimes_c}$; (2) direction preserving: $f^{\otimes_c}(x) /\left\|f^{\otimes_c}(x)\right\|=f(x) /\|f(x)\|$ for $f(x)\neq\mathbf0$.
- If there are several maps (corresponding to the layers of a neural network), the Möbius version of their composition follows from the morphism property above.
  If there are several inputs ($f: \mathbb{R}^n \times \mathbb{R}^p \rightarrow \mathbb{R}^m$), the Möbius version is
  $$f^{\otimes_c}:\left(h, h^{\prime}\right) \in \mathbb{D}_c^n \times \mathbb{D}_c^p \mapsto \exp _{\mathbf{0}}^c\left(f\left(\log _{\mathbf{0}}^c(h), \log _{\mathbf{0}}^c\left(h^{\prime}\right)\right)\right)$$
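The three-step recipe for a Möbius version (log map, Euclidean operator, exp map) translates directly into a higher-order function. The following is a minimal sketch under the assumption that both maps are taken at the origin; `mobius_version`, `mobius_version2` and the example functions are hypothetical names, not from the paper.

```python
import math

def norm(u):
    return math.sqrt(sum(a * a for a in u))

def exp0(v, c):
    """Exponential map at the origin; the zero vector maps to the origin."""
    n = norm(v)
    if n == 0.0:
        return list(v)
    s = math.sqrt(c) * n
    return [math.tanh(s) * a / s for a in v]

def log0(y, c):
    """Logarithmic map at the origin; the origin maps to zero."""
    n = norm(y)
    if n == 0.0:
        return list(y)
    s = math.sqrt(c) * n
    return [math.atanh(s) * a / s for a in y]

def matvec(M, x):
    return [sum(m * a for m, a in zip(row, x)) for row in M]

def mobius_version(f, c):
    """Return f^{⊗_c}: x ↦ exp_0^c(f(log_0^c(x)))."""
    return lambda x: exp0(f(log0(x, c)), c)

def mobius_version2(f, c):
    """Two-input variant: (h, h') ↦ exp_0^c(f(log_0^c(h), log_0^c(h')))."""
    return lambda h, h2: exp0(f(log0(h, c), log0(h2, c)), c)

if __name__ == "__main__":
    c, x = 1.0, [0.2, -0.3]
    A = [[1.0, 2.0], [0.5, -1.0]]                 # illustrative linear map
    g = lambda u: matvec(A, u)
    f = lambda u: [math.tanh(a) for a in u]       # illustrative pointwise map
    composed = mobius_version(lambda u: f(g(u)), c)(x)
    chained = mobius_version(f, c)(mobius_version(g, c)(x))
    print(composed, chained)                      # morphism property: equal up to rounding
```

The morphism property holds because the inner `exp0` and `log0` cancel, so stacking Möbius layers only pays for one log map on the way in and one exp map on the way out.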
Hyperbolic multiclass logistic regression (MLR) (softmax regression)
- See Section 3.1 of the paper.

Hyperbolic feed-forward layers
- Möbius matrix-vector multiplication. Building on the definition of the Möbius version, we can define Möbius versions of further operations.
  
- Pointwise non-linearity. If $\varphi:\mathbb R^n\rightarrow \mathbb R^n$ is a pointwise non-linearity, then its Möbius version $\varphi^{\otimes_c}$ can be applied to elements of the Poincaré ball.
- Bias translation. Möbius translation of a point $x \in \mathbb D^n_c$ by a bias $b \in \mathbb D^n_c$ is given by
  
- Concatenation of multiple input vectors. Given $x_1\in\mathbb D_c^n$ and $x_2\in\mathbb D_c^p$, let $x\in\mathbb D_c^n\times\mathbb D_c^p$ be the concatenation of $x_1$ and $x_2$, let $M_1\in\mathcal M_{m,n}(\mathbb R)$ and $M_2\in\mathcal M_{m,p}(\mathbb R)$ be the matrices of two linear maps, and let $M\in\mathcal M_{m,n+p}(\mathbb R)$ be the horizontal concatenation of $M_1$ and $M_2$. Then
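The feed-forward building blocks listed above can be combined into one layer. The sketch below makes several assumptions, based on my recollection of the paper: Möbius matrix-vector multiplication is $M \otimes_c x = \exp_{\mathbf 0}^c(M\log_{\mathbf 0}^c(x))$, with closed form $\frac{1}{\sqrt c}\tanh\!\big(\frac{\|Mx\|}{\|x\|}\operatorname{artanh}(\sqrt c\|x\|)\big)\frac{Mx}{\|Mx\|}$; bias translation is $x \oplus_c b$; and the concatenated product is defined as $M_1 \otimes_c x_1 \oplus_c M_2 \otimes_c x_2$. All function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def matvec(M, x):
    return [dot(row, x) for row in M]

def mobius_add(x, y, c):
    """Möbius addition x ⊕_c y."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    den = 1 + 2 * c * xy + c * c * x2 * y2
    return [((1 + 2 * c * xy + c * y2) * a + (1 - c * x2) * b) / den
            for a, b in zip(x, y)]

def exp0(v, c):
    n = norm(v)
    if n == 0.0:
        return list(v)
    s = math.sqrt(c) * n
    return [math.tanh(s) * a / s for a in v]

def log0(y, c):
    n = norm(y)
    if n == 0.0:
        return list(y)
    s = math.sqrt(c) * n
    return [math.atanh(s) * a / s for a in y]

def mobius_matvec(M, x, c):
    """M ⊗_c x as the Möbius version of the linear map x ↦ Mx."""
    return exp0(matvec(M, log0(x, c)), c)

def mobius_matvec_closed_form(M, x, c):
    """Closed form of M ⊗_c x; returns 0 when Mx = 0."""
    Mx = matvec(M, x)
    n_x, n_Mx = norm(x), norm(Mx)
    if n_Mx == 0.0:
        return [0.0] * len(Mx)
    sqrt_c = math.sqrt(c)
    scale = math.tanh(n_Mx / n_x * math.atanh(sqrt_c * n_x)) / (sqrt_c * n_Mx)
    return [scale * a for a in Mx]

def hyperbolic_layer(M, b, x, c, phi=math.tanh):
    """φ^{⊗_c}(M ⊗_c x ⊕_c b) with a pointwise non-linearity φ."""
    z = mobius_add(mobius_matvec(M, x, c), b, c)
    return exp0([phi(a) for a in log0(z, c)], c)

def concat_matvec(M1, x1, M2, x2, c):
    """Concatenated input, defined as M1 ⊗_c x1 ⊕_c M2 ⊗_c x2."""
    return mobius_add(mobius_matvec(M1, x1, c), mobius_matvec(M2, x2, c), c)

if __name__ == "__main__":
    c = 1.0
    M = [[0.5, -1.0, 2.0], [1.5, 0.3, -0.7]]      # illustrative weights
    x, b = [0.1, 0.2, -0.3], [0.05, -0.1]
    print(hyperbolic_layer(M, b, x, c))
```

Every operation returns a point whose norm stays below $1/\sqrt c$, so the layer output remains inside the ball without explicit clipping in exact arithmetic.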
  
Hyperbolic RNN
- Naive RNN.
  Here $\varphi$ is tanh / sigmoid / ReLU, and $W \in \mathcal{M}_{m, n}(\mathbb{R}), U \in \mathcal{M}_{m, d}(\mathbb{R}), b \in \mathbb{D}_c^m$. If $x_t$ is a vector in Euclidean space, first apply the exponential map $\tilde x_t := \exp^c_{\mathbf0}(x_t)$ and then substitute $\tilde x_t$ into the update formula. The base point $x$ is usually set to $\mathbf0$, which makes formulas less cumbersome and empirically has little impact on the obtained results.
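For reference, the naive RNN update that the symbols $\varphi$, $W$, $U$ and $b$ above belong to can be written out as follows. This is the form I recall from the paper, stated here as an assumption: the Euclidean update, then its hyperbolic counterpart obtained by replacing each operation with its Möbius version.

```latex
% Euclidean naive RNN
h_{t+1} = \varphi\left(W h_t + U x_t + b\right)

% Hyperbolic counterpart: matrix products become \otimes_c,
% sums become \oplus_c, and \varphi becomes its Möbius version
h_{t+1} = \varphi^{\otimes_c}\left(W \otimes_c h_t \oplus_c U \otimes_c x_t \oplus_c b\right),
\qquad h_t \in \mathbb{D}_c^n,\quad x_t \in \mathbb{D}_c^d
```

When the inputs are Euclidean, $x_t$ in the second line is replaced by $\tilde x_t = \exp^c_{\mathbf 0}(x_t)$.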
- GRU architecture. The GRU in Euclidean space operates as follows, with a reset gate $r_t$ and an update gate $z_t$
  先写出门控电路 
     
      
       
        
         f
        
        
         (
        
        
         h
        
        
         ,
        
        
         
          h
         
         
          ′
         
        
        
         )
        
        
         :
        
        
         =
        
        
         σ
        
        
         (
        
        
         h
        
        
         )
        
        
         ⊙
        
        
         
          h
         
         
          ′
         
        
       
       
        f(h,h'):=\sigma(h)\odot h'
       
      
     f(h,h′):=σ(h)⊙h′ 的双曲版本:
     
      
       
        
         
          f
         
         
          
           ⊗
          
          
           c
          
         
        
        
         
          (
         
         
          h
         
         
          ,
         
         
          
           h
          
          
           ′
          
         
         
          )
         
        
        
         =
        
        
         
          
           exp
          
          
           
          
         
         
          0
         
         
          c
         
        
        
         
          (
         
         
          σ
         
         
          
           (
          
          
           
            
             log
            
            
             
            
           
           
            0
           
           
            c
           
          
          
           (
          
          
           h
          
          
           )
          
          
           )
          
         
         
          ⊙
         
         
          
           
            log
           
           
            
           
          
          
           0
          
          
           c
          
         
         
          
           (
          
          
           
            h
           
           
            ′
           
          
          
           )
          
         
         
          )
         
        
        
         =
        
        
         
          
           exp
          
          
           
          
         
         
          0
         
         
          c
         
        
        
         
          (
         
         
          diag
         
         
          (
         
         
          σ
         
         
          (
         
         
          
           
            log
           
           
            
           
          
          
           0
          
          
           c
          
         
         
          (
         
         
          h
         
         
  First write the hyperbolic version of the gating operation $f(h,h'):=\sigma(h)\odot h'$:
  $$f^{\otimes_c}\left(h, h'\right)=\exp_{\mathbf 0}^c\left(\sigma\left(\log_{\mathbf 0}^c(h)\right) \odot \log_{\mathbf 0}^c\left(h'\right)\right)=\exp_{\mathbf 0}^c\left(\operatorname{diag}\left(\sigma\left(\log_{\mathbf 0}^c(h)\right)\right)\cdot \log_{\mathbf 0}^c\left(h'\right)\right)=\operatorname{diag}\left(\sigma\left(\log_{\mathbf 0}^c(h)\right)\right)\otimes_c h'$$
  The reset gate $r_t$ and the update gate $z_t$ can therefore be written as
  $$\begin{aligned} r_t&=\sigma \log_{\mathbf 0}^c\left(W^r \otimes_c h_{t-1} \oplus_c U^r \otimes_c x_t \oplus_c b^r\right)\\ z_t&=\sigma \log_{\mathbf 0}^c\left(W^z \otimes_c h_{t-1} \oplus_c U^z \otimes_c x_t \oplus_c b^z\right) \end{aligned}$$
  The hidden state update can be written as
  $$\begin{aligned} \tilde{h}_t&=\varphi^{\otimes_c}\left(W\otimes_c\left(\operatorname{diag}\left(r_t\right) \otimes_c h_{t-1}\right) \oplus_c U \otimes_c x_t \oplus_c b\right) \\&=\varphi^{\otimes_c}\left(\left(W \operatorname{diag}\left(r_t\right)\right) \otimes_c h_{t-1} \oplus_c U \otimes_c x_t \oplus_c b\right) \\h_t&=h_{t-1} \oplus_c \operatorname{diag}\left(z_t\right) \otimes_c\left(-h_{t-1} \oplus_c \tilde{h}_t\right) \end{aligned}$$
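  The gate and update equations can be sketched in NumPy as follows. This is a minimal illustration and not the authors' implementation: the function names, the `EPS` guard, the random weights, the choice $\varphi=\tanh$, and the left-to-right evaluation of chained $\oplus_c$ are all assumptions of this sketch, and no Riemannian optimization is shown.

```python
import numpy as np

EPS = 1e-7  # numerical guard; an assumption of this sketch


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def mobius_add(x, y, c):
    """Moebius addition x (+)_c y on the Poincare ball of curvature -c."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / (den + EPS)


def expmap0(v, c):
    """Exponential map at the origin: tangent vector -> point in the ball."""
    n = max(np.linalg.norm(v), EPS)
    return np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)


def logmap0(y, c):
    """Logarithmic map at the origin: point in the ball -> tangent vector."""
    n = max(np.linalg.norm(y), EPS)
    return np.arctanh(min(np.sqrt(c) * n, 1 - EPS)) * y / (np.sqrt(c) * n)


def mobius_matvec(M, x, c):
    """Moebius matrix-vector product M (x)_c x in closed form."""
    Mx = M @ x
    xn, Mxn, sc = max(np.linalg.norm(x), EPS), np.linalg.norm(Mx), np.sqrt(c)
    if Mxn < EPS:
        return np.zeros_like(Mx)
    return np.tanh(Mxn / xn * np.arctanh(min(sc * xn, 1 - EPS))) * Mx / (Mxn * sc)


def mobius_pointwise(fn, x, c):
    """Moebius version of a pointwise nonlinearity: exp0(fn(log0(x)))."""
    return expmap0(fn(logmap0(x, c)), c)


def hyperbolic_gru_step(h_prev, x, p, c=1.0):
    """One GRU step following the r_t, z_t, h~_t, h_t equations above."""
    def pre(W, U, b):
        # Chained Moebius additions are evaluated left to right (assumption).
        s = mobius_add(mobius_matvec(W, h_prev, c), mobius_matvec(U, x, c), c)
        return mobius_add(s, b, c)

    r = sigmoid(logmap0(pre(p["Wr"], p["Ur"], p["br"]), c))  # reset gate
    z = sigmoid(logmap0(pre(p["Wz"], p["Uz"], p["bz"]), c))  # update gate
    h_tilde = mobius_pointwise(np.tanh, pre(p["W"] @ np.diag(r), p["U"], p["b"]), c)
    delta = mobius_add(-h_prev, h_tilde, c)
    return mobius_add(h_prev, mobius_matvec(np.diag(z), delta, c), c)


# Toy run with random weights (illustrative values only).
rng = np.random.default_rng(0)
d, m, c = 4, 3, 1.0
params = {k: 0.1 * rng.standard_normal((d, d)) for k in ("Wr", "Wz", "W")}
params.update({k: 0.1 * rng.standard_normal((d, m)) for k in ("Ur", "Uz", "U")})
params.update({k: np.zeros(d) for k in ("br", "bz", "b")})

h = np.zeros(d)
for _ in range(5):
    x_t = expmap0(0.5 * rng.standard_normal(m), c)  # inputs live in the ball
    h = hyperbolic_gru_step(h, x_t, params, c)
print(np.linalg.norm(h))  # stays below 1/sqrt(c)
```

  `mobius_matvec` is written in its closed form on purpose, so that the identity $f^{\otimes_c}(h,h')=\operatorname{diag}(\sigma(\log_{\mathbf 0}^c(h)))\otimes_c h'$ can be checked numerically against `expmap0(sigmoid(logmap0(h, c)) * logmap0(h2, c), c)` rather than holding by construction.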
Experiments
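- SNLI task and dataset. SNLI is a natural language inference / textual entailment dataset (decide whether a given premise entails a given hypothesis). It contains 570K training, 10K validation and 10K test sentence pairs.
- PREFIX task and datasets. PREFIX is a synthetic dataset built by the authors to test how hyperbolic models perform on tree-structured data. The task is detection of noisy prefixes, i.e. given a pair of sentences, decide whether the second sentence is a noisy prefix of the first or a random sentence. In PREFIX-Z% (for Z being 10, 30 or 50), a random prefix of the first sentence is taken; the positive second sentence is generated by replacing Z% of the words in that prefix, and the negative second sentence is a randomly generated sentence of the same length.
- Models architecture. Hyperbolic layers can be stacked into an $n$-layer network just like Euclidean ones, and they can also be combined with Euclidean layers, but training must then use Riemannian optimization. The authors encode the two sentences with two separate RNN or GRU models. The resulting embeddings, together with the squared distance between them (hyperbolic or Euclidean, depending on their geometry), are fed into an FFNN (Euclidean or hyperbolic), and an MLR (Euclidean or hyperbolic) performs the final classification. The loss is cross-entropy (CE).
- Results. On SNLI the Euclidean models outperform the hyperbolic ones; the authors suggest this may be because optimizers such as Adam do not yet have hyperbolic counterparts. On tree-structured data the hyperbolic models outperform the Euclidean ones. On the PREFIX datasets, the data becomes less tree-like as Z grows, and the performance gap between Euclidean and hyperbolic models shrinks accordingly.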
  
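- MLR classification experiments. On SNLI, hyperbolic MLR shows no clear advantage over Euclidean MLR. The authors suggest this may be because, under end-to-end training, the learned embeddings are already easy to classify with Euclidean MLR. To further demonstrate the benefit of hyperbolic MLR, they run an additional experiment: pick a subtree of WordNet and predict whether a node belongs to it. The inputs are word embeddings pretrained on WordNet, and three binary classifiers are compared: hyperbolic MLR, Euclidean MLR applied directly on the hyperbolic embeddings, and Euclidean MLR applied after mapping all embeddings in the tangent space at $\mathbf 0$ using the $\log_{\mathbf 0}$ map.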
  The figure below shows the 2-dimensional embeddings and the trained separation hyperplanes.
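  The third baseline above, Euclidean MLR after the $\log_{\mathbf 0}$ map, only needs a preprocessing step that sends each Poincaré-ball embedding to the tangent space at the origin. The sketch below shows that step under stated assumptions: the embeddings are made-up 2-dimensional points (not WordNet embeddings), $c=1$, and the helper names `logmap0` and `dist0` are hypothetical.

```python
import numpy as np


def logmap0(y, c=1.0, eps=1e-7):
    """Row-wise log map at the origin of the Poincare ball of curvature -c."""
    norm = np.maximum(np.linalg.norm(y, axis=-1, keepdims=True), eps)
    scaled = np.minimum(np.sqrt(c) * norm, 1 - eps)
    return np.arctanh(scaled) * y / (np.sqrt(c) * norm)


def dist0(y, c=1.0, eps=1e-7):
    """Hyperbolic distance from the origin: (2/sqrt(c)) artanh(sqrt(c) ||y||)."""
    norm = np.linalg.norm(y, axis=-1)
    return 2.0 / np.sqrt(c) * np.arctanh(np.minimum(np.sqrt(c) * norm, 1 - eps))


# Made-up 2-dimensional embeddings inside the unit ball (illustrative only).
emb = np.array([[0.0, 0.1], [0.5, 0.5], [-0.9, 0.3], [0.0, -0.99]])

# Tangent-space features; these can be fed to any Euclidean classifier.
tangent = logmap0(emb)
print(np.linalg.norm(tangent, axis=1))
print(dist0(emb) / 2)
```

  The two printed vectors agree: the Euclidean norm of $\log_{\mathbf 0}(y)$ is half the hyperbolic distance from the origin to $y$, so points near the boundary of the ball are pushed far apart in the tangent space while their directions are preserved.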
  
References
- Ganea, Octavian, Gary Bécigneul, and Thomas Hofmann. “Hyperbolic neural networks.” Advances in neural information processing systems 31 (2018).
- code: https://github.com/dalab/hyperbolic_nn
- 脱离欧氏空间,在双曲空间中做 NLP ("Leaving Euclidean space: doing NLP in hyperbolic space")



















