Day 22 Transformer
Sequence to sequence
What is it used for?





Encoder



How a block works
How exactly does the residual connection work?
Restructuring
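To make the residual step concrete, here is a minimal sketch in plain Python. It assumes the post-norm arrangement, where the sublayer output is added back to its own input and the sum is then layer-normalized. The names `layer_norm`, `residual_block`, and `toy_sublayer` are hypothetical, and the learnable gain and bias of a real layer norm are left out.

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize one vector to zero mean and (roughly) unit variance.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def residual_block(x, sublayer):
    # Residual: add the sublayer output to the sublayer input, then normalize.
    out = sublayer(x)
    summed = [a + b for a, b in zip(x, out)]
    return layer_norm(summed)

def toy_sublayer(x):
    # Stand-in for self-attention or the feed-forward network (assumption).
    return [0.5 * v for v in x]

y = residual_block([1.0, 2.0, 3.0, 4.0], toy_sublayer)
```

In an encoder block this pattern would be applied twice: once around self-attention and once around the feed-forward network.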


Decoder - AutoRegressive





Mask



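Because decoding works like a word-chain game (each token is generated from the tokens before it), a position cannot take information to its right into account.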
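The masking idea can be sketched as follows: before the softmax, every score that points to a later position is replaced with negative infinity, so its attention weight becomes zero. This is an illustrative sketch only; `masked_attention_weights` and the toy `scores` matrix are hypothetical names and values, not taken from the lecture.

```python
import math

def softmax(scores):
    # Numerically stable softmax; exp(-inf) evaluates to 0.0.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def masked_attention_weights(scores):
    # scores[i][j] is the raw attention score of position i attending to j.
    n = len(scores)
    weights = []
    for i in range(n):
        # Position i may only look at positions j <= i (itself and the left).
        row = [scores[i][j] if j <= i else float("-inf") for j in range(n)]
        weights.append(softmax(row))
    return weights

scores = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
]
w = masked_attention_weights(scores)
```

The first row of `w` puts all of its weight on position 0, because the first token has nothing to its left to attend to.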




Another kind of decoder

Encoder to Decoder - Cross Attention
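A rough sketch of cross attention for a single decoder position: the query comes from the decoder, while the keys and values come from the encoder outputs. The learned projection matrices are omitted here for brevity, and `cross_attention` with its arguments is a hypothetical name chosen for illustration.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(query, enc_keys, enc_values):
    # query: one vector from the decoder.
    # enc_keys / enc_values: one vector per encoder position.
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in enc_keys])
    dim = len(enc_values[0])
    # Weighted sum of the encoder values.
    return [sum(w * v[d] for w, v in zip(weights, enc_values)) for d in range(dim)]

out = cross_attention(
    query=[1.0, 0.0],
    enc_keys=[[1.0, 0.0], [1.0, 0.0]],
    enc_values=[[2.0, 0.0], [4.0, 0.0]],
)
```

With two identical keys the weights are equal, so `out` is simply the average of the two encoder values.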



I suspect that the missing Norm at the begin step is a bug.


Training


This closely resembles a classification problem.
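The classification view can be sketched like this: at every output position the decoder produces a probability distribution over the vocabulary, and the loss is the cross entropy against the correct token, summed over positions. The function names and the toy two-word vocabulary below are assumptions made for illustration.

```python
import math

def cross_entropy(probs, target_index):
    # Cross entropy against a one-hot target reduces to -log p(correct token).
    return -math.log(probs[target_index])

def sequence_loss(step_probs, target_ids):
    # One classification loss per output position, summed over the sequence.
    return sum(cross_entropy(p, t) for p, t in zip(step_probs, target_ids))

# Two decoding steps over a toy vocabulary of size 2.
step_probs = [[0.5, 0.5], [0.25, 0.75]]
target_ids = [0, 1]
loss = sequence_loss(step_probs, target_ids)
```

Training then minimizes this summed loss over all positions, including the position that should emit the end token.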


Teacher forcing: using the ground truth as the decoder input
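A small sketch of how teacher forcing pairs up inputs and targets: the decoder input is the ground truth shifted right by one position behind a begin token, and the target is the ground truth followed by an end token. The token strings `<BEGIN>` and `<END>` and the helper `teacher_forcing_pairs` are hypothetical names.

```python
BEGIN, END = "<BEGIN>", "<END>"  # special token names are assumptions

def teacher_forcing_pairs(ground_truth):
    # At step t the decoder sees the correct tokens up to t-1,
    # not its own earlier predictions.
    decoder_input = [BEGIN] + ground_truth
    decoder_target = ground_truth + [END]
    return decoder_input, decoder_target

inputs, targets = teacher_forcing_pairs(["machine", "learning"])
```

Each element of `inputs` lines up with the element of `targets` the decoder should predict at that step.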
Tips




How to resolve that?
























