Types and Analysis of Position Encoding


Sinusoidal PE for Transformer

Permutation equivariance of Multi-Head Self-Attention

Sinusoidal PE for representing order of elements

On the learnability of PE

PE for CNN

CoordConv / Spatial Broadcast Decoder


Height-driven Attention Networks (HANet)


Other types of PE for Transformer variants

Absolute PE

Relative PE

Complex PE

No PE — Convolutional context

Related Topics

MLP-based Neural Rendering

PE for representing timestep of iterative network




Sinusoidal PE for Transformer

PE for CNN

  • An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution (arXiv:1807.03247)
  • Spatial Broadcast Decoder: A Simple Architecture for Learning Disentangled Representations in VAEs (arXiv:1901.07017)
  • A Style-Based Generator Architecture for Generative Adversarial Networks (arXiv:1812.04948)
  • Positional Encoding as Spatial Inductive Bias in GANs (arXiv:2012.05217)
  • Cars Can’t Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks (arXiv:2003.05128)
  • How Much Position Information Do Convolutional Neural Networks Encode? (arXiv:2001.08248)

Other types of PE for Transformer variants

  • Self-Attention with Relative Position Representations (arXiv:1803.02155)
  • Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (arXiv:1901.02860)
  • An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (arXiv:2010.11929)
  • Encoding word order in complex embeddings (arXiv:1912.12333)
  • Transformers with convolutional context for ASR (arXiv:1904.11660)
  • wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (arXiv:2006.11477)

Related Topics

  • NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (arXiv:2003.08934)
  • Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains (arXiv:2006.10739)




