In this work, we investigate how optimization, data distribution, loss function, and model architecture in LM pre-training influence the emergence of attention sink.
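To make the object of study concrete, the sketch below shows one way an attention sink could be quantified: the share of attention mass each head assigns to the first token, averaged over query positions. This is a minimal illustration, not the paper's metric; it assumes a Hugging Face `transformers` causal LM that can return attention weights, and the model name and averaging scheme are illustrative choices.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sink measurement: fraction of attention mass that each
# head assigns to position 0, averaged over all query positions.
# "gpt2" is an arbitrary example model, not one studied here.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_attentions=True)
model.eval()

text = "Attention sinks concentrate attention mass on early tokens."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one (batch, heads, seq, seq)
# tensor per layer of post-softmax attention weights.
for layer_idx, attn in enumerate(outputs.attentions):
    # Mean attention on the first token, per head.
    sink_score = attn[0, :, :, 0].mean(dim=-1)  # shape: (heads,)
    print(f"layer {layer_idx:2d}: mean sink score {sink_score.mean():.3f}")
```

A head whose score stays far above the uniform baseline of 1/seq_len is directing a disproportionate share of its attention to the first token, which is the qualitative signature the phenomenon refers to.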