Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure
Linear attention is (maybe) all you need (to understand transformer optimization)
Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint
PLASTIC: Improving Input and Label Plasticity for Sample Efficient Reinforcement Learning