Are Transformers universal approximators of sequence-to-sequence functions?
Are deep ResNets provably better than linear predictors?
Global optimality conditions for deep neural networks
Face detection using Local Hybrid Patterns