Publications

Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count. arXiv preprint, 2024.
Stochastic Extragradient with Flip-Flop Shuffling & Anchoring: Provable Improvements. NeurIPS 2024, 2024.
DASH: Warm-Starting Neural Network Training in Stationary Settings without Loss of Plasticity. NeurIPS 2024, 2024.
Provable Benefit of Cutout and CutMix for Feature Learning. NeurIPS 2024 (Spotlight), 2024.
Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure. NeurIPS 2024, 2024.
Does SGD really happen in tiny subspaces? ICML 2024 Workshop on High-dimensional Learning Dynamics: The Emergence of Structure and Reasoning, 2024.
Gradient Descent with Polyak's Momentum Finds Flatter Minima via Large Catapults. ICML 2024 Workshop on High-dimensional Learning Dynamics: The Emergence of Structure and Reasoning, 2024.
Fundamental Benefit of Alternating Updates in Minimax Optimization. ICML 2024 (Spotlight), 2024.
Linear attention is (maybe) all you need (to understand transformer optimization). ICLR 2024, 2024.
Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint. NeurIPS 2023, 2023.
PLASTIC: Improving Input and Label Plasticity for Sample Efficient Reinforcement Learning. NeurIPS 2023, 2023.
Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima. NeurIPS 2023 (Spotlight), 2023.
Tighter Lower Bounds for Shuffling SGD: Random Permutations and Beyond. ICML 2023 (Oral), 2023.
On the Training Instability of Shuffling SGD with Batch Normalization. ICML 2023, 2023.
Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond. ICLR 2022 (Oral), 2022.
Provable Memorization via Deep Neural Networks using Sub-linear Parameters. COLT 2021, 2021.
A Unifying View on Implicit Bias in Training Linear Neural Networks. ICLR 2021, 2021.
Minimum Width for Universal Approximation. ICLR 2021 (Spotlight), 2021.
SGD with shuffling: optimal rates without component convexity and large epoch requirements. NeurIPS 2020 (Spotlight), 2020.
$O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers. NeurIPS 2020, 2020.
Low-Rank Bottleneck in Multi-head Attention Models. ICML 2020, 2020.
Are Transformers universal approximators of sequence-to-sequence functions? ICLR 2020, 2020.
Are deep ResNets provably better than linear predictors? NeurIPS 2019, 2019.
Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity. NeurIPS 2019 (Spotlight), 2019.
Efficiently testing local optimality and escaping saddles for ReLU networks. ICLR 2019, 2019.
Minimax Bounds on Stochastic Batched Convex Optimization. COLT 2018, 2018.
Global optimality conditions for deep neural networks. ICLR 2018, 2018.
Face detection using Local Hybrid Patterns. ICASSP 2015, 2015.