Chulhee “Charlie” Yun
Publications
Jihwan Kim, Dogyoon Song, Chulhee Yun. Scaling Laws of SignSGD in Linear Regression: When Does It Outperform SGD? ICLR 2026, 2026.
Chaewon Moon, Dongkuk Si, Chulhee Yun. Minor First, Major Last: A Depth-Induced Implicit Bias of Sharpness-Aware Minimization. ICLR 2026, 2026.
Beomhan Baek, Minhak Song, Chulhee Yun. Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime. ICLR 2026, 2025.
arXiv
Baekrok Shin, Chulhee Yun. Implicit Bias and Loss of Plasticity in Matrix Completion: Depth Promotes Low-Rankness. ICLR 2026, 2025.
Yujun Kim, Chaewon Moon, Chulhee Yun. The Cost of Robustness: Tighter Bounds on Parameter Complexity for Robust Memorization in ReLU Nets. NeurIPS 2025, 2025.
Paper
arXiv
Junsoo Oh, Jerry Song, Chulhee Yun. From Linear to Nonlinear: Provable Weak-to-Strong Generalization through Feature Learning. NeurIPS 2025, 2025.
Paper
arXiv
Minhak Song, Beomhan Baek, Kwangjun Ahn, Chulhee Yun. Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training. NeurIPS 2025, 2025.
Paper
arXiv
Donghwa Kim, Jaewook Lee, Chulhee Yun. Provable Benefit of Random Permutations over Uniform Sampling in Stochastic Coordinate Descent. ICML 2025, 2025.
Paper
arXiv
Geonhui Yoo, Minhak Song, Chulhee Yun. Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More. ICML 2025, 2025.
Paper
arXiv
Yujun Kim, Jaeyoung Cha, Chulhee Yun. Incremental Gradient Descent with Small Epoch Counts is Surprisingly Slow on Ill-Conditioned Problems. ICML 2025, 2025.
Paper
arXiv
Yeseul Cho, Baekrok Shin, Changmin Kang, Chulhee Yun. Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty. ICML 2025, 2025.
Paper
arXiv
Hyunsu Kim, Giung Nam, Chulhee Yun, Hongseok Yang, Juho Lee. Parameter Expanded Stochastic Gradient Markov Chain Monte Carlo. ICLR 2025, 2025.
Paper
arXiv
Hyunji Jung, Hanseul Cho, Chulhee Yun. Convergence and Implicit Bias of Gradient Descent on Continual Linear Classification. ICLR 2025, 2025.
Paper
arXiv
Hanseul Cho, Jaeyoung Cha, Srinadh Bhojanapalli, Chulhee Yun. Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count. ICLR 2025, 2025.
Paper
arXiv
Minhak Song, Kwangjun Ahn, Chulhee Yun. Does SGD really happen in tiny subspaces? ICLR 2025, 2025.
Paper
arXiv
Jiseok Chae, Chulhee Yun, Donghwan Kim. Stochastic Extragradient with Flip-Flop Shuffling & Anchoring: Provable Improvements. NeurIPS 2024, 2024.
Paper
arXiv
Baekrok Shin, Junsoo Oh, Hanseul Cho, Chulhee Yun. DASH: Warm-Starting Neural Network Training in Stationary Settings without Loss of Plasticity. NeurIPS 2024, 2024.
Paper
arXiv
Junsoo Oh, Chulhee Yun. Provable Benefit of Cutout and CutMix for Feature Learning. NeurIPS 2024 (Spotlight), 2024.
Paper
arXiv
Hanseul Cho, Jaeyoung Cha, Pranjal Awasthi, Srinadh Bhojanapalli, Anupam Gupta, Chulhee Yun. Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure. NeurIPS 2024, 2024.
Paper
arXiv
Prin Phunyaphibarn, Junghyun Lee, Bohan Wang, Huishuai Zhang, Chulhee Yun. Gradient Descent with Polyak's Momentum Finds Flatter Minima via Large Catapults. ICML 2024 Workshop on High-dimensional Learning Dynamics 2024: The Emergence of Structure and Reasoning, 2024.
arXiv
Jaewook Lee, Hanseul Cho, Chulhee Yun. Fundamental Benefit of Alternating Updates in Minimax Optimization. ICML 2024 (Spotlight), 2024.
Paper
arXiv
Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra. Linear attention is (maybe) all you need (to understand transformer optimization). ICLR 2024, 2023.
Paper
arXiv
Junghyun Lee, Hanseul Cho, Se-Young Yun, Chulhee Yun. Fair Streaming Principal Component Analysis: Statistical and Algorithmic Viewpoint. NeurIPS 2023, 2023.
Paper
arXiv
Minhak Song, Chulhee Yun. Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory. NeurIPS 2023, 2023.
Paper
arXiv
Hojoon Lee, Hanseul Cho, Hyunseung Kim, Daehoon Gwak, Joonkee Kim, Jaegul Choo, Se-Young Yun, Chulhee Yun. PLASTIC: Improving Input and Label Plasticity for Sample Efficient Reinforcement Learning. NeurIPS 2023, 2023.
Paper
arXiv
Dongkuk Si, Chulhee Yun. Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima. NeurIPS 2023 (Spotlight), 2023.
Paper
arXiv
Junsoo Oh, Chulhee Yun. Provable Benefit of Mixup for Finding Optimal Decision Boundaries. ICML 2023, 2023.
Paper
arXiv
Jaeyoung Cha, Jaewook Lee, Chulhee Yun. Tighter Lower Bounds for Shuffling SGD: Random Permutations and Beyond. ICML 2023 (Oral), 2023.
Paper
arXiv
David X. Wu, Chulhee Yun, Suvrit Sra. On the Training Instability of Shuffling SGD with Batch Normalization. ICML 2023, 2023.
Paper
arXiv
Hanseul Cho, Chulhee Yun. SGDA with shuffling: faster convergence for nonconvex-PŁ minimax optimization. ICLR 2023, 2022.
Paper
arXiv
Chulhee Yun, Shashank Rajput, Suvrit Sra. Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond. ICLR 2022 (Oral), 2022.
Paper
arXiv
Chulhee Yun, Suvrit Sra, Ali Jadbabaie. Open Problem: Can Single-Shuffle SGD be Better than Reshuffling SGD and GD? COLT 2021, 2021.
Paper
Long version
Sejun Park, Jaeho Lee, Chulhee Yun, Jinwoo Shin. Provable Memorization via Deep Neural Networks using Sub-linear Parameters. COLT 2021, 2021.
Paper
arXiv
Chulhee Yun, Shankar Krishnan, Hossein Mobahi. A Unifying View on Implicit Bias in Training Linear Neural Networks. ICLR 2021, 2021.
Paper
arXiv
Sejun Park, Chulhee Yun, Jaeho Lee, Jinwoo Shin. Minimum Width for Universal Approximation. ICLR 2021 (Spotlight), 2021.
Paper
arXiv
Kwangjun Ahn, Chulhee Yun, Suvrit Sra. SGD with shuffling: optimal rates without component convexity and large epoch requirements. NeurIPS 2020 (Spotlight), 2020.
Paper
arXiv
Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar. $O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers. NeurIPS 2020, 2020.
Paper
arXiv
Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar. Low-Rank Bottleneck in Multi-head Attention Models. ICML 2020, 2020.
Paper
arXiv
Chulhee Yun, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar. Are Transformers universal approximators of sequence-to-sequence functions? ICLR 2020, 2020.
Paper
arXiv
Chulhee Yun, Suvrit Sra, Ali Jadbabaie. Are deep ResNets provably better than linear predictors? NeurIPS 2019, 2019.
Paper
arXiv
Chulhee Yun, Suvrit Sra, Ali Jadbabaie. Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity. NeurIPS 2019 (Spotlight), 2019.
Paper
arXiv
Chulhee Yun, Suvrit Sra, Ali Jadbabaie. Efficiently testing local optimality and escaping saddles for ReLU networks. ICLR 2019, 2019.
Paper
arXiv
Chulhee Yun, Suvrit Sra, Ali Jadbabaie. Small nonlinearities in activation functions create bad local minima in neural networks. ICLR 2019, 2019.
Paper
arXiv
John Duchi, Feng Ruan, Chulhee Yun. Minimax Bounds on Stochastic Batched Convex Optimization. COLT 2018, 2018.
Paper
Chulhee Yun, Suvrit Sra, Ali Jadbabaie. Global optimality conditions for deep neural networks. ICLR 2018, 2018.
Paper
arXiv
Chulhee Yun, Donghoon Lee, Chang D. Yoo. Face detection using Local Hybrid Patterns. ICASSP 2015, 2015.
Paper