Scaling Laws of SignSGD in Linear Regression: When Does It Outperform SGD?

Jihwan Kim, Dogyoon Song, Chulhee Yun

Minor First, Major Last: A Depth-Induced Implicit Bias of Sharpness-Aware Minimization

Chaewon Moon, Dongkuk Si, Chulhee Yun

Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime

Beomhan Baek, Minhak Song, Chulhee Yun

Implicit Bias and Loss of Plasticity in Matrix Completion: Depth Promotes Low-Rankness

Baekrok Shin, Chulhee Yun

Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training

Minhak Song, Beomhan Baek, Kwangjun Ahn, Chulhee Yun

Provable Benefit of Random Permutations over Uniform Sampling in Stochastic Coordinate Descent

Donghwa Kim, Jaewook Lee, Chulhee Yun

Understanding Sharpness Dynamics in NN Training with a Minimalist Example: The Effects of Dataset Difficulty, Depth, Stochasticity, and More

Geonhui Yoo, Minhak Song, Chulhee Yun

Incremental Gradient Descent with Small Epoch Counts is Surprisingly Slow on Ill-Conditioned Problems

Yujun Kim, Jaeyoung Cha, Chulhee Yun

Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty

Yeseul Cho, Baekrok Shin, Changmin Kang, Chulhee Yun

Convergence and Implicit Bias of Gradient Descent on Continual Linear Classification

Hyunji Jung, Hanseul Cho, Chulhee Yun