Overview

The Heavy-Tail Phenomenon in SGD

TLDR

TODO

Summary

Understanding SGD

1. Introduction

SGD

2. Motivations

Type: Improving the theoretical understanding of empirically observed good results

Empirical Observation: SGD is good at finding minima with good generalization capabilities (at the moment, it is at least better than other methods)

Theoretical Focus: Why

TODO

Fig1