Microsoft Improves Transformer Stability to Successfully Scale Extremely Deep Models to 1000 Layers | Synced

Source: Synced | AI Technology & Industry Review

A Microsoft research team proposes DeepNorm, a novel normalization function that improves transformer training stability, enabling models to be scaled an order of magnitude deeper (more than 1,000 layers) than previous deep transformers.
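The core idea of DeepNorm is a modified residual connection: instead of the standard post-LN update LN(x + G(x)), the residual branch is up-weighted as LN(α·x + G(x)), where α grows with the model depth N. Below is a minimal NumPy sketch of that update; the value α = (2N)^(1/4) follows the paper's encoder-only setting, and `layer_norm` here omits the learned gain and bias for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last (hidden) dimension; no learned gain/bias here.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def deepnorm(x, sublayer_out, alpha):
    # DeepNorm residual connection: LN(alpha * x + G(x)).
    # alpha > 1 up-weights the identity branch, which stabilizes
    # very deep stacks during training.
    return layer_norm(alpha * x + sublayer_out)

# For an N-layer encoder-only model the paper sets alpha = (2N)^(1/4).
N = 1000
alpha = (2 * N) ** 0.25

x = np.random.randn(4, 8)   # (tokens, hidden) activations entering the layer
g = np.random.randn(4, 8)   # stand-in for a sub-layer output G(x)
y = deepnorm(x, g, alpha)

print(y.shape)  # (4, 8)
```

A companion piece of the method (not shown above) scales the initialization of certain sub-layer weights down by a depth-dependent factor β, so that the combined effect bounds how much each layer can perturb the residual stream.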