Microsoft Improves Transformer Stability to Successfully Scale Extremely Deep Models to 1000 Layers | Synced
Source: Synced | AI Technology & Industry Review
A Microsoft research team proposes DeepNorm, a novel normalization function that stabilizes transformer training, enabling models an order of magnitude deeper (more than 1,000 layers) than previous deep transformers.
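At its core, DeepNorm modifies the post-layer-norm residual connection by up-weighting the residual branch with a depth-dependent constant alpha before normalizing. The sketch below, in NumPy, illustrates this formulation under the assumption of the encoder-only setting described in the paper, where alpha = (2N)^(1/4) for an N-layer model; the helper names are illustrative, not from the paper's code.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Standard layer normalization over the last axis (no learned scale/bias)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def deepnorm_residual(x, sublayer_out, alpha):
    """DeepNorm residual: LayerNorm(alpha * x + G(x)).

    `alpha` > 1 up-weights the identity path, which is what bounds the
    magnitude of updates in very deep stacks and keeps training stable.
    """
    return layer_norm(alpha * x + sublayer_out)

# Depth-dependent constant for an N-layer encoder-only model (per the paper).
N = 1000
alpha = (2 * N) ** 0.25

# Toy usage: a random hidden state and a stand-in sub-layer output.
x = np.random.randn(4, 8)          # (tokens, hidden dim)
sublayer_out = np.random.randn(4, 8)
y = deepnorm_residual(x, sublayer_out, alpha)
```

The output of each residual block is normalized, so its per-token mean is ~0 and variance ~1 regardless of how large alpha makes the pre-norm activations.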