Papers I Read: Notes and Summaries

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

Introduction

  • Conditional computation is a technique to increase a model’s capacity (without...
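
The core mechanism is a trainable gating network that activates only a few experts per example. Below is a minimal PyTorch sketch of such top-k gating; the shapes, the routing loop, and the use of `nn.Linear` experts are illustrative assumptions, and the paper additionally uses noisy gating and load-balancing losses not shown here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Sparsely-gated mixture-of-experts layer, sketched: a gating
    network picks the top-k experts per example; only those run."""
    def __init__(self, d_model, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)])
        self.gate = nn.Linear(d_model, n_experts)  # gating network
        self.k = k

    def forward(self, x):  # x: (batch, d_model)
        top_logits, top_idx = self.gate(x).topk(self.k, dim=-1)
        weights = F.softmax(top_logits, dim=-1)  # renormalize kept gates
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e
                if mask.any():
                    # route only the selected examples through expert e
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```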


Gradient Surgery for Multi-Task Learning

Introduction

  • The paper hypothesizes that the main optimization challenges in multi-task learning arise because of...
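
The paper's remedy, PCGrad, alters conflicting gradients before they are combined. A minimal sketch of that projection step, assuming each task's gradient has already been flattened into a single vector:

```python
import random
import torch

def pcgrad(task_grads):
    """PCGrad, sketched: if two task gradients conflict (negative
    inner product), project one onto the normal plane of the other.
    task_grads: list of flattened gradient vectors, one per task."""
    projected = []
    for i, g_i in enumerate(task_grads):
        g = g_i.clone()
        others = [g_j for j, g_j in enumerate(task_grads) if j != i]
        random.shuffle(others)  # the paper samples tasks in random order
        for g_j in others:
            dot = torch.dot(g, g_j)
            if dot < 0:  # conflicting directions
                g = g - (dot / g_j.norm() ** 2) * g_j
        projected.append(g)
    return torch.stack(projected).sum(dim=0)  # combined update direction
```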


GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks

Introduction

  • The paper proposes GradNorm, a gradient normalization algorithm that improves multi-task...
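
At its core, GradNorm tunes the per-task loss weights so that every task's gradient norm is pulled toward a common, rate-adjusted target. A rough sketch of one such update, assuming a single shared weight tensor and illustrative values for `alpha` and `lr`:

```python
import torch

def gradnorm_update(losses, initial_losses, task_weights, shared_param,
                    alpha=1.5, lr=0.025):
    """One GradNorm update of the task weights w_i (illustrative sketch).
    losses: per-task losses L_i(t); initial_losses: L_i(0) recorded at
    the start; task_weights: learnable 1-D tensor of w_i;
    shared_param: weight tensor of the last shared layer."""
    # per-task gradient norms G_i = ||grad_W(w_i * L_i)||
    norms = torch.stack([
        torch.autograd.grad(w * L, shared_param,
                            retain_graph=True, create_graph=True)[0].norm()
        for w, L in zip(task_weights, losses)])
    # relative inverse training rates r_i = (L_i(t)/L_i(0)) / mean(...)
    ratios = torch.stack([L.detach() / L0
                          for L, L0 in zip(losses, initial_losses)])
    r = ratios / ratios.mean()
    # pull every G_i toward the common target mean(G) * r_i^alpha
    target = (norms.mean() * r ** alpha).detach()
    gradnorm_loss = (norms - target).abs().sum()
    grad_w = torch.autograd.grad(gradnorm_loss, task_weights)[0]
    with torch.no_grad():
        task_weights -= lr * grad_w
        task_weights *= len(losses) / task_weights.sum()  # keep sum(w_i) = T
    return task_weights
```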


TaskNorm: Rethinking Batch Normalization for Meta-Learning

Introduction

  • Meta-learning techniques are shown to benefit from the use of deep...
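
One variant the paper proposes, TaskNorm-I, normalizes each input with a blend of moments pooled over the task's context (support) set and instance-level moments, where the blend weight depends on the context-set size. A sketch, assuming image-shaped inputs and learned scalar tensors `a` and `b`, with the affine scale and shift omitted:

```python
import torch

def tasknorm_i(x, context, a, b, eps=1e-5):
    """TaskNorm-I style normalization, sketched.
    x, context: (batch, channels, H, W); a, b: learned scalar tensors."""
    mu_c = context.mean(dim=(0, 2, 3), keepdim=True)   # context moments
    var_c = context.var(dim=(0, 2, 3), keepdim=True, unbiased=False)
    mu_i = x.mean(dim=(2, 3), keepdim=True)            # instance moments
    var_i = x.var(dim=(2, 3), keepdim=True, unbiased=False)
    # blend weight grows with the size of the context set
    alpha = torch.sigmoid(a * context.shape[0] + b)
    mu = alpha * mu_c + (1 - alpha) * mu_i
    var = alpha * var_c + (1 - alpha) * var_i
    return (x - mu) / torch.sqrt(var + eps)
```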


Averaging Weights Leads to Wider Optima and Better Generalization

Introduction

  • The paper proposes the Stochastic Weight Averaging (SWA) procedure for improving the...
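
The procedure itself is simple: keep training with a suitable learning-rate schedule and maintain a running average of the weights that SGD visits. A compact sketch, where `loader`, `loss_fn`, and `swa_start` are placeholders for a real training setup; recent PyTorch also ships `torch.optim.swa_utils` for this.

```python
import copy
import torch

def train_with_swa(model, loader, optimizer, loss_fn, epochs, swa_start):
    """SWA, sketched: average the weights visited after epoch swa_start."""
    swa_model = copy.deepcopy(model)
    n_averaged = 0
    for epoch in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        if epoch >= swa_start:
            n_averaged += 1
            with torch.no_grad():
                # running average: w_swa <- (w_swa * (n - 1) + w) / n
                for p_swa, p in zip(swa_model.parameters(),
                                    model.parameters()):
                    p_swa.mul_(n_averaged - 1).add_(p).div_(n_averaged)
    return swa_model  # BatchNorm statistics must be recomputed afterward
```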


Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions

Introduction

  • The paper explores the connections between the concepts of a single...


When to use parametric models in reinforcement learning?

Introduction

  • The paper compares replay-based approaches with model-based approaches in reinforcement learning...


Network Randomization: A Simple Technique for Generalization in Deep Reinforcement Learning

Introduction

  • The paper proposes a technique for improving the generalization ability of...
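
The technique perturbs the agent's observations with a randomly initialized network that keeps being re-drawn during training, so the policy learns features invariant to low-level visual statistics. A sketch, where the single 3x3 convolution and the Xavier init are assumptions:

```python
import torch
import torch.nn as nn

def randomize_observation(obs):
    """Network randomization, sketched: pass observations through a
    freshly re-initialized random convolution.
    obs: (batch, channels, height, width) observation tensor."""
    channels = obs.shape[1]
    rand_conv = nn.Conv2d(channels, channels, kernel_size=3,
                          padding=1, bias=False)
    nn.init.xavier_normal_(rand_conv.weight)  # re-drawn at every call
    with torch.no_grad():
        return rand_conv(obs)
```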


On the Difficulty of Warm-Starting Neural Network Training

Introduction

  • The paper considers learning scenarios where the training data is available...
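
The paper's proposed remedy is the shrink-and-perturb trick: scale the warm-start weights toward zero and add a little noise before training on the newly arrived data. A sketch, with `lam` and `sigma` as assumed hyperparameter values:

```python
import torch

def shrink_and_perturb(model, lam=0.4, sigma=0.01):
    """Shrink-and-perturb, sketched: shrink every warm-start weight
    toward zero and add a small amount of Gaussian noise."""
    with torch.no_grad():
        for p in model.parameters():
            p.mul_(lam).add_(sigma * torch.randn_like(p))
```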


Supervised Contrastive Learning

Introduction

  • The paper builds on the prior work on self-supervised contrastive learning...
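
Its key change is to treat every sample sharing the anchor's label as a positive, rather than only augmentations of the anchor. A sketch of the resulting SupCon loss, assuming a batch of embeddings `z`, integer `labels`, and an assumed temperature `tau`:

```python
import torch
import torch.nn.functional as F

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive (SupCon) loss, sketched.
    z: (n, d) embeddings; labels: (n,) class ids; tau: temperature."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                         # pairwise similarities
    not_self = ~torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos = (labels[:, None] == labels[None, :]) & not_self  # positives
    # log-softmax over every non-self pair for each anchor
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(~not_self, float('-inf')), dim=1, keepdim=True)
    # average over each anchor's positives, then over anchors
    mean_log_prob_pos = (log_prob.masked_fill(~pos, 0.0).sum(1)
                         / pos.sum(1).clamp(min=1))
    return -mean_log_prob_pos.mean()
```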