Why accelerate
Well, we have neural networks. They are awesome and they work, but there’s a problem: THEY ARE HUGE. We scaled from hundreds of millions of parameters to hundreds of BILLIONS. This makes using neural networks in real life quite hard, as you normally don’t have that kind of computational capability available wherever you want to run them.
Neural networks have proven to be a very valuable tool in scenarios where the transformation from inputs to outputs is unknown. Suppose you are asked to write an algorithm that classifies an image as a cat or a dog. How would you do that? Well, first you might ask yourself, “what makes an image a cat?”. Answering this question is incredibly hard because there is a vast number of cases to cover before your algorithm generalizes. This is where neural networks shine: given an input $x_{i}$ with its respective label $y_{i}$, you can use a neural network model $M(\theta)$ with a set of parameters $\theta$ to approximate the unknown function $f$ such that $y_{i} = f(x_{i})$. Normally, with enough data, you can get a very good estimate of $f$. However, this comes at a huge cost. Training and running these large networks is expensive in terms of time and memory because of the huge number of parameters that you need to learn to get the best approximation, and this makes these models hard to use in real-life scenarios. The recent trend of models getting bigger and bigger in order to get better performance is making this problem even harder.
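To put rough numbers on that cost, here is a minimal back-of-the-envelope sketch of how much memory it takes just to store the weights. The helper `weight_memory_gb` and the `BYTES_PER_PARAM` table are hypothetical names made up for this illustration, the parameter counts are the round figures mentioned above rather than any specific model, and the estimate ignores activations, gradients and optimizer state, so real usage will be higher.

```python
# Bytes used to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str = "float32") -> float:
    """Memory needed to hold the weights alone, in gigabytes (1 GB = 1e9 bytes).

    Assumption: every parameter is stored densely at the given precision.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Round figures from the text: hundreds of millions vs. hundreds of billions.
for label, n in [("100M params", 100_000_000), ("100B params", 100_000_000_000)]:
    for dtype in BYTES_PER_PARAM:
        print(f"{label} @ {dtype}: {weight_memory_gb(n, dtype):.1f} GB")
```

Under these assumptions, a 100-million-parameter model needs about 0.4 GB for its weights in 32-bit floats, while a 100-billion-parameter model needs about 400 GB, which is far more than a single typical consumer device can hold.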