Transformers on the Edge
The Transformer is an indispensable model in the modern deep learning stack. It has already shown tremendous success in NLP, vision, and reinforcement learning. When deployed at the edge and on smart devices, transformers will enable exciting applications. The major obstacle, however, is their high demand for compute and memory resources. In this talk, I will survey recent efforts on efficient transformers, including sparse attention, weight sharing, quantization, transfer learning, and neural architecture search, and highlight their shortcomings and the directions forward.
Note: this talk will also be recorded.