Big Bird: Transformers for Longer Sequences

Guru Guruganesh
Google Research

Time: 2021-11-05, 11 AM - 12 PM EDT
Location: Zoom Only

Abstract

The BERT model has been highly successful in Natural Language Processing tasks such as language translation, question answering, and text summarization. However, the cost of BERT’s attention mechanism grows quadratically with the length of the input sequence, so it does not scale beyond several hundred input tokens. In this talk, I will present Big Bird, a new state-of-the-art language model that uses a sparse attention mechanism to reduce BERT’s quadratic dependency on input length to linear. As a result, Big Bird can handle sequences up to 8x longer than was previously possible with BERT. This ability to process longer inputs drastically improves performance on NLP tasks such as question answering and summarization. It also enables applications to genomics data, such as promoter-region prediction and chromatin-profile prediction on DNA sequences.
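To illustrate why sparse attention turns a quadratic cost into a linear one, here is a minimal sketch that counts (query, key) pairs under a Big Bird-style pattern. It combines a sliding window, a few random keys per query, and global tokens. The function name `sparse_attention_pairs` and the parameter values (`window=1`, `num_random=2`, one global token at position 0) are hypothetical choices for illustration, not the talk's actual configuration. The point is that each query attends to a fixed number of keys regardless of sequence length.

```python
import random


def sparse_attention_pairs(n, window=1, num_random=2, global_tokens=(0,), seed=0):
    """Return the set of (query, key) pairs allowed by a Big Bird-style
    sparse pattern. Hypothetical sketch: parameters are illustrative only."""
    rng = random.Random(seed)
    pairs = set()
    for q in range(n):
        # Sliding window: keys within `window` positions of the query.
        for k in range(max(0, q - window), min(n, q + window + 1)):
            pairs.add((q, k))
        # Random attention: a fixed number of keys sampled per query.
        for k in rng.sample(range(n), min(num_random, n)):
            pairs.add((q, k))
        # Every query attends to each global token.
        for g in global_tokens:
            pairs.add((q, g))
    # Each global token attends to every key in the sequence.
    for g in global_tokens:
        for k in range(n):
            pairs.add((g, k))
    return pairs


def full_attention_pairs(n):
    # Full (BERT-style) attention scores every query against every key.
    return n * n
```

With these settings, each query contributes at most (2·window + 1) + num_random + |global_tokens| pairs, and the global rows add |global_tokens|·n more. The total is therefore bounded by a constant times n. For illustrative lengths of 512 and 4096 tokens (an 8x ratio), full attention needs 262,144 and 16,777,216 pairs, while the sparse count stays at a few thousand and a few tens of thousands, respectively.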

Bio

Guru Guruganesh is currently a Research Scientist at Google Research. He received his PhD in Computer Science from Carnegie Mellon University in 2018 under the supervision of Prof. Anupam Gupta. He is interested in developing new algorithms with provable guarantees at the intersection of algorithmic game theory, theoretical machine learning, and approximation algorithms.

Note: This talk will also be recorded.