Datasheets for Datasets is a framework for documenting the datasets used to train and evaluate machine learning models. The aim of a datasheet is to increase dataset transparency and to improve communication between dataset creators and dataset consumers.
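To make the idea concrete, here is a minimal sketch of what a machine-readable datasheet might look like. The field names loosely follow the question categories proposed in the Datasheets for Datasets framework (motivation, composition, collection process, uses, limitations); the dataset and all answers shown are hypothetical, purely for illustration.

```python
# A minimal, illustrative sketch of a datasheet as a structured record.
# The example dataset ("example-news-corpus") and its answers are invented.
from dataclasses import dataclass, field

@dataclass
class Datasheet:
    name: str
    motivation: str            # why the dataset was created
    composition: str           # what the instances represent
    collection_process: str    # how the data was gathered
    recommended_uses: list[str] = field(default_factory=list)
    known_limitations: list[str] = field(default_factory=list)

sheet = Datasheet(
    name="example-news-corpus",  # hypothetical dataset
    motivation="Benchmark topic classification for short news articles.",
    composition="10,000 English headlines, each with one of 5 topic labels.",
    collection_process="Collected from public RSS feeds during 2019.",
    recommended_uses=["topic-classification research"],
    known_limitations=["English-only", "topics reflect 2019 news cycles"],
)

print(sheet.name)
```

A consumer reading this record can quickly see what the data contains, where it came from, and where it should not be used, which is exactly the kind of creator-to-consumer communication the framework aims to standardize.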
This work led Mitchell to invite Gebru to join her at Google. Although the team scored some wins in its artificial intelligence work, the role also came with many cultural challenges inside the company.
Inside Google, researchers worked to build more powerful successors to BERT and GPT-3. Separately, the Ethical AI team began researching the technology’s possible downsides. Then, in September 2020, Gebru and Mitchell learned that 40 Googlers had met to discuss the technology’s future. No one from Gebru’s team had been invited, though two other “responsible AI” teams did attend. There was a discussion of ethics, but it was led by a product manager, not a researcher.
In part, this led to the ill-fated paper “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?”
The paper was not intended to be a bombshell. The authors presented no new experimental results. Instead, they cited previous studies about the ethical questions raised by large language models, including the energy consumed by the large numbers of powerful processors required to train such software, and the difficulty of documenting potential biases in the vast datasets used to build them. BERT, Google's system, was mentioned more than a dozen times, but so was OpenAI's GPT-3.
It is unclear where this leaves the research and development of artificial intelligence.