Bookmarked How Big is YouTube?
Ethan Zuckerman shares some reflections on a recent article focused on measuring how big YouTube is.

YouTube is one of the largest, most important communication platforms in the world, but while there is a great deal of research about the site, many of its fundamental characteristics remain unknown. To better understand YouTube as a whole, we created a random sample of videos using a new method. Through a description of the sample’s metadata, we provide answers to many essential questions about, for example, the distribution of views, comments, likes, subscribers, and categories. Our method also allows us to estimate the total number of publicly visible videos on YouTube and its growth over time. To learn more about video content, we hand-coded a subsample to answer questions like how many are primarily music, video games, or still images. Finally, we processed the videos’ audio using language detection software to determine the distribution of spoken languages. In providing basic information about YouTube as a whole, we not only learn more about an influential platform, but also provide baseline context against which samples in more focused studies can be compared.

Source: Dialing for Videos: A Random Sample of YouTube by Ryan McGrady, Kevin Zheng, Rebecca Curran, Jason Baumgartner and Ethan Zuckerman

The findings are published on TubeStats, a site the researchers created, and are updated regularly.

Separately, Ryan McGrady has summarised some key takeaways on the Initiative for Digital Public Infrastructure site:

  • There are about 13 billion publicly visible videos
  • YouTube is mostly not in English
  • Our current best estimate is that 32% of videos where we can detect the language are in English, with 10.5% in Hindi, 8% in Spanish, slightly fewer in Portuguese, and just over 6% in Arabic.
  • Most of YouTube doesn’t get many views
  • Not everyone is participating in the “creator economy”
  • There are an awful lot of video games

Source: 5 Main Takeaways from Randomly Sampling YouTube by Ryan McGrady

Just as interesting as the statistics is how they managed to capture the data through ‘drunk dialing’:

That bit after “watch?v=” is an 11 digit string. The first ten digits can be a-z,A-Z,0-9 and _-. The last digit is special, and can only be one of 16 values. Turns out there are 2^64 possible YouTube addresses, an enormous number: 18.4 quintillion. There are lots of YouTube videos, but not that many. Let’s guess for a moment that there are 1 billion YouTube videos – if you picked URLs at random, you’d only get a valid address roughly once every 18.4 billion tries.

We refer to this method as “drunk dialing”, as it’s basically as sophisticated as taking swigs from a bottle of bourbon and mashing digits on a telephone, hoping to find a human being to speak to. Jason found a couple of cheats that makes the method roughly 32,000 times as efficient, meaning our “phone call” connects lots more often. Kevin Zheng wrote a whole bunch of scripts to do the dialing, and over the course of several months, we collected more than 10,000 truly random YouTube videos.

Source: How Big is YouTube? by Ethan Zuckerman
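To check the arithmetic in the quote for myself, here is a minimal Python sketch. The function names and the `speedup` parameter are my own illustration, not the researchers' scripts, and it assumes the 32,000-fold "cheat" simply multiplies the hit rate. The last function shows how, under that same assumption, a hit rate could be scaled up to an estimate of the total number of videos.

```python
# Alphabet for the first ten characters: a-z, A-Z, 0-9, "_" and "-" (64 symbols).
ALPHABET = (
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789_-"
)

# The quote only says the last character can take one of 16 values.
LAST_CHAR_CHOICES = 16

# 64^10 * 16 = 2^64 possible addresses, about 18.4 quintillion.
ID_SPACE = len(ALPHABET) ** 10 * LAST_CHAR_CHOICES


def tries_per_hit(total_videos, speedup=1):
    """Expected number of random guesses needed per valid video.

    `speedup` is a hypothetical multiplier standing in for the
    "cheats" mentioned in the quote (roughly 32,000).
    """
    return ID_SPACE / (total_videos * speedup)


def estimate_total_videos(hits, effective_tries):
    """Scale an observed hit rate up to the whole address space.

    `effective_tries` is the number of addresses the guesses covered,
    an assumption about how the sampling would be counted.
    """
    return hits / effective_tries * ID_SPACE


if __name__ == "__main__":
    print(ID_SPACE == 2 ** 64)
    print(round(tries_per_hit(1_000_000_000)))
    print(round(tries_per_hit(1_000_000_000, speedup=32_000)))
```

With the guess of 1 billion videos, this gives about 18.4 billion tries per hit, matching the quote, and roughly one hit every 576,000 tries once the 32,000-fold improvement is factored in.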

After reading Jim Groom’s post about an AI Dr Oblivion, I am left wondering what the numbers really mean.

Replied to Digitally Literate #227 by wiobyrne

Youth Never Forget
Digitally Lit #227 – 1/4/2020
Hi all, welcome to issue #227 of Digitally Literate. Welcome to 2020. I hope the new year…and the new decade treat you well. You’re more than welcome to review these materials on the website. Please subscribe if you would like this to sh…

Another great newsletter, Ian. Just a few thoughts. Firstly, with regard to the flaw in the research associated with YouTube:

One of the key critiques of the study is that the researchers didn’t log in. That is to say that they could not experience the full impact of the algorithm as it impacts their findings.

As Becca Lewis suggests, is the problem with measuring radicalisation on YouTube one of methodology? This reminds me of some of the discussions about social media and teens. The examples I have read, ‘How YouTube Radicalized Brazil’ and ‘The Making of a YouTube Radical’, are anecdotal. I assume this is why Arvind Narayanan says that we do not have the vocabulary to make sense of the complexities generated by algorithms.

Also, with regard to Kate Eichhorn’s post about the internet that never forgets (and the subsequent book):

Kate Eichhorn, an Associate Professor of Culture and Media at The New School suggests that people are now forming their identities online from an early age, and in the process are creating a permanent record that’s impossible to delete.

I am reminded of a post from Katia Hildebrandt and Alec Couros from a few years ago, in which they suggest that in a world where there is a digital record of everything somewhere, we need to learn to consider intent, context, and circumstance when judging artefacts that may be dredged up.