I read Facebook’s Widely Viewed Content Report. It’s really strange.

What content is popular on Facebook?

This is a surprisingly difficult – and very important – question to answer. Facebook is the most popular social network in the world and is used by roughly 7 in 10 Americans, roughly half of whom visit the site multiple times a day. Thirty-six percent of Americans report that they regularly read news on Facebook. This means that Facebook could be a powerful vector for sharing misinformation or extremism that readers interpret as news.

Journalist Kevin Roose began publishing a Twitter feed in July 2020 that showcased the 10 “top performing” links on Facebook as determined by Facebook’s CrowdTangle analytics tool. Most days, the “top performing” links come from right-wing commentators and provocateurs like Dan Bongino, Ben Shapiro, Fox News and others. This seems to contradict the popular narrative that Facebook is biased against conservatives and suggests, instead, that conservatives – and particularly incendiary ones – thrive on the platform.

Facebook really doesn’t like this feed. A few weeks after Roose began the project, John Hegeman – the head of Facebook’s news feed – took to Twitter to explain that Roose was tracking “engagement” – the number of people who liked, commented on or shared a given story. A better way to understand the popularity of stories on Facebook, he argued, was “reach”, i.e., the number of feeds a story appeared in. Facebook’s reach statistics show that mainstream news content is far more common on Facebook than the far-right content Roose was featuring. In my personal favorite part of the Twitter thread (now deleted), Hegeman conceded that Roose came to reasonable conclusions given the data he had, and that Hegeman couldn’t share the data that illustrated his points, though Facebook was exploring making it public.

Well, now that data’s public. Sort of. A little bit of it. And it’s pretty interesting, though not always in the way Facebook may have meant it to be.

Facebook is now publishing a quarterly transparency report – the Widely Viewed Content Report – which includes some helpful high-level information about what US Facebook users are seeing in their news feeds. It’s interesting to see that 57% of posts in news feeds come from the friends users have chosen to follow – arguably the core value proposition of Facebook. 19.3% come from Groups and 14.3% from Pages followed (brands, public figures, and content providers, like news outlets). 9.5% is attributed to “unconnected posts” or “other”. I believe these are not ads (“sponsored posts”, as opposed to the “organic posts” Facebook is reporting on here), but they do include posts Facebook recommends to users above and beyond the interests they’ve signaled by subscribing to Pages, Groups or friends’ feeds. (“Content that did not come from friends, Pages people followed, or Groups that they were a part of, also referred to as unconnected posts, made up a relatively minor percentage of content views.”) Why am I harping on this? It will make more sense once we start looking at the top 20 domains section of the report.

Facebook’s report makes the important point that most content encountered on the news feed is not “news”, in the sense of breaking news – usually, it’s friends updating each other on the cute thing their baby/dog/robot vacuum cleaner did. The top 20 domains in the Facebook data account for only 1.9% of the stories in Facebook feeds, and those associated with traditional news sites account for only 0.3%. So, news as traditionally thought of is an ingredient in the Facebook news feed, but not a big one.

So what’s in there? Well, Facebook gives us only the top twenty domains, and most of them aren’t super informative. YouTube is the top domain that appears in Facebook’s news feed… which tells us basically nothing. That could be a handful of wildly popular videos, the low hum of GenXers sharing music videos from the 1980s or boatloads of innocuous content intermingled with QAnon propaganda. Thirteen of the 20 domains Facebook lists are essentially information-free: YouTube, Vimeo, Eventbrite, Google, Etsy, Google Docs, Linktree, Spotify, TikTok, GoFundMe, Twitter, Amazon and MediaTenor, a GIF hosting site. So far we’ve learned that big web companies are popular and people share GIFs on Facebook.

What about the other seven? Five are mainstream news sites – ABC, NBC and CBS News, CNN and the UK’s Daily Mail. It’s interesting that Fox isn’t in there, and notable that the Daily Mail, a conservative-leaning publication notorious for sensationalism and poor fact-checking, ranks higher than all but one mainstream news outlet.

And then there are the other two. UNICEF captures the third spot, briefly leading me to believe that Facebook was emulating FC Barça (before the club lost its soul) by providing the international agency with free publicity. Roose, on the Facebook Top 10 feed, suggests a different explanation: UNICEF posts routinely appear on Facebook’s COVID-19 info panels, likely driving a great deal of traffic to the site. Here is an example of a popular post, the #3 URL on the Facebook news feed in Q2 2021.

But who the hell is playeralumniresources.com? They’re the #9 most common domain in the Facebook news feed, and their homepage is the #1 most common URL in Facebook’s set, with 87.2 million views. Well, they’re a speakers bureau for former Green Bay Packers players, who are available to join you for a round of golf, a fishing expedition or to sign autographs. And while I personally would be willing to pay a good deal of money to catch walleye with William Henderson (#33, legendary Packers fullback and Super Bowl champion), the popularity of this page suggests that there’s something wacky about this data set.

The obvious hypothesis is that Player Alumni Resources is buying a boatload of Facebook ads and that either people are organically sharing those ads or FB is putting a thumb on the “unconnected post” scale to boost their results. Fortunately, Facebook has an archive of ads on the site, and we can see that Player Alumni Resources is a known Facebook advertiser. But they’re not running ads at this point, so we can’t get data on them. Did they run a shitload of ads last quarter? Dunno – I’d need to scrape and store Facebook Ad Library data to answer that question. At least we know that Player Alumni Resources is not likely generating the single most popular URL on Facebook through their organic reach – they’ve got 4,109 followers of their Facebook page.

I repeated the same searches for popular URLs #2, #4 and #5, the universally well-known Pure Hemp Shop, MyIncredibleRecipies.com and ReppnforChrist. (The latter sells stylish, pro-Jesus apparel.) The CBD seller and the t-shirt maker look like the Packers speakers bureau: they’re past Facebook advertisers, maintain very small FB pages and are not running ads. Nothing for the recipe site. (Should I have searched for myinediblerecipies?)

Unless there’s a data error here, I’m guessing that Player Alumni Resources shows up in the “Suggested for you” part of your feed pretty damned often if Facebook knows you’re a Packers fan or from Wisconsin. (Or both, but the set of Wisconsinites who are not Packers fans is blissfully small.) What’s the “Suggested for you” feed? I liked an article on McSweeney’s once – probably the piece about the snake fight portion of the PhD defense – and I now get suggestions for McSweeney’s articles incessantly in my news feed. They aren’t marked as sponsored content, so they’re not ads, I guess? But they sure feel like ads.

What’s going on here? Dunno. And that’s my overall reaction to Facebook’s transparency report. It shares a few interesting figures that reinforce the narrative that Facebook is more about posts from friends and family than news. But it doesn’t share enough data for us to come to any meaningful conclusions. If the list included a thousand domains, we might be able to compare attention to a mainstream news site like CNN with attention to a fringe news source like the Dan Bongino podcast. But with only 20 domains – 13 of which should probably not appear in the set, as they’re generic to the point of meaninglessness – it’s very hard to know what’s going on.

This data exists, by the way. Facebook generated a set of shared URL data through its data partnership with Social Science One. That set includes URLs shared at least 100 times, and it is accessible only to researchers who’ve applied for access through a process that’s been criticized as slow and, because of concerns about data quality, ultimately unsatisfying.

I am genuinely glad that Facebook is releasing more data about what’s popular on their platform. I am also genuinely astonished that this is what Facebook produced for their first effort. The data is accompanied by a thoughtful “companion guide” that helps explain how the news feed algorithm works. Did no one think to offer some marginal notes on why FB released only the top 20 domains and URLs? Or on why obscure advertisers somehow emerge as the most popular URLs on the platform? The cynical side of me reads this report as transparency theatre – a chance for FB to tell critics that they’re moving in the direction of transparency without releasing any of the data a researcher would need to answer a question like “Is extreme right-wing content disproportionately popular on Facebook?”

Earlier this year, my colleagues and I released a 65-page report on access to data from the big social media platforms, written for funders who support research on the media ecosystem. I will spare you the long read and offer my personal executive summary (not necessarily endorsed by my coauthors): Stop fucking waiting for the platforms to give us data. They’re not going to give us data. We need to get our own data.

Creative researchers are finding ways to generate data sets that we cannot currently – and perhaps may not ever – get from social media platforms. Jason Baumgartner’s desperately under-resourced Pushshift.io has scraped a full archive of Reddit and has become indispensable for researchers of that platform. At least three groups are recruiting panels of Facebook users and asking users to donate their data so researchers can better understand what they’re seeing on Facebook. Facebook has responded to this effort by shutting down the accounts of one team of researchers in such a heavy-handed fashion that the acting director of the FTC’s consumer protection bureau felt compelled to weigh in.

Maybe the next quarterly report will share 100 URLs instead of 20. Maybe Facebook will explain that I somehow missed a wildly popular post about someone’s weekend golf game with placekicker Chris Jacke. But it’s frankly pathetic that, given the importance of these questions, researchers have to wait for these tiny snippets of information from behemoths like Facebook. It’s time for us to figure out how – respecting user privacy and research ethics – to get the data we need to understand what’s going on with these platforms.

Update: as folks discuss the report and my analysis of it on Twitter, Kevin Roose has weighed in with theories about these oddly popular URLs. The most plausible explanation: they’re spam. Wonderfully, in one case, they’re spam that appears to have been posted by an account associated with Jaleel White, the actor who played Urkel on Family Matters.

Perhaps it’s admirably honest that Facebook is admitting its spam problems through this transparency report? Or perhaps we should be surprised – disturbed? – that a company with a trillion-dollar valuation, releasing a report to document its transparency efforts, would make the unforced error of not filtering spam out of its site statistics?

I’m guessing the original version of this FB report may not remain up permanently – here’s a copy as of 6pm today on perma.cc just in case it’s cleaned up in the future.