More on Twitter and Algorithms

Previously: The Changing Face of Twitter

Right after I came out with a bunch of speculations about Twitter and its algorithm, we got a whole bunch of concrete info detailing exactly how much of Twitter’s algorithms work.

Thus, it makes sense to follow up and see what we have learned about Twitter since then. We no longer have to speculate about what might get rewarded. We can check.

We Have the Algorithm

We have better data now. Twitter ‘open sourced’ its algorithm – the quote marks are because we are missing some of the details necessary to recreate the whole algorithm. There is still a lot of useful information. You can find the announcement here and the GitHub repository here. Brandon Gorrell describes the algorithm at Pirate Wires.

Here are the parts of the announcement I found most important.

The foundation of Twitter’s recommendations is a set of core models and features that extract latent information from Tweet, user, and engagement data. These models aim to answer important questions about the Twitter network, such as, “What is the probability you will interact with another user in the future?” or, “What are the communities on Twitter and what are trending Tweets within them?” Answering these questions accurately enables Twitter to deliver more relevant recommendations.

The recommendation pipeline is made up of three main stages that consume these features: 

  1. Fetch the best Tweets from different recommendation sources in a process called candidate sourcing.
  2. Rank each Tweet using a machine learning model.
  3. Apply heuristics and filters, such as filtering out Tweets from users you’ve blocked, NSFW content, and Tweets you’ve already seen.
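The three stages above can be sketched in a few lines. This is a toy illustration of the pipeline shape, not Twitter's actual code – all function names and data are invented:

```python
# Hypothetical sketch of the three-stage recommendation pipeline.

def candidate_sourcing(sources):
    """Stage 1: gather candidate Tweets from several recommendation sources."""
    candidates = []
    for source in sources:
        candidates.extend(source)
    return candidates

def rank(candidates, score_fn):
    """Stage 2: score each Tweet with a (stand-in) ML model and sort."""
    return sorted(candidates, key=score_fn, reverse=True)

def apply_filters(ranked, blocked_authors, seen_ids):
    """Stage 3: heuristics and filters -- drop blocked authors and seen Tweets."""
    return [t for t in ranked
            if t["author"] not in blocked_authors and t["id"] not in seen_ids]

# Toy data standing in for real candidate sources.
in_network = [{"id": 1, "author": "alice", "score": 0.9},
              {"id": 2, "author": "bob", "score": 0.4}]
out_of_network = [{"id": 3, "author": "carol", "score": 0.7}]

feed = apply_filters(
    rank(candidate_sourcing([in_network, out_of_network]), lambda t: t["score"]),
    blocked_authors={"bob"},
    seen_ids=set(),
)
print([t["id"] for t in feed])  # highest-scored unblocked Tweets first -> [1, 3]
```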

Today, the For You timeline consists of 50% In-Network Tweets and 50% Out-of-Network Tweets on average, though this may vary from user to user.

The most important component in ranking In-Network Tweets is Real Graph. Real Graph is a model which predicts the likelihood of engagement between two users. The higher the Real Graph score between you and the author of the Tweet, the more of their tweets we’ll include.

This matches my experience with For You. Your follows are very much not created equal. The accounts you often interact with will get shown reliably, and even shown when replying to other accounts. Accounts that you don’t interact with, you might as well not be following.

Thus, if there is an account whose tweets you want to see in For You, you’ll want to like a high percentage of their tweets, and if you don’t want that for someone, you’ll want to avoid interactions.
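The Real Graph behavior can be sketched as a simple pairwise engagement score that gates how many of an author's tweets get shown. The scores and the linear scaling here are invented for illustration:

```python
# Hypothetical Real Graph-style scores: predicted likelihood of engagement
# between the viewer and each followed author. All values are made up.
real_graph = {"close_friend": 0.8, "mutual": 0.35, "silent_follow": 0.02}

def in_network_quota(author, base_slots=10):
    """More Tweets shown from authors with higher predicted engagement."""
    return round(base_slots * real_graph.get(author, 0.0))

for author in real_graph:
    print(author, in_network_quota(author))
# close_friend gets 8 slots, mutual gets 4, silent_follow gets 0 --
# follow someone but never interact, and you might as well not follow them.
```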

What about out-of-network?

We traverse the graph of engagements and follows to answer the following questions:

  • What Tweets did the people I follow recently engage with?
  • Who likes similar Tweets to me, and what else have they recently liked?

…we developed GraphJet, a graph processing engine that maintains a real-time interaction graph between users and Tweets, to execute these traversals. While such heuristics for searching the Twitter engagement and follow network have proven useful (these currently serve about 15% of Home Timeline Tweets), embedding space approaches have become the larger source of Out-of-Network Tweets.
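The traversal Twitter describes can be sketched over a toy engagement graph. This is my guess at the shape of the idea, not GraphJet's implementation:

```python
# Toy engagement graph: whom I follow, and what they recently liked.
follows = {"me": {"alice", "bob"}}
recent_likes = {
    "alice": {"tweet_1", "tweet_2"},
    "bob": {"tweet_2", "tweet_3"},
    "stranger": {"tweet_4"},
}

def out_of_network_candidates(user):
    """Tweets the people `user` follows recently engaged with,
    weighted by how many follows engaged."""
    candidates = {}
    for followee in follows.get(user, set()):
        for tweet in recent_likes.get(followee, set()):
            candidates[tweet] = candidates.get(tweet, 0) + 1
    return candidates

# tweet_2 is liked by two follows, so it carries the strongest signal;
# tweet_4 never surfaces because no one I follow touched it.
print(out_of_network_candidates("me"))
```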

So that’s super interesting on both fronts. The algorithm is explicitly looking to pattern match on what you liked. My lack of likes perhaps forced the algorithm to, in my case, fall back on more in-network Tweets. So one should be very careful with likes, and only use them when you want to see more similar things.

More than that, who you follow now is doing two distinct tasks. It provides in-network tweets, but only for those accounts you interact with. It also essentially authorizes those you follow to upvote content for you by interacting with that content.

That implies a strategy of two kinds of follows. You want to follow accounts whose Tweets you want to see, and interact aggressively. You also want to follow accounts whose tastes you want to copy, whether or not you like their content at all, except then you want to avoid interactions.

This means that if you have follows who often interact with things you want to see less of, such as partisan political content, you are paying a higher price than you might realize. Consider re-evaluating such follows (as with all of this, assuming you care about the For You tab).

Embedding space approaches aim to answer a more general question about content similarity: What Tweets and Users are similar to my interests?

One of Twitter’s most useful embedding spaces is SimClusters. SimClusters discover communities anchored by a cluster of influential users using a custom matrix factorization algorithm. There are 145k communities, which are updated every three weeks.
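A heavily simplified sketch of the SimClusters idea: represent each user by their affinity to communities and assign them to the strongest one. The real system uses a custom matrix factorization over roughly 145k communities; the three-community affinity matrix below is toy data:

```python
import numpy as np

# Rows: users, columns: communities. Entries are made-up affinity scores,
# e.g. how often a user engages with each community's anchor accounts.
affinity = np.array([
    [0.9, 0.1, 0.0],   # user 0: mostly community 0
    [0.2, 0.7, 0.1],   # user 1: mostly community 1
    [0.0, 0.3, 0.6],   # user 2: mostly community 2
])

# Each user's primary community is the column with the highest affinity.
primary_community = affinity.argmax(axis=1)
print(primary_community)  # [0 1 2]
```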

Ranking is achieved with a ~48M parameter neural network that is continuously trained on Tweet interactions to optimize for positive engagement (e.g. Likes, Retweets, and Replies). This ranking mechanism takes into account thousands of features and outputs ten labels to give each Tweet a score, where each label represents the probability of an engagement. We rank the Tweets from these scores. 

Exclusively maximizing engagement is a clear Goodhart’s Law problem, that is not what you or Twitter should want. Worth noticing.
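The scoring step described above – per-label engagement probabilities combined into one number – reduces to a weighted sum. The labels and weights below are invented stand-ins, not Twitter's published values:

```python
# Hypothetical engagement probabilities from the ranking model for one Tweet.
probs = {"like": 0.10, "retweet": 0.02, "reply": 0.01}
# Hypothetical per-label weights; a real system tunes these.
weights = {"like": 1.0, "retweet": 2.0, "reply": 13.5}

# Final score: expected weighted engagement across all labels.
score = sum(probs[label] * weights[label] for label in probs)
print(round(score, 3))  # 0.275
```

This structure is exactly where the Goodhart problem lives: whatever engagement mix the weights reward is what the feed will optimize for, whether or not that mix corresponds to value for the user.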

  • Social Proof: Exclude Out-of-Network Tweets without a second degree connection to the Tweet as a quality safeguard. In other words, ensure someone you follow engaged with the Tweet or follows the Tweet’s author.

If one were focusing on For You or using a hybrid approach, this is another good reason to follow or unfollow someone. Do you want them used as social proof?

The ranking is in two stages. First the ‘light’ ranking to get down to ~1500 candidates, then the ‘heavy’ ranking to choose among them.

What else do we know? These all, I think, refer to the first-stage ‘light’ ranking:

1. Likes, then retweets, then replies. Here are the ranking parameters:

• Each like gets a 30x boost

• Each retweet a 20x boost

• Each reply only a 1x boost

It’s much more impactful to earn likes and retweets than replies.

So replies essentially don’t matter in light ranking? This is so weird. Replies are real engagement, likes are only nominal engagement at best. Which the ‘heavy ranking’ understands very well, as discussed later.
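Taking the reported boosts at face value, a light-ranking engagement score would look something like this (a sketch of the reported multipliers, not confirmed code):

```python
# Reported light-ranking boosts: like 30x, retweet 20x, reply 1x.
LIKE_BOOST, RETWEET_BOOST, REPLY_BOOST = 30, 20, 1

def light_engagement_score(likes, retweets, replies):
    return likes * LIKE_BOOST + retweets * RETWEET_BOOST + replies * REPLY_BOOST

# 100 replies are worth less than 4 likes under these weights.
print(light_engagement_score(0, 0, 100))  # 100
print(light_engagement_score(4, 0, 0))    # 120
```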

2. Images & videos help. Both images and videos lead to a nice 2x boost.


It’s not obvious what a 2.0 boost means in practice, in terms of magnitude.

3. Links hurt, unless you have enough engagement.

Generally external links get you marked as spam. Unless you have enough engagement.

This makes sense provided the threshold is sufficiently low. I don’t think I’ve ever had a problem with that.

4. Mutes & unfollows hurt

All of the following hurt your engagement:

• Mutes

• Blocks

• Unfollows

• Spam reports

• Abuse reports

That makes sense, provided it is normalized to follower counts.

5. Blue extends reach: Paying the monthly fee gets you a healthy boost.

It’s currently 4.0 in-network, 2.0 out-of-network, and soon the plan is to exclude non-blue out-of-network entirely in many forms. So it’s a big deal whether or not you are already followed.
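Under the reported numbers, the Blue boost would apply like this (a sketch; exactly where in the pipeline the multiplier lands is my assumption):

```python
def blue_multiplier(is_blue, in_network):
    """Reported boosts: 4x for Blue in-network, 2x out-of-network."""
    if not is_blue:
        return 1.0
    return 4.0 if in_network else 2.0

base_score = 10.0
print(base_score * blue_multiplier(True, True))    # 40.0 -- Blue, already followed
print(base_score * blue_multiplier(True, False))   # 20.0 -- Blue, out-of-network
print(base_score * blue_multiplier(False, True))   # 10.0 -- no boost
```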

6. Misinformation is highly down-ranked. Anything that is categorized as misinformation gets the rug pulled out from under it. Surprisingly, so do posts about Ukraine.


This is the first thing that outright surprised me. The other things listed here all make sense, whether or not you like the principles used. But why would you downgrade posts about Ukraine?

I have two guesses. One is relatively benign. Hopefully it’s not the other one.

7. You are clustered into a group. The algorithm puts you into a grouping of similar profiles. It uses that to extend tweet reach beyond your followers to similar people.

8. Posting outside your cluster hurts. If you do “out of network” content, it’s not going to do as well. That’s why hammering home points about your niche works.

There are other organic reasons why it makes sense to ‘stay in your lane’ on Twitter, as this ensures the people who follow you are interested in your content. Now we find out the algorithm is actively punishing ‘hybrid’ accounts, discouraging me (for example) from posting about both rationality and AI, and then also posting about something else entirely, like sports or Magic: The Gathering.

Then again, perhaps by using such targeting this actually gives effective permission to exit your lane at times.

9. Making up words or misspelling hurts. Words that are identified as “unknown language” are given 0.01, which is a huge penalty. Anything under 1 is bad. This is really bad.

I will note that I have seen posts with misspellings do well, so enough engagement can overcome even this level of penalty.

10. Followers, engagement & user data are the three data points

If you take away anything, remember this – the models take in 3 inputs:

• Likes, retweets, replies: engagement data

• Mutes, unfollows, spam reports: user data

• Who follows you: the follower graph

Later, we found out something very different about the ‘heavy’ ranking: it relies much more on strong (‘real’?) engagement metrics.

If you want to boost engagement, sounds like you should reply to your replies.

If you want to help a Tweet out a lot, then it looks like these extended engagements have a big impact – you’ll want to click into and then like, not merely like, if you don’t want to reply. Ideally, you should reply, even if you don’t have that much to say.

You may note I’m making it a habit whenever possible to engage with anyone who replies and isn’t making my life actively worse by doing so. It’s a win-win.

12. Your tweet’s relevancy decreases over time. At a rate of 50% every 6 hours, to be exact.

Yeah. That seems about right.

This makes it seem even more overdetermined that you want to use your best stuff at the more popular times.
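A 50% decay every 6 hours is a half-life, so relevancy as a function of tweet age works out to a simple exponential (a worked example of the reported rate):

```python
def relevancy(base_score, age_hours, half_life_hours=6):
    """Reported decay: a Tweet's relevancy halves every 6 hours."""
    return base_score * 0.5 ** (age_hours / half_life_hours)

print(relevancy(100, 0))    # 100.0 -- fresh
print(relevancy(100, 6))    # 50.0  -- one half-life
print(relevancy(100, 24))   # 6.25  -- a day old, down ~94%
```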

From Steven Tey: remember to keep a high Tweepcred. Which essentially means, I think, that you need to have enough interactions and follows to provide social proof. My presumption is that most ‘normal users’ will get there, but if you have few followers you might want to be careful about following too many people.

1. Your following-to-follower ratio matters.

Twitter’s Tweepcred PageRank algorithm reduces the page rank of users who have a low number of followers but a high number of followings.

Here’s how the Tweepcred algorithm works:

  1. Assign a numerical score to each user based on the number and quality of interactions they have with other users – the higher the score, the more influential the user is on Twitter.
  2. Calculate a user’s reputation score based on factors like account age, number of followers, and device usage.
  3. Adjust the user’s score based on their follower-to-following ratio.
  4. The final score, on a scale of 0 to 100, is the Tweepcred score, which represents the user’s reputation on Twitter.

The effect is that if you are over 65 Tweepcred, you can post more and still have your content considered, whereas if you’re too low your content isn’t considered at all.
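The described effect amounts to a threshold on how many of your tweets get considered at all. The cutoffs below other than 65 are my invention to illustrate the shape of the rule:

```python
def tweets_considered(tweepcred, tweets):
    """Reported rule: above 65 Tweepcred you can post more and still be
    considered; too low and you are not considered at all.
    The reduced allotment and zero floor here are illustrative guesses."""
    if tweepcred >= 65:
        return tweets            # all tweets considered
    if tweepcred > 0:
        return tweets[:3]        # hypothetical reduced allotment
    return []                    # effectively invisible

posts = ["t1", "t2", "t3", "t4", "t5"]
print(len(tweets_considered(70, posts)))  # 5
print(len(tweets_considered(40, posts)))  # 3
```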

Alternative Methods Perhaps

A nice brainstorm is to ask, what if you had more control over the algorithm as it applied to you? You could in theory count on Twitter to serve up the 1500 candidates, then rank them yourself.

Arvind Narayanan: If Twitter let you customize your feed ranking algorithm, you could easily make it far more useful to you. In my case I’d prioritize tweets with a relatively high ratio of bookmarks to Likes — these tend to be papers/articles, which is the kind of content I come to Twitter for.

OK this is definitely another thing I like about Twitter — any time I think an idea out loud it turns out someone’s already done it.

Yassine Landa: @random_walker I am building a prototype that lets you do precisely this! Would love your feedback.
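Narayanan’s proposed ranking is trivial to express once you have the candidate set: re-rank by bookmark-to-like ratio, so reference material beats applause-bait. All data here is invented:

```python
# Toy candidate set; bookmark and like counts are made up.
candidates = [
    {"id": "paper_thread", "bookmarks": 40, "likes": 100},
    {"id": "hot_take",     "bookmarks": 2,  "likes": 500},
    {"id": "tutorial",     "bookmarks": 30, "likes": 60},
]

def bookmark_ratio(tweet):
    # max() guards against division by zero for unliked Tweets.
    return tweet["bookmarks"] / max(tweet["likes"], 1)

feed = sorted(candidates, key=bookmark_ratio, reverse=True)
print([t["id"] for t in feed])  # substantive content ranks first
```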

Eliezer Yudkowsky recently picked up a lot of new Twitter followers, and he is here to report that, algorithmically and experientially speaking, it’s not going great.

Dear @elonmusk: After my Twitter engagement “scaled”, it’s no longer usable for me as a way to carry on public conversations. Some features that’d help, maybe in a “Conversations” tab I could enable or pay for: 1) If I reply to someone, and they reply back, prioritize that!

2) If someone I follow replies to me, or mentions me, prioritize that.

3) If that hasn’t filled up the hypothetical Conversations tab, next prioritize: direct replies/QTs from accounts with over some threshold level of followers; OR replies or QTs with lots of views/likes.

4) Or just have a tab or option to filter out all the “Mentions” from conversations where somebody else @’ed me; or they’re replying to something I retweeted and not something I wrote myself; or where the reply chain has gone >2 deep without further involvement from me.

If I actually got your attention here, I’d separately advocate for having some box I can click to distinguish my important tweets from my shitposts, and the ability for me to follow someone and only see their tweets marked Important (or not marked Unimportant).

More generally, Elon, if at some point you want to meet for a few hours and *not talk about AI at all* and instead just brainstorm *how to fix Twitter before it further destroys the sanity of the human species*, I am up for it.

(Where the number one thing Twitter does to destroy sanity is something like: lacking an algorithm such that an invalid QT or dunk is usually seen alongside the tweet that refutes it. I don’t have a clearly correct fix for this; I can think of stuff to try.)

I hope Elon takes him up on this offer, whether or not they ever also end up talking about AI. It is so strange to me that what Eliezer is asking for here is hard for him to get.

It would be highly amusing if Eliezer and Elon got together to talk and didn’t discuss AI, despite Elon once again doing what Eliezer thinks is about the worst possible thing. Still seems way better than not talking.

Algorithm Bonus Content: Canadian YouTube

Canada is preparing to pass a law requiring a third of YouTube links be to Canadian content.

The bill is inching toward a final vote in the Canadian Senate as soon as next month. It’s expected to pass. If it does, YouTube CEO Neal Mohan said in an October blog post, the same creators the government says it wants to help will, in fact, be hurt.

When users are recommended content that is not personally relevant, they react by tuning out – skipping the video, abandoning the video, or even giving it a ‘thumbs down’. When our Search and Discovery systems receive these signals, they learn that this content is not relevant or engaging for viewers, and then apply this on a global scale. This means that globally, Canadian creators will have a harder time breaking through and connecting with the niche audiences who would actually love their content. That directly hits the bottom line of Canadian creators, making it harder for them to build a sustainable business.

If that happens, it will be because YouTube chose that result, sacrificing the quality of the YouTube experience in order to punish Canada for its insolence.

If YouTube treats content surfaced at the whim of a Canadian regulator as if it was recommended by the algorithm, and evaluates customer reactions on that basis, then yes, this would severely punish Canadian creators.

However, that is clearly a distortion, and thus a stupid way to handle this situation. Instead YouTube, if its goal is to serve up the best videos possible, should adjust its evaluations to account for the poor product-market fit, or perhaps (if it didn’t have a better option because everyone is busy working on Bard) simply throw out the data on videos that its algorithm would not have served on its own.

What about the issue of potentially violating content? Twitter is making such actions more transparent; Colin Fraser highlights the inherent dilemmas here.

Twitter Safety: Restricting the reach of Tweets, also known as visibility filtering, is one of our existing enforcement actions that allows us to move beyond the binary “leave up versus take down” approach to content moderation. However, like other social platforms, we have not historically been transparent when we’ve taken this action. Starting soon, we will add publicly visible labels to Tweets identified as potentially violating our policies letting you know we’ve limited their visibility.

Authors will be able to submit feedback on the label if they think we incorrectly limited their Tweet’s visibility. Currently, submitting feedback does not guarantee you will receive a response or that your Tweet’s reach will be restored. We are working on allowing authors to appeal our decision.

Colin Fraser: What’s tricky about this is, the reason to limit visibility on “potentially violating” is that you have some classifier or heuristic that finds violating tweets but with low enough precision that deleting them would cause an unacceptable volume of false positives but you don’t have enough review capacity to check all of those tweets to see if they really are violating.

So you sort of split the difference by not deleting the tweet but limiting its visibility, knowing that many will be false positives, but that the overall effect will be to lower the number of impressions on violating tweets. But by slapping a visible label onto it, now all those false positives (and probably most of the true positives) will appeal the label. So now you’ve got a flood of appeals to deal with which changes the math.

The microeconomics of it all suggest that the answer is to actually apply the visibility filtering a lot less than you currently do, since now every time you apply it you have to pay for an appeal. But ironically, the perception will be that you’re applying it more than ever.

Content moderation presents some really interesting microeconomics problems, one day I will write a big thing about this.

As every moderator knows, the last thing you want to do is call attention to the thing you are making a choice not to call attention to, nor do you want to have to justify every decision. It rarely goes well. If they are going to allow appeals here, they are going to need to ensure that the appeal comes with skin in the game – if a human looks at your Tweet and does decide it is offensive, there must be a price.

The Great Polling Experiments

So far I’ve run two giant polling threads on Twitter.

The first one polled an AI doom scenario where an ASI (artificial superintelligence) attempted to gather resources, take over the world and then kill all humans, without the ability to foom or itself develop new innovative tech. This experiment went well, engagement was strong, good discussion happened and I learned a lot.

The second one polled the 24 predictions from On AutoGPT. That thread flopped on engagement, with the first post ending up with less than 10% of the views and votes of the first polling thread, although still enough votes to get a clear idea. You need 300+ for a robust poll on an election, but 50 votes is plenty for ‘do people more or less believe or expect this?’ I confirmed some things but didn’t learn as much.
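The claim that 50 votes is plenty for a directional read checks out against the standard margin-of-error formula for a proportion:

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for an observed proportion p from n votes."""
    return z * math.sqrt(p * (1 - p) / n)

# Worst case (p = 0.5): with 50 votes the answer is fuzzy to ~14 points,
# which still cleanly separates "70% agree" from "30% agree".
print(round(margin_of_error(0.5, 50), 3))   # 0.139
print(round(margin_of_error(0.5, 300), 3))  # 0.057 -- election-grade precision
```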

I am still holding off on the analysis post for now; hopefully I’ll get to that soon.

What I miss most in these situations is correlations. I can’t tell to what extent people’s answers make sense and are consistent. I can’t tell whether people’s answers represent plausible cruxes, either, unless I explicitly ask that, and you don’t want to overstay your welcome in such situations.

I would try other forms of polling, but I’d expect engagement numbers to drop off dramatically, and to do so in ways that skew the data. I asked the person I should obviously ask to see if they had any advice, we’ll see if anything comes of that.

In particular, a few threads I want to do in the future, suggestions welcome:

  1. Here are various future scenarios. If we reach this point, how doomed are we?
  2. Here are various future outcomes. If we reach this state and it is roughly stable, is that a good outcome? Does that constitute doom, paradise or neither?
  3. Here are various statements. Do you believe this statement? Is this statement a meaningful crux for you? As in, if I convinced you of this statement, would that meaningfully change your position on the likely path of the future and what we should do to ensure the future goes well?

What else? What questions should be in those? I figure maybe do one of these a week as a Monday special.

What Does the Future of Twitter Look Like?

Watching my reactions to learning the Twitter algorithm, I see myself doing things that seem mostly net good for Twitter, and also getting more use out of the platform. I am slightly worried about an exodus by former blue checks, but only slightly.

Two recent departures were NPR and CBC, both of which were protesting being labeled ‘state media’ merely because they are public broadcasters funded in large part by the state. I get why they are upset about the label, yet I don’t see how one can call it inaccurate.

As for the celebrities who leave? I won’t miss them.

For a while, the uncertainty about Twitter’s future was uncertainty about Elon Musk and his plans, and whether the website would fall apart or Twitter would go bankrupt or everyone would leave in droves.

I no longer worry much about those scenarios. Instead, even in the context of Twitter, I almost entirely worry about AI.

The intersection of those two issues famously includes Twitter bots. An ongoing problem, as you can see:

John Scott-Railton: Want a window into Twitter’s totally unsolved bot problem? Search for “as an AI language model.”

Here are some more search terms: “not a recognized word” “cannot provide a phrase” “with the given words” “violates OpenAI’s content policy.”


Reports are they identified almost 60,000 accounts this way. I doubt there were many false positives.

Twitter will rise or fall based on how AI transforms our experiences and the internet – if we’re still around and doing things where Twitter fits in, it’ll be great. If not, not.

The thing about the Twitter bots is there are a lot of them, but mostly they don’t matter. Look at the five posts above where we see view counts. The total is seventeen, or at most maybe five views a minute from all 60k accounts combined. Given how the current model works, almost all the utility lost from bots is due to DM spam, which is made possible because people like me keep our DMs open and find a lot of value in that. So what if I have to block a spam account once a week?


2 Responses to More on Twitter and Algorithms

  1. Sniffnoy says:

    Now we find out the algorithm is actively punishing ‘hybrid’ accounts, discouraging me (for example) from posting about both rationality and AI, and then also posting about something else entirely, like sports or Magic: The Gathering.

    Aw but we’re all waiting to hear your thoughts on battles. :)

    The first one polled an AI doom scenario where an ASI (artificial superintelligence)

    Ugh, this acronym is so confusing. AGI stands for artificial general intelligence, so by parallelism, ASI stands for artificial special intelligence, right? Except, oops, no, it actually means roughly the same thing as AGI, rather than a contrast.

    (as with all of this, assuming you care about the For You tab)

    This caveat seems worth emphasizing further. Like my first suggestion would be, don’t use For You, use Tweetdeck…

  2. lofo says:

    As a Canadian I was VERY surprised to read your notes about YouTube, to say the least. After a bit of searching, I found the source quoted by Less Wrong to be an extremist sensationalist paper whose front page includes a section called “Witch Hunts.”

    Skeptical, I looked up the wording on the Senate’s website, which you can see for yourself here: https://www.parl.ca/DocumentViewer/en/44-1/bill/C-11/third-reading

    There are a variety of sections about social media, but they are all along the same lines as this: “(2.‍2) An online undertaking that provides a social media service does not, for the purposes of this Act, exercise programming control over programs uploaded by a user of the service who is not the provider of the service or the provider’s affiliate, or the agent or mandatary of either of them.”

    I checked the CBC, a much more reliable media outlet, and found this: https://www.cbc.ca/news/entertainment/bill-c-11-explained-1.6759878
    In November then-CRTC Chair Ian Scott told a senate committee studying the bill that it wouldn’t allow the regulator to manipulate algorithms to achieve its goals, and that it wasn’t interested in doing so anyway.
    “The CRTC’s objective is to ensure that Canadians are made aware of Canadian content and that they can find it,” he said.
    “I wish to assure you and Canadians more broadly that the CRTC has no intention of regulating individual TikTokers, YouTubers or other digital content creators.”

    What the bill actually applies to is streaming services like Netflix and Disney+, treating them the same way as cable companies have been treated for decades. Whether you agree with the rule for Canadian content or not, it’s a fairly minor consolidation that updates how the rules already were, not an extreme bill regulating social media.
