<Cross-posted from our blog here.>
Modulate’s mission statement is to make social voice chat better for everyone. But this raises an obvious question — better compared to what, exactly?
There are many (frankly horrifying) anecdotes of toxicity and disruption online, but game studios tend to be tight-lipped about statistics. After all, no individual studio has an incentive to share just how toxic its game is, for fear of public backlash, especially knowing that the initial numbers likely won’t be great. Modulate wants to reframe the conversation to focus less on things like PR hits, and more on real improvements.
To be clear, we respect the individual game studios’ decisions to stay cautious, but ultimately this is a coordination problem. We realized that those coordination issues could be bypassed if we devised broader ways to take the industry’s overall temperature, without singling out specific games unless absolutely necessary. So, we asked ourselves, what if we could use ToxMod, our voice moderation system, to process large amounts of data from across a wide variety of games and genres?
It so happens that this data is available through public live streams on a few different platforms, so we took a stab at collecting it. In late December, we collected our first batch of over two million minutes of streaming content, which corresponded to nearly one hundred thousand individual streams. Since then, we’ve kept listening to a wide range of public live streams to build out our dataset, periodically supplemented with larger, specific batches. All told, Modulate has at this point collected roughly one million hours of public audio. This blog post is the first of many as we begin to explore the secrets of this dataset.
ToxMod makes use of a variety of different tools to detect toxicity, including emotion detection, keyword detection, sentiment analysis, and some very carefully calibrated, cautious estimates of demographics (underage players and gender breakdown). In future blog posts, we’ll outline some of the discoveries from each of these systems, as well as what we’ve learned about particular game genres through our analysis. But for today, we wanted to introduce our work with two very specific questions: how much toxicity did we detect overall, and what sorts of keywords seemed to correspond to it?
First, about 3% of the clips we recorded contained something we’d deem toxic. This number comes from ToxMod’s “basic” estimate, which doesn’t take into account player history (since we’re running this on streamers the system hasn’t seen before). As such, some streamers who, for instance, actively cultivate a more mature audience may have been marked as toxic by the algorithm even though their viewers may not have been offended at all. (In a real game, ToxMod could notice that all of the listeners were still participating and having a good time, and lower its estimate of the probability that something problematic had occurred.)
Specific keywords increased the odds that a clip was toxic. For instance, 22% of the clips containing the F word were marked toxic. (You might ask: isn’t the use of a swear word itself considered toxic, for instance, for young kids? It’s possible to configure ToxMod this way. By default, though, we recognize that the F word can be used in a positive context among friends or more mature audiences, such as “F*** yeah!” when something exciting happens. For this analysis, these sorts of excited/positive uses of the F word were not deemed toxic; we only counted things like “F*** you!” or other actively hateful or aggressive expressions.) It’s important to keep in mind that, while this number is high, it still means that only about one in five clips containing the F word expresses hate, aggression, or harassment. This is exactly why ToxMod uses additional signals like emotion detection, as a simple keyword detector would end up flagging huge numbers of well-intentioned individuals as problems!
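To make the cost of keyword-only flagging concrete, here is a minimal sketch of the arithmetic in Python. The function name and the batch size of 1,000 clips are hypothetical choices for illustration; only the 22% figure comes from the analysis above, and this is not how ToxMod itself is implemented.

```python
def keyword_only_outcomes(keyword_clips: int, toxic_share: float) -> tuple[int, int]:
    """Split clips containing a keyword into truly toxic vs. wrongly flagged.

    Assumes a naive detector that flags every clip containing the keyword.
    """
    truly_toxic = round(keyword_clips * toxic_share)
    wrongly_flagged = keyword_clips - truly_toxic
    return truly_toxic, wrongly_flagged


# Hypothetical batch: 1,000 clips containing the F word, 22% of them toxic.
toxic, false_positives = keyword_only_outcomes(1000, 0.22)
print(f"{toxic} toxic clips, {false_positives} clips flagged in error")
```

Under these assumptions, a detector that flags on the keyword alone would be wrong for nearly four out of every five clips it flags.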
In contrast, clips in which the streamer was communicating directly with their audience — using terms like “welcome,” “hello there,” or “ask chat” — were substantially less toxic than average, with only 1.4% of such engagement-oriented clips flagged as toxic. This makes some intuitive sense — streamers who are engaging with their audience are naturally going to be somewhat more sensitive about how they express themselves. Further, it’s been shown that in-game, players who can interact socially and comfortably with others are less likely to become toxic than those who feel isolated. It’s easy to imagine this same principle applying to streamers, where those with more ability to interact positively with their community tend to grow less frustrated and aggressive.
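One way to compare these figures is to express each subset's toxic rate as a multiple of the roughly 3% overall rate. The sketch below does just that; the names are made up for illustration, and the inputs are the rounded percentages quoted in this post, so the results are approximate.

```python
BASELINE_TOXIC_RATE = 0.03  # about 3% of all clips were marked toxic

# Rounded toxic rates for the two subsets discussed in this post.
SUBSET_TOXIC_RATES = {
    "contains the F word": 0.22,
    "audience-engagement terms": 0.014,
}


def relative_rate(subset_rate: float, baseline: float = BASELINE_TOXIC_RATE) -> float:
    """Return how many times more (or less) toxic a subset is than the baseline."""
    return subset_rate / baseline


for label, rate in SUBSET_TOXIC_RATES.items():
    print(f"{label}: {relative_rate(rate):.1f}x the baseline")
```

By this rough measure, clips containing the F word were flagged about seven times as often as the average clip, while engagement-oriented clips were flagged about half as often.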
Twitch recently announced an update to its terms of service, banning additional words like “simp” when used as an insult. So we performed a quick analysis on these words as well, and found a fairly low incidence rate: roughly 200 out of every million clips in our sample included “simp.” (Contrast this with the F word, which had a frequency nearly one hundred times higher!) This is good news for Twitch, as it suggests that “simp” is already fairly rarely used (at least, on the live stream platforms we’re listening to). Still, if they actually want to crack down on these remaining usages, they’ll need a good tool that can sift through lots of audio and quickly find these instances. (*cough cough*)
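For a sense of scale, the incidence figures convert to percentages as sketched below. The helper name is hypothetical, and because the F word was "nearly" one hundred times more frequent, the multiplier of 100 is treated as an approximate upper bound rather than a measured value.

```python
def per_million_to_percent(count_per_million: float) -> float:
    """Convert a count per million clips into a percentage of clips."""
    return count_per_million / 1_000_000 * 100


SIMP_PER_MILLION = 200   # roughly 200 clips per million included "simp"
F_WORD_MULTIPLIER = 100  # "nearly one hundred times" higher, so an upper bound

f_word_per_million = SIMP_PER_MILLION * F_WORD_MULTIPLIER

print(f"simp: about {per_million_to_percent(SIMP_PER_MILLION):.2f}% of clips")
print(f"F word: at most about {per_million_to_percent(f_word_per_million):.0f}% of clips")
```

In other words, “simp” appeared in about one clip in five thousand, while the F word appeared in something close to one clip in fifty.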
Once again, these results are only the very beginning of what we can learn — and we’re always thinking about ways to collect additional data that could help us understand the nature and extent of toxicity across the gaming industry. We hope that putting these statistics out there will create a strong reference point to compare against over time, highlighting the progress that we know many studios are extremely focused on making — and of course, we’re doing all we can to contribute directly to that progress as well.
Want to learn more about how Modulate can analyze voice data for toxic or disruptive behavior in a way that’s reliable, customizable, and inexpensive? Reach out to us at email@example.com, or check out our ToxMod service here.