How Data is Shaping T&S
Trust and safety is evolving rapidly, and data is playing a pivotal role in driving these changes. It is the foundation of the AI that enables us to moderate more effectively, proactively, and comprehensively. It underlies a new level of transparency about how content is reviewed and actioned. Still, there are ethical and operational challenges that must be managed.
Industry Trends
Data is fundamental to our modern AI tools
It's undeniable that AI and LLMs are transforming moderation. Today's AI-enabled tools can handle frontline moderation tasks, including flagging explicit content across text, images, and video.
These tools allow us to scale moderation more efficiently, especially given the exponential growth of UGC. They also bring additional benefits. Most importantly, by using AI as the moderation frontline, we can protect our moderators by drastically reducing the harmful content they review.
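To make this concrete, here is a simplified sketch in Python of what an AI-first triage step can look like: high-confidence violations are actioned automatically, and only the ambiguous middle band reaches a human reviewer. The classifier interface and thresholds are hypothetical and purely illustrative.

```python
# Minimal sketch of AI-first triage, assuming a hypothetical `classifier` that
# returns a violation probability per policy category. Thresholds are illustrative.
from dataclasses import dataclass

@dataclass
class Decision:
    action: str      # "remove", "human_review", or "publish"
    category: str    # policy category driving the decision
    score: float     # model confidence

REMOVE_THRESHOLD = 0.95   # auto-action only when the model is very confident
REVIEW_THRESHOLD = 0.60   # anything ambiguous goes to a human

def triage(content: str, classifier) -> Decision:
    """Route content so humans only see the ambiguous middle band."""
    scores = classifier.predict(content)   # e.g. {"hate": 0.02, "spam": 0.7, ...}
    category, score = max(scores.items(), key=lambda kv: kv[1])
    if score >= REMOVE_THRESHOLD:
        return Decision("remove", category, score)
    if score >= REVIEW_THRESHOLD:
        return Decision("human_review", category, score)
    return Decision("publish", category, score)
```

The point of this pattern is exposure reduction: the clearest violations never reach a human queue at all, and reviewers spend their time on the genuinely ambiguous cases.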
Training these models requires large, diverse, high-quality datasets. Quality is especially critical to ensure the models are trained properly and to reduce bias. Ongoing refinement of that data is also necessary to tighten the feedback loop and to train the next generation of models.
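In practice, that feedback loop can be as simple as recording every human decision as a labeled example for the next training run. The sketch below is illustrative only; the field names and file format are hypothetical.

```python
# Illustrative sketch of closing the feedback loop: every human decision becomes
# a labeled example that can be folded into the next training dataset.
import json
import time

def record_label(item_id: str, content: str, model_score: float,
                 human_decision: str, path: str = "feedback.jsonl") -> None:
    """Append a moderator's decision for later use in retraining."""
    example = {
        "item_id": item_id,
        "content": content,
        "model_score": model_score,   # what the current model predicted
        "label": human_decision,      # what the human reviewer decided
        "timestamp": time.time(),
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(example) + "\n")
```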
Data is fueling the shift to proactive moderation
Historically, T&S relied heavily on reactive measures, so-called post-moderation such as user reports, to identify harmful content. Content that is public prior to review poses a greater risk, and that risk grows with your time to action (TTA).
Nowadays, large-scale datasets enable proactive anticipation of violations by examining patterns of user activity, content, and metadata. This isn't about preventing violations outright; the goal isn't to realize the pre-crime fictionalized in the movie Minority Report. Rather, it is about calling attention to users and patterns that are more likely to become problematic. Knowing where issues are likely to occur makes post-moderation more efficient, thus shortening TTA and reducing overall risk.
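As a toy illustration, proactive prioritization might score accounts on a handful of behavioral signals and push the riskiest to the front of the review queue. The signals and weights below are invented for the example; a production system would learn them from data.

```python
# Toy risk scoring: rank accounts so the post-moderation queue surfaces
# likely problems sooner. Weights and signals are purely illustrative.
def risk_score(account: dict) -> float:
    prior_violations = account.get("prior_violations", 0)
    reports_per_post = account.get("reports", 0) / max(account.get("posts", 1), 1)
    account_age_days = account.get("age_days", 0)
    new_account_bonus = 1.0 if account_age_days < 7 else 0.0
    return 2.0 * prior_violations + 5.0 * reports_per_post + new_account_bonus

def prioritize(accounts: list[dict]) -> list[dict]:
    """Order the review queue from highest to lowest estimated risk."""
    return sorted(accounts, key=risk_score, reverse=True)
```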
Data is enabling us to focus on actors and groups
The world of content moderation has traditionally focused on content: the UGC that individuals actually post. To an extent, human moderators working with the available technology have been able to identify and monitor bad actors and groups.
But with modern AI able to analyze vast quantities of data, including IP addresses and posting frequency, it's now much easier to identify and track these bad actors at scale. Not only that, data empowers AI to see the potential connections between individuals and identify coordinated violations by multiple actors and even non-human actors such as bot farms.
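As a simplified sketch, surfacing possible coordination can start with something as basic as clustering high-frequency accounts that share an IP address. Real systems combine many more signals, and the thresholds below are purely illustrative.

```python
# Simplified coordination check: flag clusters of accounts that share an IP
# address and post at an unusually high frequency, a common bot-farm signature.
from collections import defaultdict

def coordinated_clusters(events: list[dict], min_accounts: int = 5,
                         min_posts_per_hour: float = 10.0) -> list[set[str]]:
    """events: [{"account": str, "ip": str, "posts_last_hour": int}, ...]"""
    by_ip: dict[str, set[str]] = defaultdict(set)
    for e in events:
        if e["posts_last_hour"] >= min_posts_per_hour:
            by_ip[e["ip"]].add(e["account"])
    # An IP shared by many high-frequency accounts is a candidate for review.
    return [accounts for accounts in by_ip.values() if len(accounts) >= min_accounts]
```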
Data is allowing a new level of regulatory transparency
The abundance and availability of data surrounding things like moderation actions is paving the way for more accountability. Perhaps the best example is the EU's Digital Services Act (DSA). Adopted in 2022, with enforcement beginning in 2024, the DSA prescribes how larger online services must manage content, protect users, and disclose operational details.
Data is central to the DSA's structure, implementation, and enforcement. From annual risk assessments to transparency reporting and accountability, the institutions to which these regulations apply must rely heavily on their internal data to stay in compliance.
Challenges Remain
Data is certainly shaping the trust and safety industry at an unprecedented speed. That isn't to say there aren't speed bumps.
First and foremost is the question of privacy. On the one hand, companies and regulators must consider individuals' privacy. Questions about what data is being collected, how it is being used, and whether and how it is being stored are critical. Legislation protecting data privacy, such as the GDPR and CCPA, has something to say on these matters. At the same time, companies are rightly concerned about protecting their own data. Communities, companies, and governments are grappling with how to balance privacy concerns against the real benefits of leveraging data.
Some are concerned about data quality and use. Poor-quality data can introduce bias into training models. Even assuming AI models are trained properly on good data, some have raised ethical concerns about using data to "profile" users.
Finally, leveraging all this data, especially at scale, incurs very real costs and creates operational challenges. Collecting, using, and storing that data is expensive and technically non-trivial.
So, while data enables a more efficient, proactive, and transparent approach to trust and safety, it also creates new complexities around privacy, bias, and operations.
For almost two decades, we've used data to improve trust and safety outcomes for our clients across all industries. We do this by customizing your trust and safety program and services based on your data, tailoring them to your business goals and regional legal obligations.