Overview

The Moderation feature provides a suite of capabilities for defining and enforcing message moderation rules across various types of messages, keeping your platform safe and compliant for all users. These capabilities include rule management for creating, updating, and deleting moderation rules; keyword lists for detecting inappropriate content; automated actions that promptly address potential violations; and detailed reports on blocked messages that help you continuously improve safety and compliance. Together, these functionalities let you maintain a secure and welcoming environment on your platform.

Here’s an in-depth look at the key functionalities provided by the Moderation Service:

Rules Management

This feature enables you to define and manage a set of moderation rules tailored to address inappropriate messages under various conditions. You can establish specific criteria that determine what constitutes unacceptable behavior or content, such as the use of offensive language, unsafe content, or sharing sensitive information.

By customizing these rules, you ensure that the moderation system effectively identifies and manages messages that violate your platform's standards, thereby maintaining a safe and respectful environment for all users. The ability to manage these rules includes adding new rules, updating existing ones, and removing obsolete rules, providing a flexible and dynamic approach to message moderation.
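To make the rule model concrete, here is a minimal, hypothetical sketch of how rules with match criteria and actions could be represented and evaluated. The `ModerationRule` class, its field names, and the `evaluate` helper are illustrative assumptions for this example, not the service's actual API:

```python
import re
from dataclasses import dataclass

@dataclass
class ModerationRule:
    """A hypothetical in-memory moderation rule: a name, the regex
    patterns that trigger it, and the action to take on a match."""
    name: str
    patterns: list
    action: str = "block"

    def matches(self, message: str) -> bool:
        return any(re.search(p, message, re.IGNORECASE) for p in self.patterns)

# Rules are kept in a dict keyed by name, so adding, updating, and
# removing rules map directly to plain dict operations.
rules = {}
rules["no-phone-numbers"] = ModerationRule(
    name="no-phone-numbers",
    patterns=[r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"],
)

def evaluate(message: str) -> list:
    """Return the names of all rules the message violates."""
    return [r.name for r in rules.values() if r.matches(message)]

print(evaluate("Call me at 555-123-4567"))  # → ['no-phone-numbers']
```

In a real deployment, adding, updating, and removing rules would go through the service's rule management endpoints rather than a local dict; the sketch only illustrates the rule/criteria/action structure.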

For more detailed management, refer to the rules management section.

Lists Management

This feature allows you to create and manage comprehensive lists of keywords or regex patterns that are used for message moderation. These lists serve as a vital component in identifying and handling inappropriate content. You can customize these lists to include specific terms and patterns that are relevant to your platform's moderation needs.

Once created, these keyword lists can be linked to moderation rules when creating or updating them, ensuring that the moderation system effectively detects and manages content that violates your standards. Managing these lists includes adding new keywords, patterns, or phrases, updating existing ones, and removing those that are no longer relevant, providing a flexible and responsive approach to message moderation.
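Conceptually, a keyword or regex list is a set of patterns compiled once and then applied to incoming text. The sketch below is an illustrative assumption, not the service's own list format: the slash convention for marking regex entries and the `compile_list`/`violates` helpers are invented for this example.

```python
import re

# A hypothetical keyword list: plain terms are matched as whole words,
# while entries wrapped in slashes are treated as raw regex patterns.
blocklist = ["badword", "offensiveterm", r"/fr[e3]{2}\s+crypto/"]

def compile_list(entries):
    """Compile list entries into case-insensitive regex objects."""
    compiled = []
    for entry in entries:
        if entry.startswith("/") and entry.endswith("/"):
            compiled.append(re.compile(entry[1:-1], re.IGNORECASE))
        else:
            # Plain keywords are escaped and matched on word boundaries.
            compiled.append(re.compile(rf"\b{re.escape(entry)}\b", re.IGNORECASE))
    return compiled

def violates(message, compiled):
    """True if any pattern in the compiled list matches the message."""
    return any(p.search(message) for p in compiled)

patterns = compile_list(blocklist)
print(violates("get fr33 crypto now", patterns))       # True
print(violates("a perfectly fine message", patterns))  # False
```

Compiling patterns up front keeps per-message checks cheap, which matters when the same list is linked to many rules.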

For more detailed management, refer to the lists management section.

Blocked Messages

This feature allows you to retrieve a comprehensive list of messages that have been blocked for violating moderation rules. You can search within this list for specific messages, filter results by date range, and view the details of each violation. This functionality helps you monitor and manage inappropriate content effectively, ensuring a safe and compliant environment on your platform.
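As an illustration of the kind of search and date-range filtering described above, the following sketch filters a local list of hypothetical blocked-message records. The record fields and the `search_blocked` helper are assumptions made for this example, not the service's actual response schema:

```python
from datetime import datetime, date

# Hypothetical blocked-message records, as a moderation service might return them.
blocked = [
    {"id": 1, "text": "buy cheap meds", "rule": "Spam Detection",
     "blocked_at": datetime(2024, 5, 2, 9, 30)},
    {"id": 2, "text": "my number is 555-0100", "rule": "Contact Details Removal",
     "blocked_at": datetime(2024, 5, 10, 14, 0)},
    {"id": 3, "text": "free crypto!!!", "rule": "Scam Detection",
     "blocked_at": datetime(2024, 6, 1, 8, 15)},
]

def search_blocked(messages, query=None, start=None, end=None):
    """Filter blocked messages by substring and/or an inclusive date range."""
    results = []
    for m in messages:
        if query and query.lower() not in m["text"].lower():
            continue
        day = m["blocked_at"].date()
        if start and day < start:
            continue
        if end and day > end:
            continue
        results.append(m)
    return results

may_only = search_blocked(blocked, start=date(2024, 5, 1), end=date(2024, 5, 31))
print([m["id"] for m in may_only])  # → [1, 2]
```

Each returned record also carries the rule that triggered the block, which is the "details for the violation" you would inspect when reviewing flagged content.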

For more details, refer to the Blocked Messages section.


Overview of Moderation Rules

Our platform offers a wide range of moderation rules to help you detect and manage various types of risky, sensitive, or inappropriate content. Below is an overview of the available rules categorized by content type:


🚩 Message Moderation Rules

| Name | Description |
| --- | --- |
| Word Pattern Match | Identifies profane or offensive words using word matching. |
| Contact Details Removal | Detects and removes phone numbers from text. |
| Email Detection | Detects and removes email addresses from messages. |
| Spam Detection (English) | Detects spam messages in English. |
| Scam Detection (English) | Detects scam or fraudulent text in English. |
| Platform Circumvention (English) | Identifies attempts to bypass platform rules. |
| Toxicity Detection (English) | Detects toxic or harmful language in text. |
| Explicit or Inappropriate Content Prompt | Detects explicit sexual descriptions, graphic violence, or other unsuitable text. |
| Privacy and Sensitive Info Prompt | Identifies sensitive personal information shared without consent. |
| Hate and Harassment Prompt | Detects hateful or harassing language toward individuals or groups. |
| Self-Harm or Suicidal Content Prompt | Detects content suggesting self-harm or suicidal thoughts. |
| Impersonation or Fraud Prompt | Detects deceptive attempts to impersonate individuals or organizations. |
| Violent or Terroristic Threats Prompt | Detects content promoting violence or extremism. |
| Non-Consensual Sexual Content Prompt | Detects sexual exploitation, grooming, or non-consensual content. |
| Spam and Scam Prompt | Identifies spam, phishing attempts, and fraudulent schemes. |

🖼️ Image Moderation Rules

| Name | Description |
| --- | --- |
| Unsafe & Prohibited Content | Detects unsafe or prohibited content in images. |
| Terrorism or Extremist Promotion Prompt | Detects extremist propaganda, terrorist symbols, or images promoting violent ideologies. |
| Minor Safety and Exploitation Prompt | Detects child sexual content or exploitative imagery of minors. |
| Self-Harm or Suicidal Content Prompt | Detects imagery suggesting self-harm or suicidal ideation. |
| Privacy or Personal Data Prompt | Identifies images containing personal or sensitive data. |
| Graphic Violence or Gore Prompt | Detects images of extreme violence or gore. |
| Explicit or Sexual Content Prompt | Identifies nudity, explicit sexual content, or suggestive imagery. |
| Hate or Harassment Prompt | Detects hate symbols, harassment, or extremist imagery. |
| Fraud or Scam Indicators Prompt | Flags manipulated or fraudulent images, such as fake IDs. |

🎥 Video Moderation Rules

| Name | Description |
| --- | --- |
| Unsafe & Prohibited Content | Detects unsafe or prohibited content in video files. |