Overview
The Moderation feature provides a suite of capabilities for defining and enforcing moderation rules across various types of messages, keeping your platform safe and compliant for all users. These capabilities include rule management for creating, updating, and deleting moderation rules, and keyword lists for detecting inappropriate content. Automated actions promptly address potential violations, while detailed reports on blocked messages help you continuously improve safety and compliance. Together, these functionalities let you maintain a secure and welcoming environment on your platform.
Here’s an in-depth look at the key functionalities provided by the Moderation Service:
Rules Management
This feature enables you to define and manage a set of moderation rules tailored to address inappropriate messages under various conditions. You can establish specific criteria that determine what constitutes unacceptable behavior or content, such as the use of offensive language, unsafe content, or sharing sensitive information.
Customizing these rules ensures that the moderation system reliably identifies and handles messages that violate your platform's standards, maintaining a safe and respectful environment for all users. Rule management covers adding new rules, updating existing ones, and removing obsolete rules, giving you a flexible and dynamic approach to message moderation.
For more details, refer to the Rules Management section.
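As a quick illustration, the sketch below shows how rule management might look against a REST-style API using Python. The base URL, endpoint paths, headers, field names, and response shape are illustrative assumptions rather than the actual Moderation Service API; see the Rules Management section for the exact request formats.

```python
import requests

# Hypothetical values -- replace with your real endpoint and credentials.
BASE_URL = "https://api.example.com/moderation"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json"}

# Create a rule that blocks messages containing offensive language.
rule = {
    "name": "Block offensive language",
    "type": "wordPatternMatch",   # assumed rule-type identifier
    "action": "block",            # assumed action taken on violation
    "enabled": True,
}
created = requests.post(f"{BASE_URL}/rules", json=rule, headers=HEADERS).json()
rule_id = created["id"]           # assumed response shape

# Update the rule, e.g. switch the action from blocking to flagging.
requests.patch(f"{BASE_URL}/rules/{rule_id}", json={"action": "flag"}, headers=HEADERS)

# Remove the rule once it is obsolete.
requests.delete(f"{BASE_URL}/rules/{rule_id}", headers=HEADERS)
```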
Lists Management
This feature allows you to create and manage comprehensive lists of keywords or regex patterns that are used for message moderation. These lists serve as a vital component in identifying and handling inappropriate content. You can customize these lists to include specific terms and patterns that are relevant to your platform's moderation needs.
Once created, these lists can be linked to moderation rules when creating or updating a rule, ensuring that the moderation system detects and handles content that violates your standards. List management covers adding new keywords, patterns, or sentences, updating existing ones, and removing entries that are no longer relevant, providing a flexible and responsive approach to message moderation.
For more details, refer to the Lists Management section.
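The following sketch illustrates one possible way to create a keyword list and link it to a rule, again against an assumed REST-style API. The endpoint paths, the list and rule fields (including the `listIds` linkage), and the response shape are placeholders for illustration; refer to the Lists Management section for the real parameters.

```python
import requests

BASE_URL = "https://api.example.com/moderation"   # hypothetical endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json"}

# Create a list of keywords and regex patterns to match against messages.
banned_terms = {
    "name": "Banned terms",
    "keywords": ["offensiveword1", "offensiveword2"],
    "patterns": [r"\b\d{3}-\d{3}-\d{4}\b"],   # e.g. a phone-number-like pattern
}
list_id = requests.post(f"{BASE_URL}/lists", json=banned_terms, headers=HEADERS).json()["id"]

# Link the list to a word-matching rule so violations are detected automatically.
rule = {
    "name": "Block banned terms",
    "type": "wordPatternMatch",   # assumed rule-type identifier
    "listIds": [list_id],         # assumed field for linking lists to a rule
    "action": "block",
}
requests.post(f"{BASE_URL}/rules", json=rule, headers=HEADERS)
```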
Blocked Messages
This feature allows you to retrieve all messages that have been blocked for violating moderation rules. You can search within this list for specific messages, filter results by date range, and view the details of each violation. This helps you monitor and manage inappropriate content effectively, ensuring a safe and compliant environment on your platform.
For more details, refer to the Blocked Messages section.
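Below is a minimal sketch of retrieving blocked messages with a date-range filter and a search term. The endpoint, the query parameters (`from`, `to`, `query`, `limit`), and the response fields are assumptions used for illustration only; the Blocked Messages section documents the actual filters and response format.

```python
import requests

BASE_URL = "https://api.example.com/moderation"   # hypothetical endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Retrieve blocked messages within a date range, optionally filtered by a search term.
params = {
    "from": "2024-01-01T00:00:00Z",   # assumed ISO-8601 date filters
    "to": "2024-01-31T23:59:59Z",
    "query": "refund",                # assumed free-text search parameter
    "limit": 50,
}
blocked = requests.get(f"{BASE_URL}/blocked-messages", params=params, headers=HEADERS).json()

# Inspect which rule each message violated (response shape is assumed).
for message in blocked.get("results", []):
    print(message["id"], message["violatedRule"], message["text"])
```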
Overview of Moderation Rules
Our platform offers a wide range of moderation rules to help you detect and manage various types of risky, sensitive, or inappropriate content. Below is an overview of the available rules categorized by content type:
🚩 Message Moderation Rules
Name | Description |
---|---|
Word Pattern Match | Identifies profane or offensive words using word matching. |
Contact Details Removal | Detects and removes phone numbers from text. |
Email Detection | Detects and removes email addresses from messages. |
Spam Detection (English) | Detects spam messages in English. |
Scam Detection (English) | Detects scam or fraudulent text in English. |
Platform Circumvention (English) | Identifies attempts to bypass platform rules. |
Toxicity Detection (English) | Detects toxic or harmful language in text. |
Explicit or Inappropriate Content Prompt | Detects explicit sexual descriptions, graphic violence, or other unsuitable text. |
Privacy and Sensitive Info Prompt | Identifies sensitive personal information shared without consent. |
Hate and Harassment Prompt | Detects hateful or harassing language toward individuals or groups. |
Self-Harm or Suicidal Content Prompt | Detects content suggesting self-harm or suicidal thoughts. |
Impersonation or Fraud Prompt | Detects deceptive attempts to impersonate individuals or organizations. |
Violent or Terroristic Threats Prompt | Detects content promoting violence or extremism. |
Non-Consensual Sexual Content Prompt | Detects sexual exploitation, grooming, or non-consensual content. |
Spam and Scam Prompt | Identifies spam, phishing attempts, and fraudulent schemes. |
🖼️ Image Moderation Rules
Name | Description |
---|---|
Unsafe & Prohibited Content | Detects unsafe or prohibited content in images. |
Terrorism or Extremist Promotion Prompt | Detects extremist propaganda, terrorist symbols, or images promoting violent ideologies. |
Minor Safety and Exploitation Prompt | Detects child sexual content or exploitative imagery of minors. |
Self-Harm or Suicidal Content Prompt | Detects imagery suggesting self-harm or suicidal ideation. |
Privacy or Personal Data Prompt | Identifies images containing personal or sensitive data. |
Graphic Violence or Gore Prompt | Detects images of extreme violence or gore. |
Explicit or Sexual Content Prompt | Identifies nudity, explicit sexual content, or suggestive imagery. |
Hate or Harassment Prompt | Detects hate symbols, harassment, or extremist imagery. |
Fraud or Scam Indicators Prompt | Flags manipulated or fraudulent images, such as fake IDs. |
🎥 Video Moderation Rules
Name | Description |
---|---|
Unsafe & Prohibited Content | Detects unsafe or prohibited content in video files. |