Overview

The Moderation feature provides a suite of capabilities for defining and enforcing message moderation rules across various types of messages, keeping your platform safe and compliant for all users. These capabilities include rule management for creating, updating, and deleting moderation rules; keyword lists for detecting inappropriate content; automated actions that promptly address potential violations; and detailed reports on blocked messages that help you continuously improve safety and compliance. Together, these functionalities let you maintain a secure and welcoming environment on your platform.

Here’s an in-depth look at the key functionalities provided by the Moderation Service:

Rules Management

This feature enables you to define and manage a set of moderation rules tailored to address inappropriate messages under various conditions. You can establish specific criteria that determine what constitutes unacceptable behavior or content, such as the use of offensive language, unsafe content, or sharing sensitive information.

By customizing these rules, you ensure that the moderation system effectively identifies and manages messages that violate your platform's standards, thereby maintaining a safe and respectful environment for all users. The ability to manage these rules includes adding new rules, updating existing ones, and removing obsolete rules, providing a flexible and dynamic approach to message moderation.
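To make the rule model concrete, here is a minimal, hypothetical sketch of how rules with match criteria and actions could be represented and evaluated. The `ModerationRule` class, its field names, and the `evaluate` helper are illustrative assumptions for this example, not the service's actual API:

```python
import re
from dataclasses import dataclass

@dataclass
class ModerationRule:
    """A hypothetical in-memory moderation rule: a name, the regex
    patterns that trigger it, and the action to take on a match."""
    name: str
    patterns: list
    action: str = "block"

    def matches(self, message: str) -> bool:
        return any(re.search(p, message, re.IGNORECASE) for p in self.patterns)

# Rules are kept in a dict keyed by name, so adding, updating, and
# removing rules map directly to plain dict operations.
rules = {}
rules["no-phone-numbers"] = ModerationRule(
    name="no-phone-numbers",
    patterns=[r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"],
)

def evaluate(message: str) -> list:
    """Return the names of all rules the message violates."""
    return [r.name for r in rules.values() if r.matches(message)]

print(evaluate("Call me at 555-123-4567"))  # → ['no-phone-numbers']
```

In a real deployment, adding, updating, and removing rules would go through the service's rule management endpoints rather than a local dict; the sketch only illustrates the rule/criteria/action structure.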

For more detailed management, refer to the rules management section.

Lists Management

This feature allows you to create and manage comprehensive lists of keywords or regex patterns that are used for message moderation. These lists serve as a vital component in identifying and handling inappropriate content. You can customize these lists to include specific terms and patterns that are relevant to your platform's moderation needs.

Once created, these keyword lists can be linked to moderation rules when creating or updating them, ensuring that the moderation system effectively detects and manages content that violates your standards. Managing these lists includes adding new keywords, patterns, or phrases, updating existing ones, and removing those that are no longer relevant, providing a flexible and responsive approach to message moderation.
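Conceptually, a keyword or regex list is a set of patterns compiled once and then applied to incoming text. The sketch below is an illustrative assumption, not the service's own list format: the slash convention for marking regex entries and the `compile_list`/`violates` helpers are invented for this example.

```python
import re

# A hypothetical keyword list: plain terms are matched as whole words,
# while entries wrapped in slashes are treated as raw regex patterns.
blocklist = ["badword", "offensiveterm", r"/fr[e3]{2}\s+crypto/"]

def compile_list(entries):
    """Compile list entries into case-insensitive regex objects."""
    compiled = []
    for entry in entries:
        if entry.startswith("/") and entry.endswith("/"):
            compiled.append(re.compile(entry[1:-1], re.IGNORECASE))
        else:
            # Plain keywords are escaped and matched on word boundaries.
            compiled.append(re.compile(rf"\b{re.escape(entry)}\b", re.IGNORECASE))
    return compiled

def violates(message, compiled):
    """True if any pattern in the compiled list matches the message."""
    return any(p.search(message) for p in compiled)

patterns = compile_list(blocklist)
print(violates("get fr33 crypto now", patterns))       # True
print(violates("a perfectly fine message", patterns))  # False
```

Compiling patterns up front keeps per-message checks cheap, which matters when the same list is linked to many rules.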

For more detailed management, refer to the lists management section.

Blocked Messages

This feature allows you to retrieve a comprehensive list of messages that have been blocked for violating moderation rules. You can search within this list for specific messages, filter results by date range, and view the details of each violation. This functionality helps you monitor and manage inappropriate content effectively, ensuring a safe and compliant environment on your platform.
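As an illustration of the kind of search and date-range filtering described above, the following sketch filters a local list of hypothetical blocked-message records. The record fields and the `search_blocked` helper are assumptions made for this example, not the service's actual response schema:

```python
from datetime import datetime, date

# Hypothetical blocked-message records, as a moderation service might return them.
blocked = [
    {"id": 1, "text": "buy cheap meds", "rule": "Spam Detection",
     "blocked_at": datetime(2024, 5, 2, 9, 30)},
    {"id": 2, "text": "my number is 555-0100", "rule": "Contact Details Removal",
     "blocked_at": datetime(2024, 5, 10, 14, 0)},
    {"id": 3, "text": "free crypto!!!", "rule": "Scam Detection",
     "blocked_at": datetime(2024, 6, 1, 8, 15)},
]

def search_blocked(messages, query=None, start=None, end=None):
    """Filter blocked messages by substring and/or an inclusive date range."""
    results = []
    for m in messages:
        if query and query.lower() not in m["text"].lower():
            continue
        day = m["blocked_at"].date()
        if start and day < start:
            continue
        if end and day > end:
            continue
        results.append(m)
    return results

may_only = search_blocked(blocked, start=date(2024, 5, 1), end=date(2024, 5, 31))
print([m["id"] for m in may_only])  # → [1, 2]
```

Each returned record also carries the rule that triggered the block, which is the "details for the violation" you would inspect when reviewing flagged content.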

For more details, refer to the Blocked Messages section.


Overview of Moderation Rules

Our platform offers a wide range of moderation rules to help you detect and manage various types of risky, sensitive, or inappropriate content. Below is an overview of the available rules categorized by content type:


🚩 Message Moderation Rules

| Name | Description |
| --- | --- |
| Word Pattern Match | Identifies profane or offensive words using word matching. |
| Contact Details Removal | Detects and removes phone numbers from text. |
| Email Detection | Detects and removes email addresses from messages. |
| Spam Detection (English) | Detects spam messages in English. |
| Scam Detection (English) | Detects scam or fraudulent text in English. |
| Platform Circumvention (English) | Identifies attempts to bypass platform rules. |
| Toxicity Detection (English) | Detects toxic or harmful language in text. |
| Explicit or Inappropriate Content Prompt | Detects explicit sexual descriptions, graphic violence, or other unsuitable text. |
| Privacy and Sensitive Info Prompt | Identifies sensitive personal information shared without consent. |
| Hate and Harassment Prompt | Detects hateful or harassing language toward individuals or groups. |
| Self-Harm or Suicidal Content Prompt | Detects content suggesting self-harm or suicidal thoughts. |
| Impersonation or Fraud Prompt | Detects deceptive attempts to impersonate individuals or organizations. |
| Violent or Terroristic Threats Prompt | Detects content promoting violence or extremism. |
| Non-Consensual Sexual Content Prompt | Detects sexual exploitation, grooming, or non-consensual content. |
| Spam and Scam Prompt | Identifies spam, phishing attempts, and fraudulent schemes. |

🖼️ Image Moderation Rules

| Name | Description |
| --- | --- |
| Unsafe & Prohibited Content | Detects unsafe or prohibited content in images. |
| Terrorism or Extremist Promotion Prompt | Detects extremist propaganda, terrorist symbols, or images promoting violent ideologies. |
| Minor Safety and Exploitation Prompt | Detects child sexual content or exploitative imagery of minors. |
| Self-Harm or Suicidal Content Prompt | Detects imagery suggesting self-harm or suicidal ideation. |
| Privacy or Personal Data Prompt | Identifies images containing personal or sensitive data. |
| Graphic Violence or Gore Prompt | Detects images of extreme violence or gore. |
| Explicit or Sexual Content Prompt | Identifies nudity, explicit sexual content, or suggestive imagery. |
| Hate or Harassment Prompt | Detects hate symbols, harassment, or extremist imagery. |
| Fraud or Scam Indicators Prompt | Flags manipulated or fraudulent images, such as fake IDs. |

🎥 Video Moderation Rules

| Name | Description |
| --- | --- |
| Unsafe & Prohibited Content | Detects unsafe or prohibited content in video files. |