Using a multilingual dataset to enhance hateful video detection on YouTube and Bilibili

by Simon Osuji
October 21, 2024
in Artificial Intelligence


Figure: zero-crossing rate of English YouTube videos (y-axis: zero-crossing indicator; x-axis: time in seconds).

Social media has revolutionized the way information is shared throughout communities, but it can also be a cesspit of hateful content. Current research on hateful content detection focuses on text-based analysis, while hateful video detection remains underexplored.

“Hate speech in videos can be conveyed through body language, tone, and imagery, which traditional text analysis misses. As platforms like YouTube and TikTok reach large audiences, hateful content in video form can be more persuasive and emotionally engaging, increasing the risk of influencing or radicalizing viewers,” explained Roy Lee, Assistant Professor at the Singapore University of Technology and Design (SUTD).

In his paper “MultiHateClip: A multilingual benchmark dataset for hateful video detection on YouTube and Bilibili,” Asst Prof Lee led a team to develop MultiHateClip, a novel multilingual dataset that aims to enhance hateful video detection on social media platforms. He previously created SGHateCheck, a novel functional test evaluating hate speech in multilingual environments. The study is published on the arXiv preprint server.

Using hate lexicons and human annotations focused on gender-based hate, MultiHateClip classifies videos into three categories: hateful, offensive, and normal. Hateful content involves discrimination against a specific group of people based on attributes such as gender or sexual orientation.

Offensive content is distressing, but lacks the targeted harm of hate speech and does not incite hatred. Normal content is neither hateful nor offensive. Compared to a binary classification (hateful versus non-hateful), this three-category system allows for a more nuanced approach to content moderation.
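
To make the distinction concrete, here is a minimal sketch of how such a three-way label scheme might be represented and acted on in a moderation pipeline. The label names and the moderation policy below are illustrative assumptions, not the dataset's actual schema or any platform's rules.

```python
from enum import Enum


class VideoLabel(Enum):
    """Three-way label scheme described for MultiHateClip (illustrative names)."""
    HATEFUL = "hateful"      # targeted discrimination against a specific group
    OFFENSIVE = "offensive"  # distressing, but no targeted harm or incitement
    NORMAL = "normal"        # neither hateful nor offensive


def moderation_action(label: VideoLabel) -> str:
    """Hypothetical policy showing why three classes allow more nuanced
    decisions than a binary hateful/non-hateful flag."""
    if label is VideoLabel.HATEFUL:
        return "remove and escalate for review"
    if label is VideoLabel.OFFENSIVE:
        return "age-restrict or add a content warning"
    return "no action"


print(moderation_action(VideoLabel.OFFENSIVE))
```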

After reviewing over 10,000 videos, the team curated 1,000 annotated short clips each from YouTube and Bilibili to represent the English and Chinese languages, respectively, for MultiHateClip. Among these clips, a consistent pattern of gender-based hate against women emerged. Most of these videos used a combination of text, visual, and auditory elements to convey hate, underscoring the need for a multimodal approach to understanding hate speech.
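
A "multimodal approach" here means combining signals from the transcript, the frames, and the audio track rather than relying on text alone. The sketch below shows one simple way to do this, late fusion of per-modality embeddings; the feature dimensions and the small classifier are assumptions for illustration, not the models evaluated in the paper.

```python
import torch
import torch.nn as nn


class LateFusionClassifier(nn.Module):
    """Concatenate per-modality embeddings and classify a clip into the three
    categories (hateful / offensive / normal). Dimensions are illustrative."""

    def __init__(self, text_dim=768, visual_dim=512, audio_dim=128, num_classes=3):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + visual_dim + audio_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, text_emb, visual_emb, audio_emb):
        fused = torch.cat([text_emb, visual_emb, audio_emb], dim=-1)
        return self.head(fused)  # logits over hateful / offensive / normal


# Random features standing in for a batch of 4 clips
logits = LateFusionClassifier()(torch.randn(4, 768), torch.randn(4, 512), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 3])
```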

Compared with existing datasets, which are simpler and less detailed, MultiHateClip is enriched with fine-grained, comprehensive annotations. It distinguishes between hateful and offensive videos, and outlines which segments of the video are hateful, who the targeted victims are, and which modalities portray hatefulness (i.e., text, visual, auditory). It also provides a strong cross-cultural perspective, as it includes videos from both Western (YouTube) and Chinese (Bilibili) contexts, highlighting how hate is expressed differently across cultures.
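
As a rough illustration of what such fine-grained annotations could look like in practice, a per-clip record might carry the overall label, the hateful time segments, the targeted group, and the modalities involved. The field names and values below are hypothetical, not the released annotation schema.

```python
from dataclasses import dataclass, field


@dataclass
class ClipAnnotation:
    """Hypothetical per-clip record mirroring the attributes the dataset is
    described as providing: overall label, hateful segments, targets, modalities."""
    video_id: str
    platform: str                  # "youtube" or "bilibili"
    label: str                     # "hateful", "offensive", or "normal"
    hateful_segments: list = field(default_factory=list)  # (start_sec, end_sec) pairs
    target_groups: list = field(default_factory=list)     # e.g. ["women"]
    modalities: list = field(default_factory=list)        # subset of {"text", "visual", "auditory"}


example = ClipAnnotation(
    video_id="yt_000123",
    platform="youtube",
    label="hateful",
    hateful_segments=[(12.5, 24.0)],
    target_groups=["women"],
    modalities=["auditory", "text"],
)
print(example)
```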

The team expected distinguishing hateful videos from offensive ones to be difficult as both share similarities, such as inflammatory language and controversial topics. Hateful speech targets specific groups, while offensive content causes discomfort without intending to discriminate. The subtle differences in tone, context, and intent make it challenging for human annotators and machine learning models to draw the line between hateful and offensive content.

“Additionally, cultural and linguistic nuances further complicate the distinction, particularly in multilingual contexts like English and Chinese, where expressions of hate or offense may vary significantly. This complexity underscores the need for more sophisticated detection models that can capture subtle distinctions,” emphasized Asst Prof Lee.

The study also tested state-of-the-art hate video detection models with MultiHateClip. Results highlighted three critical limitations in current models: the difficulty in distinguishing between hateful and offensive content, the limitations of pre-trained models in handling non-Western cultural data, and the insufficient understanding of implicit hate. These gaps emphasize the need for culturally sensitive and multimodal approaches to hate speech detection.
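
One way to make the first limitation visible is to score models with per-class metrics and a confusion matrix rather than a single accuracy number, so that hateful-versus-offensive confusion stands out. Below is a minimal sketch with hypothetical gold labels and predictions.

```python
from sklearn.metrics import classification_report, confusion_matrix

LABELS = ["hateful", "offensive", "normal"]

# Hypothetical gold labels and model predictions for a handful of clips
y_true = ["hateful", "hateful", "offensive", "offensive", "normal", "normal"]
y_pred = ["offensive", "hateful", "hateful", "offensive", "normal", "normal"]

# Per-class precision/recall/F1 makes hateful-vs-offensive confusion visible
print(classification_report(y_true, y_pred, labels=LABELS, zero_division=0))

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred, labels=LABELS))
```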

MultiHateClip reflects the value of intersecting design, artificial intelligence, and technology. Its real-world significance is clear: to detect hate speech and prevent its dissemination. Built for video content, the dataset has a cross-cultural focus and is especially useful on social media platforms where videos are the primary form of communication, such as YouTube, TikTok, and Bilibili. Content moderators, policymakers, and educational organizations will benefit from using MultiHateClip to understand and mitigate the spread of hate speech.

“Overall, MultiHateClip plays a crucial role in creating safer, more inclusive online environments,” said Asst Prof Lee, who shared the possibility of collaborating with social media platforms to deploy the model in real-world settings. In addition, the team could potentially look into broadening the dataset to include more languages and cultural contexts, improving model performance by creating better algorithms that can distinguish between hateful and offensive content, and developing real-time hate speech detection tools.

More information:
Han Wang et al, MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube and Bilibili, arXiv (2024). DOI: 10.48550/arxiv.2408.03468

Journal information:
arXiv

Provided by
Singapore University of Technology and Design

Citation:
Using a multilingual dataset to enhance hateful video detection on YouTube and Bilibili (2024, October 21)
retrieved 21 October 2024
from https://techxplore.com/news/2024-10-multilingual-dataset-video-youtube-bilibili.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.




