MultiBanFakeDetect: Integrating advanced fusion techniques for multimodal detection of Bangla fake news in under-resourced contexts
This repository contains the dataset and code for the paper "MultiBanFakeDetect: Integrating advanced fusion techniques for multimodal detection of Bangla fake news in under-resourced contexts" by Fatema Tuj Johora Faria, Mukaffi Bin Moin, Zayeed Hasan, Md. Arafat Alam Khandaker, Niful Islam, Khan Md Hasib, and M.F. Mridha.
📄 Read the full paper here:
https://www.sciencedirect.com/science/article/pii/S2667096825000291
The rise of false news in recent years poses significant risks to society. As misinformation spreads rapidly, automated detection systems are essential to mitigate its impact. However, most existing methods rely solely on textual analysis, limiting their effectiveness. The challenge is further compounded by the lack of a large-scale, multimodal dataset for Bangla fake news detection, as existing datasets are either small or unimodal. To address this, we introduce MultiBanFakeDetect, a novel multimodal dataset integrating both textual and visual information. This dataset comprises manually curated real and fake news samples from various online sources. Additionally, we propose MultiFusionFake, a hybrid multimodal fake news detection framework that fuses text and image modalities using an Early Fusion approach while also comparing Late and Intermediate fusion techniques. Our experiments show that MultiFusionFake, combining DenseNet-169 and mBERT, achieves 79.69% accuracy, outperforming the text-only mBERT model’s 73.13%, reflecting a 6.56 percentage point improvement. These results underscore the advantages of multimodal over unimodal methods. To the best of our knowledge, this is the first study on multimodal fake news detection in the under-resourced Bangla context, offering a promising approach to combating online misinformation.
- We curated a diverse dataset named MultiBanFakeDetect, comprising 9600 text–image pairs sourced from various platforms such as online forums, news websites, and social media networks. This dataset covers a wide spectrum of topics including political, social, technology, sports, and entertainment-related themes. The balanced representation of real and fake instances provides a comprehensive view of contemporary digital content, establishing a solid baseline for accurate comparison and analysis.
- We proposed the TextFakeNet framework, utilizing state-of-the-art pre-trained language models such as mBERT, XLM-RoBERTa, and DistilBERT architectures. This framework aims to enhance text-based fake news detection by fine-tuning these models on our curated Bangla dataset, MultiBanFakeDetect. By leveraging their advanced semantic and contextual understanding capabilities, TextFakeNet significantly improves the accuracy in distinguishing between genuine and fake news articles in the Bangla language context.
- We developed the MultiFusionFake framework, an early fusion approach that integrates textual and visual modalities using DenseNet-169 and mBERT. MultiFusionFake aims to combine the strengths of text and image data, demonstrating enhanced performance in the binary classification of fake (1) and non-fake (0) news instances within the Bangla language context. The performance of MultiFusionFake is compared with multiple early, late, and intermediate fusion strategies, highlighting its effectiveness over alternative approaches.
- We extended our approach to multiclass classification, categorizing news instances into four classes: clickbait, misinformation, rumor, and non-fake. We developed models using early, late, and intermediate fusion strategies to effectively combine textual and visual features. Our comparative analysis demonstrates how different fusion techniques impact classification performance, providing insights into the most effective approach for distinguishing these categories.
- To gain a deeper understanding of our system’s performance in both binary (fake vs. non-fake) and multiclass (clickbait, misinformation, rumor, and non-fake) classification, we conducted a comprehensive error analysis of the Multimodal Bangla Fake News Detection system. For binary classification, our analysis revealed challenges such as misleading text resembling factual content, visually deceptive images, and inconsistencies between textual and visual modalities. In multiclass classification, we identified overlapping characteristics between misinformation and rumors, visually ambiguous clickbait content, and subtle variations in misleading narratives. Additionally, certain cases exhibited discrepancies between text and image cues, leading to misclassifications. These insights guided us in refining our models, improving their robustness, and enhancing their ability to distinguish between different types of fake news.
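The difference between the early and late fusion strategies named in the contributions above can be illustrated with a minimal sketch. It is not the MultiFusionFake implementation: the short feature vectors stand in for mBERT and DenseNet-169 embeddings, the linear-plus-sigmoid classifier is a placeholder for the real classification head, and all function names and weights are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear(features, weights, bias=0.0):
    """Weighted sum of features; a stand-in for a trained classifier head."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights)) + bias


def early_fusion(text_vec, image_vec, weights, bias=0.0):
    """Concatenate both modalities first, then classify the joint vector once."""
    fused = list(text_vec) + list(image_vec)
    return sigmoid(linear(fused, weights, bias))


def late_fusion(text_vec, image_vec, text_weights, image_weights, text_share=0.5):
    """Classify each modality separately, then blend the two probabilities."""
    p_text = sigmoid(linear(text_vec, text_weights))
    p_image = sigmoid(linear(image_vec, image_weights))
    return text_share * p_text + (1.0 - text_share) * p_image


if __name__ == "__main__":
    # Toy stand-ins for text and image embeddings (hypothetical values).
    text_vec, image_vec = [0.2, -0.1], [0.4]
    print("early:", early_fusion(text_vec, image_vec, [1.0, 2.0, 0.5]))
    print("late: ", late_fusion(text_vec, image_vec, [1.0, 2.0], [0.5]))
```

In the early variant a single classifier sees both modalities at once and can learn cross-modal interactions. In the late variant each modality is scored independently and only the decisions are combined. Intermediate fusion, which merges hidden representations partway through the network, is not shown here.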
Our dataset is specially designed to include both visual and textual content, enhancing both the quantity and quality of data that can be analyzed. Each instance in the collection consists of a text–image pair, where the image serves as a visual representation and the textual component acts as a caption or description. The “headline” and “description” columns in our dataset correspond to its textual component. The textual data in these columns was collected from various internet forums, social media posts, and headlines. The headline provides a brief overview of the major idea or subject matter of the related material, while the description offers more information or background. The “image_id” column in our dataset contains a distinct ID for every image, representing the visual component. These images are sourced from a variety of platforms and include screenshots, infographics, memes, and other visual information. The inclusion of visual material in the dataset is vital for a thorough study, given its importance in the spread and interpretation of false information. Additionally, each instance is assigned to one of 12 topical categories: Entertainment, Sports, Technology, National, Lifestyle, Politics, Education, International, Crime, Finance, Business, and Miscellaneous. Each instance is also labeled with one of the following types: misinformation, rumor, clickbait, or non-fake. The “label” column assigns a value of 0 for non-fake instances and 1 for fake instances. This labeling scheme supports both binary and multiclass classification.
- Misinformation: This refers to general false or misleading content that is factually incorrect or inaccurate, regardless of intent. In our classification framework, this class includes misleading content that does not fall specifically under the characteristics of rumors (speculative/unverified) or clickbait (sensationalism for attention), thereby serving as a broader umbrella for general inaccuracy.
- Rumor: Rumors are speculative or unverified claims that lack substantiated evidence at the time of dissemination. While they may eventually be validated or disproven, they are categorized based on their unconfirmed status and tendency to propagate rapidly, particularly through social networks.
- Clickbait: This category includes content that uses exaggerated, sensational, or misleading headlines primarily to attract attention and generate clicks. It often sacrifices accuracy for virality, focusing on engagement over truthful reporting.
- Non-fake: This label is assigned to instances that are verified to be accurate and truthful. These instances are not misleading or false and serve as a control group in the dataset for the purpose of comparison and analysis.
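The relationship between the four types above and the binary “label” column can be sketched as a small consistency check. Only the column name `label` comes from the dataset description; the column name `type`, the sample rows, and the function names are assumptions made for illustration.

```python
FAKE_TYPES = {"misinformation", "rumor", "clickbait"}
NON_FAKE = "non-fake"


def binary_label(news_type: str) -> int:
    """Map a type to the binary label: 0 for non-fake, 1 for any fake type."""
    normalized = news_type.strip().lower()
    if normalized == NON_FAKE:
        return 0
    if normalized in FAKE_TYPES:
        return 1
    raise ValueError(f"unknown news type: {news_type!r}")


def find_inconsistent(rows):
    """Return the rows whose stored label disagrees with their type."""
    return [row for row in rows if int(row["label"]) != binary_label(row["type"])]


if __name__ == "__main__":
    # Hypothetical rows; "type" is an assumed column name.
    rows = [
        {"image_id": "a1", "type": "rumor", "label": "1"},
        {"image_id": "a2", "type": "non-fake", "label": "0"},
        {"image_id": "a3", "type": "clickbait", "label": "0"},  # mismatch
    ]
    print([row["image_id"] for row in find_inconsistent(rows)])
```

A check of this kind is useful before training, because the binary and multiclass experiments share the same instances and should agree on which ones are fake.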
The dataset used in this research is publicly accessible through: Dataset Link
- Total instances: 9600 (4800 fake, 4800 real)
- Categories: 12 (Entertainment, Sports, etc.)
- Types: Misinformation (1611), Rumor (1518), Clickbait (1671), Non-fake (4800)
- Split: 80% train (7680), 10% validation (960), 10% test (960)
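The figures above can be checked with a few lines of arithmetic. The sketch below recomputes the split sizes from the stated percentages. It also derives inverse-frequency class weights for the four types, since non-fake outnumbers each individual fake type roughly three to one. The weighting formula is an assumption chosen for illustration and may differ from the scheme used in the paper; the function names are hypothetical.

```python
# Counts as stated in the dataset statistics.
TYPE_COUNTS = {
    "misinformation": 1611,
    "rumor": 1518,
    "clickbait": 1671,
    "non-fake": 4800,
}
TOTAL = sum(TYPE_COUNTS.values())


def split_sizes(total: int, percents=(80, 10, 10)) -> tuple:
    """Integer train/validation/test sizes; any rounding remainder goes to train."""
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")
    val = total * percents[1] // 100
    test = total * percents[2] // 100
    return total - val - test, val, test


def balanced_class_weights(counts: dict) -> dict:
    """Inverse-frequency weights: total / (n_classes * class_count).

    Assumption: this "balanced" heuristic is illustrative only.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


if __name__ == "__main__":
    print("train/val/test:", split_sizes(TOTAL))
    for name, weight in balanced_class_weights(TYPE_COUNTS).items():
        print(f"{name}: {weight:.3f}")
```

With these weights every class contributes the same total weight to the loss, so the three smaller fake types are not dominated by the non-fake class in the multiclass setting.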
- Preprocessing: Emoji removal, whitespace handling, punctuation removal, special symbols removal.
- Models: mBERT, XLM-RoBERTa, DistilBERT fine-tuned on the dataset.
- Classification: Binary (fake/non-fake) and multiclass (clickbait, misinformation, rumor, non-fake).
- Feature Extraction: Text with BERT variants, images with CNNs (ResNet, DenseNet, MobileNet).
- Fusion Techniques:
  - Early Fusion: Combine features at input level.
  - Late Fusion: Combine at decision level.
  - Intermediate Fusion: Combine at intermediate layers.
- Cost-sensitive learning for class imbalance.
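The preprocessing steps listed above (emoji removal, whitespace handling, punctuation removal, special symbols removal) can be approximated with Unicode character categories. This is a minimal sketch under that assumption, not the repository's actual cleaning code, and the function name `clean_text` is hypothetical.

```python
import unicodedata


def clean_text(text: str) -> str:
    """Drop punctuation and symbols, then collapse runs of whitespace.

    Unicode categories starting with "P" cover punctuation (including the
    Bangla danda) and those starting with "S" cover symbols (most emoji are
    category "So"). Bangla letters and combining vowel signs belong to other
    categories and are kept.

    Limitation: zero-width joiners and variation selectors are left in place,
    because Bangla text can use zero-width joiners legitimately.
    """
    kept = []
    for ch in text:
        if unicodedata.category(ch).startswith(("P", "S")):
            kept.append(" ")  # replace with a space so adjacent words stay separated
        else:
            kept.append(ch)
    # str.split() with no arguments splits on any whitespace and drops empty pieces.
    return " ".join("".join(kept).split())


if __name__ == "__main__":
    print(clean_text("  Breaking!!!   news :)  "))
```

Filtering by category rather than by an ASCII punctuation list matters for Bangla, because sentence-ending marks such as the danda are not part of `string.punctuation`.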
Although our research has made strides in the field of Multimodal Bangla Fake News Detection, there are certain limitations that warrant acknowledgment. First, we did not explore the use of romanized Bangla text in our detection models. Romanized Bangla refers to Bangla (Bengali) language text written using the Roman alphabet. Incorporating this variant could offer valuable insights and potentially improve model performance by expanding the range of textual representations. Furthermore, the interpretability of our models has not been thoroughly examined. Understanding how and why a model arrives at certain predictions is crucial for ensuring transparency and building user trust. Moreover, while our work focuses primarily on static multimodal content (text and images), we did not integrate methods for real-time content analysis or network-level mitigation strategies. Additionally, our research did not incorporate audio or video-based modalities, which could enrich the analysis by capturing additional linguistic and visual cues critical for fake news detection. Expanding to these modalities would allow us to handle a broader range of misinformation formats. Lastly, while MultiBanFakeDetect includes a diverse set of topics, its size (9600 text–image pairs) and scope may limit its generalizability across the full spectrum of Bangla dialects and evolving misinformation trends. Though we have strived for representative coverage, the dataset may not fully encapsulate regional linguistic variation or emerging narratives.
In our future research, we aim to significantly enhance the robustness, accuracy, and transparency of fake news detection in Bangla. One of the key directions we plan to pursue is integrating our detection model into proactive frameworks, such as network immunization systems. By implementing early detection, we can directly support strategies to limit the spread of misinformation, thereby reducing its impact. In addition to this, we will focus on adapting our model for real-time detection in dynamic environments, particularly on social media platforms where misinformation spreads rapidly. Optimizing the system for low-latency inference without compromising accuracy will be a critical challenge in this regard. Another important direction involves integrating our detection system with community-based approaches, which will enable crowdsourced fact-checking and collective verification. This collaborative method will strengthen the credibility of the system, as it allows the public to actively participate in the verification process.
To further refine our detection capabilities, we will explore the use of tree-based models to improve classification performance. These structured models will not only enhance the decision-making process but also improve the system’s interpretability and adaptability to a wide variety of content types. We will also focus on multimodal evidence retrieval by developing a re-ranking approach that utilizes both Large Language Models (LLMs) and Large Vision Language Models (LVLMs). By processing text and image-based evidence, this approach will allow for more accurate and contextually relevant fact verification. In parallel, we will work on expanding our synthetic data generation process by creating examples that closely resemble real-world fake content, including fake news articles and manipulated images. This will expose our models to a broader range of deceptive formats and improve their generalization and robustness against evolving misinformation tactics.
Currently, our research focuses on categories such as misinformation, rumors, and clickbait. However, we plan to extend this taxonomy to include other forms of deception, such as satire, hoaxes, propaganda, fabricated content, manipulated media, and conspiracy theories. Each of these categories requires distinct detection strategies, and incorporating them into our system will make it more comprehensive and adaptable to the sociocultural context of Bangla. To ensure better linguistic coverage and adaptability to dynamic online content, we will expand the MultiBanFakeDetect dataset by incorporating region-specific dialects and continuously updating it with emerging misinformation trends.
Transparency is a central priority in our design, and to achieve this, we will employ advanced Explainable AI (XAI) techniques. Methods such as GradCAM++, Faster ScoreCAM, and LayerCAM will be used to visualize and interpret the internal workings of our models, providing stakeholders with clear insights into how predictions are made and fostering trust in the system. Furthermore, we plan to explore various Vision Transformer (ViT) architectures, such as the Data-efficient Image Transformer (DeiT), Swin Transformer, and Swift Transformer. By leveraging self-attention mechanisms, these architectures will enhance the performance, efficiency, and interpretability of our multimodal fake news detection models. Finally, virality prediction will also be incorporated as a crucial component of our fake news detection models. By predicting the potential virality of misinformation, we can better understand how fake news spreads and its possible impact on the audience. This will allow for more proactive strategies to mitigate the harm caused by viral misinformation.
These directions will guide our future research, helping us build a reliable, explainable, and scalable system for detecting fake news in the Bangla language across various formats and platforms.
If you use this dataset or code, please cite the paper:
@article{FARIA2025100347,
title = {MultiBanFakeDetect: Integrating advanced fusion techniques for multimodal detection of Bangla fake news in under-resourced contexts},
journal = {International Journal of Information Management Data Insights},
volume = {5},
number = {2},
pages = {100347},
year = {2025},
issn = {2667-0968},
doi = {https://doi.org/10.1016/j.jjimei.2025.100347},
url = {https://www.sciencedirect.com/science/article/pii/S2667096825000291},
author = {Fatema Tuj Johora Faria and Mukaffi Bin Moin and Zayeed Hasan and Md. Arafat Alam Khandaker and Niful Islam and Khan Md Hasib and M.F. Mridha},
keywords = {Fake news detection, Multimodal dataset, Textual analysis, Visual analysis, Bangla language, Under-resource, Fusion techniques, Deep learning}}