What question did this study set out to answer?

The aim is to evaluate the trustworthiness of websites to combat misinformation and support fact-checking efforts.

March 12, 2026 | Open Access

A data-driven approach to supporting fact-checking and mitigating misinformation and disinformation through domain quality evaluation

Key Points

Developed a machine-learning system to assess website quality
Utilized a dataset of expert-rated domains for model training
Applied supervised regression to predict credibility scores for unseen domains
Conducted feature importance analysis to identify key indicators
Achieved moderate performance on test data and high performance on independent datasets
Identified PageRank-based features as most impactful in reducing prediction error
Enabled continuous assessment of thousands of domains with minimal manual effort

Abstract

Misinformation and disinformation spread rapidly on social media, threatening public discourse, democratic processes, and social cohesion. One promising strategy to address these challenges is to evaluate the trustworthiness of entire domains (source websites) as a proxy for content credibility. This approach demands methods that are both scalable and data-driven. However, current solutions such as NewsGuard and Media Bias/Fact Check (MBFC) rely on expert assessments, cover only a limited number of domains, and some (e.g., NewsGuard) require paid subscriptions. These constraints limit their usefulness for large-scale research. This study introduces a machine-learning-based system designed to assess the quality and trustworthiness of websites. We propose a data-driven approach that leverages a large dataset of expert-rated domains to predict credibility scores for previously unseen domains using domain-level features. Our supervised regression model achieves moderate performance on test data and high performance on independent datasets, highlighting its ability to generalize to unseen domains. Using feature importance analysis, we found that PageRank-based features provided the greatest reduction in prediction error, suggesting that link-based indicators play a central role in domain trustworthiness. The solution’s scalable design accommodates the continuously evolving nature of online content, ensuring that evaluations remain timely and relevant. The framework enables continuous assessment of thousands of domains with minimal manual effort. This capability allows stakeholders (social media platforms, media monitoring organizations, content moderators, and researchers) to allocate resources more efficiently, prioritize verification efforts, and reduce exposure to questionable sources.
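To make the workflow described in the abstract more concrete, the following is a minimal, illustrative sketch of supervised regression on domain-level features followed by a feature importance analysis. It is not the authors' implementation. The abstract does not name the regression model or the importance method, so a plain linear model and permutation importance (the increase in prediction error when one feature is shuffled) are used here as stand-ins. The feature names (`pagerank`, `domain_age`, `random_noise`), the synthetic data, and all function names are hypothetical assumptions made purely for illustration.

```python
import random

# Hypothetical feature names; the real feature set is not given in the abstract.
FEATURES = ["pagerank", "domain_age", "random_noise"]

def make_domains(n, rng):
    """Generate synthetic (features, score) pairs. This is an assumption for
    illustration: the score depends mostly on the first feature."""
    X, y = [], []
    for _ in range(n):
        row = [rng.random() for _ in FEATURES]
        score = 0.7 * row[0] + 0.2 * row[1] + rng.gauss(0, 0.05)
        X.append(row)
        y.append(score)
    return X, y

def fit_linear(X, y, lr=0.2, epochs=1000):
    """Fit y ~ w.x + b by batch gradient descent on mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += 2 * err * xi[j] / n
            grad_b += 2 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(model, X):
    w, b = model
    return [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]

def mse(y_true, y_pred):
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(model, X, y, rng, repeats=10):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    base = mse(y, predict(model, X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            increases.append(mse(y, predict(model, X_perm)) - base)
        importances.append(sum(increases) / repeats)
    return importances

rng = random.Random(0)
X_train, y_train = make_domains(200, rng)   # stands in for expert-rated domains
X_test, y_test = make_domains(100, rng)     # stands in for unseen domains

model = fit_linear(X_train, y_train)
test_error = mse(y_test, predict(model, X_test))
importances = permutation_importance(model, X_test, y_test, rng)

for name, imp in zip(FEATURES, importances):
    print(f"{name}: {imp:.4f}")
```

Because the synthetic scores are constructed to depend mainly on the hypothetical `pagerank` feature, shuffling that column degrades predictions the most. This mirrors, in toy form only, the kind of finding the abstract reports for PageRank-based features.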
