January 1, 2019 | Open Access

Probing Neural Network Comprehension of Natural Language Arguments

Key Points

The study asks how well neural networks, specifically BERT, comprehend natural language arguments, and whether their measured performance reflects biases in the dataset.
BERT reaches a peak accuracy of 77% on the Argument Reasoning Comprehension Task.
The authors investigate the spurious statistical cues in the dataset and show that a range of models exploit them.
They construct an adversarial dataset to evaluate the models under conditions where those cues no longer help.
BERT's peak performance is just three points below the average untrained human baseline.
Every model tested exploits the spurious cues, so the original performance figures are misleading.
On the adversarial dataset all models achieve random accuracy, indicating that the original scores do not reflect true argument comprehension.

Abstract

We are surprised to find that BERT's peak performance of 77% on the Argument Reasoning Comprehension Task reaches just three points below the average untrained human baseline. However, we show that this result is entirely accounted for by exploitation of spurious statistical cues in the dataset. We analyze the nature of these cues and demonstrate that a range of models all exploit them. This analysis informs the construction of an adversarial dataset on which all models achieve random accuracy. Our adversarial dataset provides a more robust assessment of argument comprehension and should be adopted as the standard in future work.
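
To make the idea of a spurious statistical cue concrete, here is a minimal sketch in Python. It is not the authors' code. It assumes a simplified task format in which each example has two candidate warrants and a label naming the correct one, and it treats a cue as a single word that appears in exactly one of the two warrants. The `Example` class, the `cue_stats` and `mirror` functions, and the toy data are all hypothetical. The mirroring step (copying each example with its label flipped) is an assumed way to balance a cue for illustration; the abstract does not spell out how the adversarial dataset is built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """Hypothetical, simplified task item: two candidate warrants and a label."""
    warrant0: str
    warrant1: str
    label: int  # index of the correct warrant (0 or 1)


def cue_stats(examples, cue):
    """Return (productivity, coverage) for a single-word cue.

    An example is "applicable" when the cue occurs in exactly one warrant.
    Productivity: fraction of applicable examples where picking the warrant
    that contains the cue gives the correct label.
    Coverage: fraction of all examples that are applicable.
    """
    applicable = 0
    productive = 0
    for ex in examples:
        in0 = cue in ex.warrant0.lower().split()
        in1 = cue in ex.warrant1.lower().split()
        if in0 == in1:
            continue  # cue in both or neither warrant: no signal
        applicable += 1
        predicted = 0 if in0 else 1
        if predicted == ex.label:
            productive += 1
    productivity = productive / applicable if applicable else 0.0
    coverage = applicable / len(examples) if examples else 0.0
    return productivity, coverage


def mirror(examples):
    """Assumed balancing step: add a label-flipped copy of every example."""
    flipped = [Example(ex.warrant0, ex.warrant1, 1 - ex.label) for ex in examples]
    return list(examples) + flipped


# Toy data (invented for illustration): "not" always marks the correct warrant.
toy = [
    Example("people can choose not to use it", "people must use it", 0),
    Example("it is safe", "it is not safe", 1),
    Example("they do not care", "they care", 0),
    Example("it works", "it fails", 1),
]

print(cue_stats(toy, "not"))          # (1.0, 0.75)
print(cue_stats(mirror(toy), "not"))  # (0.5, 0.75)
```

On the toy data the word "not" predicts the correct warrant every time it applies, so a model could score well by spotting that word alone. After mirroring, the same cue is right only half the time, which is the kind of condition under which a cue-driven model would fall to random accuracy.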
