
April 25, 2026 · Open Access

Traceable Metamorphic Test Cases for Robust Safety‐Critical Systems: A Deep Learning LiDAR Object Detector Example

Key points

This research aims to enhance the robustness assessment of safety-critical deep learning systems using metamorphic testing.
Developed traceable metamorphic relations (MRs) for LiDAR object detectors, each linked to a defect hypothesis.
Ran experiments on the nuScenes dataset with three different object detectors, producing 3.9 million test verdicts.
The defect-based MRs uncovered 0.7 million test failures, demonstrating that they are effective for robustness evaluation.
Prioritized safety-critical failures, such as those occurring close to the ego vehicle, reducing the 685,000 observed failures to 5,397.
This 127-fold reduction focuses attention on the failures with the greatest safety impact.
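To make the idea of a metamorphic test verdict concrete, here is a minimal sketch of one illustrative MR: rotating a LiDAR scene about the vehicle's vertical axis should rotate the detected objects by the same angle. A matching defect hypothesis might be, for example, that detection quality depends on an object's bearing. This MR, that hypothesis, the simplified `Box` format (centre and label only, no size or yaw), the 0.5 m tolerance, and all function names are assumptions for illustration. They are not claimed to be among the paper's MRs or its tooling. No detector is included: `rotate_points` builds the follow-up input that would be fed to one, and `verdicts` compares the detector's follow-up output with the expected, rotated source detections, yielding one pass/fail verdict per source object.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical, simplified detection: object centre in metres plus a class label."""
    x: float
    y: float
    z: float
    label: str


def rotate_z(x, y, theta):
    """Rotate a 2D point about the origin (the ego vehicle) by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x - s * y, s * x + c * y


def rotate_points(points, theta):
    """Follow-up input: rotate every (x, y, z) LiDAR point about the vertical axis."""
    return [(*rotate_z(x, y, theta), z) for x, y, z in points]


def expected_boxes(source_boxes, theta):
    """Expected follow-up output under the MR: the same objects, rotated by theta."""
    return [Box(*rotate_z(b.x, b.y, theta), b.z, b.label) for b in source_boxes]


def verdicts(source_boxes, followup_boxes, theta, tol=0.5):
    """One verdict per source object: 'pass' if a follow-up detection with the
    same label lies within tol metres of the expected rotated position."""
    results = []
    for exp in expected_boxes(source_boxes, theta):
        hit = any(
            f.label == exp.label
            and math.dist((f.x, f.y, f.z), (exp.x, exp.y, exp.z)) <= tol
            for f in followup_boxes
        )
        results.append((exp, "pass" if hit else "fail"))
    return results


# Usage with made-up detections: after a 90-degree rotation the (not shown)
# detector still finds the car but misses the pedestrian.
source = [Box(10.0, 0.0, 0.0, "car"), Box(0.0, 20.0, 0.0, "pedestrian")]
followup = [Box(0.0, 10.0, 0.0, "car")]
for box, verdict in verdicts(source, followup, math.pi / 2):
    print(box.label, verdict)  # car pass / pedestrian fail
```

Matching by label and centre distance keeps the sketch short; a real comparison would more likely use 3D box overlap and heading as well.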

Abstract

Assessing the robustness of safety-critical deep learning (DL) systems is of utmost importance, as these systems can cause harm when deployed in the real world. Metamorphic testing (MT) is a commonly used method for evaluating the robustness of DL systems because it does not require expensive labelled ground-truth data. This paper tackles two challenges. (1) In regulated domains such as the automotive industry, one must provide a traceable argument for why a particular metamorphic relation (MR) was chosen. We adapt the idea of defect-based testing to MT and argue that an MR is traceable if it can be linked to a defect hypothesis. We demonstrate how to assess the robustness of safety-critical DL systems using LiDAR object detectors as an example. To this end, we create three new MRs for the LiDAR domain and identify five MRs that can be reused by adapting them from related domains. Our experiments on the nuScenes dataset with three different object detectors produce 3.9 million test verdicts, of which 0.7 million are test failures. This shows that our defect-based MRs effectively uncover failures.

(2) A second challenge is that executing numerous metamorphic test cases can produce an impractically high number of failures. We show how to prioritize the most critical failures, such as failures that occur close to the ego vehicle. By prioritizing, we reduced the 685,000 observed failures to 5,397 safety-critical failures, a 127-fold reduction.
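The prioritization step can be sketched as a filter followed by a sort. The criterion below is an assumption: it keeps failures whose object lies within a hypothetical 15 m of the ego vehicle and belongs to a hypothetical set of critical classes. The abstract names proximity to the ego vehicle only as one example of a criticality criterion. The `Failure` record and all function names are likewise illustrative. The final line checks the reported arithmetic: 685,000 / 5,397 ≈ 126.9, which rounds to the stated 127-fold reduction.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    """Hypothetical record of one failed metamorphic test verdict."""
    test_id: str
    x: float  # object position relative to the ego vehicle, metres
    y: float
    label: str


def is_safety_critical(f, max_distance=15.0,
                       critical_labels=frozenset({"pedestrian", "bicycle", "car"})):
    """Assumed criterion: object within max_distance metres of the ego vehicle
    and belonging to one of the critical classes."""
    return math.hypot(f.x, f.y) <= max_distance and f.label in critical_labels


def prioritize(failures, **criteria):
    """Keep only safety-critical failures, ordered nearest-to-ego first."""
    critical = [f for f in failures if is_safety_critical(f, **criteria)]
    return sorted(critical, key=lambda f: math.hypot(f.x, f.y))


def reduction_factor(n_before, n_after):
    """How many times fewer failures remain after prioritization."""
    return n_before / n_after


failures = [
    Failure("t1", 5.0, 2.0, "pedestrian"),
    Failure("t2", 40.0, 0.0, "car"),  # too far away under the assumed 15 m limit
    Failure("t3", 3.0, -1.0, "car"),
]
print([f.test_id for f in prioritize(failures)])  # ['t3', 't1']
print(round(reduction_factor(685_000, 5_397)))  # 127
```

Passing the thresholds through `**criteria` lets the same filter be rerun with different distance limits or class sets. This is a reasonable way to explore how sensitive the size of the prioritized set is to the chosen criterion.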
