AIS Research Insights

Label Error Detection in Defect Classification using Area Under the Margin (AUM) Ranking on Tabular Data

Pavlos Rath-Manakidis, Kathrin Nauth, Henry Huick, Miriam Fee Unger, Felix Hoenig, Jens Poeppelbuss, and Laurenz Wiskott

This study introduces an efficient method using Area Under the Margin (AUM) ranking with gradient-boosted decision trees to detect labeling errors in tabular data. The approach is designed to improve data quality for machine learning models used in industrial quality control, specifically for flat steel defect classification. The method's effectiveness is validated on both public and real-world industrial datasets, demonstrating it can identify problematic labels in a single training run.

Problem

Automated surface inspection systems in manufacturing rely on machine learning models trained on large datasets. The performance of these models is highly dependent on the quality of the data labels, but errors frequently occur due to annotator mistakes or ambiguous defect definitions. Existing methods for finding these label errors are often computationally expensive and not optimized for the tabular data formats common in industrial applications.

Outcome

- The proposed AUM method is as effective as more complex, computationally expensive techniques for detecting label errors but requires only a single model training run.
- The method successfully identifies both synthetically created and real-world label errors in industrial datasets related to steel defect classification.
- Integrating this method into quality control workflows significantly reduces the manual effort required to find and correct mislabeled data, improving the overall quality of training datasets and subsequent model performance.
- In a real-world test, the method flagged suspicious samples for expert review, and 42% of them were confirmed to be labeling errors.
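
To make the ranking idea concrete, here is a minimal sketch of how AUM scores could be computed. It uses the usual definition of the margin (the score of the assigned class minus the largest score among the other classes) and averages it over training iterations; treating each boosting iteration as one step is an assumption of this sketch, not a detail confirmed by the summary. The function names `aum_scores` and `rank_suspects`, the nested-list layout of `staged_logits`, and the toy numbers are all hypothetical. In practice the per-iteration scores might come from something like scikit-learn's `GradientBoostingClassifier.staged_decision_function`.

```python
from typing import List, Sequence


def aum_scores(staged_logits: Sequence[Sequence[Sequence[float]]],
               labels: Sequence[int]) -> List[float]:
    """Average margin per sample across boosting iterations.

    staged_logits[t][i][c] is the raw score of class c for sample i
    after iteration t (hypothetical layout, chosen for this sketch).
    """
    n_iters = len(staged_logits)
    totals = [0.0] * len(labels)
    for logits in staged_logits:
        for i, (row, y) in enumerate(zip(logits, labels)):
            assigned = row[y]
            # Largest score among all classes other than the assigned one.
            largest_other = max(s for c, s in enumerate(row) if c != y)
            totals[i] += assigned - largest_other
    return [t / n_iters for t in totals]


def rank_suspects(aum: Sequence[float]) -> List[int]:
    """Sample indices sorted from lowest AUM (most suspicious) upward."""
    return sorted(range(len(aum)), key=lambda i: aum[i])


# Toy data (made up): 3 samples, 2 classes, 2 iterations.
# Sample 2 is labeled 0, but the scores consistently favor class 1.
staged = [
    [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0], [0.0, 3.0]],
]
labels = [0, 1, 0]
scores = aum_scores(staged, labels)
ranking = rank_suspects(scores)
```

Samples whose assigned label is consistently outscored by another class end up with a low or negative AUM and appear first in `ranking`, which is the order in which they would be passed to an expert for review. Because the scores are collected while the model trains, no additional training runs are needed.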

Keywords: Label Error Detection, Automated Surface Inspection System (ASIS), Machine Learning, Gradient Boosting, Data-centric AI