The rapid growth in daily email volume raises concerns about spam, which is intrusive and can put user data at risk. Effective email classification is therefore essential. This study proposes a system that uses the DistilBERT model to classify emails as spam or non-spam (ham). We use distributed training with Hugging Face's Accelerate library to shorten training. Compared with a non-distributed approach, this method reduces training time by 46.39% while maintaining 96% accuracy. We recommend exploring multi-GPU training in future work for further efficiency gains.
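To put the reported 46.39% reduction in training time in more familiar terms, it can be converted into a speedup factor. The sketch below is only an arithmetic illustration, not the authors' code; the function name `speedup_from_reduction` is hypothetical, and the only figure taken from the abstract is the 46.39% reduction.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percentage reduction in wall-clock time into a speedup factor.

    A reduction of r percent leaves (1 - r/100) of the original time,
    so the speedup is the reciprocal of that remaining fraction.
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    remaining_fraction = 1.0 - reduction_pct / 100.0
    return 1.0 / remaining_fraction


# Figure reported in the abstract; everything else here is illustrative.
reported_reduction_pct = 46.39
speedup = speedup_from_reduction(reported_reduction_pct)
print(f"{speedup:.2f}x")  # prints "1.87x"
```

In other words, under this reading the distributed run finishes in a little over half the time of the non-distributed baseline, which is a speedup of roughly 1.87x.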
Padilla et al. studied this question.