
February 26, 2026 | Open Access

Performance Evaluation of Hybrid Bert Model on Code-mixed for Hausa-English Using Adapted Pre-trained Data

Key Points

This research aims to evaluate the performance of a fine-tuned BERT model on code-mixed data between Hausa and English.
Developed a BERT model tailored to a Hausa-English code-mixed dataset.
Pre-processed and tokenized the adapted pre-trained dataset.
Fine-tuned the model using optimization strategies, including adjusting the learning rate, batch size, and number of training epochs.
The proposed HauBERT model achieved more than 90% accuracy.
Evaluation metrics included accuracy, F1-score, precision, and recall for code-mixed tasks.
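The evaluation metrics listed above can be illustrated with a small Python sketch that has no external dependencies. The sketch rests on two assumptions. First, the labels are token-level language tags such as "en" and "ha". Second, precision, recall and F1 are macro-averaged over labels. The study does not state which averaging it used. The function name `classification_metrics` and the toy labels are hypothetical and do not come from the paper.

```python
from collections import Counter


def classification_metrics(gold, pred):
    """Accuracy plus per-label and macro-averaged precision, recall, and F1.

    gold and pred are equal-length sequences of labels, e.g. hypothetical
    "en"/"ha" language tags for each token of a code-mixed sentence.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("need at least one example")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # true label g was missed

    labels = sorted(set(gold) | set(pred))
    per_label = {}
    for label in labels:
        precision = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        recall = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_label[label] = {"precision": precision, "recall": recall, "f1": f1}

    accuracy = sum(tp.values()) / len(gold)
    # Assumption: macro averaging (unweighted mean over labels).
    macro = {
        key: sum(m[key] for m in per_label.values()) / len(labels)
        for key in ("precision", "recall", "f1")
    }
    return {"accuracy": accuracy, "per_label": per_label, "macro": macro}


# Toy example with made-up labels (not data from the study).
gold = ["ha", "ha", "en", "en", "ha", "en"]
pred = ["ha", "en", "en", "en", "ha", "ha"]
result = classification_metrics(gold, pred)
print(round(result["accuracy"], 3))  # 0.667
print({k: round(v, 3) for k, v in result["macro"].items()})
# {'precision': 0.667, 'recall': 0.667, 'f1': 0.667}
```

Macro averaging weights each language equally. This matters when one language dominates the code-mixed text. Micro averaging would instead weight every token equally.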

Abstract

This research evaluates the potential of the BERT (Bidirectional Encoder Representations from Transformers) language model for Hausa-English code-mixed text, using an adapted pre-trained dataset. The main aim was to reveal the potential benefits of pre-trained models for handling code-mixed data, in order to improve language understanding and context sensitivity for Hausa-English. This objective was achieved by developing a BERT model capable of handling a Hausa-English code-mixed dataset: different machine learning language models were explored, and the chosen model was trained on the adapted English-Hausa code-mixed data. The research was motivated by the scarcity of corpora for Hausa-English code-mixing, whereas other language pairs, such as English-Hindi, have already been explored. The model was developed using the Python Transformers library. The adapted pre-trained dataset was first pre-processed and tokenized to fit the BERT model, and the model was then fine-tuned. The dataset was normalized in the context of code-mixed conversation, with annotated language labels distinguishing English and Hausa segments in the code-mixed text. Training parameters were configured using different fine-tuning optimization strategies, with the learning rate, batch size, and number of training epochs adjusted to optimize performance. The model was evaluated on accuracy, F1-score, precision, and recall for code-mixed tasks. The proposed model, HauBERT, achieved more than 90% accuracy, and this result was compared with state-of-the-art BERT language models. The study recommends applying this adapted pre-trained model in large language models to improve language understanding and context sensitivity.
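The abstract describes two steps: annotating language labels to separate English and Hausa segments, and tokenizing the data to fit BERT. The Python sketch below shows one common way to connect these steps. It is not the authors' pipeline, and it makes three assumptions:

- Word-level labels come from a tiny hypothetical lexicon lookup. The study does not describe its annotation procedure.
- Words are split with a greedy longest-match-first subword tokenizer in the style of BERT's WordPiece, using a toy vocabulary. Real BERT vocabularies are learned and far larger.
- Only the first subword of each word receives a label. Continuation pieces and special tokens get -100, which PyTorch's `CrossEntropyLoss` ignores by default.

The names `HAUSA_LEXICON`, `tag_language`, `wordpiece`, and `align_labels` are illustrative.

```python
import re

# Hypothetical, tiny lexicons for illustration only; a real system would use
# annotated data or a trained language-ID model rather than word lists.
HAUSA_LEXICON = {"ina", "son", "sosai", "wannan"}
ENGLISH_LEXICON = {"i", "love", "this", "phone", "very", "much"}

LABEL_IDS = {"ha": 0, "en": 1, "other": 2}  # hypothetical label scheme
IGNORE_INDEX = -100  # skipped by PyTorch's CrossEntropyLoss by default


def normalize(text):
    """Lowercase and split into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def tag_language(word):
    """Assign a word-level language label using the hypothetical lexicons."""
    if word in HAUSA_LEXICON:
        return "ha"
    if word in ENGLISH_LEXICON:
        return "en"
    return "other"


def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first subword split, in the style of WordPiece."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:
            candidate = word[start:end] if start == 0 else "##" + word[start:end]
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matches: map the whole word to [UNK]
        pieces.append(match)
        start = end
    return pieces


def align_labels(text, vocab):
    """Return subword tokens and label ids, labeling only each word's first piece."""
    tokens, label_ids = ["[CLS]"], [IGNORE_INDEX]
    for word in normalize(text):
        pieces = wordpiece(word, vocab)
        tokens.extend(pieces)
        label_ids.append(LABEL_IDS[tag_language(word)])
        label_ids.extend([IGNORE_INDEX] * (len(pieces) - 1))
    tokens.append("[SEP]")
    label_ids.append(IGNORE_INDEX)
    return tokens, label_ids


# Made-up code-mixed sentence and toy vocabulary (not from the study).
vocab = {"ina", "son", "this", "phone", "sos", "##ai", "!"}
tokens, labels = align_labels("Ina son this phone sosai!", vocab)
print(tokens)  # ['[CLS]', 'ina', 'son', 'this', 'phone', 'sos', '##ai', '!', '[SEP]']
print(labels)  # [-100, 0, 0, 1, 1, 0, -100, 2, -100]
```

Labeling only the first subword keeps one prediction per word. The metrics are then computed over words rather than over an arbitrary number of subword fragments.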
