This work introduces LLaMAntino-3-ANITA-8B-Inst-DPO-ITA, a Large Language Model (LLM) for the Italian language based on the Meta AI LLaMA-3 model family. The original 8B-parameter instruction-tuned model is first fine-tuned with Supervised Fine-Tuning (SFT) on English datasets to improve its baseline performance on instruction-following tasks. A Direct Preference Optimization (DPO) stage is then applied to align the model with preferences, mitigate unsafe responses, and limit biases. In the final stage, the model is adapted to Italian using a limited amount of high-quality Italian data. The methodology combines the efficiency of QLoRA, which fine-tunes only a small number of parameters rather than the full set of original model weights, with DPO to refine the model's outputs, adapting the model to Italian linguistic structure while maintaining computational efficiency. Evaluation on Open LLM benchmarks for both Italian and English shows that the model achieves state-of-the-art performance among Italian LLMs, with an average accuracy of 0.6160 across Italian text comprehension and question-answering tasks. The model is released on the Hugging Face Hub, and usage examples are available in the accompanying GitHub repository.
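To make the preference-alignment stage more concrete, the following is a minimal sketch of the standard DPO objective for a single (prompt, chosen response, rejected response) triple. It assumes the common formulation in which each response is scored by its summed log-probability under the trainable policy and under a frozen reference model. The abstract does not state the exact loss variant or the value of β used, so `beta=0.1` is an illustrative assumption, and the helper names (`dpo_loss`, `batch_dpo_loss`, `softplus`) are hypothetical. This is a scalar, pure-Python illustration, not the authors' training code.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed illustrative value; not stated in the abstract
) -> float:
    """Standard DPO loss for one (prompt, chosen, rejected) triple.

    Each *_logp argument is the summed log-probability of the full response
    under the trainable policy or the frozen reference model.
    """
    # Implicit reward of each response: log-ratio of policy vs. reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) equals softplus(-margin), computed stably here.
    return softplus(-margin)


def batch_dpo_loss(examples, beta: float = 0.1) -> float:
    """Mean DPO loss over a list of 4-tuples of log-probabilities."""
    losses = [dpo_loss(*ex, beta=beta) for ex in examples]
    return sum(losses) / len(losses)
```

When the policy matches the reference model, the margin is zero and the loss is log 2 (about 0.693); increasing the policy's relative likelihood of the chosen response over the rejected one drives the loss toward zero. In practice these quantities would be computed on batched tensors, with only the QLoRA adapter parameters receiving gradients.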
Polignano et al. studied this question.