Introduction
This thesis evaluates how well VaultGemma detects toxic language, how its performance compares with Gemma 3 1B and Perspective API, and how the prompting method affects that performance.

Research Question
The two research questions are “How does the classification differ between zero-shot prompting and few-shot prompting?” and “How does VaultGemma’s classification of text compare to Gemma 3 1B and Perspective API?”

Method
VaultGemma and Gemma 3 1B were run with two verbalizer sets, Toxic/Non-toxic and Toxic/Neutral, using both zero-shot and few-shot prompting on two datasets, SU and HF. The model outputs were normalized and analyzed to compute accuracy, precision, recall and F1 score.

Results
The results indicate both similarities and differences between the three models in their ability to detect toxic language. Primarily, VaultGemma’s performance improves with few-shot prompting. The results also show that the dataset and the choice of verbalizer have a large effect on the models and change how they perform relative to one another.

Discussion
Few-shot prompting performs better than zero-shot prompting for VaultGemma. The differences between the three models do not support a strong conclusion. The findings also point to the potential of using VaultGemma to detect toxic language in text. Further research is needed to draw firmer conclusions about VaultGemma and the models’ ability in this task.
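The Method describes two steps. First, free-text model outputs are normalized to a label from a verbalizer set. Second, accuracy, precision, recall and F1 score are computed. The sketch below shows one way to do this and is not the thesis's actual pipeline. It makes three assumptions. The function names `normalize` and `evaluate` are hypothetical. The leftmost verbalizer found in the lowercased output is taken as the prediction. An output that contains neither label is counted as a negative (not toxic) prediction. Toxic is treated as the positive class.

```python
import re
from typing import Optional


def normalize(raw: str, positive: str, negative: str) -> Optional[int]:
    """Hypothetical normalizer: map a raw generation to 1 (positive), 0 (negative) or None.

    The leftmost label occurrence in the lowercased text wins, so "non-toxic"
    is matched before the "toxic" substring inside it. If two labels start at
    the same position, the longer one is preferred.
    """
    labels = sorted([positive.lower(), negative.lower()], key=len, reverse=True)
    pattern = "|".join(re.escape(label) for label in labels)
    match = re.search(pattern, raw.lower())
    if match is None:
        return None  # no verbalizer found in the output
    return 1 if match.group(0) == positive.lower() else 0


def evaluate(gold: list[int], raw_outputs: list[str],
             positive: str, negative: str) -> dict[str, float]:
    """Compute accuracy, precision, recall and F1 with the positive label as class 1."""
    preds = [normalize(r, positive, negative) for r in raw_outputs]
    # Assumption: outputs without any verbalizer count as a negative prediction.
    preds = [0 if p is None else p for p in preds]

    tp = sum(1 for g, p in zip(gold, preds) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, preds) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, preds) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, preds) if g == 0 and p == 0)

    accuracy = (tp + tn) / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    gold = [1, 0, 1, 0]  # 1 = toxic, 0 = not toxic (toy labels, not thesis data)
    outputs = ["Toxic", "Non-toxic.", "The text is non-toxic", "toxic!"]
    print(evaluate(gold, outputs, "Toxic", "Non-toxic"))
    # {'accuracy': 0.5, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

For the Toxic/Neutral set, the same call takes `"Neutral"` as the negative label. How unparseable outputs are handled can shift the metrics noticeably for small models. Reporting their count separately is a reasonable alternative to folding them into the negative class.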
Wiman et al. (Thu,) studied this question.