May 10, 2024

Automated Functionality and Security Evaluation of Large Language Models

Abstract

Natural language processing (NLP) is developing rapidly. A series of Large Language Models (LLMs), represented by ChatGPT, have emerged and made significant breakthroughs in natural language understanding and generation, enabling fluent dialogue with humans, understanding of human intentions, and completion of complex tasks. However, beyond the fairness and toxicity problems already known from traditional language models, LLMs exhibit new problems, including hallucination, that make them hard to use. Evaluating LLMs manually is challenging because it is subjective and inefficient. In this paper, we focus on automating three parts of LLM evaluation: fuzzy matching, toxicity detection, and hallucination detection. We fine-tune the Mixtral-8x7B model, which can be deployed in a private cloud environment, and demonstrate the effectiveness of our method through experiments.
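
The abstract does not say how fuzzy matching is carried out, so the following is only a minimal sketch of one common approach, not the authors' method. It scores a model answer against a reference answer by character-level similarity using Python's standard `difflib`. The function names, the normalization step, and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def fuzzy_score(answer: str, reference: str) -> float:
    """Return a similarity ratio in [0, 1] between answer and reference."""
    return SequenceMatcher(None, normalize(answer), normalize(reference)).ratio()


def fuzzy_match(answer: str, reference: str, threshold: float = 0.8) -> bool:
    """Treat the answer as correct if it is similar enough to the reference.

    The threshold is a hypothetical value chosen for this sketch.
    """
    return fuzzy_score(answer, reference) >= threshold


if __name__ == "__main__":
    reference = "the capital of france is paris"
    print(fuzzy_match("The capital of France is Paris.", reference))
    print(fuzzy_match("Berlin", "Paris"))
```

A character-level ratio like this tolerates small differences in casing and punctuation but does not capture paraphrase, which is one reason an evaluation pipeline might instead rely on a fine-tuned judge model.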

Cite This Study

Ding et al. (2024) studied this question.

synapsesocial.com/papers/68e6ab39b6db64358762dfcd https://doi.org/10.1109/smartcloud62736.2024.00014
