January 12, 2024

A Literature Survey on Open Source Large Language Models

Puntos clave

Los puntos clave no están disponibles para este artículo en este momento.

Resumen

Since the 1950s, post the Turing test, humans have been striving hard to make machines learn the art of mastering linguistic intelligence. Language being a complex and intricate tool of expression used by humans, poses a large number of challenges for AI enabled algorithms to grasp its understanding in entirety. Over the past few years, a chain of efforts have been made to make machines understand linguistic intricacies. Small scale models such as BERT and pre-trained language models (PLMs) have demonstrated strong capabilities in understanding and solving various language based tasks. Over the period of years, it is also observed that by increasing the parameters scale to larger size, large language models show a significant improvement in performance and showcase abilities to understand context. For the PLMs of a humongous size i.e in the tune of tens or hundreds of billions of parameters, and to understand the large parametric scales, the scientific community introduced the term LLMs - large language models. The whole world witnessed the launch and quick adoption of ChatGPT, an AI chatbot built on LLMs. As the usage of AI algorithms changes the way the scientific community, society and industry works, it is imperative to review the advances of LLMs. Since 2022, almost daily nearly a dozen LLMs are released. These LLMs are categorized as open and closed source. This paper aims to focus on major aspects of open source LLMs - pre-training covering data collection and pre-processing, model architecture and training. We will select open source models released in June, July and August 2023 with training parameters greater than 70 billion parameters and provide a comprehensive survey on the mentioned aspects. As new models are released on daily / weekly basis in the LLM space, in order to keep the survey concise and targeted to important models, we chose to select time-box of 3 months and a large parameter range of 70 billion in our literature survey. We will also cover historical evolution of LLMs and list open items for future directions.

Preguntar a la IA

Me gusta

Guardar