Prompt engineering is the manual design and optimization of text-based instructions or queries. It enables precise control over the outputs of pre-trained large language models (LLMs) and helps align them with desired responses. However, the substantial computational cost and energy footprint of prompt inference remain critical challenges when building generative AI applications. Suboptimal prompts particularly hurt the energy efficiency of LLM inference: they may require multiple iterations, which escalates energy consumption and the associated carbon footprint. To address these challenges, we propose a series of practices and guidelines designed to increase the likelihood of obtaining desired responses from LLMs with minimal reiteration. Empirical evaluation demonstrates that, across a range of LLMs and test scenarios, energy consumption and the corresponding operational greenhouse gas emissions were reduced by 32–48% when the best practices were applied. Drawing on these insights, the proposed best practices can be integrated into the design frameworks of generative AI applications to improve the energy efficiency of prompt inference. By addressing the challenge of establishing a cohesive framework for energy-efficient prompt design and inference, this paper advocates for the sustainable and effective deployment of generative AI technologies.

• Optimized prompting reduces LLM inference energy and CO2 emissions by 32–48%
• Introduces six Green Prompt Engineering best practices for energy-efficient AI
• Demonstrates model-agnostic savings across GPT-4o, Cohere, and Mistral-7B
• Shows prompt design as a software-only lever for sustainable GenAI deployment
• Provides an enterprise-ready framework for scalable, low-carbon AI inference
Podder et al. studied this question.