BentoML has released llm-optimizer, an open-source framework for benchmarking and tuning the performance of self-hosted large language models (LLMs). A persistent challenge in LLM deployment is finding a configuration that balances latency, throughput, and cost; in practice, this has usually meant slow, manual trial and error. llm-optimizer offers a structured alternative: it systematically benchmarks candidate configurations and automates the search across them, replacing repetitive guesswork with a repeatable process for more efficient LLM deployment.
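The core idea of benchmarking plus automated search can be sketched as a toy grid search over serving configurations. This is a hypothetical illustration only: the configuration knobs (`tensor_parallel`, `max_batch_size`), the analytical cost model in `benchmark`, and the `search` helper are stand-ins invented for this sketch, not llm-optimizer's actual API.

```python
from itertools import product

# Hypothetical sketch of automated configuration search: enumerate
# candidate serving configurations, benchmark each one, and keep those
# that meet a latency budget, ranked by throughput. The knobs and the
# cost model are illustrative, not llm-optimizer's real interface.

SEARCH_SPACE = {
    "tensor_parallel": [1, 2],
    "max_batch_size": [8, 16, 32],
}

def benchmark(config):
    """Stub benchmark returning (latency_ms, tokens_per_s).

    A real run would launch an inference server with `config` and
    measure it under load; here a toy analytical model stands in."""
    batch = config["max_batch_size"]
    tp = config["tensor_parallel"]
    latency_ms = 50 + 4 * batch / tp          # larger batches add latency
    tokens_per_s = 100 * batch * (0.8 * tp)   # parallelism adds throughput
    return latency_ms, tokens_per_s

def search(space, latency_budget_ms):
    """Score every configuration in the grid, drop those over the
    latency budget, and return survivors sorted by throughput (best first)."""
    results = []
    for values in product(*space.values()):
        config = dict(zip(space.keys(), values))
        latency, throughput = benchmark(config)
        if latency <= latency_budget_ms:
            results.append((throughput, latency, config))
    return sorted(results, key=lambda r: r[0], reverse=True)

if __name__ == "__main__":
    for throughput, latency, config in search(SEARCH_SPACE, latency_budget_ms=120):
        print(f"{config}  latency={latency:.0f}ms  throughput={throughput:.0f} tok/s")
```

Real tools refine this brute-force loop with smarter search strategies and live measurements, but the structure is the same: a declared search space, a benchmark per candidate, and constraint-aware ranking of the results.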