FinOps for GenAI: The Hidden Cost Saving You're Overlooking


The rise of Generative AI has brought unprecedented capabilities to enterprises, with Model Context Protocol (MCP) and tool-enabled AI becoming mainstream. Large Language Models (LLMs) are now acting as powerful reasoning engines, making decisions and taking actions based on responses from integrated tools and APIs. This empowers AI agents to perform complex tasks, but it also introduces a significant, often overlooked, cost driver: token consumption from API outputs.

While much of the FinOps conversation around GenAI rightly focuses on model selection (e.g., choosing a smaller, cheaper model when appropriate) and prompt engineering (crafting concise, effective prompts), there's a crucial element that can silently inflate your bills: unoptimised API responses.

The Hidden Cost of 'Data Dumps'

Here's the often-missed point: every piece of information injected into an LLM call – whether it's part of your initial prompt or the voluminous output from an external tool's API – is counted as an input token. And those tokens cost you money, directly charged by your LLM provider.
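To see how quickly this adds up, here's a back-of-the-envelope sketch in Python. The per-token price is a purely illustrative assumption; substitute your provider's current input-token rate.

```python
# A minimal sketch of how tool output inflates input-token cost.
# The price below is an illustrative assumption, not a real quote;
# substitute your LLM provider's current input-token rate.
PRICE_PER_1K_INPUT_TOKENS = 0.005  # hypothetical USD rate

def input_cost(prompt_tokens: int, tool_output_tokens: int) -> float:
    """Both the prompt AND the tool output are billed as input tokens."""
    total_tokens = prompt_tokens + tool_output_tokens
    return total_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS

# A 200-token prompt plus a 2,000-token raw API response...
print(input_cost(200, 2000))  # 0.011 per call
# ...versus the same prompt with a parsed, 400-token response.
print(input_cost(200, 400))   # 0.003 per call
```

Multiply that per-call difference across thousands of agent interactions a day and the gap becomes very visible on your bill.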

It's not just about the financial drain. Including unnecessary information, or 'junk data', in the LLM's input (the combined prompt and tool output) can also reduce the AI's accuracy and reasoning capabilities. The LLM might get overwhelmed by irrelevant details, potentially leading to suboptimal or incorrect responses.

An AI agent might perform beautifully at first, demonstrating impressive capabilities. However, as it scales and interacts with more backend systems, you may suddenly find your cloud bill skyrocketing. This sharp increase is often due to the unchecked volume of API response data being fed into the LLM.

A Tangible Example: Unlocking 80% Savings

Let me illustrate this with a practical example. Imagine your AI agent needs to retrieve a list of Zoom meeting recordings. If we simply dump the entire, verbose JSON output from the Zoom API directly into the LLM's input, it could easily consume hundreds or even thousands of tokens. This unparsed output often contains numerous empty fields, metadata, and redundant information that the LLM doesn't need for its reasoning.

Now consider passing that same raw API output through a simple parser or pre-processing step that filters out empty fields and irrelevant metadata, extracting only the information the LLM actually needs for its task. This seemingly minor step can have a dramatic impact. For instance, we've seen cases where this simple parsing reduces the tokens required for that specific API output by as much as 80%. That's a huge saving that translates directly into lower LLM costs!
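To make this concrete, here's a minimal sketch of such a pre-processing step in Python. The field names mirror the typical shape of a Zoom recordings response, but treat them as illustrative assumptions; verify them against the actual payload your integration returns.

```python
# A minimal sketch of a pre-processing step that strips a verbose
# Zoom recordings response down to what the LLM actually needs.
# Field names follow the Zoom API's typical shape but are illustrative;
# adjust them to the payload your integration actually returns.

def parse_recordings(raw_response: dict) -> list[dict]:
    """Extract only the fields relevant to the agent's task."""
    slim = []
    for meeting in raw_response.get("meetings", []):
        entry = {
            "topic": meeting.get("topic"),
            "start_time": meeting.get("start_time"),
            "duration": meeting.get("duration"),
            "share_url": meeting.get("share_url"),
        }
        # Drop empty or missing fields so they never reach the LLM.
        slim.append({k: v for k, v in entry.items() if v not in (None, "", [])})
    return slim

# Feed the slimmed list, not the raw JSON, into the LLM's context.
```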

The Human Element in AI Optimisation

Data hygiene has always mattered in traditional software development, where efficiency is key, but it takes on magnified importance in AI development. When building AI applications, we cannot rely solely on the AI to manage its own inputs perfectly; developers need to think twice about the information they feed into the LLM. This careful consideration of context and data hygiene isn't just about saving money: it's a crucial step that directly influences the cost, security, and performance of your AI application. Your human insight remains paramount.

The Golden Rule: Context is King

This example highlights a fundamental FinOps principle for GenAI that goes beyond popular advice:

Always follow the rule of providing only the appropriate context as input.

While model selection (e.g., using a smaller, more cost-effective model like GPT-4o-mini for simpler tasks versus GPT-4 for complex reasoning) and meticulous prompt engineering (crafting lean, precise instructions for the LLM) are vital, they're only part of the solution. You must extend this optimisation mindset to the data flowing into the LLM from external tools.

By proactively stripping out unnecessary data from API responses, you're not just saving on token costs; you're also improving the LLM's ability to focus on truly relevant information, potentially leading to more accurate and efficient outputs.
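In practice, you can generalise this into a small helper that runs on every tool response before it enters the context. Here's a sketch, assuming JSON-like payloads of nested dicts and lists:

```python
# A generic sketch: recursively remove empty values from any
# JSON-like tool response before it enters the LLM's context.
def strip_empty(value):
    if isinstance(value, dict):
        cleaned = {k: strip_empty(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if v not in (None, "", [], {})}
    if isinstance(value, list):
        cleaned = [strip_empty(v) for v in value]
        return [v for v in cleaned if v not in (None, "", [], {})]
    return value
```

Applied consistently across all your tool integrations, even a helper this small compounds into meaningful savings at scale.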

So, as you deploy and scale your GenAI applications, don't overlook these often-hidden opportunities for significant cost savings. Every token counts, and smarter data preparation can make a substantial difference to your FinOps bottom line.

If you would like to learn more about our FinOps offering and how we can help you identify and capitalise on cost-saving opportunities in your GenAI initiatives, please don't hesitate to contact us. For those interested in diving deeper into FinOps for AI, we highly recommend viewing this webinar recording:

https://webinars.reply.com/permalink/finops-for-ai-understanding-the-real-cost-of-genai

(note that a ROSE account is required to access the recording)

Derek Ho

Senior AI & Cloud Consultant