Part 1 focuses on setting the context. This article explains the code and developer tasks in detail.
Introduction
Hopefully we now have a good understanding of the key GenAI and LLM concepts. We also discussed various GenAI LLM models and the key questions for app developers to ponder. Let’s start discussing those questions.
Development framework options
All major GenAI or LLM API providers, such as Azure, Google, and OpenAI, offer libraries that allow developers to connect to their model APIs, configure them, provide context, and more, across various programming languages. This is similar to how NoSQL vendors provide client drivers to perform operations. However, as a developer, you need a framework that handles much of the boilerplate code for you, manages configurations easily, and integrates seamlessly into your larger application development ecosystem.
LangChain is a widely used framework that supports various programming languages for LLM-based application development. Another one that I prefer is Spring AI. If you are a Java developer, you’ll enjoy the familiar ecosystem that Spring AI offers, as it follows the same conventions as Spring Data.
While there are other options available, it’s best to choose a framework that easily integrates with your existing application ecosystem.
Once the framework question is settled, let’s delve deeper into the app development.
Building the application: key steps
Just a teaser: when you develop your own application using model APIs and integrate them with other application services, it can eventually lead to the creation of automated agents. These agents can chain multiple calls to LLM models or trigger workflows based on model responses. I believe that’s where the real fun begins, with truly integrated GenAI agents, and where the true potential will be realized.
Coming back to the key steps, with Spring AI as the implementation framework:
- Create and configure a client to connect to Model APIs.
We need a client which, based on the configuration properties, connects to the model APIs. This client becomes the one-stop interface in the application for interacting with the model. A good analogy would be SparkSession from the big data world.
Spring AI provides ChatClient with a fluent API to create prompts (the input for the model), advisors, functions, and response handlers. As mentioned, this becomes the one-stop interface. It uses a ChatModel which is either auto-configured or created programmatically based on the model properties defined in the YAML file.
Some code samples to explain it better:
## There can be many more configuration options
spring:
  main:
    web-application-type: none
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o-mini
// Auto-configured ChatClient built from the application context
ChatClient.Builder builder = context.getBean(ChatClient.Builder.class);
ChatClient chatClient = builder.build();

// Send a simple prompt via the fluent API and read the response content as a String
String response = chatClient.prompt().user("Can you respond as a test ping").call().content();
- The prompt creation — user query to the model
Prompts are the user queries or inputs for the AI model to process and generate the desired output. LLM AI models work with string input and output data types. It is up to the programmer to represent the input and output as structured objects, but for the AI model it is strings in and strings out. Prompt engineering is a very important discipline in the GenAI space. A well-designed prompt can make the difference between an irrelevant response and well-authored, context-aware content.
The request JSON schema for a GPT chat model looks something like this:
{
  "model": "string",
  "messages": [
    {
      "role": "string",
      "content": "string"
    }
  ],
  "temperature": "number",
  "max_tokens": "number",
  "top_p": "number",
  "frequency_penalty": "number",
  "presence_penalty": "number",
  "stop": ["string"]
}
Spring AI provides the Prompt class, which offers a convenient way to create JSON with the above schema. A Prompt’s message content can be created from a template string; programmers will be able to relate this to parameterized queries. The Prompt and ChatClient fluent APIs make it readable to set up all these input arguments. Most of the parameters are easy to understand. “temperature” controls the degree of randomness: the lower the value, the more deterministic the output. “role” defines who is sending the message, e.g. user, system, or assistant. “top_p” limits sampling to the most probable tokens, which affects the diversity of the response.
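As a minimal sketch, here is how a templated prompt and a couple of these options could be set through the ChatClient fluent API. The template text and parameter names are made up for illustration, and the exact option-builder method names vary slightly between Spring AI versions:

// A parameterized prompt template, filled in at call time (illustrative example)
String answer = chatClient.prompt()
        .user(u -> u.text("Explain {topic} in the context of {domain}")
                    .param("topic", "LLM models")
                    .param("domain", "software engineering"))
        .options(OpenAiChatOptions.builder()
                    .temperature(0.2)   // lower value -> more deterministic output
                    .build())
        .call()
        .content();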
- The prompt and context fine tuning — ability to advise the Model
So far we have discussed patterns where the GenAI model is more or less treated as a black box: the ChatClient calls the APIs with prompts and a few response-tuning parameters and accepts whatever the model produces. As developers, we want the ability to fine-tune or influence the generated output as per our needs. A few limitations of LLM models make this necessary: LLMs have limited support when it comes to managing long-running content and previous history, hard facts, data freshness, and context awareness. Let’s discuss the various options available to advise the model and influence the response.
How to fine tune the model
The simplest option is passing the “context” with the prompt. The Spring AI ChatClient provides a fluent API to set the context, which is useful for providing background information or an initial narrative. It’s like giving a hint when you execute the query. For example, for a prompt like “Explain LLM model,” you can set the context as “LLM models are useful in summarizing text. Researchers struggle to summarize research papers.” This maps to a system-role message in the GPT request schema.
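As a small sketch, the context can be supplied as a system message through the ChatClient fluent API (the message wording here is illustrative):

// Background context goes in as a system message; the question stays in the user message
String answer = chatClient.prompt()
        .system("LLM models are useful in summarizing text. Researchers struggle to summarize research papers.")
        .user("Explain LLM model")
        .call()
        .content();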
Contexts are good for setting up the baseline or ground level. For the next level of customization, if you want the response to be in a specific tone or to include additional content, you can use the Spring AI “Advisors” API, which enriches the user input and is the basis of the RAG technique. This maps to the “messages” section of the GPT JSON request. For example, you can add advice like “generate the response in FAQ form” as one type of customization.
To further customize the model, you can incorporate information from your database or documents that the LLM model hasn’t been trained on. LLM models are typically trained on data up to a certain cutoff date, so it’s sometimes necessary to pass on the latest information. RAG (Retrieval-Augmented Generation) is the technique of “stuffing the prompt” with contextual, relevant, unseen, and specific instructions to generate the desired response, working around the LLM’s token-size and context-history limitations. Spring AI provides various Advisor implementations for the RAG technique and also supports vector database implementations as the data store for RAG. You can read all your documents, split and embed them, and store them in the vector store. This can then be queried to pass on the most relevant information while invoking a prompt execution.
For example, if you have a document of internal assessments of various clients, you can embed it and store it in the vector store. When creating an advisor to be attached to the prompt, you can query the vector store to find the 2024 ratings and pass the result along as “advice.”
This adds a lot of value, as you can simply augment the model’s training data with your own information and have responses customized to your context.
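As a rough sketch, assuming a VectorStore bean has already been configured (the document text and question below are made up), loading documents and attaching Spring AI’s QuestionAnswerAdvisor could look like this:

// Load (already chunked) documents into the vector store once, e.g. at startup
vectorStore.add(List.of(
        new Document("Client Acme internal assessment 2024: rating A, low risk")));

// At query time, the advisor retrieves the most relevant chunks and stuffs them into the prompt
String answer = chatClient.prompt()
        .advisors(new QuestionAnswerAdvisor(vectorStore))
        .user("What was Client Acme's 2024 rating?")
        .call()
        .content();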
Sometimes, prompt stuffing via RAG may not be sufficient. For example, if you want to know the weather conditions at your location, the AI model needs to know your current location, its temperature, and the current time to respond well. One way to achieve this is to prepare all this data yourself by calling various functions, building a context, and passing it along with the prompt. However, this involves a lot of boilerplate and plumbing code.
Spring AI and the model APIs provide the ability to register functions with the prompt call, which the GenAI model invokes as and when required. It’s like registering callbacks and letting the framework stitch everything together for you. The Spring AI “Tool Calling” APIs support this, which corresponds to declaring a function such as getDate with its parameters in the tools/functions section of the GPT JSON input.
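A minimal sketch of tool calling, assuming Spring AI 1.x’s @Tool annotation (the tool class and method below are hypothetical; older versions registered functions as beans instead):

// A hypothetical tool the model can call when it needs the current date/time
class DateTimeTools {

    @Tool(description = "Returns the current date and time in ISO-8601 format")
    String getCurrentDateTime() {
        return java.time.OffsetDateTime.now().toString();
    }
}

// The framework advertises the tool to the model and invokes it only if the model asks for it
String answer = chatClient.prompt()
        .tools(new DateTimeTools())
        .user("What will the date be three days from now?")
        .call()
        .content();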
I want to emphasize that Spring AI and equivalent frameworks make the code easy to read and manage. Without these frameworks, someone would have to write all the code to prepare a fully functional GPT prompt request and handle the response.
One last option is on the model side: what if you could fine-tune the model itself? You can take the provided base model and tune it with customized training data that applies across all your use cases. All cloud AI services provide the capability to fine-tune their models. This makes sense when you want to handle certain customizations across the board for everyone.
- Final steps — testing and deployment
We are very familiar with writing unit, integration, and end-to-end tests. The problem is how to test the model output. Each run of the model will produce a slightly different response, even with the same input and parameters. The irony is that all such responses can be quite accurate and acceptable, so a simple expected-versus-actual data comparison may not work.
But we can test the following things in the response:
- Relevancy, tone, and context alignment: The suggested way to do this is to run the same input through another model and then compare both responses on various parameters, using a prompt that asks the model to compare the two responses. This is using AI to evaluate AI. For example, you can create a prompt such as “Please compare both responses on how relevant they are to the software engineering context and share the result as JSON,” or “Is this text in a friendly tone? Please respond with yes or no.”
- Fact checking: Sometimes the response is factual, like “India won the finals.” It becomes easier to test when there are verifiable facts in the response.
Spring AI provides Evaluators to support the model evaluation process.
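As a small sketch of what such a test might look like, using Spring AI’s RelevancyEvaluator (the question, the answer variable, and the assertion framework here are assumed for illustration):

// Uses an LLM to judge whether the answer under test is relevant to the question
var evaluator = new RelevancyEvaluator(chatClientBuilder);

EvaluationRequest request = new EvaluationRequest(
        "Explain LLM model",   // original user question
        List.of(),             // retrieved context documents, if any
        answer);               // the model response being tested

EvaluationResponse result = evaluator.evaluate(request);
assertTrue(result.isPass(), "Response was not judged relevant to the question");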
It’s easy to notice that these are very much standard applications, and you can deploy them just like any other application. But you do have a decision to make about where the model itself should be deployed and running. Model execution requires high-end GPUs or a generous amount of RAM, and most models need a good amount of disk space to store the model weights. As mentioned earlier, models are available as cloud services, but there is also an option to deploy the models on-prem.
Ollama is an open-source tool to deploy LLM models locally. This option works well when you are concerned about data privacy and security, and it also gives a container-like feel for trying out the various models that have Ollama integration.
By using Ollama for model deployment, you achieve the same benefits that Docker provides for application deployment: consistency, portability, isolation, ease of deployment, and scalability.
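As a minimal sketch, pointing Spring AI at a locally running Ollama instance is mostly a configuration change (the model name below is just an example):

spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: llama3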
Conclusion
It may be challenging to grasp the internal workings and training of GenAI models. But as a developer, you can build applications using their services in a fairly standard way. I believe the natural progression is the building of agents which seamlessly integrate GenAI with application services.
