Building Applications with LLM Models: A Comprehensive Guide for Developers — Part 1

GenAI 101 and Building an Application Ecosystem Around Generative AI

Part 2 of the series details the developer tasks. Hopefully one more part will follow to discuss the GenAI model concepts in detail.

Introduction

The field of text analytics and language processing has evolved significantly with the general availability of Generative AI (GenAI) powered by Large Language Models (LLMs) on Transformer architecture. While NLP and text analytics have a long history, the introduction of the ‘attention’ mechanism in Transformer models, their ability to link multiple statements together to generate longer responses, and their parallel processing capabilities have made them highly impactful.

There’s a ton of articles out there about how these LLM models work and what you can do with them. Most of them either dive deep into AI math with things like embedded vectors or they focus on how to write ‘prompts’ for the AI model. But one thing that often gets overlooked is what developers need to build solid and structured applications around these models. Remember when NoSQL databases first came out? After a while, developers stopped worrying so much about how they worked and focused more on how to use them to solve their problems.

This article addresses the developer perspective on GenAI. Part 2 of the series will focus more on building the applications; this one focuses on preparing the ground. The terms are defined only briefly to keep the length in check.

Gen AI and LLM — Hello World Introduction

An LLM (Large Language Model) is an AI model, essentially a function or algorithm with billions of parameters. LLMs use embedding models and neural networks (deep learning) to create a mathematical representation of each word in a language and, based on training data, build a model that predicts the most suitable word or text for a given input.
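To make the embedding idea concrete, here is a minimal sketch using Spring AI's EmbeddingModel abstraction (assuming an embedding model is auto-configured via a Spring AI starter; exact method signatures may vary across Spring AI versions):

```java
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.stereotype.Service;

@Service
public class EmbeddingDemo {

    private final EmbeddingModel embeddingModel;

    public EmbeddingDemo(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    public void show() {
        // Each call maps text to a high-dimensional vector of floats.
        float[] king = embeddingModel.embed("king");
        float[] queen = embeddingModel.embed("queen");
        System.out.println("Vector dimensions: " + king.length);
        // Comparing such vectors (e.g., via cosine similarity) is how the
        // model captures that 'king' and 'queen' are related in meaning.
    }
}
```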

They are called large because they have billions of parameters; models come in various sizes such as 1.5B, 7B, 8B, 14B, 32B, and 70B. They are trained on huge amounts of data from across the internet and work on the principle of “self-supervised learning”: no one has to label the expected output for a given input; the model uses the text available everywhere to figure out what the relevant text is.

Building the model that finds these relationships and predicts the next word or sentence is where hardcore ML engineers excel. This is the core of GenAI systems; it is to GenAI what Lucene text indexing is to Elasticsearch.

There are really good articles that focus on how this works. For now, let's just move on with the baseline that we have a complex algorithm, developed using multi-layer learning techniques and a huge dataset, which is good at predicting what the next word, sentence, or paragraph should be.

The next key concept is Transformers.

The Transformer architecture is what makes the difference. As mentioned, text prediction and language modelling have been around for some time. What machines could not do well was interconnect multiple lines or figure out the context to tune the prediction. As humans, we focus on keywords, tone, and the underlying theme to process information. Google's research paper, “Attention Is All You Need”, highlighted this and introduced a new architecture, the Transformer model, used to develop LLMs. Transformers use self-attention mechanisms to process and generate sequences, allowing them to capture context and dependencies better than previous models.

The Transformer's parallel processing mechanism computes relationships between all words in a sentence at the same time, rather than sequentially, which makes it faster.
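For reference, the computation at the heart of this is the scaled dot-product attention from the paper, where Q, K, and V are query, key, and value matrices derived from all the input tokens at once, and d_k is the key dimension:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
```

Because this is a single matrix computation over every pair of tokens, it parallelizes naturally on modern hardware.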

Having established this, let's move on to the next fundamental term, i.e., GenAI.

GenAI is an AI system capable of generating new text, images, videos, audio, etc., using various AI models (like LLMs for text) with the ability to interpret input context. It includes not just text models but also multimodal models for audio, video, and images. As a system, it addresses challenges around maintaining long-running context and client integration APIs. The point to focus on is that it is a system that works on LLMs and equivalent models to generate new content while integrating with other information sources for better context management.

Remember the analogy: Elasticsearch to Lucene.

Being a Software Developer — Getting started and what to do

With the buzz around ChatGPT, Gemini, Copilot, HuggingFace, Ollama, and other GenAI models, I feel as a software developer that we’re in a similar situation as when NoSQL databases changed the storage landscape. While I grasped the use cases and high-level architecture, what mattered most was understanding how to build applications around them and integrate them into my application landscape.

Similarly, today I understand how these AI models work and their practical use cases. But there are still questions on how to integrate them into my application ecosystem. This blog addresses such questions, including:

  • What are the available LLM model options?
  • What hardware is needed to deploy them?
  • Are there only cloud-based model services, or can they be deployed on-premises?
  • What frameworks and APIs are available to integrate these systems?
  • Can the models be customized?
  • How do you provide context information or new facts to fine-tune the results?
  • How can automated integration be done to fetch information from other systems and let the model use it?
  • How do you integrate model responses into an application flow to build automated processes?
  • How do you configure the model engine to control token length, cost, etc.?
  • Is caching applicable, and what are the optimization techniques?
  • How do you format the output in a structured entity for better integration?
  • And last but not least, how do you test and observe model operations?

I believe this list should look familiar. Barring a few new use cases, it resembles exactly what a developer has to manage for an application, probably one with a database as a backend. By no means do I want to trivialize the challenges, but the attempt is to simplify things with an analogy to what a developer does day in and day out.

I find it convenient to share examples or code snippets with Spring AI. This is just for illustration purposes. Let's quickly address these questions.
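As a first taste, here is a minimal "Hello World" style sketch (assuming a Spring Boot project with a Spring AI chat starter on the classpath, e.g., the OpenAI starter with its API key configured):

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class HelloGenAiController {

    private final ChatClient chatClient;

    // ChatClient.Builder is auto-configured by the Spring AI starter.
    public HelloGenAiController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @GetMapping("/ask")
    public String ask(@RequestParam String question) {
        // Send the user text to the underlying LLM and return its reply.
        return chatClient.prompt()
                .user(question)
                .call()
                .content();
    }
}
```

Note how little of this is model-specific: the ChatClient abstraction hides which LLM actually serves the request, which is exactly the kind of decoupling a developer expects from a database driver.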

Exploring Various LLM Models and Other AI Models

The whole GenAI model space is full of options. A first categorization would be on the basis of the use cases they solve, as in the table below. The idea is not to classify them based on implementation pattern or architecture but more from the user perspective.

| Category | Training dataset type | Architecture | Description | Examples |
| --- | --- | --- | --- | --- |
| Text Generators | Huge text datasets; data from across the internet | Transformer | LLM models; accept text (or forms that provide text) as input and generate mostly text or audio content. Basically trained on text datasets. | LLaMA, GPT-3, GPT-4, LaMDA |
| Image Generators | Image datasets with their metadata | Generative Adversarial Networks (GANs); others also applicable | Specialized in image generation; trained on image datasets | DALL-E 2 (OpenAI), Midjourney |
| Audio Generators | Music data, audio | GANs, Transformers, hybrid | Various music generation use cases | GPT-4o audio, Azure OpenAI Text to Speech |
| Multimodal Generators | Various dataset pairs | Transformers | Can work with different input and output data modes | Gemini |

Video generators follow roughly the same approach as audio generators, with a different training dataset.

Another angle to look at them is their problem-domain specialization. Some models are trained and developed to solve a very specific business case, such as GitLab Duo and OpenAI Codex for coding.

Another important parameter for classifying these models is their size and hardware requirements. Based on size, i.e., the number of parameters in the model, they can be classified as SLMs (Small Language Models) or LLMs (Large Language Models). Specialized models are generally SLMs, which have low latency and require less hardware; on the flip side, they can't manage a wide range of use cases. For example, Codestral from Mistral AI is a code generation SLM.

One last criterion, very critical from a privacy perspective, is the deployment model. Most models have a cloud-based service offering, such as Google Vertex AI or Azure OpenAI, but there are also options for deploying models on your own hardware/infrastructure. Ollama is a solution for running a number of supported LLM models locally as well. The deployment strategy is a critical decision driver based on data privacy constraints. We will discuss more on Ollama later.
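To show how little the application code cares about this decision, here is a minimal sketch of pointing Spring AI at a locally running Ollama server instead of a cloud service. It assumes Ollama is installed and a model has been pulled (e.g., `ollama run llama3`), and that the spring-ai-ollama-spring-boot-starter is on the classpath with `spring.ai.ollama.base-url=http://localhost:11434` and `spring.ai.ollama.chat.options.model=llama3` set in application.properties:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

@Service
public class LocalLlmService {

    private final ChatClient chatClient;

    // The builder is backed by the locally running Ollama model; only the
    // configuration decides whether inference happens on-premises or in the cloud.
    public LocalLlmService(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    public String summarize(String text) {
        // This request never leaves your infrastructure.
        return chatClient.prompt()
                .user("Summarize: " + text)
                .call()
                .content();
    }
}
```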

It must be evident that there are various criteria, along with cost and model response evaluation, to figure out which model or solution will work for your use case.

Point to note: ChatGPT, Gemini, and Copilot are applications using one or more of these models. As an application developer, you can achieve the same. Just think of having various data connectors in an application where, based on the scenario, a factory class provides you with the right connector, as sketched below.
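Here is a hypothetical sketch of that connector-factory analogy. ModelRouter and the scenario names are illustrative, not a Spring AI API; only the ChatModel interface comes from Spring AI:

```java
import java.util.Map;
import org.springframework.ai.chat.model.ChatModel;

public class ModelRouter {

    private final Map<String, ChatModel> modelsByScenario;

    public ModelRouter(ChatModel generalModel, ChatModel codeModel) {
        // e.g., a large general-purpose model for chat, and a smaller
        // specialized model (an SLM) for code generation.
        this.modelsByScenario = Map.of(
                "chat", generalModel,
                "code", codeModel);
    }

    public ChatModel forScenario(String scenario) {
        ChatModel model = modelsByScenario.get(scenario);
        if (model == null) {
            throw new IllegalArgumentException("No model configured for: " + scenario);
        }
        return model;
    }
}
```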

The blog started with LLM models, and to have a consistent example set, we will continue with those. Let's discuss the details in Part 2.

Conclusion

Having gone through the concepts in brief and listed the key developer concerns, let's discuss the developer tasks in detail in Part 2.
