Converting a Colab notebook to two microservices with support for Milvus and NeMo Guardrails
On our quest for enterprise RAG, this article explores how to turn a RAG pipeline POC developed in a Colab notebook into RAG microservices. We take the following approach:
- Generate boilerplate RAG microservices with LlamaIndex's `create-llama` command-line tool.
- Develop two microservices, `ingestion-service` and `inference-service`, to cover the two main stages of RAG.
- Convert the code logic from the Colab notebook into the microservices.
- Add Milvus vector database integration to our new microservices.
- Add NeMo Guardrails to `inference-service` to guard user inputs and LLM outputs, enforce topical moderation, and define custom actions that integrate with LlamaIndex.
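To make the ingestion/inference split concrete, here is a minimal plain-Python sketch of the two stages. It deliberately avoids LlamaIndex; the function names and the keyword-matching "retrieval" are illustrative stand-ins, not the real services.

```python
# Hypothetical sketch of the two-stage split -- plain Python, no LlamaIndex.
# All names are illustrative, not the actual service APIs.

def ingest(documents: dict[str, str]) -> dict[str, str]:
    """Ingestion stage: turn raw documents into a queryable 'index'.

    A real ingestion-service would parse files, build nodes, embed them,
    and write the vectors to a store such as Milvus.
    """
    return {doc_id: text.lower() for doc_id, text in documents.items()}

def infer(index: dict[str, str], query: str) -> list[str]:
    """Inference stage: return ids of documents matching the query.

    A real inference-service would embed the query, run a vector search,
    and synthesize an answer with an LLM.
    """
    q = query.lower()
    return [doc_id for doc_id, text in index.items() if q in text]

index = ingest({"doc1": "NVIDIA AI Enterprise user guide", "doc2": "Release notes"})
print(infer(index, "user guide"))  # -> ['doc1']
```

Keeping the two stages behind separate entry points like this is what lets them scale and deploy independently as microservices.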
For rapid prototyping, a Colab notebook is the perfect option thanks to its ease of use, accessibility, and free tier.
For example, this Colab notebook demonstrates how to use metadata replacement and sentence-window node parsing in a RAG pipeline that serves as a chatbot for the NVIDIA AI Enterprise user guide.
`SentenceWindowNodeParser` creates representations of sentences that are aware of their surroundings. It breaks documents down into individual sentences, and for each one it also captures a window of neighboring sentences, building a richer picture. At query time, `MetadataReplacementNodePostProcessor` replaces each retrieved, isolated sentence with its surrounding context, giving the LLM a smoother, more informed passage to interpret. This approach shines for large documents, where grasping nuances is crucial.
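The mechanics can be sketched in plain Python. This is a toy illustration of the sentence-window idea, not the LlamaIndex implementation; the function names, the naive period-based sentence splitting, and the `window` metadata key are all made up for the example.

```python
# Illustrative sketch of sentence-window parsing + metadata replacement.
# Plain Python; not the LlamaIndex API.

def parse_with_windows(text: str, window_size: int = 1) -> list[dict]:
    """Split text into sentence nodes, storing neighbors as metadata."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    nodes = []
    for i, sentence in enumerate(sentences):
        lo = max(0, i - window_size)
        hi = min(len(sentences), i + window_size + 1)
        nodes.append({
            "text": sentence,                       # what gets embedded and retrieved
            "window": ". ".join(sentences[lo:hi]),  # what the LLM eventually sees
        })
    return nodes

def replace_with_window(node: dict) -> str:
    """Post-processing step: swap the lone sentence for its window."""
    return node["window"]

nodes = parse_with_windows("GPUs are fast. They excel at matrix math. RAG uses retrieval.")
print(nodes[1]["text"])               # the isolated middle sentence
print(replace_with_window(nodes[1]))  # middle sentence plus both neighbors
```

The key design point survives the simplification: retrieval matches against small, precise sentences, while generation consumes the larger window stored alongside each node.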
Since a reranker improves retrieval accuracy, we added `CohereRerank` as one of the node postprocessors.
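For intuition, the rerank step can be sketched as a two-pass pipeline in plain Python: a cheap first-pass retrieval over all documents, then a more careful second scoring of just the candidates. The word-overlap scorers below are stand-ins; a real reranker such as `CohereRerank` scores each query-document pair with a cross-encoder model instead.

```python
# Toy two-pass retrieve-then-rerank sketch. The scoring functions are
# illustrative stand-ins, not Cohere's API.

def first_pass(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Cheap retrieval: rank by number of words shared with the query."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:k]

def rerank(query: str, candidates: list[str], top_n: int = 2) -> list[str]:
    """Second pass over candidates only: shared words weighted by brevity."""
    q = set(query.lower().split())
    def score(d: str) -> float:
        return len(q & set(d.lower().split())) / len(d.split())
    return sorted(candidates, key=score, reverse=True)[:top_n]

docs = [
    "Milvus is a vector database",
    "Milvus stores vectors for similarity search at scale",
    "NeMo Guardrails adds safety rails",
]
candidates = first_pass("vector database Milvus", docs)
print(rerank("vector database Milvus", candidates))
```

Because the expensive scorer only ever sees the short candidate list, the pipeline keeps latency manageable while still sharpening the final ordering.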
Our POC is complete, and we are ready to proceed to the next step on our production RAG journey.