The Journey of RAG Development: From Notebook to Microservices | by Wenqi Glantz | Feb, 2024

Converting a Colab notebook to two microservices with support for Milvus and NeMo Guardrails

Wenqi Glantz
Towards Data Science
Image generated by DALL-E 3 by the author

On our quest for enterprise RAG, this article explores how to build RAG microservices from a RAG pipeline proof of concept (POC) developed in a Colab notebook. We take the following approach:

  • Generate boilerplate RAG microservices with LlamaIndex's create-llama command-line tool.
  • Develop two microservices, ingestion-service and inference-service, to cover the two main stages of RAG.
  • Convert the code from the Colab notebook into the microservices.
  • Add Milvus vector database integration to our new microservices.
  • Add NeMo Guardrails to inference-service, covering rails for user inputs and LLM outputs, topical moderation, and custom actions that integrate with LlamaIndex.

For rapid prototyping, a Colab notebook is an excellent option: it is easy to use, accessible from anywhere, and free.

For example, this Colab notebook demonstrates how to use Metadata replacement + node sentence window in a RAG pipeline, which serves as a chatbot for the NVIDIA AI Enterprise user guide.

SentenceWindowNodeParser splits a document into individual sentences, one per node, and stores a window of the surrounding sentences in each node's metadata. Retrieval then matches against the single sentence, which keeps the embedding focused, while the wider context remains available. After retrieval, MetadataReplacementNodePostProcessor replaces each isolated sentence with its surrounding context before the text is passed to the LLM, so the model works from a fuller, more coherent passage. This approach shines for large documents, where both fine-grained retrieval and enough context to grasp nuances are crucial.
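To make the sentence-window idea concrete, here is a minimal pure-Python sketch. It is not the LlamaIndex implementation: the `Node` class, the naive regex sentence splitter, and the function names `parse_sentence_windows` and `replace_with_window` are hypothetical stand-ins for illustration. The metadata keys `window` and `original_text` and the default window size of 3 mirror what we understand to be LlamaIndex's defaults.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a LlamaIndex node: text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def split_sentences(document: str) -> list[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def parse_sentence_windows(document: str, window_size: int = 3) -> list[Node]:
    """Create one node per sentence; store the surrounding sentences in metadata."""
    sentences = split_sentences(document)
    nodes = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        nodes.append(
            Node(
                text=sentence,  # only this single sentence would be embedded
                metadata={
                    "window": " ".join(sentences[start:end]),
                    "original_text": sentence,
                },
            )
        )
    return nodes


def replace_with_window(nodes: list[Node], target_key: str = "window") -> list[Node]:
    """After retrieval, swap each node's text for the value stored under target_key."""
    return [
        Node(text=node.metadata.get(target_key, node.text), metadata=node.metadata)
        for node in nodes
    ]
```

In this sketch, a retriever would match a query against `node.text` (one sentence), and `replace_with_window` would run on the retrieved nodes so that the LLM receives the wider passage instead of the isolated sentence.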

Since a reranker helps improve retrieval results, we added CohereRerank as one of the node post processors.
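As a rough illustration of what a reranking post processor does, the sketch below re-orders first-stage candidates with a second relevance score and keeps only the best few. It makes no calls to Cohere: the `token_overlap` scorer is a hypothetical stand-in for a real reranking model, and `rerank` is an illustrative helper, not a LlamaIndex or Cohere API. The `top_n` parameter plays the same role as the `top_n` setting on CohereRerank.

```python
from __future__ import annotations

from typing import Callable


def token_overlap(query: str, text: str) -> float:
    """Hypothetical scorer: fraction of query tokens that also appear in the text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)


def rerank(
    query: str,
    candidates: list[str],
    score: Callable[[str, str], float] = token_overlap,
    top_n: int = 2,
) -> list[str]:
    """Sort retrieved candidates by the second-stage score, best first, and keep top_n."""
    ranked = sorted(candidates, key=lambda text: score(query, text), reverse=True)
    return ranked[:top_n]
```

The design point is that the first-stage retriever can cast a wide net cheaply, while the more expensive reranker only has to score that short candidate list.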

Our POC is complete, and we are ready to proceed to the next step on our production RAG journey.
