RIVIXI
LAB
RIVIXI
LAB
RIVIXI
LAB

Autonomous AI Agents in Tech Support: The ProstorHelp Architecture

Alexander Ivanaiskiy, PhD

Industrial AI Founder & Systems Architect

Sergey Shipilov

AI Architecture Lead, Rivixi LLC

Evgeny Ivanaiskiy, PhD

Domain Expert

Abstract

Modern B2B technical support—especially for Point-of-Sale (POS) systems, fiscal protocols (ФФД 1.2), and state integrations—requires absolute precision. Standard LLM chatbots frequently hallucinate specific error codes, fail to understand context, and lack the ability to actively search documentation.

In this case study, we outline the architecture of ProstorHelp, an enterprise-grade AI technical support agent built by Rivixi for PROSTOR:KASSA. By moving away from basic chatbots to an autonomous ReAct (Reasoning and Acting) agent equipped with an Advanced RAG pipeline, the system achieves near-human autonomy, capable of escalating edge cases and dynamically fetching information across vector and lexical databases.


1. Introduction: Moving from Chatbots to Autonomous Agents

When users encounter issues with enterprise software, they don't want a conversational partner; they want exact answers. Standard Retrieval-Augmented Generation (RAG) approaches rely purely on semantic similarity (vector search), which inherently struggles with exact keyword matching, such as specific error codes ("Error 1162") or strict legal terms.

To solve this, we designed ProstorHelp not as a text-generator, but as an Autonomous AI Agent. The agent leverages an Open-Weights LLM engine and has access to a suite of external tools (Parallel Function Calling). It does not just generate an answer—it plans its actions, searches multiple databases concurrently, and verifies its findings.

Novelty and Contribution

Unlike typical RAG systems that rely solely on vector embeddings, our architecture implements a Hybrid Search Pipeline that fuses semantic meaning (pgvector) with exact lexical matching (pg_trgm). Additionally, instead of hardcoding workflows, we implemented a custom ReAct agent capable of parallel tool execution and seamless human handoff.


2. Advanced RAG & Hybrid Search Pipeline

To eliminate hallucinations when dealing with highly specific POS terminal errors, the retrieval pipeline was fundamentally rebuilt to maximize both Recall and Precision.

ProstorHelp Architecture
Fig 1. Advanced RAG & Agentic Architecture for ProstorHelp.
  1. RAG-Fusion (Query Expansion): User queries are notoriously brief or contain typos. Before searching, the LLM expands the query into multiple variants to capture different intents.
  2. Parallel Hybrid Search in PostgreSQL:
    • Semantic Search (pgvector): Finds conceptually similar articles using cosine similarity on embeddings generated by multilingual-e5-small.
    • Keyword Search (pg_trgm): A trigram-based lexical search critical for finding exact error codes and model numbers.
  3. Reciprocal Rank Fusion (RRF): The results from both searches are mathematically fused to prioritize documents that rank highly in both methods.
  4. Cross-Encoder Reranking: The top candidates are passed through a local neural reranker (mmarco-mMiniLMv2), which performs a pairwise comparison between the user's query and the document text, aggressively filtering out irrelevant "noise" before it reaches the LLM.

3. The Agentic Loop and Tool Execution

The brain of the system is a custom ReAct (Reasoning and Acting) loop. When a complex query arrives, the agent analyzes the context and decides which tools to invoke.

Because the agent supports Parallel Tool Calling, it can simultaneously search the internal documentation base while dynamically fetching a specific URL from the public website if it detects the user is asking about pricing.

Available Tools:

  • search_knowledge_base: Primary hybrid search over technical documentation.
  • search_past_chats: Searches historical logs of human operators to find solutions to un-documented edge cases.
  • web_fetch_url: Downloads and parses live webpages for real-time pricing and services.
  • escalate_to_human: Seamlessly transfers the context to a human operator in Telegram.

The Art of the Seamless Handoff

If the agent detects a lack of information, or if the user attempts to transmit sensitive financial data, the escalate_to_human tool is triggered. The AI immediately halts generation and pushes the entire context to an L2 human operator. The operator responds directly through their interface, ensuring the end-user experiences zero friction.


4. Continuous Ingestion and Long-Term Memory

Knowledge bases are not static. To ensure the LLM's context window is always fresh, we built a continuous ingestion pipeline:

  • Background Workers: Running on APScheduler, automated workers periodically scrape the documentation site (help.prostore.org) and the main website.
  • HTML to Markdown: Content is cleaned, chunked, and vectorized locally before being stored in PostgreSQL.

Furthermore, the system builds Long-Term Memory user profiles. If a user interacts with the bot multiple times, it extracts and persists facts (e.g., "User operates ATOL 30F on Linux"). During subsequent interactions, these facts are automatically injected into the agent's context, allowing for highly personalized troubleshooting without forcing the user to repeat their setup.


5. Conclusion

The implementation of the ProstorHelp agent demonstrates that the era of simple, scripted chatbots is over. By combining an autonomous ReAct loop with a robust Hybrid Search PostgreSQL backend, enterprise tech support can achieve high deflection rates without sacrificing response accuracy.

The ability to seamlessly blend semantic understanding with exact lexical matching, coupled with automated background data ingestion, positions this architecture as a highly scalable blueprint for B2B Customer Success automation.

Limitations

The system currently relies heavily on the quality and structure of the underlying Markdown documentation. Extremely novel hardware errors that have neither been documented nor encountered by human operators will inevitably trigger the human escalation protocol. Future development aims to integrate direct API diagnostics with the POS terminals to pre-emptively fetch error logs before the user even submits their query.