Issue with Knowledge Base Behavior

From what I’ve experienced, the ability of ConvAI’s knowledge bank to find the proverbial “needle in a haystack” is inferior to what can be achieved when working directly with LLMs.
This holds even when the documents are highly optimized, the temperature is set low, and different LLMs are tested (assuming the LLM plays a role at all, which I personally doubt: in a typical RAG setup, it's the retriever, not the LLM, that's responsible for fetching the relevant passages).
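To make the retriever-vs-LLM point concrete, here is a minimal sketch of the retrieval step in a RAG pipeline. This is not ConvAI's actual implementation: it uses plain bag-of-words cosine similarity where real systems use embedding models, but the structure is the same. The LLM never sees the full knowledge base, only the chunks the retriever ranks highest, so poor retrieval caps answer quality regardless of which LLM sits behind it.

```python
# Toy retrieval step of a RAG pipeline (bag-of-words cosine similarity;
# production systems use embedding models instead). The LLM only ever
# receives the top-k chunks returned here.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank all chunks against the query and keep the k best matches.
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(c.lower().split())), c) for c in chunks]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]

chunks = [
    "The museum opens at 9 am on weekdays.",
    "Tickets for the exhibition cost 12 euros.",
    "The gift shop closes at 6 pm.",
]
print(retrieve("When does the museum open?", chunks, k=1))
```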

That said, I’ve observed significant improvements by refining the prompt and, most importantly, by improving the quality of the uploaded data.

My suggestion is to create a new, “clean” avatar (with a basic prompt such as “you are a virtual assistant” and nothing more), and then start preparing the content according to ConvAI’s guidelines.

This “clean avatar” approach helped me understand that sometimes (at least in my case), responses weren’t actually coming from the knowledge bank but rather from the LLM’s native training — likely because the KB contained information about a very well-known historical figure.
By using a minimal prompt and explicitly instructing the avatar to ignore everything except what’s in the KB, I was indeed able to get more specific responses.
I even managed (as a test) to make it provide false answers — for example: “Who discovered America?” → “Charlemagne.”
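The "Charlemagne" test works because of how KB-restricted prompts are typically assembled. The sketch below is hypothetical (the wording and the function are illustrative, not ConvAI's internals): the instruction tells the model to answer only from the retrieved context, so whatever the KB asserts can override the model's native training.

```python
# Hypothetical assembly of a KB-restricted prompt (illustrative only,
# not ConvAI's actual internals). The instruction forces the model to
# ground its answer in the retrieved chunks rather than its training.
def build_prompt(question: str, kb_chunks: list[str]) -> str:
    context = "\n".join(f"- {c}" for c in kb_chunks)
    return (
        "You are a virtual assistant. Answer ONLY using the context below.\n"
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt(
    "Who discovered America?",
    ["Charlemagne discovered America."],  # deliberately false KB entry
)
print(prompt)
```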

If a strict “question-answer” format is not feasible (which is often the case), I recommend at least pre-processing the material through manual chunking.
That means deciding where to split the text, how large each chunk should be, and how much overlap to include between consecutive chunks; these are technical aspects that come into play when preparing content for systems like ConvAI's knowledge bank. (Search online for "chunking RAG best practices" to dig deeper.)

Also, use as few documents as possible (this is even mentioned in ConvAI’s official guidelines). In general, it’s better to have a few large documents than many small ones.
[Honestly, I haven’t tested this part much myself, and it seems counterintuitive that it would have much impact — but I’m including it for completeness.]

This has been my experience. The quality has definitely improved compared to the initial results…
but I still haven’t been able to achieve the same level of quality — for example — as with an agent built using ChatGPT.