Original Discord Post by darkladen | 2024-08-01 15:15:37
1- It answers with information that is not in the documents. I have no idea how it can do that if it supposedly does not connect to the internet to search for information. At least that is what I assume, because otherwise I should have a way to tell it not to connect to the internet. Is it better to have only 1 document, or is it fine to use several separate documents depending on the context?
2- The answers sometimes stop at the beginning of what should be a complete answer. For example:
When one asks for specific information (in my case, food stores), it answers something like “Yes, here I have some options:” and then does not give the list of options. When this happens I have to insist with “And what is the list?” or “You have not given me the list”. For a normal user asking the avatar questions, this is confusing because the question does not get a complete answer, and for the client it is a little uncomfortable.
Hello again,
On point 1, which is the most crucial, is there a way to tackle this issue, and especially to ensure that “the LLM can be forced to respond on the basis of the documents”?
Some time ago I asked whether it was possible to use an external LLM that has been proven to work very well on this point.
I remain attentive and hope to have some news, as this prevents me from packaging a solution for the client, because it does not meet the basics.
Hello <@1023671043287699568> , I’m without power here in my city due to a storm. As soon as the power comes back and I can turn on my PC, I’ll send the data. Thank you.
Hello again <@1023671043287699568> , I am finally operational after all the local problems that occurred. Below is the information that was requested.
Character ID: affe8d52-2d94-11ef-9ee4-42010a7be00e
Session: 6525a12f66ce3346fda3da37ab915389
Info: In this session it responds with information that is not in the documents, specifically the document “AEROPUERTO_Stores_ENG_ENG_v2.txt”.
Sessions: 336daa2bd71c05eb55f8b4a466d1e4c5 - 131af979130f144d639b4758dff005fe - 3abb406eccc1a4641c00cdd089a3d745
Info: Here you can see that when you ask it a question, it responds with “I hope you like some of these options.” but the options are not delivered unless you insist that it “hasn’t delivered the options”, and then it does.
Session: d538356d71747f84103e0b6858c8ff38
Info: Here it tells me that it does not know some food stores, which is a particularly poor response since it does have the list in a document. On other occasions it answers this correctly, but sometimes it behaves as in this session.
Session: 487977abfc7c2fddd0f69a064ce5a3bd
Info: With this session we can see the optimal performance.
Mainly the problem, as I said before, is that it (sometimes) does not search for the information as it should within the connected document. Besides, it is a small amount of information, only about 30 stores, and in its final version it should be able to load around 200 stores.
<@305834434178056192> : We do not officially support KB in languages other than English. Can you please upload your file again in English and give it a try?
<@1118275510926053447> HELLO!!! The documents are already in English. I was told this some time ago, and since I also work with local LLM models and develop other AI projects, I fully understand that the models were trained mainly on English, just to make that clear on my side.
It is as if, when I give them documents to learn from, they don’t always look for the answers in them, and many times they answer with information that is not in the documents.
Reply by sconvai | 2024-08-07 22:07:21
Note, this will not impact your response language, which will be Spanish. Our stack will take care of handling the conversion as needed.
Reply by darkladen | 2024-08-07 22:12:07
Sure, I understand this perfectly. The important thing is that the knowledge base is in English and that’s what I’m doing.
You are right. The documents “Aeropuerto_KB_About_US.txt” and “Aeropuerto_comodoro_arturo_merino_benitez.txt” are in Spanish. I previously had them in English, but when I loaded the document “AEROPUERTO_Stores_ENG_v2.txt”, which is in English and is the one giving me problems, I forgot to switch the other 2 documents back to English.
The first documents contain general information about the airport and, as I said, worked fine in both English and Spanish, but the most important document is the last one (AEROPUERTO_Stores_ENG_ENG_v2.txt), which is the one that most users interacting with the Avatar will rely on.
The ones in Spanish work fine anyway, and I don’t doubt that they would be better in English, but the last one, “AEROPUERTO_Stores_ENG_ENG_ENG_v2.txt”, is the most important one and is where I have had problems.
I previously had the other ones in English, and amid so many changes I forgot to change them back to English, but only because they were not giving me problems.
Also, at some point I asked whether it is better to have everything in 1 document, or whether documents work better separated by type of content.
Question: If the first 2 documents are in Spanish, can that affect the third one, which is in English and is where most information is looked up? Could it cause the LLM to fail to respond? The strangest thing is that it delivers information that is not in the documents.