RAG LLM processing tool for enterprise documentation, designed for cahdco.org
Simple question answering (Q/A) over a limited dataset
Built by Faiz Jan and Faraaz Jan
Using LanceDB and EmbedJS
Frontend @ https://github.com/faqro/cahdco-gpt-frontend
NOTE: An OpenAI API key is needed for this project. A key can be created for free at openai.com.
NOTE: Node.js is required for this project.
Setup:
- Run `npm install` from the main directory.
- Create a file called `.env` in the main directory containing `OPENAI_API_KEY=[insert api key]` (you can get the key from https://platform.openai.com/).
- Place the documents to add to RAG in a `docs` folder in the main directory. These are the .pdf, .docx, .doc, .xlsx, .pptx, etc. documents that you want to use for Q/A.
- Set "FORCE_REEMBED" to "TRUE". This is only necessary on the first run and whenever the documents in the `docs` folder change; set it back to "FALSE" once the documents have been embedded.
- Run `npm start`.
- To access CahdcoGPT, go to http://localhost:3001 in your browser. On the first run, you will have to wait a couple of minutes for the documents to be embedded. Trying to use Q/A before embedding has completed results in an "An error occurred" message.
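The file and folder prerequisites from the setup steps can be checked before starting the server. The following is a minimal sketch, not part of the repository: the file name `preflight.js` and the function `checkSetup` are hypothetical, and it only inspects the `.env` file and `docs` folder described above.

```javascript
// preflight.js: hypothetical helper, not shipped with this project.
const fs = require('node:fs');
const path = require('node:path');

// Returns a list of human-readable problems found under rootDir.
// An empty list means the .env file and docs folder look usable.
function checkSetup(rootDir) {
  const problems = [];

  // The .env file must define a non-empty OPENAI_API_KEY.
  const envPath = path.join(rootDir, '.env');
  if (!fs.existsSync(envPath)) {
    problems.push('.env file is missing');
  } else {
    const hasKey = fs
      .readFileSync(envPath, 'utf8')
      .split(/\r?\n/)
      .some((line) => /^\s*OPENAI_API_KEY\s*=\s*\S+/.test(line));
    if (!hasKey) problems.push('OPENAI_API_KEY is not set in .env');
  }

  // The docs folder must exist and contain at least one entry.
  const docsPath = path.join(rootDir, 'docs');
  if (!fs.existsSync(docsPath) || !fs.statSync(docsPath).isDirectory()) {
    problems.push('docs folder is missing');
  } else if (fs.readdirSync(docsPath).length === 0) {
    problems.push('docs folder is empty');
  }

  return problems;
}

module.exports = { checkSetup };
```

Calling `checkSetup(process.cwd())` from the main directory before `npm start` returns an empty array when both prerequisites are in place. The check does not validate the key itself or the document formats.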
Known issues:
- Some files fail to embed, even after multiple attempts. This is likely an issue with the vector database used (LanceDB) and can most likely be fixed by switching to a hosted database such as MongoDB. Deleting the `lancedb` folder in the main directory and re-embedding the documents sometimes fixes it.
- CahdcoGPT has no memory of past conversations. This is not a bug; it has not been implemented yet.
- "An error occurred" when talking to the LLM. This is not a bug: it occurs when talking to the LLM before it has finished re-embedding (which can take a couple of minutes). If this is not your first time running the LLM, you may have forgotten to change the "FORCE_REEMBED" variable back to "FALSE".
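For the embedding failures described above, deleting the `lancedb` folder can be scripted. This is a small sketch under the assumption that the folder sits directly in the main directory, as the notes state; the function name `resetVectorStore` is hypothetical and not part of the project.

```javascript
// Hypothetical helper for clearing the local LanceDB data before re-embedding.
const fs = require('node:fs');
const path = require('node:path');

// Deletes the lancedb folder under rootDir.
// Returns true if the folder existed before deletion, false otherwise.
function resetVectorStore(rootDir) {
  const storePath = path.join(rootDir, 'lancedb');
  const existed = fs.existsSync(storePath);
  // recursive removes the folder contents; force avoids an error if it is absent.
  fs.rmSync(storePath, { recursive: true, force: true });
  return existed;
}

module.exports = { resetVectorStore };
```

After running it, "FORCE_REEMBED" needs to be set to "TRUE" for the next `npm start` so the documents are embedded again, then set back to "FALSE" afterwards.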
Planned improvements:
- Host the project on a server or web service, to avoid unnecessary local computer usage and having to embed the documents on every computer
- Switch from LanceDB to MongoDB
- Implement support for .txt, .csv, and .html files