CahdcoGPT

RAG LLM processing tool for enterprise documentation. Designed for cahdco.org.

Simple question/answering from a limited dataset

Built by Faiz Jan and Faraaz Jan

Using LanceDB and EmbedJS

Frontend @ https://github.com/faqro/cahdco-gpt-frontend

NOTE: An OpenAI API key is needed for this project. A key can be created for free at openai.com

NOTE: Node.js is required for this project.

Instructions

Run npm install from the main directory.
Create a file called .env in the main directory containing OPENAI_API_KEY=[insert api key] (you can get the key from https://platform.openai.com/).
Place the documents to add to RAG in a docs folder in the main directory (these are the .pdf, .docx, .doc, .xlsx, .pptx, etc. documents that you want to use for Q/A).
Set "FORCE_REEMBED" to "TRUE". This is only necessary on the first run and after updating the documents in the docs folder; set it back to "FALSE" once the documents have been embedded.
Run npm start.
To access CahdcoGPT, go to http://localhost:3001 in your browser. On the first run, you will have to wait a couple of minutes for the documents to be embedded. Trying to use Q/A before embedding has completed will lead to an "An error occurred" message.
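
The file-related steps above can be checked before running npm start. The following is a minimal sketch and is not part of the repository: the function name checkSetup is hypothetical, and it only assumes what the instructions state, namely a .env file with an OPENAI_API_KEY entry and a docs folder in the main directory. It does not check FORCE_REEMBED or validate the key itself.

```javascript
// Hypothetical preflight check for the setup steps (not part of the repo).
const fs = require('fs');
const path = require('path');

// Returns a list of human-readable problems; an empty list means the
// .env file and docs folder look ready.
function checkSetup(projectDir) {
  const problems = [];

  // The .env file must exist and define a non-empty OPENAI_API_KEY.
  const envPath = path.join(projectDir, '.env');
  if (!fs.existsSync(envPath)) {
    problems.push('.env file is missing');
  } else {
    const lines = fs.readFileSync(envPath, 'utf8').split(/\r?\n/);
    const keyLine = lines.find((line) => line.trim().startsWith('OPENAI_API_KEY='));
    const value = keyLine ? keyLine.trim().slice('OPENAI_API_KEY='.length).trim() : '';
    if (!value) problems.push('OPENAI_API_KEY is not set in .env');
  }

  // The docs folder must exist and contain at least one entry.
  const docsPath = path.join(projectDir, 'docs');
  if (!fs.existsSync(docsPath) || !fs.statSync(docsPath).isDirectory()) {
    problems.push('docs folder is missing');
  } else if (fs.readdirSync(docsPath).length === 0) {
    problems.push('docs folder is empty');
  }

  return problems;
}
```

Calling checkSetup(process.cwd()) from the main directory and printing the returned list is one way to use it; an empty list suggests the .env file and docs folder are in place.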

Known Issues

Some files fail to embed, even after multiple attempts. Deleting the lancedb folder in the main directory and re-embedding the documents sometimes fixes this. The cause is likely the vector database used (LanceDB), so switching to a hosted database such as MongoDB would most likely resolve it.
CahdcoGPT doesn't have memory of past conversations. This isn't a bug, just something that hasn't been implemented yet.
"An error occurred" when talking to the LLM. This is not a bug. It occurs when talking to the LLM before it has finished re-embedding (this can take a couple of minutes). If this is not your first time running the LLM, you may have forgotten to change the "FORCE_REEMBED" variable back to "FALSE".
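
The workaround for failed embeddings (deleting the lancedb folder) can be scripted. This is a minimal sketch under the assumption that the vector store lives in a folder named lancedb directly inside the main directory, as described above; the function name resetVectorStore is hypothetical and not part of the repository.

```javascript
// Hypothetical helper for clearing the local vector store (not part of the repo).
const fs = require('fs');
const path = require('path');

// Deletes <projectDir>/lancedb if it exists.
// Returns true if a folder was removed, false if there was nothing to delete.
function resetVectorStore(projectDir) {
  const storePath = path.join(projectDir, 'lancedb');
  if (!fs.existsSync(storePath)) return false;
  // recursive: remove the folder and everything inside it.
  fs.rmSync(storePath, { recursive: true, force: true });
  return true;
}
```

After running it, set "FORCE_REEMBED" to "TRUE" and run npm start again so the documents are re-embedded.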

Potential Future Changes

Host the project on a server or web service, so that the documents do not have to be embedded locally on every computer
Switch to MongoDB from LanceDB
Implement support for .txt, .csv, and .html files
