This repo aims to build an image-to-image and text-to-image search engine for fashion products, using Jina as the neural search framework.
The fashion images are retrieved from Kaggle.
The project uses these components of the Jina ecosystem:
- DocumentArray, for processing Documents concurrently and pushing/pulling them between machines. This is useful for creating embeddings on a remote machine with a GPU and then indexing and querying locally
- Jina Hub Executors, which integrate deep learning models
- Jina Client, which formats the REST requests
- PQLite, which lets us pre-filter results by price, rating, etc.
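To illustrate what "pre-filtering" means, here is a minimal pure-Python sketch: metadata conditions are applied first, and only the surviving items are ranked by cosine similarity to the query embedding. It does not use PQLite itself; the in-memory `INDEX`, the tiny MongoDB-like filter dict, and the `search` function are all hypothetical stand-ins for what PQLite does on disk.

```python
import math

# Hypothetical in-memory "index": an embedding plus metadata per product.
# PQLite stores this on disk; this list only mimics the idea.
INDEX = [
    {"uri": "img/1.jpg", "embedding": [1.0, 0.0], "price": 20.0, "rating": 4.5},
    {"uri": "img/2.jpg", "embedding": [0.9, 0.1], "price": 80.0, "rating": 4.8},
    {"uri": "img/3.jpg", "embedding": [0.0, 1.0], "price": 15.0, "rating": 3.9},
]

# Comparison operators for a small MongoDB-like filter dict (an assumption).
OPS = {
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$eq": lambda a, b: a == b,
}


def matches(doc, flt):
    """Return True if doc satisfies every condition in flt (empty filter matches all)."""
    return all(
        OPS[op](doc[field], value)
        for field, conds in flt.items()
        for op, value in conds.items()
    )


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query, flt, top_k=2):
    """Pre-filter on metadata first, then rank the survivors by similarity."""
    candidates = [d for d in INDEX if matches(d, flt)]
    candidates.sort(key=lambda d: cosine(query, d["embedding"]), reverse=True)
    return [d["uri"] for d in candidates[:top_k]]


print(search([1.0, 0.0], {"price": {"$lte": 50}, "rating": {"$gte": 4.0}}))
# ['img/1.jpg']
```

Filtering before ranking means every one of the `top_k` results already satisfies the price/rating constraints; filtering after ranking could leave fewer than `top_k` usable results.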
The front-end is built in Streamlit.
pip install -r requirements.txt
You'll need to write your own get_data.py, since data-processing logic varies from dataset to dataset.
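As a starting point, a get_data.py might turn a product CSV into records that carry an image URI plus the metadata you want to filter on. This is a minimal sketch under assumptions: the column names (`id`, `name`, `price`, `rating`, `color`), the `data/images/<id>.jpg` layout, and the `load_products` function are hypothetical and should be adapted to your dataset.

```python
import csv
import io


def load_products(csv_file, image_dir="data/images", max_docs=None):
    """Turn rows of a product CSV into dicts ready for indexing.

    Column names and the image path layout are hypothetical;
    adapt them to your own dataset.
    """
    products = []
    for row in csv.DictReader(csv_file):
        if max_docs is not None and len(products) >= max_docs:
            break
        try:
            product = {
                "uri": f"{image_dir}/{row['id']}.jpg",
                "name": row["name"].strip(),
                "price": float(row["price"]),
                "rating": float(row["rating"]),
                "color": row["color"].strip().lower(),
            }
        except (KeyError, ValueError, AttributeError, TypeError):
            continue  # skip rows with missing or malformed fields
        products.append(product)
    return products


sample = io.StringIO(
    "id,name,price,rating,color\n"
    "1, Blue Shirt ,19.99,4.2,Blue\n"
    "2,Bad Row,n/a,3.0,Red\n"  # malformed price: skipped
    "3,Red Dress,49.5,4.8,RED\n"
)
print(len(load_products(sample)))  # 2
```

Normalizing fields such as color at load time keeps the filter options presented in the front-end consistent.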
This will create embeddings for all images using CLIPImageEncoder, and then store them on disk (with metadata) with PQLiteIndexer.
cd indexer
python app.py <number_of_docs_to_index>
By default, the number of documents to index is 1,000,000.
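One way an indexer script could accept this optional argument is an optional positional parameter in `argparse`. This is a hypothetical sketch, not the repo's actual app.py; only the 1,000,000 default comes from this README, and `parse_args`/`num_docs` are illustrative names.

```python
import argparse

DEFAULT_NUM_DOCS = 1_000_000  # default stated in this README


def parse_args(argv=None):
    """Parse the optional <number_of_docs_to_index> argument."""
    parser = argparse.ArgumentParser(description="Index fashion images.")
    parser.add_argument(
        "num_docs",
        nargs="?",  # makes the positional argument optional
        type=int,
        default=DEFAULT_NUM_DOCS,
        help="number of documents to index",
    )
    return parser.parse_args(argv)


print(parse_args([]).num_docs)  # 1000000
print(parse_args(["5000"]).num_docs)  # 5000
```

Indexing a small number first (for example a few thousand) is a cheap way to check the pipeline before a full run.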
After indexing, your indexer directory will contain a file called columns.json. Copy it into the backend- directories you want to work with. This lets users filter by attributes such as price, rating, and color (depending on which options your front-end presents). Copying overwrites the existing columns.json file(s), which are the ones from the fashion search.
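The copy step can be scripted so that no target directory is missed. A minimal sketch, assuming each target directory already exists; the `copy_columns` helper is hypothetical, and you would pass your actual backend directories as `target_dirs`.

```python
import shutil
from pathlib import Path


def copy_columns(indexer_dir, target_dirs):
    """Copy indexer_dir/columns.json into each target directory.

    Existing columns.json files in the targets are overwritten.
    Returns the list of destination paths.
    """
    src = Path(indexer_dir) / "columns.json"
    if not src.is_file():
        raise FileNotFoundError(f"{src} not found; run the indexer first")
    copied = []
    for target in target_dirs:
        dest = Path(target) / "columns.json"
        shutil.copyfile(src, dest)  # replaces dest if it already exists
        copied.append(dest)
    return copied
```

For example, `copy_columns("indexer", ["<your-backend-dir>"])` run from the repo's root directory, where the target name is a placeholder.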
From the repo's root directory:
cd searcher
python app.py -t <task>
This starts the search server(s).
<task> can be one of:
- search: Open up a RESTful interface for searching. Defaults to port 12345
- test_text: Submit a sample text query and return URIs of results
- test_image: Submit a sample image query and return URIs of results
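The `-t <task>` switch can be read as a dispatch table from task name to function. This is a hypothetical sketch of that pattern: the task names and port 12345 come from this README, while `main`, `TASKS`, and the `run_*` placeholder functions are illustrative and do not start a real Jina Flow.

```python
import argparse


def run_search(port=12345):
    # Placeholder: the real app would open the RESTful search interface here.
    return f"serving search on port {port}"


def run_test_text():
    # Placeholder: the real app would submit a sample text query.
    return "text query results"


def run_test_image():
    # Placeholder: the real app would submit a sample image query.
    return "image query results"


TASKS = {"search": run_search, "test_text": run_test_text, "test_image": run_test_image}


def main(argv=None):
    """Parse -t <task> and run the matching function; unknown tasks exit with an error."""
    parser = argparse.ArgumentParser(description="Run the search server or a test query.")
    parser.add_argument("-t", dest="task", choices=sorted(TASKS), required=True)
    args = parser.parse_args(argv)
    return TASKS[args.task]()


print(main(["-t", "search"]))  # serving search on port 12345
```

Using `choices` lets `argparse` reject a mistyped task name with a usage message instead of failing later.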
- Open a new terminal window/tab and return to the same directory
cd frontend
streamlit run frontend.py
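Behind the scenes, the front-end has to send each query to the search server's REST interface on port 12345. A minimal sketch using only the standard library builds (but does not send) such a request. It assumes Jina's HTTP gateway convention of POSTing `{"data": [...], "parameters": {...}}` to `/search`; the `build_search_request` helper and the `filter` parameter name are hypothetical.

```python
import json
import urllib.request


def build_search_request(text, host="localhost", port=12345, filters=None):
    """Build a POST request for a text query against the search server.

    The /search path and body shape follow Jina's HTTP gateway convention
    (an assumption here); the "filter" parameter name is hypothetical.
    """
    body = {"data": [{"text": text}], "parameters": {}}
    if filters:
        body["parameters"]["filter"] = filters
    return urllib.request.Request(
        f"http://{host}:{port}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("red summer dress", filters={"price": {"$lte": 50}})
print(req.full_url, req.get_method())  # http://localhost:12345/search POST
# With the server running, urllib.request.urlopen(req) would send it.
```

Keeping request construction separate from sending makes it easy to inspect the payload before wiring it into the Streamlit UI.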
- First, index the data as described above
- In the repo's root directory, run
docker-compose up