Exploring Haystack: Building Advanced NLP Applications with LLMs and Vector Search

According to their github:

Haystack is an end-to-end LLM framework that enables you to build applications powered by LLMs, Transformer models, vector search and more. Whether you want to perform retrieval-augmented generation (RAG), documentation search, question answering or answer generation, you can use state-of-the-art embedding models and LLMs with Haystack to build end-to-end NLP applications to solve your use case.

Some tips on developing with Haystack:

1) There’s two versions - 1.x and 2.x in beta. If you fork from main, you are on the 2.x beta

2) The beta version is missing some things. I just found out that it lacks memory context for building chatbots. To me, this is a critical feature. I was able to get 2.x to perform RAG against a set of custom PDF’s, integrated to Azure Form Recognizer pretty quickly, but once I started trying to make it more of a context aware chatbot I quickly saw the need for memory, and this does not exist in 2.x yet.

3) The way I like to use open source frameworks is to fork them, then work with them directly in my IDE rather than just installing and using it as a package. This is because often times the docs are lacking or not clear, and if there are any bugs you can just fix it and push the commit to your own fork, and offer it up as a PR if necessary (after creating a feature branch of course)

4) If you are contributing to Haystack, you’ll need to setup mypy+black linters, and reno for release notes. Basically when creating a PR make sure all checks pass.

5) I’m not a python guy, I always thought dependencies were installed with pip -r requirements.txt, but for pyproject.toml you install dependencies with python -m pip install .

6) To make sure you are using your forked Haystack and not one from the package, run pip uninstall haystack-ai to remove the beta package and pip uninstall farm-haystack to remove any 1.x package.

7) To work on 1.x in your fork, you’ll need to fetch all the upstream tags from your fork with git fetch upstream --tags, then checkout the latest 1.x release. For me that was git checkout tags/v1.23.0, then git branch my-branch to create a new branch based on this release, and then git status to make sure your branch is up to date and you are not in any detached state.

If you are in a detached state, git checkout tags/v1.23.0 should reattach your head to that specific version.

8) Run pip install --upgrade pip and pip install -e '.[all]' to install the forked repo and all optional dependencies. This should be done from within a python virtual environment.

Exploring Haystack: Building Advanced NLP Applications with LLMs and Vector Search

CATALOG

FEATURED TAGS

FRIENDS