Memorymesh

"I may not have gone where I intended to go, but I think I have ended up where I needed to be."

Enhancing AI with RAG: Leveraging Haystack for Smarter Agents and Efficient Data Retrieval

What’s RAG (Retrieval Augmented Generation)? I define RAG as any process that adds domain-specific information to the prompt being sent to an LLM. This means you can enhance an LLM with additional knowledge...
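Under that definition, the core of RAG is just "retrieve something relevant, then prepend it to the prompt." A minimal sketch of that idea, using word overlap as a stand-in for real vector search (the document store and helper names here are hypothetical, purely for illustration):

```python
import re

# Tiny stand-in document store (hypothetical contents).
DOCS = [
    "Memorymesh stores notes as a knowledge graph.",
    "GGML is a tensor library used by llama.cpp.",
    "GPT-J is a 6B parameter model from EleutherAI.",
]

def tokens(text: str) -> set[str]:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, docs: list[str]) -> str:
    """Rank documents by word overlap with the query
    (a crude substitute for embedding similarity)."""
    q = tokens(query)
    return max(docs, key=lambda d: len(q & tokens(d)))

def build_prompt(query: str) -> str:
    """The 'augmentation' step: retrieved context goes into the prompt."""
    context = retrieve(query, DOCS)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What is GGML?"))
```

A production setup would swap `retrieve` for an embedding model plus a vector index, but the prompt-assembly step stays essentially the same.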

Exploring Haystack: Building Advanced NLP Applications with LLMs and Vector Search

According to its GitHub page: Haystack is an end-to-end LLM framework that enables you to build applications powered by LLMs, Transformer models, vector search and more. Whether you want to perform...

Exploring AI Model Formats: A Deep Dive into Llama.cpp, GGUF, GGML, and Huggingface Transformers

Awesome breakdown post. Some clarifications/findings from the above: GGML isn’t necessarily quantized. I ran the GGML converter on my GPT-J float16 model. It remained float16, but it just ran a...
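The distinction matters because "converted to GGML" and "quantized" are separate steps. To illustrate what quantization actually does, here is a generic round-to-nearest int8 scheme with a per-tensor scale; this is a simplified illustration, not GGML’s actual quantization format:

```python
# Pretend these are float16 weights from a model tensor.
weights = [0.12, -0.5, 0.33, 0.9, -0.07]

# One scale maps the largest-magnitude weight onto the int8 range.
scale = max(abs(w) for w in weights) / 127

# Quantize: store small integers instead of floats.
q = [round(w / scale) for w in weights]

# Dequantize at inference time: an approximate reconstruction.
dequantized = [v * scale for v in q]

max_err = max(abs(w - d) for w, d in zip(weights, dequantized))
print(q)        # integers in [-127, 127]
print(max_err)  # small rounding error, bounded by scale / 2
```

Converting a float16 model to GGML without this step keeps the weights at float16, which is exactly what I observed with GPT-J.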

It’s Hard to Find an Uncensored Model

It turns out most models are somehow based on data from OpenAI, and this data has guardrails. I found this post on how to fine-tune a base model after removing all refusals: Based post on making uncens...

Boosting GPT-J Performance: Converting to GGML for Rapid Inference

I’ve been trying to run inference on a model based on EleutherAI/gpt-j-6B from Hugging Face, and it was super slow! The model took about 15 to 30 minutes to respond to my prompt (including model loa...