Some clarifications/findings from the above: GGML isn’t necessarily quantized. I ran the GGML converter on my GPT-J float 16 model. It remained float 16, but it just ran a lot faster.
I got GPT-4 to summarize, here it is:
The Reddit post you’ve referenced discusses various tools and libraries for loading, training, and inference of AI models, focusing on compatibility and supported file formats. Here’s a summary:
- Compatibility & Supported File Formats:
- Llama.cpp (by Georgi Gerganov) supports
safetensors
and can convert.safetensors
model files into.gguf
. - GGUF: A new file format that includes all necessary metadata within the model file, eliminating the need for additional files like
tokenizer_config.json
. - GGML: An older format, now deprecated in Llama.cpp.
- Transformers (by Hugging Face): Supports
safetensors
(unquantized and quantized using GPTQ algorithm) andbin
(unquantized) formats. - AutoGPTQ: A quantization library based on the GPTQ algorithm, also available via Transformers.
- koboldcpp: A fork of Llama.cpp supporting
bin
files using the GGML algorithm. - ExLlama v2: Offers optimized GPTQ backend for LLaMA models, supporting
safetensors
quantized using the GPTQ algorithm and AWQ (low-bit quantization). - AWQ: Supports low-bit quantization (INT3/4) and uses the
safetensors
format.
- Llama.cpp (by Georgi Gerganov) supports
- Additional Notes:
- The “Use in Transformers” button on the Huggingface site may not work for all hosted models; it’s more like an auto-generated template.
- GGUF files usually contain all necessary components (e.g., tokenizer), but they don’t include the prompt template. Prompt templates can often be found on the model card on HuggingFace.
- Safetensors can be used with Transformers but not with Llama.cpp.
- GGML is no longer supported by Llama.cpp, and its support has been deprecated.
- There are multiple frameworks (like Transformers, Llama.cpp, koboldcpp, ExLlama, etc.), and some support multiple quantization formats while others require specific formats.
This discussion highlights the evolving landscape of AI model formats and the tools used for their management, emphasizing the importance of compatibility and the continuous development of these technologies.
Next, I will work on integrating a UI for my model.