Sunday, 5 November 2023

PrivateGPT Installation Notes

These notes work as of 7 November 2023 using Xubuntu 22.04 - your mileage may vary.

PrivateGPT

PrivateGPT is a production-ready AI project that allows you to ask questions about your documents using the power of Large Language Models (LLMs), even in scenarios without an Internet connection. 100% private, no data leaves your execution environment at any point.

Repo

https://github.com/imartinez/privateGPT

Docs

https://docs.privategpt.dev

Install

https://docs.privategpt.dev/#section/Installation-and-Settings

Install git

sudo apt install git

Install python

sudo apt install python3

Install pip

sudo apt install python3-pip
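
To confirm the base tools are installed, check their versions (the exact versions will vary):

git --version
python3 --version
pip3 --version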

Install pyenv

cd ~
curl https://pyenv.run | bash

Add the commands to ~/.bashrc by running the following in your terminal:

echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc

If you have ~/.profile, ~/.bash_profile or ~/.bash_login, add the commands there as well (adjust the filename in the commands below to match). If you have none of these, add them to ~/.profile:

echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.profile
echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.profile
echo 'eval "$(pyenv init -)"' >> ~/.profile

Restart your shell for the changes to take effect.
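
Alternatively, assuming you are using bash, reload the shell in place and confirm pyenv is available:

exec "$SHELL"
pyenv --version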

Install Python 3.11

pyenv install 3.11
pyenv local 3.11

If you see errors and warnings like these, install the required dependencies:

ModuleNotFoundError: No module named '_bz2'
WARNING: The Python bz2 extension was not compiled. Missing the bzip2 lib?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/adrian/.pyenv/versions/3.11.6/lib/python3.11/curses/__init__.py", line 13, in <module>
    from _curses import *

ModuleNotFoundError: No module named '_curses'
WARNING: The Python curses extension was not compiled. Missing the ncurses lib?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/adrian/.pyenv/versions/3.11.6/lib/python3.11/ctypes/__init__.py", line 8, in <module>
    from _ctypes import Union, Structure, Array

ModuleNotFoundError: No module named '_ctypes'
WARNING: The Python ctypes extension was not compiled. Missing the libffi lib?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>

ModuleNotFoundError: No module named 'readline'
WARNING: The Python readline extension was not compiled. Missing the GNU readline lib?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/adrian/.pyenv/versions/3.11.6/lib/python3.11/ssl.py", line 100, in <module>
    import _ssl             # if we can't import it, let the error propagate
    ^^^^^^^^^^^
ModuleNotFoundError: No module named '_ssl'
ERROR: The Python ssl extension was not compiled. Missing the OpenSSL lib?

ModuleNotFoundError: No module named '_sqlite3'
WARNING: The Python sqlite3 extension was not compiled. Missing the SQLite3 lib?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/adrian/.pyenv/versions/3.11.6/lib/python3.11/tkinter/__init__.py", line 38, in <module>
    import _tkinter # If this fails your Python may not be configured for Tk
    ^^^^^^^^^^^^^^^

ModuleNotFoundError: No module named '_tkinter'
WARNING: The Python tkinter extension was not compiled and GUI subsystem has been detected. Missing the Tk toolkit?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/adrian/.pyenv/versions/3.11.6/lib/python3.11/lzma.py", line 27, in <module>
    from _lzma import *

ModuleNotFoundError: No module named '_lzma'
WARNING: The Python lzma extension was not compiled. Missing the lzma lib?

Install dependencies:

sudo apt update
sudo apt install libbz2-dev
sudo apt install libncurses-dev
sudo apt install libffi-dev
sudo apt install libreadline-dev
sudo apt install libssl-dev
sudo apt install libsqlite3-dev
sudo apt install tk-dev
sudo apt install liblzma-dev
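
If you prefer, the same packages can be installed with a single command:

sudo apt install libbz2-dev libncurses-dev libffi-dev libreadline-dev libssl-dev libsqlite3-dev tk-dev liblzma-dev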

Try installing Python 3.11 again:

pyenv install 3.11
pyenv local 3.11
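
Once the build succeeds, confirm the version pyenv has activated (pyenv local writes a .python-version file to the current directory):

python --version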

Install pipx

python3 -m pip install --user pipx
python3 -m pipx ensurepath

Restart your shell for the changes to take effect.
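
Confirm pipx is on your PATH:

pipx --version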

Install poetry

pipx install poetry
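
Confirm poetry is on your PATH:

poetry --version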

Clone the privateGPT repo

cd ~
git clone https://github.com/imartinez/privateGPT
cd privateGPT

Install dependencies

poetry install --with ui,local

Download Embedding and LLM models

poetry run python scripts/setup
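
Assuming the default configuration, the downloaded LLM should appear in the repo's models directory:

ls ~/privateGPT/models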

Run the local server

PGPT_PROFILES=local make run

Navigate to the UI

http://localhost:8001/
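
To check the server is responding without opening a browser, request the page and look for an HTTP 200 status:

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8001/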

Shutdown

ctrl-c

GPU Acceleration

Verify the machine has a CUDA-Capable GPU

lspci | grep -i nvidia

Install the NVIDIA CUDA Toolkit

sudo apt update
sudo apt upgrade
sudo apt install nvidia-cuda-toolkit

Verify installation

nvcc --version
nvidia-smi

Install llama.cpp with GPU support

Find your version of llama_cpp_python:

poetry run pip list | grep llama_cpp_python
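
The output should look something like this (your version may differ):

llama_cpp_python          0.2.13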

Substitute your version in the next command:

cd ~/privateGPT
CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python==0.2.13

If you see an error like this, try specifying the location of nvcc:

Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [35 lines of output]
      *** scikit-build-core 0.6.0 using CMake 3.27.7 (wheel)
      *** Configuring CMake...
      loading initial cache file /tmp/tmp591ifmq4/build/CMakeInit.txt
      -- The C compiler identification is GNU 11.4.0
      -- The CXX compiler identification is GNU 11.4.0
      -- Detecting C compiler ABI info
      -- Detecting C compiler ABI info - done
      -- Check for working C compiler: /usr/bin/cc - skipped
      -- Detecting C compile features
      -- Detecting C compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      -- Found CUDAToolkit: /usr/local/cuda/include (found version "12.3.52")
      -- cuBLAS found
      -- The CUDA compiler identification is unknown
      CMake Error at /tmp/pip-build-env-h3vy91ne/normal/lib/python3.11/site-packages/cmake/data/share/cmake-3.27/Modules/CMakeDetermineCUDACompiler.cmake:603 (message):
        Failed to detect a default CUDA architecture.
      
      
      
        Compiler output:
      
      Call Stack (most recent call first):
        vendor/llama.cpp/CMakeLists.txt:258 (enable_language)
      
      
      -- Configuring incomplete, errors occurred!
      
      *** CMake configuration failed
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects

Build with the location of nvcc:

CUDACXX=/usr/local/cuda-12/bin/nvcc CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python==0.2.13

Start the server

cd ~/privateGPT
pyenv local 3.11
PGPT_PROFILES=local make run

If you see an out-of-memory error like this, reduce the number of layers offloaded to VRAM:

CUDA error 2 at /tmp/pip-install-pqg0kmzj/llama-cpp-python_a94e4e69cdce4224adec44b01749f74a/vendor/llama.cpp/ggml-cuda.cu:7636: out of memory
current device: 0
make: *** [Makefile:36: run] Error 1

Configure the number of layers offloaded to VRAM:

cp ~/privateGPT/private_gpt/components/llm/llm_component.py ~/privateGPT/private_gpt/components/llm/llm_component.py.backup
vim ~/privateGPT/private_gpt/components/llm/llm_component.py

change:

model_kwargs={"n_gpu_layers": -1},

to:

model_kwargs={"n_gpu_layers": 10},
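
If you would rather make the change from the shell, a sed one-liner works too (assuming the file still contains the default value shown above):

sed -i 's/"n_gpu_layers": -1/"n_gpu_layers": 10/' ~/privateGPT/private_gpt/components/llm/llm_component.py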

Try to start the server again:

cd ~/privateGPT
pyenv local 3.11
PGPT_PROFILES=local make run

If the server is using the GPU you will see something like this in the output:

...
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA RTX A1000 Laptop GPU, compute capability 8.6
...
llm_load_tensors: ggml ctx size =    0.11 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required  = 2902.35 MB
llm_load_tensors: offloading 10 repeating layers to GPU
llm_load_tensors: offloaded 10/35 layers to GPU
llm_load_tensors: VRAM used: 1263.12 MB
...............................................................................................
llama_new_context_with_model: n_ctx      = 3900
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  =  487.50 MB
llama_build_graph: non-view tensors processed: 740/740
llama_new_context_with_model: compute buffer total size = 282.00 MB
llama_new_context_with_model: VRAM scratch buffer: 275.37 MB
llama_new_context_with_model: total VRAM used: 1538.50 MB (model: 1263.12 MB, context: 275.37 MB)
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
...

Ingest

For example, to download and ingest an HTML copy of A Little Riak Book:

cd ~/privateGPT
mkdir ${PWD}/ingest
wget -P ${PWD}/ingest https://raw.githubusercontent.com/basho-labs/little_riak_book/master/rendered/riaklil-en.html
poetry run python scripts/ingest_folder.py ${PWD}/ingest
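
The ingest script processes every file in the target folder, so you can add more documents and re-run it. For example, with a hypothetical second document:

wget -P ${PWD}/ingest https://example.com/another-document.html
poetry run python scripts/ingest_folder.py ${PWD}/ingest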

Configure Temperature

cp ~/privateGPT/private_gpt/components/llm/llm_component.py ~/privateGPT/private_gpt/components/llm/llm_component.py.backup
vim ~/privateGPT/private_gpt/components/llm/llm_component.py

change:

temperature=0.1

to:

temperature=0.2
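
The same sed approach works for the temperature change (assuming the file still contains the default value):

sed -i 's/temperature=0.1/temperature=0.2/' ~/privateGPT/private_gpt/components/llm/llm_component.py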

Restart the server

ctrl-c
cd ~/privateGPT
pyenv local 3.11
PGPT_PROFILES=local make run