
Getting Started with Rust llm-chain using the LLaMa driver

By Shing Lyu    

Disclaimer: This content reflects my personal opinions, not those of any organizations I am or have been affiliated with. Code samples are provided for illustration purposes only; use with caution and test thoroughly before deployment.

Recently I’ve been contributing to llm-chain, a Rust library for working with large language models (LLMs). It’s similar to Python’s LangChain. I contributed this tutorial to the official website, covering how to set up a simple llm-chain with LLaMA models running locally via the llama.cpp project. I’m also publishing the same tutorial on this blog for archiving.


Tutorial: Getting Started using the LLAMA driver

In this tutorial, you will learn how to set up an llm-chain project using the LLAMA driver. If you wish to use one of the other drivers, you can skip this part of the tutorial.

Prerequisites

To follow this tutorial, you will need:

- A Debian- or Ubuntu-based Linux environment (the commands below use apt)
- Rust 1.71.0, installed via rustup
- Python 3.11.3, installed via pyenv
- GCC and G++ 8 or newer
- cmake and libclang-dev
- Git and Git LFS
- A Hugging Face account

We tested using these exact software versions, but you should be able to get similar results using the latest versions. There are also many alternative ways to install and use these tools; for example, you could install Python via your Linux distribution’s package manager. Feel free to adapt this tutorial to your environment.

Step 1: Create a new Rust project

First, you will create a new Rust project using Cargo. To create a new project, open a terminal and run the following command:

cargo new --bin llm-chain-demo
cd llm-chain-demo

This will create a new directory called llm-chain-demo with the following structure:

llm-chain-demo
├── Cargo.toml
└── src
    └── main.rs

The Cargo.toml file contains the metadata and dependencies of your project. The src/main.rs file contains the main source code of your project.

Step 2: Add dependencies

Next, add the two dependencies the example needs: the llm-chain crate itself and the tokio async runtime. To add them, run the following commands in your terminal:

cargo add llm-chain
cargo add tokio --features full
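
If you want to confirm that both crates were added, you can open Cargo.toml and look at the [dependencies] section, or ask cargo to list the project's direct dependencies (the exact versions you see will depend on when you run the command):

cargo tree --depth 1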

Step 3: Update Rust

The minimum version required to run this tutorial is Rust 1.65.0. At the time of this writing, the latest stable version is 1.71.0, so we’ll use that.

To switch to Rust 1.71.0, you need to use rustup, which is a tool that helps you manage multiple versions of Rust on your system.

To install rustup, follow the instructions at https://rustup.rs/.

To install Rust 1.71.0 using rustup, run the following command in your terminal:

rustup install 1.71.0

To make Rust 1.71.0 your default toolchain, run the following command in your terminal:

rustup default 1.71.0
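
Note that rustup default changes the default toolchain for your whole user account. If you would rather pin the version for this project only, rustup also supports a per-directory override, which works just as well for this tutorial:

# optional: run inside the llm-chain-demo directory instead of changing the global default
rustup override set 1.71.0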

You can verify that you are using Rust 1.71.0 by running the following command in your terminal:

rustc --version

This should output something like this:

rustc 1.71.0 (8ede3aae2 2023-07-12)

Step 4: Install and run llama.cpp

Now that you have set up your Rust project and switched to the correct version of Rust, you need to install and run llama.cpp, which is a C++ implementation of LLaMa models for inference.

llama.cpp requires GCC and G++ 8 or newer. If your distribution uses GCC/G++ 7 by default, use these commands to install version 8 and make it the default:

sudo apt install gcc-8 g++-8
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 10
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 10
sudo update-alternatives --install /usr/bin/cc cc /usr/bin/gcc 30
sudo update-alternatives --set cc /usr/bin/gcc

sudo update-alternatives --install /usr/bin/c++ c++ /usr/bin/g++ 30
sudo update-alternatives --set c++ /usr/bin/g++
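
After these commands, you can confirm that version 8 is now the default compiler (the exact output will depend on your distribution):

gcc --version
g++ --version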

To install llama.cpp, you need to clone its repository from GitHub and build it from source. To do that, run the following commands in your terminal:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

Notice that we clone the llama.cpp repository inside the llm-chain-demo folder.

To run llama.cpp, you need to download a LLaMa model and convert it to a binary format that llama.cpp can read. In this tutorial, you will use the Alpaca model.

To download the Alpaca model, you need to have a Hugging Face account and install Git LFS. Hugging Face is a platform that hosts and distributes various natural language processing models, including LLaMA models. Git LFS is an extension for Git that lets repositories store large files. Because these LLMs are usually quite big, Hugging Face uses Git LFS so that you can download them with git.

To create a Hugging Face account, go to https://huggingface.co/ and sign up.

To install Git LFS, run the following commands in your terminal:

curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt install git-lfs
git lfs install

This will download and install Git LFS on your system.
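
If you want to make sure Git LFS is ready before cloning the model, you can check that the extension is installed:

git lfs version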

To download the Alpaca model, run the following commands in your terminal:

cd ./models
git clone https://huggingface.co/chavinlo/alpaca-native

This will clone the Alpaca model repository to your models directory.

To convert the Alpaca model to the format llama.cpp accepts, you need to install Python and run a conversion script. In this tutorial, you will use Python 3.11.3, which is the latest stable version of Python at the time of writing this tutorial.

To install Python 3.11.3, you need to use pyenv, which is a tool that helps you manage multiple versions of Python on your system.

To install pyenv, run the following command in your terminal:

curl https://pyenv.run | bash

Then, you need to install the pyenv-virtualenv plugin to let pyenv manage virtualenv for you. Run this command to install pyenv-virtualenv:

git clone https://github.com/pyenv/pyenv-virtualenv.git $(pyenv root)/plugins/pyenv-virtualenv

If you use zsh, add the following lines to your ~/.zshrc file by running these commands in your terminal:

echo 'eval "$(pyenv init -)"' >> ~/.zshrc
echo 'eval "$(pyenv virtualenv-init -)"' >> ~/.zshrc
source ~/.zshrc

Or replace it with ~/.bashrc if you prefer bash.

This will enable pyenv to manage your Python versions and virtual environments.

To install Python 3.11.3 using pyenv, run the following command in your terminal:

pyenv install 3.11.3
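
The installation can take a few minutes because pyenv builds Python from source. Once it finishes, you can list the versions pyenv manages to confirm that 3.11.3 is available:

pyenv versions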

To create a virtual environment for Python 3.11.3 using pyenv, run the following command in your terminal:

pyenv virtualenv 3.11.3 llama

To activate the virtual environment, run the following command in your terminal:

pyenv activate llama

This will activate the virtual environment and change your prompt to indicate that you are using it.
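
If you want to double-check that the virtual environment is active, verify which Python version is being used; it should report the version installed with pyenv:

python --version
# expected output: Python 3.11.3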

To install the required Python packages for the conversion script, go back to the llama.cpp root directory and run the following command in your terminal:

# in the llama.cpp root directory
pip install -r requirements.txt

With the Python dependencies installed, you need to run the conversion script that will convert the Alpaca model to a binary format that llama.cpp can read. To do that, run the following command in your terminal:

python convert.py models/alpaca-native

This will run the convert.py script that is located in the llama.cpp directory. The script will take the Alpaca model directory as an argument and output a binary file called ggml-model-f32.bin in the same directory.
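
You can check that the conversion produced the expected file before moving on (run this from the llama.cpp root directory; the file will be large since it stores the weights unquantized):

ls -lh models/alpaca-native/ggml-model-f32.bin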

To test llama.cpp with the Alpaca model, run the following command in your terminal:

./main -m models/alpaca-native/ggml-model-f32.bin -n 128 -p "I love Rust because"

This will run the main executable that is located in the llama.cpp directory. The executable takes three arguments: -m, -n, and -p. The -m argument specifies the path to the model binary file. The -n argument specifies the number of tokens to generate. The -p argument specifies the prompt to send to the LLM.

The model may take some time to load. You should see something like this in your terminal:

main: build = 849 (d01bccd)
main: seed  = 1690205678
llama.cpp: loading model from models/alpaca-native/ggml-model-f32.bin
llama_model_load_internal: format     = ggjt v1 (pre #1405)
...
llama_new_context_with_model: kv self size  =  256.00 MB

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 0

 I love Rust because it allows me to create things that are both practical and beautiful. I can design objects that are functional, reliable, and secure - all while still looking great. It’s also a fun language to work with, as it encourages creativity through its focus on code spelunking and efficient algorithmic thinking. [end of text]

llama_print_timings:        load time = 113290.18 ms
...
llama_print_timings:       total time = 44284.74 ms

This means that llama.cpp has successfully loaded the Alpaca model and generated text based on your prompt.

Step 5: Add the llama.cpp driver

Now we can get back to the llm-chain-demo Rust project. To use the LLAMA driver, you need to add it as a dependency to your Rust project. You can run the following command in the terminal:

cargo add llm-chain-llama

Step 6: Run the example code

To run the example code, you need to copy and paste it into your src/main.rs file. The example code creates a LLAMA executor with the Alpaca model and generates text for a given prompt.

use llm_chain::executor;
use llm_chain::{parameters, prompt};
use llm_chain::options::*;
use llm_chain::options;

#[tokio::main(flavor = "current_thread")]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
	let opts = options!(
		Model: ModelRef::from_path("./llama.cpp/models/alpaca-native/ggml-model-f32.bin"), // Notice that we reference the model binary path
		ModelType: "llama",
		MaxContextSize: 512_usize,
		NThreads: 4_usize,
		MaxTokens: 0_usize,
		TopK: 40_i32,
		TopP: 0.95,
		TfsZ: 1.0,
		TypicalP: 1.0,
		Temperature: 0.8,
		RepeatPenalty: 1.1,
		RepeatPenaltyLastN: 64_usize,
		FrequencyPenalty: 0.0,
		PresencePenalty: 0.0,
		Mirostat: 0_i32,
		MirostatTau: 5.0,
		MirostatEta: 0.1,
		PenalizeNl: true,
		StopSequence: vec!["\n".to_string()]
	);
	let exec = executor!(llama, opts)?;
	let res = prompt!("I love Rust because")
		.run(
			&parameters!(),
			&exec,
		)
		.await?;
	println!("{}", res.to_immediate().await?);
	Ok(())
}

When we set up llama.cpp, we used make to compile it, but llama.cpp can also be compiled with cmake. The llm-chain-llama crate uses llm-chain-llama-sys internally, and llm-chain-llama-sys uses cmake to compile the bindings for llama.cpp. Before you run the example code, you may need to install some additional packages for the compilation, such as libclang-dev and cmake. To install libclang-dev, run the following command in your terminal:

sudo apt install libclang-dev

This will install the Clang library development files on your system.

To install cmake, you can use the APT repository from Kitware, which provides the latest version of cmake. (If you have concerns about using a third-party repository, you can also build cmake from source instead.) To add the repository and install cmake, run the following commands in your terminal:

wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | gpg --dearmor - | sudo tee /etc/apt/trusted.gpg.d/kitware.gpg >/dev/null
sudo apt-add-repository "deb https://apt.kitware.com/ubuntu/ $(lsb_release -cs) main"
sudo apt update
sudo apt install kitware-archive-keyring
sudo apt update
sudo apt install cmake
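
Before running cargo, you can verify that cmake is installed and being picked up from the Kitware repository (the exact version will depend on what the repository currently ships):

cmake --version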

To run your program, run the following command in your terminal:

cargo run

You should see something like this in your terminal:

   Compiling llm-chain-demo v0.1.0 (/home/ubuntu/environment/llm-chain-demo)
    Finished dev [unoptimized + debuginfo] target(s) in 9.05s
     Running `target/debug/llm-chain-demo`
...
I love Rust because it allows me to create things that are both practical and beautiful. I can design objects that are functional, reliable, and secure - all while still looking great. It’s also a fun language to work with, as it encourages creativity through its focus on code spelunking and efficient algorithmic thinking. [end of text]
...

This is the text generated through llm-chain and the LLAMA driver, based on your prompt.

Congratulations! You have successfully run the example code using the llama.cpp driver. You can experiment with different models, model parameters and prompts.
