How to create your own private LLM using only AWS CLI

By Shing Lyu    

Disclaimer: This content reflects my personal opinions, not those of any organizations I am or have been affiliated with. Code samples are provided for illustration purposes only; use with caution and test thoroughly before deployment.

In this blog post, I will show you how to use AWS CLI and SageMaker JumpStart to create your own private large language model (LLM) on Amazon SageMaker. You will learn how to deploy a pretrained model from the Hugging Face Transformers library, and how to use it to generate text with custom instructions using AWS CLI.

Why privacy is important and how you can have your own LLM on SageMaker

Privacy is a key concern for many users of LLMs, especially when they want to generate text from sensitive or personal data. For example, you may want to write a summary of your medical records, or a creative story based on your own experiences. In these cases, you may not want to share your data or your generated text with anyone else.

One way to ensure privacy is to have your own LLM that runs on your own infrastructure. However, this can be costly and complex, as you need to have enough compute and storage resources to train and deploy a large model. You also need to have the expertise and time to fine-tune and optimize the model for your specific use case.

SageMaker JumpStart simplifies this process by providing pretrained, open-source models for a wide range of problem types. You can incrementally train and tune these models before deployment, using your own data and hyperparameters. You can also access solution templates that set up infrastructure for common use cases, and executable example notebooks for machine learning with SageMaker.

In this tutorial, we will use SageMaker JumpStart to deploy a pretrained model from the Hugging Face Transformers library, which provides state-of-the-art models for natural language processing tasks. We will use the falcon-40b-instruct model, which is a 40-billion parameter model that can generate text with custom instructions.

Why AWS CLI

Later in the post, we’ll use AWS CLI as the main chat interface. AWS CLI is a command-line tool that allows you to interact with AWS services directly from your terminal, without writing any application code.

The downside is that it’s hard to handle things like conversation memory, as you could with LangChain or other libraries. But if your use case is mostly one-off commands, like summarizing an article or writing an email draft, it’s probably enough.

Deploying the LLM as a SageMaker Endpoint

To deploy the falcon-40b-instruct model using SageMaker JumpStart, you need to have an AWS account with permissions to access SageMaker and other AWS services. You also need to have SageMaker Studio set up and running on your account.

The following steps show how to deploy the model using SageMaker Studio (the exact UI labels may vary slightly between Studio versions):

1. Open SageMaker Studio and navigate to SageMaker JumpStart.
2. Search for the Falcon 40B Instruct model.
3. Choose Deploy, and accept or adjust the default instance type.
4. Wait for the endpoint to reach the InService state, and note its endpoint name. The script in the next section assumes the endpoint is named falcon-40b-instruct.
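Once the deployment finishes, you can confirm from your terminal that the endpoint is ready. This is a minimal sketch, assuming you kept the endpoint name falcon-40b-instruct:

```shell
# Check the endpoint status; it should print "InService" once the
# deployment completes. A failed deployment shows "Failed" instead.
aws sagemaker describe-endpoint \
  --endpoint-name "falcon-40b-instruct" \
  --query "EndpointStatus" \
  --output text
```

If the command returns an error that the endpoint does not exist, double-check the endpoint name shown in SageMaker Studio.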

Use the AWS CLI to interact with the LLM

To interact with the LLM, you can use the script provided below. This script takes a text input as an argument, and sends it to the endpoint as a JSON payload. It then prints the generated text to the terminal.

To use the script, you need to have jq installed on your machine. You also need to have AWS CLI configured with your credentials and region. The script was tested on macOS; some of the command-line tools may behave slightly differently on Linux.

The script is as follows:

#!/usr/bin/env bash

# Build the JSON payload with the prompt (the first argument)
# and the text-generation parameters.
PAYLOAD="
{
  \"inputs\": \"${1}\",
  \"parameters\": {
    \"do_sample\": true,
    \"top_p\": 0.9,
    \"temperature\": 0.8,
    \"max_new_tokens\": 1024,
    \"stop\": [\"<|endoftext|>\", \"</s>\"]
  }
}
"

# AWS CLI v2 expects binary request bodies to be base64-encoded by default.
echo "${PAYLOAD}" | base64 > /tmp/input.json.base64

# Send the payload to the endpoint; the response is written to /tmp/output.json.
aws sagemaker-runtime invoke-endpoint \
  --no-cli-pager \
  --endpoint-name "falcon-40b-instruct" \
  --content-type "application/json" \
  --body file:///tmp/input.json.base64 \
  /tmp/output.json

echo "Question: ${1}"
echo "Answer:"
echo "+============+"
# Extract the generated text from the JSON response.
jq -r '.[0].generated_text' /tmp/output.json
echo "+============+"

# Clean up the temporary files.
rm /tmp/input.json.base64
rm /tmp/output.json

Let’s break down the script and explain what we are doing. First, we build a JSON payload containing the prompt (the first command-line argument) and the generation parameters, such as the sampling temperature and the maximum number of new tokens. We then base64-encode the payload, because AWS CLI v2 expects binary request bodies to be base64-encoded by default. Next, we call the sagemaker-runtime invoke-endpoint command, which sends the payload to the falcon-40b-instruct endpoint and writes the model’s response to /tmp/output.json. Finally, we use jq to extract the generated text from the JSON response, print it between separator lines, and remove the temporary files.
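One caveat: because the prompt is spliced directly into the JSON string, a question containing double quotes or backslashes produces invalid JSON. Since jq is already a dependency, a more robust sketch is to let jq build the payload and handle the escaping:

```shell
#!/usr/bin/env bash
# Build the payload with jq so the prompt is properly JSON-escaped,
# even if it contains quotes or other special characters.
PROMPT='Will "AI" destroy humanity?'
PAYLOAD=$(jq -n --arg prompt "$PROMPT" '{
  inputs: $prompt,
  parameters: {
    do_sample: true,
    top_p: 0.9,
    temperature: 0.8,
    max_new_tokens: 1024,
    stop: ["<|endoftext|>", "</s>"]
  }
}')
echo "$PAYLOAD"
```

The rest of the script stays the same; only the PAYLOAD construction changes.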

Now, if you invoke the script like so:

$ ./invoke.sh "Will AI destroy humanity?"

It will generate something like:

{ ...(AWS CLI response) }
Question: Will AI destroy humanity?
Answer:
+============+
As an AI language model, I cannot predict the future. However, AI has the potential to create a positive impact on society, such as improving healthcare, education and sustainability. It is up to humans to ensure that AI is used ethically and responsibly.
+============+

Saving cost

The model runs on an ml.g5.24xlarge instance, which costs $12.73 per hour in the Frankfurt region at the time of writing. To save cost, you can try the smaller falcon-7b-instruct model, which runs on a smaller instance type. Or you can use sagemaker-auto-shutdown, which can automatically delete your instance on a schedule, so you can, for example, shut the instance down during non-office hours.
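Another option, since everything here is already CLI-driven, is to simply delete the endpoint when you are done and redeploy it from JumpStart the next time you need it. A minimal sketch, again assuming the endpoint is named falcon-40b-instruct:

```shell
# Delete the endpoint to stop paying for the underlying instance.
# The model can be redeployed from SageMaker JumpStart at any time.
aws sagemaker delete-endpoint --endpoint-name "falcon-40b-instruct"

# Optionally delete the endpoint configuration as well; its name may
# differ from the endpoint name depending on how JumpStart created it.
aws sagemaker delete-endpoint-config --endpoint-config-name "falcon-40b-instruct"
```

Deleting the endpoint stops all charges for it, while the JumpStart model artifacts remain available for redeployment.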

Conclusion

In this blog post, you learned how to use AWS CLI and SageMaker JumpStart to create your own private LLM on Amazon SageMaker. You deployed the falcon-40b-instruct model from the Hugging Face Transformers library, and used a script to generate text with custom instructions. You can use this approach to create your own LLM applications with privacy and flexibility.

Disclaimer: This post was written with help from generative AI, as an experiment.

Want to learn Rust? Check out my book: