Jihun Lim
Posted on December 23, 2023
Intro
In LLM-based apps, a caching layer can save money by reducing the number of API calls, and it can speed up responses by serving answers from the cache instead of waiting on model inference. In this post, let's take a look at how you can use the Redis offerings from AWS as a caching layer, including Vector search for Amazon MemoryDB for Redis, which was recently released in preview.
Architecture with caching for LLM in AWS
LLM Caching integrations in 🦜️🔗 Langchain include In Memory, SQLite, Redis, GPTCache, Cassandra, and more.
Caching in 🦜️🔗 Langchain
Currently, Langchain offers two major caching methods and the option to choose whether to cache or not.
- Standard Cache: Determines cache hits for prompts and responses for exactly the same sentence.
- Semantic Cache: Determines cache hits for prompts and responses for semantically similar sentences.
- Optional Caching: Lets you choose per LLM whether to use the cache or not (see the sketch below).
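For example, when a global cache is configured, a specific LLM can still opt out. A minimal sketch, assuming the Bedrock model used later in this post:

from langchain.llms.bedrock import Bedrock

# Even with a global cache set via set_llm_cache(), this instance opts out.
no_cache_llm = Bedrock(
    model_id="anthropic.claude-v2:1",
    region_name="us-west-2",
    cache=False,  # disables caching for this LLM only
)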
Let's see how to use the RedisCache provided by Langchain with Redis on EC2 (installed directly on the instance), ElastiCache for Redis, and MemoryDB for Redis.
Note: testing is conducted with the Claude 2.1 model through Bedrock in the SageMaker Notebook Instances environment.
🐳 Redis Stack on EC2
This is how to install Redis directly on EC2 and utilize it with VectorDB features. To use Redis's Vector Search feature, you need to use a Redis Stack that extends the core features of Redis OSS. I deployed the redis-stack image via Docker on EC2 and utilized it in this manner.
Installing the Redis Stack with Docker
$ sudo yum update -y
$ sudo yum install docker -y
$ sudo service docker start
$ docker run -d --name redis-stack -p 6379:6379 redis/redis-stack:latest
$ docker ps
$ docker logs -f redis-stack
💡 Use redis-cli to check the connection
$ redis-cli -c -h {$Cluster_Endpoint} -p {$PORT}
Once Redis is ready, install langchain, redis, and boto3 for using Amazon Bedrock.
$ pip install langchain redis boto3 --quiet
Standard Cache
Next, import the libraries required for the Standard Cache.
from langchain.globals import set_llm_cache
from langchain.llms.bedrock import Bedrock
from langchain.cache import RedisCache
from redis import Redis
Write the code to invoke the LLM as follows, providing the caching layer via the set_llm_cache() function.
# Point Langchain's global LLM cache at the Redis Stack instance on EC2
ec2_redis = "redis://{EC2_Endpoint}:6379"
cache = RedisCache(Redis.from_url(ec2_redis))
llm = Bedrock(model_id="anthropic.claude-v2:1", region_name='us-west-2')
set_llm_cache(cache)
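For reference, here's roughly how the measurement was run in the notebook (the prompt is my own placeholder):

%%time
# First call goes to Bedrock; repeating the exact same prompt hits the Redis cache.
llm.predict("Tell me about Las Vegas.")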
When measuring time using Jupyter's built-in %%time magic, the Wall time drops significantly, from 7.82 s on the first call to 97.7 ms on the second, identical one.
Semantic Cache
The Redis Stack Docker image I used supports a vector similarity search feature called RediSearch. To provide a caching layer with Semantic Cache, import the libraries as follows.
from langchain.globals import set_llm_cache
from langchain.cache import RedisSemanticCache
from langchain.llms.bedrock import Bedrock
from langchain.embeddings import BedrockEmbeddings
Unlike the Standard Cache, the Semantic Cache uses an embedding model to find answers to semantically similar questions, so we'll use the Amazon Titan Embeddings model.
llm = Bedrock(model_id="anthropic.claude-v2:1", region_name='us-west-2')
bedrock_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", region_name='us-west-2')
set_llm_cache(RedisSemanticCache(redis_url=ec2_redis, embedding=bedrock_embeddings))
When we queried for the location of Las Vegas and made a second query for Vegas, which is semantically similar to Las Vegas, we can see that we got a cache hit and the wall time dropped dramatically from 4.6s to 532ms.
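The two notebook cells looked roughly like this (prompts paraphrased from the test):

llm.predict("Where is Las Vegas located?")   # cold call to Bedrock: ~4.6 s
llm.predict("Where is Vegas located?")       # semantically similar, cache hit: ~532 ms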
Amazon ElastiCache (Serverless) for Redis
Amazon ElastiCache is a fully managed, Redis-compatible service. By simply swapping in the ElastiCache endpoint in the same code used for Redis on EC2, you can achieve the following results.
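For reference, the only change is the connection URL (the endpoint placeholder is mine):

# Same Standard Cache setup as before; only the endpoint differs from the EC2 version.
elasticache_redis = "redis://{ElastiCache_Endpoint}:6379"
set_llm_cache(RedisCache(Redis.from_url(elasticache_redis)))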
Note: if you are using ElastiCache Serverless, which was announced on 11/27/2023, there is one difference. When specifying the URL, you need to write rediss:// instead of redis://, because data in transit is encrypted via TLS.
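A minimal sketch of the Serverless variant (placeholder endpoint):

# ElastiCache Serverless encrypts data in transit, so the scheme becomes rediss://
serverless_redis = "rediss://{Serverless_Endpoint}:6379"
set_llm_cache(RedisCache(Redis.from_url(serverless_redis)))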
How to enable TLS with redis-cli on Amazon Linux 2
Build the redis-cli utility with the TLS option enabled:
$ sudo yum -y install openssl-devel gcc
$ wget http://download.redis.io/redis-stable.tar.gz
$ tar xvzf redis-stable.tar.gz
$ cd redis-stable
$ make distclean
$ make redis-cli BUILD_TLS=yes
$ sudo install -m 755 src/redis-cli /usr/local/bin/
Connectivity: $ redis-cli -c -h {$Cluster_Endpoint} --tls -p {$PORT}
Standard Cache
The Standard Cache does not store separate embedding values, so LLM caching works on ElastiCache, which is built on Redis OSS. For the same question, the Wall time drops significantly, from 45.4 ms to 2.76 ms across two iterations.
Semantic Cache
On the other hand, ElastiCache does not support Vector Search, so running the same Semantic Cache code as above produces the following error message: ResponseError: unknown command 'module', with args beginning with: LIST
This error occurs because Langchain checks for the RediSearch module via the MODULE LIST command, which ElastiCache does not support. In other words, ElastiCache doesn't provide Vector Search, so you can't use the Semantic Cache.
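You can reproduce the check Langchain performs yourself with redis-py (endpoint placeholders are mine):

from redis import Redis

# On the EC2 Redis Stack this returns the loaded modules, including "search".
Redis.from_url("redis://{EC2_Endpoint}:6379").module_list()

# On ElastiCache the same call fails with:
# ResponseError: unknown command 'module', with args beginning with: LIST
Redis.from_url("rediss://{Serverless_Endpoint}:6379").module_list()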
Amazon MemoryDB for Redis
MemoryDB is another in-memory database service from AWS, offering Redis compatibility plus durability. Again, it works well with the Standard Cache, which doesn't store embedding values, but it returns the same error message as ElastiCache with the Semantic Cache, because MemoryDB likewise doesn't support Vector Search.
Note that MemoryDB also uses TLS by default, just like ElastiCache Serverless.
Standard Cache
In this section, since MemoryDB does not support Vector search, I will only cover the Standard Cache case. For the same question, the Wall time drops from 6.67 s to 38.2 ms between iterations.
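For reference, the Standard Cache setup differs from the EC2 version only in the endpoint and the TLS scheme (placeholder endpoint):

# MemoryDB uses TLS by default, hence the rediss:// scheme
memorydb_redis = "rediss://{MemoryDB_Cluster_Endpoint}:6379"
set_llm_cache(RedisCache(Redis.from_url(memorydb_redis)))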
Vector search for Amazon MemoryDB for Redis
Finally, it's time for MemoryDB with Vector search. This newly launched capability, available in Public Preview, runs on the same MemoryDB service: you can activate Vector search when creating a cluster, and this setting cannot be changed after the cluster is created.
Note: the content is based on testing during the public preview stage, and the results may change in the future.
Standard Cache
For the same question, it can be observed that the Wall time for each iteration has reduced from 14.8s to 2.13ms.
Semantic Cache
Before running this test, I actually expected the same results as the Redis Stack, since Vector search is supported. However, I got the same error messages as with the Redis offerings that do not support Vector Search.
Of course, the lack of Langchain Cache support doesn't mean this update lacks Vector search. I'll clarify this in the next section.
Redis as a Vector Database
If you check the Langchain MemoryDB GitHub repository on aws-samples, you can find example code that uses Redis as a VectorStore. If you 'monkey patch' Langchain based on that code, you can use MemoryDB as a VectorDB, as sketched below.
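A rough sketch of the idea (the actual patch in the aws-samples repo may differ; the module path, endpoint, and index name here are my assumptions):

import langchain.vectorstores.redis.base as redis_base
from langchain.vectorstores.redis import Redis as RedisVectorStore

# MemoryDB rejects the MODULE LIST command that Langchain uses to detect
# RediSearch, so skip that check entirely (illustrative monkey patch).
redis_base.check_redis_module_exist = lambda client, required_modules: None

vectorstore = RedisVectorStore.from_texts(
    texts=["Las Vegas is a city in the state of Nevada."],
    embedding=bedrock_embeddings,
    redis_url="rediss://{MemoryDB_Cluster_Endpoint}:6379",
    index_name="fm-buffer",
)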
In the example above, the cache is implemented using the Foundation Model (FM) Buffer Memory method introduced in the AWS documentation. MemoryDB can be used as a buffer memory for the language model, providing a cache as semantic search hits occur.
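In that spirit, a lookup-then-generate flow might look like this (the helper, threshold, and metadata layout are all my own illustration, not the AWS sample code):

def answer(question: str) -> str:
    # Look for a previously answered, semantically similar question.
    hits = vectorstore.similarity_search_with_score(question, k=1)
    if hits and hits[0][1] < 0.2:  # distance threshold is an assumption
        return hits[0][0].metadata["answer"]
    # Cache miss: call the model and store the Q/A pair for next time.
    response = llm.predict(question)
    vectorstore.add_texts([question], metadatas=[{"answer": response}])
    return response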
Note: this example is only possible on MemoryDB with Vector search enabled. When executed on a MemoryDB cluster without Vector search enabled, it returns the following error message:
ResponseError: -ERR Command not enabled, instance needs to be configured for Public Preview for Vector Similarity Search
Outro
The test results so far are tabulated as follows.
Langchain Cache Test Results
| Cache/DB | Redis Stack on EC2 | ElastiCache (Serverless) | MemoryDB | MemoryDB with Vector search (Preview) |
| --- | --- | --- | --- | --- |
| Standard | O | O | O | O |
| Semantic | O | X | X | Partial support (expected to be available in the future) |
As many AWS services are supported by Langchain, it would be nice to see MemoryDB in the Langchain documentation as well. I originally planned to test only MemoryDB with Vector search, but out of curiosity I ended up adding more test targets. Nevertheless, it was fun to learn about the different services that support Redis on AWS, whether they support TLS, and other subtle differences in their Redis feature support.
Thanks for taking the time to read this, and please point out any errors!