Caching strategy
mohamed ahmed
Posted on January 19, 2022
“There are only two hard things in Computer Science: cache invalidation and naming things.” – Phil Karlton
A cache is a component that stores data temporarily so that future requests for that data can be served faster.
This temporary storage is used to shorten our data access times, reduce latency, and improve I/O.
General caching strategy
To maintain the cache, we rely on strategies that define how entries are kept in and evicted from the cache.
The most common strategies are as follows:
- Least Frequently Used (LFU): This strategy uses a counter to keep track of how often an entry is accessed and the element with the lowest counter is removed first.
- Least Recently Used (LRU): In this case, recently used items stay near the top of the cache, and when we need some space, the elements that have not been accessed recently are removed (a minimal sketch of this idea follows the list).
- Most Recently Used (MRU): The most recently used items are removed first. This approach fits situations where older items are more likely to be accessed.
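To make the LRU strategy more concrete, here is a minimal, hypothetical sketch in plain PHP (the class name LruCache and its capacity parameter are our own illustration, not part of any library); it uses the array order as the recency order, so the first key is always the least recently used entry:
<?php

class LruCache
{
    protected $items = [];
    protected $capacity;

    public function __construct(int $capacity)
    {
        $this->capacity = $capacity;
    }

    public function get($key)
    {
        if (!array_key_exists($key, $this->items)) {
            return null;
        }

        // Move the accessed entry to the end, marking it as most recently used
        $value = $this->items[$key];
        unset($this->items[$key]);
        $this->items[$key] = $value;

        return $value;
    }

    public function set($key, $value)
    {
        if (array_key_exists($key, $this->items)) {
            unset($this->items[$key]);
        } elseif (count($this->items) >= $this->capacity) {
            // Evict the least recently used entry (the first array key, PHP 7.3+)
            unset($this->items[array_key_first($this->items)]);
        }

        $this->items[$key] = $value;
    }
}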
The perfect time to start thinking about your cache strategy is when you are designing each module required by your app.
Every time your module returns data, you need to ask yourself some questions:
Are we returning sensitive data that we cannot store anywhere?
Are we returning the same result if we keep the input the same?
How long can we store this data?
How do we want to invalidate this cache?
You can add a cache layer at any place you want in your application.
For example, if you are using MySQL/MariaDB as your data storage, you can enable and configure the query cache correctly.
This small setup will give your database a boost.
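As a quick, hedged illustration of that setup (assuming MariaDB or MySQL 5.7, since the query cache was removed in MySQL 8.0, and using placeholder connection credentials), you can inspect the relevant variables from PHP with PDO:
<?php

// Persistent configuration lives in my.cnf, for example:
//   query_cache_type = 1
//   query_cache_size = 64M
$pdo = new PDO('mysql:host=127.0.0.1;dbname=test', 'user', 'password');

foreach ($pdo->query("SHOW VARIABLES LIKE 'query_cache%'") as $row) {
    echo $row['Variable_name'], ' = ', $row['Value'], PHP_EOL;
}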
You need to think about caching even while you are coding.
You can do lazy loading on objects and data or build a custom cache layer to improve the overall performance.
Imagine that you are requesting and processing data from an external storage, and that the same data may be requested several times in the same execution.
Doing something similar to the following piece of code will reduce the calls to your external storage:
<?php

class Cache
{
    /** In-memory cache of already fetched items, keyed by ID */
    protected $cache = [];

    public function getData($id)
    {
        // Only call the external storage if we have no cached copy yet;
        // getExternalData() represents your actual call to that storage
        if (!isset($this->cache[$id])) {
            $externalData = $this->getExternalData($id);
            if ($externalData !== false) {
                $this->cache[$id] = $externalData;
            }
        }

        // Return null instead of triggering a notice when the fetch failed
        return isset($this->cache[$id]) ? $this->cache[$id] : null;
    }
}
In this code, we store the data in the $cache property every time we request it from our external storage, using the ID as the key identifier.
The next time we request an element with the same ID as a previous one, we will get the element from $cache instead of requesting the data from the external storage.
In PHP, you have access to the most popular cache servers, such as memcached and Redis; both of them store their data in a key-value format.
Having access to these powerful tools will allow us to increase the performance of our applications.
Let's rebuild our preceding example using Redis as our cache storage.
In the following piece of code, we will assume that you have a Redis library available in your environment (for example, phpredis) and a Redis server running:
<?php

class Cache
{
    /** @var Redis */
    protected $cache = null;

    public function __construct()
    {
        // Connect to a local Redis server on the default port
        $this->cache = new Redis();
        $this->cache->connect('127.0.0.1', 6379);
    }

    public function getData($id)
    {
        // get() returns false when the key does not exist in Redis
        $externalData = $this->cache->get($id);
        if ($externalData === false) {
            $externalData = $this->getExternalData($id);
            if ($externalData !== false) {
                // Store the fresh copy so the next call is served from Redis
                $this->cache->set($id, $externalData);
            }
        }

        return $externalData;
    }
}
Here, we connected to the Redis server first and adapted the getData function to use our new Redis instance.
This example could be made more sophisticated, for example, by adding dependency injection or by storing JSON in the cache, among countless other options.
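As a sketch of those two ideas (the class name JsonCache is our own illustration, assuming the phpredis extension), the Redis connection can be injected and complex values serialized as JSON:
<?php

class JsonCache
{
    protected $redis;

    public function __construct(Redis $redis)
    {
        // Dependency injection: the Redis connection is created by the caller
        $this->redis = $redis;
    }

    public function set($id, array $data)
    {
        // Serialize complex values before storing them as a plain string
        $this->redis->set($id, json_encode($data));
    }

    public function get($id)
    {
        $raw = $this->redis->get($id);

        // get() returns false when the key does not exist
        return $raw === false ? null : json_decode($raw, true);
    }
}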
One of the benefits of using a cache engine instead of building your own is that all of them come with a lot of cool and useful features.
Imagine that you want to keep the data in the cache for only 10 seconds. This is very easy to do with Redis: simply change the set call to $this->cache->set($id, $externalData, 10), and after ten seconds your record will be wiped from the cache.
Something even more important than adding data to the cache engine is invalidating or removing the data you have stored.
In some cases, it is fine to serve stale data, but in other cases using old data can cause problems.
If you do not add a TTL to make the data expire automatically, ensure that you have a way of removing or invalidating the data when it is required.
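For instance, a couple of hypothetical helpers built on top of the previous Redis-backed Cache class could expose explicit invalidation; del() and expire() are standard phpredis commands:
<?php

class InvalidatingCache extends Cache
{
    public function invalidate($id)
    {
        // Drop the entry immediately so the next getData() call refetches it
        $this->cache->del($id);
    }

    public function expireIn($id, $seconds)
    {
        // Alternatively, let Redis remove the key automatically after a TTL
        $this->cache->expire($id, $seconds);
    }
}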
As a developer, you don't need to be tied to a specific cache engine: wrap it, create an abstraction, and use that abstraction so that you can change the underlying engine at any point without changing the rest of your code.
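A minimal sketch of such an abstraction could look like the following (the names CacheInterface and RedisCache are our own, not from any framework; PHP's PSR-16 standard formalizes the same idea):
<?php

interface CacheInterface
{
    public function get($id);

    public function set($id, $value, $ttl = null);
}

class RedisCache implements CacheInterface
{
    protected $redis;

    public function __construct(Redis $redis)
    {
        $this->redis = $redis;
    }

    public function get($id)
    {
        return $this->redis->get($id);
    }

    public function set($id, $value, $ttl = null)
    {
        // phpredis accepts an optional TTL (in seconds) as the third argument
        return $ttl === null
            ? $this->redis->set($id, $value)
            : $this->redis->set($id, $value, $ttl);
    }
}
Application code depends only on CacheInterface, so swapping Redis for memcached (or an in-memory array in tests) means writing a new adapter rather than touching the callers.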
HTTP caching
This strategy uses HTTP headers to determine whether the browser can use a local copy of the response or whether it needs to request a fresh copy from the origin server.
This cache strategy is managed outside your application, so you don't have much control over it.
Some of the HTTP headers we can use are as follows (a short PHP example of sending them appears after these lists):
- Expires: This sets a time in the future when the content will expire. When this point in the future is reached, any similar requests will have to go back to the origin server.
- Last-Modified: This specifies the last time the response was modified. It can be used as part of your custom validation strategy to ensure that your users always have fresh content.
- ETag: This header is one of the several mechanisms that HTTP provides for web cache validation, which allows a client to make conditional requests. An ETag is an identifier assigned by a server to a specific version of a resource. If the resource changes, the ETag also changes, allowing us to quickly compare two resource representations to determine whether they are the same.
- Pragma: This is an old header, from the HTTP/1.0 implementation. HTTP/1.1 Cache-control implements the same concept.
- Cache-Control: This header is the replacement for the Expires header. It is well supported and allows us to implement a more flexible cache strategy. The different values for this header can be combined to achieve different caching behaviors.
The following are the available options:
- no-cache: This says that any cached content must be revalidated on each request before being sent to a client.
- no-store: This indicates that the content cannot be cached in any way. This option is useful when the response contains sensitive data.
- public: This marks the content as public and it can be cached by the browser and any intermediate caches.
- private: This marks the content as private. This content can be stored by the user's browser, but not by intermediate parties.
- max-age: This sets the maximum age that the content may be cached before it must be revalidated. This value is measured in seconds, with a maximum of 1 year (31,536,000 seconds).
- s-maxage: This is similar to the max-age header. The only difference is that this option applies only to intermediary caches.
- must-revalidate: This tag indicates that the rules indicated by max-age, s-maxage, or the Expires header must be obeyed strictly.
- proxy-revalidate: This is similar to s-maxage, but only applies to intermediary proxies.
- no-transform: This header tells caches that they are not allowed to modify the received content under any circumstances.
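As a hedged sketch of how some of these headers can be sent from plain PHP (the one-hour max-age and the md5-based ETag are only illustrative values), before any output:
<?php

$content = 'Hello, cacheable world!';
$etag    = '"' . md5($content) . '"';

header('Cache-Control: public, max-age=3600');
header('ETag: ' . $etag);

// Answer conditional requests: if the client already has this version,
// return 304 Not Modified and skip the body
if (($_SERVER['HTTP_IF_NONE_MATCH'] ?? '') === $etag) {
    http_response_code(304);
    exit;
}

echo $content;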
Static files caching
Some static elements are very cache-friendly; among them, you can cache the following:
- Logos and non-auto-generated images
- Style sheets
- JavaScript files
- Downloadable content
- Any media files
These elements tend to change infrequently, so they can be cached for longer periods of time.
To alleviate your servers' load, you can use a Content Delivery Network (CDN) so that these infrequently changed files can be served by these external servers.
Basically, there are two types of CDNs:
1 - Push CDNs: This type requires you to push the files you want to store.
It is your responsibility to ensure that you upload the correct file to the CDN and that the pushed resource is available.
It is mostly used for uploaded images, for example, your users' avatars. Note that some CDNs can return an OK response after a push even though your file is not really ready yet.
2 - Pull CDNs: This is the lazy version; you don't need to send anything to the CDN.
When a request comes through the CDN and the file is not in its storage, it pulls the resource from your server and stores it for future requests.
It is mostly used for CSS, images, and JavaScript assets.
Some of the well-known CDNs are CloudFlare, Amazon CloudFront, and Fastly, among others.
What they all have in common is that they have multiple data centers around the world, allowing them to serve a copy of your file from the server closest to each user.
By combining HTTP and static file caching strategies, you will reduce the asset requests on your server to a minimum. We will not explain other cache strategies, such as full-page caching; with what we have covered, you have enough to start building a successful application.