mirror of
https://github.com/LCTT/TranslateProject.git
synced 2025-01-25 23:11:02 +08:00
推荐翻译:web caching basics terminology http headers and caching strategies.md
This commit is contained in:
parent
bd9554b730
commit
f869d942b1
@ -0,0 +1,444 @@
|
||||
Web Caching Basics: Terminology, HTTP Headers, and Caching Strategies
|
||||
=====================================================================
|
||||
|
||||
### Introduction
|
||||
|
||||
Intelligent content caching is one of the most effective ways to improve
|
||||
the experience for your site's visitors. Caching, or temporarily storing
|
||||
content from previous requests, is part of the core content delivery
|
||||
strategy implemented within the HTTP protocol. Components throughout the
|
||||
delivery path can all cache items to speed up subsequent requests,
|
||||
subject to the caching policies declared for the content.
|
||||
|
||||
In this guide, we will discuss some of the basic concepts of web content
|
||||
caching. This will mainly cover how to select caching policies to ensure
|
||||
that caches throughout the internet can correctly process your content.
|
||||
We will talk about the benefits that caching affords, the side effects
|
||||
to be aware of, and the different strategies to employ to provide the
|
||||
best mixture of performance and flexibility.
|
||||
|
||||
What Is Caching?
|
||||
----------------
|
||||
|
||||
Caching is the term for storing reusable responses in order to make
|
||||
subsequent requests faster. There are many different types of caching
|
||||
available, each of which has its own characteristics. Application caches
|
||||
and memory caches are both popular for their ability to speed up certain
|
||||
responses.
|
||||
|
||||
Web caching, the focus of this guide, is a different type of cache. Web
|
||||
caching is a core design feature of the HTTP protocol meant to minimize
|
||||
network traffic while improving the perceived responsiveness of the
|
||||
system as a whole. Caches are found at every level of a content's
|
||||
journey from the original server to the browser.
|
||||
|
||||
Web caching works by caching the HTTP responses for requests according
|
||||
to certain rules. Subsequent requests for cached content can then be
|
||||
fulfilled from a cache closer to the user instead of sending the request
|
||||
all the way back to the web server.
|
||||
|
||||
Benefits
|
||||
--------
|
||||
|
||||
Effective caching aids both content consumers and content providers.
|
||||
Some of the benefits that caching brings to content delivery are:
|
||||
|
||||
- **Decreased network costs**: Content can be cached at various points
|
||||
in the network path between the content consumer and content origin.
|
||||
When the content is cached closer to the consumer, requests will not
|
||||
cause much additional network activity beyond the cache.
|
||||
- **Improved responsiveness**: Caching enables content to be retrieved
|
||||
faster because an entire network round trip is not necessary. Caches
|
||||
maintained close to the user, like the browser cache, can make this
|
||||
retrieval nearly instantaneous.
|
||||
- **Increased performance on the same hardware**: For the server where
|
||||
the content originated, more performance can be squeezed from the
|
||||
same hardware by allowing aggressive caching. The content owner can
|
||||
leverage the powerful servers along the delivery path to take the
|
||||
brunt of certain content loads.
|
||||
- **Availability of content during network interruptions**: With
|
||||
certain policies, caching can be used to serve content to end users
|
||||
even when it may be unavailable for short periods of time from the
|
||||
origin servers.
|
||||
|
||||
Terminology
|
||||
-----------
|
||||
|
||||
When dealing with caching, there are a few terms that you are likely to
|
||||
come across that might be unfamiliar. Some of the more common ones are
|
||||
below:
|
||||
|
||||
- **Origin server**: The origin server is the original location of the
|
||||
content. If you are acting as the web server administrator, this is
|
||||
the machine that you control. It is responsible for serving any
|
||||
content that could not be retrieved from a cache along the request
|
||||
route and for setting the caching policy for all content.
|
||||
- **Cache hit ratio**: A cache's effectiveness is measured in terms of
|
||||
its cache hit ratio or hit rate. This is a ratio of the requests
|
||||
able to be retrieved from a cache to the total requests made. A high
|
||||
cache hit ratio means that a high percentage of the content was able
|
||||
to be retrieved from the cache. This is usually the desired outcome
|
||||
for most administrators.
|
||||
- **Freshness**: Freshness is a term used to describe whether an item
|
||||
within a cache is still considered a candidate to serve to a client.
|
||||
Content in a cache will only be used to respond if it is within the
|
||||
freshness time frame specified by the caching policy.
|
||||
- **Stale content**: Items in the cache expire according to the cache
|
||||
freshness settings in the caching policy. Expired content is
|
||||
"stale". In general, expired content cannot be used to respond to
|
||||
client requests. The origin server must be re-contacted to retrieve
|
||||
the new content or at least verify that the cached content is still
|
||||
accurate.
|
||||
- **Validation**: Stale items in the cache can be validated in order
|
||||
to refresh their expiration time. Validation involves checking in
|
||||
with the origin server to see if the cached content still represents
|
||||
the most recent version of item.
|
||||
- **Invalidation**: Invalidation is the process of removing content
|
||||
from the cache before its specified expiration date. This is
|
||||
necessary if the item has been changed on the origin server and
|
||||
having an outdated item in cache would cause significant issues for
|
||||
the client.
|
||||
|
||||
There are plenty of other caching terms, but the ones above should help
|
||||
you get started.
|
||||
|
||||
What Can be Cached?
|
||||
-------------------
|
||||
|
||||
Certain content lends itself more readily to caching than others. Some
|
||||
very cache-friendly content for most sites are:
|
||||
|
||||
- Logos and brand images
|
||||
- Non-rotating images in general (navigation icons, for example)
|
||||
- Style sheets
|
||||
- General Javascript files
|
||||
- Downloadable Content
|
||||
- Media Files
|
||||
|
||||
These tend to change infrequently, so they can benefit from being cached
|
||||
for longer periods of time.
|
||||
|
||||
Some items that you have to be careful in caching are:
|
||||
|
||||
- HTML pages
|
||||
- Rotating images
|
||||
- Frequently modified Javascript and CSS
|
||||
- Content requested with authentication cookies
|
||||
|
||||
Some items that should almost never be cached are:
|
||||
|
||||
- Assets related to sensitive data (banking info, etc.)
|
||||
- Content that is user-specific and frequently changed
|
||||
|
||||
In addition to the above general rules, it's possible to specify
|
||||
policies that allow you to cache different types of content
|
||||
appropriately. For instance, if authenticated users all see the same
|
||||
view of your site, it may be possible to cache that view anywhere. If
|
||||
authenticated users see a user-sensitive view of the site that will be
|
||||
valid for some time, you may tell the user's browser to cache, but tell
|
||||
any intermediary caches not to store the view.
|
||||
|
||||
Locations Where Web Content Is Cached
|
||||
-------------------------------------
|
||||
|
||||
Content can be cached at many different points throughout the delivery
|
||||
chain:
|
||||
|
||||
- **Browser cache**: Web browsers themselves maintain a small cache.
|
||||
Typically, the browser sets a policy that dictates the most
|
||||
important items to cache. This may be user-specific content or
|
||||
content deemed expensive to download and likely to be requested
|
||||
again.
|
||||
- **Intermediary caching proxies**: Any server in between the client
|
||||
and your infrastructure can cache certain content as desired. These
|
||||
caches may be maintained by ISPs or other independent parties.
|
||||
- **Reverse Cache**: Your server infrastructure can implement its own
|
||||
cache for backend services. This way, content can be served from the
|
||||
point-of-contact instead of hitting backend servers on each request.
|
||||
|
||||
Each of these locations can and often do cache items according to their
|
||||
own caching policies and the policies set at the content origin.
|
||||
|
||||
Caching Headers
|
||||
---------------
|
||||
|
||||
Caching policy is dependent upon two different factors. The caching
|
||||
entity itself gets to decide whether or not to cache acceptable content.
|
||||
It can decide to cache less than it is allowed to cache, but never more.
|
||||
|
||||
The majority of caching behavior is determined by the caching policy,
|
||||
which is set by the content owner. These policies are mainly articulated
|
||||
through the use of specific HTTP headers.
|
||||
|
||||
Through various iterations of the HTTP protocol, a few different
|
||||
cache-focused headers have arisen with varying levels of sophistication.
|
||||
The ones you probably still need to pay attention to are below:
|
||||
|
||||
- **`Expires`**: The `Expires` header is very straight-forward,
|
||||
although fairly limited in scope. Basically, it sets a time in the
|
||||
future when the content will expire. At this point, any requests for
|
||||
the same content will have to go back to the origin server. This
|
||||
header is probably best used only as a fall back.
|
||||
- **`Cache-Control`**: This is the more modern replacement for the
|
||||
`Expires` header. It is well supported and implements a much more
|
||||
flexible design. In almost all cases, this is preferable to
|
||||
`Expires`, but it may not hurt to set both values. We will discuss
|
||||
the specifics of the options you can set with `Cache-Control` a bit
|
||||
later.
|
||||
- **`Etag`**: The `Etag` header is used with cache validation. The
|
||||
origin can provide a unique `Etag` for an item when it initially
|
||||
serves the content. When a cache needs to validate the content it
|
||||
has on-hand upon expiration, it can send back the `Etag` it has for
|
||||
the content. The origin will either tell the cache that the content
|
||||
is the same, or send the updated content (with the new `Etag`).
|
||||
- **`Last-Modified`**: This header specifies the last time that the
|
||||
item was modified. This may be used as part of the validation
|
||||
strategy to ensure fresh content.
|
||||
- **`Content-Length`**: While not specifically involved in caching,
|
||||
the `Content-Length` header is important to set when defining
|
||||
caching policies. Certain software will refuse to cache content if
|
||||
it does not know in advanced the size of the content it will need to
|
||||
reserve space for.
|
||||
- **`Vary`**: A cache typically uses the requested host and the path
|
||||
to the resource as the key with which to store the cache item. The
|
||||
`Vary` header can be used to tell caches to pay attention to an
|
||||
additional header when deciding whether a request is for the same
|
||||
item. This is most commonly used to tell caches to key by the
|
||||
`Accept-Encoding` header as well, so that the cache will know to
|
||||
differentiate between compressed and uncompressed content.
|
||||
|
||||
### An Aside about the Vary Header
|
||||
|
||||
The `Vary` header provides you with the ability to store different
|
||||
versions of the same content at the expense of diluting the entries in
|
||||
the cache.
|
||||
|
||||
In the case of `Accept-Encoding`, setting the `Vary` header allows for a
|
||||
critical distinction to take place between compressed and uncompressed
|
||||
content. This is needed to correctly serve these items to browsers that
|
||||
cannot handle compressed content and is necessary in order to provide
|
||||
basic usability. One characteristic that tells you that
|
||||
`Accept-Encoding` may be a good candidate for `Vary` is that it only has
|
||||
two or three possible values.
|
||||
|
||||
Items like `User-Agent` might at first glance seem to be a good way to
|
||||
differentiate between mobile and desktop browsers to serve different
|
||||
versions of your site. However, since `User-Agent` strings are
|
||||
non-standard, the result will likely be many versions of the same
|
||||
content on intermediary caches, with a very low cache hit ratio. The
|
||||
`Vary` header should be used sparingly, especially if you do not have
|
||||
the ability to normalize the requests in intermediate caches that you
|
||||
control (which may be possible, for instance, if you leverage a content
|
||||
delivery network).
|
||||
|
||||
How Cache-Control Flags Impact Caching
|
||||
--------------------------------------
|
||||
|
||||
Above, we mentioned how the `Cache-Control` header is used for modern
|
||||
cache policy specification. A number of different policy instructions
|
||||
can be set using this header, with multiple instructions being separated
|
||||
by commas.
|
||||
|
||||
Some of the `Cache-Control` options you can use to dictate your
|
||||
content's caching policy are:
|
||||
|
||||
- **`no-cache`**: This instruction specifies that any cached content
|
||||
must be re-validated on each request before being served to a
|
||||
client. This, in effect, marks the content as stale immediately, but
|
||||
allows it to use revalidation techniques to avoid re-downloading the
|
||||
entire item again.
|
||||
- **`no-store`**: This instruction indicates that the content cannot
|
||||
be cached in any way. This is appropriate to set if the response
|
||||
represents sensitive data.
|
||||
- **`public`**: This marks the content as public, which means that it
|
||||
can be cached by the browser and any intermediate caches. For
|
||||
requests that utilized HTTP authentication, responses are marked
|
||||
`private` by default. This header overrides that setting.
|
||||
- **`private`**: This marks the content as `private`. Private content
|
||||
may be stored by the user's browser, but must *not* be cached by any
|
||||
intermediate parties. This is often used for user-specific data.
|
||||
- **`max-age`**: This setting configures the maximum age that the
|
||||
content may be cached before it must revalidate or re-download the
|
||||
content from the origin server. In essence, this replaces the
|
||||
`Expires` header for modern browsing and is the basis for
|
||||
determining a piece of content's freshness. This option takes its
|
||||
value in seconds with a maximum valid freshness time of one year
|
||||
(31536000 seconds).
|
||||
- **`s-maxage`**: This is very similar to the `max-age` setting, in
|
||||
that it indicates the amount of time that the content can be cached.
|
||||
The difference is that this option is applied only to intermediary
|
||||
caches. Combining this with the above allows for more flexible
|
||||
policy construction.
|
||||
- **`must-revalidate`**: This indicates that the freshness information
|
||||
indicated by `max-age`, `s-maxage` or the `Expires` header must be
|
||||
obeyed strictly. Stale content cannot be served under any
|
||||
circumstance. This prevents cached content from being used in case
|
||||
of network interruptions and similar scenarios.
|
||||
- **`proxy-revalidate`**: This operates the same as the above setting,
|
||||
but only applies to intermediary proxies. In this case, the user's
|
||||
browser can potentially be used to serve stale content in the event
|
||||
of a network interruption, but intermediate caches cannot be used
|
||||
for this purpose.
|
||||
- **`no-transform`**: This option tells caches that they are not
|
||||
allowed to modify the received content for performance reasons under
|
||||
any circumstances. This means, for instance, that the cache is not
|
||||
able to send compressed versions of content it did not receive from
|
||||
the origin server compressed and is not allowed.
|
||||
|
||||
These can be combined in different ways to achieve various caching
|
||||
behavior. Some mutually exclusive values are:
|
||||
|
||||
- `no-cache`, `no-store`, and the regular caching behavior indicated
|
||||
by absence of either
|
||||
- `public` and `private`
|
||||
|
||||
The `no-store` option supersedes the `no-cache` if both are present. For
|
||||
responses to unauthenticated requests, `public` is implied. For
|
||||
responses to authenticated requests, `private` is implied. These can be
|
||||
overridden by including the opposite option in the `Cache-Control`
|
||||
header.
|
||||
|
||||
Developing a Caching Strategy
|
||||
-----------------------------
|
||||
|
||||
In a perfect world, everything could be cached aggressively and your
|
||||
servers would only be contacted to validate content occasionally. This
|
||||
doesn't often happen in practice though, so you should try to set some
|
||||
sane caching policies that aim to balance between implementing long-term
|
||||
caching and responding to the demands of a changing site.
|
||||
|
||||
### Common Issues
|
||||
|
||||
There are many situations where caching cannot or should not be
|
||||
implemented due to how the content is produced (dynamically generated
|
||||
per user) or the nature of the content (sensitive banking information,
|
||||
for example). Another problem that many administrators face when setting
|
||||
up caching is the situation where older versions of your content are out
|
||||
in the wild, not yet stale, even though new versions have been
|
||||
published.
|
||||
|
||||
These are both frequently encountered issues that can have serious
|
||||
impacts on cache performance and the accuracy of content you are
|
||||
serving. However, we can mitigate these issues by developing caching
|
||||
policies that anticipate these problems.
|
||||
|
||||
### General Recommendations
|
||||
|
||||
While your situation will dictate the caching strategy you use, the
|
||||
following recommendations can help guide you towards some reasonable
|
||||
decisions.
|
||||
|
||||
There are certain steps that you can take to increase your cache hit
|
||||
ratio before worrying about the specific headers you use. Some ideas
|
||||
are:
|
||||
|
||||
- **Establish specific directories for images, css, and shared
|
||||
content**: Placing content into dedicated directories will allow you
|
||||
to easily refer to them from any page on your site.
|
||||
- **Use the same URL to refer to the same items**: Since caches key
|
||||
off of both the host and the path to the content requested, ensure
|
||||
that you refer to your content in the same way on all of your pages.
|
||||
The previous recommendation makes this significantly easier.
|
||||
- **Use CSS image sprites where possible**: CSS image sprites for
|
||||
items like icons and navigation decrease the number of round trips
|
||||
needed to render your site and allow your site to cache that single
|
||||
sprite for a long time.
|
||||
- **Host scripts and external resources locally where possible**: If
|
||||
you utilize javascript scripts and other external resources,
|
||||
consider hosting those resources on your own servers if the correct
|
||||
headers are not being provided upstream. Note that you will have to
|
||||
be aware of any updates made to the resource upstream so that you
|
||||
can update your local copy.
|
||||
- **Fingerprint cache items**: For static content like CSS and
|
||||
Javascript files, it may be appropriate to fingerprint each item.
|
||||
This means adding a unique identifier to the filename (often a hash
|
||||
of the file) so that if the resource is modified, the new resource
|
||||
name can be requested, causing the requests to correctly bypass the
|
||||
cache. There are a variety of tools that can assist in creating
|
||||
fingerprints and modifying the references to them within HTML
|
||||
documents.
|
||||
|
||||
In terms of selecting the correct headers for different items, the
|
||||
following can serve as a general reference:
|
||||
|
||||
- **Allow all caches to store generic assets**: Static content and
|
||||
content that is not user-specific can and should be cached at all
|
||||
points in the delivery chain. This will allow intermediary caches to
|
||||
respond with the content for multiple users.
|
||||
- **Allow browsers to cache user-specific assets**: For per-user
|
||||
content, it is often acceptable and useful to allow caching within
|
||||
the user's browser. While this content would not be appropriate to
|
||||
cache on any intermediary caching proxies, caching in the browser
|
||||
will allow for instant retrieval for users during subsequent visits.
|
||||
- **Make exceptions for essential time-sensitive content**: If you
|
||||
have content that is time-sensitive, make an exception to the above
|
||||
rules so that the out-dated content is not served in critical
|
||||
situations. For instance, if your site has a shopping cart, it
|
||||
should reflect the items in the cart immediately. Depending on the
|
||||
nature of the content, the `no-cache` or `no-store` options can be
|
||||
set in the `Cache-Control` header to achieve this.
|
||||
- **Always provide validators**: Validators allow stale content to be
|
||||
refreshed without having to download the entire resource again.
|
||||
Setting the `Etag` and the `Last-Modified` headers allow caches to
|
||||
validate their content and re-serve it if it has not been modified
|
||||
at the origin, further reducing load.
|
||||
- **Set long freshness times for supporting content**: In order to
|
||||
leverage caching effectively, elements that are requested as
|
||||
supporting content to fulfill a request should often have a long
|
||||
freshness setting. This is generally appropriate for items like
|
||||
images and CSS that are pulled in to render the HTML page requested
|
||||
by the user. Setting extended freshness times, combined with
|
||||
fingerprinting, allows caches to store these resources for long
|
||||
periods of time. If the assets change, the modified fingerprint will
|
||||
invalidate the cached item and will trigger a download of the new
|
||||
content. Until then, the supporting items can be cached far into the
|
||||
future.
|
||||
- **Set short freshness times for parent content**: In order to make
|
||||
the above scheme work, the containing item must have relatively
|
||||
short freshness times or may not be cached at all. This is typically
|
||||
the HTML page that calls in the other assisting content. The HTML
|
||||
itself will be downloaded frequently, allowing it to respond to
|
||||
changes rapidly. The supporting content can then be cached
|
||||
aggressively.
|
||||
|
||||
The key is to strike a balance that favors aggressive caching where
|
||||
possible while leaving opportunities to invalidate entries in the future
|
||||
when changes are made. Your site will likely have a combination of:
|
||||
|
||||
- Aggressively cached items
|
||||
- Cached items with a short freshness time and the ability to
|
||||
re-validate
|
||||
- Items that should not be cached at all
|
||||
|
||||
The goal is to move content into the first categories when possible
|
||||
while maintaining an acceptable level of accuracy.
|
||||
|
||||
Conclusion
|
||||
----------
|
||||
|
||||
Taking the time to ensure that your site has proper caching policies in
|
||||
place can have a significant impact on your site. Caching allows you to
|
||||
cut down on the bandwidth costs associated with serving the same content
|
||||
repeatedly. Your server will also be able to handle a greater amount of
|
||||
traffic with the same hardware. Perhaps most importantly, clients will
|
||||
have a faster experience on your site, which may lead them to return
|
||||
more frequently. While effective web caching is not a silver bullet,
|
||||
setting up appropriate caching policies can give you measurable gains
|
||||
with minimal work.
|
||||
|
||||
|
||||
---
|
||||
|
||||
作者: [Justin Ellingwood](https://www.digitalocean.com/community/users/jellingwood)
|
||||
|
||||
译者:[译者ID](https://github.com/译者ID)
|
||||
|
||||
校对:[校对者ID](https://github.com/校对者ID)
|
||||
|
||||
推荐:[royaso](https://github.com/royaso)
|
||||
|
||||
via: https://www.digitalocean.com/community/tutorials/web-caching-basics-terminology-http-headers-and-caching-strategies
|
||||
|
||||
本文由 [LCTT](https://github.com/LCTT/TranslateProject) 原创翻译,[Linux中国](http://linux.cn/) 荣誉推出
|
||||
|
||||
|
Loading…
Reference in New Issue
Block a user