
Scaling PHP applications with Varnish
Source: IBM developerWorks Worldwide
Posted by yangyi on 2008-05-08 19:22:58
Level: Intermediate
Martin Streicher (martin.streicher@gmail.com), Editor-in-Chief, McClatchy Interactive
04 Mar 2008
Stretch the capacity of your Web server farm with PHP and a reverse proxy, such as Varnish.
The history of the World Wide Web may be short, but its virtual landscape is already littered with scads of digital dross. The tarnished logos of so many failed dot-coms are strewn to and fro, discarded (or repossessed) servers sit idle, collecting dust, and almost everyone from Silicon Valley to Silicon Alley has a tall tale to tell. "Why, when I was just a lad, we didn't have fancy WYSIWYG editors. We hand-coded our HTML, and we liked it! Ah, those were the days of baud, boy!"
Thankfully, much has changed from those heady times of the mid-1990s. Designers have fancy tools to create Web sites, as do developers. Scripting languages, including PHP, are conveniences, and frameworks like CakePHP (see Resources) accelerate all stages of coding. Sites have also learned how to scale to keep pace with demand. Need more bandwidth? Lease a bigger pipe. Need to run faster? Crank up the clock cycles. Need to push more pages? Deploy more Web servers. Yet more servers? Perhaps. If you have cash to burn.
In fact, you can scale a site many ways, and multiplying servers is but one (albeit often practical and necessary) approach. Another technique reallocates existing servers to defuse overwhelming incoming traffic. The kernel of the idea: Why generate a page anew again and again? There are many cases in which a generated page can live for seconds, even longer. The trick is to keep the page handy when the second, third, and 10,000th visitor visits its URL.
Here, I combine PHP with smart software called a reverse proxy to cache pages and save servers. Like your computer's memory cache, or like the PHP opcode cache, a reverse proxy eliminates rework and hastens the delivery of oft-requested data.
Specifically, a reverse proxy intercedes between Web clients and your Web server to capture each incoming HTTP request and its respective HTTP response. Then, given a repertoire of requests and matching responses, the reverse proxy can act as if it were the genuine Web server. In some instances, the reverse proxy may simply pass an incoming request through to the Web server. But in other cases, a reverse proxy can choose to process the request itself.
Think of a cached HTTP response as a form letter: The reverse proxy simply sends a form letter in response to a particular request. The second and third (etc.) request for an asset (page or image, for example) receives the same response as the original request. (An example exchange is shown in Figure 1.)
Figure 1. Hypothetical reverse proxy shares same response with many clients
Figure 2 depicts the relationships between client, server, and the reverse proxy. The Web client (Firefox or Safari, for example) connects to the public-facing Web "server" on port 80. In fact, the "server" is actually a reverse proxy. Only the proxy can connect to the actual Web server through port 2001. If the proxy can't fulfill a request or isn't permitted to fulfill a request because of caching rules (discussed momentarily), the proxy defers to the Web server. Again, depending on mandates and the type of request, the proxy may cache the response and forward it to the client.
Figure 2. Relationship between Web client, reverse proxy, and Web server
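
The request flow described above can be sketched in a few lines of PHP. This is only a toy model under stated assumptions: the class name `ToyReverseProxy`, the fixed time-to-live, and the callable standing in for the real Web server are all hypothetical, and a real proxy such as Varnish also weighs request methods, cookies, and HTTP headers before caching anything.

```php
<?php
/**
 * Toy stand-in for a caching reverse proxy (illustration only; every name
 * here is hypothetical and unrelated to how Varnish is implemented).
 */
final class ToyReverseProxy
{
    /** @var array<string, array{body: string, expires: int}> cached responses keyed by URL */
    private array $cache = [];

    /** @var callable(string): string stand-in for the real Web server */
    private $backend;

    /** Number of requests that had to be passed through to the backend. */
    public int $backendHits = 0;

    public function __construct(callable $backend, private int $ttl = 5)
    {
        $this->backend = $backend;
    }

    /** Answer a request for $url at time $now (seconds). */
    public function get(string $url, int $now): string
    {
        $entry = $this->cache[$url] ?? null;
        if ($entry !== null && $entry['expires'] > $now) {
            return $entry['body'];            // fresh copy: send the "form letter"
        }
        $this->backendHits++;                 // miss or stale: defer to the server
        $body = ($this->backend)($url);
        $this->cache[$url] = ['body' => $body, 'expires' => $now + $this->ttl];
        return $body;
    }
}

// The backend here is a closure; in practice it would be an HTTP request
// to the real Web server on its private port.
$proxy  = new ToyReverseProxy(fn (string $url): string => "page for $url", 5);
$first  = $proxy->get('/index.php', 100);   // miss: generated by the backend
$second = $proxy->get('/index.php', 102);   // hit: served from the cache
$third  = $proxy->get('/index.php', 106);   // stale after 5 seconds: regenerated
```

Passing the clock in as `$now` keeps the sketch deterministic; three requests reach the proxy, but only two reach the backend.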
In addition to expediting delivery, the reverse-proxy caching scheme provides many other benefits, including:
Memory is the best persistent store for a cache because access time is (practically) instantaneous and RAM is typically plentiful (or cheap to amass). However, the file system can also act as a cache store. It's vastly more abundant and affordable than memory, albeit far slower to access.
Of course, assets can and do change rapidly on the Web, and cached assets eventually become stale or out of date. Each request and response can specify its own "freshness date," and the cache expires each datum on schedule, usually a few seconds after its capture.
The contents of the cache are manipulated through HTTP headers, the preamble of each HTTP request and response. Headers can set the expiry for an asset and can subvert caching. (The complex, subtle, and sometimes contradictory caching strategies of the client, server, proxy, and other agents are beyond the scope of this article.) Indeed, HTTP cache control is part of the HTTP V1.1 protocol specification (see Resources).
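
To make the idea of header-driven freshness concrete, here is a minimal sketch of how a shared cache might read a `Cache-Control` response header and decide how long to keep a response. The function names are hypothetical, and the logic is a deliberate simplification of the HTTP 1.1 rules: it ignores `Expires`, heuristic freshness, and quoted directive values that contain commas, and it treats `no-cache` as "do not serve from cache" rather than "revalidate first."

```php
<?php
/**
 * Parse a Cache-Control header value into a map of directive => value.
 * Directives without a value (for example "no-cache") map to true.
 * Simplification: splits on commas, so quoted values containing commas
 * are not handled.
 */
function parse_cache_control(string $header): array
{
    $directives = [];
    foreach (explode(',', $header) as $part) {
        $part = trim($part);
        if ($part === '') {
            continue;
        }
        [$name, $value] = array_pad(explode('=', $part, 2), 2, true);
        $directives[strtolower(trim($name))] =
            is_string($value) ? trim($value, " \"") : true;
    }
    return $directives;
}

/**
 * Return how many seconds a shared cache may keep a response,
 * or 0 if this sketch would not cache it at all.
 */
function shared_cache_ttl(string $cacheControl): int
{
    $d = parse_cache_control($cacheControl);
    if (isset($d['no-store']) || isset($d['no-cache']) || isset($d['private'])) {
        return 0;
    }
    // s-maxage targets shared caches and takes precedence over max-age.
    foreach (['s-maxage', 'max-age'] as $name) {
        if (isset($d[$name]) && is_string($d[$name]) && ctype_digit($d[$name])) {
            return (int) $d[$name];
        }
    }
    return 0; // no explicit lifetime given: this sketch chooses not to cache
}
```

For example, `shared_cache_ttl('public, max-age=60')` yields 60, while any header containing `private` or `no-store` yields 0.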
The quote below provides a snippet of Section 13.1.3 of the HTTP 1.1 protocol specification (see Resources). Note that the emphasis on the word MUST isn't editorial commentary: The all caps is part of the specification.
"In some cases, a server or client might need to provide explicit directives to the HTTP caches. We use the Cache-Control header for this purpose."
Hence, if a response contained
Here are some other helpful directives:
Note: All the cache-control directives can be found in Section 14.9 of the HTTP 1.1 specification (see Resources). Directives can also be combined, as in
Two other headers are used in tandem with
Hence, for purposes of caching assets that PHP generates, you must set one or more of
the headers
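
As a minimal sketch, assuming the headers in question are the standard `Cache-Control`, `Expires`, and `Last-Modified` response headers, a PHP page could announce its own lifetime as follows. The helper names `cacheable_headers` and `uncacheable_headers` and the 30-second lifetime are hypothetical choices for illustration.

```php
<?php
/**
 * Build response headers that allow caches to keep a page for $ttl seconds.
 * Hypothetical helper; the header names themselves are standard HTTP/1.1.
 */
function cacheable_headers(int $ttl, int $now): array
{
    // HTTP dates are always expressed in GMT.
    $httpDate = static fn (int $ts): string => gmdate('D, d M Y H:i:s', $ts) . ' GMT';
    return [
        "Cache-Control: public, max-age=$ttl",
        'Expires: ' . $httpDate($now + $ttl),
        'Last-Modified: ' . $httpDate($now),
    ];
}

/** Build response headers that ask caches not to store or reuse a page. */
function uncacheable_headers(): array
{
    return [
        'Cache-Control: no-store, no-cache, must-revalidate',
        'Expires: Thu, 01 Jan 1970 00:00:00 GMT', // a date in the past
    ];
}

// Headers must be sent before any body output.
if (!headers_sent()) {
    foreach (cacheable_headers(30, time()) as $line) {
        header($line);
    }
}
echo 'Generated at ', gmdate('H:i:s'), " GMT\n";
```

Keeping the header construction in plain functions makes it easy to apply one policy to pages that tolerate brief staleness and the opposite policy to pages that must never be reused.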
Building and installing Varnish
To watch the HTTP headers in action, let's build, install, and run an HTTP reverse proxy to cache the output of a small PHP application. Varnish is a relatively new, but very capable, high-performance HTTP reverse proxy. (To learn more, read about Varnish's construction at the Varnish wiki; see Resources.) Varnish also provides monitors and a complete scripting language, Varnish Configuration Language (VCL), to fine-tune behaviors. For instance, the code snippet below directs Varnish to cache certain file types that typically represent static content:
Like most open source packages, Varnish builds readily on several platforms, including FreeBSD, Linux®, and Mac OS X. Varnish is also available in binary form for several systems, if you prefer using a package manager, such as
To begin, download the source code from the Varnish Web site (see Resources), unpack the compressed TAR file, and change to the newly created varnish-1.1.2 directory. Next, run the scripts ./autogen.sh and ./configure, in that order. (The assumptions of the ./configure script are usually reasonable. However, to customize the build to suit the specifics of your system, type
Listing 1. Building and installing
After two or three minutes of compiling
The file varnishd, as its name implies, is the Varnish daemon — a perennial service that caches and serves content from memory. The other utilities listed above control and monitor the operation of varnishd. For instance, varnishstat continuously feeds you Varnish statistics. The file varnishadm lets you send administrative commands to varnishd while it's running. Out of the box, varnishd does not cache a response with a cookie, nor does it honor the
Listing 2. A fragment of VCL to conform with PHP
Let's march through the code:
To continue, save the contents to a file — say, /usr/local/etc/varnish/php.vcl — and start varnishd with the command:
After a few moments, you should see output that resembles the following:
The varnishd daemon is now ready for connections. From a terminal window, run varnishstat. As shown in Figure 3, varnishstat reveals that the daemon is running (the runtime is shown at top left), although no activity has been recorded yet. The number of free bytes in the cache can be found at bottom.
Figure 3. varnishstat shows caching and connection activity
Next, generate some activity. Connect to port 8080 and browse your Web site. Watch the varnishd monitor closely. Do any pages or assets appear in the cache?
Before you continue, download and install the latest version of Firefox. When installed, launch Firefox and visit the add-ons page to install the Live HTTP Headers plug-in (see Resources). Among its many tricks, Live HTTP Headers shows you the HTTP headers of every incoming response. (You can filter out requests for image and CSS files, if you like.) Restart Firefox when prompted and open the Live HTTP Headers window.
Save the code in Listing 3 so your Web server can find it and point Firefox to the address of the new PHP page through the reverse proxy. For instance, if the URL was http://www.example.com/misc/cache.php, you'd likely point to http://localhost:8080/misc/cache.php. Visit the same URL from many browsers and through
Listing 3. Sample PHP code to manipulate the cache
Admittedly, this tiny application is only slightly more useful than Hello World, but it demonstrates all that's required to cooperate with Varnish and other reverse proxies. Even if you generate the bulk of your pages dynamically, those pages can likely be cached for a few seconds at least, sparing your server in the meantime. In contrast, you may want to preclude caching for other pages. Now you know the rules and have the software — Varnish — to realize a very valuable optimization.
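
For readers who want something to experiment with, the following is a hedged sketch of a page in the same spirit. It is an illustration written for this purpose, and the `ttl` query parameter, the helper `ttl_from_query`, and the default and maximum lifetimes are all assumptions. The page prints the moment it was generated, so an unchanged timestamp on reload suggests the response came from a cache.

```php
<?php
/**
 * Hypothetical cache.php. Request it as cache.php?ttl=N to ask caches to
 * keep the page for N seconds; ttl=0 asks them not to reuse it.
 */
function ttl_from_query(array $query, int $default = 10, int $max = 3600): int
{
    $raw = $query['ttl'] ?? null;
    if (!is_string($raw) || !ctype_digit($raw)) {
        return $default;              // missing or malformed: fall back
    }
    return min((int) $raw, $max);     // cap the lifetime a visitor can request
}

$ttl = ttl_from_query($_GET);

// Send the caching policy before any output.
if (!headers_sent()) {
    header($ttl > 0
        ? "Cache-Control: max-age=$ttl"
        : 'Cache-Control: no-cache, must-revalidate');
}

// The timestamp reveals whether the page was regenerated or replayed.
echo 'Generated at ', gmdate('H:i:s'), " GMT with ttl=$ttl\n";
```

Because varnishd does not cache a response with a cookie out of the box, the sketch avoids sessions and cookies entirely. Watching the response headers in Live HTTP Headers while reloading shows which policy each request received.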
On the modern World Wide Web, most pages are no longer hand-coded. Instead, they are generated by an application and delivered on demand, making each page custom and personal. But "There ain't no such thing as a free lunch." The time and effort required to craft HTML merely shifted from man to machine. And although a machine may be faster by orders of magnitude, it is nonetheless a finite resource. The smart PHP developer respects this certitude. The smart PHP developer plans to scale. Database queries are efficient, servers are made redundant, memory is used efficiently. And now, you can cache your dynamic pages. Bring on the traffic!
Learn
Get products and technologies
Discuss
Original link: http://www.ibm.com/developerwork...