The Cache Array Routing Protocol was developed by Microsoft as a fundamental part of its Proxy Server products. The rules of the CARP protocol provide a fast and efficient method for creating a set (array) of web caches, managing the communication among them, and establishing a sophisticated mechanism of "fault tolerant" and "load balanced" hierarchies.
Microsoft® Proxy Server 2.0 uses the Cache Array Routing Protocol (CARP) to provide seamless scaling and extreme efficiency when using multiple proxy servers arrayed as a single logical cache. CARP uses hash-based routing to provide a deterministic "request resolution path" through an array of proxies. The request resolution path, based upon a hashing of proxy array member identities and uniform resource locators (URLs), means that for any given URL request, the browser or downstream proxy server will know exactly where in the proxy array the information will be stored, whether it is already cached from a previous request or is being fetched from the Internet for the first time for delivery and caching.
CARP provides two powerful benefits:
Because CARP provides a deterministic request resolution path,
there is none of the query messaging between proxy servers that is found
with conventional Internet Cache Protocol (ICP) networks, a process
that creates a heavier congestion of queries the greater the number
Microsoft® Proxy Server 2.0 introduces the Cache Array Routing Protocol (CARP) to greatly expand the scalability and efficiency of proxy servers networked into an array.
To fully appreciate the Microsoft Proxy Server 2.0 solution to caching, it is helpful to consider the growing need for the efficiencies that are gained by caching web sites. The World Wide Web (WWW) has become so popular, and the traffic so heavy, that frustration with response times has led to jokes about the WWW standing for the World Wide Wait.
Proxy servers, originally developed for security as extensions of firewalls, soon proved to have additional value: the speed with which their cached URLs were returned to users. Because it is the most requested URLs that remain stored in cache, proxy servers provide great efficiency:
Less network traffic. Once an object has been downloaded from
the Internet, subsequent users will retrieve that object from the cache
instead of having to request the same object across a remote network
Caching proved so efficient that the need was soon seen for deploying multiple proxy servers that could communicate and work together to create a more robust system. In 1995 the Internet Cache Protocol (ICP) was developed to allow individual proxies to "query" manually configured neighboring proxies in order to find cached copies of requested objects. If all queries failed to find a cached object, the proxy would then use HTTP to request the object from the Internet.
Although ICP allows proxy servers to be networked together, certain problems emerge when using the protocol. These include:
ICP arrays must conduct queries to determine the location of
cached information, an inefficient process that generates extraneous
Now Microsoft has significantly enhanced the efficiency of using multiple proxy servers with the introduction of Cache Array Routing Protocol (CARP), a series of algorithms that are applied on top of HTTP. CARP yields several new and interesting benefits to cache users and network operators without introducing new wire protocols. Microsoft will present the finalized version of the CARP protocol to the Internet Engineering Task Force (IETF) for consideration as an Internet Standard protocol.
Microsoft Proxy Server 2.0, with the power of CARP, allows for "queryless" distributed caching. This is especially important because the vast resources of the Internet, as well as its huge potential for marketing, sales, and other business activities, make the Internet an increasingly essential element for an organization's communications infrastructure. Furthermore, distributed caching helps alleviate network administrator concerns about bottlenecks arising from "push" technologies to client desktops.
The queryless distributed caching allowed by Microsoft Proxy Server 2.0 provides strengths including:
CARP doesn't conduct queries. Instead it uses hash-based routing
to provide a deterministic "request resolution path" through
an array of proxies. The result is single-hop resolution. The web browser,
such as Microsoft Internet Explorer, or a downstream proxy, will know
exactly where each URL would be stored across the array of servers.
The seamless scalability, freedom from ICP-type querying, protection against redundant caching, client integration, and ability to automatically adjust to server array membership, combine to make Proxy Server 2.0 the platform of choice for bringing the power of proxy server efficiencies into the enterprise.
Taking a Closer Look at Queryless Distributed Caching
Query being sent to the client's default proxy server.
Figure 1: ICP-based arrays generate extraneous traffic because they must query for cache locations.
With CARP, the web browser or downstream proxy server handling the request performs a hash function based upon the "array membership list" and the URL to determine the exact cache location of an object, or where it will be cached upon downloading from the Internet, resulting in single-hop resolution.
How CARP Works
All proxy servers are tracked through an "array membership
list", which is automatically updated through a time-to-live (TTL)
countdown function that regularly checks for active proxy servers.
The result is a deterministic location for all cached information, meaning that the web browser or downstream proxy server can know exactly where a requested URL either already is stored locally, or will be located after caching. Because the range of values that the hash functions can assign is so large (2^32 = 4,294,967,296), the result is a statistically distributed load balancing across the array.
The deterministic request resolution path that CARP provides means that there's no need to maintain massive location tables for cached information. The browser simply runs the same math function across an object to determine where it is.
Improving Client Performance with Distributed Caching
Distribution of server loads, with downstream proxies (such
as with a branch office) offloading cache hits from upstream proxies.
Distributed caching also provides value in environments such as:
Corporate branch office with Internet connectivity provided
by central office.
Distributed caching also can be combined with the IPX-capabilities of Winsock Proxy to support mixed network environments and allow pockets of IPX-only clients access to IP-based intranet and Internet sites.
Charting a Request with Distributed Caching
Figure 2: Flowchart for a request-forwarding decision in distributed caching.
Note that "upstream" proxies can be either other proxy arrays or other members of the local array.
Understanding the Routing Algorithm
The following is a simplified, step-by-step representation of how the light-weight on-the-fly routing algorithm works, based upon an array of four proxy servers named Jericho1-4:
1. Get upstream array membership list and compute hashes on proxy names:
2. Get URL to route upstream, and compute a hash of the URL:
3. Combine hashes. The hash combination algorithm takes into account a load factor assigned to each proxy (proxies able to handle more HTTP requests should be routed more traffic):
4. Find the highest "score" and forward the URL request to that proxy (in this case Jericho2):
5. Compute the route for other URLs (this shows the natural load balancing that occurs as the hash functions result in distribution across the array):
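The five steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `hash32` is a stand-in built from SHA-256, not the hash function that CARP itself defines, and multiplying by the load factor is a simplification of the real combination step. The names `hash32`, `score`, `route`, and `ARRAY` are hypothetical. Because the hash differs from the real one, the winning proxy for a given URL will generally not match the Jericho2 result quoted in step 4.

```python
import hashlib
from typing import Dict


def hash32(text: str) -> int:
    """Stand-in 32-bit hash: first 4 bytes of SHA-256 (an assumption, not CARP's function)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def score(proxy_name: str, url: str, load_factor: float) -> float:
    """Steps 1-3: hash the proxy name and the URL, combine them, apply the load factor."""
    combined = hash32(f"{hash32(proxy_name)}:{hash32(url)}")
    return combined * load_factor


def route(url: str, members: Dict[str, float]) -> str:
    """Step 4: forward the request to the member with the highest score."""
    return max(members, key=lambda name: score(name, url, members[name]))


# Hypothetical array: four members with equal load factors.
ARRAY = {f"Jericho{i}": 1.0 for i in range(1, 5)}

if __name__ == "__main__":
    # Step 5: route several URLs; different URLs tend to land on different members.
    for url in ("http://www.microsoft.com",
                "http://www.example.com/a",
                "http://www.example.com/b"):
        print(url, "->", route(url, ARRAY))
```

Any client or downstream proxy that holds the same membership list and runs the same functions arrives at the same answer, which is what makes the resolution path deterministic without any query traffic.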
Incremental Scaling: Easy to Add or Subtract Servers
Here's an illustration of adding another proxy Jericho5 to the array:
In addition to easy scalability, this allows for excellent fail-over. If a server fails, its URLs are automatically re-routed to the servers with the next highest scores. In the above example, a failure of "Jericho2" would result in http://www.microsoft.com being re-routed to Jericho3. Because hash functions are deterministic, all fail-over reassignments are made in a consistent, reliable manner.
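The effect of adding or losing a member can be checked with a small experiment. The sketch below again uses a stand-in SHA-256-based hash rather than CARP's own functions, ignores load factors, and routes 2,000 made-up URLs; all names in it are hypothetical. It checks two properties of highest-score routing: when Jericho5 joins, every URL that moves goes to Jericho5, and when Jericho2 fails, only the URLs that Jericho2 owned are reassigned.

```python
import hashlib
from typing import List


def hash32(text: str) -> int:
    """Stand-in 32-bit hash (first 4 bytes of SHA-256); not CARP's own function."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:4], "big")


def route(url: str, members: List[str]) -> str:
    """Pick the member whose combined (member, URL) hash is highest."""
    return max(members, key=lambda name: hash32(name + "|" + url))


# Made-up workload and array names for illustration.
urls = [f"http://www.example.com/page{i}" for i in range(2000)]
four = [f"Jericho{i}" for i in range(1, 5)]
five = four + ["Jericho5"]
survivors = [name for name in four if name != "Jericho2"]

before = {u: route(u, four) for u in urls}
grown = {u: route(u, five) for u in urls}
failed = {u: route(u, survivors) for u in urls}

# URLs whose owner changed after Jericho5 joined the array.
moved_on_add = [u for u in urls if before[u] != grown[u]]
# URLs whose owner changed after Jericho2 left the array.
moved_on_failure = [u for u in urls if before[u] != failed[u]]

if __name__ == "__main__":
    print("moved when Jericho5 joined:", len(moved_on_add), "of", len(urls))
    print("moved when Jericho2 failed:", len(moved_on_failure), "of", len(urls))
```

With five equally weighted members, roughly one fifth of the URLs would be expected to move to the new server, while the rest stay where they were.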
It's important to note that these two routing options aren't mutually exclusive and can be arbitrarily combined. (For example, a request might be first resolved in the distributed manner within an array and then if a cached copy still can't be found, it is forwarded upstream hierarchically). The only real technical differences between these two methods are:
Whether the request is forwarded to the local or upstream array
Figure 3: Hierarchical routing, in which requests are forwarded to upstream proxies.
Hierarchical routing involves a downstream proxy that has n upstream proxies to which it can forward requests. The downstream proxy uses the array membership list of the upstream proxies and hash-based routing to intelligently determine which upstream proxy to forward the request to.
Because all downstream proxies are constructing their route table with the same inputs, they will all route their requests to the same upstream proxy thereby maximizing potential cache hit rates.
Figure 4: Distributed routing, in which requests are forwarded laterally
to the highest scoring proxy.
Distributed routing uses hash-based routing to intelligently process requests within an array of proxies. In this scenario, a proxy with full knowledge of the members of its own array determines that the request is not ideally processed by itself. Proxy 1 then forwards the request to the "highest scoring" proxy, in this case Proxy 4. Because Proxy 1 forwarded the request within its own array, it won't cache the returned response, since a cacheable response will be held in Proxy 4. This provides maximum efficiency in cache usage, protecting the efficiency of a single coordinated disk cache spread out across all machines.
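The forwarding rule described above can be modelled with a small in-memory sketch. It is not how Proxy Server is implemented: the class `ArrayMember`, the helper `fetch_from_origin`, the `forwarded` flag, and the SHA-256-based stand-in hash are all assumptions made for illustration. The point it shows is that a member that forwards a request laterally relays the response without caching it, so each object is stored on exactly one member.

```python
import hashlib
from typing import Dict, List


def hash32(text: str) -> int:
    """Stand-in 32-bit hash (first 4 bytes of SHA-256); not CARP's own function."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:4], "big")


def fetch_from_origin(url: str) -> str:
    """Placeholder for an HTTP request to the origin server."""
    return f"<content of {url}>"


class ArrayMember:
    def __init__(self, name: str, array: Dict[str, "ArrayMember"]) -> None:
        self.name = name
        self.array = array              # shared view of the whole array
        self.cache: Dict[str, str] = {}

    def owner_of(self, url: str) -> str:
        """Name of the highest-scoring member for this URL."""
        return max(self.array, key=lambda name: hash32(name + "|" + url))

    def handle(self, url: str, forwarded: bool = False) -> str:
        if url in self.cache:
            return self.cache[url]
        owner = self.owner_of(url)
        if owner != self.name and not forwarded:
            # Forward laterally and relay the response without caching it here.
            return self.array[owner].handle(url, forwarded=True)
        # This member owns the URL (or received a forwarded request): fetch and cache.
        body = fetch_from_origin(url)
        self.cache[url] = body
        return body


def build_array(names: List[str]) -> Dict[str, ArrayMember]:
    array: Dict[str, ArrayMember] = {}
    for name in names:
        array[name] = ArrayMember(name, array)
    return array


if __name__ == "__main__":
    array = build_array([f"Proxy {i}" for i in range(1, 5)])
    array["Proxy 1"].handle("http://www.microsoft.com")
    print([name for name, member in array.items() if member.cache])
```

The `forwarded` flag also keeps a request from being passed on a second time within the array, which is one simple way to avoid bouncing a request between members in this sketch.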
1. The client forwards to local proxy #1. That proxy applies the routing
algorithm against its own array and determines that local proxy #2 in
its own array should handle the request and forwards it.
Figure 5: Combination routing uses distributed and hierarchical caching.
Preventing Routing Loops
Automatic Updating of Membership List
Communications between array managers are handled via HTTP and remote procedure calls (RPC). RPC interfaces are used to handle modifications to the array table, such as membership, status, and parameters. HTTP is used to publish array information. Publishing via HTTP allows the array table to be consumed by any product supporting the HTTP protocol. The array manager is designed to provide "one-stop shopping": any one member of the array will have current information about every other member of the array. Therefore, a client need only query one randomly selected array member in order to properly route into the array.
The membership list contains information including:
The URL that an array manager should call in order to get the
array information from a remote manager.
Here's a sample array table:
Proxy Array Information/1.0
CATNET07 22.214.171.124 80 http://CATNET07:80/array.dll MSProxy/2.0 171 Up 100 3000
CATNET09 126.96.36.199 80 http://CATNET09:80/array.dll MSProxy/2.0 171 Up 100 1500
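A table in this shape is easy to consume from any HTTP client. The parser below is a minimal sketch that splits each member row on whitespace. The field names are assumptions inferred from the values in the sample (only the table URL is described in the text above), so treat `state_time`, `load_factor`, and `cache_size` in particular as guesses; `Member`, `parse_array_table`, and `SAMPLE` are hypothetical names.

```python
from typing import List, NamedTuple


class Member(NamedTuple):
    # Field names are assumptions; the sample table does not label its columns.
    name: str
    address: str
    port: int
    table_url: str
    agent: str
    state_time: int
    status: str
    load_factor: int
    cache_size: int


def parse_array_table(text: str) -> List[Member]:
    """Parse the header line and the whitespace-separated member rows."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("Proxy Array Information/"):
        raise ValueError("missing array table header")
    members = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != 9:
            continue  # skip rows that do not match the assumed nine-column layout
        members.append(Member(fields[0], fields[1], int(fields[2]), fields[3],
                              fields[4], int(fields[5]), fields[6],
                              int(fields[7]), int(fields[8])))
    return members


# The sample rows exactly as they appear in this document.
SAMPLE = """Proxy Array Information/1.0
CATNET07 22.214.171.124 80 http://CATNET07:80/array.dll MSProxy/2.0 171 Up 100 3000
CATNET09 126.96.36.199 80 http://CATNET09:80/array.dll MSProxy/2.0 171 Up 100 1500
"""

if __name__ == "__main__":
    for member in parse_array_table(SAMPLE):
        print(member.name, member.status, member.table_url)
```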
Upstream Table Management: A proxy manages its own "impression" of the upstream tables. Whenever the TTL countdown expires (usually set for several minutes), the proxy queries for a new array table.
Local Table Management: In addition to the upstream table management protocol (reloading the table if the TTL has expired), a proxy within an array also watches all HTTP requests to any array member in order to determine the status of that member. If a request fails, the local proxy marks that member as down in its table for a given TTL period and doesn't forward requests to that member until the TTL expires and the next table query shows it is active.
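The local bookkeeping described here can be sketched as a small table that remembers until when each member should be treated as down. This is an illustration only: `LocalArrayTable`, `report_failure`, `active_members`, and the 60-second default are hypothetical, and the sketch covers just the TTL part of the rule, not the follow-up table query that confirms the member is active again. The clock is injectable so the behaviour can be exercised without waiting.

```python
import time
from typing import Callable, Dict, Iterable, List


class LocalArrayTable:
    """Tracks which array members are currently considered reachable."""

    def __init__(self, members: Iterable[str], down_ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # Timestamp until which each member is treated as down (0.0 = not down).
        self._down_until: Dict[str, float] = {name: 0.0 for name in members}
        self._down_ttl = down_ttl
        self._clock = clock

    def report_failure(self, name: str) -> None:
        """Mark a member as down for one TTL period after a failed HTTP request."""
        self._down_until[name] = self._clock() + self._down_ttl

    def active_members(self) -> List[str]:
        """Members that requests may currently be forwarded to."""
        now = self._clock()
        return [name for name, until in self._down_until.items() if until <= now]


if __name__ == "__main__":
    table = LocalArrayTable(["CATNET07", "CATNET09"], down_ttl=60.0)
    table.report_failure("CATNET09")
    print(table.active_members())
```

Feeding `active_members()` into the routing function means a failed member simply drops out of the score comparison until its TTL runs out.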
Figure 6: Each array member manages its own array table through queries
triggered by TTL countdowns.
Proxy servers are spared the traffic congestion of ICP queries, a problem which increases with each server added to an ICP array. The efficiency of the array is preserved by avoiding duplication of content that can degrade a five-server ICP array into five independent caches holding much the same content.
Because server identities are hashed, in addition to the URLs, cached information has "stickiness", meaning that array membership can be increased or decreased while causing minimal reassignment of currently stored information.
All of this results in the ability to deploy Proxy Server arrays that provide built-in load balancing, scalability, fault tolerance, ease of administration, and the efficiency of a single logical cache. And because CARP uses HTTP, it accomplishes this without introducing a new wire protocol.
CARP means faster response to queries, and a far more efficient use of server resources.
Copyright (c) 2000 - 3000 by Ing. Eduardo Palena - Napolifirewall.com