Friday, November 13, 2009

[13-Nov-2009] Tech Talk of the Day: Google File System A Critical Analysis

Google File System: A Critical Analysis

Who does not know Google? It would not be wrong to say that Google has become a vital part of today’s information age. But have you ever wondered about the driving force behind Google, about what it is that makes Google stand out as Google? There are many answers to this question, but one of the most prominent pieces of research by Google engineers emerged in 2003 under the name of the Google File System (GFS), described in a paper presented at the well-known SOSP conference that year.

GFS is a distributed file system highly customized for Google's computing needs and for clusters composed of thousands of commodity disks. GFS uses a simple master-server architecture based on replication and automatic recovery for reliability, and it is designed for high aggregate throughput. The file system is proprietary and has been used to serve Google’s unique application workloads and data processing needs.

Why GFS?
Traditional file systems are not suitable for the scale at which Google generates and processes data: multi-gigabyte files are common. Google also relies on inexpensive commodity storage, which makes component failures all the more common. Google's update patterns are also distinctive: most updates append data to the end of a file rather than overwrite existing data. Traditional file systems do not guarantee consistency in the face of multiple concurrent updates, while using locks to achieve consistency hampers scalability by becoming a concurrency bottleneck.

GFS Details
The diagram below presents the fundamental architecture of the Google File System:

[Figure: GFS architecture, showing the single master, multiple chunkservers, and multiple clients]

A GFS cluster consists of a single master and multiple chunkservers and is accessed by multiple clients. Files are divided into fixed-size chunks, and each chunk is identified by a globally unique chunk handle. A large chunk size is chosen for better performance.

The master maintains all file system metadata. This includes the namespace, access control information, the mapping from files to chunks, and the current locations of chunks. It also controls system-wide activities such as chunk lease management, garbage collection of orphaned chunks, and chunk migration between chunkservers. The master stores three major types of metadata: the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk’s replicas. All metadata is kept in the master’s memory. The first two types (the namespaces and the file-to-chunk mapping) are also kept persistent by logging mutations to an operation log stored on the master’s local disk and replicated on remote machines. The master does not store chunk location information persistently; instead, it asks each chunkserver about its chunks at startup and whenever a chunkserver joins the cluster.

GFS client code linked into each application implements the file system API and communicates with the master and chunkservers to read or write data on behalf of the application. Clients interact with the master for metadata operations, but all data-bearing communication goes directly to the chunkservers.
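To make the read path concrete, here is a minimal Python sketch of the division of labor described above. It is only an illustration under my own assumptions: the class and function names and the simplified metadata tables are invented for the example, while the 64 MB chunk size and the split between in-memory metadata, the operation log, and non-persistent chunk locations follow the paper.

    CHUNK_SIZE = 64 * 1024 * 1024  # GFS uses a large fixed chunk size of 64 MB

    class Master:
        """In-memory metadata only; the file data itself lives on the chunkservers."""
        def __init__(self):
            # Namespace and file-to-chunk mapping: also persisted via the operation log.
            self.file_to_chunks = {}    # path -> list of chunk handles
            # Chunk locations: NOT persisted; rebuilt by asking chunkservers at startup.
            self.chunk_locations = {}   # chunk handle -> list of chunkserver addresses

        def lookup(self, path, offset):
            """Translate (file, byte offset) into (chunk handle, replica locations)."""
            index = offset // CHUNK_SIZE
            handle = self.file_to_chunks[path][index]
            return handle, self.chunk_locations[handle]

    def client_read(master, path, offset, length):
        # One small metadata round trip to the master ...
        handle, replicas = master.lookup(path, offset)
        # ... then the data is fetched directly from a chunkserver, so bulk
        # data never flows through the master.
        return fetch_chunk(replicas[0], handle, offset % CHUNK_SIZE, length)

    def fetch_chunk(server, handle, chunk_offset, length):
        # Stand-in for the RPC a real client would issue to a chunkserver.
        raise NotImplementedError("chunkserver RPC not modeled in this sketch")

The point of the sketch is simply that the master answers small metadata queries while the bytes themselves move between clients and chunkservers.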

Permissions for mutations are handled by a system of time-limited, expiring "leases": the master grants a lease on a chunk to one chunkserver, the primary, for a finite period of time during which no other chunkserver will be granted the lease for that chunk. The primary then propagates the changes to the chunkservers holding the backup copies. The changes are not considered committed until all chunkservers acknowledge them, thus guaranteeing the completion and atomicity of the operation.
Programs access the chunks by first querying the Master server for the locations of the desired chunks; if the chunks are not being operated on (if there are no outstanding leases), the Master replies with the locations, and the program then contacts and receives the data from the chunkserver directly (similar to Kazaa and its supernodes).
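The lease and mutation flow can be sketched in the same spirit. The toy model below is an assumption-laden illustration, not GFS code: the 60-second initial lease timeout comes from the paper, while the data structures and names are mine.

    import time

    LEASE_DURATION = 60.0   # the paper uses an initial lease timeout of 60 seconds

    class LeaseTable:
        """Master-side bookkeeping: at most one primary replica per chunk at a time."""
        def __init__(self):
            self.leases = {}    # chunk handle -> (primary address, expiry timestamp)

        def grant(self, handle, requester):
            primary, expiry = self.leases.get(handle, (None, 0.0))
            if primary is not None and time.time() < expiry:
                return primary                        # an unexpired lease already exists
            self.leases[handle] = (requester, time.time() + LEASE_DURATION)
            return requester

    class Replica:
        """Toy chunkserver replica that applies mutations in the order it is told."""
        def __init__(self):
            self.data = bytearray()

        def apply(self, serial, payload):
            self.data += payload    # real chunkservers apply mutations in serial order
            return True             # acknowledge the mutation

    def mutate(primary, secondaries, serial, payload):
        # The primary assigns the mutation order (serial) and every replica applies it;
        # the operation only completes when all replicas acknowledge.
        acks = [replica.apply(serial, payload) for replica in [primary] + secondaries]
        return all(acks)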

Critical Analysis

1) The authors and developers of the Google File System make trade-offs aggressively in their own favor. Unfortunately, the only other people in the world who could benefit from these decisions were other people at Google, or perhaps their direct competitors (and not for long, it appears).

2) The chunkservers run the file system as user-level server processes, which is less efficient than implementing the file system directly in the kernel.

3) Most consistency checks are pushed to the application, which must maintain record identifiers and checksums to ensure that the records it reads are consistent (a sketch of what this can look like follows this list). Google built not only the file system but also all of the applications running on top of it. While adjustments were continually made to GFS to accommodate new use cases, the applications themselves were also developed with the various strengths and weaknesses of GFS in mind, and this approach makes the life of the application developer quite difficult.

4) Clients that cache chunk locations could potentially read from a stale replica.

5) One flaw of the design is the decision to have a single master, which limits the availability of the system. Although the authors argue that a takeover can happen within seconds, I believe the most important implication is that a failed master might mean some operations are lost if they have not yet been recorded in the operation log. Relying on a quorum among multiple masters seems a straightforward extension and could provide better availability.
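Returning to point 3: as a rough illustration of what pushing consistency onto applications can mean in practice, the sketch below frames each record with its own identifier and checksum so that a reader can skip duplicates (from retried record appends) and corrupted or padded regions. The framing layout, field sizes, and use of MD5 are assumptions made for this example; the paper only says that applications rely on checksums and unique record identifiers.

    import hashlib
    import struct

    def encode_record(record_id, payload):
        """Frame a record as: 8-byte id, 4-byte length, 16-byte MD5 checksum, payload."""
        header = struct.pack(">QI", record_id, len(payload))
        return header + hashlib.md5(payload).digest() + payload

    def decode_records(buf):
        """Yield (id, payload) pairs, skipping corrupted regions and duplicates."""
        seen, pos = set(), 0
        while pos + 28 <= len(buf):
            record_id, length = struct.unpack(">QI", buf[pos:pos + 12])
            checksum = buf[pos + 12:pos + 28]
            payload = buf[pos + 28:pos + 28 + length]
            pos += 28 + length
            if hashlib.md5(payload).digest() != checksum:
                continue    # corrupted data or padding left by a failed append
            if record_id in seen:
                continue    # duplicate produced by a retried record append
            seen.add(record_id)
            yield record_id, payload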
