gfs-sosp2003.pdf (application/pdf Object)

The Google File System (pdf):

"First, component failures are the norm rather than the exception. The file system consists of hundreds or even thousands of storage machines ...

Second, files are huge by traditional standards. Multi-GB files are common. Each file typically contains many application objects such as web documents. When we are regularly working with fast growing data sets of many TBs comprising billions of objects, it is unwieldy to manage billions of approximately KB-sized files even when the file system could support it. As a result, design assumptions and parameters such as I/O operation and block sizes have to be revisited.

Third, most files are mutated by appending new data rather than overwriting existing data. Random writes within a file are practically non-existent."

Fascinating stuff.
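
To put the second point in rough numbers, here is a quick back-of-the-envelope sketch. The workload (10 TB of roughly 10 KB objects) and the 1 GB file size are my own assumptions for illustration, not figures from the paper, and `files_needed` is just a hypothetical helper.

```python
def files_needed(total_bytes: int, file_size: int) -> int:
    """Return how many files of file_size bytes are needed to hold total_bytes."""
    # Ceiling division using integers only.
    return -(-total_bytes // file_size)


KB = 1024
GB = 1024 ** 3
TB = 1024 ** 4

# Assumed workload (not from the paper): 10 TB of objects, about 10 KB each.
total = 10 * TB

# One file per object versus objects packed into 1 GB files.
small_files = files_needed(total, 10 * KB)
large_files = files_needed(total, 1 * GB)

print(f"one file per object: {small_files:,} files")
print(f"packed into 1 GB files: {large_files:,} files")
```

Under those assumptions, that is about a billion files to track versus roughly ten thousand, which is the gap the quote is pointing at.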

# Jul 11, 2006