An introduction to the IBM General Parallel File System (GPFS) 3.1


Welcome to an introduction to the IBM General Parallel File System (GPFS) 3.1. This course introduces, at a high level, several features of GPFS. After viewing this course you will be able to describe:

  • What a clustered file system is
  • The high availability features of GPFS
  • The cluster configuration options available
  • The information lifecycle management (ILM) features of GPFS

Evolution  
Before looking at features, it is useful to understand where GPFS originated. GPFS was introduced in 1998 based on the Tiger Shark research project. The goal of the Tiger Shark project was to create a file system to support streaming media environments. Wide striping was introduced as part of this work, and the supported platform was AIX. A bit of trivia: this is also the origin of the GPFS naming conventions, which is why the commands start with mm and why GPFS resides in the mmfs directory. The prefix mmfs stands for multi-media file system. In 1998 it was recognized that high performance computing (HPC) environments could benefit from such a parallel file system, so support for POSIX semantics, large blocks and other features was added at that time.

The first embedded use of GPFS was in 2002, in the IBM Virtual Tape Server (VTS). At the same time, support for Linux clusters was introduced. With the release of version 2.3 in 2005, the requirement for external shared disk and clustering software packages was removed. At this point GPFS gained market share in solutions outside high performance computing, including digital media and relational databases. GPFS 3.1 integrated the functionality of SAN File System (SFS) into GPFS, introducing Information Lifecycle Management (ILM) tools. In addition to ILM, features were added to simplify administration and improve scalability.

By early 2008, GPFS will introduce support for Windows while continuing to improve usability, scalability and performance.

Works
To begin, let us take a look at an example of how GPFS works. In this example there are four servers, all attached to a storage area network (SAN). As shown here, each server normally has a distinct set of disks available. Multiple SANs are shown, though this could be a single SAN with separate zones; the effect is the same.

To create a cluster, the first step is to install GPFS on all of the systems that require direct access to the storage. In this example, all of the nodes are added to a single GPFS cluster. The SAN is then rezoned so that every node sees every disk on the SAN, which gives all nodes high speed access to the whole pool of storage. Once the cluster is created and the disks are assigned, you can create a file system, /home in this case. With this configuration all the data in the file system is accessible directly across the SAN from all nodes, and applications can use the data as if it were in a local file system. GPFS maintains data integrity by handling concurrent access to the data using a distributed token management system.
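The idea behind token-based concurrency control can be illustrated with a small sketch. This is not the GPFS protocol, which this course does not describe in detail; it is a toy, single-process model that assumes one token per file and two modes, shared (read) and exclusive (write). The names TokenManager, acquire and holds are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class TokenManager:
    """Toy token manager: one token per file, in shared or exclusive mode.

    Illustrative only; this is an assumption-laden sketch, not how GPFS
    implements its distributed token management.
    """

    def __init__(self):
        # file path -> {node name: mode}, where mode is "shared" or "exclusive"
        self.holders = defaultdict(dict)
        # log of (node, path) pairs whose tokens were revoked
        self.revocations = []

    def acquire(self, node, path, mode):
        """Grant `node` a token on `path`, revoking conflicting tokens first."""
        if mode not in ("shared", "exclusive"):
            raise ValueError(f"unknown mode: {mode}")
        current = self.holders[path]
        for other, other_mode in list(current.items()):
            if other == node:
                continue
            # Two shared tokens can coexist; any other pairing conflicts.
            if mode == "exclusive" or other_mode == "exclusive":
                del current[other]
                self.revocations.append((other, path))
        current[node] = mode

    def holds(self, node, path, mode):
        """Return True if `node` holds a token that permits `mode` on `path`."""
        held = self.holders[path].get(node)
        # An exclusive token also permits shared (read) access.
        return held == "exclusive" or held == mode


tm = TokenManager()
tm.acquire("node1", "/home/a.dat", "shared")     # two readers can coexist
tm.acquire("node2", "/home/a.dat", "shared")
tm.acquire("node3", "/home/a.dat", "exclusive")  # a writer revokes both readers
```

In this sketch, a node that wants to write must first hold the exclusive token, so no other node can be reading or writing stale data at the same moment. A real distributed implementation would also have to flush cached data on revocation and cope with node failure, which the sketch leaves out.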