Organizations are generating and collecting ever-growing volumes of data and analyzing it to produce business insights. Different approaches to storing and analyzing that data have emerged, each suggesting a different model for where to place the data and where to run the analytics. Distributed systems such as Hadoop and Apache Spark now complement traditional data analytics applications, and both require rapid access to large quantities of data from all sources and of all types.
Facing rapid data growth and an insatiable need for faster time to results, organizations are making greater use of scale-out architectures to dynamically scale both performance and capacity to meet future demand. Scale-out storage architectures are built by adding independent nodes, each contributing processing power as well as capacity.
However, evaluating different scale-out architectures can be confusing. For example, there is a significant difference between accessing data over scale-out network-attached storage (NAS) and using a scale-out file system. This paper explains that difference to help you choose an effective solution for your storage environment and avoid data-related bottlenecks that can slow performance.
The difference between a file system and NAS
Everyone who uses a computer relies on file systems, even without realizing it. The file system is the technology that stores and retrieves data and provides the structure for locating it. In the case of files, that structure consists of directories and folders presented to the user; for objects, it is a unique name. A file system is responsible for data integrity and consistency, managing user access, and writing data to disk or other media.
NAS allows client computers to access files on a remote server, typically over an industry-standard protocol such as Network File System (NFS) or Microsoft's Server Message Block (SMB). Underneath the NAS protocol, the server can use any file system. NAS provides central sharing of data across heterogeneous systems, and NAS servers can deliver good performance if network capacity and latency are sufficient. A NAS file system can be scaled, but only to certain limits.
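This transparency is easy to see in code. The short Python sketch below performs ordinary file operations against a mount point; the same code runs unchanged whether the path refers to a local disk or to an NFS or SMB share mounted by the operating system. The mount point /mnt/nas_share is a hypothetical example, not a fixed convention.

```python
from pathlib import Path

# Hypothetical mount point: an NFS or SMB share mounted by the OS looks
# exactly like a local directory to application code.
DATA_ROOT = Path("/mnt/nas_share")  # could just as well be Path("/tmp/local")

def write_and_read(root: Path) -> str:
    """Create a directory tree, write a file, and read it back.

    The file system underneath (local, or behind a NAS protocol) handles
    naming, placement on media, and consistency; the application sees
    only paths.
    """
    reports = root / "reports" / "2024"
    reports.mkdir(parents=True, exist_ok=True)

    report = reports / "summary.txt"
    report.write_text("quarterly summary\n")

    return report.read_text()

if __name__ == "__main__":
    print(write_and_read(DATA_ROOT))
```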
Scale-out file systems such as IBM® Spectrum Scale™ offer the unique option of a local client that appears as a local file system but is actually part of a larger, highly scalable file system that can be shared across heterogeneous systems. As a result, IBM Spectrum Scale combines the application simplicity of a local file system with the sharing benefits of NAS. Unlike NAS, however, the IBM Spectrum Scale client can access disks directly and in parallel.
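To make the idea of parallel access concrete, here is a rough user-space sketch in Python: several workers read different byte ranges of one file concurrently. This is only an analogy, not the Spectrum Scale implementation, which stripes I/O across disks inside the file system layer; the stripe size, worker count, and the mount path /gpfs/fs1/big_input.dat are all hypothetical.

```python
import concurrent.futures
import os

# Conceptual sketch only: the Spectrum Scale client performs parallel,
# striped I/O internally at the block level. This user-space approximation
# merely illustrates the idea by reading ranges of one file concurrently.
STRIPE_SIZE = 4 * 1024 * 1024  # 4 MiB per read; an arbitrary choice

def read_stripe(path: str, offset: int, length: int) -> bytes:
    """Read one byte range of the file; each call proceeds independently."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

def parallel_read(path: str, workers: int = 4) -> bytes:
    size = os.path.getsize(path)
    offsets = range(0, size, STRIPE_SIZE)
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves offset order, so the stripes reassemble correctly.
        stripes = pool.map(lambda off: read_stripe(path, off, STRIPE_SIZE),
                           offsets)
    return b"".join(stripes)

if __name__ == "__main__":
    data = parallel_read("/gpfs/fs1/big_input.dat")  # hypothetical GPFS mount
    print(f"read {len(data)} bytes")
```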
The IBM General Parallel File System (GPFS™) protocol at the heart of IBM Spectrum Scale can scale in both capacity and performance, avoiding the bottlenecks typically associated with NAS. IBM Spectrum Scale provides a single namespace across clients on different systems over the network.
A proven, high-performance data management solution, IBM Spectrum Scale is used extensively across multiple industries worldwide. IBM has invested in IBM Spectrum Scale enterprise features that make it reliable, easy to use, and suitable for mission-critical storage of all types, helping to ensure maximum data availability, integrity and security. IBM Spectrum Scale is part of the IBM Spectrum Storage™ family of software-defined storage offerings designed to address today's operational storage challenges and business needs for maximum storage optimization.
Scale-out NAS and remote file systems
A scale-out NAS system comprises multiple NAS storage nodes with internal or external storage (Figure 1). Each node presents the same file systems to user applications. Data is stored across all of the storage nodes, and capacity can be increased by adding nodes. Data arriving over a NAS protocol such as NFS typically has to be converted to another file system protocol before its blocks are stored on disk.
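The Python sketch below illustrates the underlying idea of spreading data across nodes. It is a generic example, not any vendor's actual placement algorithm: each block of a file is deterministically hashed to one of the storage nodes, so adding nodes adds both capacity and aggregate bandwidth. The node names and file path are hypothetical.

```python
import hashlib

# Illustrative only: real scale-out NAS systems use vendor-specific
# placement and rebalancing logic. This sketch shows the basic idea of
# mapping each data block to one of several storage nodes.
NODES = ["node-a", "node-b", "node-c"]  # hypothetical storage nodes

def place_block(file_path: str, block_index: int, nodes: list[str]) -> str:
    """Deterministically map a (file, block) pair to a storage node."""
    key = f"{file_path}:{block_index}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return nodes[digest % len(nodes)]

if __name__ == "__main__":
    for i in range(6):
        node = place_block("/data/video.mp4", i, NODES)
        print(f"block {i} of /data/video.mp4 -> {node}")
```

Note that this naive modulo scheme would reshuffle most blocks whenever a node is added; production systems instead rely on techniques such as consistent hashing or metadata services to limit data movement during expansion.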