XFS

From Nekochan
Revision as of 08:27, 5 June 2012 by Smj (Talk | contribs) (Important Patches: - formatting/spacing)

XFS is a high-performance journaling file system created by Silicon Graphics for their IRIX operating system.

XFS has been merged into the mainline Linux 2.4 (as of 2.4.25, when Marcelo Tosatti judged it stable enough) and 2.6 kernels, making it almost universally available on Linux systems. Installation programs for the SuSE, Gentoo Linux, Mandriva, Slackware, Zenwalk, Fedora Core, Ubuntu Linux and Debian Linux distributions all offer XFS as a choice of filesystem. FreeBSD gained read-only support for XFS in December 2005 and in June 2006 experimental write support was introduced to FreeBSD-7.0-CURRENT.

History

XFS is one of the oldest journaling file systems available for UNIX systems, and has a mature, stable and well-debugged codebase. Silicon Graphics began developing XFS in 1993, and it was first deployed on IRIX 5.3 in 1994. The filesystem was released under the GNU General Public License in May 2000 and ported to Linux, with the first distribution support appearing in 2001/2002. It is available in almost all Linux distributions today.

Specifications

Capacity

XFS is a 64-bit journaling file system with guaranteed file system consistency. It supports a maximum file system size of roughly 9 exabytes (2^63 - 1 bytes, that is, 8 exbibytes minus one byte), though this is subject to block limits imposed by the host operating system. On 32-bit Linux systems, this limits the file and file system sizes to 16 terabytes.
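The capacity figures above can be checked with a little arithmetic. The sketch below is only illustrative: it assumes signed 64-bit byte offsets and, for the 32-bit Linux case, a 32-bit page index with 4 KiB pages. Neither assumption is spelled out in the text above.

```python
# Back-of-the-envelope check of the capacity figures quoted above.
# Assumptions (not stated in the article): signed 64-bit byte offsets,
# and a 32-bit page index with 4 KiB pages on 32-bit Linux.

EXABYTE = 10**18
TEBIBYTE = 2**40

max_offset_bytes = 2**63 - 1           # largest signed 64-bit offset
page_size = 4096                       # assumed page size in bytes
max_32bit_bytes = 2**32 * page_size    # pages addressable with a 32-bit index

print(f"64-bit limit: {max_offset_bytes / EXABYTE:.2f} EB")
print(f"32-bit Linux limit: {max_32bit_bytes // TEBIBYTE} TiB")
```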

Journaling

XFS provides journaling for file system metadata, where file system updates are first written to a serial journal before the actual disk blocks are updated. The journal is a circular buffer of disk blocks that is never read in normal filesystem operation. It can be stored within the data section of the filesystem (an internal log), or on a separate device to minimise disk contention. On XFS the journal contains 'logical' entries that describe at a high level what operations are being performed, as opposed to other filesystems with 'physical' journals that store a copy of the blocks modified during each transaction. Journal updates are performed asynchronously to avoid incurring a performance penalty. In the event of a system crash, operations immediately prior to the crash can be redone using data in the journal, which allows XFS to guarantee file system consistency. Recovery is performed automatically at file system mount time, and the recovery speed is independent of the size of the file system. Where recently modified data has not been flushed to disk before a system crash, XFS ensures that any unwritten data blocks are zeroed on reboot, obviating any possible security issues arising from residual data.
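The idea of a logical journal that is replayed at mount time can be sketched in a few lines. This is a toy model, not the XFS on-disk log format: the class and function names (`LogicalJournal`, `apply_operation`, `recover`) and the three operation types are hypothetical, and a real log would never overwrite entries that have not yet reached their final location on disk.

```python
from collections import deque

class LogicalJournal:
    """Toy write-ahead log holding high-level operation records."""

    def __init__(self, capacity):
        # A deque with maxlen acts as a circular buffer: once full,
        # appending drops the oldest entry.
        self.entries = deque(maxlen=capacity)

    def log(self, op, *args):
        self.entries.append((op, args))

def apply_operation(metadata, op, args):
    """Apply one logged operation; each operation is safe to repeat."""
    if op == "create":
        metadata[args[0]] = 0
    elif op == "set_size":
        metadata[args[0]] = args[1]
    elif op == "remove":
        metadata.pop(args[0], None)

def recover(on_disk_metadata, journal):
    """Replay the journal over the last on-disk metadata state."""
    state = dict(on_disk_metadata)
    for op, args in journal.entries:
        apply_operation(state, op, args)
    return state

journal = LogicalJournal(capacity=8)
journal.log("create", "a.txt")
journal.log("set_size", "a.txt", 4096)
journal.log("remove", "old.txt")

# Pretend the system crashed before these changes reached the disk.
recovered = recover({"old.txt": 100}, journal)
print(recovered)
```

Note that the replay loop walks only the journal entries, so the work done at recovery depends on the size of the journal rather than on the size of the file system.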

Allocation groups

XFS filesystems are internally partitioned into allocation groups, which are equally sized linear regions within the file system. Files and directories can span allocation groups. Each allocation group manages its own inodes and free space separately, providing scalability and parallelism: multiple threads and processes can perform I/O operations on the same filesystem simultaneously. This architecture helps to optimise parallel I/O performance on multiprocessor or multicore systems, as metadata updates are also parallelisable. The internal partitioning provided by allocation groups can be especially beneficial when the file system spans multiple physical devices, allowing for optimal usage of the bandwidth of the underlying storage components.
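A rough model of why per-group bookkeeping helps with parallelism: if each allocation group has its own lock and free-space counter, threads working in different groups never contend with each other. The class name, the group size and the `ag_of_block` helper below are hypothetical and chosen only for illustration.

```python
import threading

class AllocationGroup:
    """Toy allocation group with its own lock and free-space counter."""

    def __init__(self, index, size_blocks):
        self.index = index
        self.free_blocks = size_blocks
        self.lock = threading.Lock()  # each group is locked independently

    def allocate(self, count):
        with self.lock:
            if count > self.free_blocks:
                return False
            self.free_blocks -= count
            return True

AG_SIZE = 1000  # hypothetical group size in blocks
groups = [AllocationGroup(i, AG_SIZE) for i in range(4)]

def ag_of_block(block_number):
    """Groups are equally sized linear regions, so a division finds the owner."""
    return block_number // AG_SIZE

def worker(group):
    for _ in range(100):
        group.allocate(5)

# One thread per group; none of them ever waits on another group's lock.
threads = [threading.Thread(target=worker, args=(g,)) for g in groups]
for t in threads:
    t.start()
for t in threads:
    t.join()

print([g.free_blocks for g in groups])
print(ag_of_block(2500))
```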

Striped allocation

If an XFS filesystem is to be created on a RAID array, a stripe unit can be specified when the file system is created. This maximises throughput by ensuring that data allocations, inode allocations and the internal log (journal) are aligned with the stripe unit.
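Aligning an allocation with the stripe unit amounts to rounding its starting offset up to the next stripe-unit boundary. The sketch below assumes a 64 KiB stripe unit across four data disks; both values and both function names are hypothetical.

```python
STRIPE_UNIT = 64 * 1024   # hypothetical: 64 KiB written to each disk in turn
STRIPE_WIDTH = 4          # hypothetical: number of data disks in the array

def align_up(offset, stripe_unit):
    """Round offset up to the next multiple of stripe_unit."""
    return -(-offset // stripe_unit) * stripe_unit

def disk_for_offset(offset):
    """Which disk of the array holds the stripe unit containing offset."""
    return (offset // STRIPE_UNIT) % STRIPE_WIDTH

start = align_up(100_000, STRIPE_UNIT)
print(start, disk_for_offset(start))
```

An allocation that starts on a stripe-unit boundary and is a multiple of the stripe unit in length never straddles two disks within a single unit, which is the property the alignment is meant to secure.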

Extent based allocation

Space in files stored on XFS filesystems is managed in variable length extents, as opposed to the fixed size blocks used by many other file systems. Many file systems manage space allocation with block oriented bitmaps — in XFS these structures are replaced with an extent oriented structure consisting of a pair of B+ trees for each filesystem allocation group (AG). One of the B+ trees is indexed by the length of the free extents, while the other is indexed by the starting block of the free extents. This dual indexing scheme allows for highly efficient searching for appropriate free extents for file system operations.
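The dual indexing scheme can be imitated with two sorted lists standing in for the two B+ trees: one ordered by extent length, the other by starting block. This is a minimal sketch under that substitution, with hypothetical class and method names; it does not reflect the real on-disk structures.

```python
import bisect

class FreeSpaceIndex:
    """Free extents indexed twice: by (length, start) and by (start, length)."""

    def __init__(self, extents):
        # extents is an iterable of (start_block, length) pairs
        self.by_size = sorted((length, start) for start, length in extents)
        self.by_start = sorted((start, length) for start, length in extents)

    def _remove(self, start, length):
        self.by_size.remove((length, start))
        self.by_start.remove((start, length))

    def _insert(self, start, length):
        bisect.insort(self.by_size, (length, start))
        bisect.insort(self.by_start, (start, length))

    def allocate_best_fit(self, want):
        """Take `want` blocks from the smallest free extent that is large enough."""
        i = bisect.bisect_left(self.by_size, (want, 0))
        if i == len(self.by_size):
            return None  # no free extent is big enough
        length, start = self.by_size[i]
        self._remove(start, length)
        if length > want:
            # Return the unused tail of the extent to both indexes.
            self._insert(start + want, length - want)
        return start

    def first_extent_starting_at_or_after(self, block):
        """Locality search: nearest free extent starting at or after block."""
        i = bisect.bisect_left(self.by_start, (block, 0))
        return self.by_start[i] if i < len(self.by_start) else None

index = FreeSpaceIndex([(0, 10), (50, 4), (100, 64)])
first = index.allocate_best_fit(4)    # exact fit in the 4-block extent
second = index.allocate_best_fit(8)   # carved out of the 10-block extent
nearby = index.first_extent_starting_at_or_after(60)
print(first, second, nearby)
```

The size index answers "which extent fits this request", while the start index answers "what is free near this block"; keeping both avoids scanning the whole free list for either question.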

Variable block sizes

The file system block size represents the minimum allocation unit. XFS allows file systems to be created with block sizes ranging between 512 bytes and 64 kilobytes, allowing the file system to be tuned for the expected use. Where a large number of small files is expected, a small block size would typically be used, but for a system dealing mainly with large files, a larger block size can provide a performance advantage.
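The trade-off for small files is easy to quantify: because the block is the minimum allocation unit, every file occupies a whole number of blocks. The sketch below uses a hypothetical 700-byte file and three block sizes from the supported range.

```python
def allocated_bytes(file_size, block_size):
    """Space consumed when a file must occupy whole blocks."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size

small_file = 700  # hypothetical file size in bytes
usage = {}
for block_size in (512, 4096, 65536):
    used = allocated_bytes(small_file, block_size)
    usage[block_size] = used
    print(f"block size {block_size}: {used} bytes used, {used - small_file} wasted")
```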

Delayed allocation

XFS makes use of lazy evaluation techniques for file allocation. When a file is written to the buffer cache, rather than allocating extents for the data, XFS simply reserves the appropriate number of file system blocks for the data held in memory. The actual block allocation occurs only when the data is finally flushed to disk. This improves the chance that the file will be written in a contiguous group of blocks, reducing fragmentation problems and increasing performance.
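A toy model of the reserve-now, place-later behaviour described above. The `DelayedAllocator` class and its trivial "next free block" placement policy are hypothetical; the point is only that interleaved writes to two files still end up as one contiguous extent each, because placement is decided at flush time.

```python
class DelayedAllocator:
    """Toy model: writes reserve space, flushes choose the actual blocks."""

    def __init__(self, total_blocks):
        self.free_blocks = total_blocks
        self.next_block = 0
        self.reserved = {}   # file name -> blocks reserved but not yet placed
        self.extents = {}    # file name -> list of (start, length)

    def write(self, name, blocks):
        # Only account for the space; do not decide where it goes yet.
        if blocks > self.free_blocks:
            raise OSError("no space left")
        self.free_blocks -= blocks
        self.reserved[name] = self.reserved.get(name, 0) + blocks

    def flush(self, name):
        # Place everything buffered for this file as a single extent.
        length = self.reserved.pop(name, 0)
        if length:
            self.extents.setdefault(name, []).append((self.next_block, length))
            self.next_block += length

fs = DelayedAllocator(total_blocks=100)
for _ in range(3):            # interleaved writes to two files
    fs.write("a.log", 2)
    fs.write("b.log", 2)
fs.flush("a.log")
fs.flush("b.log")
print(fs.extents)
```

Had blocks been assigned at each `write` call instead, the two files would have been interleaved on disk in two-block pieces.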

Sparse files

XFS provides a 64-bit sparse address space for each file, which allows both for very large file sizes, and for holes within files for which no disk space is allocated. As the file system uses an extent map for each file, the file allocation map size is kept small. Where the size of the allocation map is too large for it to be stored within the inode, the map is moved into a B+ tree which allows for rapid access to data anywhere in the 64-bit address space provided for the file.
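Holes can be observed from user space on any file system that supports sparse files. The sketch below writes a few bytes, seeks 1 MiB forward and writes again; the file name is arbitrary, and the `st_blocks` figure (assumed here to be in 512-byte units, as on Linux) will vary with the file system in use.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sparse.bin")
    with open(path, "wb") as f:
        f.write(b"head")
        f.seek(1024 * 1024)   # skip ahead 1 MiB without writing anything
        f.write(b"tail")

    info = os.stat(path)
    logical_size = info.st_size
    # st_blocks is not available on every platform, hence the getattr.
    allocated = getattr(info, "st_blocks", 0) * 512

    with open(path, "rb") as f:
        f.seek(4)
        hole_sample = f.read(16)  # reading a hole returns zero bytes

print(logical_size, allocated, hole_sample)
```

On a file system with sparse file support, the allocated figure is typically far smaller than the logical size, since the hole between the two writes consumes no disk blocks.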

Extended attributes

XFS provides multiple data streams for files through its implementation of extended attributes. These allow the storage of a number of name/value pairs attached to a file. Names are null-terminated printable character strings of up to 256 bytes in length, while their associated values can contain up to 64 KB of binary data. They are further subdivided into two namespaces, root and user. Extended attributes stored in the root namespace can be modified only by the superuser, while attributes in the user namespace can be modified by any user with permission to write to the file. Extended attributes can be attached to any kind of XFS inode, including symbolic links, device nodes, directories, etc. The attr program can be used to manipulate extended attributes from the command line, and the xfsdump and xfsrestore utilities are aware of them and will back up and restore their contents. Most other backup systems are not extended attribute aware.
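On Linux, user-namespace attributes can also be set programmatically. The sketch below uses Python's `os.setxattr` and `os.getxattr`, which exist only on Linux; the helper name, attribute name and value are hypothetical, and the function returns None when the platform or the underlying file system does not support user attributes.

```python
import os
import tempfile

def roundtrip_user_attribute(path, name, value):
    """Set and read back a user-namespace attribute; None if unsupported."""
    if not hasattr(os, "setxattr"):   # the os.*xattr functions are Linux-only
        return None
    try:
        # On Linux, user-namespace names carry a "user." prefix.
        os.setxattr(path, "user." + name, value)
        return os.getxattr(path, "user." + name)
    except OSError:
        # Raised, for example, when the file system has no xattr support.
        return None

with tempfile.NamedTemporaryFile() as f:
    result = roundtrip_user_attribute(f.name, "origin", b"scanner-3")

print(result)
```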

Direct I/O

For applications requiring high throughput to disk, XFS provides a direct I/O implementation that allows non-cached I/O directly to userspace. Data is transferred between the application's buffer and the disk using DMA, which allows access to the full I/O bandwidth of the underlying disk devices.

Guaranteed rate I/O

The XFS guaranteed rate I/O system provides an API that allows applications to reserve bandwidth to the filesystem. XFS will dynamically calculate the performance available from the underlying storage devices, and will reserve bandwidth sufficient to meet the requested performance for a specified time. This feature is unique to the XFS file system. Guarantees can be hard or soft, representing a trade-off between reliability and performance, though XFS will only allow hard guarantees if the underlying storage subsystem supports it. This facility is mostly used by real-time applications, such as video streaming.

DMAPI

XFS implements the DMAPI interface to support Hierarchical Storage Management. While this functionality has been ported to the Linux implementations of XFS, it is not yet a part of the mainline Linux kernel source.

Snapshots

XFS does not provide direct support for snapshots, as it expects the snapshot process to be implemented by the volume manager. Taking a snapshot of an XFS filesystem involves freezing I/O to the filesystem using the xfs_freeze utility, having the volume manager perform the actual snapshot, and then unfreezing I/O to resume normal operations. The snapshot can then be mounted read-only for backup purposes. XFS releases on IRIX incorporated an integrated volume manager called XLV. This volume manager has not been ported to Linux. In recent Linux kernels, the xfs_freeze functionality is implemented in the VFS layer, and happens automatically when the volume manager's snapshot functionality is invoked.
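The manual freeze, snapshot, unfreeze sequence can be scripted. The sketch below assumes LVM as the volume manager and uses hypothetical mount point, volume and snapshot names; it defaults to a dry run that only prints the commands. On kernels where the freeze happens automatically in the VFS layer, the two xfs_freeze steps are unnecessary.

```python
import subprocess

def snapshot_commands(mount_point, volume, snapshot_name, size="1G"):
    """Build the freeze / snapshot / unfreeze command sequence."""
    return [
        ["xfs_freeze", "-f", mount_point],   # suspend writes to the filesystem
        ["lvcreate", "--snapshot", "--size", size,
         "--name", snapshot_name, volume],   # assumes LVM is the volume manager
        ["xfs_freeze", "-u", mount_point],   # resume normal I/O
    ]

def run_snapshot(mount_point, volume, snapshot_name, dry_run=True):
    freeze, snapshot, unfreeze = snapshot_commands(
        mount_point, volume, snapshot_name)
    if dry_run:
        for cmd in (freeze, snapshot, unfreeze):
            print(" ".join(cmd))
        return
    subprocess.run(freeze, check=True)
    try:
        subprocess.run(snapshot, check=True)
    finally:
        # Always unfreeze, even if the snapshot step failed.
        subprocess.run(unfreeze, check=True)

# Hypothetical names; prints the commands without executing them.
run_snapshot("/data", "/dev/vg0/data", "data_snap")
```

The try/finally around the snapshot step matters in practice: a filesystem left frozen blocks every process that tries to write to it.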

Online defragmentation

Although the extent based nature of XFS and the delayed allocation strategy it uses significantly improve the file system's resilience to fragmentation problems, XFS provides a filesystem defragmentation utility (xfs_fsr) that can defragment a mounted and active XFS filesystem. Note that xfs_fsr is usually part of the xfsdump package, not xfsprogs.

Online resizing

XFS provides the xfs_growfs utility to perform online resizing of XFS file systems. XFS filesystems can only be grown, not shrunk, and growing the filesystem requires there to be remaining unallocated space on the device holding the filesystem. This feature is typically used in conjunction with volume management, as otherwise the partition holding the filesystem will need enlarging separately.

Native backup/restore utilities

XFS provides the xfsdump and xfsrestore utilities to aid in backup of data held on XFS file systems. The xfsdump utility backs up an XFS filesystem in inode order, and in contrast to traditional UNIX file systems which must be unmounted before dumping to guarantee a consistent dump image, XFS file systems can be dumped while the file system is in use. XFS dumps and restores are also resumable, and can be interrupted without difficulty. The multi-threaded operation of xfsdump provides high performance of backup operations by splitting the dump into multiple streams, which can be sent to different dump destinations. The multi stream capabilities have not been fully ported to Linux yet, however.

Disadvantages

  • There is currently no way to shrink an XFS filesystem in place, although a patch discussed in September 2005 on SGI's XFS email list mentions planned support for shrinking XFS filesystems (Iustin Pop, September 5, 2005, "Where/how to submit for review patch for shrink support?", http://oss.sgi.com/archives/xfs/2005-09/msg00039.html).
  • Older versions of XFS suffered from out-of-order write hazards, which could result in problems such as files that were being appended to during a crash gaining a tail of garbage on the next mount.
  • Versions of the GNU GRUB bootloader prior to 0.91 do not support XFS.
  • Recovering deleted files from an XFS filesystem is almost impossible (though this can also be an advantage).
  • For Windows/Linux dual boot systems, one disadvantage of XFS is that it cannot be read from Windows, unlike the older ext2/3 Linux filesystems and ReiserFS (using a plugin for Total Commander).
  • On the Linux XFS implementations, compatibility issues between 64-bit and 32-bit environments exist.
  • XFS journals metadata only, not file data. If power is lost suddenly, XFS will be in a consistent state when the machine is restarted (i.e. directories will be visible and the files they contain can be listed), but data from files that were open at the time of the power loss will probably have been lost, because XFS does not journal data changes (SGI, August 28, 2006, "XFS FAQ", http://oss.sgi.com/projects/xfs/faq.html#wcache). XFS is not alone here: some other file systems, such as JFS2, also journal metadata changes and not data changes, because this is a good compromise between speed requirements and safety.
  • Since Linux moved to 4K kernel stacks, XFS has become unstable, randomly causing stack overflows and system hangs, especially when chained with additional storage technologies such as software RAID (md), Logical Volume Management (LVM), and/or export via NFS.
  • Ubuntu (currently at version 6.10) does not allow XFS to be used on the boot partition.

Important Patches

IRIX Patch 7144: xfsdump rollup for 6.5.30 11-Jan-2007 Requires Support Program to Access. Patch 7146

IRIX Patch 7185: Parallel XFS Repair for 6.5.29 and 6.5.30 25-Oct-2007 Requires Support Program to Access

IRIX Patch 7245: 6.5.30 XFS Rollup #5 27-May-2011 Requires Support Program to Access
