|Learning About Computers and the Internet|
The characteristics and properties of the two common Windows file systems are discussed.
There are many possible ways to organize the information, programs, and data that we store on our computer hard drives. The system of collecting information together in “files”, which in turn are grouped in “directories” or “folders”, provides a method for naming and addressing information that is familiar to most PC users. But the mechanics of how the information is actually physically placed on the hard drive and retrieved are not something most of us ever think about. Nonetheless, with the advent of Windows XP and its file system, NTFS 3.1 (also known as NTFS 5.1; the numbering isn’t consistent), the time has come when this seemingly esoteric subject should not be ignored. PC users who are changing over to XP from Windows 9X/Me should be aware that NTFS (New Technology File System) has characteristics unfamiliar to most home PC users, whose systems employ a different file system called FAT (File Allocation Table). Those upgrading an older computer to Windows XP will face the decision of which file system to use. Those who buy a new computer with XP will almost certainly have NTFS already installed. In either event some knowledge of the workings of the file systems would seem desirable even for the average PC user. I don’t pretend to be an expert on file system architecture, but in this article I will outline some of the characteristics of the two file systems, FAT and NTFS, and their pros and cons.
Before discussing some of the issues it is necessary to outline briefly some basics about the way disks are organized. Before a physical medium such as a disk can store data it must be put into a state usable by the computer operating system. In order for the system to be able to systematically allocate information to disk space, disks are divided up into little boxes or sectors. Low-level formatting assigns 512 bytes to each sector. (A number other than 512 could be used but standard practice has settled on this particular size.) These sectors in turn are grouped into clusters (sometimes called “allocation units” by Microsoft) by the operating system. All clusters are given the same size during a high-level format and typically run from 2 to 16 sectors. Each file then occupies one or more of these clusters. (It is also possible to have a file system that directly assigns sectors to files, as is done by the file system HPFS in the IBM operating system OS/2.) The cluster size depends on the operating system and several variables, including the size of the hard disk or its partitions, and is a key factor in determining operating system efficiency and speed.
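The relationship between sectors, clusters, and files can be sketched in a few lines of Python. This is only an illustration: the function names and the 10,000-byte sample file are my own assumptions, and the only figure taken from the text is the 512-byte sector.

```python
SECTOR_BYTES = 512  # standard sector size mentioned in the text


def cluster_bytes(sectors_per_cluster: int) -> int:
    """Size in bytes of one cluster (allocation unit)."""
    return sectors_per_cluster * SECTOR_BYTES


def clusters_needed(file_bytes: int, sectors_per_cluster: int) -> int:
    """Number of whole clusters a file of the given size occupies."""
    size = cluster_bytes(sectors_per_cluster)
    return -(-file_bytes // size)  # ceiling division: a partly used cluster still counts


# A hypothetical 10,000-byte file at three different cluster sizes.
for spc in (2, 8, 16):
    print(spc, "sectors/cluster:", cluster_bytes(spc), "bytes per cluster,",
          clusters_needed(10_000, spc), "clusters used")
```

The same file needs fewer clusters as the cluster gets larger, but each cluster it does use is claimed in full.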
In addition to the area where files for data and programs are stored, there are several other distinct areas set aside on the disk for basic system operations. Without going into the gory details, which depend on the particular operating system, there is an area (or areas) for boot processes and an area (or areas) providing information on the physical location and the properties of the data and program files. After the BIOS is finished at bootup, the boot area(s) provide the means for continuing the computer startup process and for loading the operating system. Each operating system carries this out in its own different way. Each operating system also has its own particular way of storing information about the attributes and actual physical whereabouts of individual files. When the computer operation requires a specific file for some purpose, it is this file information that allows the system to find and load that file from disk into RAM for processing. Here FAT and NTFS have completely different approaches.
FAT File System
FAT gets its name from the use of a kind of database called a File Allocation Table that contains an entry for each cluster on the disk. The FAT system has been in use by Microsoft since before DOS 1 (the first version was devised in 1977 by Bill Gates and Marc McDonald) and has undergone several revisions. There are versions called FAT12, FAT16, and FAT32. The numbers refer to the number of bits used for the cluster entries in the table. More recent PC users may find it hard to believe but in 1987 the FAT system then in use (in DOS 3) was unable to read a hard drive (or more accurately, volume) bigger than 32 MB. (That’s right, 32 megabytes.) By the time of DOS 6, the upper limit had been enlarged in several steps to 2 GB but the ever-increasing size of hard disks made yet another revision necessary. With Windows 95B, FAT32 was introduced, increasing the upper limit to 2 terabytes (theoretically but not practically). These continual problems with disk size arose from several causes, including the fact that the number of entries in FAT is limited by the finite number of bits used for describing the location of a cluster. For example, FAT16 can hold no more than 2^16 or 65,536 cluster entries (actually somewhat fewer, since some values are reserved). Another factor is that the number of sectors per cluster is also limited.
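To see how the two limits combine, here is a rough calculation of the largest volume a FAT table can address. It is a simplified upper bound, not an exact figure: it ignores the reserved entries, and the choice of 64 sectors per cluster (32 KB clusters) for FAT16 is my assumption about the largest commonly used cluster size.

```python
SECTOR_BYTES = 512  # standard sector size


def max_cluster_entries(fat_bits: int) -> int:
    """Upper bound on table entries for a given entry width in bits."""
    return 2 ** fat_bits


def max_volume_bytes(fat_bits: int, sectors_per_cluster: int) -> int:
    """Largest addressable volume: entries x sectors per cluster x bytes per sector."""
    return max_cluster_entries(fat_bits) * sectors_per_cluster * SECTOR_BYTES


GIB = 1024 ** 3
print(max_cluster_entries(16))             # number of FAT16 entries before reserved values
print(max_volume_bytes(16, 64) / GIB)      # FAT16 with assumed 32 KB clusters, in GB
```

With these assumptions the FAT16 ceiling works out to 2 GB, which matches the DOS 6 limit mentioned above. The only ways to go further are more bits per entry or bigger clusters.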
A further problem with bigger disks is the large amount of wasted space or “slack”. Since there is a fixed number of clusters available, larger disks mean that the cluster size has to be increased in order to fill the available space. However, this results in more and more unutilized disk space since a typical file is rarely close to an exact multiple of the cluster size. For example, a FAT32 system uses 16 KB clusters for partition sizes between 16 and 32 GB. A 20 KB file would require two 16 KB clusters, actually occupying 32 KB of space. A mere 1 KB file still requires 16 KB of space. A typical large disk might have 30% or even 40% of its space wasted this way. Making smaller partitions alleviates slack but with 200 GB disks now common, and ever-bigger ones on the way, partitioning is no longer a practical solution.
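The slack arithmetic in the example above is easy to check. The sketch below uses the 16 KB cluster size and the 20 KB and 1 KB files from the text; the function names are mine.

```python
KB = 1024


def allocated_bytes(file_bytes: int, cluster_size: int) -> int:
    """Disk space actually claimed: whole clusters only."""
    clusters = -(-file_bytes // cluster_size)  # ceiling division
    return clusters * cluster_size


def slack_bytes(file_bytes: int, cluster_size: int) -> int:
    """Space claimed by the file but not holding any of its data."""
    return allocated_bytes(file_bytes, cluster_size) - file_bytes


cluster = 16 * KB
for size_kb in (20, 1):
    size = size_kb * KB
    print(f"{size_kb} KB file: occupies {allocated_bytes(size, cluster) // KB} KB, "
          f"wastes {slack_bytes(size, cluster) // KB} KB")
```

The 20 KB file wastes 12 KB and the 1 KB file wastes 15 KB, so a disk full of small files loses a large share of its capacity.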
Another problem is file fragmentation. Although a file may require several clusters, the clusters need not be in close physical proximity on the disk. When a file is written to the disk the operating system chooses unused clusters wherever it finds them. If many files consist of widely separated parts, the time required to retrieve them for program use inevitably slows the system (hence the need for defragging).
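A toy model shows how fragmentation comes about. In FAT, each table entry for a cluster in use records the number of the file's next cluster, so a file is a chain threaded through the table. The sketch below is a deliberate simplification: the 12-cluster "disk", the `FREE` and `END` markers, and the take-the-first-free-clusters policy are all assumptions made for illustration, not the real on-disk format.

```python
FREE = None  # toy marker for an unused cluster
END = -1     # toy marker for the last cluster of a file


def allocate(fat, n_clusters):
    """Claim the first n free clusters, link them into a chain, return their numbers."""
    if n_clusters < 1:
        raise ValueError("need at least one cluster")
    free = [i for i, entry in enumerate(fat) if entry is FREE]
    if len(free) < n_clusters:
        raise ValueError("disk full")
    chain = free[:n_clusters]
    for current, nxt in zip(chain, chain[1:]):
        fat[current] = nxt          # each entry points at the file's next cluster
    fat[chain[-1]] = END
    return chain


def release(fat, chain):
    """Delete a file by marking its clusters free again."""
    for cluster in chain:
        fat[cluster] = FREE


def follow(fat, start):
    """Read a file back by walking its chain from the first cluster."""
    chain = [start]
    while fat[chain[-1]] != END:
        chain.append(fat[chain[-1]])
    return chain


fat = [FREE] * 12
a = allocate(fat, 3)   # takes clusters 0, 1, 2
b = allocate(fat, 3)   # takes clusters 3, 4, 5
c = allocate(fat, 2)   # takes clusters 6, 7
release(fat, b)        # deleting b leaves a three-cluster hole
d = allocate(fat, 5)   # too big for the hole, so it is split around file c
print(d)
```

File `d` ends up in two separated pieces, one filling the hole left by `b` and the rest beyond `c`. Repeat this over months of saving and deleting and the disk head has to jump around to read a single file.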
It has to be remembered that the FAT system was first devised when the computer environment was very different from what it is today. Indeed, the PC as we know it did not even exist. FAT was intended for systems with very little RAM and small disks. It required much less in the way of system resources than did the file systems in Unix and other big computer systems and did its job well when systems were small. NTFS and Windows XP are practical for consumer PCs today only because the available resources of RAM and hard drive size have reached levels far exceeding anything imagined when FAT was first put into use.
Actually, the FAT system has been enjoying something of a comeback. Thumb or flash drives have become very common and these are of a size that makes the FAT system useful. The smaller ones are even formatted in FAT16.
NTFS File System
In the early 1990s Microsoft, recognizing that DOS-based Windows was inadequate for the much heavier demands of business and industry, began work on different software designed for much larger systems than the home PC. At first this was a joint effort with IBM, using what became IBM OS/2 and employing a file system named HPFS (High Performance File System). As we all know, the cooperative attempt did not work out and the two companies soon went their separate ways. Microsoft developed the various Windows NT versions, which then morphed into Windows 2000 and now Windows XP. Each one of these operating systems has its own version of the file system NTFS, which has also undergone evolution.
Going into the details of NTFS architecture would be too much for this article so I will limit myself to a few points. (Those who are keen on the subject can read the long discussion at this site.) NTFS is much more flexible than FAT. Its system areas are almost all files instead of the fixed structures used in FAT. Since files are used, the system areas can be modified, enlarged, or moved as needed. An example of one of the several system files is the Master File Table (MFT). The MFT is a sort of relational database with a variety of information about all the files on the disk. If a file is small (1 KB or less) the MFT may even hold the file itself. For larger files NTFS uses clusters in assigning disk space but in a way different from FAT. The cluster size will not normally exceed 4 KB. With clusters this small, and with a type of individual file compression built in, the problems with slack are much reduced.
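A very rough model of the points above can be put into code. Treat it as a sketch under stated assumptions, not a description of how NTFS really decides things: the 1 KB cutoff is just the approximate figure quoted above (the real limit depends on the MFT record layout), the 4 KB cluster is the usual maximum mentioned, and the function name is mine.

```python
KB = 1024
RESIDENT_LIMIT = 1 * KB  # rough figure from the text; the real limit varies
NTFS_CLUSTER = 4 * KB    # usual upper bound on cluster size mentioned in the text


def ntfs_cluster_space(file_bytes: int) -> int:
    """Bytes of cluster space a file consumes in this simplified model."""
    if file_bytes <= RESIDENT_LIMIT:
        return 0  # small file is kept inside its MFT entry, no clusters needed
    clusters = -(-file_bytes // NTFS_CLUSTER)  # ceiling division
    return clusters * NTFS_CLUSTER


for size_kb in (1, 5, 20):
    print(f"{size_kb} KB file uses {ntfs_cluster_space(size_kb * KB) // KB} KB of clusters")
```

In this model a 20 KB file fits exactly into five 4 KB clusters and a 1 KB file needs no clusters at all, compared with 32 KB and 16 KB on a FAT32 volume with 16 KB clusters.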
Because it is intended for multi-user environments, NTFS has much more security built in. For example, the XP Professional version (not the Home version) allows permissions and encryption to be applied to individual files. While much more secure, XP is accordingly much harder to tinker with. That makes troubleshooting and system tweaking more problematic. It also means that the user has to be very careful when setting up passwords and permissions on a system. Forgetting a password has much more serious consequences than it did in Windows 98.
The MFT and other system files occupy quite a bit of space so NTFS is not intended for small disks. Also the amount of memory required is substantial. These system overhead requirements, which formerly limited the use of Windows NT to larger computers, have largely disappeared as a factor with newer PCs and their much larger amounts of RAM and very large hard drives.
©2002-2014 Victor Laurie