Improving system performance through hard disk data rearrangement
This page describes one of the research topics in the Performance Evaluation Laboratory (PEL): "Improving system performance through hard disk data rearrangement."
This research applies to many different computer systems; however, we have chosen to carry it out on a platform that many computer users currently use: a PC running Windows 95/NT. Further work could extend it to other environments.
Why this research?
Today's computer systems need access to many files and retrieve information from many parts of a hard disk. Even though hard disks have made giant strides in speed and efficiency, the disk is often a bottleneck for many computer operations.
When files are written and changed, operating systems do not always place the data on the disk in the most efficient manner. As a result, files are often scattered over the entire disk. Because hard disks contain physical, moving parts, there are limits to how fast data can be accessed. When the data is scattered, the read/write head must travel farther across the disk between accesses, which takes more time.
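To make that cost concrete, here is a minimal sketch under a deliberately simplified model (our assumption, not a measured disk model): the cost of a sequence of reads is the total distance, in sectors, that the head travels between consecutive requests. Real drives also have rotational latency and non-linear seek curves, which this ignores.

```python
def head_travel(sectors: list[int], start: int = 0) -> int:
    """Total head movement, in sectors, to visit `sectors` in order.

    Simplified model (assumption): cost is proportional to the distance
    between consecutive positions; rotational delay is ignored.
    """
    total = 0
    position = start
    for sector in sectors:
        total += abs(sector - position)
        position = sector
    return total


# The same four reads, once scattered across the disk and once contiguous.
scattered = [100, 90_000, 2_500, 150_000]
contiguous = [100, 101, 102, 103]
print(head_travel(scattered, start=100))   # 324900
print(head_travel(contiguous, start=100))  # 3
```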
It has been widely known for many years that data which is spread out takes longer to access than data which is close together. Many companies have offered solutions to this problem by rearranging the data on the disk. The main product in this area has been Speed Disk from Symantec's Norton Utilities. Speed Disk, like other such products, rearranges the data on a hard drive so that each file is stored in contiguous (adjacent) sectors. Arranging files this way is called defragmentation.
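A minimal sketch of the layout a defragmenter computes, assuming each file is described by the list of sectors it currently occupies, in file order (a hypothetical representation: real tools work in clusters and must also update the file system's allocation tables). Files are placed back to back, each in one contiguous run.

```python
def defragment_layout(files: dict[str, list[int]], first_free: int = 0) -> dict[int, int]:
    """Map each old sector to a new sector so every file becomes contiguous.

    `files` maps a file name to its sectors in file order (hypothetical input).
    Files are placed one after another in the order given.
    """
    mapping: dict[int, int] = {}
    next_sector = first_free
    for sectors in files.values():
        for old in sectors:
            mapping[old] = next_sector
            next_sector += 1
    return mapping


files = {
    "a.exe": [500, 12, 9_000],  # fragmented: three scattered pieces
    "b.dll": [40, 41],          # already contiguous
}
print(defragment_layout(files))  # {500: 0, 12: 1, 9000: 2, 40: 3, 41: 4}
```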
While this approach to organizing data does produce significant performance gains, particularly for multimedia data files, most applications (especially operating systems) do not always load, or access, all the data in a file at once. Often only small pieces of each file are accessed, and there is frequently a pattern to these accesses. Knowing this pattern, we can arrange the drive so that data accessed in sequence is stored contiguously, even though this deliberately fragments individual files (as opposed to storing each file contiguously). By doing this, we expect larger performance gains than the defragmentation method of data rearrangement provides.
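The contrast can be sketched on a toy example. The assumptions are ours, for illustration: each piece of data occupies one sector, the trace lists pieces in the order they are read, and cost is total head travel between consecutive reads. The file-contiguous layout keeps each file's pieces together; the access-ordered layout places pieces in the order they first appear in the trace, then puts untraced pieces after them.

```python
def head_travel(positions: list[int]) -> int:
    """Total distance between consecutive positions (simplified seek model)."""
    return sum(abs(b - a) for a, b in zip(positions, positions[1:]))


def file_contiguous_layout(files: dict[str, list[str]]) -> dict[str, int]:
    """Defragmentation: each file's pieces are placed back to back."""
    layout: dict[str, int] = {}
    for pieces in files.values():
        for piece in pieces:
            layout[piece] = len(layout)
    return layout


def access_ordered_layout(trace: list[str], files: dict[str, list[str]]) -> dict[str, int]:
    """Place pieces in first-access order; untraced pieces follow in file order."""
    layout: dict[str, int] = {}
    for piece in trace:
        if piece not in layout:
            layout[piece] = len(layout)
    for pieces in files.values():
        for piece in pieces:
            if piece not in layout:
                layout[piece] = len(layout)
    return layout


# Two four-piece files; a program reads a few pieces of each, interleaved.
files = {
    "app": ["app0", "app1", "app2", "app3"],
    "lib": ["lib0", "lib1", "lib2", "lib3"],
}
trace = ["app0", "lib2", "app3", "lib0"]

by_file = file_contiguous_layout(files)
by_access = access_ordered_layout(trace, files)
print(head_travel([by_file[p] for p in trace]))    # 10
print(head_travel([by_access[p] for p in trace]))  # 3
```

In this toy case the access-ordered layout fragments both files, yet the traced reads become a sequential sweep.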
What the research entails
This research project consists of four main stages:
Stage 1: Constructing a trace tool
This stage of the project is being done by Heng Zhou (heng@cs.byu.edu). She is writing a Windows 95/NT VxD (virtual device driver) that will capture all requests made to the disk controller.
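The trace tool's output is the input to the later stages. The record layout in this sketch is hypothetical, chosen only for illustration (the VxD's actual format is not described here): each captured request has a timestamp, an operation, a starting sector, and a sector count. Expanding requests into individual sectors gives the access sequence that a rearrangement algorithm would analyze.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskRequest:
    """One captured request to the disk controller (hypothetical fields)."""
    time_ms: int  # when the request was issued
    op: str       # "R" for read, "W" for write
    start: int    # first logical sector
    count: int    # number of sectors transferred


def parse_trace(lines: list[str]) -> list[DiskRequest]:
    """Parse lines of the assumed text form 'time op start count'."""
    requests = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        time_ms, op, start, count = line.split()
        requests.append(DiskRequest(int(time_ms), op, int(start), int(count)))
    return requests


def sector_sequence(requests: list[DiskRequest]) -> list[int]:
    """Flatten requests into the order in which individual sectors were touched."""
    return [r.start + i for r in requests for i in range(r.count)]


trace = parse_trace(["0 R 1000 2", "5 R 52 1", "9 W 1002 1"])
print(sector_sequence(trace))  # [1000, 1001, 52, 1002]
```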
Stage 2: Determining an algorithm to rearrange the data
This work is being done by Nianlong Yin (yin@un.cs.byu.edu). He has been studying heuristics that will let us analyze the traces and arrange the data so that it can be accessed more efficiently.
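One simple heuristic of this kind, shown purely as an illustration (it is not necessarily one of the heuristics under study), counts how often each sector is accessed immediately after each other sector, then builds a layout by greedily chaining every sector to its most frequent successor that has not been placed yet.

```python
from collections import Counter, defaultdict


def greedy_layout(sequence: list[int]) -> list[int]:
    """Order sectors so that pairs often accessed back to back end up adjacent.

    Illustrative greedy heuristic: start from the first sector in the trace and
    repeatedly step to the most frequent unplaced successor; when none is left,
    restart from the earliest-seen unplaced sector.
    """
    # successors[a][b] = number of times sector b was accessed right after a
    successors: dict[int, Counter] = defaultdict(Counter)
    for a, b in zip(sequence, sequence[1:]):
        if a != b:
            successors[a][b] += 1

    first_seen = list(dict.fromkeys(sequence))  # unique sectors, trace order
    placed: set[int] = set()
    layout: list[int] = []
    for seed in first_seen:
        current = seed
        while current not in placed:
            placed.add(current)
            layout.append(current)
            # most_common() orders equal counts by first occurrence,
            # which keeps the result deterministic.
            nxt = next((s for s, _ in successors[current].most_common()
                        if s not in placed), None)
            if nxt is None:
                break
            current = nxt
    return layout


trace = [7, 3, 9, 7, 3, 9, 5, 3, 1]
layout = greedy_layout(trace)
print(layout)  # [7, 3, 9, 5, 1]
new_position = {sector: i for i, sector in enumerate(layout)}
```

A greedy chain like this is cheap to compute but can miss better global orderings; it serves only to show how trace statistics turn into a placement.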
Stage 3: Rearranging the data on the disk
This stage of the project is being performed by Frank Sorenson (sorenson@byu.edu). He has gathered information on the FAT16 and FAT32 file systems and has written some low-level FAT16 code that allows data to be moved from one sector to another. He is currently rewriting the code so that it works cleanly under Windows 95.
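The core bookkeeping of such a move can be sketched on an in-memory model of a FAT16 volume. In FAT16, the file allocation table holds one 16-bit entry per cluster: 0x0000 marks a free cluster, values from 0xFFF8 to 0xFFFF mark the end of a chain, and other values give the next cluster of the file; data clusters are numbered from 2. The sketch assumes the FAT has already been read into a Python list and cluster contents into a dict (hypothetical stand-ins for real disk I/O), and it moves a whole cluster, since that is the unit FAT allocates. A real volume usually keeps two copies of the FAT (the count is stored in the boot sector), and both would need updating.

```python
FREE = 0x0000
END_OF_CHAIN_MIN = 0xFFF8  # 0xFFF8..0xFFFF terminate a chain


def chain(fat: list[int], first: int) -> list[int]:
    """Follow a file's cluster chain starting at `first`."""
    clusters = []
    cluster = first
    while cluster < END_OF_CHAIN_MIN:
        clusters.append(cluster)
        cluster = fat[cluster]
    return clusters


def move_cluster(fat: list[int], data: dict[int, bytes],
                 first: int, old: int, new: int) -> int:
    """Move cluster `old` of the file starting at `first` to free cluster `new`.

    Returns the file's (possibly changed) first cluster. If it changes, the
    caller must also update the directory entry, which is not modelled here.
    """
    clusters = chain(fat, first)
    if old not in clusters:
        raise ValueError(f"cluster {old} is not part of this file")
    if fat[new] != FREE:
        raise ValueError(f"cluster {new} is not free")
    data[new] = data.pop(old)  # copy the contents before touching the FAT
    fat[new] = fat[old]        # new cluster inherits the old cluster's link
    fat[old] = FREE            # release the old cluster
    i = clusters.index(old)
    if i == 0:
        return new             # the file now starts at the new cluster
    fat[clusters[i - 1]] = new  # repoint the predecessor's link
    return first


# Entries 0 and 1 are reserved; one file occupies clusters 2 -> 3 -> 4.
fat = [0xFFF8, 0xFFFF, 3, 4, 0xFFFF, FREE, FREE, FREE]
data = {2: b"A", 3: b"B", 4: b"C"}
first = move_cluster(fat, data, first=2, old=3, new=6)
print(chain(fat, first))  # [2, 6, 4]
```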
Stage 4: Evaluating the performance gains
We will not be able to evaluate performance gains until each of the stages above is complete. We do, however, have two identical Pentium 200 machines and eight identical 2 GB hard drives with which to do the evaluation. They currently serve as development platforms.
What we expect
We expect to see performance gains when the data is optimized in the manner described above: moderate gains on a FAT16 volume, and large gains on a FAT32 volume, whose smaller clusters allow the data to be arranged at a finer granularity. These gains do not increase the cost of a machine; they come solely from optimizing disk access patterns.
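The granularity difference can be checked with simple arithmetic. A FAT16 volume can hold at most 65,524 data clusters, so larger volumes need larger clusters; FAT32 allows far more clusters, so a volume of the same size can use 4 KB clusters. The sketch finds the smallest power-of-two cluster size that keeps the cluster count within a limit, ignoring the space used by reserved sectors, the FATs, and the root directory (a simplifying assumption), and then counts how many independently movable pieces a 100 KB file has under each cluster size.

```python
FAT16_MAX_CLUSTERS = 65_524  # a FAT16 volume cannot hold more data clusters than this


def min_cluster_size(volume_bytes: int, max_clusters: int, sector_size: int = 512) -> int:
    """Smallest power-of-two cluster size (bytes) keeping the cluster count in range.

    Simplification: ignores reserved sectors, FAT copies, and the root directory.
    """
    cluster = sector_size
    while volume_bytes // cluster > max_clusters:
        cluster *= 2
    return cluster


volume = 2_000_000_000  # roughly a "2-gig" drive
fat16 = min_cluster_size(volume, FAT16_MAX_CLUSTERS)
print(fat16)  # 32768

file_bytes = 100 * 1024
for size in (fat16, 4096):
    pieces = -(-file_bytes // size)  # ceiling division: clusters the file occupies
    print(f"{size // 1024} KB clusters: {pieces} movable pieces")
# 32 KB clusters: 4 movable pieces
# 4 KB clusters: 25 movable pieces
```

With 32 KB clusters, data accessed at very different times can share one cluster and cannot be separated; 4 KB clusters let a rearrangement tool place those pieces independently.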
Past work
Xiao-Hong Tu from this lab worked on improving disk I/O performance in a Novell NetWare environment. Her research is described on a separate page.
Future work
Although our work is specific to a PC running Windows 95/NT and the FAT16 file system (with FAT32 to follow soon), the concept can easily be applied to many other operating systems and file systems. If we can achieve performance gains by rearranging data, we can improve the speed of a computer without increasing its cost.
To apply this method to another system, only two pieces need to change: the file system code that moves the data must be modified, and a new trace tool must be created.
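That separation can be sketched as two small interfaces around a system-independent core. The names `TraceSource`, `BlockMover`, and `rearrange` are hypothetical, chosen for illustration: the layout logic sees only a sequence of block numbers and issues abstract moves, so porting means supplying new implementations of the two interfaces. For brevity the sketch assumes the target region starting at `target_start` is free and does not overlap the traced blocks.

```python
from collections.abc import Iterable
from typing import Protocol


class TraceSource(Protocol):
    """System-specific: yields block numbers in the order they were accessed."""
    def accesses(self) -> Iterable[int]: ...


class BlockMover(Protocol):
    """File-system-specific: relocates one block and fixes up metadata."""
    def move(self, old: int, new: int) -> None: ...


def rearrange(trace: TraceSource, mover: BlockMover, target_start: int) -> dict[int, int]:
    """System-independent core: place blocks in first-access order.

    Assumes (simplification) that blocks target_start, target_start + 1, ...
    are free and distinct from every traced block.
    """
    plan: dict[int, int] = {}
    for block in trace.accesses():
        if block not in plan:
            plan[block] = target_start + len(plan)
    for old, new in plan.items():
        mover.move(old, new)
    return plan


# Stand-in implementations, as a port would provide for a real system.
class ListTrace:
    def __init__(self, blocks: list[int]) -> None:
        self.blocks = blocks

    def accesses(self) -> Iterable[int]:
        return iter(self.blocks)


class RecordingMover:
    def __init__(self) -> None:
        self.moves: list[tuple[int, int]] = []

    def move(self, old: int, new: int) -> None:
        self.moves.append((old, new))


mover = RecordingMover()
print(rearrange(ListTrace([40, 7, 40, 93]), mover, target_start=1000))
# {40: 1000, 7: 1001, 93: 1002}
```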
Links to our work
About disks
About fragmentation
Frank Sorenson's file system information
© 1996, Performance Evaluation Laboratory, Brigham Young University.
All rights reserved. Reproduction of all or part of this work is permitted
for educational or research use provided that this copyright notice is included
in any copy. Send comments to
webmaster@pel.cs.byu.edu.