39.12 What Makes Your Computer Slow? How Do You Fix It?

Article 39.5 discussed the various components that make up a user's
perception of system performance. There is another equally important
approach to this issue: the computer's view of performance. All system
performance issues are basically resource contention issues. In any
computer system, there are three fundamental resources: the CPU,
memory, and the I/O subsystem (e.g., disks and networks). From this
standpoint, performance tuning means ensuring that every user gets a
fair share of available resources.

Each resource has its own particular set of problems. Resource
problems are complicated because all resources interact with one
another. Your best approach is to consider carefully what each system
resource does: CPU, I/O, and memory. To get you started, here's a
quick summary of each system resource and the problems it can have.

39.12.1 The CPU

On any time-sharing system, even single-user time-sharing systems
(such as UNIX on a personal computer), many programs want to use the
CPU at the same time. Under most circumstances the UNIX kernel is able
to allocate the CPU fairly; however, each process (or program)
requires a certain number of CPU cycles to execute and there are only
so many cycles in a day. At some point the CPU just can't get all the
work done.

There are a few ways to measure CPU contention. The simplest is the
UNIX load average, reported by the BSD uptime (39.7) command. Under
System V, sar -q provides the same sort of information. The load
average tries to measure the number of active processes at any time (a
process is a single stream of instructions). As a measure of CPU
utilization, the load average is simplistic, poorly defined, but far
from useless.
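
For example, a quick check of CPU contention might look something like this (the sampling intervals here are arbitrary):

    # BSD and most modern systems: the three figures at the end of the
    # line are the 1-, 5-, and 15-minute load averages.
    uptime

    # System V: run-queue length and occupancy, sampled five times at
    # five-second intervals.
    sar -q 5 5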

Before you blame the CPU for your performance problems, think a bit
about what we don't mean by CPU contention. We don't mean that the
system is short of memory or that it can't do I/O fast enough. Either
of these situations can make your system appear very slow even though the
CPU is spending most of its time idle; in that case, you can't just look
at the load average and decide that you need a faster processor, because a
faster processor won't make your programs run a bit faster. Before you can
understand your system, you also need to find out what your memory and I/O
subsystems are
doing. Users often point their fingers at the CPU, but I would be
willing to bet that in most situations memory and I/O are equally (if
not more) to blame.

Given that you are short of CPU cycles, you have three basic alternatives:

* You can get users to run jobs at night or at other low-usage times
  (ensuring that the computer is doing useful work 24 hours a day) with
  batch or at (40.1).

* You can prevent your system from doing unnecessary work.

* You can get users to run their big jobs at lower priority (39.9).

If none of these options is viable, you may need to upgrade your system.
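
For example, you might defer a big job to the middle of the night with at, or start it right away at reduced priority with nice; the command and the time below are only illustrative:

    # Queue a large build to run at 11 p.m. tonight:
    echo "make all > make.out 2>&1" | at 2300

    # Or run it now, but at the lowest scheduling priority:
    nice -19 make all > make.out 2>&1 &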

39.12.2 The Memory Subsystem

Memory contention arises when the memory requirements of the active
processes exceed the physical memory available on the system; at this
point, the system is out of memory. To handle this lack of memory
without crashing the system or killing processes, the system starts
paging: moving portions of active processes to disk in order to
reclaim physical memory. At this point, performance decreases
dramatically. Paging is distinguished from swapping, which means
moving entire processes to disk and reclaiming their space. Paging and
swapping indicate that the system can't provide enough memory for the
processes that are currently running, although under some
circumstances swapping can be a part of normal housekeeping. Under BSD
UNIX, tools such as vmstat and pstat show whether the system is
paging; ps can report the memory requirements of each process. The
System V utility sar provides information about virtually all aspects
of memory performance.
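
For example, you might watch for paging with something like the following; the sar option letters vary from one System V release to another, so treat this as a sketch:

    # BSD: virtual-memory activity every five seconds; sustained nonzero
    # page-out (po) figures usually mean the system is short of memory.
    vmstat 5

    # System V: paging and swapping activity, sampled five times:
    sar -g 5 5
    sar -w 5 5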

To prevent paging, you must either make more memory available or
decrease the extent to which jobs compete. To do this, you can tune
system parameters, which is beyond the scope of this book (see
O'Reilly & Associates' System Performance Tuning by Mike Loukides for
help). You can also terminate (38.10) the jobs with the largest memory
requirements. If your system has a lot of memory, the kernel's memory
requirements will be relatively small; the typical antagonists are
very large application programs.
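
To find the likely culprits, ask ps which processes are using the most memory. On a BSD-style system, something like the following lists the ten largest; the exact ps and sort options vary between systems:

    # List processes sorted by %MEM (the fourth column of ps aux output),
    # largest first:
    ps aux | sort -rn -k 4 | head -10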

39.12.3 The I/O Subsystem

The I/O subsystem is a common source of resource contention problems.
A finite amount of I/O bandwidth must be shared by all the programs
(including the UNIX kernel) that are currently running. The system's I/O buses
can transfer only so many megabytes per second; individual devices are
even more limited. Each kind of device has its own peculiarities and,
therefore, its own problems. Unfortunately, UNIX has poor tools for
analyzing the I/O subsystem. Under BSD UNIX, iostat can give you
information about the transfer rates for each disk drive; ps and
vmstat can give some information about how many processes are blocked
waiting for I/O; and netstat and nfsstat report various network
statistics. Under System V, sar can provide voluminous information
about I/O efficiency, and sadp (V.4) can give detailed information
about disk access patterns. However, there is no standard tool to
measure the I/O subsystem's response to a heavy load.
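
Still, you can get a rough picture of where the I/O load is going by watching the disks and the blocked-process count together; the columns reported differ from system to system:

    # BSD: per-drive transfer rates, every five seconds:
    iostat 5

    # System V: disk activity, sampled five times at five-second intervals:
    sar -d 5 5

    # Processes blocked waiting for I/O show up in vmstat's "b" column:
    vmstat 5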

The disk and network subsystems are particularly important to overall
performance. Disk bandwidth issues have two general forms: maximizing
per-process transfer rates and maximizing aggregate transfer rates.
The per-process transfer rate is the rate at which a single program
can read or write data. The aggregate transfer rate is the maximum
total bandwidth that the system can provide to all of the programs running on it.
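
A crude way to estimate the per-process transfer rate is to time a large sequential write and read with dd; the file name and sizes below are only illustrative, and the buffer cache inflates the numbers, so treat the result as a rough sketch:

    # Write and then read back about 100 MB, timing each pass:
    time dd if=/dev/zero of=/tmp/ddtest bs=1024k count=100
    time dd if=/tmp/ddtest of=/dev/null bs=1024k
    rm /tmp/ddtest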

Network I/O problems have two basic forms: a network can be overloaded
or a network can lose data integrity. When a network is overloaded,
the amount of data that needs to be transferred across the network is
greater than the network's capacity; therefore, the actual transfer
rate for any task is relatively slow. Network load problems can
usually be solved by changing the network's configuration. Integrity
problems occur when the network is faulty and intermittently transfers
data incorrectly. In order to deliver correct data to the applications
using the network, the network protocols may have to transmit each
block of data many times. Consequently, programs using the network
will run very slowly. The only way to solve a data integrity problem
is to isolate the faulty part of the network and replace it.
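
To tell an overload from an integrity problem, a reasonable first step is to look at the per-interface error counts and at packet loss to a nearby host (ping options vary; on some systems you would use ping -s host and interrupt it after a few seconds):

    # Error and collision counts per interface; error counts that climb
    # steadily while the network is in use suggest an integrity problem.
    netstat -i

    # Packet loss to a nearby host (the host name is only an example):
    ping -c 10 gateway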

39.12.4 User Communities

So far we have discussed the different factors that contribute to
overall system performance. But we have ignored one of the most
important factors: the users who submit the jobs.

In talking about the relationship between users and performance, it is
easy to start seeing users as problems: the creatures who keep your
system from running the way it ought to. Nothing is further from the
truth. Computers are tools: they exist to help users do their work and
not vice versa.

Limitations on memory requirements, file size, job priorities, etc.,
are effective only when everyone cooperates. Likewise, you can't force
people to submit their jobs to a batch queue (40.6). Most people will
cooperate when they understand a problem and what they can do to solve
it. Most people will resist a solution that is imposed from above,
that they don't understand, or that seems to get in the way of their
work.

The nature of your system's users has a big effect on your system's
performance. We can divide users into several classes:

* Users who run a large number of relatively small jobs: for example,
  users who spend most of their time editing or running UNIX utilities.

* Users who run a small number of relatively large jobs: for example,
  users who run large simulation programs with huge data files.

* Users who run a small number of CPU-intensive jobs that don't require
  a lot of I/O but do require a lot of memory and CPU time. Program
  developers fall into this category. Compilers tend to be large programs
  that build large data structures and can be a source of memory
  contention problems.

All three groups can cause problems. Several dozen users running grep
and accessing remote filesystems can be as bad for overall performance
as a few users accessing gigabyte files. However, the types of
problems these groups cause are not the same. For example, setting up
a "striped filesystem" will help disk performance for large, I/O-bound
jobs but won't help (and may hurt) users who run many small jobs.
Setting up batch queues will help reduce contention among large jobs,
which can often be run overnight, but it won't help the system if its
problems arise from users typing at their text editors and reading
their mail.

Modern systems with network facilities (1.33) complicate the picture
even more. In addition to knowing what kinds of work users do, you
also need to know what kind of equipment they use: a standard terminal
over an RS-232 line, an X terminal over Ethernet, or a diskless
workstation? The X Window System requires a lot of memory and puts a
heavy load on the network, and diskless workstations place a load on the
network as well. Do users access local files, or remote files via NFS or
RFS?
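
If much of the load goes over the network to file servers, it also helps to know what is NFS-mounted and how hard the NFS client code is working; the output details vary by implementation:

    # Which filesystems are mounted from remote servers:
    mount | grep nfs

    # NFS client call counts, timeouts, and retransmissions; many timeouts
    # relative to calls point at the network or at an overloaded server.
    nfsstat -c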
