Friday, October 14, 2005

Correlation between %iowait from top and %busy or avwait from sar -d

MikeHT <mike2...@gmail.com> wrote:
> I would like to know from the list if there is any correlation between
> %iowait from top and %busy or avwait from sar -d # #?

No, not necessarily.

> Does it mean that if %busy is high, then %iowait
> will be high too?

No.

%busy is a description of what's happening on a disk (or the driver
talking to the disk). %iowait is a description of what's happening on
the CPU.

In some cases they may be correlated, but that's not necessarily true.
There are many cases where they would not be.

> But that correlation is not consistent based on the data captured from
> machines A & B. Machine B's devices' %busy is high, but %iowait is
> lower.

Right.

> How is %iowait calculated in top? Any insights? Thanks.

%iowait is the average fraction of time that the CPU is idle while
the system has at least one outstanding I/O request. The outstanding
I/O may or may not have anything to do with your %busy disk.
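As a rough illustration of that definition (not how top or the kernel
actually does its accounting), here is a minimal Python sketch. It
assumes a hypothetical list of per-tick samples, each recording whether
the CPU was running something and how many I/O requests were
outstanding. The function name and sample format are made up for this
example.

```python
def cpu_breakdown(samples):
    """Classify each tick as busy, iowait, or idle.

    samples: iterable of (cpu_running, outstanding_io) pairs, one per
    clock tick. This sample format is a hypothetical simplification.
    """
    counts = {"busy": 0, "iowait": 0, "idle": 0}
    for cpu_running, outstanding_io in samples:
        if cpu_running:
            # A running CPU is never counted as iowait, even if I/O is pending.
            counts["busy"] += 1
        elif outstanding_io > 0:
            # Idle CPU with at least one I/O request in flight.
            counts["iowait"] += 1
        else:
            counts["idle"] += 1
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * n / total for name, n in counts.items()}


# 2 busy ticks, 3 idle ticks with pending I/O, 5 idle ticks with none.
example = [(True, 1)] * 2 + [(False, 1)] * 3 + [(False, 0)] * 5
breakdown = cpu_breakdown(example)
print(breakdown)
```

Nothing in the calculation refers to a particular disk, which is why
the figure need not track any one device's %busy.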

Note that a system may be completely I/O swamped while no individual
disk is all that busy. Also, you could have a disk that shows 100%
busy, but it is meanwhile able to keep up with requests very rapidly.
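To make the "100% busy but keeping up" case concrete, here is a small
Python sketch. It assumes %busy means the fraction of the sampling
window during which the disk had at least one request in progress, and
it works from a hypothetical list of (start, end) request times in
milliseconds. This is an illustration only, not the actual sar
implementation.

```python
def disk_stats(requests, window_ms):
    """Return (%busy, average service time in ms) for one disk.

    requests: list of (start_ms, end_ms) pairs; hypothetical input format.
    window_ms: length of the sampling window in milliseconds.
    """
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    busy_ms = 0.0
    cur_start = cur_end = None
    # Merge overlapping or touching intervals to get total busy time.
    for start, end in sorted(requests):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                busy_ms += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        busy_ms += cur_end - cur_start
    if requests:
        avg_service = sum(end - start for start, end in requests) / len(requests)
    else:
        avg_service = 0.0
    return 100.0 * busy_ms / window_ms, avg_service


# One thousand back-to-back 1 ms requests in a 1000 ms window.
back_to_back = [(i, i + 1) for i in range(1000)]
pct_busy, avg_ms = disk_stats(back_to_back, 1000)
print(pct_busy, avg_ms)
```

In this made-up workload the disk is never idle, yet every request
completes in 1 ms, so a high %busy by itself says little about whether
the disk is falling behind.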

So these numbers may not mean a lot in isolation. They're most useful
when looking at changes over time, or relationships between components.

Note that iowait is a subset of 'idle'. If you can keep the CPUs busy by
working them harder, they'll never display iowait. That's a major
difference between A and B below. B may be more swamped while at the
same time doing more CPU jobs (so that it has less idle time).
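A toy comparison in Python shows the effect. The tick numbers are
invented for illustration and are not the data from machines A and B.
Both hypothetical machines have I/O outstanding during every tick and
differ only in how many ticks the CPU spends running jobs.

```python
def iowait_percent(cpu_busy_ticks, io_pending_ticks, total_ticks):
    """Percentage of ticks where the CPU was idle while I/O was pending.

    All arguments are hypothetical inputs for this sketch: two
    collections of tick indices and the total number of ticks observed.
    """
    if total_ticks <= 0:
        raise ValueError("total_ticks must be positive")
    # Ticks with pending I/O only count as iowait if the CPU was not busy.
    waiting = set(io_pending_ticks) - set(cpu_busy_ticks)
    return 100.0 * len(waiting) / total_ticks


io_pending = range(10)  # I/O outstanding on every one of 10 ticks

# Lightly loaded CPU: busy on 2 of 10 ticks.
light = iowait_percent(range(2), io_pending, 10)
# Heavily loaded CPU: busy on 9 of 10 ticks, same I/O pattern.
heavy = iowait_percent(range(9), io_pending, 10)
print(light, heavy)
```

The I/O load is identical in both cases; only the CPU workload changes,
and the reported iowait drops from 80% to 10%.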

--

Thanks Darren for the reply.

Could high %iowait also indicate that the path to the disk devices (EMC
Symmetrix in this case) is problematic? We are using EMC
PowerPath.


MikeHT <mike2...@gmail.com> wrote:
> Thanks Darren for the reply.
> Could high %iowait also indicate that the path to the disk devices (EMC
> Symmetrix in this case) is problematic?

It *could*, but it probably doesn't.

%iowait is a very visible number that doesn't often indicate a problem.
A well-tuned, perfectly reasonable machine may still have somewhat
elevated iowait figures during normal operation. Many admins try to
"fix" it when they shouldn't. (Note that because of this and a few
other reasons, Solaris 10 apparently does not track this any longer, so
it will always appear as zero.)

Look for problems by analyzing performance, not by looking at iowait.
Does the application work? Does it respond in good time? Are the
throughput and latency figures on your storage what you expect?

The iostat figures are much more likely to be relevant than the system
iowait number.
