Friday, October 21, 2005

SUMMARY: Tracking down system calls on Solaris 9


Hi all,

Many thanks to everyone who responded: Aleksander Pavic, Francisco, and
Darren Dunham. My original email is attached below, along with the
replies I got, but to summarise: I was seeing a very high system load on
a Solaris 9 web server, and vmstat confirmed that a large number of
system calls were being generated. I wanted to track these down and find
out what was being called, but couldn't use DTrace, which only arrived
with Solaris 10. Yet another argument for moving to Solaris 10 :)

As Darren said in his response: "The limitations on existing tools like
'truss' are part of what drove dtrace, so I don't know that there's any
magic out there."

He then went on to suggest I run truss against all the Apache processes,
send the output to a file and analyse that. This was also the path
suggested by Aleksander, who quite correctly pointed out that truss can
be made to follow any child processes created via fork (the -f flag), so
I could truss the main Apache process and follow all its children. He
recommended post-processing the output file with awk or perl. Francisco
suggested the useful lsof tool to see what files are open, as my
original hypothesis was that a large number of file handles were being
opened and closed.

In the end, I trussed every "httpd" process, and generated a summary
using "truss -c". I let this run for 20 seconds, and saw that a very
large number of resolvepath() and open() calls were being generated.
Roughly half of these calls returned an error.
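
For anyone who captures raw truss output to a file instead of using the
"truss -c" summary, the post-processing step could be sketched in Python
rather than awk or perl. This is only a rough sketch: the sample lines,
paths and the tally() helper are made up for illustration, and the
regular expression assumes the usual truss layout of an optional "pid:"
prefix, the call with its arguments, and then either "= value" or
"Err#N NAME".

```python
import re
from collections import Counter

# Assumed truss line layout (illustrative, not captured from a real server):
#   1234:   open("/some/file", O_RDONLY)      Err#2 ENOENT
#   1234:   open("/some/file", O_RDONLY)      = 7
LINE_RE = re.compile(
    r'^(?:\d+(?:/\d+)?:\s+)?'       # optional "pid:" or "pid/lwp:" prefix
    r'(?P<name>\w+)\('              # system call name
    r'.*\)\s+'                      # arguments up to the last ")"
    r'(?P<result>Err#\d+.*|=.*)$'   # failure or return value
)

def tally(lines):
    """Count calls and failed calls per system call name."""
    calls = Counter()
    errors = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip signals, continuation lines, anything unexpected
        name = m.group("name")
        calls[name] += 1
        if m.group("result").startswith("Err#"):
            errors[name] += 1
    return calls, errors

# Hypothetical sample input, standing in for a saved truss log.
SAMPLE = [
    '1234:   resolvepath("/www/pear/db.php", 0xFFBFE2A0, 1024) Err#2 ENOENT',
    '1234:   open("/www/pear/db.php", O_RDONLY)          Err#2 ENOENT',
    '1234:   resolvepath("/www/app/inc/db.php", "/www/app/inc/db.php", 1024) = 19',
    '1234:   open("/www/app/inc/db.php", O_RDONLY)       = 7',
    '1235:   read(7, "<?php ...", 8192)                  = 512',
]

if __name__ == "__main__":
    calls, errors = tally(SAMPLE)
    for name, count in calls.most_common():
        print(f"{name:12s} calls={count:4d} errors={errors[name]:4d}")
```

To run it against a real capture, pass an open file object to tally()
in place of SAMPLE.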

I then narrowed my search down, and examined what was actually being
passed as arguments to these calls. This is easily done with "truss -t
open,resolvepath". It turned out that a huge number of the resolvepath()
and open() calls were being generated by PHP scripts running under
Apache. They were using an inefficient include_path, so for most
included files PHP tried each directory in turn, generating many
resolvepath() and open() calls that failed before it finally found the
correct location of the file.

We fixed the PHP include_path and also modified some of the scripts to
use an absolute path in their include() or require() calls, and as
expected, the number of syscalls being generated halved.
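
To show why the order of include_path matters, here is a simplified
Python model of the lookup. It is not PHP's actual implementation, and
the directory and file names are invented. It only assumes what the
truss output suggested: a relative include is tried against each
include_path directory in order, and every miss costs a failed call,
while an absolute path is checked once.

```python
import os
import tempfile

def resolve_include(filename, include_path):
    """Return (found_path_or_None, list_of_failed_candidates)."""
    failed = []
    if os.path.isabs(filename):
        # Absolute path: no directory search at all.
        return (filename if os.path.isfile(filename) else None), failed
    for directory in include_path:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate, failed
        failed.append(candidate)  # each miss stands for a failed open()
    return None, failed

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout: the file lives in the last directory listed.
        dirs = [os.path.join(root, n) for n in ("pear", "lib", "vendor", "app")]
        for d in dirs:
            os.mkdir(d)
        target = os.path.join(dirs[-1], "db.php")
        open(target, "w").close()

        _, misses = resolve_include("db.php", dirs)
        print("poorly ordered include_path:", len(misses), "failed lookups")

        _, misses = resolve_include("db.php", list(reversed(dirs)))
        print("reordered include_path:     ", len(misses), "failed lookups")

        _, misses = resolve_include(target, dirs)
        print("absolute path:              ", len(misses), "failed lookups")
```

With the file in the last of four directories, every include costs
three failed lookups before the one that succeeds, and that overhead is
paid on every request.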

There were a number of other code-related problems on that server as
well, but these were unrelated to my original request for help.

Once again, many thanks to those who responded. Problem resolved!

-Mark
