Friday, October 14, 2005

SSH Frequently Asked Questions

Sometimes my SSH connection hangs when exiting — the shell (or remote
command) exits, but the connection remains open, doing nothing.

Quick Fix
You're probably using the OpenSSH server, and started a background
process on the server which you intended to continue after logging out
of the SSH session. Fix: redirect the background process's
stdin/stdout/stderr streams (e.g. to files, or /dev/null if you don't
care about them). For example, this hangs:

client% ssh server
server% xterm &
server% logout

but this behaves as expected:

client% ssh server
server% xterm < /dev/null >& /dev/null &
server% logout
SSH session terminates
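
The >& form above is csh/bash shorthand. As a rough sketch of the same fix in plain POSIX sh, with the output kept in a log file instead of discarded, something like the following should work. The helper name start_detached and the log path are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: run a command in the background with all three
# standard streams redirected, so it holds no reference to the session's
# stdin/stdout/stderr.
start_detached() {
    log=$1
    shift
    # stdin from /dev/null, stdout to the log, stderr to the same place
    "$@" < /dev/null > "$log" 2>&1 &
}

# Example use on the server (job name and path are placeholders):
#   start_detached "$HOME/job.log" long_running_job
```

Since the background process no longer shares any stream with the login shell, logging out afterwards should let the SSH session terminate as in the second example above.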

Short Explanation
This problem is usually due to a feature of the OpenSSH server. When
writing an SSH server, you have to answer the question, "When should
the server close the SSH connection?" The obvious answer might seem to
be: close it when the server-side user program started by client
request (shell or remote command) exits. However, it's actually a bit
more complicated; this simple strategy allows a race condition which
can cause data loss (see the explanation below). To avoid this
problem, sshd instead waits until it encounters end-of-file (eof) on
the pipes connecting to the stdout and stderr of the user program.

This strategy, however, can have unexpected consequences. In Unix, an
open file does not return eof until all references to it have been
closed. When you start a background process from the shell on the
server, it inherits references to the shell's standard streams. Unless
you prevent this by redirecting these, or the process closes them
itself (daemons will generally do this), the existence of the new
process will cause sshd to wait indefinitely, since it will never see
eof on the pipe connecting it to the (now defunct) shell process —
because that pipe also connects it to your background process.
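
The eof behaviour described above can be observed without SSH at all. The sketch below uses the shell's own command substitution as a stand-in for sshd: $(...) reads from a pipe until eof, so it shows the same waiting behaviour. It assumes a date command that supports +%s; the function and variable names are made up for this example.

```shell
# Run a command line in a child shell, capture its stdout through a pipe,
# and record how many seconds passed before the pipe reported eof.
time_read() {
    t0=$(date +%s)
    captured=$(sh -c "$1")
    t1=$(date +%s)
    elapsed=$((t1 - t0))
}

# The background sleep inherits the pipe as its stdout, so the reader
# does not see eof until sleep exits, even though the shell is long gone.
time_read 'echo hi; sleep 4 &'
held=$elapsed

# With stdout and stderr redirected, sleep holds no reference to the pipe,
# and eof arrives as soon as the child shell exits.
time_read 'echo hi; sleep 4 > /dev/null 2>&1 &'
released=$elapsed

echo "inherited stdout: ${held}s   redirected stdout: ${released}s"
```

The first figure should be roughly the length of the sleep, and the second close to zero. This is the same difference as between the hanging and the working xterm examples.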

This design choice has changed over time. Early versions of OpenSSH
behaved as described here. For some time, it was changed to exit
immediately upon exit of the user program; then, it was changed back
when the possibility of data loss was discovered.

Race Condition Details
As an example, let's take the simple case of:

ssh server cat foo.txt

This should result in the entire contents of the file foo.txt coming
back to the client — but in fact, it may not. Consider the following
sequence of events:

* The SSH connection is set up; sshd starts the target account's
shell as shell -c "cat foo.txt" in a child process, reading the
shell's stdout and sending the data over the SSH connection. sshd is
waiting for the shell to exit.
* The shell, in turn, starts cat foo.txt in a child process, and
waits for it to exit. The file data from foo.txt which cat writes to
its stdout, however, does not pass through the shell process on its
way to sshd. cat inherits its stdout file descriptor (fd) from its
parent process, the shell — that fd is a direct reference to the pipe
connecting the shell's stdout to sshd.
* cat writes the last chunk of data from foo.txt, and exits; the
data is passed to the kernel via the write system call, and is waiting
in the pipe buffer to be read by sshd. The shell, which was waiting on
the cat process, exits, and then sshd in turn exits, closing the SSH
connection. However, there is a race condition here: through the
vagaries of process scheduling, it is possible that sshd will receive
and act on the SIGCHLD notifying it of the shell's exit, before it
reads the last chunk of data from the pipe. If so, then it misses that data.

This sequence of events can, for example, cause file truncation when using scp.
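
The ordering that avoids this race (read until eof first, and only then act on the child's exit) can be sketched in plain shell. This is an illustration of the ordering only, not sshd's actual code: a named pipe stands in for the pipe between sshd and the shell, and all file names are temporary placeholders.

```shell
workdir=$(mktemp -d)
printf 'line 1\nline 2\nlast line\n' > "$workdir/foo.txt"
mkfifo "$workdir/pipe"

# Stand-in for the user program: a shell that starts cat, with the pipe
# as its stdout (cat inherits that fd directly from the shell).
sh -c 'cat "$1"' sh "$workdir/foo.txt" > "$workdir/pipe" &
child=$!

# Stand-in for sshd: drain the pipe until eof, so the last chunk
# cannot be left behind in the pipe buffer...
cat "$workdir/pipe" > "$workdir/received.txt"

# ...and only afterwards collect the child's exit status.
wait "$child"
status=$?

if cmp -s "$workdir/foo.txt" "$workdir/received.txt"; then
    result=complete
else
    result=truncated
fi
echo "exit status $status, copy is $result"
rm -rf "$workdir"
```

Because the reader returns only at eof, the order in which the child's exit is noticed no longer matters for the data. The cost is the hanging behaviour described earlier whenever some other process still holds the pipe open.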


