Debugging SSH logins
Adam La Rosa
Posted on May 10, 2020
One of the main reasons I got into Unix systems was the ability to remotely log into another machine. While this is now common across most operating systems, the ability to telnet into another machine was once a novelty. Before long people discovered that telnet's use of plain text was a security issue, and ssh became the dominant remote execution protocol.
I love being able to ssh into one of my machines at home from wherever I like (or wherever I can find a client to use). Having recently installed OpenBSD 6.6 on a box lying around the house, I of course enabled ssh during the installation. Everything seemed fine from the machine itself. It wasn't until I tried to log in remotely that I got this gem.
Connection Refused
Not exactly my dream of having the resources of my server available at the touch of a button. After a reboot it worked fine, then after another reboot I got the same error message. Then, after waiting a while, I could log in again. But where to look for answers? My first step was to check the dmesg buffer from boot and see which network device all this was happening on.
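An intermittent failure like this is easier to reason about once you know when it happens. The following is only a sketch of one way to log it, not something used in this post: it tries a TCP connection to the ssh port and reports whether it was accepted, refused, or timed out. The address 192.0.2.10 is a placeholder, and the function names are made up for illustration.

```python
import socket
import time


def probe(host, port=22, timeout=3.0):
    """Try one TCP connection and describe the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The remote host answered, but nothing accepted the connection.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError as exc:
        # Anything else (no route, name lookup failure, ...).
        return f"error: {exc}"


def watch(host, port=22, attempts=10, delay=30.0):
    """Probe repeatedly and print a timestamped line per attempt."""
    for _ in range(attempts):
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} {host}:{port} {probe(host, port)}")
        time.sleep(delay)


if __name__ == "__main__":
    # Placeholder address (reserved for documentation); use your own server.
    watch("192.0.2.10")
```

Left running across a reboot or two, a log like this would show whether the refusals cluster right after boot or come and go on their own.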
rl0 at pci2 dev 2 function 0 "Realtek 8139" rev 0x10: apic 2 int 22
So I knew to look in the "rl" manual page for any clues. After a brief glance I found this bit of knowledge.
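The step from the dmesg line to the manual page name is mechanical: the device name is the driver name followed by a unit number. A minimal sketch of that mapping, assuming attach lines of the form "name at parent ..." and driver names made only of lowercase letters (the function name is hypothetical):

```python
import re

# Assumed shape of an attach line: <driver><unit> at <parent> ...
ATTACH_RE = re.compile(r"^(?P<driver>[a-z]+)(?P<unit>\d+) at (?P<parent>\S+)")


def parse_attach_line(line):
    """Return (driver, unit, parent) from a dmesg attach line, or None."""
    match = ATTACH_RE.match(line.strip())
    if match is None:
        return None
    return match.group("driver"), int(match.group("unit")), match.group("parent")


if __name__ == "__main__":
    line = 'rl0 at pci2 dev 2 function 0 "Realtek 8139" rev 0x10: apic 2 int 22'
    print(parse_attach_line(line))
```

For the line above the driver part is "rl", so the page to read is the one for rl in section 4, where device drivers are documented.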
BUGS
Since outbound packets must be longword aligned, the transmit routine has
to copy an unaligned packet into an mbuf cluster buffer before
transmission. The driver abuses the fact that the cluster buffer pool is
allocated at system startup time in a contiguous region starting at a
page boundary. Since cluster buffers are 2048 bytes, they are longword
aligned by definition. The driver probably should not be depending on
this characteristic.
The Realtek data sheets are of especially poor quality: the grammar and
spelling are awful and there is a lot of information missing,
particularly concerning the receiver operation. One particularly
important fact that the data sheets fail to mention relates to the way in
which the chip fills in the receive buffer. When an interrupt is posted
to signal that a frame has been received, it is possible that another
frame might be in the process of being copied into the receive buffer
while the driver is busy handling the first one. If the driver manages
to finish processing the first frame before the chip is done DMAing the
rest of the next frame, the driver may attempt to process the next frame
in the buffer before the chip has had a chance to finish DMAing all of
it.
The driver can check for an incomplete frame by inspecting the frame
length in the header preceding the actual packet data: an incomplete
frame will have the magic length of 0xFFF0. When the driver encounters
this value, it knows that it has finished processing all currently
available packets. Neither this magic value nor its significance are
documented anywhere in the Realtek data sheets.
Which made me focus on this...
One particularly important fact that the data sheets fail to mention relates to the way
in which the chip fills in the receive buffer. When an interrupt is posted to signal
that a frame has been received, it is possible that another frame might be in the process of
being copied into the receive buffer while the driver is busy handling the first one.
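The incomplete-frame check described in the manual page can be illustrated with a toy model. This is a simplified sketch, not the actual driver code: it assumes each frame in the receive buffer is preceded by a 4-byte little-endian header holding a 16-bit status and a 16-bit length, and it ignores ring wraparound, alignment padding, and checksums. The only value taken from the manual page is the magic length 0xFFF0.

```python
import struct

RX_UNFINISHED = 0xFFF0          # magic length quoted in the manual page
HEADER = struct.Struct("<HH")   # assumed layout: status, then length


def drain_frames(buf):
    """Collect complete frames from buf; stop at an unfinished one.

    Returns (frames, offset), where offset is where processing stopped.
    """
    frames = []
    offset = 0
    while offset + HEADER.size <= len(buf):
        _status, length = HEADER.unpack_from(buf, offset)
        if length == RX_UNFINISHED:
            # The chip is still copying this frame; leave it for later.
            break
        start = offset + HEADER.size
        if start + length > len(buf):
            # Header claims more data than the buffer holds; stop here.
            break
        frames.append(bytes(buf[start:start + length]))
        offset = start + length
    return frames, offset


if __name__ == "__main__":
    done = HEADER.pack(1, 3) + b"abc"
    pending = HEADER.pack(0, RX_UNFINISHED) + b"xx"
    print(drain_frames(done + pending))
```

In this model the first frame is returned and the second is left untouched because its length field still holds the magic value.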
I swapped out the network cards and voila! No more refused connections.