The most useful software tool for testing Internet operation at the IP level is Ping.
Ping is one of the most useful network debugging tools available. It takes its name from a submarine sonar search - you send a short sound burst and listen for an echo - a ping - coming back.
In an IP network, `ping' sends a short data burst - a single packet - and listens for a single packet in reply. Since this tests the most basic function of an IP network (delivery of a single packet), it's easy to see how you can learn a lot from some `pings'.
Ping is implemented using the required ICMP Echo function, documented in RFC 792, which all hosts should implement. Of course, administrators can disable ping messages (this is rarely a good idea, unless security considerations dictate that the host should be unreachable anyway), and some implementations have (gasp) even been known not to implement all required functions. However, ping is usually a better bet than almost any other network software.
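To make the ICMP Echo exchange concrete, here is a minimal Python sketch of how an Echo Request could be assembled. The function names are made up for illustration and this is not how any particular ping is written; the field layout (type 8, code 0, checksum, identifier, sequence number, data) follows RFC 792. Nothing is sent on the network.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"              # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:               # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    """Return an ICMP Echo Request: type 8, code 0, checksum, id, sequence, data."""
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, ident, seq) + payload

packet = build_echo_request(ident=0x1234, seq=0, payload=bytes(56))
print(len(packet))                     # 8-byte ICMP header + 56 data bytes
print(internet_checksum(packet) == 0)  # an undamaged packet re-checksums to zero
```

The receiver runs the same checksum over the whole ICMP message; a non-zero result means the packet was damaged in transit.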
Many versions of ping are available. For the remainder of this discussion, I assume use of BSD UNIX's ping, a freely available, full-featured ping available for many UNIX systems. Most PC-based pings do not have the advanced features I describe. As always, read the manual for whatever version you use.
What Ping can tell you
- Ping places a unique sequence number on each packet it transmits, and reports which sequence numbers it receives back. Thus, you can determine if packets have been dropped, duplicated, or reordered.
- Ping checksums each packet it exchanges. You can detect some forms of damaged packets.
- Ping places a timestamp in each packet, which is echoed back and can easily be used to compute how long each packet exchange took - the Round Trip Time (RTT).
- Ping reports other ICMP messages that might otherwise get buried in the system software. It reports, for example, if a router is declaring the target host unreachable.
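The sequence-number bookkeeping from the first point can be sketched in a few lines of Python. This is an illustration only: `classify_sequence` is a hypothetical helper, it assumes sequence numbers start at 0, and it ignores wraparound of the sequence counter.

```python
from collections import Counter

def classify_sequence(received, sent_count):
    """Classify replies by icmp_seq.

    received   -- sequence numbers in the order the replies arrived
    sent_count -- number of requests sent (sequence numbers 0..sent_count-1)
    """
    counts = Counter(received)
    dropped = [s for s in range(sent_count) if s not in counts]
    duplicated = sorted(s for s, n in counts.items() if n > 1)

    reordered = []
    highest = -1
    seen = set()
    for s in received:
        if s in seen:
            continue              # duplicates are reported separately
        seen.add(s)
        if s < highest:
            reordered.append(s)   # arrived after a later-numbered reply
        highest = max(highest, s)
    return {"dropped": dropped, "duplicated": duplicated, "reordered": reordered}

result = classify_sequence([0, 1, 3, 2, 4, 4, 6], sent_count=7)
print(result)
# {'dropped': [5], 'duplicated': [4], 'reordered': [2]}
```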
What Ping can not tell you
- Some routers may silently discard undeliverable packets. Others may believe a packet has been transmitted successfully when it has not been. (This is especially common over Ethernet, which does not provide link-layer acknowledgments.) Therefore, ping may not always provide reasons why packets go unanswered.
- Ping can not tell you why a packet was damaged, delayed, or duplicated. It can not tell you where this happened either, although you may be able to deduce it.
- Ping can not give you a blow-by-blow description of every host that handled the packet and everything that happened at every step of the way. It is an unfortunate fact that no software can reliably provide this information for a TCP/IP network.
Ping should be your first stop for network troubleshooting. Having problems transferring a file with FTP? Don't fire up your packet analyzer just yet. Leave your TDR in the box for now. Relax. Put on some Yanni. Don't even ``su'' - ping is a non-privileged command on most systems. Start one running and just watch it for at least two minutes. That's enough time for most periodic network problems to show themselves. Once you've seen about a hundred packets, you should be getting a good feel for how this host is responding. Are the round-trip times consistent? Seeing any packet loss? Are the TTL values sane?
Start pinging other hosts. Try the machine next to you - the problem might be closer than you think. Try the last router - maybe the remote system is overloaded (especially if it's a popular Internet site like this one). Don't know what the last router is? Use traceroute or guess - changing the last number in the IP address to 1 usually gets you something interesting. Check other sites with similar network topologies (other remote LAN sites, or other Internet sites, or other sites using the same backbone). Starting to learn something about how your network is responding? Good. And - oh, yeah, go check that FTP. It's probably done by now.
Here's a list of common BSD ping options, and when you might want to use them:
- -c count
- Send count packets and then stop. The other way to stop is to type CNTL-C. This option is convenient for scripts that periodically check network behavior.
- -f
- Flood ping. Send packets as fast as the receiving host can handle them, at least one hundred per second. I've found this most useful to stress a production network being tested during its down-time. Fast machines with fast Ethernet interfaces (like SPARCs) can basically shut down a network with flood ping, so use this with caution.
- -l preload
- Send preload packets as fast as possible, then fall into a normal mode of behavior. Good for finding out how many packets your routers can quickly handle, which is in turn good for diagnosing problems that only appear with large TCP window sizes.
- -n
- Numeric output only. Use this when, in addition to everything else, you've got nameserver problems and ping is hanging trying to give you a nice symbolic name for the IP addresses.
- -p pattern
- Pattern is a string of hexadecimal digits to pad the end of the packet with. This can be useful if you suspect data-dependent problems, as links have been known to fail only when certain bit patterns are presented to them.
- -R
- Use IP's Record Route option to determine what route the ping packets are taking. There are many problems with using this, not the least of which is that the option is placed on the request and the target host is under no obligation to place a corresponding option on the reply. Consider yourself lucky if this works.
- -r
- Bypass the routing tables. Use this when, in addition to everything else, you've got routing problems and ping can't find a route to the target host. This only works for hosts that can be directly reached without using any routers.
- -s packetsize
- Change the size of the test packets. Try it - why not? Check large packets, small packets (the default), very large packets that must be fragmented, packets that aren't a neat power of two. Read the manual to find out exactly what you're specifying here - BSD ping doesn't count either IP or ICMP headers in packetsize.
- -v
- Verbose output. You see other ICMP packets that are not normally considered ``interesting'' (and rarely are).
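Since packetsize excludes both headers, it helps to work out what actually goes on the wire before choosing sizes for -s. The sketch below assumes an 8 byte ICMP header, a 20 byte IP header with no options, and a 1500 byte MTU (typical for Ethernet, but an assumption here); the function names are illustrative only.

```python
ICMP_HEADER = 8    # bytes, ICMP Echo header
IP_HEADER = 20     # bytes, IPv4 header without options (assumption)

def ip_packet_size(packetsize: int) -> int:
    """Total IP packet size produced by `ping -s packetsize`."""
    return packetsize + ICMP_HEADER + IP_HEADER

def needs_fragmentation(packetsize: int, mtu: int = 1500) -> bool:
    """True if the resulting IP packet is larger than the link MTU."""
    return ip_packet_size(packetsize) > mtu

for size in (56, 1472, 1473, 8192):
    print(size, ip_packet_size(size), needs_fragmentation(size))
# 56 84 False
# 1472 1500 False
# 1473 1501 True
# 8192 8220 True
```

Under these assumptions, 1472 is the largest packetsize that still fits in a single unfragmented packet, so sizes just below and just above it are worth trying.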
Sample ping sessions
This ping session shows a ten packet exchange over the loopback interface. One line is printed for every reply received. Note that for each sequence number, a single reply is received, and they are all in order. The IP TTL values are reported, as are the round-trip times. Both are very consistent. At the end of the session, statistics are reported. Pinging the loopback interface is a good way to test a machine's basic network configuration, since no packets are physically transmitted. Any problems in such a test are cause for alarm.
    [mauve]:[10:03pm]:[/home/rnejdl]> ping -c10 localhost
    PING localhost (127.0.0.1): 56 data bytes
    64 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=2 ms
    64 bytes from 127.0.0.1: icmp_seq=1 ttl=255 time=2 ms
    64 bytes from 127.0.0.1: icmp_seq=2 ttl=255 time=2 ms
    64 bytes from 127.0.0.1: icmp_seq=3 ttl=255 time=2 ms
    64 bytes from 127.0.0.1: icmp_seq=4 ttl=255 time=2 ms
    64 bytes from 127.0.0.1: icmp_seq=5 ttl=255 time=2 ms
    64 bytes from 127.0.0.1: icmp_seq=6 ttl=255 time=2 ms
    64 bytes from 127.0.0.1: icmp_seq=7 ttl=255 time=2 ms
    64 bytes from 127.0.0.1: icmp_seq=8 ttl=255 time=2 ms
    64 bytes from 127.0.0.1: icmp_seq=9 ttl=255 time=2 ms
    --- localhost ping statistics ---
    10 packets transmitted, 10 packets received, 0% packet loss
    round-trip min/avg/max = 2/2/2 ms
    [mauve]:[10:03pm]:[/home/rnejdl]>
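For scripts that check network behavior, output in this shape is easy to pick apart. The following Python sketch parses text in the format shown in the session above; other ping versions print different formats, so treat the regular expressions as assumptions to adapt. `parse_ping` is a hypothetical helper name.

```python
import re

# Patterns match the BSD-style output shown in the sample session.
REPLY = re.compile(r"icmp_seq=(\d+) ttl=(\d+) time=([\d.]+) ms")
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) packets received")

def parse_ping(output: str):
    """Return ([(seq, ttl, rtt_ms), ...], sent, received) from ping output."""
    replies = [(int(s), int(t), float(ms)) for s, t, ms in REPLY.findall(output)]
    m = SUMMARY.search(output)
    sent, received = (int(m.group(1)), int(m.group(2))) if m else (None, None)
    return replies, sent, received

sample = """\
PING localhost (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=2 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=255 time=2 ms
--- localhost ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
"""

replies, sent, received = parse_ping(sample)
ttls = {ttl for _, ttl, _ in replies}
rtts = [rtt for _, _, rtt in replies]
print(replies)
print("lost:", sent - received, "ttl values:", ttls, "rtt spread:", max(rtts) - min(rtts))
```

A single TTL value and a small RTT spread correspond to the "very consistent" behavior described for the loopback test.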
The next session shows a more interesting example - a router on the remote side of a medium speed (128Kbps) link. The initial timings show consistent link behavior. However, about 50 seconds into the trace, we see greater fluctuations in the RTT, which approaches one second for several packets. From packet 53 to 54, we see a factor of 26 reduction in RTT. But since reductions in RTT rarely cause problems, this is not as troublesome as the change from packet 54 to 55, a factor of 7 increase in RTT. So what should the RTT be? Well, we're transferring 56 data bytes, plus an 8 byte ICMP header (64 ICMP bytes), plus a 20 byte IP header - 84 byte packets. At 128 kilobits per second, 84 bytes should require about 84*(8/128000) s = 5.25 ms to transfer. Since the packet has to go both ways, we expect 10-15 ms round-trip times. None of these values are that low; clearly there are problems with this link. More than anything else, it is simply overcrowded.
    [mauve]:[10:03pm]:[/home/rnejdl]> ping sl-stk-3-S17-128k.sprintlink.net
    PING sl-stk-3-S17-128k.sprintlink.net (220.127.116.11): 56 data bytes
    64 bytes from 18.104.22.168: icmp_seq=0 ttl=254 time=35.653 ms
    64 bytes from 22.214.171.124: icmp_seq=1 ttl=254 time=28.797 ms
    64 bytes from 126.96.36.199: icmp_seq=2 ttl=254 time=28.559 ms
    64 bytes from 188.8.131.52: icmp_seq=3 ttl=254 time=39.533 ms
    64 bytes from 184.108.40.206: icmp_seq=4 ttl=254 time=28.621 ms
    64 bytes from 220.127.116.11: icmp_seq=5 ttl=254 time=28.159 ms
    ...
    64 bytes from 18.104.22.168: icmp_seq=50 ttl=254 time=848.810 ms
    64 bytes from 22.214.171.124: icmp_seq=51 ttl=254 time=828.579 ms
    64 bytes from 126.96.36.199: icmp_seq=52 ttl=254 time=753.865 ms
    64 bytes from 188.8.131.52: icmp_seq=53 ttl=254 time=778.202 ms
    64 bytes from 184.108.40.206: icmp_seq=54 ttl=254 time=29.913 ms
    64 bytes from 220.127.116.11: icmp_seq=55 ttl=254 time=220.931 ms
    64 bytes from 18.104.22.168: icmp_seq=56 ttl=254 time=173.661 ms
    64 bytes from 22.214.171.124: icmp_seq=57 ttl=254 time=144.990 ms
    64 bytes from 126.96.36.199: icmp_seq=58 ttl=254 time=28.520 ms
    ...
    [mauve]:[10:03pm]:[/home/rnejdl]>
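The back-of-the-envelope figures for this session can be checked with a short Python sketch. It models only the time to clock the bits onto the 128 kbps link in each direction; propagation, queueing, and the other hops are ignored, so the result is a floor, not a prediction. `serialization_ms` is an illustrative name.

```python
def serialization_ms(packet_bytes: int, link_bps: int) -> float:
    """Time in milliseconds to transmit packet_bytes on a link of link_bps."""
    return packet_bytes * 8 / link_bps * 1000

packet = 56 + 8 + 20                     # data + ICMP header + IP header = 84 bytes
one_way = serialization_ms(packet, 128_000)
print(round(one_way, 2), round(2 * one_way, 2))   # 5.25 10.5

# RTT ratios between neighbouring replies in the trace above
drop = 778.202 / 29.913                  # packet 53 -> 54
rise = 220.931 / 29.913                  # packet 54 -> 55
print(round(drop), round(rise))          # 26 7
```

Even the best replies in the trace (around 28 ms) sit well above the 10.5 ms floor, and the worst are roughly eighty times higher.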
What you might see
- Dropped packets
An unfortunate fact of life. Detect them by noting when the sequence numbers skip, and the missing number does not appear again later. This is probably caused by a router queueing packets for a relatively slow link, and the queue simply grew too large. Early TCP implementations dropped packets at a truly alarming rate, but things have gotten better. Even so, there are common situations, typically involving crowded wide-area networks, in which even modern TCP implementations can't operate steady-state without dropping packets. There's no reason to pull out your hair over this, since TCP will retransmit missing data, but this won't make your network run faster. Also, if you have fast links that aren't showing much congestion, the cause of trouble may be elsewhere - link-level failures are the next most common cause of packet loss. I'd suggest using the techniques mentioned above to narrow down as much as possible where packets are being dropped, and try to understand why this is happening, even if fixing it is beyond your control.
- Fluctuating Round Trip Times
Another fact of life. Pretty much caused by the same things that cause packet loss. Again, not serious cause for alarm, but don't expect optimum performance from TCP. Remember that TCP generates an internal RTT estimate that affects protocol behavior. If the actual RTT changes too much, TCP may never be able to make a satisfactory estimate. Both dropped packets and RTT fluctuations may occur in a periodic nature - a batch of slow packets every 30 seconds, for instance. If you see this symptom, check for routing updates or other periodic traffic with the same period as the problem. Poor network performance can often be traced to slow links being clogged with various kinds of automated updates.
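To see why a jumpy RTT hurts TCP, it can help to run some ping times through a smoothed estimator. The text above only says that TCP keeps an internal RTT estimate; the sketch below uses the smoothing formulas from RFC 6298 as an assumption about what such an estimator looks like, and leaves out the minimum timeout and clock granularity that real stacks apply.

```python
def rto_trace(samples_ms, alpha=1/8, beta=1/4):
    """Return a list of (srtt, rttvar, rto) in ms, one entry per RTT sample."""
    srtt = rttvar = None
    trace = []
    for r in samples_ms:
        if srtt is None:
            srtt, rttvar = r, r / 2      # first measurement
        else:
            # Update the variance first, using the previous smoothed RTT.
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
        trace.append((srtt, rttvar, srtt + 4 * rttvar))
    return trace

# RTTs loosely taken from the second sample session (ms)
for srtt, rttvar, rto in rto_trace([28.5, 848.8, 29.9, 220.9, 28.5]):
    print(f"srtt={srtt:7.1f}  rttvar={rttvar:7.1f}  rto={rto:7.1f}")
```

One slow packet inflates the retransmission timeout far beyond the typical RTT, and it takes many steady samples to decay again; if the RTT keeps swinging, the estimate never settles.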
- Connectivity that comes and goes
Again, look for periods between problems that are multiples of some common number - 10 and 15 seconds are good things to check. If a router is sending error messages when connectivity disappears, that router's the first place to start looking. However, just because you can always reach hop 5, for instance, doesn't mean that your problem isn't hop 3. Hop 3's router may be erroneously timing out routing information for your target, but handling hop 5's routing information just fine. Of course, check hop 5 first if that's where your packets seem to check out but never leave.
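Checking whether outages line up with a common period is easy to do by eye, but a few lines of Python can do it for a long log. This sketch is only an illustration: the event times, the candidate periods, and the one-second tolerance are all assumptions, and `matching_periods` is a made-up name.

```python
def matching_periods(event_times, candidates=(10, 15, 30, 60), tolerance=1.0):
    """Return the candidate periods (seconds) that every gap between
    consecutive events is a multiple of, within the given tolerance."""
    gaps = [b - a for a, b in zip(event_times, event_times[1:])]

    def distance_to_multiple(gap, period):
        remainder = gap % period
        return min(remainder, period - remainder)

    return [p for p in candidates
            if gaps and all(distance_to_multiple(g, p) <= tolerance for g in gaps)]

# Times (seconds into the trace) at which connectivity was lost
print(matching_periods([12.0, 42.3, 71.8, 132.1]))
# [10, 15, 30]
```

The largest matching value (30 seconds here) is the most specific candidate; compare it against the update timers of whatever routing protocols and automated jobs run on the path.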
- Ping works fine but TELNET/FTP/Mail/News/... doesn't
Good news - it's (probably) not a hardware problem. Use a packet tracer of some sort to see what TTL values are being generated by your hosts. If they're too low, you can see this kind of behavior. It could also be a software or configuration problem - can other machines connect to the offending host? Can it talk to itself? On the other hand, it could be a hardware problem, if one of your links is showing data-dependent behavior. The telltale symptom is when FTP (for example) can transfer some files fine, but others always have problems. Once you've found an offending file, try breaking it into smaller and smaller pieces and see which ones don't work. If the pieces become too small to detect problems, duplicate them several times to get a larger file. Once you've found a small pattern that you suspect is causing your grief, see if you can load it into ping packets (BSD PING's `-p' switch) and reproduce the trouble.
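The last two steps, splitting a suspect file and turning a suspect byte sequence into a -p argument, can be sketched in Python. The helper names are hypothetical, and the 16 byte limit on the pattern is an assumption based on common BSD ping manuals; check the manual for your version.

```python
def halves(data: bytes):
    """Split data into two pieces to test separately."""
    mid = len(data) // 2
    return data[:mid], data[mid:]

def grow(piece: bytes, min_size: int) -> bytes:
    """Repeat a small piece until it is at least min_size bytes long."""
    copies = -(-min_size // len(piece))   # ceiling division
    return piece * copies

def ping_pattern(data: bytes, max_bytes: int = 16) -> str:
    """Hex digits suitable for `ping -p`, truncated to max_bytes bytes."""
    return data[:max_bytes].hex()

suspect = b"\xff\x00\xff\x00"
first, second = halves(suspect)
print(first, second)                # b'\xff\x00' b'\xff\x00'
print(len(grow(suspect, 1000)))     # 1000
print(ping_pattern(suspect))        # ff00ff00
```

The resulting string would be passed as `ping -p ff00ff00 host`, ideally together with a larger -s so that the pattern repeats many times in each packet.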