Here I want to start a new article series called “5-minute troubleshooting”. In these articles I’m going to describe short, simple cases that we solved very quickly using protocol analysis and that could have taken much longer with another approach. So, let’s go.
One day we received an IP camera from our customer with the following complaint: “No image, no ping, the camera just disappeared from the network”.
First, we checked whether the camera was receiving PoE (yes, it was) and whether it booted up normally (according to the camera LEDs, no problems there either).
Next step? Let’s try to ping it using the IP address our customer provided. No reply (and no surprise, actually).
At this point it would be nice to check the port statistics and status to rule out an L1 failure:
The camera is on port 2. The port is in the “Up” state and autonegotiation has completed successfully. There is some traffic passing in and out (the RX/TX counters are both non-zero). No bad packets, no collisions, and no idea what could be wrong, other than that we had been given the wrong IP address.
So, it’s time to look at what is happening on the wire.
We arranged a simple capture setup using SPAN:
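The port check above can be written down as a small checklist. Here is a minimal Python sketch of it; the dictionary keys and the sample counter values are made up for illustration and do not come from any particular switch or from our actual port statistics.

```python
def port_looks_healthy(stats):
    """Return (ok, reasons) for a dict of port counters.

    The key names are hypothetical; map them to whatever your
    switch actually reports.
    """
    reasons = []
    if stats["link"] != "up":
        reasons.append("link is down")
    if not stats["autoneg_complete"]:
        reasons.append("autonegotiation not completed")
    if stats["rx_packets"] == 0:
        reasons.append("nothing received from the device")
    if stats["tx_packets"] == 0:
        reasons.append("nothing sent to the device")
    if stats["bad_packets"] > 0:
        reasons.append("bad packets counted")
    if stats["collisions"] > 0:
        reasons.append("collisions counted")
    return (not reasons, reasons)


# Illustrative values only: link up, traffic both ways, no errors.
port2 = {
    "link": "up",
    "autoneg_complete": True,
    "rx_packets": 120,
    "tx_packets": 340,
    "bad_packets": 0,
    "collisions": 0,
}
ok, reasons = port_looks_healthy(port2)
print(ok, reasons)
```

A port that passes all of these checks tells us only that the link itself is fine. It says nothing about what the device does with the frames it receives.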
And captured some traffic:
What do we see here? The camera has an IP of 192.168.88.203, as it should.
Right after the boot process the camera tries to resolve a standard host name using mDNS (packets 1, 2). This is a built-in mechanism and it’s described in the documentation. When it sees no answer (which is a perfectly possible situation), it tries to resolve the IP 192.168.88.202 using ARP (packets 3, 4 are ARP requests from the camera).
Why does it resolve exactly this address? I’ll tell you. Our customer said that he has a NAS device with that IP, and the camera had been configured to use it. Knowing that, we intentionally assigned 192.168.88.202 to our test client machine. It also means that the camera’s configuration is still stored inside it and is usable.
So, our client replies to the camera’s ARP request and all seems to be fine, but… after 6 seconds the camera asks the same question again. There is no other traffic between these requests, and the PCAP file is not filtered. Later in the trace the client tries to resolve the camera’s MAC address using ARP (packets 9-14) but gets no reply at all. A strange situation.
Question: what do you think about the trace file above? Hint: look at the timing between the first two mDNS packets and the subsequent ARP requests. Is it normal?
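The suspicious pattern here is "a request is repeated although a reply to it was already on the wire". Below is a minimal Python sketch that looks for exactly that in a simplified packet list. The `Packet` tuple, the function name and the timestamps are assumptions made for illustration; only the two IP addresses and the roughly 6-second gap come from the case itself.

```python
from collections import namedtuple

# Simplified packet record (hypothetical, not a real capture format).
# For a request, target_ip is the address being resolved;
# for a reply, sender is the address that was resolved.
Packet = namedtuple("Packet", "ts sender kind target_ip")


def repeated_after_reply(packets, asker, target_ip):
    """Return (first_ts, retry_ts) pairs where `asker` repeated an ARP
    request for `target_ip` even though a reply was seen in between."""
    retries = []
    last_request_ts = None
    reply_seen = False
    for p in packets:
        if (p.kind == "arp-request" and p.sender == asker
                and p.target_ip == target_ip):
            if last_request_ts is not None and reply_seen:
                retries.append((last_request_ts, p.ts))
            last_request_ts = p.ts
            reply_seen = False
        elif (p.kind == "arp-reply" and p.sender == target_ip
                and last_request_ts is not None):
            reply_seen = True
    return retries


# Illustrative timestamps only.
trace = [
    Packet(0.0, "192.168.88.203", "arp-request", "192.168.88.202"),
    Packet(0.001, "192.168.88.202", "arp-reply", "192.168.88.203"),
    Packet(6.0, "192.168.88.203", "arp-request", "192.168.88.202"),
    Packet(6.001, "192.168.88.202", "arp-reply", "192.168.88.203"),
]
retries = repeated_after_reply(trace, "192.168.88.203", "192.168.88.202")
print(retries)
```

A non-empty result means the asking host behaved as if the reply never arrived, which is a hint to look at its receive path rather than at the network.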
Let’s try to boot the camera in DHCP mode. We can do that by pressing one of the buttons located on the camera’s case during the boot process:
So, here is what we see:
“Is there any DHCP server here? – Yes, I’m here to offer you an IP address! – Looong quiet time… – Is there any DHCP server here?”
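The same idea applies to the DHCP dialogue: a healthy client answers an OFFER with a REQUEST, while ours just starts over with a new DISCOVER. Here is a minimal Python sketch that counts such stalls in a list of DHCP message types. The function name and the sample sequence are hypothetical and only mirror the dialogue quoted above.

```python
def dhcp_handshake_stalls(messages):
    """Count how many times an OFFER was followed by a new DISCOVER
    instead of a REQUEST.

    `messages` is a list of DHCP message type names in capture order.
    """
    stalls = 0
    offer_pending = False
    for m in messages:
        if m == "OFFER":
            offer_pending = True
        elif m == "REQUEST":
            # The client reacted to the offer, so it did receive it.
            offer_pending = False
        elif m == "DISCOVER":
            if offer_pending:
                stalls += 1
            offer_pending = False
    return stalls


# Illustrative sequence matching the dialogue above.
observed = ["DISCOVER", "OFFER", "DISCOVER", "OFFER", "DISCOVER"]
stalls = dhcp_handshake_stalls(observed)
print(stalls)
```

For a normal DISCOVER, OFFER, REQUEST, ACK exchange the function returns zero.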
Only one explanation is left. We see every packet on the wire, but they get lost downstream. Downstream? Wait, we’re mirroring the exact link that is connected directly to the camera! Yet the camera does not hear incoming packets even though they are actually present on the wire. There is something wrong with the NIC driver, or with some electrical circuit inside the NIC. This fault doesn’t affect Layer 1 (physical) and doesn’t affect the sending side, but it prevents the data from being received and (or?) processed further by the camera at some stage. As this is just an IP camera, not a server, we can’t check drivers, log into a console, swap the NIC or do any further diagnosis.
End of the story: we sent the camera to the manufacturer with the above description, and eventually it was replaced because of a “faulty NIC” problem.
Train your brain. Answer the following question.
1. You’ve probably guessed that the first trace file contains duplicate broadcast packets, right? What do you think will happen if you deduplicate this trace file using the editcap utility with default settings? What settings would be better in this case?