VegaStream has some very nice ATAs and T.38 devices (still), but after the upgrade to VEGA400_R082S017 (8.2.17) their system.sysUpTime counter gets reset every two and a half days:
> Host : sjh-vega400
> Output: Uptime is less than an hour! (328.61 seconds)
> Date : 2008-02-28 14:21:08
According to Nagios, this happens every two and a half days:
The device doesn't reboot; the sysUpTime counter just goes back to zero. The device itself says it has been up for 9 days.
About 2 days, 11 hours and 38 minutes.
Thanks to callum on irc.oz.org/#bugs: 2^31 / 10,000 seconds, i.e. an overflow of a signed 32-bit counter of 100-microsecond ticks.
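A quick check of that arithmetic, as a sketch that assumes the firmware keeps uptime in a signed 32-bit counter of 100 µs ticks (callum's hypothesis, not something VegaStream documents). For comparison it also computes when a standard SNMP TimeTicks value, an unsigned 32-bit count of hundredths of a second, would wrap.

```python
def split_duration(seconds: float) -> tuple[int, int, int, float]:
    """Split a duration in seconds into (days, hours, minutes, seconds)."""
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return int(days), int(hours), int(minutes), secs

# Assumed firmware counter: signed 32-bit, 10,000 ticks per second (100 us each).
vega_wrap = 2 ** 31 / 10_000

# Standard SNMP TimeTicks: unsigned 32-bit, 100 ticks per second.
timeticks_wrap = 2 ** 32 / 100

print("100 us signed counter wraps after %d d %d h %d m %.2f s" % split_duration(vega_wrap))
print("TimeTicks wraps after %d d %d h %d m %.2f s" % split_duration(timeticks_wrap))
```

This prints 2 d 11 h 39 m 8.36 s for the assumed firmware counter, close to the interval Nagios observed, against 497 d 2 h 27 m 52.96 s for a correctly implemented sysUpTime.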
[~/snmp] root@lizard>snmpwalk -v 1 -c public test-vega
RFC1213-MIB::sysDescr.0 = STRING: "Vegastream IP Telephony Gateway (VEGA-6x4)"
RFC1213-MIB::sysObjectID.0 = OID: RFC1155-SMI::enterprises.4686.11
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (305772277) 35 days, 9:22:02.77
RFC1213-MIB::ifNumber.0 = INTEGER: 2
RFC1213-MIB::ifIndex.1 = INTEGER: 1
RFC1213-MIB::ifIndex.2 = INTEGER: 2
RFC1213-MIB::ifDescr.1 = STRING: "VS LAN Port 1"
RFC1213-MIB::ifDescr.2 = STRING: "VS LAN Port 2"
RFC1213-MIB::ifType.1 = INTEGER: ethernet-csmacd(6)
RFC1213-MIB::ifType.2 = INTEGER: ethernet-csmacd(6)
RFC1213-MIB::ifMtu.1 = INTEGER: 1514
RFC1213-MIB::ifMtu.2 = INTEGER: 1514
RFC1213-MIB::ifSpeed.1 = Gauge32: 104857600
RFC1213-MIB::ifSpeed.2 = Gauge32: 10485760
104857600 is 100 * 1024 * 1024, their idea of a 100 Mbps Ethernet. Since ifSpeed is in bits per second, it should have been 100000000.
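Any poller that compares ifSpeed against the expected link speed will trip over these values. A minimal sketch, using a hypothetical helper normalise_if_speed, that maps exact n * 1024 * 1024 values for the usual Ethernet rates back to the decimal bits-per-second figure and passes everything else through untouched:

```python
# Standard Ethernet rates in bits per second, keyed by their nominal Mbps value.
DECIMAL_RATES = {10: 10_000_000, 100: 100_000_000, 1000: 1_000_000_000}

def normalise_if_speed(reported: int) -> int:
    """Map a binary-prefix ifSpeed (n * 1024 * 1024) back to the decimal rate.

    Hypothetical helper: values that are not exactly n * 2**20 for a known
    Ethernet rate n are returned unchanged.
    """
    for mbps, bps in DECIMAL_RATES.items():
        if reported == mbps * 1024 * 1024:
            return bps
    return reported

for raw in (104857600, 10485760, 100000000):
    print(raw, "->", normalise_if_speed(raw))
```

For the Vega walk, 104857600 becomes 100000000 and 10485760 becomes 10000000; a correctly reported 100000000 is left alone.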
Yesterday I chatted with a friend about computer networks, hardware upgrades and system monitoring, and I realised that over the last couple of years I have built a very robust and detailed network and systems monitoring setup, one that has made my life a lot easier.
For example, in Nagios we monitor nearly all aspects of our FreeBSD-based servers: not only the standard memory, CPU and disk space, but also the answers from the DNS server on them and the presence of crond, snmpd, inetd, sshd and syslogd. We don't just check that all required processes are running, but also that their PID files exist and that the processes named in those PID files exist. We also monitor the status of the RAID cards, the status of the Ethernet cards, where the default gateway points to, the uptime of the server and the offset of its NTP-synced clock.
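The PID file check can be sketched as a small function that follows the Nagios plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN). check_pidfile is a hypothetical name, and signal 0 is used because it only tests whether a process exists without delivering anything to it.

```python
import os

# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def check_pidfile(path: str) -> tuple[int, str]:
    """Check that a PID file exists and that the process it names is alive."""
    try:
        with open(path) as pidfile:
            pid = int(pidfile.read().split()[0])
    except FileNotFoundError:
        return CRITICAL, f"CRITICAL: {path} does not exist"
    except (ValueError, IndexError):
        return UNKNOWN, f"UNKNOWN: {path} does not contain a PID"
    except OSError as err:
        return UNKNOWN, f"UNKNOWN: cannot read {path}: {err}"
    if pid <= 0:
        return UNKNOWN, f"UNKNOWN: {path} contains invalid PID {pid}"
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return CRITICAL, f"CRITICAL: process {pid} from {path} is not running"
    except PermissionError:
        pass  # the process exists but is owned by another user
    return OK, f"OK: process {pid} from {path} is running"
```

A wrapper script would call something like check_pidfile("/var/run/sshd.pid"), print the message and exit with the returned status, which is how Nagios picks up the result.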
With regard to network devices (routers, switches), we monitor the uptime of the device (these things can reboot faster than Nagios can detect the outage), the status of all ports (duplex, speed, operational status), the temperature and the status of the power supplies. We also monitor the status of the OSPF and BGP neighbours, plus a list of expected networks in the routing table.
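Uptime catches such a fast reboot because sysUpTime starts again from zero even if the device was never seen as down. A minimal sketch of the comparison, assuming the poller remembers the previous sysUpTime value and the measured time between the two polls (rebooted_between_polls and the 5-second slack are hypothetical choices, not a Nagios feature). It compares the new value with the expected one modulo 2^32, so the normal TimeTicks wrap after about 497 days is not mistaken for a reboot.

```python
TIMETICKS_WRAP = 2 ** 32  # sysUpTime: unsigned 32-bit count of hundredths of a second

def rebooted_between_polls(previous_ticks: int, current_ticks: int,
                           poll_interval_s: float, slack_s: float = 5.0) -> bool:
    """Guess whether a device rebooted between two sysUpTime polls.

    Without a reboot, the new value should be roughly the old value plus the
    time between the polls (modulo the 2**32 wrap). A reboot resets the
    counter, so the new value lands far away from that expectation.
    """
    expected = (previous_ticks + round(poll_interval_s * 100)) % TIMETICKS_WRAP
    diff = (current_ticks - expected) % TIMETICKS_WRAP
    distance = min(diff, TIMETICKS_WRAP - diff)  # distance on the 32-bit circle
    return distance > slack_s * 100
```

For example, a device polled at 1000 seconds of uptime that reports 2 seconds five minutes later is flagged, while a counter that simply wrapped past 2^32 in between is not.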
Network link devices (antennas, fibre converters, laser heads) that support some form of remote management are checked the same way: Ethernet link status, radio link status, uptime, and anything else that might reveal a problem.
For our PABXs we monitor the status of the PRIs and of the IAX and SIP destinations.
Call it overkill, call it too much time wasted on monitoring... But when I replace a server or a device on the network, I want to know without much hassle whether everything is back in order once I turn it on: when my monitoring says everything is fine, I know everything went fine.
When you connect strange little switches and hubs to a Cisco 3560 PoE switch, the port might end up in the "err-disabled" state:
Gi0/41 **IP Phones & PCs* err-disabled 8 a-half a-10 10/100/1000BaseTX
cisco#show interfaces gi0/41
GigabitEthernet0/41 is down, line protocol is down (err-disabled)
That port is unusable until the end of time, or until somebody manually shuts and no-shuts it.
Having experienced this once or twice, I can say it's a pain in the bottom, and since I'm obsessed with monitoring, I tried to find out how to detect it remotely:
RFC1213-MIB::ifDescr.10141 = STRING: "GigabitEthernet0/41"
RFC1213-MIB::ifType.10141 = INTEGER: ethernet-csmacd(6)
RFC1213-MIB::ifMtu.10141 = INTEGER: 1500
RFC1213-MIB::ifSpeed.10141 = Gauge32: 10000000
RFC1213-MIB::ifPhysAddress.10141 = Hex-STRING: 00 16 46 B6 DC A9
RFC1213-MIB::ifAdminStatus.10141 = INTEGER: up(1)
RFC1213-MIB::ifOperStatus.10141 = INTEGER: down(2)
RFC1213-MIB::ifLastChange.10141 = Timeticks: (3564555759) 412 days, 13:32:37.59
ifAdminStatus is up and ifOperStatus is down, while "show interface" says err-disabled. That is exactly what a port with an unplugged cable looks like.
So nothing! There is no way to find an interface in the err-disabled state via SNMP!
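The closest the standard interface table gets is a list of candidates: ports whose ifAdminStatus is up but whose ifOperStatus is down. A port with nothing plugged in looks the same, so this only narrows things down before someone logs in and runs "show interfaces". A minimal sketch that parses Net-SNMP snmpwalk output in the format shown above; the regular expression and the function name admin_up_oper_down are assumptions for illustration.

```python
import re

# Matches snmpwalk lines such as
#   RFC1213-MIB::ifOperStatus.10141 = INTEGER: down(2)
LINE_RE = re.compile(r"^\S+::(\w+)\.(\d+) = [\w-]+: (.*)$")

def admin_up_oper_down(walk_output: str) -> list[str]:
    """Return ifDescr of ports that are administratively up but operationally down."""
    columns: dict[str, dict[int, str]] = {}
    for line in walk_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            column, index, value = match.groups()
            columns.setdefault(column, {})[int(index)] = value.strip().strip('"')
    admin = columns.get("ifAdminStatus", {})
    oper = columns.get("ifOperStatus", {})
    descr = columns.get("ifDescr", {})
    # Candidates only: an err-disabled port and an unplugged port look identical here.
    return [descr.get(i, str(i)) for i in sorted(admin)
            if admin[i].startswith("up") and oper.get(i, "").startswith("down")]
```

Fed the walk of Gi0/41, it returns ["GigabitEthernet0/41"], but it would return the same for an empty port.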
We have a new remotely accessible KVM switch. Of course we monitor as much of it as possible, including sysUpTime (an excellent way to catch devices that reboot within a one-minute period). Only...
Only, sysUpTime gives an interesting answer:
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (270416251) 31 days, 7:09:22.51
[...reboot...]
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (4291368169) 496 days, 16:28:01.69
[...reboot...]
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (957) 0:00:09.57
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1163) 0:00:11.63
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1334) 0:00:13.34
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (4291368783) 496 days, 16:28:07.83
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (4291368895) 496 days, 16:28:08.95
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (4291369006) 496 days, 16:28:10.06
[...]
It looks good for a couple of seconds, but then... Disabling NTP resolves these symptoms.
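The 496-day values sit just below 2^32, where an unsigned 32-bit TimeTicks counter wraps. Read as signed 32-bit numbers they come out at roughly minus ten hours. A sketch of that conversion, with hypothetical helpers as_signed32 and format_ticks. One plausible but unconfirmed explanation is that the switch derives uptime from the wall clock and the NTP sync steps the clock by about ten hours (a UTC versus local time mix-up, for example), which would fit with disabling NTP making the problem go away.

```python
def as_signed32(ticks: int) -> int:
    """Reinterpret an unsigned 32-bit TimeTicks value as signed two's complement."""
    return ticks - 2 ** 32 if ticks >= 2 ** 31 else ticks

def format_ticks(ticks: int) -> str:
    """Format hundredths of a second as [-]H:MM:SS.hh."""
    sign = "-" if ticks < 0 else ""
    secs, hundredths = divmod(abs(ticks), 100)
    hours, rest = divmod(secs, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{sign}{hours}:{minutes:02d}:{seconds:02d}.{hundredths:02d}"

for raw in (270416251, 4291368169, 957, 1163, 1334, 4291368783):
    print(raw, "->", format_ticks(as_signed32(raw)))
```

4291368169 becomes -9:59:51.27 and 4291368783 becomes -9:59:45.13: thirteen seconds after boot the uptime drops by almost exactly ten hours.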
Recently we obtained a couple of new network hardware devices to make our network safer and our lives easier. But what do they run?
The first one is a Citrix VPN/SSL device. That one is easy to identify, especially if you have SNMP enabled on it:
RFC1213-MIB::sysDescr.0 = STRING: "Linux net6gateway 2.4.24 #29 SMP Wed Nov 17 16:55:58 PST 2004 i686"
RFC1213-MIB::ifDescr.2 = STRING: "eth0"
That's Linux for you!
The second one is a TippingPoint SMS server:
RFC1213-MIB::sysDescr.0 = STRING: "Linux ips-manager 2.6.15-1.2054_FC5smp #1 SMP Tue Mar 14 16:05:46 EST 2006 i686"
RFC1213-MIB::sysObjectID.0 = OID: NET-SNMP-TC::linux
Later on, while troubleshooting some software on it, I found out that it is running Fedora Core 6.
But... the TippingPoint IPS device itself:
RFC1213-MIB::sysDescr.0 = STRING: "TippingPoint IPS"
RFC1213-MIB::ifDescr.1 = STRING: "lo0"
RFC1213-MIB::ifDescr.2 = STRING: "fxp0"
That's some kind of *BSD.
Later on, while talking to tech support, I found out that it is running VxWorks.
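The guesses in this post boil down to a few string checks on sysDescr and ifDescr. A rough sketch, where guess_os_family is a hypothetical helper and the rules are only the ones used above; as the TippingPoint box shows, BSD-style interface names such as lo0 and fxp0 are a hint, not proof of a BSD.

```python
import re

def guess_os_family(sys_descr: str, if_descrs: list[str]) -> str:
    """Very rough OS-family guess from SNMP sysDescr and ifDescr strings."""
    if sys_descr.startswith("Linux"):
        return "Linux (from sysDescr)"
    if any(re.fullmatch(r"eth\d+", name) for name in if_descrs):
        return "probably Linux (ethN interface names)"
    if "lo0" in if_descrs:
        # lo0 plus driver-named interfaces (fxp0, em0, ...) is the BSD convention,
        # but other stacks use it too: the TippingPoint IPS turned out to be VxWorks.
        return "BSD-style interface names (not proof of a BSD)"
    return "unknown"
```

For the three devices above this yields Linux for both the Citrix box and the SMS server, and "BSD-style interface names" for the IPS, which is exactly where the guess went wrong.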