Recently I had a discussion with a colleague about the monitoring of our systems and network devices. I showed him everything we measure, and he wondered whether it was overkill. I told him that some things perhaps were, but that good monitoring is the first step to knowing what happens in your network, and that knowing what happens in your network is the first step toward isolating problems when they arise.
So the question is: "What do you need to monitor?" The answer is easy: "Everything". That's a pretty big amount, so let reality kick in and rephrase it to: "Everything you can think of". That is still too vague and misses the final goal: "Everything you assume to be the normal conditions for a system or service to run properly".
That is still quite a lot, and it is different for each service you provide and each device you have on the network. It will force you to dig into your network and servers to find out what is actually going on.
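To make the idea of "normal conditions" a bit more concrete, here is a minimal sketch in Python of how such assumptions could be written down and checked against a set of measurements. Everything in it is an assumption made for illustration: the `Assumption` class, the metric names (`disk_used_pct`, `http_latency_ms`, `if_in_errors`) and the thresholds are hypothetical and do not come from any particular monitoring tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping


@dataclass(frozen=True)
class Assumption:
    """One condition we assume to hold when a system runs properly."""
    description: str
    holds: Callable[[Mapping[str, float]], bool]


# Hypothetical examples; the real list differs per service and per device.
ASSUMPTIONS = [
    Assumption("root filesystem is less than 90% full",
               lambda m: m["disk_used_pct"] < 90),
    Assumption("web server answers within 500 ms",
               lambda m: m["http_latency_ms"] < 500),
    Assumption("uplink interface reports no input errors",
               lambda m: m["if_in_errors"] == 0),
]


def violated(measurements: Mapping[str, float],
             assumptions: List[Assumption] = ASSUMPTIONS) -> List[str]:
    """Return the descriptions of all assumptions that do not hold."""
    return [a.description for a in assumptions if not a.holds(measurements)]


# Made-up measurements, only to show how the check is used.
sample = {"disk_used_pct": 93.0, "http_latency_ms": 120.0, "if_in_errors": 0.0}
for description in violated(sample):
    print("NOT NORMAL:", description)
```

Writing the assumptions down as an explicit list is the point of the sketch: each entry is something you would otherwise only discover you relied on when it breaks.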
The rest of this story can be found at my website.