Note: This article is a recap of our webinar, Foundations of Network Monitoring. It highlights the key insights, examples, and takeaways shared during the session—it’s not a word-for-word transcript.
Hello, friends! Destiny Bertucci here. If you missed our first installment of the NMS Accelerator series, fear not—I’ve got your back. Let’s unpack the key insights from “Foundations of Network Monitoring” so you can start leveraging better visibility, smarter alerting, and proactivity across your environment.
Why Monitoring Matters
We kicked things off by laying the foundation—why monitoring is essential. The reality is simple: without visibility, you’re flying blind. Networks can be unpredictable, and downtime is both costly and reputation-damaging.
Three core benefits of network monitoring:
- Visibility – see what’s in your environment, in real time.
- Prevention – detect warning signs before they escalate.
- Efficiency – spend less time chasing ghosts and more time solving problems.
Where NMS Fits into Your Tech Stack
In the webinar, we broke down how a Network Monitoring System (NMS) complements other tools rather than competes with them. It’s not about replacing your RMM or PSA—it’s about adding network-focused insight to your existing strategy.
Think of your IT stack like this:
- Network Monitoring System (NMS) is the “eyes and ears” of your network.
- Remote Monitoring and Management (RMM) manages endpoints—patches, scripts, remote control.
- Professional Services Automation (PSA) ties it all together with workflow, tickets, and reporting.
That team synergy is what empowers true proactive operations.
Automated Device Discovery & Inventory
One of the biggest “aha” moments was around automated device discovery. Instead of typing in device details by hand (yikes!), the tool scans your network and generates a living inventory of routers, switches, firewalls, servers, and more.
As the network changes, so does your inventory—ensuring no device slips through the cracks. MSPs get faster client onboarding. Internal IT teams gain peace of mind—and more time for innovation.
Choosing Where (and What) to Monitor
You asked, “Where should I deploy NMS?” The answer: everywhere that matters. Yes—roll it out consistently across the fleet. Small exceptions (think minimal standalone sites) aside, broad deployment ensures you don’t end up chasing blind spots.
Next, we dispelled the “managed vs. unmanaged” myth and focused instead on practical impact. Monitor what disrupts business when it fails:
- Core infrastructure: Routers, switches, firewalls.
- Critical services: Servers, storage, VPN devices.
- User touchpoints: Wi-Fi access points, VoIP phones, cameras, printers.
Metrics That Matter
Less is more when it comes to monitoring metrics. Our “just the essentials” list got nods for its clarity:
- Uptime & availability
- CPU & memory load
- Interface utilization
- Connectivity
These four metrics catch the majority of issues—you’ll be surprised how much proactive troubleshooting they enable when combined with baselines.
Establishing Baselines: Understanding “Normal”
I can’t overstate it—baseline data turns alert noise into smart signal. Having historical context means you know when 30% CPU is normal versus concerning, or whether 5% packet loss is expected versus critical. Those subtle differences can mean the difference between a timely fix and an avoidable outage.
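To make that concrete, here’s a minimal Python sketch of baseline-aware alerting: compare the current reading against the mean and spread of its own history instead of a fixed threshold. The function name and CPU samples are made up for illustration, not taken from any particular tool.

```python
from statistics import mean, stdev

def is_anomalous(history, current, sigmas=3.0):
    """Flag a reading that deviates more than `sigmas` standard
    deviations from the historical baseline."""
    baseline = mean(history)
    spread = stdev(history)
    return abs(current - baseline) > sigmas * spread

# Hypothetical CPU samples (percent) from the baseline window.
cpu_history = [28, 31, 30, 29, 32, 30, 31, 29, 30, 28]

print(is_anomalous(cpu_history, 31))  # False: ~30% CPU is normal here
print(is_anomalous(cpu_history, 75))  # True: a 75% spike stands out
```

The same 31% reading that is boring here would be alarming on a device whose baseline sits at 5%, which is exactly why static thresholds alone generate noise.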
Real-World Scenarios
We wrapped the session by walking through two use cases:
- MSP Rollout – Automated discovery + baseline templates = faster, cleaner client onboarding and consistent SLAs.
- Internal IT Application – Monitoring Wi-Fi, VPN, and server health lets you patch problems before users notice—turning IT into value creators, not just responders.
Key Takeaways
- NMS is a force-multiplier—not a silo—when paired with RMM and PSA.
- Automated device discovery delivers up-to-date visibility without manual effort.
- Focus on impactful devices and metrics.
- Baselines plus early alerts = dependable network reliability.
What’s Next: Join 201
Our next session—201: Proactive Monitoring & Performance Insight—will level up what we’ve covered today. We’ll dive deep into performance metrics, traffic insights, smarter thresholding, and how to integrate these into operational workflows. That’s where monitoring becomes a strategic asset, not just a support tool.
Putting the Principles into Practice
If you are wondering how to put these principles into practice, I’ve created the following 10-step checklist to help you kickstart your network monitoring.
1. Prepare Your Environment
- Confirm deployment type: Windows, Linux, Hyper-V, NAS, or appliance.
- Verify collector resources (CPU, memory, storage) meet requirements.
- Plan for redundancy & segmentation (don’t treat your collector as one-and-done).
2. Credentials & Access
- Gather device credentials Domotz will need (SNMP v2/v3, SSH, WMI, API keys, cloud controller logins).
- Test credentials before onboarding devices.
- Remove default SNMP strings and enforce secure standards.
3. Automated Discovery & Inventory
- Run the Domotz discovery scan to identify all connected devices (routers, switches, firewalls, Wi-Fi, servers, endpoints).
- Validate that every device is classified (managed vs unmanaged doesn’t matter — impact does).
- Confirm inventory auto-updates as the environment changes.
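As a rough illustration of what a “living inventory” does under the hood, here’s a small Python sketch that diffs two discovery scans keyed by MAC address. The scan data and device names are hypothetical; real tools track far more attributes per device.

```python
def inventory_delta(previous, current):
    """Compare two discovery scans (keyed by MAC address) and report
    devices that appeared or disappeared since the last scan."""
    added = {mac: current[mac] for mac in current.keys() - previous.keys()}
    removed = {mac: previous[mac] for mac in previous.keys() - current.keys()}
    return added, removed

# Hypothetical scan results: MAC -> device description.
yesterday = {"aa:bb:cc:01": "core-router", "aa:bb:cc:02": "office-switch"}
today = {"aa:bb:cc:01": "core-router", "aa:bb:cc:03": "new-ap"}

added, removed = inventory_delta(yesterday, today)
print(added)    # the new Wi-Fi AP shows up automatically
print(removed)  # the retired switch is flagged, too
```

Run this on every scan cycle and nothing slips through the cracks: new gear is onboarded, retired gear is retired in your records too.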
4. Define What to Monitor
- Select devices with highest business impact:
- Core infrastructure (routers, switches, firewalls).
- Critical services (servers, storage, VPN, cloud gateways).
- User touchpoints (Wi-Fi APs, VoIP, cameras, printers).
- Apply standard monitoring policies across clients/sites for consistency.
5. Metrics That Matter
- Enable collection of the essential four metrics:
- Availability (uptime).
- CPU & memory load.
- Interface utilization.
- Connectivity/latency.
- Use SNMP OIDs and vendor-specific extensions where available.
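One wrinkle worth knowing: SNMP doesn’t report interface utilization directly. It exposes ever-growing octet counters (such as ifInOctets, OID 1.3.6.1.2.1.2.2.1.10), and utilization is derived from the delta between two samples against the link speed. A simplified Python sketch, with made-up sample values:

```python
def utilization_pct(octets_prev, octets_now, interval_s, if_speed_bps):
    """Derive interface utilization from two samples of an SNMP
    octet counter (e.g. ifInOctets, OID 1.3.6.1.2.1.2.2.1.10)."""
    bits = (octets_now - octets_prev) * 8  # octets -> bits transferred
    return 100.0 * bits / (interval_s * if_speed_bps)

# Hypothetical samples taken 60 s apart on a 1 Gbps link.
print(utilization_pct(1_000_000, 751_000_000, 60, 1_000_000_000))  # 10.0
```

In production, monitoring tools prefer the 64-bit ifHCInOctets counter on fast links and handle counter wraps; this sketch skips both for clarity.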
6. Baselines & Thresholds
- Establish a performance baseline (2–4 weeks of normal data).
- Define thresholds for alerts based on baseline behavior (avoid false positives).
- Use delta monitoring (rate of change) for disk space, bandwidth spikes, etc.
- Document baseline templates for MSP rollouts.
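Delta monitoring is easier to see with an example. Instead of alerting only when free disk space crosses a static line, project how many days of headroom the current consumption rate leaves you. The sample figures below are hypothetical:

```python
def days_until_full(free_gb_then, free_gb_now, days_between):
    """Project when a disk fills based on the rate of change between
    two samples, rather than a static free-space threshold."""
    consumed_per_day = (free_gb_then - free_gb_now) / days_between
    if consumed_per_day <= 0:
        return None  # usage is flat or shrinking; nothing to project
    return free_gb_now / consumed_per_day

# Hypothetical samples: 120 GB free a week ago, 78 GB free today.
print(days_until_full(120, 78, 7))  # 13.0 days of headroom left
```

A disk at 78 GB free might look healthy to a static threshold, while the rate of change says you have under two weeks to act.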
7. Alerts & Notifications
- Configure meaningful alerts (avoid flapping).
- Define critical vs. warning thresholds.
- Set up escalation rules (who gets notified and how).
- Integrate alerts with your PSA/ticketing system for streamlined workflows.
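“Avoid flapping” deserves a concrete picture. One common trick is hysteresis: require several consecutive bad readings before raising an alert, and several consecutive good ones before clearing it, so a metric bouncing around the threshold doesn’t page anyone repeatedly. A toy Python sketch (threshold and counts are illustrative):

```python
class HysteresisAlert:
    """Raise an alert only after `trigger` consecutive breaches, and
    clear it only after `clear` consecutive healthy readings."""

    def __init__(self, threshold, trigger=3, clear=3):
        self.threshold = threshold
        self.trigger = trigger
        self.clear = clear
        self.bad = 0        # consecutive breaches seen
        self.good = 0       # consecutive healthy readings seen
        self.alerting = False

    def observe(self, value):
        if value > self.threshold:
            self.bad += 1
            self.good = 0
        else:
            self.good += 1
            self.bad = 0
        if not self.alerting and self.bad >= self.trigger:
            self.alerting = True
        elif self.alerting and self.good >= self.clear:
            self.alerting = False
        return self.alerting

alarm = HysteresisAlert(threshold=90)  # e.g. 90% CPU
readings = [85, 95, 85, 95, 92, 94, 96, 85, 88, 86]
print([alarm.observe(r) for r in readings])
```

Note how the single spikes early in the series never trigger an alert, while the sustained run of breaches does, and the alert only clears after the metric has stayed healthy for a while.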
8. Reporting & KPIs
- Define KPIs relevant to stakeholders (firewall capacity, CPU growth, bandwidth trends).
- Schedule regular reports for both tech teams and business leaders.
9. Security & Compliance
- Regularly review monitored device configurations.
- Standardize on a “gold config” for core devices.
- Monitor for open ports, rogue devices, duplicate IPs.
- Document compliance checks for client SLAs.
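Checks like duplicate-IP detection fall out naturally once you have inventory data. Here’s an illustrative Python sketch over hypothetical (MAC, IP) pairs; an IP claimed by two MAC addresses is a classic sign of misconfiguration or a rogue device:

```python
from collections import Counter

def find_duplicate_ips(devices):
    """Flag IP addresses claimed by more than one MAC address."""
    counts = Counter(ip for _mac, ip in devices)
    return sorted(ip for ip, n in counts.items() if n > 1)

# Hypothetical (MAC, IP) pairs from a discovery scan.
scan = [
    ("aa:bb:cc:01", "10.0.0.1"),
    ("aa:bb:cc:02", "10.0.0.2"),
    ("aa:bb:cc:03", "10.0.0.2"),  # conflicts with the device above
]
print(find_duplicate_ips(scan))  # ['10.0.0.2']
```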
10. Continuous Improvement
- Upskill your team — start with metrics, then expand to traffic flows, baselines, and configs.
- Add custom OIDs and vendor-specific monitoring over time.
- Use topology mapping to validate network resilience.
- Run post-incident reviews to refine alerts, thresholds, and reports.
Series Lookback
One of my favorite parts of any session is the Q&A; that’s where the real-world challenges come out. In our 101 webinar, you all asked fantastic questions about upskilling, deployment, and monitoring strategy. Here are a few highlights worth capturing:
- Upskilling Techs – The best place to start is with the metrics. For MSPs, that means teaching your teams to think beyond devices and into client environments: verticals, contractual obligations, and vendor stacks. Techs should learn how to be business advisors as much as operators. And yes, engage with your monitoring tools and dig into their knowledge bases to deepen your understanding!
- Configuration & Security – We talked about proactively reviewing configs, checking for open SNMP defaults, and defining a “gold standard” for your environment. It’s not just about monitoring availability — it’s about keeping networks secure and aligned to best practice.
- Deployment Options – Windows, Linux, Hyper-V… they all work. What matters most is making sure your collectors have the CPU and memory resources to support the data you’re pulling in. We also touched on agent vs. agentless monitoring, and the shift toward more cloud-friendly models.
- Scaling to Large Environments – Firefighting gets old fast. Use baselines and thresholds to spot anomalies before they explode. Monitoring at scale is about understanding configurations, security, and traffic so you can avoid noise and focus on action.
- SNMP & Vendor Metrics – SNMP is still the backbone. Out-of-the-box OIDs get you started, but defining your own based on vendor input is where you can unlock more insight. Always ask vendors: how should we monitor your product?
- Domotz Box & Topology – We discussed redundancy, segmentation, and why topology awareness matters. The takeaway: don’t treat your collector as a one-and-done — it’s part of your resilience strategy.
- Reporting & KPIs – Monitoring isn’t just for techs — it’s for stakeholders, too. Understand what KPIs matter to your business leaders (like firewall capacity or CPU trends), and use monitoring data to prove ROI, validate upgrades, and plan capacity.
That’s all from me for now. Take a breath, dive into your monitoring solutions, and get ready to shift from reactive to strategic IT.
See you in Session 201—ready to go deeper!
Cheers,
Destiny