Simple Network Management Protocol
Overview
The Simple Network Management Protocol (SNMP) is a vital protocol for network administrators. SNMP runs over IP, normally on UDP, although it can also be carried over TCP. It allows an administrator to request information about the hardware, software or configuration of one or more network devices from a single point of management.
SNMP has two sides: the agent and the management station. The agent sends data about itself to the management station. The management station collects data from all the agents on the network. The protocol is not identical on each side. The agent sends alerts called traps and answers requests sent by the management station. The management station catches and decodes the traps, and also requests specific information from the agent.
The agent can be a server, router, printer, bridge or workstation. Any network device that can use IP can be an SNMP agent. The software written to monitor the agent’s hardware and send alerts is primarily the responsibility of the vendor. The initial design of SNMP called for every vendor to define the same hardware variables in their SNMP alerts, so that any type of management station could decode them. Today’s technologically advanced hardware is too varied to follow this design, which is why vendors are responsible for SNMP agent software. This has also driven the development of SNMPv2 and SNMPv3.
The SNMP agent monitors hardware and sends a trap if any problems occur. For example, if a network printer runs low on toner or a server has a hard drive go bad, a trap is sent to the management station. The management station catches the trap and acts on it as configured by the administrator. The administrator could configure the management station to send an e-mail or to page the administrator.
The management station may also send a request to the agent. In this case the SNMP agent replies to the management station with the specific data. For example, the management station may wish to view the agent’s hardware event log and see the current BIOS level. The agent replies with SNMP packets containing the hardware’s events and BIOS level. The information is passed through requests and replies with the use of the Management Information Base, or MIB.
The management station is responsible for decoding the SNMP packets and providing an interface to the administrator. The interface can be a GUI or a command line. The interface is the key to SNMP: it is the single point of management for administering several network devices.
MIBs
SNMP lets TCP/IP-based network management clients use a TCP/IP-based internetwork to exchange information about the configuration and status of nodes. The information available is defined by a set of managed objects referred to as the SNMP Management Information Base (MIB). The subset of managed objects that make up the TCP/IP portion of the MIB is maintained by each TCP/IP node. SNMP can also generate trap messages, which report significant TCP/IP events asynchronously to interested clients.
SNMP applications run in a network management station (NMS) and issue queries to gather information about the status, configuration, and performance of external network devices (called network elements in SNMP terminology). For example, HP Openview software is a network management station, and a Cisco 4500 router with its SNMP agent enabled is a network element. Shown below is a (very) basic diagram illustrating this concept.
SNMP agents run in network elements (for example, in the Cisco 4500 router) and respond to NMS queries (GETs) (for example, from HP Openview, or MRTG). In addition, agents send unsolicited reports (called traps) back to the NMS when certain network activity occurs. These traps can spawn events such as e-mail alerts, automatic pages or network server parameter modifications.
For security reasons, the SNMP agent validates each request from an SNMP manager before responding to the request, by verifying that the manager belongs to an SNMP community with access privileges to the agent.
An SNMP community is a logical relationship between an SNMP agent and one or more SNMP managers. The community has a name, and all members of a community have the same access privileges: either read-only (members can view configuration and performance information) or read-write (members can view configuration and performance information, and also change the configuration).
All SNMP message exchanges consist of a community name and a data field, which contains the SNMP operation and its associated operands. You can configure the SNMP agent to accept requests from, and send responses to, only those managers that are members of a known community. If the agent knows the community name in the SNMP message and knows that the manager generating the request is a member of that community, it considers the message to be authentic and gives it the access allowed for members of that community. Thus, the SNMP community prevents unauthorized managers from viewing or changing the configuration of a router or other SNMP manageable device.
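The community check described above can be sketched roughly as follows. This is a minimal illustration, not how any particular agent is implemented: the community table, the addresses, and the function name authorize are all assumptions made for the example.

```python
# Hypothetical community table: name -> (access level, member manager addresses).
COMMUNITIES = {
    "public": ("read-only", {"192.0.2.10"}),
    "netops": ("read-write", {"192.0.2.10", "192.0.2.11"}),
}


def authorize(community, manager_ip, operation):
    """Return True if the agent should process the request."""
    entry = COMMUNITIES.get(community)
    if entry is None:
        return False  # the agent does not know this community name
    access, members = entry
    if manager_ip not in members:
        return False  # the manager is not a member of the community
    if operation == "set":
        return access == "read-write"  # only read-write members may change values
    return True  # get and get-next are allowed for both access levels


print(authorize("public", "192.0.2.10", "get"))
print(authorize("public", "192.0.2.10", "set"))
```

With this table, a read-only member can view values but its SET is refused, while an unknown community name or a manager outside the membership list is rejected outright.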
In summary, SNMP provides the following operations. The first three are issued by the management program; the last is sent by the agent:
- The GET operation retrieves a specific value for a managed object, such as available hard disk space, from the agent's MIB.
- The GET-NEXT operation returns the "next" value by traversing the MIB database (tree) of managed object variables.
- The SET operation changes the value of a managed object's variable. Only variables whose object definition allows READ/WRITE access can be changed.
- The TRAP operation sends a message to the management station when a change occurs in a managed object (and that change is deemed important enough to spawn an alert message).
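To make the GET, GET-NEXT and SET operations concrete, here is a small sketch of an agent-side MIB held in memory. The class name ToyMib and the stored values are invented for illustration; the three object identifiers are the MIB-II sysDescr, sysUpTime and sysName objects, and a real agent would hold far more.

```python
import bisect


class ToyMib:
    """In-memory stand-in for an agent's MIB (illustrative only)."""

    def __init__(self):
        # OID tuple -> [value, writable flag]
        self._objects = {
            (1, 3, 6, 1, 2, 1, 1, 1, 0): ["Example router", False],  # sysDescr.0
            (1, 3, 6, 1, 2, 1, 1, 3, 0): [123456, False],            # sysUpTime.0
            (1, 3, 6, 1, 2, 1, 1, 5, 0): ["router-1", True],         # sysName.0
        }

    def get(self, oid):
        """GET: return the value of exactly this object."""
        return self._objects[oid][0]

    def get_next(self, oid):
        """GET-NEXT: return the first object that sorts after oid, or None."""
        oids = sorted(self._objects)  # tuples sort in OID tree order
        index = bisect.bisect_right(oids, oid)
        if index == len(oids):
            return None  # walked off the end of the MIB
        next_oid = oids[index]
        return next_oid, self._objects[next_oid][0]

    def set(self, oid, value):
        """SET: change a value, but only if the object is writable."""
        entry = self._objects[oid]
        if not entry[1]:
            raise PermissionError("object is read-only")
        entry[0] = value


mib = ToyMib()
print(mib.get((1, 3, 6, 1, 2, 1, 1, 5, 0)))
print(mib.get_next((1, 3, 6, 1, 2, 1, 1, 1, 0)))
mib.set((1, 3, 6, 1, 2, 1, 1, 5, 0), "core-router")
```

Storing object identifiers as tuples is a convenience of the sketch: Python compares tuples element by element, which matches the ordering GET-NEXT relies on.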
Traps and Polling
When the agent discovers a problem of any kind, a trap is sent to its configured management station or stations. The trap is a simple SNMP packet containing an alert that some type of problem has occurred. The trap is a one-way packet: the management station does not reply to the agent to confirm it received the trap.
A trap consists of six pieces. The first is the enterprise, which tells the management station what type of hardware created the trap. The second is the agent’s IP address, so the management station can find out who sent the trap. The third is the generic trap type, which describes the general nature of the problem. The fourth is the specific trap type, which is enterprise specific: the vendor’s MIB extension defines the event, and it is passed as an integer. The fifth is a time stamp, which records when the trap occurred, expressed as the time elapsed since the agent was last initialized. The last piece is the variable-binding field, which carries the values bound to the MIB objects involved in the event. The trap is designed so that anyone can recognize a potential problem.
The generic field allows traps to be sent without a MIB. There are seven generic trap types. The first two describe the agent’s startup. A ColdStart trap signifies that the agent is reinitializing in a way that may have changed its configuration, for example a server whose BIOS settings were changed before it booted the operating system. A WarmStart trap signals that the system has been reinitialized, meaning restarted, without its configuration being altered.
The next two types of traps describe the link status, or communication between the agent and the network. The two traps are LinkDown and LinkUp. These traps refer to the communication channels on the agent, and they use the variable-binding field. For example, a switch could send a LinkDown trap and use the variable-binding field to inform the management station which port on the switch is down. The LinkUp trap signifies that communication through the link has been restored.
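The six pieces of a trap can be pictured as a simple record. The sketch below is only a data model, assuming SNMPv1 field names from RFC 1157; the class name TrapV1, the vendor OID under enterprise number 99999, and the agent address are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TrapV1:
    enterprise: tuple        # OID identifying the type of device that sent the trap
    agent_addr: str          # IP address of the sending agent
    generic_trap: int        # general nature of the problem (0 to 6)
    specific_trap: int       # vendor-defined code, used with enterpriseSpecific
    time_stamp: int          # hundredths of a second since the agent initialized
    variable_bindings: list = field(default_factory=list)  # (OID, value) pairs


# A linkDown trap reporting that interface 3 went down after one hour of uptime.
link_down = TrapV1(
    enterprise=(1, 3, 6, 1, 4, 1, 99999),  # hypothetical vendor OID
    agent_addr="192.0.2.20",
    generic_trap=2,                        # linkDown
    specific_trap=0,
    time_stamp=360000,                     # 3600 seconds, in 1/100 s units
    variable_bindings=[((1, 3, 6, 1, 2, 1, 2, 2, 1, 1, 3), 3)],  # ifIndex.3 = 3
)
print(link_down)
```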
The next type of trap is an AuthenticationFailure trap. This trap signals the management station that a request has been received from an unauthorized source. This type of trap can tell administrators if someone is trying to hack into a node or if the SNMP settings are not correct and communication is not working properly between the management station and the agent. In other words, the agent received a request and the request did not pass security.
The next type of generic trap is only sent when the agent has been configured for it. A NeighborLoss trap (egpNeighborLoss in RFC 1157) signals the management station that a relationship between two nodes has been broken; in RFC 1157 the relationship is an Exterior Gateway Protocol (EGP) peering. The relationship must be configured for this type of trap to take place. Relationships between nodes are sometimes also created in order to use SNMP. Some network devices, usually older ones, do not have any SNMP software written to monitor their hardware. Such a device is paired with a node referred to as an SNMP proxy, which monitors the device and sends traps on its behalf. In a NeighborLoss trap the variable-binding field identifies the node with the problem.
The last type of trap is enterpriseSpecific. This type relies on the specific-type field in the trap packet. It informs the management station of an event that occurred in a specific piece of hardware, as defined by the vendor's SNMP software. The specific-type field contains a code that the vendor's MIB maps to the exact type of event that has occurred. This is the trap network administrators use to find specific problems, although the other generic traps provide enough information to keep administrators informed of the general types of problems.
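The seven generic trap types map onto the integer codes 0 through 6 defined in RFC 1157, where the neighbor-loss trap is named egpNeighborLoss. A management station might decode the generic and specific fields along these lines; the describe helper is a hypothetical name used only for this sketch.

```python
from enum import IntEnum


class GenericTrap(IntEnum):
    """Generic trap codes as numbered in RFC 1157."""
    COLD_START = 0
    WARM_START = 1
    LINK_DOWN = 2
    LINK_UP = 3
    AUTHENTICATION_FAILURE = 4
    EGP_NEIGHBOR_LOSS = 5
    ENTERPRISE_SPECIFIC = 6


def describe(generic, specific):
    """Turn the generic and specific trap fields into a readable label."""
    kind = GenericTrap(generic)  # raises ValueError for codes outside 0..6
    if kind is GenericTrap.ENTERPRISE_SPECIFIC:
        # Only here does the specific field matter; its meaning is vendor-defined.
        return f"ENTERPRISE_SPECIFIC (vendor code {specific})"
    return kind.name


print(describe(2, 0))
print(describe(6, 17))
```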
Because the management station does not confirm traps, it may not know of a problem that has occurred on the agent if the trap is lost. Also, if the agent itself has crashed it cannot send a trap stating that it is no longer online. This scenario is similar to a patient not being able to tell their doctor they have died. To solve this problem polling is used.
Polling is what the management station does to make sure the agents are still up. A request goes out to each agent at a set time interval. If the agent does not reply after a set number of retries, the management station can assume the agent is down and will execute a program to alert an administrator. Periodically asking the agent what is wrong also means that the management station can miss a trap without missing any critical data.
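The retry logic behind polling can be sketched without any networking. In the example below, send_request stands for whatever function actually sends the SNMP request and reports whether a reply arrived; it and the other names are assumptions made for illustration, and the three "agents" are simple stand-ins.

```python
def poll_agent(send_request, retries=3):
    """Return True if the agent answers within the allowed number of attempts."""
    for _ in range(retries):
        if send_request():  # True means a reply arrived before the timeout
            return True
    return False


def find_down_agents(agents, retries=3):
    """agents maps an agent name to its send_request callable."""
    return [name for name, send in agents.items() if not poll_agent(send, retries)]


# Stand-ins for real network requests.
def always_replies():
    return True


def never_replies():
    return False


flaky_results = iter([False, True])  # first request is lost, second is answered


def flaky():
    return next(flaky_results)


down = find_down_agents(
    {"printer": always_replies, "server": never_replies, "switch": flaky}
)
print(down)  # ['server']
```

The switch loses one request but answers the retry, so only the server, which never replies, is reported as down.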
Alerts and Actions
When a management station receives an alert, it does not just store it in a log waiting for an administrator to come look at it. The latest technology allows the management station to alert the administrator. For example, when a management station receives a trap, an administrator can be paged or e-mailed a description of the problem. Some alerts are more severe than others, and the response can be configured for traps as well as for polling. For example, if a management station receives a trap stating that a hard drive has gone bad on one of its agents, the management station can use a modem to dial an administrator’s pager. A second example involves polling. The management station is polling an agent. When the agent stops replying to the management station, a timer is started. If the timer expires before the agent responds, the administrator is again called or e-mailed, depending on how the management station is configured. The timer is used as a filter for false alerts. The agent’s replies to the management station may not always be received within the timeout period, or received at all. The timer makes sure the administrator is not alerted every time an SNMP packet is lost.
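The timer that filters out false alerts can be sketched as a small state machine. This is one possible reading of the behavior described above, with invented names (DownTimer, record, grace_seconds); the current time is passed in explicitly so the logic can be followed without waiting on a real clock.

```python
class DownTimer:
    """Raise an alert only after an agent has been silent for a grace period."""

    def __init__(self, grace_seconds):
        self.grace_seconds = grace_seconds
        self.first_miss = None  # time of the first unanswered poll, if any

    def record(self, replied, now):
        """Feed in one poll result; return True when an alert should be raised."""
        if replied:
            self.first_miss = None  # agent is back, so cancel the timer
            return False
        if self.first_miss is None:
            self.first_miss = now   # first missed reply starts the timer
        return now - self.first_miss >= self.grace_seconds


timer = DownTimer(grace_seconds=60)
print(timer.record(replied=False, now=0))    # timer starts, no alert yet
print(timer.record(replied=True, now=30))    # reply arrives, timer is cancelled
print(timer.record(replied=False, now=60))   # timer starts again
print(timer.record(replied=False, now=120))  # silent for 60 seconds: alert
```

As written, the sketch keeps returning True on every later missed poll; a real management station would probably also remember that the administrator has already been notified.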
Messaging
There are five types of messages that can be sent using SNMP, including the trap. The management station must be able to query the agent to see if any problems have occurred. Administrators may also want to get hardware configuration from the agent. The other four types of messages are what the management stations use to communicate with the agents and receive requested information.
The first two have a similar architecture. Get-Request is a message sent by the management station to the agent requesting a specific value. This is how the management station can view the configuration settings of the agent. Set-Request asks the agent to change a value in its configuration. Set-Request is why security is such a major issue with SNMP. These messages are a great help to administrators: the ability to view and change hardware configuration from a remote site extends what an administrator can do.
After the agent has found the answer to the management station’s Get-Request or Set-Request, the agent replies with a Get-Response. This message gives the management station the answer to its request, or reports whether or not the set was successful. The Get-Response message is also used to reply to the Get-Next-Request message. The Get-Next-Request message is sent by the management station and gives the agent a variable, asking it for the value of the next variable in its MIB. This message is what the management station uses to query the agent for any problems. For example, the management station can send a Get-Next-Request on the very first element in the agent’s MIB. After the management station receives the Get-Response, it sends another Get-Next-Request on the MIB object it just received a value for, and so on until the end of the MIB is reached.
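The repeated Get-Next-Request exchange amounts to a walk over the agent's MIB. The sketch below simulates both sides in one process: get_next plays the agent answering with a Get-Response, and walk plays the management station. The MIB contents and function names are assumptions for illustration, and no network traffic is involved.

```python
# Toy MIB: OID tuple -> value (sysDescr.0, sysUpTime.0, sysName.0).
MIB = {
    (1, 3, 6, 1, 2, 1, 1, 1, 0): "Example router",
    (1, 3, 6, 1, 2, 1, 1, 3, 0): 123456,
    (1, 3, 6, 1, 2, 1, 1, 5, 0): "router-1",
}


def get_next(oid):
    """Agent side: answer a Get-Next-Request with the following object."""
    for candidate in sorted(MIB):
        if candidate > oid:
            return candidate, MIB[candidate]
    return None  # no object follows: the end of the MIB was reached


def walk(start=()):
    """Manager side: keep asking for the next object until the MIB ends."""
    results = []
    oid = start
    while True:
        response = get_next(oid)
        if response is None:
            break
        oid, value = response  # the next request uses the OID just received
        results.append((oid, value))
    return results


for oid, value in walk():
    print(".".join(str(arc) for arc in oid), "=", value)
```

Starting from the empty tuple mirrors asking for the very first element, since every real OID sorts after it.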
The agent is only responsible for sending Get-Response and the trap. The trap is the only message initiated by the agent. The Get-Response message must be initiated by receiving a request.
Management using DMI
The latest development in network management is the expansion of SNMP-style management to all forms of hardware, including desktops. The Desktop Management Interface (DMI) is a good example. DMI is software that sits alongside SNMP but does not replace it. DMI reaches further into the software level, whereas SNMP is mostly concerned with hardware. The two are very similar: instead of a MIB, DMI uses the Management Information Format, known as a MIF. Simple algorithms transform a MIF into a MIB so that the data can be transported through SNMP.
DMI exists because of new hardware. The latest firmware provides specific application calls that can be used to get information about the hardware, and DMI provides a better API in the operating system for getting information out of both hardware and software. In this sense DMI can be thought of as an API for hardware: a standard procedure for using SNMP to talk to the hardware.
DMI is especially useful for data that is very dynamic. For example, if an administrator would like to see the amount of I/O on a node’s hard disk or the current utilization of its processors, the management station makes a request using the DMI management interface (MI) through SNMP to the DMI component interface (CI), which makes the appropriate API call to the hardware. The CI then gets the answer and uses the DMI’s MIF to transport the information back to the management station. This works because the DMI calls through the CI are standard for any platform. The management station can request processor use from any node with the same procedure call to that node’s CI, which decreases the amount of platform-specific information the management station is required to have. The goal of DMI is to bring network management back to an industry standard.
Security
Security is important when using SNMP. Because SNMP agents send out information about devices, and on some agents that information can also be changed, security must not be overlooked. The initial version of SNMP, now referred to as SNMPv1, did not have a very good implementation of security. SNMP faces the standard threats of any network application: modification of information, masquerade, message stream modification, and disclosure.
SNMPv1 used only one form of security: community names. Community names are similar to passwords. Agents can be set to reply only to queries that carry an accepted community name. In SNMPv1 the community name is passed along with the data packet in clear text, which allows anyone to eavesdrop and learn the SNMP community name, in effect the password.
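To see why a clear-text community name is a problem, it helps to look at the bytes of a request. The sketch below hand-encodes an SNMPv1 Get-Request for sysDescr.0 using BER. It is a simplified encoder written for this example (short lengths and small non-negative integers only) and the helper names are invented; it is not a general SNMP library.

```python
def tlv(tag, payload):
    """Encode one BER tag-length-value item (short-form length only)."""
    assert len(payload) < 128, "long-form lengths are not handled in this sketch"
    return bytes([tag, len(payload)]) + payload


def encode_int(value):
    """Encode a small non-negative INTEGER, keeping the sign bit clear."""
    length = value.bit_length() // 8 + 1
    return tlv(0x02, value.to_bytes(length, "big"))


def encode_oid(arcs):
    """Encode an OBJECT IDENTIFIER: first two arcs share a byte, rest are base 128."""
    body = bytes([40 * arcs[0] + arcs[1]])
    for arc in arcs[2:]:
        chunk = [arc & 0x7F]
        arc >>= 7
        while arc:
            chunk.insert(0, (arc & 0x7F) | 0x80)  # continuation bit on leading bytes
            arc >>= 7
        body += bytes(chunk)
    return tlv(0x06, body)


def get_request(community, oid, request_id=1):
    """Build an SNMPv1 Get-Request message for a single object."""
    varbind = tlv(0x30, encode_oid(oid) + tlv(0x05, b""))  # OID paired with NULL
    varbind_list = tlv(0x30, varbind)
    pdu = tlv(
        0xA0,  # Get-Request PDU tag
        encode_int(request_id) + encode_int(0) + encode_int(0) + varbind_list,
    )
    version = encode_int(0)  # version field 0 means SNMPv1
    return tlv(0x30, version + tlv(0x04, community.encode("ascii")) + pdu)


packet = get_request("public", (1, 3, 6, 1, 2, 1, 1, 1, 0))
print(packet.hex())
print(b"public" in packet)  # True
```

The community name appears in the packet as plain ASCII bytes, so anyone who can capture the traffic can read it.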
SNMPv2, in its original party-based form, brought a lot of extra security. Everything in the packet except the destination address is encrypted. Inside the encrypted data are the community name and the source IP address. The agent decodes the encrypted data packet and uses the accepted community name and accepted source IP address to validate the request. This type of security is referred to as party and context: party refers to a specific machine or person, and context refers to a name or string associated with the party. SNMP uses DES (Data Encryption Standard) for encrypting the data packets. The widely deployed community-based variant, SNMPv2c, kept the clear-text community names of SNMPv1.
SNMPv3 provides the latest architecture for SNMP security. It incorporates an SNMP context engine ID to encode and decode SNMP contexts. A full explanation of the context engine ID is beyond the scope of this overview; in short, it matches a context name with an object, and security requires the object and context to match. SNMPv3 provides three levels of security. The highest level is authentication with privacy (authPriv), the middle level is authentication without privacy (authNoPriv), and the bottom level is neither authentication nor privacy (noAuthNoPriv).
The perfect example of why SNMP security is important is its ability to reboot devices. Administrators cannot let that ability be abused. The latest versions of SNMP have brought security a long way from clear text.
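The three SNMPv3 security levels form an ordered scale, which an agent can use to decide whether a message is protected well enough for the object being accessed. The sketch below uses the level names and numbering from the SNMPv3 architecture (noAuthNoPriv, authNoPriv, authPriv); the helper functions level_for and is_allowed are hypothetical and only illustrate the comparison.

```python
from enum import IntEnum


class SecurityLevel(IntEnum):
    NO_AUTH_NO_PRIV = 1  # neither authentication nor privacy
    AUTH_NO_PRIV = 2     # authentication without privacy
    AUTH_PRIV = 3        # authentication with privacy (encryption)


def level_for(authenticated, encrypted):
    """Work out the security level of a message from its two properties."""
    if encrypted and not authenticated:
        raise ValueError("privacy without authentication is not a defined level")
    if encrypted:
        return SecurityLevel.AUTH_PRIV
    if authenticated:
        return SecurityLevel.AUTH_NO_PRIV
    return SecurityLevel.NO_AUTH_NO_PRIV


def is_allowed(message_level, required_level):
    """A message is acceptable if it is at least as secure as required."""
    return message_level >= required_level


msg = level_for(authenticated=True, encrypted=False)
print(msg.name)
print(is_allowed(msg, SecurityLevel.AUTH_PRIV))
```

An operation as sensitive as rebooting a device would reasonably be configured to require the highest level.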
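RFCs
The following RFCs define how SNMP functions:
RFC 1157 - A Simple Network Management Protocol (SNMP)
RFC 1155 - Structure and Identification of Management Information for TCP/IP-based Internets
RFC 1212 - Concise MIB Definitions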
RFC 1213 - Management Information Base II (MIB II)