When in charge of the preventive maintenance of a company’s servers, there are numerous areas one must cover to ensure their proper operation. First, the hardware of the network must be closely monitored. This hardware includes the servers, routers, switches, storage systems, and supporting infrastructure equipment. While most server manufacturers already include monitoring software that tracks the server’s health, it is still important to confirm that the server’s temperature, ECC memory errors, hard drive status, fans, and RAID controllers are actually being watched. Environmental factors such as high temperature, humidity, and water accumulation can cause serious damage to a server and connected devices. Because of these threats, a temperature probe and water detector should be used in the area where the server and additional equipment are located. When possible, using smart devices to monitor the temperature and the presence of water is recommended so that alerts can be sent to the administrator as soon as a problem occurs. For the monitoring software to oversee all of this adequately, it must be installed, properly configured, and, most importantly, kept up to date.
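The threshold checks described above can be sketched in a few lines of Python. This is only an illustration: the sensor names, the `Reading` class, the `find_alerts` function, and the limits in `LIMITS` are all hypothetical, and real limits should come from the hardware vendor and the facility.

```python
from dataclasses import dataclass

# Hypothetical limits for illustration only; use the values your
# hardware vendor and facility actually specify.
LIMITS = {
    "temperature_c": (10.0, 35.0),
    "humidity_pct": (20.0, 80.0),
}


@dataclass
class Reading:
    sensor: str     # e.g. "temperature_c"
    location: str   # e.g. "server-room"
    value: float


def find_alerts(readings, water_detected=False):
    """Return a list of human-readable alert strings."""
    alerts = []
    for r in readings:
        limits = LIMITS.get(r.sensor)
        if limits is None:
            continue  # no limit configured for this sensor type
        low, high = limits
        if not (low <= r.value <= high):
            alerts.append(
                f"{r.location}: {r.sensor}={r.value} outside {low}-{high}"
            )
    if water_detected:
        alerts.append("water detected near equipment")
    return alerts


if __name__ == "__main__":
    sample = [
        Reading("temperature_c", "server-room", 41.5),
        Reading("humidity_pct", "server-room", 45.0),
    ]
    for alert in find_alerts(sample):
        print(alert)
```

In practice the readings would come from the smart probes or the manufacturer's monitoring software, and the resulting alerts would be forwarded to the administrator.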
Most servers can also be set up to send notifications and alerts in the form of an email or text message if a problem arises. Servers and other devices should also always use an uninterruptible power supply (UPS) so that they will not be affected by a power failure, whether from natural or man-made causes. The UPS’s software, if included, must be configured appropriately as well so that automatic battery tests are run at short, regular intervals. It is important to remember that nothing can guarantee that a server or other device will never fail, so a plan must be created for every foreseeable situation, such as power surges, power failures, software bugs, or unauthorized access. Along with these plans, it is recommended to keep an adequate number of spare devices on hand so that quick fixes can be made should a device fail. Again, to help reduce the possibility of failure, monitoring software that focuses on switches and routers should be installed. The manufacture, installation, and planned replacement dates of all equipment should be logged to keep track of the devices and to replace older devices before they have the chance to fail. Another useful tip is to think of the devices that make up your network as a chain. Even if most of the devices use high-quality parts, one low-quality part can break the chain. Redundancy is also always favored. Try to eliminate single points of failure to increase fault tolerance.
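As a rough sketch of the equipment log described above, the following Python keeps an install date and a planned replacement date for each device and lists the ones that are due soon. The `Device` class, the `due_for_replacement` function, the 90-day warning window, and the sample devices are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Device:
    name: str
    installed: date
    replace_by: date   # planned replacement date from the equipment log


def due_for_replacement(devices, today, warn_days=90):
    """Return names of devices due for replacement within warn_days."""
    due = []
    for d in devices:
        days_left = (d.replace_by - today).days
        if days_left <= warn_days:  # also catches overdue devices
            due.append(d.name)
    return due


if __name__ == "__main__":
    # Hypothetical inventory entries.
    inventory = [
        Device("core-switch", date(2014, 3, 1), date(2019, 3, 1)),
        Device("edge-router", date(2017, 6, 1), date(2022, 6, 1)),
    ]
    print(due_for_replacement(inventory, today=date(2019, 1, 15)))
```

Passing `today` in explicitly, rather than reading the clock inside the function, keeps the check easy to test and to rerun for any date.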
All software used must be kept up to date. Windows operating systems, for example, typically receive automatic updates. However, it is crucial to verify that the updates have been installed correctly. Just like on a personal computer, security software such as antivirus programs must be installed and constantly updated. It is also easy to forget to renew their licenses. To help prevent this, license renewals should be checked at short, regular intervals and documented each time. As with any network maintenance or installation, it is recommended that the technician who did the work log their actions and then sign and date the document. This provides accountability for all work done and makes future problems easier to solve. The server’s memory consumption, CPU utilization, and other performance statistics should be continuously monitored and logged. Internet domain name registrations and SSL certificates should be renewed and paid for promptly, and each renewal should be logged as well.
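One way to avoid missing a certificate renewal is to check the expiry date automatically. The sketch below uses only Python's standard `ssl` and `socket` modules; the function names `fetch_not_after` and `days_until_expiry` and the host `example.com` are placeholders chosen for illustration, not part of any existing tool.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host, port=443, timeout=10):
    """Connect to host and return the certificate's notAfter string."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Whole days remaining before a certificate expires.

    not_after uses the format returned by getpeercert(),
    e.g. "Jun 13 00:00:00 2030 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    if now is None:
        now = datetime.now(timezone.utc)
    return (expires - now).days


if __name__ == "__main__":
    # "example.com" is a placeholder; substitute the company's own domains.
    remaining = days_until_expiry(fetch_not_after("example.com"))
    print(f"certificate expires in {remaining} days")
```

A scheduled job could run this for each domain and write the result to the maintenance log, raising an alert once the remaining days fall below a chosen margin.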
Backups should be made automatically, stored offsite, logged, and secured. Tests should be performed regularly to ensure this happens. System restores should be quick to carry out so that company downtime is kept to a minimum. Available disk space is another important area to pay attention to, as insufficient storage can create problems ranging from minor to severe. These preventive maintenance operations should be done on a daily or at least a weekly schedule. If resources and manpower allow, performing daily or even hourly checks is the best option, as a server or network’s health can change instantly and often without warning. By following these steps, a company’s network administrator can be confident in their ability to maintain a healthy server and network.
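A disk space check is simple to automate. The following sketch uses Python's standard `shutil.disk_usage`; the function names and the 20 percent and 10 percent thresholds are assumptions for illustration and should be tuned to the server in question.

```python
import shutil


def disk_status(total_bytes, free_bytes, warn_pct=20.0, crit_pct=10.0):
    """Classify free space as "ok", "warning", or "critical"."""
    free_pct = 100.0 * free_bytes / total_bytes
    if free_pct < crit_pct:
        return "critical"
    if free_pct < warn_pct:
        return "warning"
    return "ok"


def check_path(path="/"):
    """Check the volume that contains path."""
    usage = shutil.disk_usage(path)  # fields: total, used, free (in bytes)
    return disk_status(usage.total, usage.free)


if __name__ == "__main__":
    print(check_path("/"))
```

Run from a scheduler on the daily or hourly cadence described above, a check like this can record its result in the maintenance log and notify the administrator when the status is anything other than "ok".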
References:
“Server Maintenance Checklist”. A&J Technology Services, Inc. Ajtechs.com. Web. Accessed 13 Jun 2018.