DCIM and Thermal Monitoring as a Data Center Monitoring Tool

What is data center infrastructure management (DCIM)?

Data center infrastructure management (DCIM) comprises processes and technologies used to monitor, measure, and manage a data center's physical and virtual infrastructure. DCIM utilizes tools, software, and applications to keep track of a range of key areas in data centers, such as:

  • Physical infrastructure: This type of monitoring employs sensors, cameras, and facilities management software to check equipment health and to detect security threats, equipment failures, and other anomalies.

    Assets monitored in a data center's physical infrastructure include cooling systems, environmental detectors, servers, storage devices, power distribution units (PDUs), and uninterruptible power supplies (UPSs). Overheating is a sign of faults that can lead to failure, so data center equipment must be kept within specific temperature ranges to ensure optimum operation and system uptime.
  • Capacity management: A reliable ‘always on’ power supply is a crucial requirement in a data center. DCIM software tracks power capacity, network bandwidth, rack space, and cooling capability. This helps data center operators understand when server racks are running short on space and deploy new equipment when necessary. It can also help them investigate the causes of high power consumption and improve cooling efficiency.
  • Security: DCIM monitors several aspects of security in data centers, such as:
    • Physical security: This involves preventing unauthorized access and malicious activity by using cameras, door lock monitoring, and other sensors to detect intrusions and provide alerts.
    • Environmental security: Environmental conditions such as dust, humidity, and temperature can be hazardous and threaten the smooth running of data centers. DCIM systems help reduce the risk these hazards pose to equipment. Equipment in data centers consumes a significant amount of energy and gives it off as heat, so it is crucial to cool and monitor the airflow in a data center to prevent equipment from overheating. Humidity must also be kept within a specific range to avoid corrosion.
    • Asset security: DCIM monitors data center assets such as storage devices, network equipment, and servers to identify unauthorized activities occurring on critical assets.
    • Logical security: DCIM monitors system logs, network traffic, and other data to alert personnel to suspicious activity and to data or network breaches.
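The capacity tracking described under "Capacity management" above can be illustrated with a minimal sketch. It is not taken from any particular DCIM product: the `Rack` class, its field names, and the 80% limit are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rack:
    """Hypothetical record of one rack's space and power figures."""
    name: str
    total_units: int          # rack height in rack units (U)
    used_units: int           # rack units already occupied
    power_capacity_kw: float  # power budget assumed for the rack
    power_draw_kw: float      # measured power draw

    @property
    def space_utilization(self) -> float:
        return self.used_units / self.total_units

    @property
    def power_utilization(self) -> float:
        return self.power_draw_kw / self.power_capacity_kw


def racks_near_capacity(racks, limit=0.8):
    """Return names of racks whose space or power use is at or above `limit`.

    The default of 0.8 is an arbitrary placeholder, not a recommendation.
    """
    return [
        rack.name
        for rack in racks
        if rack.space_utilization >= limit or rack.power_utilization >= limit
    ]
```

In this sketch a rack is flagged if it is short of either space or power, since either constraint can block the deployment of new equipment.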

DCIM uses monitoring tools to gather asset data that helps improve operational efficiency across the entire organization. These tools can be divided into different levels, including:

  1. Enterprise-class monitoring: Many nodes across numerous data centers can be managed through monitoring, data collection, thresholds, and alerts. This covers environmental sensors, busways, busbars/bus ducts, UPSs, PDUs, Remote Power Panels (RPPs), and Computer Room Air Handlers (CRAHs), and supports multiple protocols such as Modbus, SNMP, and Building Automation and Control Network (BACnet).
  2. Data distribution and storage management.
  3. Infrastructure monitoring.

What is Data Center Monitoring?

Data center monitoring is the process of collecting and analyzing data from a data center's physical and virtual infrastructure to give insights into the health of assets, identify potential issues, and ultimately prevent failure. Monitoring helps track specific metrics in real-time and sends alerts when readings are above or below the set thresholds, ensuring the data center's availability, efficiency and security.
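As a rough illustration of threshold-based alerting, the sketch below checks each reading against a low and a high limit and collects a message for every reading outside its range. The metric names, the `Threshold` class, and the limit values are assumptions made for this example; real limits depend on the equipment and the facility's own policies.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    low: float
    high: float


# Placeholder limits for illustration only, not recommended values.
THRESHOLDS = {
    "inlet_temp_c": Threshold(low=18.0, high=27.0),
    "relative_humidity_pct": Threshold(low=20.0, high=80.0),
}


def check_reading(metric: str, value: float, threshold: Threshold) -> Optional[str]:
    """Return an alert message if `value` is outside the threshold, else None."""
    if value < threshold.low:
        return f"{metric}: {value} is below the low threshold {threshold.low}"
    if value > threshold.high:
        return f"{metric}: {value} is above the high threshold {threshold.high}"
    return None


def evaluate(readings: dict[str, float]) -> list[str]:
    """Check every known metric in `readings` and return the alert messages."""
    alerts = []
    for metric, value in readings.items():
        threshold = THRESHOLDS.get(metric)
        if threshold is None:
            continue  # no limits configured for this metric
        message = check_reading(metric, value, threshold)
        if message is not None:
            alerts.append(message)
    return alerts
```

In a live system the readings would arrive from sensors and the messages would go to a notification channel; both are left out here to keep the sketch self-contained.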

Data center monitoring is an aspect of data center infrastructure management that helps to efficiently run data center operations and improve planning and design.

DCIM gives data center professionals the insights they need to identify which activities must be carried out to keep the facility running smoothly. Monitoring data regularly also helps them deploy the most appropriate measures, configure alerts, and review server performance, environmental conditions, and data security.

Thermal monitoring as a data center monitoring tool

Thermal monitoring is the process of collecting and analyzing data about the temperature of critical electrical assets in a data center.

Thermal monitoring is used in data centers to monitor the temperature of electrical equipment and infrastructure to prevent overheating and, therefore, equipment failure. It is an important contributor to power availability and system uptime.

Temperature rise, especially on electrical joints, is a warning sign that a potential issue such as a loose or compromised connection may be present. Left unchecked, such faults increase the risk of electrical equipment failure, which can put personnel working on or around these critical electrical assets at higher risk. Monitoring the temperature of electrical joints helps avoid downtime and damage to critical infrastructure, which can otherwise lead to reduced efficiency, corrupt data, or equipment failure. It can also help keep personnel safe around these assets.
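One way to turn joint temperatures into a warning is sketched below, under two assumptions that are not from the original text: that a joint is suspect if it runs too far above ambient, or if it runs noticeably hotter than comparable joints carrying a similar load. The function name and both limit values are hypothetical placeholders.

```python
from statistics import median


def flag_hot_joints(joint_temps_c, ambient_c, max_rise_c=40.0, max_deviation_c=10.0):
    """Return {joint: [reasons]} for joints that look abnormally hot.

    joint_temps_c: mapping of joint name to temperature in °C, for joints
        that are expected to run at similar temperatures.
    ambient_c: ambient temperature in °C near the joints.
    max_rise_c, max_deviation_c: placeholder limits, not recommended values.
    """
    typical = median(joint_temps_c.values())
    flagged = {}
    for joint, temp in joint_temps_c.items():
        reasons = []
        if temp - ambient_c > max_rise_c:
            reasons.append("rise above ambient")
        if temp - typical > max_deviation_c:
            reasons.append("hotter than comparable joints")
        if reasons:
            flagged[joint] = reasons
    return flagged
```

Comparing each joint with the median of its peers means a single loose connection stands out even when the whole group is warm because of high load.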

Data center operators face several challenges, but equipment overheating is one of the most critical. Overheating equipment can lead to unplanned downtime, which has a detrimental effect on service reliability for customers and leads to significant financial and reputational costs. As reliance on data increases, there is a greater need for technology such as continuous thermal monitoring to help prevent outages and avoid unplanned downtime.

The adoption of thermal monitoring in data centers is accelerating because it helps engineering teams minimize equipment damage and reduce the likelihood of outages that can result from undetected faults.

Methods of thermal monitoring in data centers

Thermal monitoring can be implemented in data centers in a variety of ways, which include:

  1. Continuous Thermal Monitoring (CTM): CTM is a condition-based monitoring approach that can take the place of periodic inspection using infrared (IR) imaging cameras. It is a proactive way of monitoring the temperature of electrical infrastructure in data centers and other industries that rely on critical infrastructure. Sensors continuously measure the temperature of electrical assets across the data center, providing real-time data on asset health and alerting personnel to temperature rises before they exceed safe thresholds. The data from these sensors can then be gathered and analyzed to identify potential faults and inform decisions. The sensors can also be integrated into SCADA/BMS systems, providing alarms, notifications, trends, and analysis that support predictive maintenance.

  2. Thermal imaging cameras: Thermal imaging, or IR thermography, is another thermal monitoring method. These cameras capture images of the heat that electrical equipment emits, which can reveal hot spots and other issues that are not visible to the naked eye. This approach has historically been popular but is rapidly being replaced by more predictive approaches such as CTM, outlined above.

  3. Audits and maintenance: This is a preventive maintenance approach carried out at regular intervals to ensure that cooling systems, HVAC (Heating, Ventilation and Air Conditioning), and other critical infrastructure are operating optimally.
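To illustrate how continuous readings can warn of a temperature rise before a threshold is exceeded, the sketch below fits a straight line to recent samples and estimates how long the latest reading would take to reach the limit at the current trend. The linear trend and the function names are assumptions made for this example; it is not how any specific CTM system works, and real systems may use more robust analysis.

```python
from typing import Optional, Sequence


def slope_per_minute(times_min: Sequence[float], temps_c: Sequence[float]) -> float:
    """Least-squares slope of temperature against time, in °C per minute."""
    n = len(times_min)
    if n < 2 or n != len(temps_c):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times_min) / n
    mean_y = sum(temps_c) / n
    denom = sum((t - mean_t) ** 2 for t in times_min)
    if denom == 0:
        raise ValueError("timestamps must not all be equal")
    numer = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, temps_c))
    return numer / denom


def minutes_to_threshold(
    times_min: Sequence[float],
    temps_c: Sequence[float],
    threshold_c: float,
) -> Optional[float]:
    """Estimate minutes until the latest reading reaches `threshold_c`.

    Returns 0.0 if the latest reading is already at or above the threshold,
    and None if the fitted trend is flat or falling.
    """
    latest = temps_c[-1]
    if latest >= threshold_c:
        return 0.0
    slope = slope_per_minute(times_min, temps_c)
    if slope <= 0:
        return None
    return (threshold_c - latest) / slope
```

For example, five readings taken one minute apart that climb from 50 °C to 58 °C in 2 °C steps give a slope of 2 °C per minute, so a 70 °C threshold would be about 6 minutes away at that trend.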

Benefits of thermal monitoring for data centers

  • Prevent overheating: Hot spots and overheating are major causes of data center equipment failure. Strategically positioned sensors take temperature readings continuously in a number of places, including server racks and busway or bus duct distribution systems, and the system indicates when temperatures exceed set thresholds so that personnel can act before equipment overheats.
  • Enhance equipment longevity: Critical data center equipment, such as server racks, switchgear, and storage devices, can benefit from an extended lifespan when asset temperature and facility humidity are monitored and controlled. Over time, this reduces the maintenance costs of critical equipment.
  • Prevent unexpected power outages: Power outages are typically unplanned, and downtime is detrimental and costly for data centers. Implementing continuous thermal monitoring of critical assets alerts personnel to potential risks in advance of failure.
  • Improve productivity: With the early detection of compromised joints and connections in electrical assets, power outages are reduced. Data centers rely significantly on power availability. Monitoring the temperature of critical electrical connections improves the reliability of equipment, helping to improve performance and productivity.

Building greater resilience in data centers is critical for owners and operators to run reliable and sustainable facilities that meet future demands. Maintaining efficiency and electrical safety is essential; monitoring the temperature of critical assets therefore helps to show where failures in critical equipment are likely to occur in advance of an outage. The alerts from temperature monitoring provide information that can be used to schedule predictive maintenance and to support a more proactive approach by operational personnel.
