Customers have reported server racks catching fire after an air-conditioning failure caused the racks to overheat.
Fire is not something we hear about often as a result of rising temperatures and a lack of temperature monitoring. Fortunately, most servers nowadays have built-in protection that shuts down the equipment when the temperature gets too high.
One of the issues people are least familiar with is the effect of excessive temperatures on CPU calculations. In this article I am not talking about processors melting down; yes, that happens more often than you think, and well before systems catch fire.
Let's return to calculation errors.
A CPU is made of transistors, and when those become hot they tend to leak current. This leakage causes calculation errors that affect application stability. Too many errors can cause a system crash.
The frustrating part is that the errors initially go unnoticed. Only when applications start to fail will the deployed server and network monitoring software trigger alerts. If you want to act before it is too late, temperature monitoring should be part of your overall monitoring strategy, as it has a direct impact on application and system availability.
The maximum operating temperature of a CPU is often set around 40°C (104°F). Go above that 40°C operating temperature and you risk big issues sooner rather than later.
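To make the threshold concrete, here is a minimal Python sketch of how a monitoring script might classify a temperature reading against the 40°C limit mentioned above. The function names, the 5°C early-warning margin, and the sample readings are assumptions made for illustration; they do not come from any particular sensor or monitoring product, and the limit should be adjusted to whatever your hardware vendor specifies.

```python
# Limit quoted in the article; adjust to your hardware vendor's specification.
CPU_LIMIT_C = 40.0
# Hypothetical early-warning margin, chosen only for this illustration.
WARN_MARGIN_C = 5.0


def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0


def classify_temperature(
    temp_c: float,
    limit_c: float = CPU_LIMIT_C,
    warn_margin_c: float = WARN_MARGIN_C,
) -> str:
    """Return 'ok', 'warning', or 'critical' for a single reading."""
    if temp_c > limit_c:
        # Above the limit: act immediately.
        return "critical"
    if temp_c >= limit_c - warn_margin_c:
        # Close to the limit: alert before applications start to fail.
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Made-up sample readings, not real measurements.
    for reading_c in (28.0, 36.5, 41.2):
        status = classify_temperature(reading_c)
        reading_f = celsius_to_fahrenheit(reading_c)
        print(f"{reading_c:.1f}°C ({reading_f:.1f}°F): {status}")
```

The warning band is the point of the exercise: it raises an alert while the temperature is still below the limit, so you can react before errors start to show up in applications.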