Scott Baer, Marketing Manager-Batteries, Vertiv Services
In a recent battery maintenance quiz sponsored by Vertiv, formerly Emerson Network Power, 71 percent of respondents accurately answered that during a power interruption, it only takes one bad cell to cause your data center’s uninterruptible power supply (UPS) to drop a critical load. Batteries are indeed the weakest link in the power chain and are historically a leading cause of costly downtime. Timely battery maintenance and replacement are essential to ensuring critical system availability.
Because there are many factors that affect the aging and performance of batteries, determining ideal timing can be a challenge. To help you assess what level and frequency of battery service is right for your business, ask yourself the following five questions.
1. What would it cost me if my UPS batteries fail?
The average cost per downtime incident due to UPS system failure is nearly $680,000 when considering both the direct and indirect costs, including damage to mission-critical assets; the negative impact on productivity; remediation, legal, and regulatory costs; and lost customer confidence. Given the staggering price and an ever-increasing expectation for around-the-clock data availability, unplanned downtime is simply unacceptable for most organizations. When downtime does occur, business leaders will turn to those responsible for data center operations for an explanation.
It’s important to understand that UPS batteries are life-limited components and today’s data center managers can’t afford to overlook them. As the power protection asset that supports critical operations, batteries have a direct impact on overall business success, and they demand attention.
2. What is the true service life of my batteries?
When considering battery lifespan, it is common to confuse battery design life with battery service life. Design life is based on design and battery aging under controlled conditions in the manufacturer’s laboratory, conditions that rarely, if ever, occur in the field. Actual battery service life considers how application, installation design, real-world operating conditions and maintenance practices impact battery aging.
In general, the service life is almost always significantly shorter than the design life. Batteries can fail in less than half the time stipulated by the manufacturer design life due to a variety of issues, including incoming power faults, manufacturing defects, improper room temperatures and overcharging.
To better understand battery service life and performance, you must first have accurate baseline measures such as resistance, voltage, temperature, specific gravity and more. And getting the most accurate measures means knowing when to capture that data.
Sixty-three percent of Vertiv battery quiz respondents believe that baseline values should be established immediately. This practice of establishing resistance baseline values just after battery installation should be reconsidered. When Vertiv analyzed resistance baseline measurements of thousands of valve-regulated lead-acid (VRLA) batteries, we found that when a battery is replaced due to premature failure or another cause, the new battery’s resistance often drifts downward at first and then remains constant.
Figure 2 shows a battery change-out occurring after 600 days of life. Note that this battery’s initial baseline resistance will likely settle near 5250 micro-ohms, instead of at 5650 micro-ohms seen at installation. In fact, our analysis found that when a specific unit settled to its running baseline, the initial variance from the manufacturer’s baseline was as much as 25 percent.
For this reason, we recommend that initial baseline consideration not happen until at least 90 days after installation. On average, baseline values for most batteries should be established after they have been in service anywhere between six and nine months. Comparing the changing features of the battery data against an accurate baseline allows you to identify performance patterns and better forecast a battery’s end of life.
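As a rough illustration of the settling effect described above, the following Python sketch derives a running baseline from resistance readings taken after a settling period, then reports how far the installation reading sat from it. The function names, the use of a median, and the intermediate sample readings are illustrative assumptions, not Vertiv's analysis method; only the 5650 and 5250 micro-ohm values and the 90-day minimum come from the text.

```python
from statistics import median


def running_baseline(readings, settle_days=90):
    """Median resistance (micro-ohms) of readings taken on or after settle_days.

    readings: list of (days_since_install, resistance_micro_ohms) tuples.
    settle_days: assumed settling period; 90 days is the minimum suggested above.
    """
    settled = [resistance for day, resistance in readings if day >= settle_days]
    if not settled:
        raise ValueError("no readings available after the settling period")
    return median(settled)


def percent_change(value, baseline):
    """Signed percentage difference of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


# Hypothetical readings for one replacement battery (day, micro-ohms).
readings = [
    (0, 5650), (30, 5480), (60, 5330),
    (90, 5260), (120, 5250), (180, 5250), (240, 5240),
]

baseline = running_baseline(readings)
install_offset = percent_change(readings[0][1], baseline)
print(f"running baseline: {baseline:.0f} micro-ohms")
print(f"installation reading was {install_offset:.1f}% above the running baseline")
```

With these sample values, trending later readings against the installation value would understate resistance growth, because the battery has already drifted several percent below that starting point.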
Nearly all respondents to Vertiv’s battery replacement quiz knew what constitutes the end of battery life according to the Institute of Electrical and Electronics Engineers (IEEE), which is when a battery fails to provide 80 percent of its original capacity. Knowing the definition of end of life is important, but knowing which of your batteries fall into that category is essential for avoiding downtime caused by battery failure.
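To make the 80 percent criterion concrete, here is a minimal sketch of an end-of-life check applied to a set of capacity test results. The function name, the unit labels, and the sample capacities are hypothetical; the only figure taken from the text is the 80 percent threshold.

```python
END_OF_LIFE_PERCENT = 80  # end-of-life threshold cited above


def is_end_of_life(measured_capacity_ah, original_capacity_ah):
    """True when measured capacity is below 80 percent of original capacity."""
    if original_capacity_ah <= 0:
        raise ValueError("original capacity must be positive")
    # Compare in integer-friendly form to avoid floating-point edge cases.
    return measured_capacity_ah * 100 < END_OF_LIFE_PERCENT * original_capacity_ah


# Hypothetical capacity test results: unit label -> (measured Ah, original Ah).
test_results = {
    "unit-01": (96, 100),
    "unit-02": (80, 100),
    "unit-03": (74, 100),
}

due_for_replacement = [
    unit for unit, (measured, original) in test_results.items()
    if is_end_of_life(measured, original)
]
print(due_for_replacement)
```

Here only the unit measuring 74 Ah is flagged; a unit sitting exactly at 80 percent still provides the required capacity under this reading of the definition.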
3. How can I minimize my risk of battery failure?
A thorough battery maintenance program is one of the most effective ways to prevent outages and downtime related to battery failure. The investment in such a program is almost always significantly less than what an organization would incur during a lengthy outage. For data center managers, the program is well worth the peace of mind knowing that their UPS, their business-critical operations, and their jobs are protected.
Facilities looking to establish a battery maintenance program should start by considering the schedules for maintenance checks provided by UPS battery manufacturers. In addition, organizations such as IEEE publish maintenance standards.
IEEE has the best-known standards regarding UPS battery maintenance practices. In fact, battery manufacturers often cite the standards and require adherence in order to maintain a valid product warranty. Vertiv recommends adherence to IEEE 450 for vented lead-acid (VLA) batteries; IEEE 1188 for VRLA batteries; and IEEE 1106 for nickel-cadmium (NiCad) batteries. These standards provide recommended practices for maintenance, testing and replacement of batteries for stationary applications. They address the frequency and type of measurements needed to validate battery condition.
Best practices for battery testing and maintenance often go beyond minimum regulatory requirements. While regular preventive maintenance visits and visual battery inspections go a long way toward protecting the battery system and increasing mean time between failures (MTBF), such an approach does not allow for battery oversight outside of the periodic visits.
If external factors lead to a shorter service life than what the data center manager expects, there is still a significant risk of battery failure. In order to optimize system performance, improve availability, and ensure that the emergency power system is ready when it’s needed, Vertiv recommends a proactive approach to mitigating battery risk that ideally includes both battery monitoring and remote services.
4. Do I really need to monitor my batteries?
When assessing the need for battery monitoring, you should first consider if your business can risk battery-caused downtime. If not, then monitoring is essential. Combining battery monitoring with regular preventive maintenance adds a needed layer of protection for the UPS battery system. With mobile or embedded monitoring technology, technicians are able to measure AC impedance and DC resistance.
Instead of waiting for an inevitable failure to occur, or replacing batteries prematurely to prevent problems, battery monitors give access to real-time information on critical battery parameters. You’ll understand the true state of battery health and can continue using your batteries longer with greater confidence.
Best practices call for implementing a monitoring system that connects to and tracks the health of each battery within a string. The most effective battery monitoring systems continuously track all battery parameters using a DC test current to ensure measurement accuracy and repeatability.
Supported by a well-defined process for preventive maintenance and replacement, monitoring batteries can optimize battery life and significantly reduce the risk of dropped loads due to battery failure.
5. How can I ensure timely battery replacement for maximum uptime?
Batteries kept in service beyond expected service life are at risk for failure. Monitoring batteries and performing regular maintenance catch and correct potential issues, such as high resistance or corroded intercell connections, that could shorten service life. Additional tips for getting the most out of your batteries and ensuring availability include adding remote services, keeping on-site spares, and being proactive.
Adding Remote Services
Battery monitoring delivers added protection. A truly proactive approach to battery maintenance combines testing and monitoring with remote services provided by infrastructure experts with experience specific to the application. Remote services provide an extra level of protection, and allow for real-time diagnosis and near-instant notification when a problem occurs.
In the ideal scenario, remote service providers will have complete visibility into all pre-established critical battery parameters — cell voltage, internal resistance, cycle history, overall string voltage, current and temperature — allowing them to detect and replace weak batteries before they ever become bad batteries. When actual performance data falls outside of the established parameters, signaling performance degradation, an alert is transmitted to remote power system engineers or product experts who assess the situation through data analysis. This in turn can generate a work order to inspect, repair, or replace the part that caused the alert.
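The alerting flow described above amounts to comparing each reading against pre-established limits and flagging whatever falls outside them. The sketch below shows that check in Python. The parameter names, limit values, and sample reading are placeholders chosen for illustration, not recommended settings or any vendor's actual monitoring interface.

```python
def find_alerts(reading, limits):
    """Return the names of parameters whose values fall outside their limits.

    reading: dict mapping parameter name -> measured value.
    limits: dict mapping parameter name -> (low, high) acceptable range.
    Parameters without a configured limit are skipped.
    """
    alerts = []
    for name, value in reading.items():
        if name not in limits:
            continue
        low, high = limits[name]
        if not (low <= value <= high):
            alerts.append(name)
    return alerts


# Placeholder limits for a single battery (assumed values, for illustration only).
limits = {
    "cell_voltage_v": (13.2, 13.8),
    "internal_resistance_uohm": (0, 6500),
    "temperature_c": (20, 25),
}

# Hypothetical reading reported by a monitor.
reading = {
    "cell_voltage_v": 13.5,
    "internal_resistance_uohm": 6900,
    "temperature_c": 24,
}

alerts = find_alerts(reading, limits)
if alerts:
    print("alert raised for:", ", ".join(alerts))
```

In practice the resulting alert would go to the remote engineers for analysis rather than trigger a replacement automatically, which matches the review step described above.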
Keeping On-Site Spares
A proactive approach to battery maintenance can virtually eliminate the chance of battery failure. When a remote service provider recognizes the need to replace a failing battery, and dispatches a service technician to get the job done, an on-site spare ensures timely replacement.
Additionally, placing unmatched or new batteries into a string of aged batteries diminishes the characteristics of that new battery, negatively affecting lifespan and system availability. Current takes the path of least resistance, so placing a new battery in a string of aged batteries that have varying levels of internal resistance causes the new battery to be overcharged. In time, this could shorten the lifespan of the entire string and diminish the return on a battery investment.
An on-site battery spares cabinet equipped with an onboard charger enables data centers to have fully charged, ready-to-install batteries on site that match the type and condition of the in-service batteries. Nearly half of the industry professionals quizzed on battery replacement thought that far fewer spares were needed on site than what is ideal for mitigating the risk of costly downtime.
While the need for on-site spares depends on the criticality of the facility and battery type, we recommend that data centers have enough spare batteries to cover five to 10 percent of the batteries in every cabinet. These spares should be plugged in and housed similarly to the in-service batteries, allowing the spares to age simultaneously with the main battery string. This supply of batteries supports a fast, first-time fix and eliminates problems involved with mixing new and old batteries in a string.
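As a quick worked example of the five to 10 percent guideline, the sketch below computes a spare count range for a cabinet. The function name and the choice to round up to whole batteries are assumptions; the cabinet size is a made-up example.

```python
def spare_range(batteries_in_cabinet, low_percent=5, high_percent=10):
    """Return (min_spares, max_spares) for one cabinet, rounded up to whole units.

    Rounding up is an assumption so that small cabinets still get at least one spare.
    """
    if batteries_in_cabinet <= 0:
        raise ValueError("cabinet must contain at least one battery")

    def round_up(percent):
        # Integer ceiling of batteries_in_cabinet * percent / 100.
        return (batteries_in_cabinet * percent + 99) // 100

    return round_up(low_percent), round_up(high_percent)


# Hypothetical cabinet holding 40 batteries.
low, high = spare_range(40)
print(f"keep between {low} and {high} spares for this cabinet")
```

For a 40-battery cabinet this gives two to four spares; where in that range to land depends on the facility's criticality and battery type, as noted above.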
UPS system failure is one of the most expensive root causes of unplanned outages in data centers, and such failures are most often caused by faulty batteries. Today’s data center managers should fully understand the relationship between battery service life and UPS system failure, and take proactive steps to maintain and protect the battery system that supports the UPS.
Implementing a proactive battery maintenance program that incorporates monitoring, remote services, and on-site battery spares helps ensure that batteries are properly maintained and replaced before they become too risky for mission-critical applications. Data center managers who overlook this aspect of ensuring uptime may be unnecessarily putting their operations, and their business as a whole, at risk.
Scott Baer joined Vertiv, formerly Emerson Network Power, in 1999 bringing with him extensive experience in both engineering and sales. Currently he is the marketing manager for batteries in North America where he helps customers incorporate best practices regarding the maintenance and replacement of their UPS batteries. A five-year veteran of the U.S. Marine Corps, Baer has a bachelor’s degree in technical management. He is active in several end-user consortiums, including AFCOM and Gartner, where he shares his data center infrastructure management (DCIM) expertise in pursuit of the common goals of improving data center availability and efficiency.