OIT power interruption disrupts services
Eileen Duffy | Wednesday, October 4, 2006
At around 12:30 a.m. Tuesday, the data center of the Information Technology Center (ITC) experienced a power interruption that caused critical data systems like Notre Dame Webmail, InsideND and WebCT to fail temporarily, said Assistant Vice President of the Office of Information Technologies Gordon Wishon.
Normal power was restored to the data center around 1:30 a.m., allowing OIT personnel to begin restoring IT services, said Director of Utilities Paul Kempf. By 8 a.m., approximately 90 percent of data systems were operating again, he said. However, some systems are still experiencing “intermittent slow-downs,” Wishon said.
The ITC did not lose “normal power,” Kempf said. Rather, the data center experienced a loss of power because the equipment that transfers the system from its normal power source to a generator “had its brain fried.”
“It made an illogical decision that wasn’t based on reality,” Kempf said. “It turned off the normal feed, but it didn’t start the emergency feed.”
The ITC is equipped with surge protectors, which kept electronic equipment such as the computers themselves from being damaged but did not protect the transfer component, Wishon said.
While the Utilities Department is still investigating the cause of the equipment failure, ITC staff members working at the time reportedly saw a flash of lightning and immediately heard a clap of thunder, indicating a nearby strike, Kempf said, though they were unable to judge exactly how close it was.
“It seems plausible that [the failure] could have something to do with the lightning strike,” Kempf said.
The “fried” component has been removed and sent to its manufacturer, Eaton Electrical Inc. – with which the University does “a lot of business” – and a replacement is on the way, Kempf said. For now, the data center has been manually reconnected to the normal power source.
Since the equipment performed properly during the weekend’s power outage – that is, it correctly transferred power from the non-working normal power to generator power – this particular failure is perplexing.
“It’s unusual that there weren’t other things, some other solid state devices damaged,” Kempf said.
Such equipment is normally not replaced very often – “about every four or five years,” Kempf said. He admitted a product failure was possible, but unlikely.
“The presumption right now is some sort of electrical disturbance damaged that equipment,” he said.
Wishon said once his staff understood the cause of the failure, they would make plans to prevent future trip-ups.
“It has certainly been some length of time since we’ve suffered similar failure,” Wishon said. “We have invested a significant amount of money in the past three years into the data center to ensure that we’re able to maintain critical service operation in the face of possible failures like this.”
But if lightning indeed caused the equipment failure, he said it would be a hard thing to prepare for.
“Of course, direct lightning strikes – if that’s indeed what this is – while they’re not unheard of, certainly are unusual,” he said.
Kempf also stressed the unique meteorological circumstances of the failure.
“You can’t prevent a lightning strike,” he said. “…It may not be possible, despite all good intentions, to prevent equipment from failing in situations like this.”