On "DNS CERT concept development" and "100% uptime"

Subject: On "DNS CERT concept development" and "100% uptime"
Date: Thu, 28 Jan 2010 14:35:42 -0500


I would like to point out our experience during the Conficker .C effort and the strategic project concerning an emergency response capability. Later in this note I will point out our experience during a "downtime" effort.

I was not informed by ICANN of the Conficker .C issue. I learned of the .C variant through the African DNS operational list. Because it seemed possible that CORE had the connections to find the last, non-responsive ccTLD operator(s), I directed CORE's Secretariat in Geneva to attempt to locate that last, non-responsive ccTLD operator. A lot of calls were made, some detective work was done, and the operator-in-fact was found in East Asia. And that was how the final gap was closed.

It was an error to restrict the information then available to only those registry operators thought to be targeted by the authors of the .C variant. Had the .C authors actually exercised the rendezvous mechanism reported by SRI and others involved in the .A and .B variants, prior to the disclosure to the DNS operational community, and had CORE not acted, and had that last, non-responsive ccTLD operator supported automatic registration, the Conficker system would have achieved the rendezvous it was thought to be attempting.

At the time, it was thought that the .C variant's rendezvous would cause significant deleterious consequences.

Later in the .C variant trajectory, as the -dns effort attempted to find an exit strategy for the ccTLD operators, the issue of who should bear the cost incurred by ccTLD operators was discussed. The discussion was not fruitful, though there was some acceptance that some institution, perhaps ICANN, could collect the cost "chits" offered by the ccTLD operators, and pass them on to the jurisdictions in which criminal prosecutions were eventually conducted, or where the externalization of costs was identified, for the purposes of creating a scalable cost recovery mechanism. The "bagman" concept was discussed by Paul Vixie, Rodney Joffe, and me, on and off the -dns list.

From these experiences, and the unrelated but tangential inability to focus on what "high security" should mean (it is definitely not "accounting standards"), I am concerned by the detail-free plan to copy-a-CERT.

Some obscure background. While I was at SRI two events occurred. One was a real event, but misinterpreted. The other was a real event, correctly interpreted. On both occasions I had discussions with a person responsible for a very large collection of computational assets. These discussions led to DARPA's formation of the CERT at SEI, as an institution to facilitate communication between academic- and industry-employed computer scientists and governmental agencies with a lower skills profile in the problem domain. Times have changed, the original CERT has shifted its mission over time, there is now a US CERT, and perhaps the original government agencies have improved their facilities and their technical abilities.

The point is, CERTs are not a given thing, they are a box into which some money and some purpose is put. We should decide how much money and what purposes, not just "start a CERT".

We have no way of knowing, at present, which registries present zero resource acquisition cost to authors of distributed systems similar to the Conficker .C variant, which could use those resources to construct rendezvous points. No one has done the unglamorous labor of foreach(TLD in IANA) {ask cost of rendezvous resource acquisition questions}. The next instance of a .C will have us again asking "Does .el (Elbonia, Lower) support automated registration?"
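As a rough illustration of how small that unglamorous labor is on the bookkeeping side, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the question names, the `SurveyRecord` type, and the rule that a TLD counts as a zero-cost rendezvous resource only if every question is answered "yes". The input is assumed to be plain text with one TLD per line and `#` comment lines, which is the general shape of the TLD list IANA publishes. The asking itself still has to be done by people, one operator at a time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical survey questions. Only "automated registration" comes from
# the text above; the second question is an illustrative assumption.
QUESTIONS = (
    "supports_automated_registration",
    "zero_cost_to_registrant",
)


@dataclass
class SurveyRecord:
    tld: str
    # question -> True / False / None (None means "nobody has asked yet")
    answers: Dict[str, Optional[bool]]

    def zero_cost_rendezvous(self) -> Optional[bool]:
        """True/False once known, None while the answer is still unknown."""
        values = list(self.answers.values())
        if any(v is False for v in values):
            return False  # one "no" is enough to rule the TLD out
        if any(v is None for v in values):
            return None   # still have to pick up the phone
        return True


def parse_tld_list(text: str) -> List[str]:
    """Parse a one-TLD-per-line listing, skipping blanks and '#' comments."""
    tlds = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tlds.append(line.lower())
    return tlds


def build_survey(tlds: List[str]) -> List[SurveyRecord]:
    """foreach(TLD in IANA): start with every question unanswered."""
    return [SurveyRecord(tld, {q: None for q in QUESTIONS}) for tld in tlds]


def still_unknown(records: List[SurveyRecord]) -> List[str]:
    """TLDs for which we cannot yet say whether rendezvous is free."""
    return [r.tld for r in records if r.zero_cost_rendezvous() is None]


if __name__ == "__main__":
    sample = "# sample listing, not real data\nCOM\nEL\n"
    survey = build_survey(parse_tld_list(sample))
    print(still_unknown(survey))
```

The point of tracking "unknown" separately from "no" is that, during the next event, the list of unknowns is the list of operators someone has to go and find.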

We also have no way of knowing, at present, what the actual cost of event response is, let alone a means of reasonably presenting it to some jurisdiction attempting to prosecute for damages.

These are important things to get done. If we are not careful, an "ICANN CERT" will be captured, much like the ICANN SSAC function during the fast-flux hosting effort, by retail cops-and-robbers concerns that missed the fundamental issues of rapid update by registries as a fundamental tool of modern DNS-exploiting systems, and zero effective cost of registration, again by modern DNS-exploiting systems. At that point we would have a "CERT" which "makes the suits smile" but does us no good when competent and motivated programmers target infrastructure.

On the goal of "100% uptime", this seems more an article of religion than a useful policy statement. What instances of things other than "100% uptime" have been observed, and are claimed to have been sufficiently harmful to enter into our strategic planning?

Here I want to point out something of fundamental importance.

The networking community could have settled for repointing the .ht servers, and observing that there has been "100% uptime".

Fortunately we did not. The fundamental issue in network operational continuity inside of Haiti, where 100% of the dead, injured, homeless, and hungry people are, was diesel to the Boutilliers Hill NAP, which was finally delivered on the 18th, and food and water to the surviving technicians, to keep them on the job, finally delivered on the 24th.

The DNS exists to serve. Our mission doesn't start and stop at the DNS, but at the service itself. To keep the government and in-country humanitarian data network alive, we had to supply diesel and beans, not 100% uptime for external resolution of websites that may or may not exist, depending primarily on whether they are in Haiti (down) or not in Haiti (not down).

The delivery of VSATs and wifi has some utility, the repointing of the authoritative nameservers has some utility, but the vastly greater utility is not in re-engineering Haiti, or virtualizing it, but simply keeping the parts of it that work working, so that more parts of it can be turned up as the recovery progresses.

Diesel and beans are not sexy. They are not "100% uptime". But they are what we have to deliver -- operations sufficient to support necessity, not the simulacrum of support, while ignoring necessity.

It took a lot to get it done, to get cooperation within the White House, the State Department, Southern Command, and Congress, simply to get diesel to the Boutilliers Hill NAP before "dry tank" fail on the afternoon of the 17th. Part of the problem was the attractiveness of "technical solutions" which solved the wrong problem and hindered comprehension of what was the most effective tool available to us, fuel and food.

Let's make it a goal to know what the goals are, and to have technically informed people make the technical goals.

This comment is offered in an individual capacity.


