ICANN Email Archives: [comments-root-zone-consultation-08mar13]

Some thoughts on root key roll-over

To: comments-root-zone-consultation-08mar13@xxxxxxxxx
Subject: Some thoughts on root key roll-over
From: tlhackque <tlhackque@xxxxxxxxx>
Date: Wed, 03 Apr 2013 15:17:07 -0400

Although I'm not a big player in DNSSEC, I do follow it with interest (and use it for my tiny domains). However, I do have experience with deploying and testing large systems. I hope these thoughts will be useful.

I agree with the 4 comments posted so far, and strongly support the notion advanced by Steve Crocker that (a) we're (**) late doing this and (b) it needs to be done multiple times.

However, it's unlikely to work the first time - and if it does, there will be fear that it won't. And we're making progress getting DNSSEC adopted. If we disrupt what's working, we'll discourage and/or delay adoption, and perhaps even see some regression in its acceptance. Further, this concern is likely to result in less testing than is required for complete test coverage of the process and environment.

It occurs to me that an intermediate step might increase both confidence and test coverage. Suppose a few new servers (or IP addresses on existing servers) were allocated as mirrors of the root zone. These would receive all updates to zone data in (near) real-time, but the root key roll-over process would be tested on these mirrors.

These server addresses could be published to enable testing of the procedures and the resolver software in an orderly fashion. This would allow the key stakeholders (and the adventurous public) to *opt-in* to the testing phase, and allow testing of the process/resolvers to be more extensive than might otherwise happen. For example, the plan might take risks with edge cases or intentional mis-configurations that wouldn't be acceptable for the production root. And we should definitely test algorithm introduction and roll-over - not just key updates.

With the results of this testing published, and any deficiencies corrected, testing in the production root should be a non-event.

This isn't a trivial undertaking, but seems to provide significant advantages over simply attempting to update the existing key on the production root.


Some things to think about in such a plan (in no particular order):

o Capacity. Perhaps these servers should serve registered IP addresses only (e.g. those who sign-up on a testing website) to contain costs/deal with vandals(*). Aside from vandals, a resolver will be configured either with the production root servers or the test servers, so net demand should be constant. However, to ensure adequate test coverage, this needs to be an easily permeable barrier.

o Configuration. Need to ensure that the mix of anycast, geography, and latency exposes the same timing issues as the production root. Ensure testing plans include what happens when a server/servers are down at critical times, and come back at inopportune ones.

o Phase-in and phase-out plans - once testing is complete, the plan for turning off the test servers needs to accommodate the change control schedules of testers. As processes are validated and corrected, how do they get rolled into the production root? I suggest incrementally, not a big bang.

o Make it easy to opt-in - provide root server files that are 'drop-in' to the known resolver software; perhaps provide implementation scripts/configuration stanzas for the known resolvers.

o Make it easy to provide feedback. And acknowledge/follow-up feedback. Nothing turns off feedback faster than that "I'm wasting my time shouting into this barrel" feeling...

o Plan for gathering data - not just about what the root sees, but about distribution/implementation of any resolver or registrar or other tools that are tested, and those that are discovered to need updates. Automate where possible.

o Plan for the support resources necessary to quickly diagnose and roll back any changes that cause problems. Expect that some testers will - or we hope they will - do non-trivial testing. So this can't be a toy environment with non-deterministic support. It needs to be run as if it were the production root in most respects.

o Plan for communication - both passive (e.g. a website/wiki) and push (e.g. an e-mail list). Announce tests and results so those using the alternate root can plan, and feel engaged.
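
To make the 'drop-in' root server file idea above a little more concrete, here is a minimal sketch (in Python) of generating a hints file for the test servers in the zone-file format that root hints files conventionally use. Everything specific in it is an assumption for illustration: the server names, the addresses (taken from the documentation-only ranges) and the function name are all made up. A resolver opting in would presumably also need the test root's trust anchor, which this does not cover.

```python
# Sketch only: every server name and address below is a made-up placeholder.
import ipaddress

# Hypothetical test mirrors, using documentation-only address ranges
# (192.0.2.0/24 and 2001:db8::/32); real values would come from the operators.
TEST_ROOT_SERVERS = {
    "a.test-root.example.": ["192.0.2.1", "2001:db8::1"],
    "b.test-root.example.": ["192.0.2.2"],
}

HINT_TTL = 3600000  # TTL conventionally used in root hints files


def build_hints(servers, ttl=HINT_TTL):
    """Return root-hints text in zone-file format for the given servers."""
    lines = ["; Root hints for the OPT-IN TEST root (placeholder data)"]
    for name in sorted(servers):
        # One NS record at the root for each test server.
        lines.append(f".\t{ttl}\tNS\t{name}")
        for addr in servers[name]:
            ip = ipaddress.ip_address(addr)  # raises ValueError on a bad address
            rtype = "A" if ip.version == 4 else "AAAA"
            lines.append(f"{name}\t{ttl}\t{rtype}\t{ip}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_hints(TEST_ROOT_SERVERS), end="")
```

The output could be published alongside per-resolver configuration stanzas, so that opting in (and later opting back out) is a matter of swapping one file.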

I'm sure there are other considerations, but those more involved in the implementation will doubtless point them out.

This note doesn't pretend to be a complete plan, but rather is intended to stimulate thought.

(*) vandal - those who think DDOS attacks on infrastructure are an acceptable form of entertainment.
(**) As noted, I'm not much of a player in this space, so 'we' is used from habit and to indicate interest in the outcome.


--
Timothe Litt
ACM Distinguished Engineer
--------------------------
This communication may not represent the ACM or my employer's views,
if any, on the matters discussed.
