ICANN Email List Archives

[comments-root-zone-consultation-08mar13]



Some thoughts on root key roll-over

  • To: comments-root-zone-consultation-08mar13@xxxxxxxxx
  • Subject: Some thoughts on root key roll-over
  • From: tlhackque <tlhackque@xxxxxxxxx>
  • Date: Wed, 03 Apr 2013 15:17:07 -0400

Although I'm not a big player in DNSSEC, I do follow it with interest (and use it for my tiny domains). However, I do have experience with deploying and testing large systems. I hope these thoughts will be useful.

I agree with the 4 comments posted so far, and strongly support the 
notion advanced by Steve Crocker that (a) we're (**) late doing this and 
(b) it needs to be done multiple times.

However, it's unlikely to work the first time - and even if it does, 
there will be fear that it won't.  And we're making progress getting 
DNSSEC adopted.  If we disrupt what's working, we'll discourage and/or 
delay adoption, and perhaps even see some regression in its acceptance.  
Further, this concern is likely to result in less testing than is 
required for complete test coverage of the process and environment.

It occurs to me that an intermediate step might increase both confidence 
and test coverage.  Suppose a few new servers (or IP addresses on 
existing servers) were allocated as mirrors of the root zone.  These 
would receive all updates to zone data in (near) real-time, but the root 
key roll-over process would be tested on these mirrors.

These server addresses could be published to enable testing of the 
procedures and the resolver software in an orderly fashion.  This would 
allow the key stakeholders (and the adventurous public) to *opt-in* to 
the testing phase, and allow testing of the process/resolvers to be more 
extensive than might otherwise happen. For example, the plan might take 
risks with edge cases or intentional mis-configurations that wouldn't be 
acceptable for the production root.  And we should definitely test 
algorithm introduction and roll-over - not just key updates.

With the results of this testing published, and any deficiencies 
corrected, testing in the production root should be a non-event.

This isn't a trivial undertaking, but seems to provide significant 
advantages over simply attempting to update the existing key on the 
production root.

Some things to think about in such a plan (in no particular order):

o Capacity.  Perhaps these servers should serve only registered IP addresses (e.g. those of testers who sign up on a testing website) to contain costs and deal with vandals (*). Aside from vandals, a resolver will be configured with either the production root servers or the test servers, so net demand should be constant. However, to ensure adequate test coverage, registration needs to be an easily permeable barrier.
o Configuration.  Need to ensure that the mix of anycast, geography, and 
latency exposes the same timing issues as the production root. 
Ensure testing plans include what happens when one or more servers are 
down at critical times, and come back at inopportune ones.
o Phase-in and phase-out plans.  Once testing is complete, the plan for 
turning off the test servers needs to accommodate the change control 
schedules of testers.  As processes are validated and corrected, how do 
they get rolled into the production root?  I suggest incrementally, not 
as a big bang.
o Make it easy to opt-in - provide root server files that are 'drop-in' 
to the known resolver software; perhaps provide implementation 
scripts/configuration stanzas for the known resolvers.
o Make it easy to provide feedback.  And acknowledge and follow up on 
feedback.  Nothing turns off feedback faster than that "I'm wasting my 
time shouting into this barrel" feeling...
o Plan for gathering data - not just about what the root sees, but about 
distribution/implementation of any resolver or registrar or other tools 
that are tested, and those that are discovered to need updates.  
Automate where possible.
o Plan for the support resources necessary to quickly diagnose and roll 
back any changes that cause problems.  Expect that some testers will - or 
we hope they will - do non-trivial testing.  So this can't be a toy 
environment with non-deterministic support.  It needs to be run as if it 
were the production root in most respects.
o Plan for communication - both passive (e.g. a website/wiki) and push 
(e.g. an e-mail list).  Announce tests and results so those using the 
alternate root can plan, and feel engaged.
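
To make the "easy to opt-in" and "registered IP addresses only" points above a little more concrete, here is a rough Python sketch. It is illustrative only: the mirror names and addresses are hypothetical (the reserved .example TLD and documentation address ranges), and the function names render_hints and is_registered are mine, not part of any existing tool. The first function writes a hints file in the same zone-file layout as the published root hints; the second checks a client address against a list of registered networks.

```python
import ipaddress

# Hypothetical test mirrors. The names use the reserved .example TLD and
# the addresses come from documentation ranges; none of these are real.
TEST_MIRRORS = {
    "a.test-root.example.": ["192.0.2.1", "2001:db8::1"],
    "b.test-root.example.": ["192.0.2.2"],
}

HINT_TTL = 3600000  # TTL value used in the published root hints file


def render_hints(mirrors, ttl=HINT_TTL):
    """Return hints-file text (zone-file format) for the given mirrors."""
    lines = []
    for name, addresses in sorted(mirrors.items()):
        # One NS record at the root per mirror name.
        lines.append(f".\t{ttl}\tNS\t{name}")
        for text in addresses:
            addr = ipaddress.ip_address(text)  # ValueError on bad input
            rtype = "A" if addr.version == 4 else "AAAA"
            lines.append(f"{name}\t{ttl}\t{rtype}\t{addr}")
    return "\n".join(lines) + "\n"


def is_registered(client, registered_networks):
    """Return True if the client address is inside a registered network."""
    addr = ipaddress.ip_address(client)
    return any(
        addr in ipaddress.ip_network(net) for net in registered_networks
    )


if __name__ == "__main__":
    print(render_hints(TEST_MIRRORS), end="")
    print(is_registered("192.0.2.77", ["192.0.2.0/24"]))
```

A tester would point a resolver's hints setting at the generated file instead of the production one; how that is configured differs per resolver, so the sketch stops at producing the file. The registration check is deliberately simple so that the barrier stays easy to pass for legitimate testers.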
I'm sure there are other considerations, but those more involved in the 
implementation will doubtless point them out.

This note doesn't pretend to be a complete plan, but rather is intended 
to stimulate thought.

(*) vandal - those who think DDOS attacks on infrastructure are an 
acceptable form of entertainment.

(**) As noted, I'm not much of a player in this space, so 'we' is used 
from habit and to indicate interest in the outcome.

--
Timothe Litt
ACM Distinguished Engineer
--------------------------
This communication may not represent the ACM or my employer's views,
if any, on the matters discussed.



