Supplementary thoughts on root key roll-over
A couple of points raised by Doug Barton on the dnssec-deployment list, together with Thierry Moreau's comment to ICANN, have prompted me to supplement my previous remarks.
From a technical standpoint I certainly see the appeal, but the 'layer 9' issues here are deep, and very thorny.
While this idea would get us wider testing under more real-world scenarios, it has a "volunteer bias" problem in that it would still only be the most technologically sophisticated users who would be participating. The things we really need to test (read, break) are the "normal" systems that were set up by "normal" sysadmins (albeit those who have configured validation are still somewhat ahead of the curve by definition).
It's certainly not perfect, but this is a bit different from the initial root signing. Then, we worried about response sizes, server load - but there was a lot of testing (e.g. DLV, signed zones) at the TLD and below. And the risk of breaking the production (non-DNSSEC) environment was, in my opinion, pretty low. Now, DNSSEC *is* a production environment. If we break it, there will be blow-back. We won't like that.
And the longer we go without testing all the corner cases of roll-overs, the worse things get. If we don't test an algorithm change for another 5 years, the odds of getting tools fixed become vanishingly small. At least today the tools are new enough that people are still working on them, and they're not yet embedded in router, home gateway appliance and other firmware.
I agree the organizational issues are non-trivial - but the stakes are higher. And to a significant extent, they're also what we need to test. When built-in root keys in a long-lived distribution (say, RHEL) expire, what happens? Do site change control policies prevent updates? Such distributions live longer than the RFC 5011 timeouts for removing old keys. So do they know how to fetch new ones, e.g. from the ICANN website? Or do we have an 'embedded system/no-update' policy issue? How do we prevent this, or at least get the word out to consumers, who only learn from the burned-hand school of teaching? Do we need to work with CERT to classify root key changes as 'critical security updates'? Remember that installs/re-installs/clones of systems based on these distributions can happen decades after initial release. Embedded systems have lifetimes of decades as well. What's the strategy for an algorithm roll that accommodates these systems? I'm much more worried about those sorts of issues than the odd bug in RFC 5011 implementations, though they matter too.
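To make the long-lived-image problem concrete, here is a minimal sketch of the timing involved. It models only the RFC 5011 add hold-down (a new key must be observed, signed by an already trusted key, for at least 30 days before it is accepted) and assumes the system stays online from its first boot. The function name, its parameters and the dates are hypothetical, chosen for illustration only.

```python
from datetime import date, timedelta

# RFC 5011 add hold-down time: a new key must be seen, signed by an already
# trusted key, for at least this long before it becomes a trust anchor.
ADD_HOLD_DOWN = timedelta(days=30)


def can_follow_rollover(first_online, new_key_published, old_key_retired,
                        hold_down=ADD_HOLD_DOWN):
    """Return True if a system shipping only the old key can learn the new one.

    Simplified model (an assumption): the system stays online from
    `first_online`, and the old key vouches for the new key from
    `new_key_published` until `old_key_retired`.
    """
    window_start = max(first_online, new_key_published)
    return old_key_retired - window_start >= hold_down


# Hypothetical roll: new key published 1 Jan 2030, old key retired 1 Apr 2030.
published, retired = date(2030, 1, 1), date(2030, 4, 1)

print(can_follow_rollover(date(2025, 6, 1), published, retired))   # in service before the roll
print(can_follow_rollover(date(2030, 3, 20), published, retired))  # installed late in the window
print(can_follow_rollover(date(2040, 1, 1), published, retired))   # cloned from an old image
```

In this simplified model only the first system picks up the new key automatically. The other two need some out-of-band way to obtain it, which is exactly the policy question raised above.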
Who'd participate? Validation is opt-in, and yes, those folks are ahead of the curve. But outreach is possible. We can tell who they are, since their systems are requesting the DNSSEC records used for validation (e.g. DNSKEY, DS). So besides the dnssec-deployment mailing list, the resolver mailing lists (e.g. bind-users), the Linux distributions, and the blogs that describe setting up DNSSEC (just Google for them), an effort could be made to contact (via whois data) the tech contacts of those making these requests. That could help offset the volunteer bias.
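As a rough illustration of how such requesters might be picked out, the sketch below counts DNSKEY and DS queries per client address. The log format ("client_ip qname qtype" on each line) and the function name are assumptions made for the example; real server logs differ and would need their own parser. Asking for these records is only a hint that a client validates, not proof.

```python
from collections import Counter

# Record types a validating resolver has to fetch; asking for them is only a
# hint that the client validates, not proof.
VALIDATION_QTYPES = {"DNSKEY", "DS"}


def likely_validators(log_lines):
    """Count DNSKEY/DS queries per client address.

    Assumes a hypothetical log format of "<client_ip> <qname> <qtype>" per
    line; real server logs differ and would need their own parser.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        client_ip, _qname, qtype = fields
        if qtype.upper() in VALIDATION_QTYPES:
            counts[client_ip] += 1
    return counts


# Made-up sample using documentation-only address ranges.
sample_log = [
    "192.0.2.10 . DNSKEY",
    "192.0.2.10 org. DS",
    "198.51.100.7 example.com. A",
    "203.0.113.5 com. DS",
]
for client, hits in sorted(likely_validators(sample_log).items()):
    print(client, hits)
```

The resulting addresses would then be the starting point for the whois lookups mentioned above.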
Maybe someone can fund 1,000 T-shirts for the first people who sign up to fill coverage holes. Partner with some technical school to get students to provide some 'normal' sysadmins to test with. Identify what's needed, and aggressively seek it.
One doesn't just sit back and wait for volunteers. 'Volunteer' can be a verb (e.g. someone is 'volunteered') - if one reaches out.
Still not perfect, but what's the alternative? Accept limited coverage and/or high-risk testing on just the production root?
Thierry Moreau's thoughtful comment urges that protection of the current key be a priority, that ECC may not be a good replacement for RSA, and argues that the economic cost-benefit of a proactive rollover is low if operational procedures are adequate. This is fundamentally a strategy of "put all your eggs in one basket, and watch that basket very carefully". That's an excellent short-term strategy. But in the long run, it will fail. I agree that focus on operations is important. I don't know that ECC is the replacement for RSA. But something will eventually replace RSA. And there needs to be a plan. And the plan needs to be validated.
We need to remember that the DNS will outlive current technology. Quantum computing may well make it feasible to factor the current RSA modulus. And even if not, the root DNS key is a uniquely valuable target. Assuming DNSSEC eventually is ubiquitously adopted by governments, financial institutions, health care providers, etc., there is money to be made - or simply 'chaos for entertainment' to be had - by breaking or compromising the key. Or the humans. If enough resources are focused on a problem, humanity has a track record of solving it. For good or for ill.
So on a long enough timescale, it certainly WILL be *necessary* to roll the key. Is that timescale 1 year? 5 years? 25 years? 50 years? I don't know. But I do expect the DNS to last.
The cost of validating/perfecting/deploying the technology to roll the key increases with every passing minute, as DNSSEC resolvers/servers are deployed that are not known to (or are known not to) tolerate a change. I won't quibble about whether this is exponential, geometric or some other super-linear function. While Thierry's concern for cost is valid, the way to minimize cost is to do the testing now, and by rolling the key on a reasonable cycle thereafter, ensure that the software/systems in the field are capable of handling change WHEN it happens. Yes, the cost would have been less had this work been done earlier - but we can't change the past.
As the DNSSEC and IPv6 efforts have demonstrated to date, unplanned change on the internet scale is exceedingly difficult and expensive. There are plenty of unknown problems that will plague the future. This one is known today. We are at a point with DNSSEC where we can either avoid a foreseeable trap, or set one - hoping, no doubt, that it won't go off in our working lifetimes. The responsible action is to do as much as we can now to finish the job we started and to leave as robust an infrastructure and as complete a plan as we know how to do. For the long term.
These remarks imply the need for architectural development and operational planning beyond 'roll the RSA key'; e.g. considering embedded systems and future algorithm deployment.
I don't run the zoo, but if I did, I'd rather try and fail than not try at all...
(Apologies to Dr. Seuss & Tennyson.)
--
Timothe Litt
ACM Distinguished Engineer
--------------------------
This communication may not represent the ACM or my employer's views, if any, on the matters discussed.