Thoughts on root-zone key rollover
We would like to share some thoughts about root-zone key rollover. The section numbers below do not map 1-to-1 to the questions asked in the consultation.

=== 1. Key Signing Key rollover necessity.

There are a number of risks associated with performing a rollover and a number of risks associated with not performing one.

If we choose not to roll, that decision plays out in one of two scenarios:

scenario 1. Never change the key, ever. Even when the private key is compromised through theft or factoring, signatures will continue to be generated and no change will occur.

scenario 2. 'Cold Boot' or abrupt change (rather than a graceful roll) when the private key has been compromised.

The difference between the two scenarios is that in the first case we allow the system to continue with weakened security, and it will likely evolve quickly to a state where DNSSEC adds no benefit. In the second case we have to accept that there are a number of devices for which the public key component cannot be upgraded or removed, and for which DNSSEC cannot be turned off in a timely fashion. Those devices will be useless from that time on. For all other devices the change will be disruptive and we consider it likely that DNSSEC will be left off for those devices. Either of these scenarios will cause severe degradation, if not disruption, of trust.

If we choose to perform rollovers, then we have two choices as well.

scenario 3. Only perform a roll when needed, i.e. when cryptographic or computing advances are such that a cryptanalytic attack is likely to be successful, or when the key is known to be compromised.

scenario 4. Rolling on a regular basis.

The risk with rolling only at the moment a roll is needed (scenario 3) is that the environment can become ossified. As with an abrupt key change (scenario 2), there will be entities that have no mechanisms available, because the need for those mechanisms was not obvious or because the mechanisms take a long time.
Hence there was no (economic) motivation to implement more expedient mechanisms. That means that this scenario will also be highly disruptive, at a point where perhaps the user population has moved from innovative first movers to the laggards on the deployment S-curve.

Scenario 4 also brings risks. A key rollover will be disruptive if it comes unexpectedly and the methodology to replace keys has not been implemented. However, by performing regular rollovers it will be clear to anybody who implements and maintains DNSSEC that key rollover is a part of the technology whose implementation cannot be delayed.

Apart from all these considerations, it is important to communicate the key rollover clearly. Fortunately we are still in the early stages of deployment, and the people who deploy validation are mostly innovators who are likely to have a deeper understanding of the technology and to keep a closer watch on its status. In other words, it is better to take the risk of disruption earlier, since the odds are higher that the early deployers can cope and are willing to.

=== 2. Regular key rollover

RFC 6781 speaks to the regularity of key rollovers and uses 'operational habit' as one of the motivations for regular rollovers. We quote section 3.2.2 in full, since these arguments apply directly (and specifically) to the root zone:

   3.2.2. Rolling a KSK That Is a Trust Anchor

   The same operational concerns apply to the rollover of KSKs that are used as trust anchors: If a trust anchor replacement is done incorrectly, the entire domain that the trust anchor covers will become Bogus until the trust anchor is corrected.

   In a large number of cases, it will be safe to work from the assumption that one's keys are not in use as trust anchors. If a zone administrator publishes a DNSSEC signing policy and/or a DNSSEC practice statement [DNSSEC-DPS], that policy or statement should be explicit regarding whether or not the existence of trust anchors will be taken into account.
   There may be cases where local policies enforce the configuration of trust anchors on zones that are mission critical (e.g., in enterprises where the trust anchor for the enterprise domain is configured in the enterprise's validator). It is expected that the zone administrators are aware of such circumstances.

   One can argue that because of the difficulty of getting all users of a trust anchor to replace an old trust anchor with a new one, a KSK that is a trust anchor should never be rolled unless it is known or strongly suspected that the key has been compromised. In other words, the costs of a KSK rollover are prohibitively high because some users cannot be reached.

   However, the "operational habit" argument also applies to trust anchor reconfiguration at the clients' validators. If a short key effectivity period is used and the trust anchor configuration has to be revisited on a regular basis, the odds that the configuration tends to be forgotten are smaller. In fact, the costs for those users can be minimized by automating the rollover with RFC 5011 [RFC5011] and by rolling the key regularly (and advertising such) so that the operators of validating resolvers will put the appropriate mechanism in place to deal with these stability costs: In other words, budget for these costs instead of incurring them unexpectedly.

   It is therefore preferable to roll KSKs that are expected to be used as trust anchors on a regular basis if and only if those rollovers can be tracked using standardized (e.g., RFC 5011 [RFC5011]) mechanisms.

Operational habit means that the person responsible can enter a date in an agenda, or can instruct a colleague who succeeds them in the job. If the time between rollovers becomes too long, the odds are that the calendar becomes inconsistent or the job has moved on to yet another colleague. To us this type of argument suggests rollovers on the order of one every 1 or 2 years.
However, since the operational practice of a rollover needs to become ingrained, a higher frequency for the first few rollovers seems wiser.

=== 3. How to communicate.

As argued above, it is important to communicate that a key rollover will happen.

It is likely that some ccTLDs have direct contact with ISPs that perform validation. By measuring DNS queries (see below) they may know which resolvers in their environment use DNSSEC. ccTLDs could therefore be a useful channel for outreach.

Banners on popular tech sites are a second channel.

Obviously, the well-known technical mailing lists should be a target: network-related ones like OARC, NANOG, RIPE, APRICOT, CENTR etc., but also more system-administration-related ones, like the USENIX lists.

Other channels might include CERT advisories and postings to operating system announcement lists.

Postings to the non-technical popular press would probably raise more concern than understanding.

Building a network that can be used for future communication (about any DPS issue) should be an explicit goal of the first rollovers.

=== 4. Measurements

One way to measure the success of a rollover is to measure the patterns of DNSKEY queries. If, after a rollover, validation of the DNSKEY RRset against the new trust anchor fails, resolvers may probe for the DNSKEY RR with a higher frequency than during regular operations. For instance, the Unbound implementation has a so-called bad-cache that is used to store 'bad results' with a much shorter TTL than that of the DNSKEY RR.

In addition, the hosts from which DNSKEY RRs are requested are likely to be validating resolvers. Although one cannot immediately infer the size of the user population behind a querying source address, it may be possible to correlate those addresses with other parameters to establish the most active validating resolvers and to reach out to them during communication.

=== 5. Final remarks.
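To illustrate the measurement idea from section 4, here is a minimal sketch in Python. It assumes that a query log has already been parsed into (timestamp, source address, query type) records. The Query record, the function names, the observation window and the factor-of-5 threshold are all hypothetical choices made for illustration; they do not come from any existing tool, and a real analysis would need to account for resolver caching behaviour and TTLs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Query(NamedTuple):
    """One logged query. This record layout is an assumption of the sketch."""
    timestamp: float  # seconds on any consistent clock
    source: str       # address of the querying resolver
    qtype: str        # query type, e.g. "DNSKEY", "A", "NS"


def dnskey_rates(queries: Iterable[Query], rollover_time: float,
                 window: float) -> Dict[str, Tuple[float, float]]:
    """Return {source: (rate_before, rate_after)} in DNSKEY queries per second.

    Only DNSKEY queries within `window` seconds on either side of the
    rollover are counted; everything else is ignored.
    """
    before: Dict[str, int] = defaultdict(int)
    after: Dict[str, int] = defaultdict(int)
    for q in queries:
        if q.qtype != "DNSKEY":
            continue
        if rollover_time - window <= q.timestamp < rollover_time:
            before[q.source] += 1
        elif rollover_time <= q.timestamp < rollover_time + window:
            after[q.source] += 1
    sources = set(before) | set(after)
    return {s: (before[s] / window, after[s] / window) for s in sources}


def suspect_resolvers(rates: Dict[str, Tuple[float, float]],
                      factor: float = 5.0) -> List[str]:
    """List sources whose DNSKEY rate rose by at least `factor` after the roll.

    A source that sent no DNSKEY queries before the roll but some after it
    is also listed. The threshold is arbitrary and purely illustrative.
    """
    suspects = []
    for source, (rate_before, rate_after) in rates.items():
        if rate_after > 0 and rate_after >= factor * rate_before:
            suspects.append(source)
    return sorted(suspects)


if __name__ == "__main__":
    # Tiny made-up log: one steady resolver, one that starts probing heavily.
    log = [Query(950.0, "192.0.2.1", "DNSKEY"),
           Query(1050.0, "192.0.2.1", "DNSKEY"),
           Query(960.0, "192.0.2.2", "DNSKEY")]
    log += [Query(1000.0 + i, "192.0.2.2", "DNSKEY") for i in range(1, 11)]
    print(suspect_resolvers(dnskey_rates(log, rollover_time=1000.0, window=100.0)))
```

The list of sources this produces is only a starting point: the addresses it flags are candidates for the outreach discussed in section 3, not proof of a validation failure.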
Others have suggested rollovers that roll a key back into itself, or partial rollovers whose effect is then measured. Those types of ideas are worth considering but probably need peer review.

We want to acknowledge Warren Kumari as a source of inspiration for the 4 scenarios.

--
Olaf Kolkman & Jaap Akkerhuis
NLnet Labs