APNIC Labs comments on the KSK rollover plan.
Firstly, APNIC Labs would like to commend all those involved in the preparation of this draft report for their careful consideration of this topic, and for the clear and well-informed presentation of material in the report. We also appreciate the opportunity through this public comment process to review this activity. In consideration of this document, we would like to make the following comments:

1) Timing of the KSK roll

The report contains no substantial technical justification as to why a key roll must be performed at this point in time. We note the other concerns expressed below: principally the uncertainty surrounding RFC 5011 support, the lack of concrete measurement activities during the critical stages of the key roll, and the lack of any feedback from measurement to inform whether to proceed with critical or irrevocable steps in the rollover (such as the switch from the old to the new signing key, and revocation of the old key). Given these concerns, it is surprising that the only justification for performing a KSK roll at this point in time is essentially the observation that the Root Zone managers made a commitment in 2010 that they would do so.

The report clarifies that there is no evident loss of integrity in the current key pair, nor is such a loss of key integrity anticipated to be an operational risk for some considerable time, according to the analysis in the report. This raises the question: “Why roll now?” The only justification provided in the report is a risk analysis table which notes a risk of loss of trust in the process if a key roll is not performed. There are clear risks associated with performing the key roll, which are considered but not quantified. If the key roll is delayed, there is the unquantified risk of loss of trust in the key management process, but against that the delay allows additional time to mitigate the service disruption risks to some extent.
It would be helpful if the report explored the consequences of deferring the key roll, including the potential for key compromise and the possibility of an increased level of deployment of DNS resolver code that is aware of, and capable of following, a key roll as signalled according to RFC 5011, as noted in the following comment.

2) RFC 5011 Capability Signalling

Although it is understood that no active signalling of RFC 5011 capability currently exists, it is possible that we could delay a planned KSK roll long enough to deploy DNS code changes that not only enable a greater proportion of resolvers to track a key roll through RFC 5011 signalling, but also allow them to signal their capability to do so. Measuring both the rate of deployment of this DNS signalling capability and the proportion of clients using RFC 5011 capable resolver services would materially inform this space. We would like to see the report consider such options as an alternative to an immediate action to roll the key.

3) Measurements and Reporting

The report identifies two critical phases in the KSK rollover: the addition of new keys (increase in packet size), and the loss of the original key (potential loss of DNSSEC from failure to update the trust anchor (TA) to the new keyset). However, the report carries only a weak reference to any measurements that may be conducted during these critical phases. This is insufficient in our view, and we would like to see the use of stronger language that requires the root zone management partners to facilitate various forms of active measurement through the key roll process, so that the actual damage caused by each stage can be understood, and any decision to reverse or delay can be directed by the evidence gathered.

4) Algorithm Agility

The report’s language around the potential for algorithm change is unclear. There appears to be a strong bias to retention of RSA as the KSK algorithm, despite evidence that ECDSA keys and signatures are shorter, and that ECDSA is potentially faster to compute.
Whilst the document argues for a reduced risk of large packets, it does not clearly explain why larger RSA-based DNS response payloads would be preferable to smaller ECDSA DNS response payloads.

The report appears to assume certain immutable factors in the key roll process, and we would like to understand why these are immutable. These include:

5) Scheduling and Operational Tasks

The report notes as a constraint that a key roll must be aligned with the quarterly and 10-day periods used in existing processes. This has the potential consequence of scheduling the critical change in the root zone on a weekend, or on a major public holiday. For a transition as critical as the roll of the root zone KSK, it would be reasonable to see the report canvass options that would ensure that the critical transition events happen when there is a high likelihood of operational support infrastructure being in place for users. This rigid adherence to a calendar, irrespective of operational support considerations, appears to be an inappropriate prioritisation of environmental considerations.

6) KSK-signed Sentinels

The report considers a staged deployment of the KSK roll, and dismisses this approach as being ineffectual in terms of mitigation of risks to DNS service. However, the report does not consider using the incoming KSK in other ways. For example, the roll process could use the incoming KSK to sign some sentinel record in the root zone, or even at a lower point in the DNS name hierarchy, during the initial publication process, thereby allowing measurement of the extent to which resolvers are able to use the new KSK as a trust anchor to validate the sentinel record. It would be helpful to understand why such potentially measurable actions were not included in the proposed key roll process.

7) Serialised Key Rolls

Why is the process serialised to the introduction of a single candidate new key value? What are the issues involved in staging multiple incoming keys?
Was this considered by the design team?

8) Key Overlap

The envisaged process performs a state flip: from using the old key to sign the ZSK and the new key, to using the new key to sign the ZSK and removing the old key completely. Why is there not a period of overlap where the ZSK is signed by both the new and old KSKs, as is the case in a more conventional model of a key roll? Combined with some form of sentinel or staged deployment, this would allow the key roll to follow a more conventional process: prime resolvers with an announcement of a new key, then introduce use, test and measure in parallel with the old key, then commit to the change by removing the old key. The process described in the document appears to simply shift from the announcement to the commitment.
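
As a rough illustration of the packet size point raised in comment 4 (Algorithm Agility), the following sketch estimates the wire size of a root zone DNSKEY RRset and its signature under RSA and under ECDSA P-256. It is our own back-of-envelope calculation and is not drawn from the report. The scenario (two KSKs and one ZSK published together, one signature, all RSA keys 2048 bits with exponent 65537) and the function names are assumptions chosen for illustration. The estimate covers only the answer records, and excludes the DNS message header, the question section and any EDNS0 OPT record.

```python
# Rough wire-format size estimate for a root zone DNSKEY RRset plus RRSIGs.
# The key counts and key sizes used at the bottom are illustrative assumptions.

RR_HEADER = 1 + 2 + 2 + 4 + 2   # root owner name, TYPE, CLASS, TTL, RDLENGTH
DNSKEY_FIXED = 2 + 1 + 1        # flags, protocol, algorithm
RRSIG_FIXED = 18 + 1            # fixed RRSIG fields + root signer name

ECDSA_P256_KEY_LEN = 64         # RFC 6605: x and y coordinates, 32 bytes each
ECDSA_P256_SIG_LEN = 64         # RFC 6605: r and s values, 32 bytes each


def rsa_key_len(modulus_bits, exponent=65537):
    """Length of the public key field of an RSA DNSKEY (RFC 3110 encoding)."""
    exp_len = (exponent.bit_length() + 7) // 8
    # One-byte exponent length prefix, which is valid for exponents up to 255 bytes.
    return 1 + exp_len + modulus_bits // 8


def rsa_sig_len(modulus_bits):
    """An RSA signature is as long as the modulus."""
    return modulus_bits // 8


def dnskey_rrset_size(key_lens, sig_lens):
    """Total bytes for the DNSKEY records plus the RRSIG records covering them."""
    keys = sum(RR_HEADER + DNSKEY_FIXED + k for k in key_lens)
    sigs = sum(RR_HEADER + RRSIG_FIXED + s for s in sig_lens)
    return keys + sigs


# Assumed scenario: old KSK + new KSK + one ZSK, signed by a single KSK.
rsa_total = dnskey_rrset_size([rsa_key_len(2048)] * 3, [rsa_sig_len(2048)])
ecdsa_total = dnskey_rrset_size([ECDSA_P256_KEY_LEN] * 3, [ECDSA_P256_SIG_LEN])

print("RSA-2048 DNSKEY RRset + RRSIG:  ", rsa_total, "bytes")
print("ECDSA P-256 DNSKEY RRset + RRSIG:", ecdsa_total, "bytes")
```

Under these assumptions the RSA case comes to roughly 1.1 KB of answer data against roughly 0.3 KB for ECDSA P-256. Adding a second signature, as would occur in any phase where two KSKs both sign the key set, widens the gap further.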