Internet-Draft RSA Implementation Guidance March 2024
Kario Expires 5 September 2024 [Page]
Workgroup:
Internet Engineering Task Force
Internet-Draft:
draft-irtf-cfrg-rsa-guidance-00
Updates:
8017 (if approved)
Published:
Intended Status:
Informational
Expires:
Author:
H. Kario
Red Hat, Inc.

Implementation Guidance for the PKCS #1 RSA Cryptography Specification

Abstract

This document specifies additions and amendments to RFC 8017. Specifically, it provides guidance to implementers of the standard to protect against side-channel attacks. It also deprecates the RSAES-PKCS-v1_5 encryption scheme, but provides an alternative depadding algorithm that protects against side-channel attacks raising from users of vulnerable APIs. The purpose of this specification is to increase security of RSA implementations.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 5 September 2024.

Table of Contents

1. Introduction

The PKCS #1 [RFC8017] describes the RSA cryptosystem, providing guidance on implementing encryption schemes and signature schemes.

Unfortunately, straight-forward implementation of the RSA encryption schemes leave it vulnerable to side-channel attacks. Protections against them are not documented in RFC 8017, and attacks are mentioned only in passing.

2. Rationale

The RSAES-PKCS-v1_5 encryption scheme is known to be problematic since 1998, when Daniel Bleichenbacher published his attack [Bleichenbacher98]. Side-channel attacks against public key algorithms, including RSA, are known to be possible since 1996 thanks to work by Paul Kocher [Kocher96].

Despite those results, side-channel attacks against RSA implementations have proliferated for the next 25 years. Including attacks against simple exponentiation implementations [Dhem98][Schindler01], implementations that use the Chinese Remainder Theorem optimisation [Schindler00][Brumley03] [Aciicmez05], and implementations that use either base or exponent blinding exclusively [Aciicmez07][Aciicmez08] [Schindler14].

Similarly, side-channel free handling of the errors from the RSAES-PKCS-v1_5 decryption operation is something that implementations struggle with [Bock18][Kario23].

We thus provide guidance how to implement those algorithms in a way that should be secure against at least the simple timing side channel attacks.

3. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

4. Notation

In this document we reuse the notation from RFC 8017, in addition, we define the following:

AL

alternative message length, non-negative integer, 0 <= AL <= k - 11

AM

alternative encoded message, an octet string

bb

base blinding factor, a positive integer

bbInv

base un-blinding factor, a positive integer,

  bbInv = bb^(-1) mod n
D

octet string representation of d

DH

an octet string of a SHA-256 hash of D

KDK

an octet string containing a Key Derivation Key for a specific ciphertext C

l

length in octets of the message M

b_i

an exponent blinding factor for i-th prime, non-negative integer.

g_i

a modulus blinding factor for the i-th prime, non-negative integer.

5. Side channel attacks

Cryptographic implementations may provide a lot of indirect signals to the attacker that includes information about the secret processed data. Depending on type of information, those leaks can be used to decrypt data or retrieve private keys. Most common side-channels that leak information about secret data are:

  1. Different errors returned

  2. Different processing times of operations

  3. Different patterns of jump instructions and memory accesses

  4. Use of hardware instructions that take different amount time to execute depending on operands or result

Some of those leaks may be detectable over the network, while others may require closer access to the attacked system. With closer access, the attacker may be able to measure power usage, electromagnetic emanations, or sounds and correlate them with specific bits of secret information.

Recent research into network based side channel detection has shown that even very small side channels (of just few clock cycles) can be reliably detected over the network. The detectability depends on the sample size the attacker is able to collect, not on size of the side-channel.

6. General recommendations

As a general rule, all operations that process secret information (be it parts of the private key or parts of encrypted message) MUST be performed with code that doesn't have secret data dependent branch instructions, secret data dependent memory accesses, or uses non-constant time machine instructions (which ones are those is architecture dependant, but division is commonly non-constant time).

Special care should be placed around the code that handles the conversion of the numerical representation to the octet string representation in RSA decryption operations.

All operations that use private keys SHOULD additionally employ both base blinding and exponent blinding as protections against leaks inside modular exponentiation code.

7. Side-channel free modular exponentiation

The underlying modular exponentiation algorithm MUST be constant time with regards to the exponent in all uses of the private key.

For private key decryption the modular exponentiation algorithm MUST be constant time with regards to the output of the exponentiation.

In case the Chinese remainder theorem optimisation is used the modular exponentiation algorithm must also be constant time with regards to the used moduli.

7.1. General recommendations

It's especially important to make sure that all values that are secret to the attacker are stored in memory buffers that have sizes determined by the public modulus.

For example, the private exponents should be stored in memory buffers that have sizes determined by the public modulus value, not the numerical values of the exponents themselves.

Similarly, the size of the output buffer for multiplication should always be equal to the sum of buffer sizes of multiplicands. The output size of the modular reduction operation should similarly be equal to the size of the modulus and not depend on bit size of the output.

7.2. Montgomery ladder

For the modular exponentiation algorithm to be side-channel free every step of the calculation MUST NOT depend on the bits of the exponent. In particular, use of simple square and multiply algorithm will leak information about bits of the exponent through lack of multiplication operation in individual exponentiation steps.

The recommended workaround against it, is the use of the Montgomery ladder construction.

While that approach ensures that both the square and multiply operations are performed, the fact that the results of them are placed in different memory locations based on bits of the secret exponent will provide enough information for an attacker to recover the bits of the exponent. To counteract it, the implementation should ensure that both memory locations are accessed and updated on every step.

7.3. Montgomery reduction in multiplication

As multiplication operations quickly make the intermediate values in modular exponentiation large, performing a modular reduction after every multiplication or squaring operation is a common optimisation.

To further optimise the modular reduction, the Montgomery modular multiplication is used for performing the combined multiply-and-reduce operation. The last step of that operation is conditional on the value of the output. A side-channel free implementation should perfom the subtraction in all cases and then copy the result or the first operand of the subtraction based on sign of the result of the subtraction in side-channel free manner.

8. Base blinding

As a protection against multiple attacks, it's RECOMMENDED to perform all operations involving the private key with the use of blinding [Kocher96].

It should be noted that for decryption operations the unblinding operation MUST be performed using side-channel free code that does not leak information about the result of this multiplication and reduction modulo operation.

To implement base blinding, select a number bb uniformly at random such that it is relatively prime to n and smaller than n.

Compute multiplicative inverse of bb modulo n.

  bbInv = bb^(-1) mod n

In the RSADP() operation, after performing step 1, multiply c by bb mod n. Use the result as new c for all the remaining operations.

Before returning the value m in step 3, multiply it by bbInv mod n.

Note: multiplication by bbInv and reduction modulo n MUST be performed using side-channel free code with respect to value m.

As calculating multiplicative inverse is expensive, implementations MAY calculate new values of bb and bbInv by squaring them:

  new bb = bb^2 mod n
  new bbInv = bbInv^2 mod n

A given pair of blinding factors (bb, bbInv) MUST NOT be used for more than one RSADP() operation.

Unless the multiplication (squaring) and reduction modulo operations are verified to be side-channel free, it's RECOMMENDED to generate completely new blinding parameters every few hundred private key operations.

9. Exponent blinding

To further protect against private key leaks, it's RECOMMENDED to perform the blinding of the used exponents [Kocher96].

When performing the RSADP() operation, the blinding depends on the form of the private key.

If the key is in the first form, the pair (n, d), then the exponent d should be modified by adding a multiple of Euler phi(n): m = c^(d + b*phi(n)) mod n. Where b is a 64 bit long uniform random number.

A new value b MUST be selected for every RSADP() operation.

If the key is the second form, the quintuple (p, q, dP, dQ, qInv) with optional sequence of triplets (r_i, d_i, t_i), i = 3, ..., u, then each exponent used MUST be blinded individually.

  1. The m_1 = c^(dP + b_1 * phi(p)) mod p

  2. The m_2 = c^(dQ + b_2 * phi(q)) mod q

  3. If u > 3, then m_i = c^(d_i + b_i * phi(r_i)) mod (r_i)

Where b_1, b_2, ..., b_i are all uniformly selected random numbers at least 64 bits long (or at least 2 machine word sizes, whichever is greater).

As Euler phi(p) for an argument p that's a prime is equal p - 1, it's simple to calculate in this case.

Note: the selection of random b_i values, multiplication of them by the result of phi() function, and addition to the exponent MUST be performed with side-channel free code.

Use of smaller blinding factor is NOT RECOMMENDED, as values shorter than 64 bits have been shown to still be vulnerable to side-channel attacks[Bauer12][Schindler11].

The b_1, b_2, ..., b_i factors MUST NOT be reused for multiple RSADP() operations.

10. Modulus blinding

To protect against private key leaks, it's RECOMMENDED to perform blinding of the used modulus values in the CRT implementations.

When the key is in the first form, the pair (n, d), then the used modulus is public, thus no blinding is necessary.

If the key is in the second form, the quintuple (p, q, dP, dQ, qInv) with the optional sequence f triplets (r_i, d_i, t_i), i = 3, ..., u, then each modulus used MUST be blinded individually.

  1. The m_1 = c^dP mod (g_1 * p)

  2. The m_2 = c^dQ mod (g_2 * q)

  3. If u > 3, then m_i = c^d_i mod (g_3 * r_i)

Where g_1, g_2, ..., g_i are all uniformely selected random number at least 64 bits long (or at least 2 machine word sizes, whicher is greater).

Before step 3 of the original algorithm, reduce the returned value m mod n.

11. Depadding

In case of RSA-OAEP, the padding is self-verifying, thus the depadding operation needs to follow the standard algorithm to provide a safe API to users.

It MUST ignore the value of the very fist octet of padding and process the remaining bytes as if it was equal zero.

The RSAES-PKCS-v1_5 encryption scheme is considered deprecated, and should be used only to process legacy data. It MUST NOT be used as part of online protocols or API endpoints.

For implementations that can't remove support for this padding mode it's RECOMMENDED to implement an implicit rejection mechanism that completely hides from the calling code whether the padding check failed or not.

It should be noted that the algorithm MUST be implemented as stated, otherwise in case of heteregonous environments where two implementations use the same key but implement the implicit rejection differently, it may be possible for the attacker to compare behaviour between the implementations to guess if the padding check failed or not.

The basic idea of the implicit rejection is to prepare a random but deterministic message to be returned in case the standard RSAES-PKCS-v1_5 padding checks fail. To do that, use the private key and the provided ciphertext to derive a static, but unknown to the attacker, random value. It's a combination of the method documented in the TLS 1.2 (RFC 5246[RFC5246]) and the deterministic (EC)DSA signatures (RFC 6979 [RFC6979]).

11.1. IRPRF

For the calculation of the random message for implicit rejection we define a Pseudo-Random Function (PRF) as follows:

IRPRF( KDK, label, length )

Input:

KDK the key derivation key

label a label making the output unique for a given KDK

length requested length of output in octets

Output: derived key, an octet string

Steps:

  1. If KDK is not 32 octets long, or if length is larger than 8192 return error and stop.

  2. The returned value is created by concatenation of subsequent calls to a SHA-256 HMAC function with the KDK as the HMAC key and following octet string as the message:

      P_i = I || label || bitLength
    
  3. Where the I is an iterator value encoded as two octet long big endian integer, label is the passed in label, and bitLength is the length times 8 (to represent number of bits of output) encoded as two octet big endian integer. The iterator is initialised to 0 on first call, and then incremented by 1 for every subsequent HMAC call.

  4. The HMAC is iterated until the concatenated output is shorter than length

  5. The output is the length left-most octets of the concatenated HMAC output

11.2. Implicit rejection

For implementations that cannot remove support for the RSAES-PKCS-v1_5 encryption scheme nor provide a usage-specific API, it's possible to implement an implicit rejection algorithm as a protection measure. It should be noted that implementing it correctly is hard, thus it's RECOMMENDED instead to disable support for RSAES-PKCS-v1_5 padding instead.

To implement implicit rejection, the RSAES-PKCS1-V1_5-DECRYPT from section 7.2.2 of RFC 8017 needs to be implemented as follows:

  1. Length checking: If the length of the ciphertext C is not k octets (or if k < 11), output "decryption error" and stop.

  2. RSA decryption:

    1. Convert the ciphertext C to an integer ciphertext representative c:

        c = OS2IP (C).
      
    2. Apply the RSADP decryption primitive to the RSA private key (n, d) and the ciphertext representative c to produce an integer message representative m:

        m = RSADP ((n, d), c).
      

      Note: the RSADP MUST be constant-time with respect of message m.

      If RSADP outputs "ciphertext representative out of range" (meaning that c >= n), output "decryption error" and stop.

    3. Convert the message representative m to an encoded message EM of length k octets:

        EM = I2OSP (m, k).
      

      Note: I2OSP MUST be constant-time with respect of m.

  3. Derivation of alternative message

    1. Derive the Key Derivation Key (KDK)

      1. Convert the private expoent d to a string of length k octets:

          D = I2OSP (d, k).
        
      2. Hash the private exponent using the SHA-256 algorithm:

          DH = SHA256 (D).
        

        Note: This value MAY be cached between the decryption operations, but MUST be considered private-key equivalent.

      3. Use the DH as the SHA-256 HMAC key and the provided ciphertext C as the message. If the ciphertext C is not k octets long, it MUST be left padded with octets of value zero.

          KDK = HMAC (DH, C, SHA256).
        
    2. Create the candidate lengths and the random message

      1. Use the IRPRF with key KDK, "length" as six octet label encoded with UTF-8, to generate 256 octet output. Interpret this output as 128 two octet long big-endian numbers.

          CL = IRPRF (KDK, "length", 256).
        
      2. Use the IRPRF with key KDK, "message" as a seven octet label encoded with UTF-8 to generate k octet long output to be used as the alternative message:

          AM = IRPRF (KDK, "message", k).
        
    3. Select the alternative length for the alternative message.

      Note: this must be performed in side-channel free way.

      1. Iterate over the 128 candidate CL lengths. For each zero out high order bits so that they have the same bit length as the maximum valid message size (k - 11).

      2. Select the last length that's not larger than k - 11, use 0 if none are. Save it as AL.

  4. EME-PKCS1-v1_5 decoding: Separate the encoded message EM into an octet string PS consisting of nonzero octets and a message M as

      EM = 0x00 || 0x02 || PS || 0x00 || M.
    

    If the first octet of EM does not have hexadecimal value 0x00, if the second octet of EM does not have hexadecimal value 0x02, if there is no octet with hexadecimal value 0x00 to separate PS from M, or if the length of PS is less than 8 octets, the check variable must remember if any of those checks failed. Irrespective of the check variable value, the code should also return length of message M: L. If there is no octet with hexadecimal value 0x00 to separate PS from M, then L should equal 0.

    Note: All those checks MUST be performed irrespective if previous checks failed or not. A common technique for that is to have a check variable that is OR-ed with the results of subsequent checks.

  5. Decision which message to return: in case the check variable is set, the code should return the last AL octets of AM, in case the check variable is unset the code should return the last L octets of EM.

    Note: The decision which length to use MUST be performed in side-channel free manner. While the length of the returned message is not considered sensitive, the read memory location is. As such, when returning message M both EM and AM memory locations MUST be read.

12. Safe API

Performing all actions in a way that doesn't leak the status of the padding check includes the API provided to 3rd party code. In particular, if the RSA decryption implementation doesn't implement implicit rejection, then all three pieces of information: the padding check, the length of returned message, and the value of the message are sensitive information, useful in mounting an attack. As such, any API that returns an error in substantially different manner than a successful decryption (e.g. raising an exception, returning a null pointer, returning a different type of structure) is vulnerable to side-channel attacks.

13. Non-fixes

While there are infinite ways to implement those algorithms incorrectly few common ideas to work-around side-channel attacks are repeated. We list few of them as examples of approaches that don't work and thus MUST NOT be used.

13.1. Random delays

Commonly proposed workaround for timing attacks is to add a random delay to procesing of encrypted data. For such mitigation to be effective in practice (raise attacker's work factor to gain meaningful safety margin) it would need to be in the order of at least few seconds if not tens of seconds. Effective sizes and distributions of delays have not been subject of extensive study.

It should also be noted that such a delay would need to be applied in addition to the regular mitigation (generate random key, use it in case RSAES-PKCS-v1_5 padding checks). That is, it needs to be implemented in the calling code, not in the depaddng code.

At the same time, performing a simple wait masks only the timing side channel. It doesn't mitigate other side-channels, like ones that depend on power usage or memory access pattern.

13.2. Returing random value from API

A simpler variant of the proposed implicit rejection algorithm, where the decryption API returns a random message in case the padding check fails or the decrypted message has unexpected length, as decribed in [RFC5246], isn't a universal mitigation.

In case of a Bleichenbacher like attack, the attacker is trying to differentiate two classes of ciphertexts: ones that decrypt to message of specific size and ones that have invalid padding or decrypt to a message of unexpected size. That translates to two kind of values being returned to calling code: either the same message always for every decryption, or a different message for every decryption.

If that message is later used in any operation that is not side-channel free (key derivation, symmetric padding removal, message parsing), then the attacker will be able to observe two kinds of behaviour: consistent one for static messages and erratic one for randomly generated messages.

Use of such a message as a key with block cipher modes that require padding, like AES-CBC with PKCS #7 padding, is particularly leaky as it has about 1 in 256 chance of successful decryption with random key.

The simple implicit rejection with random message may be implemented safely in TLS 1.2 because both static messages and randomly generated messages are mixed together with the random nonce generated by the server, thus the attacker is unable to differentiate if the randomness originates in the nonce or in the randomisation of the message.

14. Deprecated Algorithms

Current protocol deployments MUST NOT use encryption with RSAES-PKCS-v1_5 padding. Support for RSAES-PKCS-v1_5 SHOULD be disabled in default configuration of any implementation of RSA cryptosystem. All new protocols MUST NOT specify RSAES-PKCS-v1_5 as a valid encryption padding for RSA keys.

15. IANA Considerations

This memo includes no request to IANA.

16. Security Considerations

This whole document specifies security considerations for RSA implementations.

17. References

17.1. Normative References

[RFC8017]
Moriarty, K., Ed., Kaliski, B., Jonsson, J., and A. Rusch, "PKCS #1: RSA Cryptography Specifications Version 2.2", RFC 8017, DOI 10.17487/RFC8017, , <https://www.rfc-editor.org/info/rfc8017>.

17.2. Informative References

[Aciicmez05]
Acıiçmez, O., Schindler, W., and Ç. K. Koç, "Improving Brumley and Boneh timing attack on unprotected SSL implementations", Proceedings of the 12th ACM conference on Computer and communications security CCS '05, , <https://doi.org/10.1145/1102120.1102140>.
[Aciicmez07]
Acıiçmez, O. and W. Schindler, "A Major Vulnerability in RSA Implementations due to MicroArchitectural Analysis Threat", Cryptology ePrint Archive Paper 2007/336, , <https://eprint.iacr.org/2007/336>.
[Aciicmez08]
Acıiçmez, O. and W. Schindler, "A Vulnerability in RSA Implementations Due to Instruction Cache Analysis and Its Demonstration on OpenSSL", Lecture Notes in Computer Science vol 4964, , <https://doi.org/10.1007/978-3-540-79263-5_16>.
[Bauer12]
Bauer, S., "Attacking Exponent Blinding in RSA without CRT", Lecture Notes in Computer Science vol 7275, , <https://doi.org/10.1007/978-3-642-29912-4_7>.
[Bleichenbacher98]
Bleichenbacher, D., "Chosen Ciphertext Attacks Against Protocols Based on the RSA Encryption Standard PKCS#1", Lecture Notes in Computer Science vol 1462, <https://doi.org/10.1007/BFb0055716>.
[Bock18]
Böck, H., Somorovsky, J., and C. Young, "Return Of Bleichenbacher's Oracle Threat (ROBOT)", 27th USENIX Security Symposium USENIX Security 18, , <https://www.usenix.org/conference/usenixsecurity18/presentation/bock>.
[Brumley03]
Brumley, D. and D. Boneh, "Remote timing attacks are practical", Computer Networks Volume 48, Issue 5, , <https://doi.org/10.1016/j.comnet.2005.01.010>.
[Dhem98]
Dhem, J., Koeune, F., Leroux, P., Mestré, P., Quisquater, J., and J. Willems, "A Practical Implementation of the Timing Attack", Lecture Notes in Computer Science vol 1820, , <https://doi.org/10.1007/10721064_15>.
[Kario23]
Kario, H., "Everlasting ROBOT: the Marvin Attack", 28th European Symposium on Research in Computer Security , , <https://eprint.iacr.org/2023/1442>.
[Kocher96]
Kocher, P. C., "Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems.", Lecture Notes in Computer Science vol 1109, , <https://doi.org/10.1007/3-540-68697-5_9>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, , <https://www.rfc-editor.org/info/rfc2119>.
[RFC5246]
Dierks, T. and E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.2", RFC 5246, DOI 10.17487/RFC5246, , <https://www.rfc-editor.org/info/rfc5246>.
[RFC6979]
Pornin, T., "Deterministic Usage of the Digital Signature Algorithm (DSA) and Elliptic Curve Digital Signature Algorithm (ECDSA)", RFC 6979, DOI 10.17487/RFC6979, , <https://www.rfc-editor.org/info/rfc6979>.
[Schindler00]
Schindler, W., "A Timing Attack against RSA with the Chinese Remainder Theorem", Lecture Notes in Computer Science vol 1965, , <https://doi.org/10.1007/3-540-44499-8_8>.
[Schindler01]
Schindler, W., Koeune, F., and J. Quisquater, "Improving Divide and Conquer Attacks against Cryptosystems by Better Error Detection / Correction Strategies.", Lecture Notes in Computer Science vol 2260, , <https://doi.org/10.1007/3-540-45325-3_22>.
[Schindler11]
Schindler, W. and K. Itoh, "Exponent Blinding Does Not Always Lift (Partial) Spa Resistance to Higher-Level Security", Lecture Notes in Computer Science vol 6715, , <https://doi.org/10.1007/978-3-642-21554-4_5>.
[Schindler14]
Schindler, W. and A. Wiemers, "Power attacks in the presence of exponent blinding.", Journal of Cryptographic Engineering 4, , <https://doi.org/10.1007/s13389-014-0081-y>.

Appendix A. Test Vectors

A.1. 2048 bit key

«provide test vectors here»

see also: https://github.com/tlsfuzzer/tlslite-ng/blob/master/unit_tests/test_tlslite_utils_rsakey.py#L1694 or OpenSSL 3.2.0.

proposed test vectors:

Otherwise valid, but with wrong first byte of plaintext

Otherwise valid, but with padding type specifying signature

Otherwise valid, but with PS of 7 bytes

Otherwise valid, but with PS of 0 bytes

Otherwise valid, but with the message separator byte missing

Invalid ciphertext that decrypts to a synthehic message of maximum size

Invalid ciphertext that decrypts to a 0-bytes long message

Invalid ciphertext that needs to use the second-to-last synthethic length for the returned message

Valid ciphertext that starts with a zero byte

A.2. 2049 bit key

«provide test vectors here

A.3. 3072 bit key

«provide test vectors here»

A.4. 4096 bit key

«provide test vectors here»

Author's Address

Hubert Kario
Red Hat, Inc.
Purkynova 115
61200 Brno
Czech Republic