Efficient and practically secure masked software implementations face significant challenges. A fundamental reason is that the micro-architectures of commercial processors, and their leakage-inducing effects, are largely unknown. Thus, even though provably secure software algorithms have been presented in the literature, additional care is required when implementing them in practice. In this work, we tackle horizontal leakage effects originating in the ALU micro-architecture of CPUs. Horizontal leakage is emitted when ALU operations must combine values at different bit indices to yield the correct result, and it gives adversaries joint information about multiple bits within a register. This has led to the belief that no more than one share of a secret value may be present in the same register at any point. We show that this restriction does not hold universally. We introduce barriers within registers that stop horizontal leakage inside them and thus allow multiple shares of the same secret to be placed in a single register. This enables us to operate on multiple shares with a single software instruction and therefore increases efficiency. Using our proposed share and barrier layout, we present practical case studies on a full AES round and the AES-prime Sbox and show their SCA security with up to one million traces.
Zeitschner et al. (Mon,) studied this question.
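To make the idea of placing several shares in one register concrete, the following is a minimal functional sketch and not the layout proposed in this work. It assumes two 8-bit Boolean shares packed into one word and separated by a gap of zero bits standing in for a barrier. The widths, the helper names (`share`, `pack`, `unpack`, `packed_xor`, `barrier_is_clear`), and the choice of a zero gap are all hypothetical. The sketch only illustrates that a single bitwise operation can process both shares at once while leaving the gap untouched; it does not model leakage and says nothing about side-channel security on a real ALU.

```python
import secrets

# Assumed, illustrative parameters; the actual share and barrier layout
# is not specified in the text above.
SHARE_BITS = 8                      # width of each share
BARRIER_BITS = 8                    # width of the zero gap between shares
SHARE_MASK = (1 << SHARE_BITS) - 1
HI_SHIFT = SHARE_BITS + BARRIER_BITS


def share(x):
    """Split an 8-bit value x into two Boolean shares with x0 ^ x1 == x."""
    x0 = secrets.randbelow(1 << SHARE_BITS)
    return x0, x ^ x0


def pack(x0, x1):
    """Place share x0 in the low bits and x1 above the barrier gap."""
    return (x1 << HI_SHIFT) | x0


def unpack(word):
    """Recover the two shares from a packed word."""
    return word & SHARE_MASK, (word >> HI_SHIFT) & SHARE_MASK


def packed_xor(a, b):
    """XOR two packed sharings; one operation updates both shares."""
    return a ^ b


def barrier_is_clear(word):
    """Check that the gap between the two shares still holds only zeros."""
    barrier_mask = ((1 << BARRIER_BITS) - 1) << SHARE_BITS
    return (word & barrier_mask) == 0


if __name__ == "__main__":
    x, y = 0xA7, 0x3C
    a = pack(*share(x))
    b = pack(*share(y))
    c = packed_xor(a, b)
    c0, c1 = unpack(c)
    # Recombining the shares is done here only to check correctness.
    assert (c0 ^ c1) == (x ^ y)
    assert barrier_is_clear(c)
```

XOR is used because it acts on each bit index independently, so the two shares never interact inside the operation. Operations that mix bit indices, such as additions with carries or shifts, are the ones the abstract identifies as sources of horizontal leakage and would need separate treatment.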