From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/thunk: (Mis)align the RETs in clear_bhb_loops() to mitigate ITS

The Indirect Target Selection speculative vulnerability means that indirect
branches (including RETs) are unsafe when in the first half of a cacheline.

clear_bhb_loops() has a precise layout of branches.  The alignment for
performance cause the RETs to always be in an unsafe position, and converting
those to return thunks changes the branching pattern.  While such a conversion
is believed to be safe, clear_bhb_loops() is also a performance-relevant
fastpath, so (mis)align the RETs to be in a safe position.

No functional change.

This is part of XSA-469 / CVE-2024-28956

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

diff --git a/xen/arch/x86/bhb-thunk.S b/xen/arch/x86/bhb-thunk.S
index 7e866784f784..05f1043df7d0 100644
--- a/xen/arch/x86/bhb-thunk.S
+++ b/xen/arch/x86/bhb-thunk.S
@@ -52,7 +52,12 @@ ENTRY(clear_bhb_tsx)
  *   ret
  *
  * The CALL/RETs are necessary to prevent the Loop Stream Detector from
- * interfering.  The alignment is for performance and not safety.
+ * interfering.
+ *
+ * The .balign's are for performance, but they cause the RETs to be in unsafe
+ * positions with respect to Indirect Target Selection.  The .skips are to
+ * move the RETs into ITS-safe positions, rather than using the slowpath
+ * through __x86_return_thunk.
  *
  * The "short" sequence (5 and 5) is for CPUs prior to Alder Lake / Sapphire
  * Rapids (i.e. Cores prior to Golden Cove and/or Gracemont).
@@ -68,12 +73,14 @@ ENTRY(clear_bhb_loops)
         jmp     5f
         int3
 
-        .align 64
+        .balign 64
+        .skip   32 - (.Lr1 - 1f), 0xcc
 1:      call    2f
-        ret
+.Lr1:   ret
         int3
 
-        .align 64
+        .balign 64
+        .skip   32 - 18 /* (.Lr2 - 2f) but Clang IAS doesn't like this */, 0xcc
 2:      ALTERNATIVE "mov $5, %eax", "mov $7, %eax", X86_SPEC_BHB_LOOPS_LONG
 
 3:      jmp     4f
@@ -85,7 +92,7 @@ ENTRY(clear_bhb_loops)
         sub     $1, %ecx
         jnz     1b
 
-        ret
+.Lr2:   ret
 5:
         /*
          * The Intel sequence has an LFENCE here.  The purpose is to ensure