# Use-After-Free via TOCTOU Race in net/tls: tls_sk_proto_close() reads tx_conf without lock_sock

**Reporter:** Oleg Sevostyanov

**Date:** 2026-05-16

**Kernel version:** 7.1-rc3 (confirmed; likely present since ~v4.13, when the TLS ULP was introduced)

**Subsystem:** net/tls

**Files:**

- `net/tls/tls_main.c`: vulnerable read at line 372
- `net/tls/tls_sw.c`: UAF sites in `tx_work_handler` (line 2637) and `tls_encrypt_done` (line 467)

**CWE:** CWE-416 (Use After Free), CWE-362 (Race Condition)

**Severity:** High (local privilege escalation; no privileges required)

---

## Summary

`tls_sk_proto_close()` in `net/tls/tls_main.c` reads `ctx->tx_conf` at line 372 **without holding `lock_sock`**. A concurrent `setsockopt(SOL_TLS, TLS_TX, ...)` call writes `ctx->tx_conf = TLS_SW` **inside** `lock_sock`. When `setsockopt` wins the race, the following sequence occurs:

1. The close path **skips** `tls_sw_cancel_work_tx()` (which would set `BIT_TX_CLOSING` and call `disable_delayed_work_sync`) because it saw `TLS_BASE` at line 372.
2. The close path **calls** `tls_sw_free_ctx_tx()` → `kfree(tls_sw_context_tx)` at line 390 because the second read, at line 389 after `setsockopt` has completed, sees `TLS_SW`.
3. A delayed workqueue item (`tx_work_handler`, scheduled 1 jiffy earlier by `tls_encrypt_done` or `tls_sw_write_space`) fires after the `kfree`, producing a **use-after-free** on the freed `tls_sw_context_tx` object.

No special privileges are required: any unprivileged user with a TCP socket can trigger the race.

---

## Affected Kernel Versions

The unlocked read of `tx_conf` before `lock_sock` in `tls_sk_proto_close` has been present since the TLS ULP was introduced (~v4.13). All kernels with `CONFIG_TLS=y` in the v4.13–v7.1 range are likely affected, subject to confirmation against each stable branch.

Earliest introducing commit (approximate):

```
e8f69799810c ("net/tls: Add generic NIC offload infrastructure", 2018-07-13)
```

or the commit that split `tls_sk_proto_close` into its current form.

---

## Exact Vulnerable Code

### net/tls/tls_main.c: unlocked read + free

```c
/* Line 365–399 (Linux 7.1-rc3) */
static void tls_sk_proto_close(struct sock *sk, long timeout)
{
    struct inet_connection_sock *icsk = inet_csk(sk);
    struct tls_context *ctx = tls_get_ctx(sk);
    long timeo = sock_sndtimeo(sk, 0);
    bool free_ctx;

    if (ctx->tx_conf == TLS_SW)         /* ← L372: READ WITHOUT lock_sock (BUG) */
        tls_sw_cancel_work_tx(ctx);     /* ← L373: SKIPPED when race wins */

    lock_sock(sk);                      /* ← L375: lock acquired too late */
    free_ctx = ctx->tx_conf != TLS_HW && ctx->rx_conf != TLS_HW;

    if (ctx->tx_conf != TLS_BASE || ctx->rx_conf != TLS_BASE)
        tls_sk_proto_cleanup(sk, ctx, timeo);

    write_lock_bh(&sk->sk_callback_lock);
    if (free_ctx)
        rcu_assign_pointer(icsk->icsk_ulp_data, NULL);
    WRITE_ONCE(sk->sk_prot, ctx->sk_proto);
    if (sk->sk_write_space == tls_write_space)
        sk->sk_write_space = ctx->sk_write_space;
    write_unlock_bh(&sk->sk_callback_lock);
    release_sock(sk);

    if (ctx->tx_conf == TLS_SW)         /* ← L389: second read, now sees TLS_SW */
        tls_sw_free_ctx_tx(ctx);        /* ← L390: kfree(tls_sw_context_tx) FREE */
    ...
}
```

### net/tls/tls_main.c: setsockopt sets tx_conf under lock

```c
/* Line 757–758, inside do_tls_setsockopt_conf(), which holds lock_sock */
if (tx)
    ctx->tx_conf = conf;    /* ← sets TLS_SW under lock_sock */
```

### net/tls/tls_sw.c: cancel_work_tx (what is skipped)

```c
/* Line 2539–2546 */
void tls_sw_cancel_work_tx(struct tls_context *tls_ctx)
{
    struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);

    set_bit(BIT_TX_CLOSING, &ctx->tx_bitmask);      /* prevent new work */
    set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask);
    disable_delayed_work_sync(&ctx->tx_work.work);  /* wait for in-flight */
}
```

Without this call, `BIT_TX_CLOSING` is never set, so `tx_work_handler` does not return early at line 2650 and proceeds to access freed memory.
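
The interplay between the unlocked check at L372 and the later check at L389 can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: `struct model`, `model_close()`, `concurrent_setsockopt()` and `freed_without_cancel()` are hypothetical names, and the concurrent `setsockopt` is forced deterministically at the point where `lock_sock()` would otherwise block.

```c
#include <stdbool.h>

/* Userspace model only; the names mirror the report but are hypothetical. */
enum conf { CONF_BASE = 0, CONF_SW = 1 };

struct model {
    enum conf tx_conf;
    bool work_cancelled;  /* stands in for tls_sw_cancel_work_tx() */
    bool ctx_freed;       /* stands in for tls_sw_free_ctx_tx() */
};

/* Stand-in for the concurrent setsockopt() completing under the socket lock. */
static void concurrent_setsockopt(struct model *m)
{
    m->tx_conf = CONF_SW;
}

/* check_under_lock == false models the reported order (check, then lock);
 * check_under_lock == true models the proposed order (lock, then check). */
static void model_close(struct model *m, bool check_under_lock)
{
    if (!check_under_lock && m->tx_conf == CONF_SW)
        m->work_cancelled = true;      /* first check, before the lock */

    /* Assumed interleaving: close blocks on the lock here while the
     * other thread finishes its setsockopt() and releases the lock. */
    concurrent_setsockopt(m);

    if (check_under_lock && m->tx_conf == CONF_SW)
        m->work_cancelled = true;      /* same check, after the lock */

    /* ... cleanup and unlock elided ... */

    if (m->tx_conf == CONF_SW)
        m->ctx_freed = true;           /* second check, before the free */
}

/* True when the context is freed although pending work was never cancelled. */
bool freed_without_cancel(bool check_under_lock)
{
    struct model m = { .tx_conf = CONF_BASE };

    model_close(&m, check_under_lock);
    return m.ctx_freed && !m.work_cancelled;
}
```

In this model, `freed_without_cancel(false)` returns `true` (the free happens without a preceding cancel), while `freed_without_cancel(true)` returns `false`. The model deliberately ignores everything else the real close path does, so it shows only the ordering problem and says nothing about exploitability.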

### net/tls/tls_sw.c: delayed work scheduler (1 jiffy after crypto callback)

```c
/* Line 515–517: tls_encrypt_done() */
if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
    schedule_delayed_work(&ctx->tx_work.work, 1);   /* 1 jiffy delay */

/* Line 521–522: tls_encrypt_done() */
if (atomic_dec_and_test(&ctx->encrypt_pending))
    complete(&ctx->async_wait.completion);          /* wakes tls_encrypt_async_wait */
```

`tls_encrypt_async_wait` returns as soon as the completion fires, which is before the 1-jiffy delayed work runs, so `tls_sw_free_ctx_tx` at L390 can race with the pending delayed work.

### net/tls/tls_sw.c: UAF sites in tx_work_handler

```c
/* Line 2637–2668 */
static void tx_work_handler(struct work_struct *work)
{
    struct delayed_work *delayed_work = to_delayed_work(work);
    struct tx_work *tx_work = container_of(delayed_work,
                                           struct tx_work, work);
    struct sock *sk = tx_work->sk;
    struct tls_context *tls_ctx = tls_get_ctx(sk);
    struct tls_sw_context_tx *ctx;

    if (unlikely(!tls_ctx))
        return;

    ctx = tls_sw_ctx_tx(tls_ctx);                                   /* freed pointer */
    if (test_bit(BIT_TX_CLOSING, &ctx->tx_bitmask))                 /* UAF READ L2650 */
        return;

    if (!test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))    /* UAF READ/WRITE */
        return;

    if (mutex_trylock(&tls_ctx->tx_lock)) {
        lock_sock(sk);
        tls_tx_records(sk, -1);                                     /* UAF: tx_list */
        release_sock(sk);
        mutex_unlock(&tls_ctx->tx_lock);
    } else if (!test_and_set_bit(BIT_TX_SCHEDULED,
                                 &ctx->tx_bitmask)) {               /* UAF WRITE */
        schedule_delayed_work(&ctx->tx_work.work,                   /* func ptr on freed */
                              msecs_to_jiffies(10));
    }
}
```

### net/tls/tls_sw.c: UAF sites in tls_encrypt_done

```c
/* Line 467–522 */
static void tls_encrypt_done(void *data, int err)
{
    ...
    ctx = tls_sw_ctx_tx(tls_ctx);                                   /* freed pointer */
    ...
    ctx->async_wait.err = err;                                      /* UAF WRITE L497 */
    ...
    first_rec = list_first_entry(&ctx->tx_list,                     /* UAF READ L511 */
                                 struct tls_rec, list);
    if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))      /* UAF READ/WRITE L515 */
        schedule_delayed_work(&ctx->tx_work.work, 1);

    if (atomic_dec_and_test(&ctx->encrypt_pending))                 /* UAF READ/WRITE L521 */
        complete(&ctx->async_wait.completion);                      /* UAF WRITE */
}
```

---

## Race Condition Timeline

```
Thread A: close(fd)                              Thread B: setsockopt(fd, SOL_TLS, TLS_TX)
══════════════════════════════════════════════════════════════════════════════════════════
close(fd)
  tls_sk_proto_close(sk)
    READ ctx->tx_conf → TLS_BASE    ← L372, race window:
                                      setsockopt sets it after this read
    (tls_sw_cancel_work_tx NOT called;
     BIT_TX_CLOSING never set)
                                                 setsockopt(fd, SOL_TLS, TLS_TX, &info)
                                                   do_tls_setsockopt_conf()
                                                     lock_sock(sk)
                                                     tls_set_sw_offload(sk, tx=1)
                                                       kzalloc_obj(*sw_ctx_tx)  → alloc
                                                       INIT_DELAYED_WORK(&sw_ctx_tx->tx_work)
                                                       crypto_aead_encrypt()  → -EINPROGRESS
                                                       atomic_inc(&ctx->encrypt_pending)
                                                     ctx->tx_conf = TLS_SW  ← L758
                                                     release_sock(sk)

[async encrypt callback fires]
  tls_encrypt_done():
    schedule_delayed_work(..., 1)           ← +1 jiffy
    complete(&ctx->async_wait.completion)

    lock_sock(sk)                           ← L375
    tls_sk_proto_cleanup(sk):
      tls_sw_release_resources_tx():
        tls_encrypt_async_wait()            ← returns (completion already fired)
        crypto_free_aead(ctx->aead_send)
    release_sock(sk)
    READ ctx->tx_conf → TLS_SW              ← L389: now sees TLS_SW
    tls_sw_free_ctx_tx():
      kfree(ctx)                            ← tls_sw_context_tx FREED ──┐
                                                                        │
[1 jiffy later, workqueue]                                              │
  tx_work_handler():                                                    │
    ctx = tls_sw_ctx_tx(tls_ctx)            ← FREED ────────────────────┘
    test_bit(BIT_TX_CLOSING, ...)           ← UAF READ
    tls_tx_records(sk, -1)                  ← UAF
```

---

## Freed Object

```c
/* include/net/tls.h */
struct tls_sw_context_tx {
    struct crypto_aead *aead_send;   /* offset 0x00 */
    struct crypto_wait async_wait;   /* offset 0x08 */
    struct tx_work tx_work;          /* offset 0x28, contains delayed_work */
    struct tls_rec *open_rec;        /* offset 0x50 */
    struct list_head tx_list;        /* offset 0x58 */
    atomic_t encrypt_pending;        /* offset 0x68 */
    u8 async_capable:1;
    unsigned long tx_bitmask;        /* BIT_TX_SCHEDULED, BIT_TX_CLOSING */
};
/* allocated via kzalloc_obj(*sw_ctx_tx) → kmalloc-256 slab */
```

`tx_work.work` (a `struct delayed_work`) is at a fixed offset within the freed chunk. Its embedded `work_struct.func` is the function pointer called by the workqueue.

---

## Privilege Requirements

| Requirement | Value |
|---|---|
| Root / CAP_NET_ADMIN | Not required |
| CAP_NET_RAW | Not required |
| Network namespace | Default (init_net) |
| Minimum privilege | Unprivileged user with TCP socket access |
| Kernel config | CONFIG_TLS=y (default on most distros) |
| Async crypto | Required for the 1-jiffy UAF window; synchronous crypto still triggers the state inconsistency |

---

## Exploitation Scenarios

### Scenario 1: Crash / DoS (reliability: high)

Even without a controlled allocation, `tx_work_handler` traversing the freed `ctx->tx_list` will likely corrupt memory and trigger a kernel BUG/oops within seconds of the race firing.

### Scenario 2: Information Leak / KASLR Defeat

1. Win the race → `tls_sw_context_tx` (kmalloc-256) is freed.
2. Spray `kmalloc-256` objects from user space before the 1-jiffy deadline:
   - `msg_msg` bodies (via `msgsnd()`)
   - `pipe_buffer` structures
   - `sk_buff` headers
3. `tls_encrypt_done()` fires and reads from the reclaimed chunk:
   - `list_first_entry(&ctx->tx_list, ...)` → follows an attacker-controlled pointer
   - The returned pointer is dereferenced as a `tls_rec *`
4. Any kernel pointer stored by the spray object in that slot leaks to the attacker via timing or error paths → KASLR broken.

### Scenario 3: Arbitrary Write

`complete(&ctx->async_wait.completion)` calls `wake_up_process()` on `x->wait.task_list.next`. If the freed chunk is reclaimed with a controlled `swait_queue_head`, `wake_up_process()` writes to an attacker-controlled `task_struct` pointer.

### Scenario 4: Local Privilege Escalation (LPE), Full Root

1. Defeat KASLR first (Scenario 2).
2. Spray the freed 256-byte slot so that `ctx->tx_work.work.func` (at a known offset within the freed chunk) contains the address of a kernel ROP gadget, with the goal of reaching `commit_creds(prepare_kernel_cred(0))`.
3. When `tx_work_handler` calls `schedule_delayed_work(&ctx->tx_work.work, msecs_to_jiffies(10))` on the reclaimed chunk, the workqueue later executes the attacker-supplied function pointer in kernel (kworker) context.
4. Overwrite `current->cred` → uid=gid=0 → root shell.

---

## Reproducer

### Build

```bash
gcc -O2 -pthread -o poc-tls-uaf-race poc-tls-uaf-race.c
```

### Run

```bash
sudo modprobe tls        # ensure TLS ULP module is loaded
./poc-tls-uaf-race       # run race loop
sudo dmesg | grep -E -A 40 "BUG: KASAN: (slab-)?use-after-free"
```

### Expected KASAN output (CONFIG_KASAN=y kernel)

```
==================================================================
BUG: KASAN: use-after-free in tx_work_handler+0x.../net/tls/tls_sw.c:2649
Read of size 8 at addr ffff... by task kworker/...

CPU: 1 PID: ... Comm: kworker/...
Call Trace:
 tx_work_handler
 process_one_work
 worker_thread
 kthread
 ret_from_fork
...
Freed by task ...:
 tls_sw_free_ctx_tx
 tls_sk_proto_close
 inet_release
 sock_close
==================================================================
```

### Verifying the race without KASAN

Use `ftrace` to log `tx_conf` values at close entry and compare:

```bash
# Run as root. The fetch argument (%cx) is schematic: kprobe_events needs a
# register/offset expression that resolves ctx->tx_conf on the target build.
echo 'p:probe_close tls_sk_proto_close tx_conf=%cx' > \
    /sys/kernel/debug/tracing/kprobe_events
echo 1 > /sys/kernel/debug/tracing/events/kprobes/probe_close/enable
./poc-tls-uaf-race
grep "tx_conf=0" /sys/kernel/debug/tracing/trace   # 0 = TLS_BASE: race hit
```

A `tx_conf=0` at `tls_sk_proto_close` entry while `tx_conf` later becomes 1 (TLS_SW) before `kfree` confirms the race window.

---

## Proposed Fix

Move the `tx_conf` check and `tls_sw_cancel_work_tx()` call to **after** `lock_sock()` so that the read is protected by the same lock that `setsockopt` uses when writing `tx_conf`:

```diff
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -365,10 +365,10 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
     long timeo = sock_sndtimeo(sk, 0);
     bool free_ctx;
 
-    if (ctx->tx_conf == TLS_SW)
-        tls_sw_cancel_work_tx(ctx);
-
     lock_sock(sk);
+    /* tx_conf must be read under lock_sock to avoid TOCTOU with setsockopt */
+    if (ctx->tx_conf == TLS_SW)
+        tls_sw_cancel_work_tx(ctx);
+
     free_ctx = ctx->tx_conf != TLS_HW && ctx->rx_conf != TLS_HW;
```

This one-block move ensures that `tls_sw_cancel_work_tx()` is always called before any cleanup when `tx_conf` is `TLS_SW`, regardless of concurrent `setsockopt`. Because `tx_work_handler` itself takes `lock_sock(sk)`, waiting in `disable_delayed_work_sync()` with the socket lock held should be reviewed for a possible deadlock against an in-flight handler before this ordering is adopted.

---

## References

- Subsystem maintainers: Jakub Kicinski, John Fastabend
- Related prior work: CVE-2023-0461 (different TLS UAF: listening socket context)
- Slab cache: `kmalloc-256`
- PoC file: `poc-tls-uaf-race.c` (attached)