Message-ID: <20250222032521.GA30890@openwall.com>
Date: Sat, 22 Feb 2025 04:25:21 +0100
From: Solar Designer <solar@...nwall.com>
To: oss-security@...ts.openwall.com
Cc: Qualys Security Advisory <qsa@...lys.com>,
	Dmitry Belyavskiy <dbelyavs@...hat.com>,
	Jordy Zomer <jordy@...ing.systems>, Damien Miller <djm@...drot.org>
Subject: Re: MitM attack against OpenSSH's VerifyHostKeyDNS-enabled client

Hi,

Thank you Qualys for the very interesting research, as is usual from you.

On Tue, Feb 18, 2025 at 09:14:36AM +0000, Qualys Security Advisory wrote:
> - we manually audited all of OpenSSH's functions that use "goto", for
>   missing resets of their return value;
> 
> - we wrote a CodeQL query that automatically searches for functions that
>   "goto out" without resetting their return value in the corresponding
>   "if" code block.

I didn't go as far as CodeQL, but I also did some semi-manual auditing:

grep -A100 '[^a-z_]if.[^=!<>]*=[^=]' *.c | less

and then searched the output for goto.  I did this against the patched
OpenSSH source tree prepared with
"rpmbuild -rp openssh-8.7p1-43.el9.src.rpm", hoping to spot any issues
specific to this older base OpenSSH version or to Red Hat's changes to
it.

This is indeed imperfect, as e.g. it doesn't catch assignments that
appear only on continuation lines of a multi-line "if" condition (not on
the line with "if" itself).  I also ran out of time before completing
this review.  In the portion that I did review, I only found a subset of
the same issues that Qualys had found, plus one related uninteresting
bug (see below).
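
For illustration, here is a minimal sketch of the idiom this kind of
search is meant to surface: an "if" that assigns the return value,
followed by a later "goto out" that forgets to reset it.  All names
below (ERR_INTERNAL, step_ok, check_buggy, check_fixed) are hypothetical
and not taken from OpenSSH.

```c
#include <stddef.h>

#define ERR_INTERNAL (-1)	/* hypothetical error code */

/* Hypothetical helper: returns 0 on success, negative on failure. */
static int
step_ok(void)
{
	return 0;
}

/* Buggy variant: the second failure path leaves r at 0. */
static int
check_buggy(const void *obj)
{
	int r = ERR_INTERNAL;

	if ((r = step_ok()) != 0)
		goto out;
	if (obj == NULL)
		goto out;	/* BUG: r is 0 here, failure looks like success */
	r = 0;
 out:
	return r;
}

/* Fixed variant: r is reset before jumping to the cleanup label. */
static int
check_fixed(const void *obj)
{
	int r = ERR_INTERNAL;

	if ((r = step_ok()) != 0)
		goto out;
	if (obj == NULL) {
		r = ERR_INTERNAL;
		goto out;
	}
	r = 0;
 out:
	return r;
}
```

With these definitions, check_buggy(NULL) returns 0 even though the
check failed, while check_fixed(NULL) returns ERR_INTERNAL.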

> Our manual audit (of all the functions that use "goto") allowed us to
> verify that our CodeQL query does not produce false negatives (which
> would be worse than false positives), but it also allowed us to review
> code that is similar but not identical to the idiom presented in the
> "Background" section.
> 
> In OpenSSH's client, the following code, which checks the server's
> identity (the server's host key), naturally caught our attention:
> 
> ------------------------------------------------------------------------
>   93 static int
>   94 verify_host_key_callback(struct sshkey *hostkey, struct ssh *ssh)
>   95 {
>  ...
>  101         if (verify_host_key(xxx_host, xxx_hostaddr, hostkey,
>  102             xxx_conn_info) == -1)
>  103                 fatal("Host key verification failed.");
>  104         return 0;
>  105 }
> ------------------------------------------------------------------------
> 1470 int
> 1471 verify_host_key(char *host, struct sockaddr *hostaddr, struct sshkey *host_key,
> 1472     const struct ssh_conn_info *cinfo)
> 1473 {
> ....
> 1538         if (options.verify_host_key_dns) {
> ....
> 1543                 if ((r = sshkey_from_private(host_key, &plain)) != 0)
> 1544                         goto out;
> ....
> 1571 out:
> ....
> 1580         return r;
> 1581 }
> ------------------------------------------------------------------------

Given that the bug that actually mattered for security turned out to be
"similar but not identical to the idiom" that Qualys wrote they focused
most of their auditing on, I then switched to going through:

grep 'if.*(.*(.*== *-1' *.c | less

This is similarly imperfect (only catches function calls directly from
the "if" line, not return values assigned to a variable just before, and
doesn't catch continuation lines), but at least I completed this review
for openssh-9.9p1.  This amounted to separately locating and reviewing
the bodies of called OpenSSH-specific functions (not libc functions nor
compatibility wrappers) and sometimes those of nested function calls.
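
As a rough sketch of why "== -1" checks deserve this attention: if the
callee can return a negative error code other than -1, a caller that
only compares against -1 treats that failure as success.  The names and
error values below (ERR_GENERIC, ERR_ALLOC, verify, and the two
accept_* helpers) are hypothetical, chosen only to mirror the shape of
the quoted verify_host_key() code.

```c
#define ERR_GENERIC	(-1)	/* hypothetical: "verification failed" */
#define ERR_ALLOC	(-2)	/* hypothetical: some other failure */

/*
 * Hypothetical callee using the "goto out" idiom.  When an internal
 * step fails, r carries that step's error code to the caller.
 */
static int
verify(int internal_step_fails, int key_matches)
{
	int r = ERR_GENERIC;

	if (internal_step_fails) {
		r = ERR_ALLOC;
		goto out;
	}
	if (!key_matches)
		goto out;
	r = 0;
 out:
	return r;
}

/* Fragile caller: only -1 counts as failure. */
static int
accept_fragile(int r)
{
	return !(r == -1);
}

/* Robust caller: only 0 counts as success. */
static int
accept_robust(int r)
{
	return r == 0;
}
```

Here accept_fragile(verify(1, 0)) is true (the failure slips through),
whereas accept_robust(verify(1, 0)) is false.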

(I assumed the compatibility wrappers correctly implement the same
function that a library would, including return value semantics.
Someone may review them separately.  I actually happened to look at a
few, but that's very far from exhaustive.)

I then diff'ed the output of the above grep command vs. the same for the
openssh-8.7p1-43.el9 tree, and similarly reviewed the code behind every
line of grep output that was new in openssh-8.7p1-43.el9.

With this, I also only found another uninteresting bug (see below).

I wonder if such review could also be automated with CodeQL (or maybe
even the classic Coccinelle?), or if it's beyond tools' capabilities?

> 2025-02-10: Advisory and patches sent to distros@...nwall.

Qualys did in fact share a patch from upstream OpenSSH developers, which
I now see is identical to changes that went into 9.9p2 (which also
includes some other changes).  As I found this focused patch helpful for
my code reviews and fix backporting, I also attach it here.

I also attach my result of applying the patch to openssh-8.7p1-43.el9.
I verified that the hunks that did not apply were in fact inapplicable
to this version.  I also added a fix for my uninteresting bug one:

+++ openssh-8.7p1-43.el9-tree.qualys-retval/ssh-agent.c	2025-02-21 04:01:32.677160367 +0000
@@ -700,6 +700,8 @@ process_add_identity(SocketEntry *e)
 	if ((r = sshkey_private_deserialize(e->request, &k)) != 0 ||
 	    k == NULL ||
 	    (r = sshbuf_get_cstring(e->request, &comment, NULL)) != 0) {
+		if (!r) /* k == NULL */
+			r = SSH_ERR_INTERNAL_ERROR;
 		error_fr(r, "parse");
 		goto out;
 	}

This should prevent logging a confusing "parse: success" message on
"k == NULL", as r could have been set to 0 on the line before.

This issue is also present in upstream OpenSSH 9.9p2.
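
The short-circuit behavior behind this can be sketched in isolation.
This is a simplified model, not the ssh-agent code: deserialize_key()
and get_comment() are hypothetical stand-ins, ERR_INTERNAL stands in for
SSH_ERR_INTERNAL_ERROR, and each function returns the code that would
be handed to the error logger.

```c
#include <stddef.h>

#define ERR_INTERNAL	(-1)	/* stand-in for SSH_ERR_INTERNAL_ERROR */
#define NOT_LOGGED	1	/* sentinel: the error branch was not taken */

struct key { int type; };

/* Hypothetical deserializer: reports success but yields no key. */
static int
deserialize_key(struct key **kp)
{
	*kp = NULL;
	return 0;
}

/* Hypothetical comment parser: always succeeds. */
static int
get_comment(const char **cp)
{
	*cp = "demo";
	return 0;
}

/* Unpatched shape: "k == NULL" fails the chain while r is still 0. */
static int
logged_code_unpatched(void)
{
	struct key *k = NULL;
	const char *comment = NULL;
	int r;

	if ((r = deserialize_key(&k)) != 0 ||
	    k == NULL ||
	    (r = get_comment(&comment)) != 0)
		return r;	/* 0 would be rendered as "success" */
	return NOT_LOGGED;
}

/* Patched shape: substitute a real error code when r is still 0. */
static int
logged_code_patched(void)
{
	struct key *k = NULL;
	const char *comment = NULL;
	int r;

	if ((r = deserialize_key(&k)) != 0 ||
	    k == NULL ||
	    (r = get_comment(&comment)) != 0) {
		if (!r) /* k == NULL */
			r = ERR_INTERNAL;
		return r;
	}
	return NOT_LOGGED;
}
```

In this model logged_code_unpatched() returns 0 and
logged_code_patched() returns ERR_INTERNAL.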

As to my uninteresting bug two, it's illustrated by this patch (also
attached here):

+++ openssh-8.7p1-43.el9-tree.krb5-ssh_asprintf_append/auth-krb5.c	2025-02-21 03:37:13.106465704 +0000
@@ -309,13 +309,14 @@ ssh_asprintf_append(char **dsc, const ch
 	i = vasprintf(&src, fmt, ap);
 	va_end(ap);
 
-	if (i == -1 || src == NULL)
+	if (i == -1)
 		return -1;
 
 	old = *dsc;
 
 	i = asprintf(dsc, "%s%s", *dsc, src);
-	if (i == -1 || src == NULL) {
+	if (i == -1) {
+		*dsc = old;
 		free(src);
 		return -1;
 	}

This is in RH-added Kerberos support code.  The issue was that if the
second asprintf() call failed, it'd leave *dsc undefined, yet the caller
of this function would free() memory via that pointer.  In practice,
glibc would either leave the pointer unchanged or reset it to NULL
(varying by glibc version and specific error condition), both of which
are safe to free().  Yet resetting "*dsc = old;" should be safer, and
should avoid the memory leak that happens if *dsc got reset to NULL.
That memory leak shouldn't have mattered anyway because it'd only occur
when the process already has trouble allocating more memory here.

The "src == NULL" checks are dropped for three reasons: the first one
shouldn't matter if vasprintf() behaves correctly and wouldn't help if
it does not (as src isn't initialized to NULL before the call); the
second one is wrong (it was probably meant to check *dsc, not src); and
further code in this same source file relies on the asprintf() return
value anyway.
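
A self-contained sketch of the same pattern follows.  It is not the
Red Hat function: append_fmt() is a hypothetical stand-in, it assumes
*dsc points to a heap-allocated string, and it assumes the old buffer
is freed on success (the part of the original function after the hunk
is not shown above).  asprintf() and vasprintf() are GNU/BSD extensions,
hence the feature-test macro.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for asprintf()/vasprintf() on glibc */
#endif
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Append formatted text to the heap string *dsc.
 * Returns 0 on success, -1 on failure.  On failure *dsc still points
 * to the original string, so the caller can safely free() it.
 */
static int
append_fmt(char **dsc, const char *fmt, ...)
{
	char *src, *old;
	va_list ap;
	int i;

	va_start(ap, fmt);
	i = vasprintf(&src, fmt, ap);
	va_end(ap);

	if (i == -1)
		return -1;	/* src is undefined here; do not use it */

	old = *dsc;

	i = asprintf(dsc, "%s%s", old, src);
	if (i == -1) {
		*dsc = old;	/* restore the pointer asprintf may have changed */
		free(src);
		return -1;
	}

	free(old);		/* assumption: old buffer is released on success */
	free(src);
	return 0;
}
```

Only the return value is consulted to detect failure; the contents of
the output pointers are never inspected for that purpose.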

These patches just went into the Rocky Linux SIG/Security package of
OpenSSH for EL9:

https://sig-security.rocky.page/packages/openssh/
https://git.rockylinux.org/sig/security/src/openssh

The above auth-krb5.c patch is actually untested since we currently
build that package with Kerberos support excluded (and besides it'd take
specific effort to trigger that error path).

Alexander

View attachment "openssh-9.9-upstream-retval.patch" of type "text/plain" (5150 bytes)

View attachment "openssh-8.7p1-upstream-rocky-retval.patch" of type "text/plain" (3740 bytes)

View attachment "openssh-8.7p1-rocky-krb5-ssh_asprintf_append.patch" of type "text/plain" (627 bytes)
