oss-security - Re: Ghostscript 10.03.1 (2024-05-02) fixed 5 CVEs including CVE-2024-33871 arbitrary code execution

Message-ID: <57c462dd-f23a-4f55-a870-55c6886767a0@codean.io>
Date: Wed, 3 Jul 2024 16:07:37 +0200
From: Thomas Rinsma <thomas@...ean.io>
To: oss-security@...ts.openwall.com
Subject: Re: Ghostscript 10.03.1 (2024-05-02) fixed 5 CVEs including
 CVE-2024-33871 arbitrary code execution

Hi,

Per Solar's request, here is some information on recent Ghostscript 
bugs. They have all been fixed upstream already for either ~1 month 
(10.03.1) or ~4 months (10.03.0). It looks like patches have also landed 
in most distros, but there is not a super clear changelog or version 
history so this might help clarify things.

Note that this is just a subset of all vulnerabilities fixed in 10.03.0 
and 10.03.1: these are just the bugs I myself found and reported.

# CVE-2024-29509 - heap buffer overflow via the PDFPassword parameter

The `runpdf` command (and friends) allows the new C-based PDF 
interpreter to be invoked from within PS. With this, we can pass various 
flags and arguments (see `pdf_impl_set_param`) that are normally passed 
via the command-line when the PDF interpreter is invoked directly.

It turns out that validation of several of these parameters is flawed, 
maybe because they were considered somewhat "trusted", being 
command-line arguments originally.

The fields `ctx->encryption.Password` and `ctx->encryption.PasswordLen` 
are set based on the value of `PDFPassword`. During the decryption 
process, in `check_password_R5`  in `pdf_sec.c`, a buffer is allocated 
based on the string-length of this field:

```
code = pdfi_object_alloc(ctx, PDF_STRING, strlen(ctx->encryption.Password), (pdf_obj **)&P);
```

However, a `memcpy` later copies the full length of the PS-supplied 
object into this buffer:

```
memcpy(P->data, Password, PasswordLen);
```

Because PS-strings are not null-terminated, this will result in a heap 
buffer overflow when a value of `PDFPassword` is supplied with a null 
byte in the middle. For example, the following will result in a `memcpy` 
of 7 bytes into a buffer of size 3:

```
/PDFPassword (foo\000bar) def
```
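
The size mismatch can be reproduced in isolation. The following is a minimal sketch, not Ghostscript code: `ps_string`, `flawed_alloc_size` and `copy_password` are hypothetical names, and the struct just stands in for a length-delimited PS string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a PostScript string: bytes plus explicit length. */
typedef struct {
    const char *data;
    size_t len;
} ps_string;

/* The size the flawed allocation uses: strlen() stops at the first NUL byte.
 * (Only safe to call here because the example data does contain a NUL.) */
static size_t flawed_alloc_size(const ps_string *s) {
    return strlen(s->data);
}

/* A copy that sizes the buffer from the same length it later copies. */
static char *copy_password(const ps_string *s) {
    char *buf = malloc(s->len ? s->len : 1);
    if (buf != NULL)
        memcpy(buf, s->data, s->len);
    return buf;
}
```

With `ps_string pw = { "foo\0bar", 7 }`, `flawed_alloc_size(&pw)` yields 3 while `pw.len` is 7, which is exactly the gap the `memcpy` writes across.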

This bug was fixed in 10.03.0 (2024-03-06), and is bug (1) in this 
report: https://bugs.ghostscript.com/show_bug.cgi?id=707510


# CVE-2024-29506 - stack buffer overflow in pdfi_apply_filter()

The `PDFDEBUG` flag controls the value of `ctx->args.pdfdebug`. In 
`pdfi_apply_filter` this enables execution of a `memcpy` into a stack 
buffer, without bounds checks. The input (`n->data`, the PDF filter 
name) is an attacker-controlled buffer of arbitrary size. A filter name 
of 100 bytes or more will overflow the `str` buffer (at exactly 100 
bytes, the terminating `'\0'` is written one past the end).

```
if (ctx->args.pdfdebug)
     {
         char str[100];
         memcpy(str, (const char *)n->data, n->length);
         str[n->length] = '\0';
         dmprintf1(ctx->memory, "FILTER NAME:%s\n", str);
     }
```
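
For comparison, a bounded version of that copy could look roughly like the sketch below. This is an illustration only and not necessarily how upstream fixed it; `format_filter_name` is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a length-delimited name into a fixed buffer,
 * truncating so that the terminating NUL always fits.
 * Returns the number of name bytes actually kept. */
static size_t format_filter_name(char *out, size_t outsz,
                                 const unsigned char *data, size_t len) {
    if (outsz == 0)
        return 0;
    size_t n = len < outsz - 1 ? len : outsz - 1;
    memcpy(out, data, n);
    out[n] = '\0';
    return n;
}
```

Called with a 100-byte `out` buffer, this keeps at most 99 bytes of the name no matter how long `n->length` is.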

This bug was also fixed in 10.03.0 (2024-03-06), and is bug (2) in this 
report: https://bugs.ghostscript.com/show_bug.cgi?id=707510


# CVE-2024-29507 - stack buffer overflow via CIDFSubstPath/Font params

Under specific conditions, the `cidfsubstpath` and `cidfsubstfont` 
parameters (set by corresponding PostScript objects) are used to load 
substitute fonts (this is in `pdfi_open_CIDFont_substitute_file`). The 
values are `memcpy`d into the `fontfname` buffer without bounds checks. 
Hence, an attacker can pass values larger than the buffer size to 
trigger a stack buffer overflow.

```
char fontfname[gp_file_name_sizeof]; // 4096

// .. <snip> ...

if (ctx->args.cidfsubstpath.data == NULL) {
     memcpy(fontfname, fsprefix, fsprefixlen);
}
else {
     memcpy(fontfname, ctx->args.cidfsubstpath.data, ctx->args.cidfsubstpath.size);
     fsprefixlen = ctx->args.cidfsubstpath.size;
}

if (ctx->args.cidfsubstfont.data == NULL) {
     // ... <snip> ...
}
else {
     memcpy(fontfname, ctx->args.cidfsubstfont.data, ctx->args.cidfsubstfont.size);
     defcidfallacklen = ctx->args.cidfsubstfont.size;
}
```
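
The missing check is essentially "do prefix, name and terminator fit in the buffer". A minimal sketch of such a check follows; `join_path` is a hypothetical helper for illustration and is not taken from the Ghostscript source or its fix.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write prefix followed by name into dst as a
 * NUL-terminated string. Returns 0 on success, -1 if it would not fit. */
static int join_path(char *dst, size_t dstsz,
                     const char *prefix, size_t prefixlen,
                     const char *name, size_t namelen) {
    /* Compare each length separately so the sum cannot wrap around. */
    if (prefixlen >= dstsz || namelen >= dstsz - prefixlen)
        return -1;
    memcpy(dst, prefix, prefixlen);
    memcpy(dst + prefixlen, name, namelen);
    dst[prefixlen + namelen] = '\0';
    return 0;
}
```

With a 4096-byte `dst`, any combination of attacker-supplied lengths above 4095 bytes in total is rejected instead of being copied.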

This bug was also fixed in 10.03.0 (2024-03-06), and is bug (3) in this 
report: https://bugs.ghostscript.com/show_bug.cgi?id=707510


# CVE-2024-29508 - heap pointer leak in pdf_base_font_alloc()

The function `pdf_base_font_alloc` used by the `pdfwrite` device will 
use a hexadecimal pointer representation (`".F" PRI_INTPTR`) for the 
constructed BaseFont name if the input name is empty:

```
if (pfname->size > 0) {
     font_name.data = pfname->chars;
     font_name.size = pfname->size;
     while (pdf_has_subset_prefix(font_name.data, font_name.size)) {
         /* Strip off an existing subset prefix. */
         font_name.data += SUBSET_PREFIX_SIZE;
         font_name.size -= SUBSET_PREFIX_SIZE;
     }
} else {
     gs_snprintf(fnbuf, sizeof(fnbuf), ".F" PRI_INTPTR, (intptr_t)copied);
     font_name.data = (byte *)fnbuf;
     font_name.size = strlen(fnbuf);
}
```

Resulting in, for example:

```
<</BaseFont/YZKFTQ+.F0x5618b147e378/FontDescriptor 8 0 R/ToUnicode 11 0 R/Type/Font ...
```

An attacker can obtain this pointer value by reading back the output 
file (after writing to a temporary writable and readable location).
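
To illustrate how little work the read-back step takes, here is a small sketch that recovers the value from a name like the one above. `extract_leaked_pointer` is a hypothetical helper, and it assumes the `.F0x<hex>` shape seen in the example output.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: find ".F0x" in a BaseFont name and parse the hex
 * digits that follow. Returns 1 and stores the value on success, else 0. */
static int extract_leaked_pointer(const char *basefont,
                                  unsigned long long *out) {
    const char *p = strstr(basefont, ".F0x");
    char *end;
    unsigned long long value;

    if (p == NULL)
        return 0;
    p += 4;                          /* skip ".F0x" */
    value = strtoull(p, &end, 16);   /* stops at the first non-hex byte */
    if (end == p)
        return 0;                    /* no hex digits followed the prefix */
    *out = value;
    return 1;
}
```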


This bug (along with various other pointer leaks) was fixed in 10.03.0 
(2024-03-06), and is bug (4) in this report: 
https://bugs.ghostscript.com/show_bug.cgi?id=707510


# CVE-2024-29511 - arbitrary file read/write through Tesseract config

The `ocr` family of devices invokes Tesseract to perform OCR operations. 
The device parameter `OCRLanguage` is used by Tesseract to load a data 
file for that specific language. Specifically, such a file is loaded 
from `./<OCRLanguage>.traineddata`. By using a path traversal to 
`/tmp/`, we can force Tesseract to load our own data file:

```
mark
/OutputFile (/tmp/notused)
/OCRLanguage (../../../../../tmp/test) % loads /tmp/test.traineddata
/OutputDevice /ocr
.dicttomark
setpagedevice
```

As it turns out, Tesseract `traineddata` files can include various 
configuration values, including `user_patterns_file` which will try to 
load patterns from the given path, and `debug_file` which will write 
debug information to the given path. The debug information is quite 
verbose, and will print full input lines if they don’t start with a 
valid character in the trained language. By constructing our "language" 
such that no character is valid, all lines in the pattern file are 
printed. For example, the configuration settings:

```
debug_file /tmp/out
user_patterns_file /etc/passwd
```

will result in a file `/tmp/out` containing:

```
Error: failed to insert pattern 'root:x:0:0:root:/root:/bin/bash'
Error: failed to insert pattern 'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'
Error: failed to insert pattern 'bin:x:2:2:bin:/bin:/usr/sbin/nologin'
Error: failed to insert pattern 'sys:x:3:3:sys:/dev:/usr/sbin/nologin'
Error: failed to insert pattern 'sync:x:4:65534:sync:/bin:/bin/sync'
<etc>
```

In PostScript we can:

1. Construct the traineddata file under `/tmp/`
2. Use path traversal in `OCRLanguage` to load it when initializing the 
`ocr` device
3. Read the resulting output data in `/tmp/out`

This allows us to read arbitrary files outside of the SAFER sandbox, and 
write to arbitrary file paths, although during writing, every line will 
start with `Error: failed to insert pattern '` and end with `'`.
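
The traversal in step 2 only works because the language value is used as part of a file path without restriction. One possible mitigation, shown purely as a sketch, is an allowlist on the value; `is_plain_language_name` is a hypothetical helper, and the allowed character set is an assumption that is deliberately strict and may reject values a real deployment needs.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical allowlist check: accept only non-empty names made of
 * ASCII letters, digits and '_' (so no '/', '.', or other path syntax). */
static int is_plain_language_name(const char *s, size_t len) {
    if (len == 0)
        return 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!isalnum(c) && c != '_')
            return 0;
    }
    return 1;
}
```

A value such as `eng` passes, while `../../../../../tmp/test` is rejected because of the `.` and `/` bytes.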

Note that this is the Tesseract/OCR-related bug that was referred to by 
the Ghostscript changelog (and quoted earlier in this thread). Contrary 
to what is stated in the changelog it does not lead to RCE by itself, 
just file read/write. It also requires Ghostscript to be compiled with 
Tesseract support.


# CVE-2024-29510 - format string injection in uniprint device

The `uniprint` device allows the user to provide various string 
fragments as device options, which are later appended to the output 
file. Two of these parameters, `upWriteComponentCommands` and 
`upYMoveCommand`, are actually treated as format strings, specifically 
for `gp_fprintf` and `gs_snprintf`. For these, the intention is for the 
user to include just one format specifier in the string, but there is no 
logic preventing arbitrary format strings (with multiple specifiers) 
from being used.

With full control over the format string (by setting a page device with 
the respective options), and read access to the device output (by 
setting it to a temporary file path), an attacker can abuse this to leak 
data from the stack and perform memory corruption. This is especially 
impactful in the case of `gs_snprintf` (as opposed to `gp_fprintf`), as 
its format-string parsing logic is not hardened by compiler measures 
like `_FORTIFY_SOURCE`, while it still supports the `%n` specifier.
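
The missing logic amounts to checking a user-supplied format string before handing it to a printf-style function. Below is a rough sketch of such a check; it is not the upstream fix, `count_conversions` is a hypothetical helper, and the set of characters it skips as flags, width, precision and length modifiers is a simplification.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count conversion specifiers in a length-delimited
 * format string. "%%" counts as a literal. Returns -1 if the string uses
 * %n, a '*' width/precision, or ends in the middle of a specifier. */
static int count_conversions(const char *fmt, size_t len) {
    int count = 0;
    size_t i = 0;

    while (i < len) {
        if (fmt[i] != '%') {
            i++;
            continue;
        }
        i++;                                  /* skip '%' */
        if (i < len && fmt[i] == '%') {       /* literal "%%" */
            i++;
            continue;
        }
        /* Skip flags, width, precision and length modifiers. */
        while (i < len && fmt[i] != '\0' &&
               strchr("-+ #0123456789.hljztL", fmt[i]) != NULL)
            i++;
        if (i >= len || fmt[i] == '\0' || fmt[i] == 'n' || fmt[i] == '*')
            return -1;
        count++;
        i++;                                  /* skip the conversion char */
    }
    return count;
}
```

A caller expecting exactly one specifier would accept a string only when the result is 1, so `%x%x%x%x` (four specifiers) and `%d%n` (rejected outright) would both be refused.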

Bug report and public blog post with more details and PoC leading to a 
SAFER sandbox bypass:

https://bugs.ghostscript.com/show_bug.cgi?id=707662
https://codeanlabs.com/blog/research/cve-2024-29510-ghostscript-format-string-exploitation/

---

Cheers,
Thomas