oss-security - Re: CPython: CVE-2024-8088: Infinite loop when iterating over zip archive entry names

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZsjeNwQ-v2yuOR4D@nihonium>
Date: Fri, 23 Aug 2024 21:08:39 +0200
From: Fay Stegerman <flx@...usk.net>
To: oss-security@...ts.openwall.com
Subject: Re: CPython: CVE-2024-8088: Infinite loop when
 iterating over zip archive entry names

* Fay Stegerman <flx@...usk.net> [2024-08-23 17:38]:
> * Fay Stegerman <flx@...usk.net> [2024-08-22 23:12]:
> > * Alan Coopersmith <alan.coopersmith@...cle.com> [2024-08-22 20:56]:
> > > -------- Forwarded Message --------
> > > Subject: 	[Security-announce][CVE-2024-8088] Infinite loop when iterating
> > > over zip archive entry names
> > > Date: 	Thu, 22 Aug 2024 13:40:20 -0500
> > > From: 	Seth Larson <seth@...hon.org>
> > > Reply-To: 	security-sig@...hon.org
> > > To: 	security-announce@...hon.org
> > >
> > > There is a HIGH severity vulnerability affecting the CPython "zipfile" module.
> > >
> > > When iterating over names of entries in a zip archive (for example, methods
> > > of "zipfile.ZipFile" like "namelist()", "iterdir()", "extractall()", etc)
> > > the process can be put into an infinite loop with a maliciously crafted
> > > zip archive. This defect applies when reading only metadata or extracting
> > > the contents of the zip archive. Programs that are not handling
> > > user-controlled zip archives are not affected.
> > >
> > > Please see the linked CVE ID for the latest information on affected versions:
> > >
> > > * https://www.cve.org/CVERecord?id=CVE-2024-8088
> > > * https://github.com/python/cpython/pull/122906
> > > * https://github.com/python/cpython/issues/122905
> >
> > A small correction/addendum based on reading the vulnerability report and the PR
> > that fixes this (as well as being quite familiar with Python zipfile.ZipFile
> > internals and confused how this would affect it): it's not zipfile.ZipFile and
> > its methods that are affected, at least not directly, but zipfile.Path.  The
> > issue being this code in zipfile._path._ancestry():
> >
> >   path = path.rstrip(posixpath.sep)
> >   while path and path != posixpath.sep:
> >       yield path
> >       path, tail = posixpath.split(path)
> >
> > Which results in an infinite loop because for example posixpath.split("//") ==
> > ("//", "") but "//" != posixpath.sep:
> >
> >   >>> it = zipfile._path._parents("//foo")
> >   >>> next(it)
> >   '//'
> >   >>> next(it)
> >   '//'
> >   >>> next(it)
> >   '//'
> >
> > The infinite loop has been fixed by sanitising the paths.
> 
> Forgot to mention this: the infinite loop is triggered when zipfile.Path adds
> "implied directories" -- using _parents(), which calls _ancestry() -- in the
> overridden .namelist() for the custom zipfile.ZipFile subclass it wraps.  Which
> is (indirectly) used by almost all of the zipfile.Path methods like .iterdir(),
> .glob(), .exists(), .joinpath() etc.
> 
>   >>> zf = zipfile.ZipFile(io.BytesIO(), "w")
>   >>> zf.filename = "foo.zip"
>   >>> zf.writestr("a/b/c", "abc")
>   >>> zf.writestr("d/e", "de")
>   >>> zf.namelist()
>   ['a/b/c', 'd/e']
>   >>> zf.__class__
>   <class 'zipfile.ZipFile'>
> 
>   >>> p = zipfile.Path(zf)
>   >>> p.root.namelist()
>   ['a/b/c', 'd/e', 'a/b/', 'a/', 'd/']
>   >>> list(p.iterdir())
>   [Path('foo.zip', 'a/'), Path('foo.zip', 'd/')]
>   >>> zf.__class__
>   <class 'zipfile._path.CompleteDirs'>
> 
>   >>> zf.writestr("//oops", "oops")
>   >>> # infinite loop via joinpath -> resolve_dir -> _name_set -> namelist ->
>   >>> # _implied_dirs -> _ancestry
>   >>> p / "foo"
> 
> As zipfile.Path modifies the class of the original ZipFile, calling .namelist()
> or .extractall() on the original ZipFile used to create the Path afterwards is
> also affected even though zipfile.ZipFile as such is not.

I just reported [1] that the patch introduced a regression:

  >>> import io, zipfile
  >>> zf = zipfile.ZipFile(io.BytesIO(), "w")
  >>> zf.writestr("d:/foo", "bar")
  >>> zf.extractall("a")
  >>> open("a/d:/foo").read()
  'bar'
  >>> p = zipfile.Path(zf)
  >>> x = p / "d" / "foo"
  >>> y = p / "d:" / "foo"
  >>> list(p.iterdir())   # before: [Path(None, 'd:/')]
  [Path(None, 'd/')]
  >>> p.root.namelist()   # before: ['d:/foo', 'd:/']
  ['d/foo', 'd/']
  >>> x.exists()          # before: False
  True
  >>> y.exists()          # before: True
  False
  >>> zf.extractall("b")  # before: worked like above
  KeyError: "There is no item named 'd/foo' in the archive"
  >>> x.read_text()       # before: FileNotFoundError
  KeyError: "There is no item named 'd/foo' in the archive"
  >>> y.read_text()       # before: worked
  FileNotFoundError: ...

- Fay

[1] https://github.com/python/cpython/issues/123270
Please check out the Open Source Software Security Wiki, which is counterpart to this mailing list.
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.