oss-security - Re: pdf attacks vectors

Message-ID: <20120120194610.GA7882@openwall.com>
Date: Fri, 20 Jan 2012 23:46:10 +0400
From: Solar Designer <solar@...nwall.com>
To: oss-security@...ts.openwall.com
Subject: Re: pdf attacks vectors

On Fri, Jan 20, 2012 at 09:35:04AM +0400, Alexander Pletnev wrote:
> Hi guys, I'm working on a web app, and am going to create PDFs on-the-fly with user related data.
> Therefore I'm writing to oss-security. Which PDF attack vectors does my app need to avoid?
> 
> In other words, what are the most dangerous PDF attack vectors?

You need to start by understanding and specifying your threat model.
This may be part of the documentation for your web app such that all
contributors to the project, anyone doing a security audit on it, your
users (web app install admins and maybe even end-users) will know the
threat model that the web app is supposed to work under.

Given your description of what you're doing, it sounds like your
untrusted input is not PDFs, and thus you're asking the wrong question,
or maybe you merely use words ("PDF attack vectors") that I'd associate
with something else (attacks via malicious PDF files).  So I won't
comment on "PDF attack vectors" specifically (these sound irrelevant to
your actual needs, and I am not familiar with them anyway).

[ Update: ...or maybe you literally do mean attacks via malicious PDF
files, considering that they may be untrusted input to users of your web
app (even if not to the web app itself).  I touch on this topic closer
to the end of this reply. ]

What you need to do is sanitize user input before passing it to whatever
PDF-generating library you're using.  If you know that a certain input
field can only contain values of a certain format and in a certain range
(e.g., a numeric value range or a string length range), you need to
enforce those limits before passing that input field's value to the
library.  Perhaps you will want to have invalid inputs rejected in some
user-friendly manner (e.g., don't silently truncate overly long strings,
but instead re-display the input form with the problematic field
highlighted and the problem explained).
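
As a minimal sketch of that kind of server-side check, assuming a form
with two made-up fields ("full_name" and "quantity") whose formats and
limits are invented here purely for illustration:

```python
import re

# Hypothetical field rules, chosen only for illustration.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z .'-]*")
NAME_MAX_LEN = 64
QTY_RE = re.compile(r"[0-9]{1,3}")
QTY_RANGE = range(1, 1000)  # 1..999 inclusive


def validate_form(form):
    """Return a dict mapping field name to a problem description.

    An empty dict means every field passed.  Nothing is truncated or
    silently corrected; the caller re-displays the form with the errors.
    """
    errors = {}

    name = form.get("full_name")
    if not isinstance(name, str) or not name:
        errors["full_name"] = "required"
    elif len(name) > NAME_MAX_LEN:
        errors["full_name"] = "must be at most %d characters" % NAME_MAX_LEN
    elif not NAME_RE.fullmatch(name):
        errors["full_name"] = "contains characters that are not allowed"

    qty = form.get("quantity")
    if not isinstance(qty, str) or not QTY_RE.fullmatch(qty):
        errors["quantity"] = "must be a whole number"
    elif int(qty) not in QTY_RANGE:
        errors["quantity"] = "must be between 1 and 999"

    return errors
```

The function only reports problems.  Only when the returned dict is
empty would the values be handed on to the PDF library.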

Please note that input sanitization must be done server-side.  While you
may also (partially) duplicate it in JavaScript for better user
experience (such that they get initial input validation without having
to submit the form), the final and complete sanitization must be done on
the server after the form is submitted.

There may also exist attack vectors via perfectly valid input values
(e.g., if a special character is valid input for one of your fields
given the field's purpose, but is in fact special to the library).
There's little you can do about this beyond researching the library's
limitations, which is tricky, and that information may become outdated
for another version of the library (although a library that mishandles
such characters would probably be considered vulnerable itself).
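
To illustrate what "special to the library" can mean: in PDF syntax a
literal string is delimited by parentheses, and a backslash starts an
escape sequence, so those characters need escaping when text is written
into a PDF.  A properly working library is expected to do this itself.
The sketch below is not taken from any particular library, and the
function name is made up.  It only shows the kind of transformation
involved, and it deliberately ignores character encoding issues:

```python
def escape_pdf_literal(text):
    """Wrap text as a PDF literal string, escaping '\\', '(' and ')'.

    Illustration only: real PDF strings also raise encoding questions
    (non-ASCII text, line endings) that this sketch does not address.
    """
    out = []
    for ch in text:
        if ch in "\\()":
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "(" + "".join(out) + ")"
```

If the library forgot this step, a perfectly valid name containing a
closing parenthesis could terminate the string early and have the rest
of the value interpreted as PDF syntax.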

You may implement some sort of privilege separation within your web app
(e.g., somehow run parts of it under another operating system account on
the web server), but that's tricky - especially if your web app is to be
installed on other servers and by others - and it does not prevent
attacks on the library (it only mitigates their impact), nor does it
prevent attacks via generated PDFs on users' PDF viewers.
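
One possible shape for such separation, sketched under the assumption
of a POSIX server and Python 3.9 or newer (where subprocess accepts a
"user" argument), is to run the PDF generation in a child process under
a dedicated account.  The account and the worker command are
placeholders, and the parent needs enough privilege to switch users:

```python
import subprocess
import sys

# Stand-in worker for illustration: a real one would call the PDF
# library and write the PDF bytes to stdout.  This one echoes stdin.
DEMO_WORKER = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]


def run_isolated(argv, payload, user=None, timeout=10.0):
    """Run argv as a child process, feed payload on stdin, return stdout.

    If user is given, the child runs under that OS account (POSIX only,
    and the parent must be allowed to switch to it).
    """
    kwargs = {}
    if user is not None:
        kwargs["user"] = user
    result = subprocess.run(
        argv,
        input=payload,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        timeout=timeout,  # a hung worker raises TimeoutExpired
        check=True,       # a non-zero exit status raises CalledProcessError
        **kwargs
    )
    return result.stdout
```

A call such as run_isolated(DEMO_WORKER, data, user="pdfgen") would
need a "pdfgen" account to exist, which is an assumption of this sketch.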

Yes, attacks on PDF viewers may also be a concern.  It is possible that
perfectly valid input (as defined above) will make it through the PDF
generating library correctly, yet will trigger a security issue in a PDF
viewer on a user's system.  If the user filling out the initial form on
the web (inputs to the PDF) is the same user who will view the PDF file,
then there may be no privilege boundary crossed here (so no security
issue even if the PDF viewer may be "attacked" in this way).  However,
if a different user will then view the PDF, or if the user filling the
web form uses some form of privilege separation (e.g. different
computers for different kinds of work), then that privilege separation
might be bypassed (a minor security problem).

When you wrote that the web app will "create PDFs on-the-fly with user
related data", did you possibly mean with data coming from a database
rather than being entered by the user at this time?  If so, I recommend
that you re-sanitize the data as you read it from the database and
before you pass it to the PDF library.  That way, the impact of a
possible compromise of the database will be slightly reduced.  You may
have an abstraction layer for accessing the database along with data
format and value range sanitization.  You'd use this abstraction layer
everywhere in your web app, not just for PDFs.
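
A minimal sketch of such an abstraction layer, using sqlite3 only
because it ships with Python.  The "customers" table, its columns, and
the limits are invented for illustration:

```python
import re
import sqlite3

# Hypothetical rules for the stored fields.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z .'-]{0,63}")
MAX_BALANCE_CENTS = 10**9


class InvalidStoredData(ValueError):
    """A value read from the database failed re-validation."""


def fetch_customer(conn, customer_id):
    """Read one customer row and re-check every field before returning it."""
    row = conn.execute(
        "SELECT name, balance_cents FROM customers WHERE id = ?",
        (customer_id,),
    ).fetchone()
    if row is None:
        raise KeyError(customer_id)
    name, balance = row
    if not isinstance(name, str) or not NAME_RE.fullmatch(name):
        raise InvalidStoredData("name")
    if not isinstance(balance, int) or not 0 <= balance <= MAX_BALANCE_CENTS:
        raise InvalidStoredData("balance_cents")
    return {"name": name, "balance_cents": balance}
```

The PDF code (and everything else in the web app) would call
fetch_customer() instead of querying the table directly, so a tampered
row is rejected before it reaches the PDF library.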

Again, what is your threat model?

I hope this helps.

Alexander