Message-ID: <CALwvWpFS8Z6XZjw1ax=HkKJvanSn-BTiRzHZhqaJM-Z8hoSjqg@mail.gmail.com>
Date: Sat, 27 Sep 2014 20:27:28 +0100
From: Steve Jones <trevd1234@...il.com>
To: oss-security@...ts.openwall.com
Subject: Re: Fwd: Non-upstream patches for bash

Hi there,

I've been meaning to post all day. After looking at the code with the intention of fixing it, I can say that in my opinion the parser is 70% of the problem, the core shell language grammar is 15%, and bash's bashisms, with their need to allow things like redirection at any position and other fun things, are the final 15%.

The shell language grammar is way too ambiguous. This is a result of allowing multiple tokens for one action, which are repurposed without an unambiguous termination of the previous statement. To put it another way, the parser can't count, and the grammar lacks consistency and, to me, seems to be missing an explicit fucking terminator. Even from a non-security viewpoint it is f**ked. This

  function t2(){ sl } }; }

should not be a valid statement block, and if it is I'm never writing another shell script again.

A little more on the parser: it's not robust at all and also lacks basic sanity checking. Its parsing strategy also does not seem suitable for the complexity and ambiguity of the grammar, and as a result it doesn't stand a chance. I'm certainly not a breaker, just a developer with a security-minded bent, but even I was able to force arbitrary memory in another process's address space from random fuzzing alone...

A couple of things to wrap up. For anyone rusty with that context-free grammar terminology: http://pages.cs.wisc.edu/~fischer/cs536.s08/course.hold/html/NOTES/3.CFG.html

Don't worry, though, as the documentation just defines its own terms:

  DEFINITIONS
  The following definitions are used throughout the rest of this document.
  blank  A space or tab.
  word   A sequence of characters considered as a single unit by the shell. Also known as a token.
  name   A word consisting only of alphanumeric characters and underscores, and beginning with an alphabetic character or an underscore. Also referred to as an identifier.
  metacharacter
         A character that, when unquoted, separates words. One of the following:
         |  & ; ( ) < > space tab
  control operator
         A token that performs a control function. It is one of the following symbols:
         || & && ; ;; ( ) | |& <newline>

I'm sure some of you folks have noticed that whitespace has amazing properties, and the difference between { list;} and { list} and { list; }. So here's a "don't touch this, something might break" one:

  { list; }
         list is simply executed in the current shell environment. list must be terminated with a newline or semicolon. This is known as a group command. The return status is the exit status of list. Note that unlike the metacharacters ( and ), { and } are reserved words and must occur where a reserved word is permitted to be recognized. Since they do not cause a word break, they must be separated from list by whitespace or another shell metacharacter.

This is ambiguous token reuse (there are many other examples about).

  A list is a sequence of one or more pipelines separated by one of the operators ;, &, &&, or ||, and optionally terminated by one of ;, &, or <newline>.

The use of the word "optionally" is incorrect and should be replaced with "must be".

Here's a grep to run in the bash source code directory. It shows all the areas where the developer was confused:

  grep -B4 -niR " xxx " --exclude-dir=doc .

variables.c is troubling:

  variables.c:4331:  stupidly_hack_special_variables (var->name); /* XXX */

and another choice one:

  braces.c:423:  QUIT; /* XXX - memory leak here */

Finally: bash is everywhere. It's not only used as the shell interpreter but also in the form of libbash, which needs to be checked to see whether it reuses the parser. It is both statically and dynamically linked. Executables and libraries are also not averse to calling bash via execv().
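The point about { list;} versus { list} can be made concrete with a toy tokenizer that applies only the two man-page rules quoted above: unquoted metacharacters split words, and { and } are reserved words recognized only where a reserved word is permitted. This is a rough Python sketch, not bash's parser: quoting, expansions, redirections and two-character operators such as && are ignored, and "at the start of a command" stands in for "where a reserved word is permitted". Under those assumptions, { echo hi} yields the single word hi}, so the group is never closed, while in the t2 body the two } after sl are plain arguments to sl and only the } after ; closes the block.

```python
# Toy model of two rules from the bash man page:
#   1. unquoted metacharacters separate words;
#   2. { and } are reserved words, recognized only where a reserved word is
#      permitted (simplified here to "at the start of a command").
# Quoting, expansions, redirections and two-character operators such as
# && and || are deliberately ignored. This is NOT how bash's parser works.

METACHARACTERS = set("|&;()<> \t\n")
OPERATOR_CHARS = set("|&;()<>\n")  # metacharacters that are tokens themselves


def split_words(src: str) -> list[str]:
    """Split src into words and single-character operators."""
    tokens: list[str] = []
    current = ""
    for ch in src:
        if ch in METACHARACTERS:
            if current:  # a metacharacter ends the word being built
                tokens.append(current)
                current = ""
            if ch in OPERATOR_CHARS:  # blanks are dropped, operators kept
                tokens.append(ch)
        else:
            current += ch  # { and } are NOT metacharacters, so they stick
    if current:
        tokens.append(current)
    return tokens


def classify(tokens: list[str]) -> list[tuple[str, str]]:
    """Label each token as 'operator', 'reserved' or plain 'word'."""
    labelled: list[tuple[str, str]] = []
    at_command_start = True
    for tok in tokens:
        if tok in OPERATOR_CHARS:
            labelled.append((tok, "operator"))
            at_command_start = True
        elif at_command_start and tok in ("{", "}"):
            labelled.append((tok, "reserved"))
            # Simplification: treat the next token as a command start again.
            at_command_start = True
        else:
            # Any other word (including a stray } after a command name)
            # is just an argument.
            labelled.append((tok, "word"))
            at_command_start = False
    return labelled


if __name__ == "__main__":
    for snippet in ("{ echo hi;}", "{ echo hi}", "function t2(){ sl } }; }"):
        print(repr(snippet), classify(split_words(snippet)))
```

In the t2 line, only the { after () and the final } are labelled reserved; the two } in between come out as ordinary words.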
I'd freely speculate that some bashes are just named sh. You could start with a scan of every shell script:

  find / -type f -iname "*.sh" -exec grep bash -lh {} \;

In summary, bash is screwed. Using an alternative can be as simple as installing one, or "impossible" due to the bashisms. It may be more productive and less disruptive to make use of a new parser from an existing project; GPLv3 adds legal compatibility with Apache 2.0. More on this later, perhaps?

Thanks for reading, folks
Trevd

Apologies if the lines are... I'm using a web client :(

On 27 September 2014 16:06, Solar Designer <solar@...nwall.com> wrote:
> On Sat, Sep 27, 2014 at 03:26:01PM +0200, Roman Drahtmueller wrote:
>> By way of exposing the parser to potentionally harmful content: Is the
>> importing of functions the only occasion, or are there more than this?
>
> That's a great question. This aspect is arguably more important than
> individual parsing bugs, in part because distros are already adopting
> Florian's prefix/suffix patch turning parser bugs on function imports
> into non-security issues.
>
> Has anyone started reviewing bash for possible other code paths where
> untrusted input may hit the parser?
>
> Of course, what input is trusted vs. not may be unclear. Apparently, 20
> years ago bash developers considered all env vars to be trusted input,
> regardless of the names, which is how we got here.
>
> Are bash scripts themselves exclusively trusted input, or should we
> assume that portions of them (which?) may be untrusted (e.g., for
> scripts generated by other programs, with some user input substituted
> into them)? Clearly, it makes no sense to treat scripts as untrusted in
> their entirety - the very purpose of bash is to do a wide variety of
> things based on script contents - but maybe some individual tokens, etc.
> within scripts may reasonably (and thus should?) be treated as untrusted
> (to the extent possible within bash script syntax specs).
>
> For example, what if a DHCP client sanitizes some input field and then
> embeds it in a generated script? That's risky design, yet bash could
> try to be robust when faced with scripts like that. Ideally, it should
> behave only as specified, with no extra "features" available e.g. via
> syntactically correct yet overly long tokens, etc.
>
> Perhaps this boils down to the parser's robustness in general: treating
> whatever we can (even within scripts) as untrusted input is the same as
> having the most robust parser. This is why I wrote "arguably" in the
> first paragraph above.
>
> Now, is it realistic to make bash's parser so robust by finding and
> patching individual bugs? I doubt it. We should find and patch the
> bugs, but perhaps we shouldn't declare bash's parser robust, and perhaps
> we shouldn't treat bash issues triggerable via untrusted script contents
> as security issues. Perhaps we should instead declare bash unsafe to
> use on scripts containing any untrusted input in them, and focus on
> treating inputs to such scripts (env vars and command line) safely.
>
> This also means that we should treat any programs that generate bash
> scripts with (sanitized) untrusted input in them as unsafe, and patch
> those to use safer mechanisms to pass (sanitized) inputs to scripts
> (preferably use env vars with fixed names).
>
> Comments?
>
> Alexander
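Alexander's last suggestion, passing (sanitized) inputs to a script through env vars with fixed names instead of splicing them into generated script text, might look roughly like this from the generating program's side. It's a minimal Python sketch; the variable name DHCP_HOSTNAME and the one-line script are hypothetical. The script text is a constant, so bash's parser never sees the untrusted value as code; the value is only expanded inside double quotes, which does not re-parse it. This relies on bash not treating env var values as code, which holds only with the post-Shellshock function-import fixes (such as Florian's prefix/suffix patch); on an unpatched bash, a value beginning with "() {" was still parsed as a function definition.

```python
import shutil
import subprocess

# Resolve bash with the parent's PATH; the child gets a minimal environment.
BASH = shutil.which("bash") or "/bin/bash"

# Fixed script text: untrusted data never becomes part of what bash parses.
# It is read from an env var with a fixed (hypothetical) name and only ever
# expanded inside double quotes.
SCRIPT = 'printf "%s\\n" "$DHCP_HOSTNAME"'


def run_with_untrusted(value: str) -> str:
    """Run SCRIPT under bash, passing value only through the environment."""
    env = {"PATH": "/usr/bin:/bin", "DHCP_HOSTNAME": value}
    result = subprocess.run(
        [BASH, "-c", SCRIPT],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# The risky design from the thread would instead build the script text from
# the value, e.g. f"hostname={value}", so that bash parses the value as code.

if __name__ == "__main__":
    print(run_with_untrusted("x; echo injected $(echo sub)"), end="")
```

Whatever the value contains (;, $(...), backquotes), it comes back as literal text, because it only ever reaches bash as data.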