throwawayjava · on March 26, 2020

I've encountered exactly two people in my life who were perfectly healthy, both mentally and physically, but just sat around all day doing nothing for years on end.

Both were landlords.

throwawayjava · on March 26, 2020

Are you aware of a quantitative model relating economic losses to loss of life / quality-of-life? I see this claim repeated over and over, but after hours of searching I can't find any quantitative models. If such a class of models did exist, we could weigh it against various SIR models and make an actual assessment. Without that economic model, the decision calculus you're suggesting we make isn't really possible.

Also: even that conversation seems like a huge false choice. Why not just give up on market capitalism for a few quarters and make sure everyone is fed/clothed/sheltered? It's not like houses disappear overnight if rent/mortgages aren't paid. And it's not like we've never suspended capitalism in the past -- the federal government put in price controls, effectively nationalized industries/supply chains, etc. during WWII. And then we returned to capitalism afterward.

> But nah, the "market" apparently entirely consists of assholes on Wall Street playing Capitalism II and lobbying governments for payouts. No way it has any real impact!

TBF, those are the folks getting most of the economic relief so far. And not just from recent Congressional action. Somehow when it comes to ensuring liquidity in certain financial markets, the federal government can move mountains in minutes. But when it comes to people having a safe place to sleep and food to eat we're at the mercy of the invisible hand.

Why can't e.g. HUD, unilaterally and without congressional action, create "liquidity in the housing market" by making enormous zero-interest loans to everyday folk whenever they feel like that's necessary?

People who trade in equities and bonds have enormously powerful economic backstops, which people who merely pay rents and mortgages and buy food do not have.

throwawayjava · on March 9, 2020

Based on issue polling, we know that a majority of people who are opposed to the individual mandate also strongly support a ban on pre-existing conditions clauses. And also those same people oppose any form of universal healthcare.

But abstaining from health insurance right up until the point where you need it is the same as having other people pay for your healthcare. A ban on pre-existing conditions without an individual mandate is a form of welfare. So in actual practice many people who are nominally opposed to universally subsidized HC are in fact not opposed to universal subsidization of healthcare... as long as there's some nominal involvement with the insurance industry prior to use.

The broader point: many people don't realize that just because you're paying into something doesn't necessarily mean it isn't welfare. In a democratic society with a lot of individualism and distrust of government, sometimes this "hack" is the best way to deliver a welfare state. See also: social security.

throwawayjava · on March 8, 2020

> If not, but the environment is okey, I keep going.

If you're a strong developer and willing to live in certain areas -- at least for a couple years before you're trusted enough to go remote -- you can make $200K/yr. It's mostly just a matter of telling yourself you are worth that much, applying to companies that pay that much, and not being afraid to be on the job market every few years.

throwawayjava · on March 8, 2020

Consumer data brokers are a legal (non-criminal) business and they would definitely reidentify then sell information if it weren't illegal.

throwawayjava · on March 8, 2020

> passes up 50K

If the author had another job lined up already, the severance package wasn't worth much of anything -- it only paid out until he had a new position. So, probably closer to passing up $0 than to passing up $50K.

Still a bit of an annoying personality. If you know you won't get any $ from the severance package, just send a "thanks but no thanks" message to the person off-boarding you. Don't harass some poor corporate lawyer 2 years out of law school with inane demands for preferential terms on the severance contract for an individual contributor.

SpicyLemonZest · on March 8, 2020

I think you're giving the system too much deference here. If an individual contributor is important enough that the company wants to give them a gag order, they're important enough to ask for preferential terms about it.

throwawayjava · on March 8, 2020

There are some mathematical definitions [1], but the fundamental problem is that with enough cross-referencing between databases it's hard to say anything for sure [2]. You never know what data other people might publish in the future.

I'm not aware of any legal definitions, but given the thorniness of reidentification I would assume they're insufficient.

[1] https://en.wikipedia.org/wiki/K-anonymity

[2] https://www.wired.com/2007/12/why-anonymous-data-sometimes-i...
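
To make the k-anonymity definition in [1] a bit more concrete, here is a minimal Python sketch. The records and the choice of quasi-identifier columns are made up for illustration. A dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k rows, so the function just reports the size of the smallest such group.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the largest k for which `rows` is k-anonymous.

    This is the size of the smallest group of rows that share the same
    values for every quasi-identifier column (0 for an empty dataset).
    """
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


# Hypothetical, already-generalized records (not real data).
records = [
    {"zip": "021**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "021**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "021**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "021**", "age": "30-39", "diagnosis": "diabetes"},
    {"zip": "021**", "age": "30-39", "diagnosis": "flu"},
]

k = k_anonymity(records, ["zip", "age"])
```

Treating one more column as a quasi-identifier (say, because someone else published a dataset that includes it) can only split groups further, never merge them, which is the cross-referencing problem in a nutshell.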

throwawayjava · on Feb 27, 2020

> (hack cough cough hack)... there is no fundamental solution presenting yet.

Because people think those hacks are fundamental solutions (see: this blog title).

But really, the fundamental solution is finally at long last treating programming as a form of engineering.

> I know the expected answer will be: it's an abstraction of a more complex problem of understanding data and how it is used... Why do the frameworks not eliminate them by construction?

Because in any non-trivial system there are always edge cases, and attackers will find the edge cases. This is why XSS persists even as template engines have taken over. "filter output" is not a panacea. Nothing can replace carefully thinking about the entire range of possible inputs and their related outputs.

But instead of educating programmers to think carefully about how to specify and design robust systems, the software industry repeats gang-of-four-style mantras like "escape output". Even while admitting those solutions don't work universally and offering "get security review" as some sort of universal fix.

megous · on Feb 27, 2020

It's interesting that single page apps actually have a benefit here. If you generate DOM with code, you can just assign anything you like to el.textContent and you'll not need to muck around with sanitization libraries and edge cases.

Basically the same principle as using parameterized SQL queries.
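
The same idea can be sketched outside the browser. This Python example uses the standard library's xml.etree.ElementTree as a stand-in for the DOM (an assumption for illustration only; el.textContent itself is a browser API). Text assigned to a node is treated as data, and the serializer escapes it when producing markup.

```python
import xml.etree.ElementTree as ET

untrusted = "<script>alert(1)</script> & friends"

# Build the tree with code instead of concatenating markup strings.
p = ET.Element("p")
p.text = untrusted  # plays the role of el.textContent in this sketch

# Serialization escapes the special characters in the text node.
markup = ET.tostring(p, encoding="unicode")

# Parsing the markup again gives back the original text unchanged.
round_tripped = ET.fromstring(markup).text
```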

throwawayjava · on Feb 27, 2020

...so the blog post boils down to "sanitize all inputs that don't get piped to /dev/null; also, there are some good libraries that will do that for you" (...by escaping outputs... but oh btw those only work sometimes of course, and in other cases, be careful?).

In other words, for the love of god please do sanitize your inputs.

xenomachina · on Feb 27, 2020

No.

"Sanitize inputs" means modifying the input before you even know where it's going. It's fine for stuff like normalizing user input (eg: "strip leading and trailing spaces") but should not be used to combat things like SQL injection or XSS.

For issues like SQL injection and XSS you should escape on output. Outputting HTML? HTML escape, or better yet: use a templating framework that does it by default. Outputting to SQL? SQL escape, or better yet use prepared statements and pass in your arguments using an API that escapes by default.

In the "sanitize inputs" approach to handling these situations you can't store "O'Hara <3 Sue" as a value, because you need to "sanitize" the apostrophe for SQL and the less-than for HTML. In the "escape outputs" approach, you have "O''Hara <3 Sue" in your SQL, and "O'Hara &lt;3 Sue" in your HTML, and the user's input is preserved.
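
A minimal Python sketch of the "escape outputs" approach, using the standard library's sqlite3 and html modules. The table name and the in-memory database are assumptions for illustration. The value goes to SQL through a placeholder and is HTML-escaped only at the point where it is written into markup.

```python
import html
import sqlite3

name = "O'Hara <3 Sue"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")  # hypothetical table

# The ? placeholder passes the value as data, so no manual quoting is needed.
conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
stored = conn.execute("SELECT name FROM users").fetchone()[0]

# Escape only when the value is written into an HTML text context.
# quote=False leaves quote characters alone, which is fine for text
# content but not for attribute values.
rendered = "<p>" + html.escape(stored, quote=False) + "</p>"
```

The database ends up holding exactly what the user typed, and each sink gets its own encoding.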

throwawayjava · on Feb 27, 2020

> "Sanitize inputs" means modifying the input before you even know where it's going.

Okay.

That's not how I've ever used that term or seen it used. Prepared statements are a form of input sanitation. HTML purifiers are a form of input sanitation. Maybe this lingo is specific to PHP-land?

In any case, "You need to know the semantics of the sink in order to know what to do with an untrusted source" seems like an obvious truism not worth writing about.

onion2k · on Feb 27, 2020

> "You need to know the semantics of the sink in order to know what to do with an untrusted source" seems like an obvious truism not worth writing about.

Given how often developers get it wrong, I don't think it's written about enough.

Also, you say "untrusted source" here. Whether you trust the source or not is irrelevant. You should still be escaping the output where you use data from it in order to make sure your outputs are safe - the source could be compromised, or broken, or sending something valid that you didn't expect. Maybe this isn't quite so obvious after all.

megous · on Feb 27, 2020

You've probably not been around in the jolly days of PHP automatically adding quotes to all $_GET parameters and stuff like that, before it was even known where the data will be passed to, lol. Be glad.

xenomachina · on Feb 28, 2020

> That's not how I've ever used that term or seen it used.

That's the terminology being used by the document under discussion.

Honestly, I think what causes a lot of people to get it wrong, is that they don't understand the distinction between input filtering and output escaping. They see them as the same thing, and so they use them interchangeably.

> Prepared statements are a form of input sanitation.

No. Input sanitization involves removing "bad" stuff from the input. For example, you remove the "'" in "O'Hara" so that it doesn't mess up your SQL, but you end up storing "OHara" in the DB.

Output escaping (which prepared statements fall under) removes nothing. Instead, characters that happen to be special are escaped so that they are treated as literal characters, and not as special characters. The DB gets the user's original input: "O'Hara"

> HTML purifiers are a form of input sanitation.

I assume you mean HTML sanitization (https://en.wikipedia.org/wiki/HTML_sanitization). In which case, usually, yes. Note that there's a difference here because you're removing part of the input, not doing a lossless transformation as with escaping.

Another way to think about the difference is whether you're doing type conversion or not. When escaping for SQL, you're converting from text/plain to SQL. When escaping for embedding in HTML, you're converting from text/plain to text/html.

When you do input sanitization instead, you aren't changing the type, you're just making certain values impossible. For HTML sanitization, this means turning stuff like "<em>safe</em> <script>unsafe()</script>" into "<em>safe</em> ". Both are text/html, but the latter has been "sanitized".

In this case, input sanitization makes sense, as long as you have a universal concept of what "safe" means, and as long as your input was actually HTML.

The place where people mess up is in thinking that they need to "sanitize their inputs" in anticipation of something downstream using that same string as a different type. In the HTML example, this would be taking a text/plain string, like "I <3 HTML" and stripping out "bad" characters to turn it into "I 3 HTML".
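
A small Python sketch of that difference, using only the standard library. The character class being stripped is an arbitrary choice for illustration. Stripping throws information away, while escaping is a reversible conversion from text/plain to text/html.

```python
import html
import re

text = "I <3 HTML"

# "Sanitizing" a plain-text value by stripping characters: lossy.
stripped = re.sub(r"[<>&]", "", text)

# Escaping for the HTML sink: a type conversion that loses nothing.
escaped = html.escape(text)

# The original text can be recovered from the escaped form.
recovered = html.unescape(escaped)
```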

> Maybe this lingo is specific to PHP-land?

I've never used PHP, so I wouldn't know.

> In any case, "You need to know the semantics of the sink in order to know what to do with an untrusted source" seems like an obvious truism not worth writing about.

In practice, that doesn't seem to be the case. Almost every time someone says "sanitize your inputs" in response to an XSS or SQL injection exploit, they're getting it wrong.

yuliyp · on Feb 27, 2020

What does "sanitize inputs" even mean? What do you do with a backslash? What do you do with a "? What do you do with weird unicode? What properties does your "sanitized" input actually have?

The meaning of "sane" depends on where you're sending it to. A backslash is a perfectly reasonable character, for instance. Put it in the wrong place in a SQL string and you have bad news. Put a ' in the wrong place in a shell command, sometimes nothing bad happens, other times you get pwned.

The right way to escape strange characters is different if you're sending it to an SQL engine, or writing it into a JSON string, or into some HTML, etc.
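
As a rough illustration in Python, here is one string encoded for three different sinks with standard-library functions. The sample value is made up. Each encoder touches different characters, and each result decodes back to the same original in its own domain.

```python
import html
import json
import shlex

value = "a\\b \"c\" 'd' <e>"  # contains a backslash, both quote types, and a tag

for_json = json.dumps(value)    # JSON string literal
for_html = html.escape(value)   # HTML text or attribute value
for_shell = shlex.quote(value)  # single POSIX shell argument

# Each sink-specific form decodes back to the original value.
from_json = json.loads(for_json)
from_html = html.unescape(for_html)
from_shell = shlex.split(for_shell)
```

No single "sanitized" form of the input would be correct for all three at once.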

throwawayjava · on Feb 27, 2020

This feels like a distinction without a difference.

Escaping outputs is just one way of sanitizing inputs. Sometimes it works. Sometimes it doesn't. The author of this post even realizes that their prognostication is not general and then offers the advice to "be sure to get security review"...

At the end of the day, you need to make sure that any untrusted source is treated in a safe way by every sink and does not otherwise interfere with system specs (e.g., mangling user output). Whether that happens at line 5 (where the input is read) or line 155 (where the command is generated) doesn't really matter. Or to be more precise, is determined by whatever design patterns the framework developer chose.

What matters at the end of the day is that command injection isn't possible and the system's specs (including UI/UX specs) are respected.

Crucially, both input and output constraints are informed by the nature of both the source and the sink. Hence the existence of libraries like DomPurify and HTMLPurifier, which consider one very particular type of sink. Sometimes you will write code in domains where others haven't written excellent libraries but where sanitization (of either input or output) is needed. E.g., embedded systems.

I'd replace the author's advice with "carefully specify the semantics of your sources and sinks", which is ultimately what the author's actual advice (basically, "use trusted libraries and, when not, be sure to get security review") boils down to.

tptacek · on Feb 27, 2020

Not really, no. Output filtering is done in the context of a specific output domain. Input sanitization isn't; the developer who builds sanitization has to guess at all the possible output domains.

"Filter outputs not inputs" is a very old appsec truism.

eandre · on Feb 27, 2020

I think the confusion comes from the fact that not everybody thinks of "passing data to the database layer" as an output, but only as an input to the next layer. If you think of this input as an output from the previous layer, then your advice makes perfect sense. But I don't think everyone thinks that way so it might help to clarify what "output" means in this context.

jessaustin · on Feb 27, 2020

No, that's clearly input. That data shouldn't be sanitized, but it should be passed to the database via parameterized query.

throwawayjava · on Feb 27, 2020

Output filtering is input sanitization. wtf is it that you think you are filtering? Inputs!

> the developer who builds sanitization has to guess at all the possible output domains.

No they don't. They need to carefully understand/document all the places input might be used and ensure no command injections are possible. In some cases (e.g., web apps, where everything is a string) that works relatively well...

Until, of course, you're the one writing the input sanitization logic in the HTML purifier / prepared statements generator. And those code bases do have occasional CVEs. So, random PHP dev can put faith in a library but the system itself never gets away from having to sanitize input!

Output filtering has the complementary problem -- you need to understand every possible input. That's not always trivial like it is in PHP-based websites. Think about e.g. an embedded system sanitizing potentially adversarial time series data (what does this mean / how do you detect it? Harder, right?). Or a compiler. The blog post author even points this out: "...In these cases you’re best off using a proper SQL parser (like this one) to ensure it’s a well-formed SELECT query – but doing this correctly is not trivial, so be sure to get security review."

Ultimately, "Filter outputs not inputs" is incomplete advice that kinda sorta works well for the most part in web apps. The correct advice is, again, "carefully specify the semantics of your sources and sinks".

zAy0LfpBZLC8mAC · on Feb 27, 2020

> Output filtering has the complementary problem -- you need to understand every possible input.

No, you simply need to understand the encoding rules of the sink. Which is precisely why "sanitizing input" is plain nonsense: Whether a particular unescaped character has some meta character function is not a property of the character, but of the output language, so you can not possibly "sanitize input" in any meaningful sense, unless you mean by that "randomly garble the input".

wglb · on Feb 27, 2020

>you need to understand every possible input.

This is often not possible.

When I talk to developers about this, I use database storage as an example. There may be computations behind the scenes that mangle the nicely input-sanitized database contents. Concatenation with other values, string work, data from some other system. Thus, data that was sanitized upon input is now questionable for output.

This is well-intentioned, but leads to a false sense of security, and sometimes mangles perfectly good input.

And in some applications, for example, ones that must process data in a forensic environment, any change to the input is prohibited.

Thus, the only useful way to think about this is that the contents of the database are toxic and must be sanitized on output. Simply working with the input gives the programmer no useful idea about what is in the database when it comes time to output it.

Frameworks these days help significantly with providing tools to properly parameterize SQL. However, it is unlikely that they handle all the cases. Consider an example where user input from a web page is used to build a column name or table name. This isn't covered by frameworks. That needs to be carefully processed in the code.
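
One common way to handle the column-name case is an allowlist. Here is a minimal Python sketch with sqlite3. The schema, the function name, and the allowed set are all hypothetical. Placeholders can bind values but not identifiers, so the requested column is checked against a fixed set before it is interpolated into the query text.

```python
import sqlite3

ALLOWED_SORT_COLUMNS = {"name", "created_at"}  # hypothetical schema


def list_users(conn, sort_by):
    # Identifiers cannot be bound with ? placeholders, so only accept
    # column names that appear in a fixed allowlist.
    if sort_by not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_by!r}")
    query = f"SELECT name FROM users ORDER BY {sort_by}"
    return [row[0] for row in conn.execute(query)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",  # values still go through placeholders
    [("Sue", "2020-02-27"), ("O'Hara", "2020-02-28")],
)

by_name = list_users(conn, "name")
by_date = list_users(conn, "created_at")
```

Anything outside the allowlist, such as "name; DROP TABLE users", is rejected before any SQL is built.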

>Ultimately, "Filter outputs not inputs" is incomplete advice that kinda sorta works well for the most part in web apps. The correct advice is, again, "carefully specify the semantics of your sources and sinks".

It is in fact the primary advice that should be followed.

So sanitization of input is a good idea, but if output is not properly encoded, somebody else is likely to profit.

throwawayjava · on Feb 28, 2020

Sorry, this still seems like a terribly hacky way to think about code.

Again, if you write a template engine or a SQL engine, the code the library's developer writes to determine how holes are safely filled is literally sanitizing input! You never get away from sanitizing inputs, you just do it further from the source and closer to the sink.

> So sanitization of input is a good idea

Right. "Don’t try to sanitize input" is bad advice. Also, the whole point of escaping outputs is that you don't trust inputs. Escaping outputs is done to sanitize inputs.

If by "sanitize input" you mean "add some backslashes to $_GET values like it's 1995", well, I guess, point taken. But then, the actually good advice should be "step back and learn how to think more systematically about your code", not "escape outputs instead of inputs!"

0xff00ffee · on Feb 27, 2020

> I'd replace the author's advice with "carefully specify the semantics of your sources and sinks",

I think that's the abstraction, but the author is presenting it in a way that requires repeating frequently simply because new programmers arrive ready to do damage every day, and the two forms of input sanitization are a great intro into how the Real World (tm) conspires against you.