"The Precautionary Principle" revisited

Addendum posted 30 August 2015.

Nassim Nicholas Taleb has posted a response to my blog post on “The Precautionary Principle” from last week, arguing that I made “severe errors.” In response to an earlier version of the criticism posted on Twitter, I commented that I had misunderstood something that wasn’t expressed precisely. Taleb responded by accusing me of violating a “Principle of scientific integrity.” After a more careful reading, I decided that my reading of Taleb’s work was fine. However, I do think there were a few things that I could have expressed more clearly. I’ll try to clarify those points, as well as respond to Taleb’s criticisms of the piece.

Before I get there, I do have a correction to make, though I consider it to be relatively minor. In my critique of “The Precautionary Principle” I referred to an analogy with a coffee cup receiving damage from earthquakes of various intensities. I claimed that Taleb et al. measured the intensity of earthquakes by their magnitude. However, when I reviewed “The Precautionary Principle,” I realized that the paper made no mention of how the intensity of an earthquake should be measured. In fact, it was a different paper, cited by “The Precautionary Principle” and written by Taleb and Raphael Douady (a co-author on “The Precautionary Principle”), that made a similar analogy and used magnitude as the measure of intensity. Therefore, I think it’s reasonable to believe that this was the intention in “The Precautionary Principle.” I’m adding a footnote to that post to reflect that the mention of magnitude was in this other paper.

With regard to the earthquake analogy, the substance of Taleb’s criticism is that I should have compared the harm done by earthquakes when “intensity” is measured by magnitude with the harm done when “intensity” is measured by energy release. It doesn’t take much math to see that the aggregate harm done by any set of earthquakes doesn’t depend on how we’re measuring the intensity of those earthquakes.

The point I was making was specifically about how using energy instead of magnitude would change the relationship between “intensity” and harm. Specifically, in “The Precautionary Principle,” Taleb writes, “if the coffee cup were linearly sensitive to earthquakes and accumulated their effects as small deteriorations of its form, it would not persist even for a short time as it would have been broken down due to the accumulated impact of small vibrations.” This argument presupposes that the coffee cup – even in “a short time” – experiences enough small earthquakes that their aggregate “intensity” would rival that of a quake big enough to destroy the coffee cup. And that is clearly true if “intensity” means magnitude, but unlikely to be true if “intensity” means energy.
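A rough back-of-the-envelope calculation illustrates the difference between the two measures. The sketch below rests on two standard seismological approximations that do not appear in the original post and should be treated as assumptions: the Gutenberg-Richter relation with b = 1 (each drop of one magnitude unit makes quakes about ten times more frequent) and radiated energy proportional to 10^(1.5 × magnitude). The function names are hypothetical.

```python
# Rough comparison of aggregate small-quake "intensity" with one big quake.
# Assumptions (not from the post): Gutenberg-Richter scaling with b = 1,
# and radiated energy proportional to 10 ** (1.5 * magnitude).

def quakes_per_big_quake(small_mag, big_mag, b=1.0):
    """Approximate number of quakes near small_mag for each quake near big_mag."""
    return 10 ** (b * (big_mag - small_mag))

def relative_energy(mag):
    """Radiated energy up to a constant factor."""
    return 10 ** (1.5 * mag)

SMALL_MAG, BIG_MAG = 1.0, 6.0
count = quakes_per_big_quake(SMALL_MAG, BIG_MAG)

# Aggregate "intensity" of the small quakes divided by that of one big quake.
magnitude_ratio = count * SMALL_MAG / BIG_MAG
energy_ratio = count * relative_energy(SMALL_MAG) / relative_energy(BIG_MAG)

print(f"small quakes per big quake: {count:,.0f}")
print(f"summed magnitude / big-quake magnitude: {magnitude_ratio:,.0f}")
print(f"summed energy / big-quake energy: {energy_ratio:.4f}")
```

Under these assumptions, the summed magnitude of the magnitude 1 quakes is thousands of times the magnitude of a single magnitude 6 quake, while their summed energy is well under one percent of its energy.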

In his response to me, Taleb writes “In practical terms, the coffee cup will more likely break upon a magnitude 6 event (or beyond) or, equivalently, an event of energy bigger than 6³². And this will happen whatever the ratio between the frequency of earthquakes of magnitude 1 and those of magnitude ≥ 6.” And I think he’s probably right about this! It’s hard to imagine a coffee cup being destroyed by small earthquakes, even if there were an enormous number of them. That’s why I didn’t claim that the coffee cup isn’t fragile if we measure the intensity of a quake by the energy released.

Instead, what I said was this: “Taleb’s argument for fragility falls apart if we’re measuring earthquakes by the amount of energy released instead of the magnitude.” In other words, this particular argument doesn’t work. Indeed, in claiming that the ratio of the frequency of small quakes to that of large quakes doesn’t matter, Taleb is straying from the argument in “The Precautionary Principle,” which specifically depends on small quakes being much more frequent than large ones. Perhaps another line of reasoning would work here, but we should be careful about generalizing this particular argument to “everything on planet earth.”

In many contexts, there isn’t a single obvious way to measure stressors. With my discussion of the earthquake example, I hoped to show that we can’t just choose any way of measuring stressors and expect it to work in the fragility argument. To make that argument rigorous would require a discussion of the “right” way to measure the intensity of any kind of stressor. It would also require some insight into why the claims about “the statistical structure of stressors” necessarily hold when we measure intensity in this way.

In the next few sentences, I did go a bit further. I wrote,

The property that Taleb calls fragility is not, as Taleb puts it, a simple consequence of “the statistical structure of stressors.” Instead, it is inextricably dependent on the way in which we choose to measure stressors. Put another way, the definition of fragility in terms of “nonlinear response” isn’t really precise enough for use in mathematical arguments until we’ve decided how we’re measuring the relevant quantities.

These statements don’t quite directly follow from my discussion of the coffee cup argument, but I’ll stand by them. If I see an argument for a “right” way to measure stressors, which is compatible with the claims about “the statistical structure of stressors,” I’ll be happy to reconsider this stance.

Taleb also states that I erred in my criticism of his explanation of the subexponential class of distributions. He writes,

with time the sum gets larger, but with time the maximum is ALSO likely to be bigger. In other words, the distribution of the ratio is time invariant; what time does is reveal statistical properties.

It’s true that the maximum (of a family of subexponential random variables) should get bigger as we get more of them (i.e. “with time”). Beyond that, I don’t agree with his assessment. Unfortunately, my response will be more technical than my usual style. Perhaps someday I’ll explain it to a larger audience, but for now I just want to respond.

If we have a sequence of independent identically distributed random variables \(X_1\), \(X_2\),…, then their distribution is subexponential if \[ \lim_{r\to\infty}\frac{P(M_n>r)}{P(S_n>r)}=1, \] where \(M_n\) and \(S_n\) are the maximum and sum, respectively, of \(X_1,\ldots, X_n\).

This is a statement about the ratio of certain probabilities, one involving the maximum and the other involving the sum. That doesn’t immediately tell us much about the probability distribution of the ratio of the maximum to the sum. Nor was I able to find any published results that would support claims such as those that I criticized.

Of course, my not being able to find a result doesn’t mean it doesn’t exist, so I’ll give an example. Fix \(\alpha>1\) and consider a sequence of independent identically distributed random variables \(X_1\), \(X_2\),… having cumulative distribution function \[ F(x)= \begin{cases} 1-(x+1)^{-\alpha},&\text{if }x\ge1,\\ 0,&\text{otherwise}. \end{cases} \] This agrees with a Pareto distribution for \(x>1\), and is therefore a subexponential distribution. It is a subexponential distribution for any \(\alpha>0\), but for our purposes we consider the case \(\alpha>1\). Note, however, that this distribution guarantees that \(X_i\ge 1\) for all \(i\).

Define \(S_n=\sum_{i=1}^n X_i\) and \(M_n=\max\{X_i: 1\le i\le n\}\). By definition, \(X_i\ge1\) for all \(i\), which means that \(S_n\ge n\) for all \(n\). Now fix \(\beta \in (0,1]\) and consider \(P(M_n\le\beta n)\). The maximum \(M_n\) is less than or equal to \(\beta n\) if and only if \(X_i\le \beta n\) for \(1\le i\le n\), so \[ P(M_n\le\beta n)=(F(\beta n))^n= \begin{cases} \left(1-(\beta n+1)^{-\alpha}\right)^n,&\text{if }\beta n\ge 1,\\ 0,&\text{otherwise}. \end{cases} \] Under the assumption that \(\alpha>1\), it can be shown using elementary calculus (or symbolic computing software like Mathematica) that \(\lim_{n\to\infty}(1-(\beta n+1)^{-\alpha})^n=1\) so that \(\lim_{n\to\infty}P(M_n\le \beta n)=1\) also. Because \(S_n\ge n\), we have \(\frac{M_n}{S_n}\le\frac{M_n}{n}\), so the event \(\frac{M_n}{n}\le\beta\) implies the event \(\frac{M_n}{S_n}\le\beta\). Therefore, \[ \liminf_{n\to\infty} P\left(\frac{M_n}{S_n}\le \beta\right)\ge \lim_{n\to\infty} P\left(\frac{M_n}{n}\le \beta\right)=1, \] and since probabilities cannot exceed 1, \(\lim_{n\to\infty} P\left(\frac{M_n}{S_n}\le \beta\right)=1\). This shows that the ratio \(\frac{M_n}{S_n}\) converges to 0 in distribution.
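For readers who would like to check the limit \(\lim_{n\to\infty}(1-(\beta n+1)^{-\alpha})^n=1\) by hand, one possible route (a sketch, not the only argument) uses Bernoulli’s inequality, \((1-x)^n\ge 1-nx\) for \(0\le x\le 1\). Taking \(x=(\beta n+1)^{-\alpha}\), which lies in \((0,1)\), gives \[ 1\;\ge\;\left(1-(\beta n+1)^{-\alpha}\right)^n\;\ge\;1-n(\beta n+1)^{-\alpha}\;\ge\;1-n(\beta n)^{-\alpha}\;=\;1-\beta^{-\alpha}n^{1-\alpha}. \] Since \(\alpha>1\), the term \(\beta^{-\alpha}n^{1-\alpha}\) tends to 0 as \(n\to\infty\), so the expression in the middle is squeezed to 1.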

In simpler terms, for this choice of (subexponential) distribution, as we consider more and more terms, the probability of the ratio exceeding any particular positive threshold approaches 0. Given this result, it is unclear in what sense it can be said that “the sum…has the same magnitude as the largest sample.” One can also find values of \(\alpha\) which make it very unlikely that a single term ever dominates the sum of all of the previous terms. However, if Taleb (or coauthors) can produce precise statements with proofs (or references) which support the claims in their paper, then I’ll be happy to withdraw my criticism.
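The behavior described above can also be explored numerically. The following is a minimal Monte Carlo sketch, not a proof: it draws samples from the distribution defined above by inverse-transform sampling and estimates the probability that \(\frac{M_n}{S_n}\) exceeds a threshold. The choices \(\alpha=3\), threshold 0.1, the sample sizes, the seed, and all function names are illustrative assumptions.

```python
import random

def sample_x(alpha, rng):
    """Draw X with CDF F(x) = 1 - (x + 1)**(-alpha) for x >= 1 and F(x) = 0 for x < 1."""
    u = rng.random()  # uniform on [0, 1), so 1 - u lies in (0, 1]
    # Inverse-transform sampling; max() places the atom of mass 1 - 2**(-alpha) at x = 1.
    return max(1.0, (1.0 - u) ** (-1.0 / alpha) - 1.0)

def prob_ratio_exceeds(n, threshold, alpha, trials, rng):
    """Estimate P(M_n / S_n > threshold) from `trials` independent samples of size n."""
    hits = 0
    for _ in range(trials):
        xs = [sample_x(alpha, rng) for _ in range(n)]
        if max(xs) / sum(xs) > threshold:
            hits += 1
    return hits / trials

def run_simulation(alpha=3.0, threshold=0.1, sizes=(10, 100, 1000), trials=500, seed=0):
    """Return {n: estimated P(M_n / S_n > threshold)} for each sample size n."""
    rng = random.Random(seed)
    return {n: prob_ratio_exceeds(n, threshold, alpha, trials, rng) for n in sizes}

if __name__ == "__main__":
    for n, p in run_simulation().items():
        print(f"n = {n:5d}   estimated P(M_n / S_n > 0.1) = {p:.3f}")
```

If the calculation above is right, the estimated probability should shrink toward 0 as \(n\) grows.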

Addendum 30 August 2015

Nassim Nicholas Taleb has updated his response to include a response to this post. In it, he claims that I “failed to see at the end a section ‘APPENDIX C MATHEMATICAL DERIVATIONS OF FRAGILITY’, where [my] questions are (or could be) answered (and can be expected to be answered since that is the title of the section).” He points out that while that appendix, in the ArXiv version of the paper (which I critiqued), was relatively empty, “the text has been on the web for a year in the version on my site.” He also says, “A scholar in good faith waits to see what is in the referenced text (or asks for it) before starting his character assassination.”

The suggestion that I should have looked at the version on his site is perplexing. After all, Taleb has cited the ArXiv version in labeling William Saletan a fraud on Twitter. But apparently one must seek out a different version from his website if one wants to criticize the work. (For what it’s worth, I’m perfectly happy to critique scholarly works politely if the authors comport themselves in a scholarly manner. That ship sailed many months ago in this case.)

Taleb also includes in his response a quote from the appendix in the version of “The Precautionary Principle” on his website. This text, unfortunately, does not meaningfully answer the questions I raised about fragility. Indeed, it starts on the assumption that we have “a single dimensional measure of the structural integrity of the system, s.” It measures harm based on deviations in that quantity, without giving any idea of how it should be defined. In other words, Taleb’s answer to the question, “How much harm is done?” is “However much the structural integrity decreases.” Defining that “single dimensional measure of the structural integrity” is the hard part of measuring harm, so this doesn’t really answer anything at all.

The assumption of a single-dimensional measure is particularly dubious. In the real world, a system may suffer from many different kinds of harm, e.g. nature may face damage from pollutants, fire damage, erosion, or the hunting of a species to extinction. To model the “structural integrity” of nature with a single number, we’d have to be able to compare all of these. How much forest must be burned by fire to constitute as much harm as the extinction of a given species? Such questions may not even have meaningful answers, and yet, without such answers we can’t even begin to apply Taleb’s results on fragility.

And, incidentally, the appendix in the ArXiv version of the paper wasn’t completely empty. It referred the reader to two other documents – a textbook and a paper – for “expositions and proofs.” In fact, I downloaded both of these documents and looked for the answers to my questions well before I published my critique of “The Precautionary Principle.” (My paper management software shows annotations on one document going back to July 1, more than three weeks before I published the post.)

I didn’t find answers to my questions about the fragility argument in either one. After I read Taleb’s latest response, I looked again and still didn’t find anything. Both documents take measurement of stressors as a given. I found nothing to explain why, for instance, magnitude is a more suitable measure of an earthquake’s intensity than energy release. Nor did the various examples of intensity of a stressor – height of a fall, size of a rock pelted at someone, magnitude of an earthquake – suggest an approach that would generalize easily.

Taleb also dismisses my criticism of his discussion of subexponential distributions. He argues that the context made clear that the claim “the sum…has the same magnitude as the largest sample” should have been interpreted conditionally. Namely, I should have read it as saying that “if the sum is sufficiently big, then the sum has the same magnitude as the largest sample.” As a relevant bit of context, he points to the statement that the subexponential class was intended to distinguish “cases where tail events dominate impacts.”

I don’t think that context helps his case. There are a few ways to interpret that bit, and none is a particularly good fit for the subexponential class. One could read it as saying that tail events dominate impacts “often” or “eventually”, but neither of these necessarily holds for subexponential distributions (more on that below). One could also read it as saying that tail events dominate large impacts, the idea being that smaller sums aren’t significant enough to merit the term “impact.” This does start to sound like the definition of the subexponential class.

Unfortunately, things start to fall apart if we try to answer the question of “how large?” For practical decision-making, if we’re interested in the idea of tail events dominating large impacts, we’re probably interested in impacts that do a certain amount of damage which is of real-world importance. However, knowing that a distribution belongs to the subexponential class only tells us that there’s some level beyond which total harm is likely to be dominated by the largest event. It does not tell us what that level is, and, indeed, the level depends on the choice of subexponential distribution and increases over time (as more terms are added to the sum). As time passes, these dominant events may become vanishingly unlikely. Under a subexponential distribution, it is entirely possible for accumulation of small events to have significant impacts.

Even so, I would have left the discussion of subexponential distributions alone if the body of the paper hadn’t depended on the premise that fat tails (which the appendix equates with subexponential distributions) imply that a dominant tail event is inevitable. In my original post, I pointed to the claim that under fat tails, “a single deviation will eventually dominate the sum of their effects.” Somehow, Taleb makes a leap from the idea that the largest event dominates if, at a particular time, the sum is “large enough” to the idea that the largest event eventually dominates (i.e. if enough time passes). To get to the latter claim, they’d need some information about how likely it is that the sum will be “large enough” at a given time.¹

In his first response to me, Taleb claimed that “the ratio [of the maximum to the sum] is time invariant.” If true, this would mean that the probability of a single term dominating the sum didn’t change over time (as more terms are added), which would indeed be sufficient to show that a single term would eventually dominate. Unfortunately, that claim need not be true for a subexponential distribution, and a direct counterexample is above.²

In his response, Taleb took issue with my choice of example. I had chosen for \(\alpha>1\) a cumulative distribution function \(F(x)=1-(x+1)^{-\alpha}\) for \(x\ge 1\) and \(F(x)=0\) for \(x<1\). He described as “mysterious” my decision to define \(F(x)\) to be \(0\) up until \(1\), when the function \(1-(x+1)^{-\alpha}\) could be defined for all \(x\ge 0\).

Mathematically, however, the function I chose to use is a perfectly good function. It just happens to be a different function from the one that Taleb says that I should have used. Both of these functions are cumulative distribution functions arising from subexponential distributions. If you want to make a claim about subexponential distributions in general, that claim should hold true for both of these functions. If it doesn’t, that means your claim is wrong, or at least needs to be qualified. If I want to disprove that claim, it’s enough to show that it fails for just one of those functions (or for any other subexponential distribution).

I made the choice I did because it made the calculations easier, and that’s a perfectly reasonable thing to do. Taleb wrote that with my choice “observations in the sum of between 0 and 1 are strangely and artificially excluded.” I won’t say he’s wrong on that; but I’d phrase it differently. I chose a distribution for which there were no values between 0 and 1. It’s true that this choice of distribution is an unusual choice, and I won’t claim to have a real-life phenomenon modeled by this distribution.

Regardless, it still gives counterexamples to two claims that Taleb has made about subexponential distributions, namely that “the ratio [of the maximum to the sum] is time invariant” and (narrowing the choice of \(\alpha\) further) that the maximum eventually dominates the sum. And just because one counterexample is “strange” or “artificial”, it isn’t necessarily the case that the statement holds for all examples which are not “strange” and “artificial”. In mathematics, even “strange” and “artificial” examples can carry useful insights. It would, of course, be nice to find these ideas in more practical examples, but sometimes the “strange” and “artificial” ones are easier to work with.

In this case, subexponentiality is what is called a tail property which means that it only depends on the probabilities of values that are “large enough.” However, the chance of a single event dominating the sum depends on the probabilities of the small values as well as the probabilities of the large ones. Since subexponentiality doesn’t tell us about the probabilities of small values, alone it’s not a strong enough condition to tell us that one term will eventually dominate.

Indeed, one can see similar results in numerical simulations with more conventional subexponential distributions, such as a Weibull distribution with appropriate shape parameter. Such simulations ought to cast doubt on the claim that “the ratio [of the maximum to the sum] is time invariant,” or the idea that a single event is likely to dominate the sum. However, numerical simulations do not constitute a proof, and a proof for this example would have been less straightforward. I chose the example that I did simply because it was a clear counterexample with a relatively simple proof.
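A minimal sketch of what such a simulation might look like follows. It assumes a Weibull distribution with shape parameter 0.5 and scale 1 (a Weibull distribution is subexponential when its shape parameter is below 1); those parameter values, the sample sizes, the seed, and the function names are illustrative choices, and the output is evidence rather than proof.

```python
import random

def mean_max_to_sum_ratio(n, shape, scale, trials, rng):
    """Average of M_n / S_n over `trials` independent Weibull samples of size n."""
    total = 0.0
    for _ in range(trials):
        # random.weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
        xs = [rng.weibullvariate(scale, shape) for _ in range(n)]
        total += max(xs) / sum(xs)
    return total / trials

def weibull_simulation(shape=0.5, scale=1.0, sizes=(10, 100, 1000), trials=200, seed=0):
    """Return {n: mean M_n / S_n} for each sample size n."""
    rng = random.Random(seed)
    return {n: mean_max_to_sum_ratio(n, shape, scale, trials, rng) for n in sizes}

if __name__ == "__main__":
    for n, ratio in weibull_simulation().items():
        print(f"n = {n:5d}   mean M_n / S_n = {ratio:.4f}")
```

If the ratio were time invariant, the averages for different \(n\) would be roughly equal. With these parameters the average share of the sum contributed by the largest term falls as \(n\) increases.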

Maybe it’s a little bit lazy of me to give an easy example. However, the point is that if Taleb hopes to be credited with a rigorous mathematical argument, then it’s his responsibility to specify the distributions for which his claims are true, and then to prove it for that class of distributions. In a rigorous mathematical argument, one doesn’t just make overly broad claims and then dismiss the examples that don’t work.

Taleb also claims that I make an error in considering only values of \(\alpha\) which are greater than 1 and notes that things work out differently for smaller values of the parameter. It’s true that I didn’t mention smaller values of \(\alpha\), but there’s nothing wrong with that. The point is that each positive value of \(\alpha\) corresponds to a different distribution in the subexponential class. A statement about subexponential distributions in general should be true for all values of \(\alpha\). If Taleb’s statements about subexponential distributions aren’t actually true for all subexponential distributions, perhaps we need only worry about some smaller class of probability distributions.

More importantly, though, I included the discussion of fat tails in my original critique because it pointed to the need for more precise argumentation throughout the paper. The fat tails discussion is one of the few places where any substantial mathematical formalism is included. Even there, the mathematical definition is relegated to an appendix, and the colloquialisms in the main argument don’t accurately represent the mathematics. This is not how rigorous mathematical argument works, and it does not inspire confidence in the parts of the argument that are given only on the colloquial level.

¹ Since the threshold for what counts as “large enough” grows over time, these tail events may become less likely over time.

² This is one statement that can’t be rescued by interpreting it conditional on the sum being “large enough.” Even formulating the conditional version of the statement requires a bit of work because what constitutes “large enough” changes over time. Moreover, even if this had been stated, it wouldn’t support Taleb’s argument in context.

Inexact Change

Thoughts on science, politics, and social progress.