[Solved] Drawing from the tail of a normal distribution



[Solved] Drawing from the tail of a normal distribution

Postby Qaanol » Tue Jan 14, 2014 2:35 pm UTC

I'm looking for help understanding something. On the Wikipedia page for the Ziggurat algorithm, a fallback method to generate values from the tail of a normal distribution is listed. In particular, to generate values greater than M, the procedure is a rejection method:

Draw x from an exponential distribution with mean 1/M, and draw y from an exponential distribution with mean 1.
E.g., x = -ln(U)/M and y = -ln(V), where U and V are drawn uniformly and independently from (0, 1).
If 2y > x², return M + x; otherwise start over.

I've been trying to understand what's going on and why it works, but something isn't clicking for me. I see that for any particular x that gets drawn, the probability of accepting it (and returning M + x) is e^(-x²/2). But I haven't wrapped my head around how to show that the distribution of accepted values matches what we want.

Edit: This PDF (link downloads file) describes the method, but it's still not clicking for me.

Edit 2: Okay, I figured it out. The idea is to shift and scale the tail so it starts at (0, 1), then run rejection sampling by drawing from the area below the exponential curve e^(-Mu). The ratio of the heights of the two curves at u is exactly e^(-u²/2), so a point (u, v) chosen uniformly at random from below the exponential curve will also be below the (shifted, scaled) tail a fraction e^(-u²/2) of the time. The x in the algorithm is drawn from that exponential distribution. Given that x, we then need to test for rejection. The straightforward approach would be to draw v uniformly between 0 and e^(-Mx) and check whether it falls below the (shifted, scaled) tail, but we already know the probability of that is e^(-x²/2). So the procedure actually employed is just a simpler way to generate an event with that same probability.

Or at least, that is how I interpreted it. Is there a simpler, “more obvious”, way to see what’s going on in the algorithm?
wee free kings


Re: [Solved] Drawing from the tail of a normal distribution

Postby jagerman » Fri Apr 08, 2016 11:14 pm UTC

I realize this post is a couple of years old, but I came across it in search of the same question, and thought I'd contribute an updated answer.

So, as to the other way of looking at it that you ask for, I'm going to assume you're at least a little bit familiar with rejection sampling. The quick overview: I can sample from distribution X (with pdf f(x)) using distribution Y (with pdf g(x)) by taking a draw x from Y and then accepting that draw with probability f(x) / (M g(x)), where M is a constant chosen to scale this ratio so it is a valid probability, i.e. so it never exceeds 1 over the support of X (it's also obvious from this that a finite M requires the support of Y to include the support of X). Accepting the drawn x value is then simply a matter of drawing a uniform u ~ U[0,1] and accepting if u < f(x) / (M g(x)). Basically, we accept the draw with probability proportional to the ratio of the pdfs at the drawn value.

So what we need to do for the tail of the normal is, essentially, to draw from a truncated normal, truncated to values ≥ a, where a is the point at which the ziggurat table tail begins. (I'm going to ignore the left tail, and assume we're dealing with a standard normal, as is typical; the other cases can be easily generated from the final result of this approach by negating/shifting/scaling appropriately).

Now the tail of a normal distribution looks very similar to an exponential distribution (and, proportionally, it looks more and more similar the further out in the tail you go), so a good choice for the proposal distribution is, unsurprisingly, the exponential. Which exponential (i.e. which rate parameter λ)? Well, the exponential with the maximum acceptance rate, as worked out in Robert (1995) ("Simulation of truncated normal variables"), is the one with λ = (a + √(a² + 4)) / 2. But square roots are expensive to compute, and that expression is approximately equal to a, so for truncated normal approaches it's pretty common to simplify a little by using Exp(a) as the proposal distribution. This won't break anything; it'll just be slightly less efficient (i.e. accept slightly less often, and thus need slightly more draws on average).

So suppose we want to do rejection sampling using a + Exp(a) as our proposal distribution (shifting by a for two reasons: first, because there is no point in using an exponential that gives values less than a, where f(x) will equal 0 and thus we will never accept; and second because a truncated exponential distribution is exactly the same thing as a shifted exponential distribution!). So from the truncated standard normal distribution we have pdf:
f(x) = (1 / √(2π)) e^(-x²/2) / Φ(-a)
and for the shifted exponential:
g(x) = a e^(-a(x-a))

Expanding and rearranging all that crap into constants times things-dependent-on-x, we get:

f(x) / (M g(x)) = [1 / (M √(2π) a Φ(-a) e^(a²))] e^(-x²/2 + ax)

This thing has first derivative equal to (a - x) times a positive factor, so it has a maximum at x = a; requiring the ratio to equal 1 at x = a (the smallest valid choice) nails down the best M we can pick:

M = e^(-a²/2) / (√(2π) a Φ(-a))

Plugging that in results in an orgy of cancellation, the result of which is our acceptance probability as a function of the drawn value:

ρ(x) = e^(-(x-a)²/2)

So at this point, with the usual rejection sampling approach, you would proceed to do something like:

Code: Select all

do { x = draw_exp(a); u = draw_uniform_01(); } while (u >= exp(-0.5*x*x));
x += a;

and that leaves you with x distributed as a truncated standard normal, with truncation range [a, ∞). You can stop here, actually, the task is finished. But, of course, we wanted to match Marsaglia and Tsang's version, which is, as you pointed out, like this:

Code: Select all

do { x = -ln(draw_uniform_01())/a; y = -ln(draw_uniform_01()); } while (2*y <= x*x);
x += a;

The first thing to note is that -ln(U), where U is uniform on (0, 1), is exactly an Exp(1) draw generated by inverse-CDF sampling. So we generated two exponentials, and divided one of them by a.

The second thing to note is that (by the rules for functions of random variables) Exp(1)/a is itself a random variable with pdf g(x) = e^(-ax)·a = a e^(-ax), and since that's exactly the pdf of Exp(a), Exp(1)/a is just another way to write Exp(a) (in fact, this is what most software libraries calculate when asked for an Exp(a)). Okay, so the x being generated is the same in both cases.

Now for y, suppose we want to eliminate the "exp" call in the condition on the while loop, i.e. this one:

Code: Select all

... while (u >= exp(-0.5*x*x));

because calculating exp is, in general, pretty expensive. We can take the natural logarithm of both sides:

Code: Select all

... while (ln(u) >= -0.5*x*x);

and then divide both sides by -0.5 (and so flip the inequality direction):

Code: Select all

... while (2*(-ln(u)) <= x*x);

But wait, -ln(u) is just an exponential, so instead of generating a uniform u, we could generate an exponential y instead:

Code: Select all

do { x = draw_exp(a); y = draw_exp(1); } while (2*y <= x*x);
x += a;

so we've boiled it down to the same thing. So really, this is exactly rejection sampling, accepting with the usual probability given by the ratio of the target and proposal densities at the drawn value, with some (clever) log transformations thrown into the condition for good measure.

Now, a caveat: in most software library implementations, this isn't any more efficient than the first rejection sampling snippet I wrote, because generating an exponential is usually implemented by drawing a uniform, taking its logarithm, and multiplying by -1/λ (or equivalent, if the Exp parameter is interpreted differently, e.g. as the mean 1/λ). The extra logarithm in the generation of y takes about the same time to compute as the exp in the condition (both are expensive operations), so we don't actually save anything by doing it this way (but we do make it harder to understand!).

There is, however, one saving grace: there is no reason the exponential has to be generated in this slow way. In fact, the Marsaglia and Tsang ziggurat paper gives both a normal AND an exponential implementation, but the latter has, sadly, been almost entirely ignored--even by Marsaglia and Tsang themselves in that very paper! Their (fairly nasty, but working) reference implementation in C uses exactly this slow log-based exponential in the normal generator's tail fallback, despite the fact that the very same code contains a working ziggurat exponential generator.

Edit: converted all the non-supported TeX to more primitive forum codes; plus a few edits.
