I realize this post is a couple of years old, but I came across it in search of the same question, and thought I'd contribute an updated answer.

So, as to the other way of looking at it that you ask for, I'm going to assume you're at least a little bit familiar with rejection sampling. The quick overview is that I can sample from distribution X (where X has pdf f(x)) using distribution Y (with pdf g(x)) by taking a draw x from Y and then accepting that draw with probability f(x) / (M g(x)), where M is a constant chosen to scale this ratio so it is a valid probability (i.e. it never exceeds 1 over the support of X and Y; it's also obvious from this that a finite M requires that the support of Y include the support of X). Accepting the drawn x value is then simply a matter of drawing a uniform u ~ U[0,1], then accepting if u < f(x)/(M g(x)). Basically, we accept the draw with a probability proportional to the ratio of PDFs at the drawn value.

So what we need to do for the tail of the normal is, essentially, to draw from a truncated normal, truncated to values ≥ a, where a is the point at which the ziggurat table tail begins. (I'm going to ignore the left tail and assume we're dealing with a standard normal, as is typical; the other cases can easily be generated from the final result of this approach by negating/shifting/scaling appropriately.)

Now the tail of a normal distribution looks very similar to an exponential distribution (and, proportionally, it looks more and more similar the further out in the tail you get), so a good choice for the proposal distribution is, unsurprisingly, the exponential.

Which exponential (i.e. which rate parameter λ)? Well, the exponential with the maximum acceptance rate, as worked out in Robert (1995) ("Simulation of truncated normal variables"), is the one with λ = (a + √(a² + 4)) / 2. But square roots are expensive to compute, and that expression is approximately equal to a, so in general for truncated normal approaches it's pretty common to simplify a little by using Exp(a) as our proposal distribution. This won't break anything; it'll just be slightly less efficient (i.e. accept slightly less often, and thus need slightly more draws on average).

So suppose we want to do rejection sampling using a + Exp(a) as our proposal distribution (shifting by a for two reasons: first, because there is no point in using an exponential that gives values less than a, where f(x) will equal 0 and thus we will never accept; and second because a truncated exponential distribution is exactly the same thing as a shifted exponential distribution!). So from the truncated standard normal distribution we have pdf:

f(x) = (1 / √(2π)) e^{-x² / 2} / Φ(-a)

and for the shifted exponential:

g(x) = a e^{-a (x-a)}

Expanding and rearranging all that crap into constants times things-dependent-on-x, we get:

f(x) / (M g(x)) = [1 / (M √(2π) a Φ(-a) e^{a²})] e^{-x² / 2 + ax}

This thing has first derivative (a - x) times a positive value, so it has a maximum at x = a; requiring the ratio to equal 1 at this x = a condition nails down the best M we can choose:

M = e^{-a² / 2} / (√(2π) a Φ(-a))

Plugging that in results in an orgy of cancellation, the result of which is our acceptance probability as a function of the drawn value:

ρ(x) = e^{-½ (x-a)²}

So at this point, with the usual rejection sampling approach, you would proceed to do something like:

Code: Select all

`do { x = draw_exp(a); u = draw_uniform_01(); } while (u >= exp(-0.5*x*x));`

x += a;

and that leaves you with x distributed as a truncated standard normal, with truncation range [a, ∞). You can stop here, actually; the task is finished. But, of course, we wanted to match Marsaglia and Tsang's version, which is, as you pointed out, like this:

Code: Select all

`do { x = -ln(draw_uniform_01())/a; y = -ln(draw_uniform_01()); } while (2*y < x*x);`

The first thing to note is that -ln(U), where U is uniform on (0,1), is exactly a draw from Exp(1) generated by inverse-CDF sampling. So we generated two exponentials, and divided one of them by a. The second thing to note is that (by the rules of functions of random variables) Exp(1) / a is itself a random variable with pdf g(x) = e^{-ax} |a| = a e^{-ax}, and since that's exactly the pdf of Exp(a), Exp(1)/a is just another way to write Exp(a) (in fact, this is what most software libraries calculate when asked to give an Exp(a)). Okay, so the x being generated is the same in both cases.

Now for y, suppose we want to eliminate the "exp" call in the condition of the while loop, i.e. this one:

u >= exp(-0.5*x*x)

because calculating exp is, in general, pretty expensive. We can take the natural logarithm of both sides:

ln(u) >= -0.5*x*x

and then divide both sides by -0.5 (and so flip the inequality direction):

-2*ln(u) <= x*x

But wait, -ln(u) is just an exponential, so instead of generating a uniform u, we could generate an exponential y instead:

Code: Select all

`do { x = draw_exp(a); y = draw_exp(1); } while (2*y <= x*x);`

so we've boiled it down to the same thing. So really, this is exactly rejection sampling, accepting in the usual way in proportion to the ratio between the target density and the proposal density, with some (clever) log transformations thrown into the condition for good measure.

Now, a caveat: in most software library implementations, this isn't any more efficient than the first rejection-sampling loop I wrote, because generating an exponential is usually implemented by drawing a uniform, taking its logarithm, and multiplying by -1/λ (or the equivalent, if the library parameterizes Exp by the scale 1/λ instead of the rate). The extra logarithm in the generation of y takes about the same time to compute as the exp in the condition (both are expensive operations), so we don't actually save anything by doing it this way (but we do make it harder to understand!).

There is, however, one saving grace: there is no reason the exponential has to be generated in this slow way. In fact, the Marsaglia and Tsang ziggurat paper actually gives both a normal AND an exponential implementation, but the latter has, sadly, been almost entirely ignored -- even by Marsaglia and Tsang themselves in that very paper! If you look at their paper, they give a (fairly nasty, but working) reference implementation in C that uses the slow log-based exponentials in the normal tail, despite the fact that the very same code contains a working ziggurat exponential generator.

Edit: converted all the non-supported TeX to more primitive forum codes; plus a few edits.