Postby **Zamfir** » Fri Nov 02, 2018 10:35 am UTC

I think I have worked out the math here, but please check...

You have some process that draws a sample x from a random variable X with unknown distribution (more on this later)

This is your reagent.

Your reaction products are drawn from a k-valued random variable Y. For a given reagent sample x, Y is a multivariate normal distribution:

Y(X=x) ~ Nk(g(x), SIGMA(x))

g(x) is a known k-valued function of x, and the k-by-k covariant matrix SIGMA is a know function of x

---------------------------------------

Now, you have drawn a sample y from Y

You are looking for an interval [xlow, xhi], such that the conditional probabilities

p(X<xlow | Y=y) = ptarget

p(X>xhi | Y=y ) = ptarget

ptarget might (for example) be 0.025, for a 95% confidence interval

In general, you want the cumulative dsitribution function

FX(x) = p(X<x | Y =y)

from there you can choose intervals

----------------------------

It is easier to work with probability density functions (which we can later integrate to get the cdf FX)

So we are looking for the pdf

fX(x | Y=y)

We can use Bayes

fX(x | Y=y) = fY(y | X=x) * fX(x) / fY(y)

Do we have everything in that formula?

We know fY(y | X=x) = phi_k( g(x), SIGMA(x))

where phi_k(mu, SIGMA) is a k-valued gaussian function with a mean vector mu and covariate matrix SIGMA

fX(x) is the prior, the unknown distribution of X. This depends on your reagent mechanism

Perhaps that is already accurate and always close to x0. Then you should take this into account, telling you that x is likely close to x0 even if y suggests something else. But I think you are interested in the case where you don't know much about the prior.

For that case, we can assume a constant prior, fX(x) = c

Of course, it cannot be constant from -inf to inf. But it might be zero only far out, where phi_k is already close to zero.

Last part:

fY(y) = int(-inf to inf; [fY(y | X=x) * fX(x) ])dx

So this is not a function of x, just a constant to make the pdf integrate to 1. the constant c will appear here again, cancelling out.

---------------------

In the general case, this is the end. You can use numerical integration to find fX(x | Y=y)

For simplificatieon, we might take a constant matrix SIGMA and assume that g(x) is an affine function g(x) = a*x + b, where a and b are k-valued vectors

Then we can solve directly.

The nice thing is, Y(X=x) is a gaussian randomvariable, and affine transforms of a gaussian random variable are still gaussian.

Which means we can define a new, single-valued random variable

Z = vT*Y - vT*b

where we pick v such that

vT*a = 1

And the magic of gaussians gives us:

fZ(z | X=x) = phi_k (vT*g(x) -vTb, vt*SIGMA*v) = phi_k(vT*[a*x +b] -vT*b, vt*SIGMA*v) = phi_k(x, vT*SIGMA*v)

In other words, Z is an estimator for X with a (single-valued) variance s2 = vT*SIGMA*v

Going back to the bayes formula, we can write it in terms of Z:

fX(x | Z=z) = fZ(z | X=x) * fX(x) / fZ(z)

Since fZ(z | X=x) already has an integral of 1 by construction, and since we assume that fX(x) is constant over the range of interest, then fZ(z)

only compensates for the unknown value of the constant in fX(x). They cancel out, leaving our desired result:

fX(x | Z=z) = phi_k(x, vT*SIGMA*v) = phi_k(x, s2)

------------------------

Effectively, vT*Y - vT*b is a weighted average of sour reaction product samples, constructed to estimate X

For example, we can take vT = [0, 1/an, 0,0 ...], so we use one of the products as sole estimator and we end with variance

s2 = vT*SIGMA*v = SIGMAnn, the error variance of that single product in isolation

We can take other combinations, to make the variance as small as possible.

The best value for v is a quadratic optimization problem,

v_opt = argmin( vT*SIGMA*v ) constrained by aT*v = 1

You'll have to check this, but I think you need to solve the following system for the optimal v:

[SIGMA a ][v] = [0]

[aT 0 ][lambda] [1]

-----------------------

Mmm, that got out of hand. Again, please check the steps involved