4 thoughts on “Control Variates for Variance Reduction”
There is a bug in the formula: CG^2 should be CG, yielding m = E(CG) / E(G^2). Also, the general solution (without assuming things are centered) is cov(C, G) / var(G).
Hi, thank you, but I think this formula is correct.
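Var[(c(x) - m) G(x)] - Var[c(x) G(x)] = -2 m E[c(x) G(x)^2] + m^2 E[G(x)^2]
Setting the derivative with respect to m to zero gives the minimum at m = E[c(x) G(x)^2] / E[G(x)^2].
Note that in the RL case E[G(x)] = 0 by construction (policy gradient), so E[(c(x) - m) G(x)] = E[c(x) G(x)] does not depend on m and the squared-mean terms cancel in the difference above.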
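For anyone who wants to check the two candidate formulas numerically, here is a minimal sketch. Everything in it is an assumption chosen for illustration and not taken from the post: x is standard normal, G(x) = x (the score of N(theta, 1) with respect to theta at theta = 0, so E[G] = 0), and c(x) = x^2 + x + 3 is an arbitrary cost. It compares the variance of (c - m) G for m = E[c G^2] / E[G^2] and for m = E[c G] / E[G^2].

```python
import numpy as np

# Toy setup (all choices here are illustrative assumptions, not from the post).
rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)

G = x                  # score of N(theta, 1) w.r.t. theta at theta = 0, so E[G] = 0
c = x**2 + x + 3.0     # arbitrary cost function

# Baseline from the reply: minimizes Var[(c - m) G] when E[G] = 0.
m_star = np.mean(c * G**2) / np.mean(G**2)

# Baseline suggested in the first comment (CG instead of CG^2).
m_alt = np.mean(c * G) / np.mean(G**2)


def estimator_variance(m):
    """Sample variance of the estimator (c(x) - m) * G(x)."""
    return np.var((c - m) * G)


var_none = estimator_variance(0.0)
var_alt = estimator_variance(m_alt)
var_star = estimator_variance(m_star)

print(f"m_star = {m_star:.3f}, variance = {var_star:.3f}")
print(f"m_alt  = {m_alt:.3f}, variance = {var_alt:.3f}")
print(f"no baseline,    variance = {var_none:.3f}")
```

For this toy choice the exact values are E[c G^2] / E[G^2] = 6 and E[c G] / E[G^2] = 1, so the sample estimates should land close to those, with the first giving the smaller variance. This only checks the estimator (c - m) G with E[G] = 0. The cov(C, G) / var(G) coefficient answers a different question: it is the optimal coefficient for an estimator of the form C - m G.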
Nice blogpost, BTW!
Thanks for the post! I am trying to read the references you listed in paragraph 3, but link1 and link3 are broken. Do you still have those files?