## Bayesian Optimization by Density Ratio Estimation

Louis C. Tiao, Aaron Klein, Cédric Archambeau, Edwin V. Bonilla, Matthias Seeger, and Fabio Ramos

## Bayesian Optimization

• Let $\mathbf{x}$ denote an input to the blackbox function $f$
• Let $y$ denote its corresponding output value
• $y = f(\mathbf{x}) + \epsilon$, with noise $\epsilon \sim \mathcal{N}(0, \sigma^2)$
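
The observation model above can be simulated in a few lines. This is a minimal sketch: the function `f`, the noise level `sigma=0.1`, and the input range are hypothetical choices made for illustration and do not come from the talk.

```python
import numpy as np

def f(x):
    # Hypothetical stand-in for the blackbox function (an assumption).
    return np.sin(3.0 * x) + x ** 2

def observe(x, sigma=0.1, rng=None):
    """Return noisy outputs y = f(x) + eps, with eps ~ N(0, sigma^2)."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.normal(loc=0.0, scale=sigma, size=np.shape(x))
    return f(x) + eps

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 2.0, size=20)   # 20 inputs queried at random
y = observe(X, sigma=0.1, rng=rng)    # their noisy output values
```

Only the pairs `(X, y)` are visible to the optimizer; `f` itself is never inspected directly.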

## Relative Density Ratio

• The $\gamma$-relative density ratio of two densities $\ell(\mathbf{x})$ and $g(\mathbf{x})$ $$r_{\gamma}(\mathbf{x}) = \frac{\ell(\mathbf{x})}{\gamma \ell(\mathbf{x}) + (1 - \gamma) g(\mathbf{x})}$$
• where $\gamma \ell(\mathbf{x}) + (1 - \gamma) g(\mathbf{x})$ is the $\gamma$-mixture density
• for some mixing proportion $0 \leq \gamma < 1$
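
As a rough illustration of the definition, the sketch below evaluates $r_{\gamma}$ on a grid when both densities are known in closed form. The two Gaussians standing in for $\ell$ and $g$, the helper names, and the choice $\gamma = 0.25$ are assumptions made for the example only.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    # Density of N(mean, std^2) evaluated at x.
    z = (x - mean) / std
    return np.exp(-0.5 * z ** 2) / (std * np.sqrt(2.0 * np.pi))

def relative_density_ratio(l_pdf, g_pdf, gamma):
    """r_gamma = l / (gamma * l + (1 - gamma) * g), for 0 <= gamma < 1."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must satisfy 0 <= gamma < 1")
    l_pdf = np.asarray(l_pdf, dtype=float)
    g_pdf = np.asarray(g_pdf, dtype=float)
    mixture = gamma * l_pdf + (1.0 - gamma) * g_pdf  # gamma-mixture density
    return l_pdf / mixture

x = np.linspace(-3.0, 3.0, 7)
l = gaussian_pdf(x, mean=-1.0, std=0.5)   # hypothetical l(x)
g = gaussian_pdf(x, mean=1.0, std=1.0)    # hypothetical g(x)
r = relative_density_ratio(l, g, gamma=0.25)
```

For $\gamma > 0$ the ratio is bounded above by $1/\gamma$, since the denominator is at least $\gamma \ell(\mathbf{x})$.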

## Ordinary Density Ratio

• The ordinary density ratio $$r_{0}(\mathbf{x}) = \frac{\ell(\mathbf{x})}{g(\mathbf{x})}$$
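
The two ratios are related by a simple transformation. Dividing the numerator and denominator of $r_{\gamma}(\mathbf{x})$ by $\ell(\mathbf{x})$, wherever $\ell(\mathbf{x}) > 0$, gives

$$r_{\gamma}(\mathbf{x}) = \left( \gamma + (1 - \gamma) \, r_{0}(\mathbf{x})^{-1} \right)^{-1}$$

Setting $\gamma = 0$ recovers $r_{0}(\mathbf{x})$. For any $0 \leq \gamma < 1$ the map $u \mapsto \left( \gamma + (1 - \gamma) u^{-1} \right)^{-1}$ is strictly increasing in $u > 0$, so $r_{\gamma}$ and $r_{0}$ rank candidate inputs in the same order.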

## Expected Improvement (EI)

• Non-negative improvement over a threshold $\tau$ $$I_{\gamma}(\mathbf{x}) = \max(\tau - y, 0)$$
• Posterior predictive $$p(y | \mathbf{x}, \mathcal{D}_N)$$
• Expected value of $I_{\gamma}(\mathbf{x})$ under posterior predictive $$\alpha_{\gamma}(\mathbf{x}; \mathcal{D}_N) = \mathbb{E}_{p(y | \mathbf{x}, \mathcal{D}_N)}[I_{\gamma}(\mathbf{x})]$$
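
To make the expectation concrete, the sketch below assumes the posterior predictive at a single input is Gaussian, $p(y | \mathbf{x}, \mathcal{D}_N) = \mathcal{N}(\mu, s^2)$. That assumption is not stated in the slides; it is the usual case when a Gaussian process surrogate is used. Under it, EI has the well-known closed form $(\tau - \mu)\Phi(z) + s\,\phi(z)$ with $z = (\tau - \mu)/s$, which the code compares against a Monte Carlo estimate. The function names and the values of `mu`, `s`, and `tau` are hypothetical.

```python
import math
import numpy as np

def expected_improvement_gaussian(mu, s, tau):
    """Closed-form E[max(tau - y, 0)] for y ~ N(mu, s^2)."""
    z = (tau - mu) / s
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # Phi(z)
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # phi(z)
    return (tau - mu) * cdf + s * pdf

def expected_improvement_mc(mu, s, tau, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the same expectation."""
    rng = np.random.default_rng(seed)
    y = rng.normal(mu, s, size=n_samples)
    return float(np.maximum(tau - y, 0.0).mean())

ei_exact = expected_improvement_gaussian(mu=0.3, s=0.5, tau=0.0)
ei_mc = expected_improvement_mc(mu=0.3, s=0.5, tau=0.0)
```

The two estimates should agree up to sampling error, which is a quick sanity check on the sign convention (improvement means $y$ falling below $\tau$).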

## Conditional

Express the conditional $p(\mathbf{x} | y, \mathcal{D}_N)$ piecewise in terms of $\ell(\mathbf{x})$ and $g(\mathbf{x})$: $$p(\mathbf{x} | y, \mathcal{D}_N) = \begin{cases} \ell(\mathbf{x}) & \text{if } y < \tau, \newline g(\mathbf{x}) & \text{if } y \geq \tau \end{cases}$$
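
In practice this piecewise form amounts to partitioning the observed inputs by whether their output falls below $\tau$. The sketch below assumes $\tau$ is set to the empirical $\gamma$-quantile of the observed outputs, so that roughly a fraction $\gamma$ of the points is attributed to $\ell$; the toy objective, the helper name `split_by_threshold`, and $\gamma = 0.25$ are illustrative assumptions.

```python
import numpy as np

def split_by_threshold(X, y, gamma):
    """Partition inputs into samples attributed to l (y < tau) and g (y >= tau)."""
    tau = np.quantile(y, gamma)   # assumed: tau is the gamma-quantile of y
    below = y < tau
    return X[below], X[~below], tau

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 2.0, size=(100, 1))
# Hypothetical noisy objective, used only to generate example data.
y = (X[:, 0] - 0.5) ** 2 + rng.normal(0.0, 0.05, size=100)

X_l, X_g, tau = split_by_threshold(X, y, gamma=0.25)
```

Here `X_l` holds the inputs treated as draws from $\ell(\mathbf{x})$ and `X_g` those treated as draws from $g(\mathbf{x})$.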

## Bergstra et al. 2011

Bergstra et al. (2011) show that the expected improvement equals the relative density ratio up to a constant factor:

$$\underbrace{\alpha_{\gamma}(\mathbf{x}; \mathcal{D}_N)}_{\text{expected improvement}} \propto \underbrace{r_{\gamma}(\mathbf{x})}_{\text{relative density ratio}}$$
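
A sketch of why the proportionality holds, following the argument of Bergstra et al. (2011). It assumes $\gamma = p(y < \tau | \mathcal{D}_N)$ and that $p(\mathbf{x} | y, \mathcal{D}_N)$ equals $\ell(\mathbf{x})$ for $y < \tau$ and $g(\mathbf{x})$ otherwise; the constant $K$ is notation introduced here for convenience.

Applying Bayes' rule to the posterior predictive inside the expectation:

$$\alpha_{\gamma}(\mathbf{x}; \mathcal{D}_N) = \int_{-\infty}^{\tau} (\tau - y) \, p(y | \mathbf{x}, \mathcal{D}_N) \, \mathrm{d}y = \frac{\int_{-\infty}^{\tau} (\tau - y) \, p(\mathbf{x} | y, \mathcal{D}_N) \, p(y | \mathcal{D}_N) \, \mathrm{d}y}{p(\mathbf{x} | \mathcal{D}_N)}$$

The denominator is the $\gamma$-mixture density:

$$p(\mathbf{x} | \mathcal{D}_N) = \int p(\mathbf{x} | y, \mathcal{D}_N) \, p(y | \mathcal{D}_N) \, \mathrm{d}y = \gamma \ell(\mathbf{x}) + (1 - \gamma) g(\mathbf{x})$$

In the numerator, $y < \tau$ throughout the range of integration, so the conditional is $\ell(\mathbf{x})$ and factors out:

$$\int_{-\infty}^{\tau} (\tau - y) \, p(\mathbf{x} | y, \mathcal{D}_N) \, p(y | \mathcal{D}_N) \, \mathrm{d}y = \ell(\mathbf{x}) \int_{-\infty}^{\tau} (\tau - y) \, p(y | \mathcal{D}_N) \, \mathrm{d}y = K \, \ell(\mathbf{x})$$

Since $K$ does not depend on $\mathbf{x}$, combining the two gives

$$\alpha_{\gamma}(\mathbf{x}; \mathcal{D}_N) = K \, \frac{\ell(\mathbf{x})}{\gamma \ell(\mathbf{x}) + (1 - \gamma) g(\mathbf{x})} = K \, r_{\gamma}(\mathbf{x})$$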

