<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Matthias Seeger |</title><link>https://tiao.io/authors/matthias-seeger/</link><atom:link href="https://tiao.io/authors/matthias-seeger/index.xml" rel="self" type="application/rss+xml"/><description>Matthias Seeger</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Sat, 08 May 2021 00:00:00 +0000</lastBuildDate><item><title>BORE: Bayesian Optimization by Density-Ratio Estimation</title><link>https://tiao.io/publications/bore-2/</link><pubDate>Sat, 08 May 2021 00:00:00 +0000</pubDate><guid>https://tiao.io/publications/bore-2/</guid><description>&lt;p&gt;&lt;strong&gt;B&lt;/strong&gt;ayesian &lt;strong&gt;O&lt;/strong&gt;ptimization (BO) by Density-&lt;strong&gt;R&lt;/strong&gt;atio &lt;strong&gt;E&lt;/strong&gt;stimation (DRE),
or &lt;strong&gt;BORE&lt;/strong&gt;, is a simple, yet effective framework for the optimization of
blackbox functions.
BORE is built upon the correspondence between &lt;em&gt;expected improvement (EI)&lt;/em&gt;, arguably
the predominant &lt;em&gt;acquisition function&lt;/em&gt; used in BO, and the &lt;em&gt;density-ratio&lt;/em&gt;
between two unknown distributions.&lt;/p&gt;
&lt;p&gt;One of the far-reaching consequences of this correspondence is that we can
reduce the computation of EI to a &lt;em&gt;probabilistic classification&lt;/em&gt; problem&amp;mdash;a
problem we are well-equipped to tackle, as evidenced by the broad range of
streamlined, easy-to-use and, perhaps most importantly, battle-tested
tools and frameworks at our disposal for applying a variety of approaches.
Notable among these are the established libraries for Deep Learning and
Gradient Tree Boosting, not to mention the general-purpose machine learning
toolkits for just about everything else.
The BORE framework lets us take direct advantage of these tools.&lt;/p&gt;
&lt;h2 id="code-example"&gt;Code Example&lt;/h2&gt;
&lt;p&gt;We provide a simple example with Keras to give you a taste of how BORE can
be implemented using a feed-forward &lt;em&gt;neural network (NN)&lt;/em&gt; classifier.
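A useful class that the &lt;code&gt;bore&lt;/code&gt; package provides is
&lt;code&gt;MaximizableSequential&lt;/code&gt;, a subclass of &lt;code&gt;Sequential&lt;/code&gt; from
Keras that inherits all of its existing functionality and provides just
one additional method.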
We can build and compile a feed-forward NN classifier as usual:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;bore.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MaximizableSequential&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;tensorflow.keras.layers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Dense&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# build model&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;classifier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MaximizableSequential&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;relu&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;relu&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Dense&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;activation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;sigmoid&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# compile model&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;optimizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;adam&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;binary_crossentropy&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;See the Keras documentation if this seems unfamiliar to
you.&lt;/p&gt;
&lt;p&gt;The additional method provided is &lt;code&gt;argmax&lt;/code&gt;, which returns the &lt;em&gt;maximizer&lt;/em&gt; of
the network, i.e. the input $\mathbf{x}$ that maximizes the final output of
the network:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;x_argmax&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bounds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bounds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;L-BFGS-B&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_start_points&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Since the network is differentiable end-to-end with respect to the input $\mathbf{x}$, this
method can be implemented efficiently using a &lt;em&gt;multi-started quasi-Newton
hill-climber&lt;/em&gt; such as L-BFGS-B, the method requested in the call above.
We will see the pivotal role this method plays in the next section.&lt;/p&gt;
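&lt;p&gt;To make the multi-start idea concrete, here is a minimal sketch using SciPy. It is not the implementation inside the &lt;code&gt;bore&lt;/code&gt; package: the helper &lt;code&gt;multi_start_argmax&lt;/code&gt; and the toy objective &lt;code&gt;toy_classifier&lt;/code&gt; (with its hand-written gradient) are hypothetical names introduced only for illustration, standing in for a trained network and its gradient.&lt;/p&gt;

```python
import numpy as np
from scipy.optimize import minimize


def multi_start_argmax(func, grad, bounds, num_start_points=3, seed=0):
    """Maximize func over a box by running L-BFGS-B from several random starts."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(bounds, dtype=float).T
    best_x, best_value = None, -np.inf
    for _ in range(num_start_points):
        x0 = rng.uniform(low, high)  # random starting point inside the box
        # SciPy minimizes, so we negate the objective and its gradient.
        result = minimize(lambda x: -func(x), x0, jac=lambda x: -grad(x),
                          method="L-BFGS-B", bounds=bounds)
        if -result.fun > best_value:
            best_x, best_value = result.x, -result.fun
    return best_x, best_value


# Hypothetical stand-in for the classifier output: a sigmoid of a concave
# quadratic, which peaks at x = (0.3, 0.3) with value 0.5.
def toy_classifier(x):
    return 1.0 / (1.0 + np.exp(np.sum((x - 0.3) ** 2)))


def toy_classifier_grad(x):
    p = toy_classifier(x)
    return -2.0 * (x - 0.3) * p * (1.0 - p)


x_best, value_best = multi_start_argmax(
    toy_classifier, toy_classifier_grad, bounds=[(0.0, 1.0), (0.0, 1.0)])
```

&lt;p&gt;Keeping the best result over several starts matters because a neural network output is generally non-convex in $\mathbf{x}$, so a single run may stop at a poor local maximum.&lt;/p&gt;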
&lt;hr&gt;
&lt;p&gt;Using this classifier, the BO loop in BORE looks as follows:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nn"&gt;np&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;targets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# initialize design&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features_initial_design&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="n"&gt;targets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;targets_initial_design&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_iterations&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# construct classification problem&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;X&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vstack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hstack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;targets&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;tau&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quantile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.25&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;less&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tau&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# update classifier&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# suggest new candidate&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;x_next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;classifier&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bounds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bounds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;L-BFGS-B&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_start_points&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# evaluate blackbox&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;y_next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;blackbox&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="c1"&gt;# update dataset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="n"&gt;targets&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;hr&gt;
&lt;p&gt;Let&amp;rsquo;s break this down a bit:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;At the start of the loop, we construct the classification problem by labeling
instances $\mathbf{x}$ whose corresponding target value $y$ falls below $\tau$, the
&lt;code&gt;q=0.25&lt;/code&gt; quantile of all target values, as &lt;em&gt;positive&lt;/em&gt;, and the rest as &lt;em&gt;negative&lt;/em&gt;.
Since we are minimizing, the positives are the best-performing quarter of the observations.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Next, we train the classifier to discriminate between these instances. This
classifier should converge towards
&lt;/p&gt;
$$
\pi^{*}(\mathbf{x}) = \frac{\gamma \ell(\mathbf{x})}{\gamma \ell(\mathbf{x}) + (1-\gamma) g(\mathbf{x})},
$$&lt;p&gt;
where $\ell(\mathbf{x})$ and $g(\mathbf{x})$ are the unknown distributions of
instances belonging to the positive and negative classes, respectively, and
$\gamma$ is the class balance-rate and, by construction, simply the quantile
we specified (i.e. $\gamma=0.25$).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Once the classifier is a decent approximation to $\pi^{*}(\mathbf{x})$, we
propose the maximizer of this classifier as the next input to evaluate.
In other words, we are now using the classifier &lt;em&gt;itself&lt;/em&gt; as the acquisition
function.&lt;/p&gt;
&lt;p&gt;How is it justifiable to use this in lieu of EI, or some other acquisition
function we&amp;rsquo;re used to?
And what is so special about $\pi^{*}(\mathbf{x})$?&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Well, as it turns out, $\pi^{*}(\mathbf{x})$ is equivalent to EI, up to some
constant factors.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The remainder of the loop should now be self-explanatory. Namely, we&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;evaluate the blackbox function at the suggested point, and&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;update the dataset.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
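&lt;p&gt;As a small numerical check of the formula for $\pi^{*}(\mathbf{x})$ in step 2, the sketch below evaluates it on a grid for two made-up densities. The Gaussian choices for $\ell$ and $g$ and the helper &lt;code&gt;gaussian_pdf&lt;/code&gt; are assumptions for illustration only; in practice both densities are unknown. It shows that $\pi^{*}$ equals $\gamma$ times the ratio $\ell / (\gamma \ell + (1-\gamma) g)$, so both are maximized at the same input.&lt;/p&gt;

```python
import numpy as np


def gaussian_pdf(x, mean, std):
    """Density of a univariate Gaussian, evaluated elementwise."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


gamma = 0.25
x = np.linspace(-3.0, 3.0, 601)

# Hypothetical densities, chosen only for illustration.
ell = gaussian_pdf(x, mean=-1.0, std=0.5)  # positive class (y below tau)
g = gaussian_pdf(x, mean=1.0, std=1.0)     # negative class

mixture = gamma * ell + (1.0 - gamma) * g
pi_star = gamma * ell / mixture  # optimal classifier from the formula above
ratio = ell / mixture            # density ratio against the mixture

# The two differ only by the constant factor gamma.
same_up_to_gamma = np.allclose(pi_star, gamma * ratio)
x_max_pi = x[np.argmax(pi_star)]
x_max_ratio = x[np.argmax(ratio)]
```

&lt;p&gt;Because the two quantities differ only by a constant factor, maximizing the classifier output selects the same candidate as maximizing the ratio.&lt;/p&gt;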
&lt;h3 id="step-by-step-illustration"&gt;Step-by-step Illustration&lt;/h3&gt;
&lt;p&gt;Here is a step-by-step animation of six iterations of this loop in action,
using the &lt;em&gt;Forrester&lt;/em&gt; synthetic function as an example.
The noise-free function is shown as the solid gray curve in the main pane.
This procedure is warm-started with four random initial designs.&lt;/p&gt;
&lt;p&gt;The right pane shows the empirical CDF (ECDF), denoted $\Phi$, of the observed $y$ values.
The vertical dashed black line in this pane is located at $\Phi(y) = \gamma$,
where $\gamma = 0.25$.
The horizontal dashed black line is located at $\tau$, the value of $y$ such
that $\Phi(y) = 0.25$, i.e. $\tau = \Phi^{-1}(0.25)$.&lt;/p&gt;
&lt;p&gt;The instances below this horizontal line are assigned binary label $z=1$, while
those above are assigned $z=0$. This is visualized in the bottom pane,
alongside the probabilistic classifier $\pi_{\boldsymbol{\theta}}(\mathbf{x})$
represented by the solid gray curve, which is trained to discriminate between
these instances.&lt;/p&gt;
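&lt;p&gt;The thresholding and labeling described above can be reproduced in a few lines of NumPy. This is only a sketch: the target values in &lt;code&gt;y&lt;/code&gt; are made up, and &lt;code&gt;ecdf&lt;/code&gt; is a hypothetical helper, not part of any package used here.&lt;/p&gt;

```python
import numpy as np

gamma = 0.25
# Hypothetical observed target values, for illustration only.
y = np.array([3.2, -0.7, 1.5, 0.4, 2.8, -1.9, 0.9, 5.1])

# tau is the gamma-quantile of y (NumPy interpolates linearly by default).
tau = np.quantile(y, q=gamma)

# Instances strictly below tau get label 1, all others label 0.
z = np.less(y, tau).astype(int)


def ecdf(values, t):
    """Fraction of observations less than or equal to t."""
    return np.mean(np.less_equal(values, t))


fraction_positive = z.mean()
ecdf_at_tau = ecdf(y, tau)
```

&lt;p&gt;For these eight values, two fall below $\tau$, so the fraction of positives matches $\gamma = 0.25$ exactly. With other sample sizes the match is only approximate, since the quantile is interpolated between observations.&lt;/p&gt;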
&lt;p&gt;Finally, the maximizer of the classifier is represented by the vertical solid
green line.
This is the location that the BO procedure suggests evaluating next.&lt;/p&gt;
&lt;p&gt;
&lt;figure &gt;
&lt;div class="flex justify-center "&gt;
&lt;div class="w-full" &gt;
&lt;img alt="Animation"
srcset="https://tiao.io/publications/bore-2/paper_1500x5562_hu_bf54a19b8bc6fbf5.webp 205w"
sizes="(max-width: 480px) 100vw, (max-width: 768px) 90vw, (max-width: 1024px) 80vw, 760px"
src="https://tiao.io/publications/bore-2/paper_1500x5562_hu_bf54a19b8bc6fbf5.webp"
width="205"
height="760"
loading="lazy" data-zoomable /&gt;&lt;/div&gt;
&lt;/div&gt;&lt;/figure&gt;
&lt;/p&gt;
&lt;p&gt;We see that the procedure converges toward the global minimum of the blackbox
function after half a dozen iterations.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;To understand how and why this works in more detail, please read our paper!
If you only have 15 minutes to spare, please watch the video recording of our
talk!&lt;/p&gt;
&lt;h2 id="video"&gt;Video&lt;/h2&gt;
&lt;div id="presentation-embed-38942425"&gt;&lt;/div&gt;
&lt;script src='https://slideslive.com/embed_presentation.js'&gt;&lt;/script&gt;
&lt;script&gt;
embed = new SlidesLiveEmbed('presentation-embed-38942425', {
presentationId: '38942425',
autoPlay: false, // change to true to autoplay the embedded presentation
verticalEnabled: true
});
&lt;/script&gt;</description></item><item><title>Simulation-based Scoring for Model-based Asynchronous Hyperparameter and Neural Architecture Search</title><link>https://tiao.io/publications/simulation-based-scoring/</link><pubDate>Sat, 01 May 2021 00:00:00 +0000</pubDate><guid>https://tiao.io/publications/simulation-based-scoring/</guid><description/></item><item><title>Bayesian Optimization by Density Ratio Estimation</title><link>https://tiao.io/publications/bore-1/</link><pubDate>Tue, 01 Dec 2020 00:00:00 +0000</pubDate><guid>https://tiao.io/publications/bore-1/</guid><description/></item><item><title>Model-based Asynchronous Hyperparameter and Neural Architecture Search</title><link>https://tiao.io/publications/async-multi-fidelity-hpo/</link><pubDate>Sun, 01 Mar 2020 00:00:00 +0000</pubDate><guid>https://tiao.io/publications/async-multi-fidelity-hpo/</guid><description/></item></channel></rss>