Bethe free energy landscape (stylized)
Red dot shows the global optimum
... we might return the green dot

Curvature: all terms of the Hessian $H_{ij} = \frac{\partial^2 F}{\partial q_i \partial q_j}$

$$H_{ii} = -\frac{d_i - 1}{q_i (1 - q_i)} + \sum_{j \in N(i)} \frac{q_j (1 - q_j)}{T_{ij}} \;\ge\; \frac{1}{q_i (1 - q_i)},$$

$$H_{ij} = \begin{cases} \dfrac{q_i q_j - \xi_{ij}}{T_{ij}} & (i,j) \in E \\ 0 & (i,j) \notin E,\ i \ne j. \end{cases}$$

where $d_i$ is the degree of $X_i$ in the model, and
$$T_{ij} = q_i q_j (1 - q_i)(1 - q_j) - (\xi_{ij} - q_i q_j)^2 \ge 0,$$
equality iff $q_i$ or $q_j \in \{0, 1\}$

Leads to bound on max second derivative in any direction (curvMesh)
$q_i q_j - \xi_{ij}$ term is negative for an attractive edge, hence obtain the submodularity result

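As an illustration of the Hessian terms on this slide, here is a minimal Python sketch that assembles $H$ for a small model and checks the stated diagonal bound. The function name `bethe_hessian`, the triangle graph and the numeric values of `q` and `xi` are made up for the example; they are not from the talk.

```python
def bethe_hessian(q, xi, edges):
    """Assemble the Hessian of F from the slide's formulas.

    q     : list of singleton pseudo-marginals q_i in (0, 1)
    xi    : dict mapping each edge (i, j) to the pairwise term xi_ij
    edges : list of (i, j) pairs
    """
    n = len(q)
    H = [[0.0] * n for _ in range(n)]
    degree = [0] * n
    for i, j in edges:
        x = xi[(i, j)]
        # T_ij as defined on the slide; positive for valid interior pseudo-marginals.
        T = q[i] * q[j] * (1 - q[i]) * (1 - q[j]) - (x - q[i] * q[j]) ** 2
        H[i][j] = H[j][i] = (q[i] * q[j] - x) / T
        # Accumulate the neighbour sums that appear in the diagonal entries.
        H[i][i] += q[j] * (1 - q[j]) / T
        H[j][j] += q[i] * (1 - q[i]) / T
        degree[i] += 1
        degree[j] += 1
    for i in range(n):
        H[i][i] -= (degree[i] - 1) / (q[i] * (1 - q[i]))
    return H


# Hypothetical triangle with xi_ij > q_i q_j on every edge (attractive-like).
q = [0.3, 0.5, 0.6]
edges = [(0, 1), (1, 2), (0, 2)]
xi = {(0, 1): 0.20, (1, 2): 0.35, (0, 2): 0.22}
H = bethe_hessian(q, xi, edges)

for i in range(len(q)):
    # Diagonal bound from the slide: H_ii >= 1 / (q_i (1 - q_i)).
    print(i, H[i][i], 1.0 / (q[i] * (1 - q[i])))
```

In this example every off-diagonal entry is negative, matching the remark that $q_i q_j - \xi_{ij}$ is negative on attractive edges.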
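gradMesh: analyze first derivatives of F

$$\frac{\partial F}{\partial q_i} = -\theta_i + \log \left[ \frac{(1 - q_i)^{d_i - 1}}{q_i^{d_i - 1}} \cdot \frac{\prod_{j \in N(i)} (q_i - \xi_{ij})}{\prod_{j \in N(i)} (1 + \xi_{ij} - q_i - q_j)} \right] \quad \text{[WT01]}$$

Theorem (WJ14)
$$-\theta_i + \log \frac{q_i}{1 - q_i} - W_i \;\le\; \frac{\partial F}{\partial q_i} \;\le\; -\theta_i + \log \frac{q_i}{1 - q_i} + V_i$$

Upper and lower bounds are separated by a constant, and both are monotonically increasing with $q_i$
Within our search space, allows us to bound
$$\left| \frac{\partial F}{\partial q_i} \right| \le D_i := V_i + W_i = \sum_{j \in N(i)} |W_{ij}|$$
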
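The sandwich bound on this slide can be checked numerically. The sketch below is not the authors' code and rests on three labelled assumptions: the model is $p(x) \propto \exp(\sum_i \theta_i x_i + \sum_{(i,j)} W_{ij} x_i x_j)$ with $x_i \in \{0, 1\}$; $\xi_{ij}$ is taken as the root of the usual quadratic from [WT01] with $\alpha = e^{W_{ij}} - 1$; and $W_i$, $V_i$ are the sums of the positive and of the negated negative incident weights, which is consistent with $V_i + W_i = \sum_j |W_{ij}|$. The three-node model and its parameters are invented for illustration.

```python
import itertools
import math


def xi_opt(qi, qj, w):
    """xi_ij minimising F for fixed q_i, q_j (assumed [WT01] quadratic root)."""
    if w == 0.0:
        return qi * qj
    a = math.exp(w) - 1.0
    b = 1.0 + a * (qi + qj)
    return (b - math.sqrt(b * b - 4.0 * a * (1.0 + a) * qi * qj)) / (2.0 * a)


def grad_F(i, q, theta, W):
    """dF/dq_i from the slide's formula; W maps edges (a, b) to weights."""
    g = -theta[i]
    degree = 0
    for (a, b), w in W.items():
        if i != a and i != b:
            continue
        j = b if i == a else a
        x = xi_opt(q[i], q[j], w)
        g += math.log((q[i] - x) / (1.0 + x - q[i] - q[j]))
        degree += 1
    return g + (degree - 1) * math.log((1.0 - q[i]) / q[i])


def bounds(i, q, theta, W):
    """Lower and upper bounds of the theorem for variable i."""
    W_i = sum(w for e, w in W.items() if i in e and w > 0)   # attractive part
    V_i = sum(-w for e, w in W.items() if i in e and w < 0)  # repulsive part
    base = -theta[i] + math.log(q[i] / (1.0 - q[i]))
    return base - W_i, base + V_i


# Hypothetical 3-node model with mixed attractive and repulsive edges.
theta = [0.5, -1.0, 0.2]
W = {(0, 1): 2.0, (1, 2): -1.5, (0, 2): 0.7}
grid = [0.1, 0.3, 0.5, 0.7, 0.9]

all_within = True
for q in itertools.product(grid, repeat=3):
    for i in range(3):
        lo, hi = bounds(i, q, theta, W)
        g = grad_F(i, q, theta, W)
        all_within = all_within and (lo - 1e-9 <= g <= hi + 1e-9)
print(all_within)
```

The check only samples a coarse grid, so it illustrates the theorem rather than proving it.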
gradMesh: search over purple region

[Figure: Upper and Lower Bounds for $\partial F / \partial q_i$. x-axis: pseudo-marginal $q_i$ (0 to 1); y-axis: partial derivative (-15 to 15). Curves: upper bound $f^U_i$ and lower bound $f^L_i$, with the points $q_i$ s.t. $f^U_i(q_i) = 0$ and $q_i$ s.t. $f^L_i(q_i) = 0$ marked. Shaded area shows where the partial derivative can be 0. Also marked: region of the Bethe box $[A_i, 1 - B_i]$ and $D_i = V_i + W_i - \log L_i - \log U_i$. Parameters used in this example: $\theta_i = 1$, $V_i = 2$, $W_i = 3$, $L_i = 1.8$, $U_i = 2.9$.]

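gradMesh: complexity
In search space, $\left| \frac{\partial F}{\partial q_i} \right| \le D_i := V_i + W_i = \sum_{j \in N(i)} |W_{ij}|$
We can apportion error among n variables
Simple method: each gets $\epsilon / n$
Need $\text{gradient}_i \cdot \text{step}_i \approx \epsilon / n$.
Hence number of mesh points in dimension i,
$$N_i \approx \frac{1}{\text{step}_i} \approx \frac{n}{\epsilon} \cdot \text{gradient}_i = O\left( \frac{n}{\epsilon} \sum_{j \in N(i)} |W_{ij}| \right)$$
Hence $N = \sum_i N_i = O\left( \frac{n m W}{\epsilon} \right)$, where m is the number of edges and $W = \max |W_{ij}|$
Various tricks in paper show how to improve performance
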
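The counting argument on this slide can be sketched in a few lines of Python. This is only the "simple method" with the error budget split equally; the rounding with `ceil`, the function name `gradmesh_sizes` and the 4-cycle example are assumptions, and the tricks mentioned for the paper are not reproduced.

```python
import math


def gradmesh_sizes(n, W, eps):
    """Per-dimension mesh counts N_i of about n * D_i / eps, with D_i = sum_j |W_ij|."""
    D = [0.0] * n
    for (i, j), w in W.items():
        D[i] += abs(w)
        D[j] += abs(w)
    return [max(1, math.ceil(n * d / eps)) for d in D]


# Hypothetical 4-cycle with all edge weights equal to 2, and eps = 1.
n = 4
W = {(0, 1): 2.0, (1, 2): 2.0, (2, 3): 2.0, (0, 3): 2.0}
eps = 1.0

sizes = gradmesh_sizes(n, W, eps)
N = sum(sizes)

# Crude bound n * 2m * max|W_ij| / eps, using sum_i D_i <= 2 m max|W_ij|.
m = len(W)
W_max = max(abs(w) for w in W.values())
bound = n * 2 * m * W_max / eps
print(sizes, N, bound)
```

Halving `eps` doubles each $N_i$ in this sketch, which is the $1/\epsilon$ dependence in the $O(nmW/\epsilon)$ count.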
Comparison of methods: left $\epsilon = 1$, right $\epsilon = 0.1$; (when fixed, W = 5, n = 10)

[Figure: for each value of $\epsilon$, two panels plotting N on a log scale (up to $10^{20}$) for curvMeshOrig, curvMeshNew and gradMesh: one against n (5 to 20) and one against W (0 to 10).]

Example where LBP fails to converge, gradMesh works well
Power network of transformers
$X_i \in \{\text{stable}, \text{fail}\}$
Attractive edges between transformers
Would like to rank by marginal probability of failure $p(X_i)$

[Figure: graph of the transformer network with numbered nodes.]

Recap
The Bethe approximation is often strikingly accurate.
New results:
  Novel formulation of the Hessian of the Bethe free energy F
  Bounds on derivatives and locations of optima
  First method guaranteed to return $\epsilon$-approx global optimum $\log Z_B$, allows its accuracy to be tested rigorously
  Provides benchmark against which to judge other heuristics (LBP, HAK etc.)
  Useful in practice for small problems
  FPTAS for attractive models, was open theoretical question
  Further improvements in new work...

Understanding the Bethe approximation
Joint work with Kui Tang and David Sontag
Goal - separate and evaluate the two aspects of the Bethe approximation:
  1 Relax the marginal polytope M to the local polytope L which enforces only pairwise consistency, hence pseudo-marginals
  2 Use Bethe entropy $S_B = \sum_{i \in V} S_i + \sum_{(i,j) \in E} (S_{ij} - S_i - S_j)$
Consider marginal, cycle and local polytopes
Compare against tree-reweighted approximation (TRW)
  same polytopes
  concave upper-bounding entropy
Analytic and experimental results

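To make the Bethe entropy formula concrete, here is a small Python sketch for binary variables. The helper names and the example pseudo-marginals are illustrative assumptions. The pairwise table for an edge is written as $(\xi_{ij},\ q_i - \xi_{ij},\ q_j - \xi_{ij},\ 1 + \xi_{ij} - q_i - q_j)$.

```python
import math


def entropy(ps):
    """Shannon entropy in nats, treating 0 log 0 as 0."""
    return -sum(p * math.log(p) for p in ps if p > 0)


def bethe_entropy(q, xi, edges):
    """S_B = sum_i S_i + sum_(i,j) (S_ij - S_i - S_j) for binary variables."""
    S = sum(entropy([qi, 1 - qi]) for qi in q)
    for i, j in edges:
        x = xi[(i, j)]
        mu = [x, q[i] - x, q[j] - x, 1 + x - q[i] - q[j]]
        S += entropy(mu) - entropy([q[i], 1 - q[i]]) - entropy([q[j], 1 - q[j]])
    return S


# On a tree (here a single edge) S_B equals the entropy of the joint.
S_tree = bethe_entropy([0.3, 0.6], {(0, 1): 0.25}, [(0, 1)])
S_joint = entropy([0.25, 0.05, 0.35, 0.35])

# On K4 with all mass on the diagonal of every pairwise table
# (q_i = 1/2, xi_ij = 1/2), S_B is negative: (n - m) log 2 with n = 4, m = 6.
k4_edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
S_k4 = bethe_entropy([0.5] * 4, {e: 0.5 for e in k4_edges}, k4_edges)
print(S_tree, S_joint, S_k4)
```

The second example shows that $S_B$, unlike a true entropy, can go negative on a graph with more edges than nodes.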
Illustration of polytopes

[Figure: three polytopes. marginal polytope: global consistency; cycle polytope: cycle consistency; local polytope: local consistency.]

Questions addressed include
Does tightening the relaxation of the marginal polytope always improve the Bethe approximation for log Z?
  No (empirically usually very helpful for general models)
In attractive models, when local potentials are low and couplings high, why does the Bethe approximation perform poorly for marginals?
  Bethe entropy
In general models, for low couplings, the Bethe approximation performs much better than TRW, yet as coupling increases, this advantage disappears. How does this vary if we tighten the relaxation of the marginal polytope?
  Mixed, see Experiments

Tightening the polytope relaxation - does it always help? No
Consider symmetric nonhomogeneous cycle, vary $W_{BC}$, $\theta_A = \theta_B = \theta_C = 0$
$W_{AB} = W_{AC} = 10$, strongly attractive

[Figure: the 3-cycle on A, B, C, and a plot of log Z (6 to 16) against BC edge weight (-10 to 10) with curves true, Bethe and Bethe+cycle.]

Lemma: $\frac{\partial \log Z_B}{\partial W_{BC}} = \mu_{BC}(0, 0) + \mu_{BC}(1, 1)$, all singleton marginals $\frac{1}{2}$
For weakly attractive edge BC, cycle improves pairwise marginal (similar slopes near 0) but worsens partition function (gap between curves near 0)

Threshold result for attractive models due to $S_B$ entropy
Lemma: For a symmetric homogeneous d-regular MRF, $q = (\frac{1}{2}, \dots, \frac{1}{2})$ is a stationary point of F but not a minimum for $W > 2 \log \frac{d}{d - 2}$ (uses earlier Hessian result)
Recall $\sum_i d_i = 2m$ (handshake lemma), hence $S_B = m S_{ij} + (n - 2m) S_i$. For large W, all probability mass pulled onto main diagonal, hence $S_{ij} \approx S_i$. For m > n, to avoid negative $S_B$, each entropy term $\to 0$ by tending to pairwise marginal $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ or symmetrically $\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$.

[Figure: three panels of Bethe free energy $E - S_B$ against q (0 to 1) for $K_5$: W = 1, W = 1.38, W = 1.75.]

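For $K_5$ (d = 4) the lemma's threshold is $2 \log 2 \approx 1.386$, which matches the middle panel's W = 1.38. The Python sketch below checks the sign of the curvature of F at $q = \frac{1}{2}$ along the symmetric direction for W = 1 and W = 1.75. It is only a sketch under stated assumptions: the symmetric model is taken to be the usual form with $\theta_i = -dW/2$ and edge weight W on $x_i x_j$, $\xi$ is the assumed [WT01] quadratic root for W > 0, and the curvature is estimated by a finite difference.

```python
import math


def entropy(ps):
    """Shannon entropy in nats, treating 0 log 0 as 0."""
    return -sum(p * math.log(p) for p in ps if p > 0)


def bethe_F_symmetric(q, w, n, d):
    """Bethe free energy E - S_B with every singleton marginal equal to q.

    Assumes a homogeneous d-regular graph on n nodes, edge weight w > 0 and
    theta_i = -d * w / 2 (the symmetric parameterisation is an assumption).
    """
    m = n * d // 2
    a = math.exp(w) - 1.0
    b = 1.0 + 2.0 * a * q
    xi = (b - math.sqrt(b * b - 4.0 * a * (1.0 + a) * q * q)) / (2.0 * a)
    S_i = entropy([q, 1 - q])
    S_ij = entropy([xi, q - xi, q - xi, 1 + xi - 2 * q])
    energy = n * q * d * w / 2.0 - m * w * xi
    return energy - (m * S_ij + (n - 2 * m) * S_i)


def curvature_at_half(w, n, d, h=1e-3):
    """Central second difference of F along q = (t, ..., t) at t = 1/2."""
    f = lambda t: bethe_F_symmetric(t, w, n, d)
    return (f(0.5 + h) - 2.0 * f(0.5) + f(0.5 - h)) / (h * h)


def threshold(d):
    """Critical coupling from the lemma, 2 log(d / (d - 2))."""
    return 2.0 * math.log(d / (d - 2.0))


n, d = 5, 4  # K5
print(threshold(d))
print(curvature_at_half(1.0, n, d))   # below threshold: expected positive
print(curvature_at_half(1.75, n, d))  # above threshold: expected negative
```

A negative curvature along this direction is enough to show that $q = \frac{1}{2}$ is not a minimum; a positive one only rules out this particular direction.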
Also a polytope effect for frustrated cycles
A frustrated cycle has an odd number of repulsive edges; this pulls singleton marginals the other way, toward $\frac{1}{2}$
Seen Bethe entropy effect for attractive cycles
Also a polytope effect for frustrated cycles
Recall optimum energy on local polytope for a symmetric frustrated cycle is at $(\frac{1}{2}, \dots, \frac{1}{2})$

[Figure: two panels, $C_5$ topology, $\theta_i \sim [0, T_{max}]$, all edges W. Each plots avg singleton marginal (0.5 to 1) against edge weight W (-10 to 10) with curves true, Bethe and Bethe+cycle.]

Experiments: General models (attractive and repulsive edges)
$\theta_i \sim [-2, 2]$, $K_{10}$ topology

[Figure: four panels, each against maximum coupling strength y (2 to 32):
  log partition error (Bethe+local, Bethe+cycle, Bethe+marg, TRW+local, TRW+cycle, TRW+marg)
  log partition error, local removed (Bethe+cycle, Bethe+marg, TRW+cycle, TRW+marg)
  Singleton marginals, average $\ell_1$ error
  Pairwise marginals, average $\ell_1$ error]

Conclusions for general models
Big gains from cycle polytope (suggest Frank-Wolfe)
Not much additional gain from marginal polytope (computationally harder)
Bethe performs remarkably well
  Better than TRW for log Z, pairwise marginals
  Less clear on singleton marginals: TRW better for very strong coupling
Still much to learn about why Bethe performs so well...

Summary
The Bethe approximation is remarkably effective for approximate inference
Novel results on Hessian of Bethe free energy
First algorithm for $\epsilon$-approx of global optimum $\log Z_B$, FPTAS for attractive models
Contributions to understanding the Bethe approximation (polytope and entropy)
Where feasible, tightening to the cycle polytope can be very helpful
Additional results in new work (e.g. clamping)...
Thank you!

Attractive example: max score and value, with arg max

[Figure: Opt Score(C) and Value(-F), i=3/4. y-axis: Score/Value (-1 to 1); x-axis: $q_i$ (0 to 1). Marked: argmax, Singleton Values.]

References
F. Korč, V. Kolmogorov, and C. Lampert. Approximating marginals using discrete energy minimization. Technical report, IST Austria, 2012.
J. Mooij and H. Kappen. Sufficient conditions for convergence of the sum-product algorithm. IEEE Transactions on Information Theory, 2007.
D. Schlesinger and B. Flach. Transforming an arbitrary minsum problem into a binary one. Technical report, Dresden University of Tech, 2006.
A. Weller and T. Jebara. Approximating the Bethe partition function. In UAI, 2014.
A. Weller, K. Tang, D. Sontag, and T. Jebara. Understanding the Bethe approximation: When and how can it go wrong? In UAI, 2014.
A. Weller and T. Jebara. Bethe bounds and approximating the global optimum. In AISTATS, 2013.
M. Welling and Y. Teh. Belief optimization for binary networks: A stable alternative to loopy belief propagation. In UAI, 2001.
J. Yedidia, W. Freeman, and Y. Weiss. Understanding belief propagation and its generalizations. In IJCAI, Distinguished Lecture Track, 2001.

Extra Slides with Supplementary Material
Supplementary Material
(if time or questions)