
On the Bethe approximation
Adrian Weller
Department of Statistics at Oxford University
September 12, 2014
Joint work with Tony Jebara

1 / 46


On the Bethe approximation - Columbia University


Bethe free energy landscape (stylized)

Red dot shows the global optimum, we might return the green dot

22 / 46

Curvature: all terms of the Hessian $H_{ij} = \frac{\partial^2 F}{\partial q_i \partial q_j}$

$$H_{ii} = -\frac{d_i - 1}{q_i (1 - q_i)} + \sum_{j \in N(i)} \frac{q_j (1 - q_j)}{T_{ij}} \ge \frac{1}{q_i (1 - q_i)},$$

$$H_{ij} = \begin{cases} \dfrac{q_i q_j - \xi_{ij}}{T_{ij}} & (i, j) \in E \\ 0 & (i, j) \notin E,\ i \ne j. \end{cases}$$

where $d_i$ is the degree of $X_i$ in the model, and
$T_{ij} = q_i q_j (1 - q_i)(1 - q_j) - (\xi_{ij} - q_i q_j)^2 \ge 0$, equality iff $q_i$ or $q_j \in \{0, 1\}$

Leads to bound on max second derivative in any direction (curvMesh)

$q_i q_j - \xi_{ij}$ term is negative for an attractive edge, hence obtain the submodularity result

23 / 46
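The Hessian entries on this slide can be evaluated directly once the pseudo-marginals are known. The following is a minimal sketch, not the authors' implementation: the function name `bethe_hessian`, the dictionary layout for $\xi_{ij}$, and the triangle example values are all illustrative assumptions. It takes $\xi_{ij}$ as an input rather than computing it from edge weights.

```python
import numpy as np

def bethe_hessian(q, xi, edges):
    """Hessian of the Bethe free energy, using the slide's closed forms.

    q     : singleton pseudo-marginals q_i, each strictly inside (0, 1)
    xi    : dict mapping each edge (i, j) to xi_ij (assumed layout)
    edges : list of undirected edges (i, j), each listed once
    """
    q = np.asarray(q, dtype=float)
    n = len(q)
    deg = np.zeros(n)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1

    H = np.zeros((n, n))
    # First term of H_ii: -(d_i - 1) / (q_i (1 - q_i))
    H[np.diag_indices(n)] = -(deg - 1.0) / (q * (1.0 - q))
    for i, j in edges:
        x = xi[(i, j)]
        T = q[i] * q[j] * (1 - q[i]) * (1 - q[j]) - (x - q[i] * q[j]) ** 2
        # Second term of H_ii: sum over neighbours of q_j (1 - q_j) / T_ij
        H[i, i] += q[j] * (1 - q[j]) / T
        H[j, j] += q[i] * (1 - q[i]) / T
        # Off-diagonal entry for an edge: (q_i q_j - xi_ij) / T_ij
        H[i, j] = H[j, i] = (q[i] * q[j] - x) / T
    return H

# Hypothetical attractive triangle: every xi_ij exceeds q_i q_j.
q = [0.3, 0.5, 0.6]
edges = [(0, 1), (1, 2), (0, 2)]
xi = {(0, 1): 0.20, (1, 2): 0.35, (0, 2): 0.22}
H = bethe_hessian(q, xi, edges)
```

In this example every off-diagonal entry is negative, as the slide states for attractive edges, and each diagonal entry respects the lower bound $1 / (q_i (1 - q_i))$.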

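gradMesh: analyze first derivatives of F

$$\frac{\partial F}{\partial q_i} = -\theta_i + \log \left[ \frac{(1 - q_i)^{d_i - 1}}{q_i^{d_i - 1}} \, \frac{\prod_{j \in N(i)} (q_i - \xi_{ij})}{\prod_{j \in N(i)} (1 + \xi_{ij} - q_i - q_j)} \right] \quad \text{[WT01]}$$

Theorem (WJ14)

$$-\theta_i + \log \frac{q_i}{1 - q_i} - W_i \le \frac{\partial F}{\partial q_i} \le -\theta_i + \log \frac{q_i}{1 - q_i} + V_i$$

Upper and lower bounds are separated by a constant, and both are monotonically increasing with $q_i$

Within our search space, allows us to bound

$$\left| \frac{\partial F}{\partial q_i} \right| \le D_i := V_i + W_i = \sum_{j \in N(i)} |W_{ij}|$$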

24 / 46
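To see how the two bounds in the theorem lead to the constant $D_i$, the sketch below evaluates them numerically. It is an illustration under stated assumptions: $\theta_i$, $V_i$ and $W_i$ are treated as given numbers (the values are the example parameters from the figure on the following slide), the helper names are hypothetical, and the search interval is taken to run between the zero crossings of the two bounds. The tighter Bethe box using $L_i$ and $U_i$ is not modelled.

```python
import math

def logit(q):
    return math.log(q / (1.0 - q))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def grad_lower(q, theta, W):
    """Lower bound on dF/dq_i from the theorem."""
    return -theta + logit(q) - W

def grad_upper(q, theta, V):
    """Upper bound on dF/dq_i from the theorem."""
    return -theta + logit(q) + V

theta_i, V_i, W_i = 1.0, 2.0, 3.0

# A stationary point needs lower <= 0 <= upper, so it must lie between
# the zero of the upper bound and the zero of the lower bound.
q_lo = sigmoid(theta_i - V_i)   # upper bound is 0 here
q_hi = sigmoid(theta_i + W_i)   # lower bound is 0 here

D_i = V_i + W_i
# Both bounds increase with q, so the extreme values sit at the ends.
most_negative = grad_lower(q_lo, theta_i, W_i)   # equals -D_i
most_positive = grad_upper(q_hi, theta_i, V_i)   # equals +D_i
```

Because the bounds differ by the constant $V_i + W_i$ and are monotone in $q_i$, the derivative stays within $[-D_i, D_i]$ on this interval.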

gradMesh: search over purple region

[Figure: Upper and Lower Bounds for $\partial F / \partial q_i$. Horizontal axis: pseudo-marginal $q_i$ (0 to 1). Vertical axis: partial derivative (-15 to 15). Curves $f^U_i$ and $f^L_i$, with the points $q_i$ s.t. $f^U_i(q_i) = 0$ and $q_i$ s.t. $f^L_i(q_i) = 0$ marked. Shaded area shows where partial derivative can be 0. Region of Bethe box $[A_i, 1 - B_i]$ is marked, with annotation $D_i = V_i + W_i - \log L_i - \log U_i$. Parameters used in this example: $\theta_i = 1$, $V_i = 2$, $W_i = 3$, $L_i = 1.8$, $U_i = 2.9$.]

25 / 46

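gradMesh: complexity

In search space, $\left| \frac{\partial F}{\partial q_i} \right| \le D_i := V_i + W_i = \sum_{j \in N(i)} |W_{ij}|$

We can apportion error $\epsilon$ among n variables

Simple method: each gets $\epsilon / n$
Need $\text{gradient}_i \cdot \text{step}_i \approx \epsilon / n$.
Hence number of mesh points in dimension i,

$$N_i \approx \frac{1}{\text{step}_i} \approx \frac{n}{\epsilon} \, \text{gradient}_i = O\left( \frac{n}{\epsilon} \sum_{j \in N(i)} |W_{ij}| \right)$$

Hence $N = \sum_i N_i = O\left( \frac{n m W}{\epsilon} \right)$
Various tricks in paper show how to improve performance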

26 / 46
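The counting argument on this slide can be sketched in a few lines. This is a rough illustration, not the paper's algorithm: it assumes each $q_i$ ranges over the whole unit interval, drops constant factors, takes $W$ to mean the largest $|W_{ij}|$, and uses a hypothetical function name and a made-up three-node example.

```python
import math

def grad_mesh_sizes(n, edge_weights, eps):
    """Mesh points per dimension when each variable gets error eps / n.

    edge_weights : dict mapping each undirected edge (i, j) to W_ij
    Returns (D, N_i, N) with D_i = sum_j |W_ij| and N = sum_i N_i.
    """
    D = [0.0] * n
    for (i, j), w in edge_weights.items():
        D[i] += abs(w)
        D[j] += abs(w)
    # gradient_i * step_i ~ eps / n  =>  N_i ~ 1 / step_i ~ (n / eps) * D_i
    N_i = [max(1, math.ceil(n * d / eps)) for d in D]
    return D, N_i, sum(N_i)

# Hypothetical triangle with mixed edge signs.
weights = {(0, 1): 1.0, (1, 2): -2.0, (0, 2): 3.0}
D, N_i, N = grad_mesh_sizes(3, weights, eps=0.5)

# Each edge is counted at both endpoints, so sum_i D_i <= 2 m W
# and therefore N is O(n m W / eps).
m = len(weights)
W_max = max(abs(w) for w in weights.values())
bound = 2 * 3 * m * W_max / 0.5
```

Summing $D_i$ over all variables counts every edge twice, which is where the factor $m W$ in the final bound comes from.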

Comparison of methods: left $\epsilon = 1$, right $\epsilon = 0.1$; (when fixed, W = 5, n = 10)

[Figure: four log-scale panels showing N (from $10^0$ to $10^{20}$) for curvMeshOrig, curvMeshNew and gradMesh. For each value of $\epsilon$, one panel plots N against n (5 to 20) and one plots N against W (0 to 10).]

27 / 46

Example where LBP fails to converge, gradMesh works well

Power network of transformers
$X_i \in \{\text{stable}, \text{fail}\}$
Attractive edges between transformers
Would like to rank by marginal probability of failure $p(X_i)$

[Figure: network graph of numbered transformer nodes.]

28 / 46

Recap

The Bethe approximation is often strikingly accurate.
New results:

  Novel formulation of the Hessian of the Bethe free energy F
  Bounds on derivatives and locations of optima
  First method guaranteed to return $\epsilon$-approx global optimum $\log Z_B$, allows its accuracy to be tested rigorously
  Provides benchmark against which to judge other heuristics (LBP, HAK etc.)
  Useful in practice for small problems
  FPTAS for attractive models, was open theoretical question
  Further improvements in new work...

29 / 46

Understanding the Bethe approximation

Joint work with Kui Tang and David Sontag
Goal - separate and evaluate the two aspects of the Bethe approximation:

  1 Relax the marginal polytope M to the local polytope L which enforces only pairwise consistency, hence pseudo-marginals
  2 Use Bethe entropy $S_B = \sum_{i \in V} S_i + \sum_{(i,j) \in E} (S_{ij} - S_i - S_j)$

Consider marginal, cycle and local polytopes
Compare against tree-reweighted approximation (TRW)

  same polytopes
  concave upper-bounding entropy

Analytic and experimental results

30 / 46
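The Bethe entropy formula on this slide is straightforward to compute from pseudo-marginals. The sketch below is illustrative only; the function names and the data layout (a list of singleton distributions and a dictionary of pairwise tables) are assumptions, not part of the talk.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a distribution given as an array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                      # treat 0 log 0 as 0
    return float(-(p * np.log(p)).sum())

def bethe_entropy(singletons, pairwise):
    """S_B = sum_i S_i + sum_{(i,j) in E} (S_ij - S_i - S_j).

    singletons : list of 1-D arrays, one distribution per variable
    pairwise   : dict mapping each edge (i, j) to its joint table
    """
    S = sum(entropy(p) for p in singletons)
    for (i, j), mu in pairwise.items():
        S += entropy(mu) - entropy(singletons[i]) - entropy(singletons[j])
    return S

# On a tree the Bethe entropy is exact. For a single edge it reduces
# to the entropy of the joint table.
mu_01 = np.array([[0.4, 0.1],
                  [0.2, 0.3]])
singles = [mu_01.sum(axis=1), mu_01.sum(axis=0)]
S_B = bethe_entropy(singles, {(0, 1): mu_01})
```

On graphs with cycles the same function still runs, but the result is only an approximation to the true entropy and can even be negative.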

Illustration of polytopes

[Figure: three nested polytopes.]
marginal polytope: global consistency
cycle polytope: cycle consistency
local polytope: local consistency

31 / 46

Questions addressed include

Does tightening the relaxation of the marginal polytope always improve the Bethe approximation for log Z?

  No (empirically usually very helpful for general models)

In attractive models, when local potentials are low and couplings high, why does the Bethe approximation perform poorly for marginals?

  Bethe entropy

In general models, for low couplings, the Bethe approximation performs much better than TRW, yet as coupling increases, this advantage disappears. How does this vary if we tighten the relaxation of the marginal polytope?

  Mixed, see Experiments

32 / 46

Tightening the polytope relaxation - does it always help?

No
Consider symmetric nonhomogeneous cycle, vary $W_{BC}$, $\theta_A = \theta_B = \theta_C = 0$
$W_{AB} = W_{AC} = 10$, strongly attractive

[Figure: triangle graph on nodes A, B, C, and a plot of log Z against BC edge weight (-10 to 10) with curves: true, Bethe, Bethe+cycle.]

Lemma: $\frac{\partial \log Z_B}{\partial W_{BC}} = \mu_{BC}(0, 0) + \mu_{BC}(1, 1)$, all singleton marginals $\frac{1}{2}$

For weakly attractive edge BC, cycle improves pairwise marginal (similar slopes near 0) but worsens partition function (gap between curves near 0)

33 / 46

Threshold result for attractive models due to $S_B$ entropy

Lemma: For a symmetric homogeneous d-regular MRF, $q = (\frac{1}{2}, \ldots, \frac{1}{2})$ is a stationary point of F but not a minimum for $W > 2 \log \frac{d}{d - 2}$ (uses earlier Hessian result)

Recall $\sum_i d_i = 2m$ (handshake lemma), hence $S_B = m S_{ij} + (n - 2m) S_i$. For large W, all probability mass pulled onto main diagonal, hence $S_{ij} \approx S_i$. For m > n, to avoid negative $S_B$, each entropy term $\to 0$ by tending to pairwise marginal $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ or symmetrically $\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$.

[Figure: three panels of Bethe free energy $E - S_B$ against q (0 to 1) for $K_5$: W = 1, W = 1.38, W = 1.75.]

34 / 46
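The threshold in the lemma and the entropy identity from the handshake lemma are both easy to check numerically. The sketch below is a small illustration with hypothetical function names. It assumes $K_5$ is treated as a 4-regular graph with n = 5 and m = 10, and it evaluates the entropy in the idealised large-W limit where each pairwise table puts half its mass on each diagonal entry, so $S_{ij} = S_i = \log 2$.

```python
import math

def symmetric_threshold(d):
    """Edge weight above which q = (1/2, ..., 1/2) stops being a minimum
    of F in a symmetric homogeneous d-regular model: 2 log(d / (d - 2))."""
    if d <= 2:
        raise ValueError("the formula needs degree d > 2")
    return 2.0 * math.log(d / (d - 2.0))

def symmetric_bethe_entropy(n, m, S_ij, S_i):
    """S_B = m S_ij + (n - 2m) S_i, using sum_i d_i = 2m."""
    return m * S_ij + (n - 2 * m) * S_i

# K_5: every node has degree 4, n = 5 variables, m = 10 edges.
w_star_k5 = symmetric_threshold(4)

# Large-W limit at q = 1/2: mass sits on the main diagonal, S_ij ~ S_i.
S_limit = symmetric_bethe_entropy(5, 10, math.log(2), math.log(2))
```

For $K_5$ the threshold evaluates to $2 \log 2 \approx 1.386$, which appears to be why the middle panel of the figure uses W = 1.38. The limiting entropy comes out negative because m > n, matching the slide's explanation of why the optimum moves away from $q = \frac{1}{2}$.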

Also a polytope effect for frustrated cycles

A frustrated cycle has an odd number of repulsive edges, this pulls singleton marginals the other way, toward $\frac{1}{2}$

Seen Bethe entropy effect for attractive cycles
Also a polytope effect for frustrated cycles
Recall optimum energy on local polytope for a symmetric frustrated cycle is at $(\frac{1}{2}, \ldots, \frac{1}{2})$

$C_5$ topology, $\theta_i \sim [0, T_{max}]$, all edges W

[Figure: two panels of avg singleton marginal (0.5 to 1) against edge weight W (-10 to 10) with curves: true, Bethe, Bethe+cycle.]

35 / 46

Experiments: General models (attractive and repulsive edges)
$\theta_i \sim [-2, 2]$, $K_{10}$ topology

[Figure: four panels against maximum coupling strength y (2 to 32).
  log partition error: Bethe+local, Bethe+cycle, Bethe+marg, TRW+local, TRW+cycle, TRW+marg
  log partition error, local removed: Bethe+cycle, Bethe+marg, TRW+cycle, TRW+marg
  Singleton marginals, average $\ell_1$ error
  Pairwise marginals, average $\ell_1$ error]

36 / 46

Conclusions for general models

Big gains from cycle polytope (suggest Frank-Wolfe)
Not much additional gain from marginal polytope (computationally harder)
Bethe performs remarkably well

  Better than TRW for log Z, pairwise marginals
  Less clear on singleton marginals: TRW better for very strong coupling

Still much to learn about why Bethe performs so well...

37 / 46

Summary

The Bethe approximation is remarkably effective for approximate inference
Novel results on Hessian of Bethe free energy
First algorithm for $\epsilon$-approx of global optimum $\log Z_B$, FPTAS for attractive models
Contributions to understanding the Bethe approximation (polytope and entropy)
Where feasible, tightening to the cycle polytope can be very helpful
Additional results in new work (e.g. clamping)...

Thank you!

38 / 46

Attractive example: max score and value, with arg max

[Figure: two panels against $q_i$ (0 to 1). First panel: Opt Score(C) and Value(-F), i=3/4, vertical axis Score/Value. Second panel: argmax Singleton Values.]

39 / 46

References

F. Korč, V. Kolmogorov, and C. Lampert. Approximating marginals using discrete energy minimization. Technical report, IST Austria, 2012.
J. Mooij and H. Kappen. Sufficient conditions for convergence of the sum-product algorithm. IEEE Transactions on Information Theory, 2007.
D. Schlesinger and B. Flach. Transforming an arbitrary minsum problem into a binary one. Technical report, Dresden University of Technology, 2006.
A. Weller and T. Jebara. Approximating the Bethe partition function. In UAI, 2014.
A. Weller, K. Tang, D. Sontag, and T. Jebara. Understanding the Bethe approximation: When and how can it go wrong? In UAI, 2014.
A. Weller and T. Jebara. Bethe bounds and approximating the global optimum. In AISTATS, 2013.
M. Welling and Y. Teh. Belief optimization for binary networks: A stable alternative to loopy belief propagation. In UAI, 2001.
J. Yedidia, W. Freeman, and Y. Weiss. Understanding belief propagation and its generalizations. In IJCAI, Distinguished Lecture Track, 2001.

40 / 46

Extra Slides with Supplementary Material

Supplementary Material
(if time or questions)

41 / 46

