The words you are searching are inside this book. To get more targeted content, please make full-text search by clicking here.

Collaborative Filtering: Models and Algorithms Andrea Montanari Jose Bento, ashY Deshpande, Adel Jaanmard,v Raghunandan Keshaan,v Sewoong Oh, Stratis Ioannidis, Nadia ...

Discover the best professional documents and content resources in AnyFlip Document Base.
Search
Published by , 2016-02-01 21:45:03

Collaborative Filtering: Models and Algorithms

Collaborative Filtering: Models and Algorithms Andrea Montanari Jose Bento, ashY Deshpande, Adel Jaanmard,v Raghunandan Keshaan,v Sewoong Oh, Stratis Ioannidis, Nadia ...

Collaborative Filtering: Models and Algorithms

Andrea Montanari

Jose Bento, Yash Deshpande, Adel Javanmard, Raghunandan Keshavan, Sewoong Oh,
Stratis Ioannidis, Nadia Fawaz, Amy Zhang
Stanford University, Technicolor

September 15, 2012

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 1 / 58

Problem statement

Given data on the activity of a set of users, provide personalized
recommendations to users X, Y, Z,. . .

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 2 / 58

Example

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 3 / 58

An obviously useful technology

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 4 / 58

An obviously useful technology

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 5 / 58

Outline

1 A model
2 Algorithms and accuracy

#3 Challenge 1: Privacy
4 Challenge #2: Interactivity

5 Conclusion

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 6 / 58

A model

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 7 / 58

Setting

Users : i P f1; 2; : : : ; mg
Movies : j P f1; 2; : : : ; ng

When user i watches movie j , she enters her rating Rij .
Want to predict ratings for missing pairs.

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 8 / 58

Setting

Users : i P f1; 2; : : : ; mg
Movies : j P f1; 2; : : : ; ng

When user i watches movie j , she enters her rating Rij .
Want to predict ratings for missing pairs.

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 8 / 58

Linear regression model

Movie j

=vj  genre; main actor; supporting actor; year; . . . ¡ P Rr

Rij $ ci + hui ; vj i + "ij

Want: User parameters vector ui

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 9 / 58

Linear regression model

Movie j

=vj  genre; main actor; supporting actor; year; . . . ¡ P Rr

Rij $ ci + hui ; vj i + "ij

Want: User parameters vector ui

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 9 / 58

Linear regression model

Movie j

=vj  genre; main actor; supporting actor; year; . . . ¡ P Rr

Rij $ ci + hui ; vj i + "ij

Want: User parameters vector ui

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 9 / 58

Linear regression model

Movie j

=vj  genre; main actor; supporting actor; year; . . . ¡ P Rr

Rij +$ ci hui ; vj i + "ij

Want: User parameters vector ui

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 10 / 58

Least squares

VW
`ˆ 2a

xi ; vj Y
=   h iRarg mini x ijXu
j iRi P r
PWatchedBy( )

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 11 / 58

Ridge regression

VW
`ˆ a
2

xi ; vj
=   h i + k kRarg mini x ijXu xi 2
j iRi P r
PWatchedBy( ) Y

How do we construct the vj 's?
Ad hoc denitions are not suited to recommendation!

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 12 / 58

Ridge regression

VW
`ˆ a
2

xi ; vj
=   h i + k kRarg mini x ijXu xi 2
j iRi P r
PWatchedBy( ) Y

How do we construct the vj 's?
Ad hoc denitions are not suited to recommendation!

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 12 / 58

If I knew the users' feature vectors

Rij $ hui ; vj i + "ij

VW
`ˆ a
2

ui ; yj
=   h i + k kRvj
arg miny i jXRj P r ij yj 2

PWatched( ) Y

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 13 / 58

If I knew the users' feature vectors

Rij $ hui ; vj i + "ij

VW
`ˆ a
2

ui ; yj
=   h i + k kRvj
arg miny i jXRj P r ij yj 2

PWatched( ) Y

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 13 / 58

Everything together

VW
2 a
=   h i + k kR` ˆ
arg xmin j iR Xi P r
ui ij xi ; vj xi 2

PWatchedBy( ) Y

VW
2 a
=   h i + k kR` ˆ
vj arg miny i jXRj P r ij ui ; yj yj 2

PWatched( ) Y

=(Minimize E Watched)

2 ˆm ˆn

xi ; yj
ˆ

  h i k k k kRij
( ) = + +F X ; Y ix 2 jy 2
2 2
i j=1 =1
(i;j E)P

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 14 / 58

Everything together

VW
2 a
=   h i + k kR` ˆ
arg xmin j iR Xi P r
ui ij xi ; vj xi 2

PWatchedBy( ) Y

VW
2 a
=   h i + k kR` ˆ
vj arg miny i jXRj P r ij ui ; yj yj 2

PWatched( ) Y

=(Minimize E Watched)

2 ˆm ˆn

xi ; yj
ˆ

  h i k k k kRij
( ) = + +F X ; Y ix 2 jy 2
2 2
i j=1 =1
(i;j E)P

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 14 / 58

Objective function

=(Minimize E Watched)

2 ˆm ˆn

xi ; yj
ˆ

  h i k k k kRij
( ) = + +F X ; Y ix 2 jy 2
2 2
i j=1 =1
(i;j E)P

k€E (R   XY T)kF2 + kX k2F + kY k2F

@ if (i; j ) P E ;

€E(A) = Aij otherwise
0

= ¡ ¡ ¡X Th x2 xm i

x1

= ¡ ¡ ¡Y Th y2 yn i

y1

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 15 / 58

Objective function

=(Minimize E Watched)

2 ˆm ˆn

xi ; yj
ˆ

  h i k k k kRij
( ) = + +F X ; Y ix 2 jy 2
2 2
i j=1 =1
(i;j E)P

k€E (R   XY T)kF2 + kX k2F + kY k2F

@ if (i; j ) P E ;

€E(A) = Aij otherwise
0

= ¡ ¡ ¡X Th x2 xm i

x1

= ¡ ¡ ¡Y Th y2 yn i

y1

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 15 / 58

Low rank structure

I VT r

=m Low-rank M U

nr

1. Low-rank matrix M

2. R = M + Z

3. Observed subset E

@ Mij + Zij if (i; j ) P E ;
0
€E (R)ij = otherwise.

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 16 / 58

Low rank structure

I VT r

=m Low-rank M U

nr

1. Low-rank matrix M

2. R = M + Z

3. Observed subset E

@ Mij + Zij if (i; j ) P E ;
0
€E (R)ij = otherwise.

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 16 / 58

Low rank structure

I VT r

=m Low-rank M U

nr

1. Low-rank matrix M

2. R = M + Z

3. Observed subset E

@ Mij + Zij if (i; j ) P E ;
0
€E (R)ij = otherwise.

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 16 / 58

Low rank structure

m Sample €E (R)

n

1. Low-rank matrix M

2. R = M + Z

3. Observed subset E

@ Mij + Zij if (i; j ) P E ;
0
€E (R)ij = otherwise.

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 16 / 58

Algorithms and accuracy

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 17 / 58

Questions

( )How do we minimize F X ; Y ? (RMSE = kM   M^ kF =pmn)

What prediction accuracy?

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 18 / 58

How do we minimize F (X ; Y )?

Spectral methods. [Srebro, Rennie, Jaakkola, 2003]
Gradient method. [Fazel, Hindi, Boyd, 2001]
Convex relaxations.

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 19 / 58

Spectral method (for simplicity = 0)

Replace

( )F X ; Y = k€E (R   XY T)k2F
= k€E (XY T)kF2   2h€E (R); XY Ti + const:

=(with p j jE =mn fraction of observed entries)

( )Fe X ; Y = p kXY kT 2F   2h€E (R); XY Ti + const
=   € ( )p XY T E R F2
1

p

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 20 / 58

Spectral method (for simplicity = 0)

Replace

( )F X ; Y = k€E (R   XY T)k2F
= k€E (XY T)kF2   2h€E (R); XY Ti + const:

=(with p j jE =mn fraction of observed entries)

( )Fe X ; Y = p kXY kT 2F   2h€E (R); XY Ti + const
=   € ( )p XY T E R F2
1

p

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 20 / 58

Spectral method (for simplicity = 0)

Replace

( )F X ; Y = k€E (R   XY T)k2F
= k€E (XY T)kF2   2h€E (R); XY Ti + const:

=(with p j jE =mn fraction of observed entries)

( )Fe X ; Y = p kXY kT 2F   2h€E (R); XY Ti + const
=   € ( )p XY T E R F2
1

p

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 20 / 58

Spectral method (for simplicity = 0)

Minimize

XY T 1 E R 2F
( ) =   € ( )Fe X ; Y
p

Solved by SVD

1
Xm ¢r

p
( ) = ^ = ( )€E R
AXSY T M S r¢r YnT¢r

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 21 / 58

Spectral method (for simplicity = 0)

Minimize

XY T 1 E R 2F
( ) =   € ( )Fe X ; Y
p

Solved by SVD

1
Xm ¢r

p
( ) = ^ = ( )€E R
AXSY T M S r¢r YnT¢r

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 21 / 58

Random matrix r = 4, m = n = 10000, p = 0:0012

40

30

20

10 4 3 21

0
0 10 20 30 40 50 60 70 80

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 22 / 58

Netix data (trimmed)

140 20 40 60 80 100 120 140
120
100

80
60
40
20

0
0

Andrea Montanari (Stanford) RMSE % 0:99 September 15, 2012 23 / 58

Collaborative Filtering

Accuracy guarantees

Theorem (Keshavan, M, Oh, 2009)

jM j MQssume ij max. Then, w.h.p., rank-r projection achieves

q

j j + k k p j jM =C max nr E
RMSE ZC H E r= E :
2n

( + ) j jp

E.g. Gaussian noise: C HH 1 z r n= E

[Improves over Achlioptas-McSherry 2003]

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 24 / 58

Gradient descent

2 ˆm ˆn

xi ; yj
ˆ

  h i k k k kRij
( ) = + +F X ; Y ix 2 jy 2
2 2
i j=1 =1
(i;j E)P

=(
Update rule: `learning rate')

xi 2 (1   )
xi +
ˆ   hxi ; i

j iPWatchedBy( ) Rij yj yj

yj 2 (1   )
yj +
ˆ   hxi ; i

i jPWatched( ) Rij yj xi

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 25 / 58

Gradient descent

2 ˆm ˆn

xi ; yj
ˆ

  h i k k k kRij
( ) = + +F X ; Y ix 2 jy 2
2 2
i j=1 =1
(i;j E)P

=(
Update rule: `learning rate')

xi 2 (1   )
xi +
ˆ   hxi ; i

j iPWatchedBy( ) Rij yj yj

yj 2 (1   )
yj +
ˆ   hxi ; i

i jPWatched( ) Rij yj xi

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 25 / 58

Interpretation

( ) +2   ˆ
xi 1
xi
j iPWatchedBy( ) wij yj

( ) +2   ˆ
yj 1
yj
i jPWatched( ) wij xi

user 2 avg of movies she liked
movie 2 avg of users that liked it

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 26 / 58

Interpretation

( ) +2   ˆ
xi 1
xi
j iPWatchedBy( ) wij yj

( ) +2   ˆ
yj 1
yj
i jPWatched( ) wij xi

user 2 avg of movies she liked
movie 2 avg of users that liked it

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 26 / 58

A variant (stochastic gradient)

( ) PPick i ; j E:

xi 2 (1   )
xi +
wij yj
yj 2 (1   )
yj +
wij xi

[Srebro, Rennie, Jaakkola, 2003]
[Srebro, Jaakkola, 2005]

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 27 / 58

A variant (stochastic gradient)

( ) PPick i ; j E:

xi 2 (1   )
xi +
wij yj
yj 2 (1   )
yj +
wij xi

[Srebro, Rennie, Jaakkola, 2003]
[Srebro, Jaakkola, 2005]

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 27 / 58

In the words of SimonFunk

[Neix challenge, 2006-2009]

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 28 / 58

And his results

Target RMSE 0:8564

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 29 / 58

Three variants

1.1 Message Passing
1.05 Alt. Min.
OptSpace
1
RMSE 0.95

0.9

0.85

0.8

0.75

0.7
0 5 10 15 20 25 30 35

iterations

OptSpace $ Gradient descent on Grassmannian [Keshavan-M.-Oh 2009]
Alternating Least Squares [Koren-Bell 2008]
Message Passing
[Keshavan-M. 2011]

Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 30 / 58

Accuracy guarantees: Vignette (m = n = 2000 rank-8)

low-rank matrix M sampled matrix ME

OptSpace output M^ squared error (M   M^ )2

Andrea Montanari (Stanford) 0:25% sampled September 15, 2012 31 / 58

Collaborative Filtering

Accuracy guarantees: Vignette (m = n = 2000 rank-8)

low-rank matrix M sampled matrix ME

OptSpace output M^ squared error (M   M^ )2

Andrea Montanari (Stanford) 0:50% sampled September 15, 2012 31 / 58

Collaborative Filtering

Accuracy guarantees: Vignette (m = n = 2000 rank-8)

low-rank matrix M sampled matrix ME

OptSpace output M^ squared error (M   M^ )2

Andrea Montanari (Stanford) 0:75% sampled September 15, 2012 31 / 58

Collaborative Filtering

Accuracy guarantees: Vignette (m = n = 2000 rank-8)

low-rank matrix M sampled matrix ME

OptSpace output M^ squared error (M   M^ )2

Andrea Montanari (Stanford) 1:00% sampled September 15, 2012 31 / 58

Collaborative Filtering


Click to View FlipBook Version