Collaborative Filtering: Models and Algorithms
Andrea Montanari
Jose Bento, Yash Deshpande, Adel Javanmard, Raghunandan Keshavan, Sewoong Oh,
Stratis Ioannidis, Nadia Fawaz, Amy Zhang
Stanford University, Technicolor
September 15, 2012
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 1 / 58
Problem statement
Given data on the activity of a set of users, provide personalized
recommendations to users X, Y, Z,. . .
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 2 / 58
Example
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 3 / 58
An obviously useful technology
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 4 / 58
An obviously useful technology
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 5 / 58
Outline
1 A model
2 Algorithms and accuracy
#3 Challenge 1: Privacy
4 Challenge #2: Interactivity
5 Conclusion
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 6 / 58
A model
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 7 / 58
Setting
Users : i P f1; 2; : : : ; mg
Movies : j P f1; 2; : : : ; ng
When user i watches movie j , she enters her rating Rij .
Want to predict ratings for missing pairs.
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 8 / 58
Setting
Users : i P f1; 2; : : : ; mg
Movies : j P f1; 2; : : : ; ng
When user i watches movie j , she enters her rating Rij .
Want to predict ratings for missing pairs.
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 8 / 58
Linear regression model
Movie j
=vj genre; main actor; supporting actor; year; . . . ¡ P Rr
Rij $ ci + hui ; vj i + "ij
Want: User parameters vector ui
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 9 / 58
Linear regression model
Movie j
=vj genre; main actor; supporting actor; year; . . . ¡ P Rr
Rij $ ci + hui ; vj i + "ij
Want: User parameters vector ui
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 9 / 58
Linear regression model
Movie j
=vj genre; main actor; supporting actor; year; . . . ¡ P Rr
Rij $ ci + hui ; vj i + "ij
Want: User parameters vector ui
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 9 / 58
Linear regression model
Movie j
=vj genre; main actor; supporting actor; year; . . . ¡ P Rr
Rij +$ ci hui ; vj i + "ij
Want: User parameters vector ui
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 10 / 58
Least squares
VW
` 2a
xi ; vj Y
= h iRarg mini x ijXu
j iRi P r
PWatchedBy( )
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 11 / 58
Ridge regression
VW
` a
2
xi ; vj
= h i + k kRarg mini x ijXu xi 2
j iRi P r
PWatchedBy( ) Y
How do we construct the vj 's?
Ad hoc denitions are not suited to recommendation!
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 12 / 58
Ridge regression
VW
` a
2
xi ; vj
= h i + k kRarg mini x ijXu xi 2
j iRi P r
PWatchedBy( ) Y
How do we construct the vj 's?
Ad hoc denitions are not suited to recommendation!
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 12 / 58
If I knew the users' feature vectors
Rij $ hui ; vj i + "ij
VW
` a
2
ui ; yj
= h i + k kRvj
arg miny i jXRj P r ij yj 2
PWatched( ) Y
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 13 / 58
If I knew the users' feature vectors
Rij $ hui ; vj i + "ij
VW
` a
2
ui ; yj
= h i + k kRvj
arg miny i jXRj P r ij yj 2
PWatched( ) Y
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 13 / 58
Everything together
VW
2 a
= h i + k kR`
arg xmin j iR Xi P r
ui ij xi ; vj xi 2
PWatchedBy( ) Y
VW
2 a
= h i + k kR`
vj arg miny i jXRj P r ij ui ; yj yj 2
PWatched( ) Y
=(Minimize E Watched)
2 m n
xi ; yj
h i k k k kRij
( ) = + +F X ; Y ix 2 jy 2
2 2
i j=1 =1
(i;j E)P
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 14 / 58
Everything together
VW
2 a
= h i + k kR`
arg xmin j iR Xi P r
ui ij xi ; vj xi 2
PWatchedBy( ) Y
VW
2 a
= h i + k kR`
vj arg miny i jXRj P r ij ui ; yj yj 2
PWatched( ) Y
=(Minimize E Watched)
2 m n
xi ; yj
h i k k k kRij
( ) = + +F X ; Y ix 2 jy 2
2 2
i j=1 =1
(i;j E)P
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 14 / 58
Objective function
=(Minimize E Watched)
2 m n
xi ; yj
h i k k k kRij
( ) = + +F X ; Y ix 2 jy 2
2 2
i j=1 =1
(i;j E)P
kE (R XY T)kF2 + kX k2F + kY k2F
@ if (i; j ) P E ;
E(A) = Aij otherwise
0
= ¡ ¡ ¡X Th x2 xm i
x1
= ¡ ¡ ¡Y Th y2 yn i
y1
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 15 / 58
Objective function
=(Minimize E Watched)
2 m n
xi ; yj
h i k k k kRij
( ) = + +F X ; Y ix 2 jy 2
2 2
i j=1 =1
(i;j E)P
kE (R XY T)kF2 + kX k2F + kY k2F
@ if (i; j ) P E ;
E(A) = Aij otherwise
0
= ¡ ¡ ¡X Th x2 xm i
x1
= ¡ ¡ ¡Y Th y2 yn i
y1
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 15 / 58
Low rank structure
I VT r
=m Low-rank M U
nr
1. Low-rank matrix M
2. R = M + Z
3. Observed subset E
@ Mij + Zij if (i; j ) P E ;
0
E (R)ij = otherwise.
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 16 / 58
Low rank structure
I VT r
=m Low-rank M U
nr
1. Low-rank matrix M
2. R = M + Z
3. Observed subset E
@ Mij + Zij if (i; j ) P E ;
0
E (R)ij = otherwise.
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 16 / 58
Low rank structure
I VT r
=m Low-rank M U
nr
1. Low-rank matrix M
2. R = M + Z
3. Observed subset E
@ Mij + Zij if (i; j ) P E ;
0
E (R)ij = otherwise.
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 16 / 58
Low rank structure
m Sample E (R)
n
1. Low-rank matrix M
2. R = M + Z
3. Observed subset E
@ Mij + Zij if (i; j ) P E ;
0
E (R)ij = otherwise.
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 16 / 58
Algorithms and accuracy
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 17 / 58
Questions
( )How do we minimize F X ; Y ? (RMSE = kM M^ kF =pmn)
What prediction accuracy?
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 18 / 58
How do we minimize F (X ; Y )?
Spectral methods. [Srebro, Rennie, Jaakkola, 2003]
Gradient method. [Fazel, Hindi, Boyd, 2001]
Convex relaxations.
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 19 / 58
Spectral method (for simplicity = 0)
Replace
( )F X ; Y = kE (R XY T)k2F
= kE (XY T)kF2 2hE (R); XY Ti + const:
=(with p j jE =mn fraction of observed entries)
( )Fe X ; Y = p kXY kT 2F 2hE (R); XY Ti + const
= ( )p XY T E R F2
1
p
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 20 / 58
Spectral method (for simplicity = 0)
Replace
( )F X ; Y = kE (R XY T)k2F
= kE (XY T)kF2 2hE (R); XY Ti + const:
=(with p j jE =mn fraction of observed entries)
( )Fe X ; Y = p kXY kT 2F 2hE (R); XY Ti + const
= ( )p XY T E R F2
1
p
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 20 / 58
Spectral method (for simplicity = 0)
Replace
( )F X ; Y = kE (R XY T)k2F
= kE (XY T)kF2 2hE (R); XY Ti + const:
=(with p j jE =mn fraction of observed entries)
( )Fe X ; Y = p kXY kT 2F 2hE (R); XY Ti + const
= ( )p XY T E R F2
1
p
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 20 / 58
Spectral method (for simplicity = 0)
Minimize
XY T 1 E R 2F
( ) = ( )Fe X ; Y
p
Solved by SVD
1
Xm ¢r
p
( ) = ^ = ( )E R
AXSY T M S r¢r YnT¢r
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 21 / 58
Spectral method (for simplicity = 0)
Minimize
XY T 1 E R 2F
( ) = ( )Fe X ; Y
p
Solved by SVD
1
Xm ¢r
p
( ) = ^ = ( )E R
AXSY T M S r¢r YnT¢r
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 21 / 58
Random matrix r = 4, m = n = 10000, p = 0:0012
40
30
20
10 4 3 21
0
0 10 20 30 40 50 60 70 80
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 22 / 58
Netix data (trimmed)
140 20 40 60 80 100 120 140
120
100
80
60
40
20
0
0
Andrea Montanari (Stanford) RMSE % 0:99 September 15, 2012 23 / 58
Collaborative Filtering
Accuracy guarantees
Theorem (Keshavan, M, Oh, 2009)
jM j MQssume ij max. Then, w.h.p., rank-r projection achieves
q
j j + k k p j jM =C max nr E
RMSE ZC H E r= E :
2n
( + ) j jp
E.g. Gaussian noise: C HH 1 z r n= E
[Improves over Achlioptas-McSherry 2003]
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 24 / 58
Gradient descent
2 m n
xi ; yj
h i k k k kRij
( ) = + +F X ; Y ix 2 jy 2
2 2
i j=1 =1
(i;j E)P
=(
Update rule: `learning rate')
xi 2 (1 )
xi +
hxi ; i
j iPWatchedBy( ) Rij yj yj
yj 2 (1 )
yj +
hxi ; i
i jPWatched( ) Rij yj xi
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 25 / 58
Gradient descent
2 m n
xi ; yj
h i k k k kRij
( ) = + +F X ; Y ix 2 jy 2
2 2
i j=1 =1
(i;j E)P
=(
Update rule: `learning rate')
xi 2 (1 )
xi +
hxi ; i
j iPWatchedBy( ) Rij yj yj
yj 2 (1 )
yj +
hxi ; i
i jPWatched( ) Rij yj xi
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 25 / 58
Interpretation
( ) +2
xi 1
xi
j iPWatchedBy( ) wij yj
( ) +2
yj 1
yj
i jPWatched( ) wij xi
user 2 avg of movies she liked
movie 2 avg of users that liked it
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 26 / 58
Interpretation
( ) +2
xi 1
xi
j iPWatchedBy( ) wij yj
( ) +2
yj 1
yj
i jPWatched( ) wij xi
user 2 avg of movies she liked
movie 2 avg of users that liked it
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 26 / 58
A variant (stochastic gradient)
( ) PPick i ; j E:
xi 2 (1 )
xi +
wij yj
yj 2 (1 )
yj +
wij xi
[Srebro, Rennie, Jaakkola, 2003]
[Srebro, Jaakkola, 2005]
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 27 / 58
A variant (stochastic gradient)
( ) PPick i ; j E:
xi 2 (1 )
xi +
wij yj
yj 2 (1 )
yj +
wij xi
[Srebro, Rennie, Jaakkola, 2003]
[Srebro, Jaakkola, 2005]
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 27 / 58
In the words of SimonFunk
[Neix challenge, 2006-2009]
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 28 / 58
And his results
Target RMSE 0:8564
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 29 / 58
Three variants
1.1 Message Passing
1.05 Alt. Min.
OptSpace
1
RMSE 0.95
0.9
0.85
0.8
0.75
0.7
0 5 10 15 20 25 30 35
iterations
OptSpace $ Gradient descent on Grassmannian [Keshavan-M.-Oh 2009]
Alternating Least Squares [Koren-Bell 2008]
Message Passing
[Keshavan-M. 2011]
Andrea Montanari (Stanford) Collaborative Filtering September 15, 2012 30 / 58
Accuracy guarantees: Vignette (m = n = 2000 rank-8)
low-rank matrix M sampled matrix ME
OptSpace output M^ squared error (M M^ )2
Andrea Montanari (Stanford) 0:25% sampled September 15, 2012 31 / 58
Collaborative Filtering
Accuracy guarantees: Vignette (m = n = 2000 rank-8)
low-rank matrix M sampled matrix ME
OptSpace output M^ squared error (M M^ )2
Andrea Montanari (Stanford) 0:50% sampled September 15, 2012 31 / 58
Collaborative Filtering
Accuracy guarantees: Vignette (m = n = 2000 rank-8)
low-rank matrix M sampled matrix ME
OptSpace output M^ squared error (M M^ )2
Andrea Montanari (Stanford) 0:75% sampled September 15, 2012 31 / 58
Collaborative Filtering
Accuracy guarantees: Vignette (m = n = 2000 rank-8)
low-rank matrix M sampled matrix ME
OptSpace output M^ squared error (M M^ )2
Andrea Montanari (Stanford) 1:00% sampled September 15, 2012 31 / 58
Collaborative Filtering