This is pretty much Simon's basic method. Initialize once and then train and apply features.
Initialize:
    initialize CurrentEstimate[user,movie] to MovieAverage[movie] + UserOffset[user] (see Simon's page)
Train a Feature:
    initialize UserVector[user] to small random numbers
    initialize MovieVector[movie] to small random numbers
    repeatedly:
        for each user:
            for each training movie (for this user):
                uv = UserVector[user]
                mv = MovieVector[movie]
                err = ActualRating[user,movie] - CurrentEstimate[user,movie] - uv * mv
                UserVector[user] += learningRate * err * mv
                MovieVector[movie] += learningRate * err * uv
Apply a Feature:
    for each user:
        uv = UserVector[user]
        for each training and testing movie (for this user):
            mv = MovieVector[movie]
            CurrentEstimate[user,movie] += uv * mv
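As a rough illustration, the train and apply steps above could be written in Python along the following lines. This is only a sketch: the dictionary layout, the function names, the initialization range, and the default learning rate and epoch count are all assumptions, not values from Simon's method or from these notes.

```python
import random

def train_feature(ratings, estimate, n_users, n_movies,
                  learning_rate=0.05, epochs=200, seed=0):
    """Train one feature by stochastic gradient descent.

    ratings:  {(user, movie): actual rating} for the training set (assumed layout)
    estimate: {(user, movie): current estimate}, left unchanged here
    """
    rng = random.Random(seed)
    # "Small random numbers": the +/-0.1 range is an assumption.
    user_vec = [rng.uniform(-0.1, 0.1) for _ in range(n_users)]
    movie_vec = [rng.uniform(-0.1, 0.1) for _ in range(n_movies)]
    for _ in range(epochs):
        # Visits ratings in dict order rather than strictly user by user.
        for (user, movie), actual in ratings.items():
            uv, mv = user_vec[user], movie_vec[movie]
            err = actual - estimate[(user, movie)] - uv * mv
            # Both updates use the values read before either was changed.
            user_vec[user] += learning_rate * err * mv
            movie_vec[movie] += learning_rate * err * uv
    return user_vec, movie_vec

def apply_feature(estimate, user_vec, movie_vec):
    """Fold the trained feature into every (training and testing) estimate."""
    for (user, movie) in estimate:
        estimate[(user, movie)] += user_vec[user] * movie_vec[movie]
```

Calling train_feature and then apply_feature once per feature mirrors the "initialize once, then train and apply features" loop described above.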
2. Eliminate UserVector
If we want to be able to provide estimates to new users without first training the features with that new user, it would be handy to get rid of the UserVector. Given the current error and the MovieVector, the user's value is just the dot product of the error and MovieVector divided by the dot product of the MovieVector with itself (using only movies the user has rated).
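The ratio described here is the least-squares value of uv: minimizing the sum of (err - uv * mv)^2 over the user's rated movies and setting the derivative to zero gives uv = sum(err * mv) / sum(mv * mv). A minimal Python sketch follows; the function name and the zero-denominator guard are additions for illustration, not part of the original method.

```python
def user_value(errors, movie_values):
    """Least-squares user value for one feature.

    errors:       ActualRating - CurrentEstimate for each movie the user rated
    movie_values: MovieVector entries for the same movies, in the same order
    """
    err_mv = sum(e * m for e, m in zip(errors, movie_values))
    mv_mv = sum(m * m for m in movie_values)
    # Guard added for this sketch: no rated movies, or an all-zero feature.
    if mv_mv == 0:
        return 0.0
    return err_mv / mv_mv
```

For example, errors [2.0, 4.0] against movie values [1.0, 2.0] give 10 / 5 = 2.0, which reproduces both errors exactly.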
Train a Feature:
    initialize MovieVector[movie] to small random numbers
    repeatedly:
        for each user:
            errMv = 0
            MvMv = 0
            for each training movie (for this user):
                mv = MovieVector[movie]
                err = ActualRating[user,movie] - CurrentEstimate[user,movie]
                errMv += err * mv
                MvMv += mv * mv
            uv = errMv / MvMv
            for each training movie (for this user):
                mv = MovieVector[movie]
                err = ActualRating[user,movie] - CurrentEstimate[user,movie] - uv * mv
                MovieVector[movie] += learningRate * err * uv
Apply a Feature:
    for each user:
        errMv = 0
        MvMv = 0
        for each training movie (for this user):
            mv = MovieVector[movie]
            err = ActualRating[user,movie] - CurrentEstimate[user,movie]
            errMv += err * mv
            MvMv += mv * mv
        uv = errMv / MvMv
        for each training and testing movie (for this user):
            mv = MovieVector[movie]
            CurrentEstimate[user,movie] += uv * mv
This gets a probe RMSE of .9636. Not great.
I start each feature with learningRate = .1, multiply it by .95 after each epoch, and continue for 150 epochs. This takes me just under 4 minutes per feature.
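The learning-rate schedule just described can be sketched as a small Python generator. The function name is made up for illustration; the defaults are the numbers quoted above.

```python
def learning_rates(start=0.1, decay=0.95, epochs=150):
    """Yield the learning rate to use in each epoch of one feature."""
    rate = start
    for _ in range(epochs):
        yield rate
        rate *= decay  # shrink the rate after every epoch
```

Iterating with `for learningRate in learning_rates():` around the per-user training loop gives one pass per epoch. With these defaults the rate in the last epoch is below .0001, so the final epochs make only very small adjustments.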
3. Discount
Scaling each feature's contribution down by a constant factor (1 - discount) improves the score by quite a bit (unfortunately the number of features required increases even faster).
Apply a Feature:
    for each user:
        errMv = 0
        MvMv = 0
        for each training movie (for this user):
            mv = MovieVector[movie]
            err = ActualRating[user,movie] - CurrentEstimate[user,movie]
            errMv += err * mv
            MvMv += mv * mv
        uv = errMv / MvMv
        for each training and testing movie (for this user):
            mv = MovieVector[movie]
            CurrentEstimate[user,movie] += uv * mv * (1 - discount)
Probe RMSE under .9250. Now, we are making progress.
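For reference, the discounted apply step might look like this in Python. It is a sketch under assumed data structures (nested dicts for the training ratings, a list of movies to update per user), and the skip for a zero MvMv is an added guard, not something stated above. Setting discount to 0 gives the undiscounted apply step from section 2.

```python
def apply_feature(train, targets, estimate, movie_vec, discount=0.0):
    """Apply one trained feature, shrunk by (1 - discount).

    train:    {user: {movie: actual rating}}  training ratings (assumed layout)
    targets:  {user: [movie, ...]}            training and testing movies to update
    estimate: {(user, movie): current estimate}, updated in place
    """
    for user, rated in train.items():
        err_mv = 0.0
        mv_mv = 0.0
        for movie, actual in rated.items():
            mv = movie_vec[movie]
            err = actual - estimate[(user, movie)]
            err_mv += err * mv
            mv_mv += mv * mv
        if mv_mv == 0.0:
            continue  # added guard: nothing to project onto for this user
        uv = err_mv / mv_mv
        # Update only after all errors are computed from the old estimates.
        for movie in targets[user]:
            estimate[(user, movie)] += uv * movie_vec[movie] * (1.0 - discount)
```

The discount value itself is not given in these notes, so any number passed in is a tuning choice.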
4. Per User Discount
Somewhat better results can be obtained by applying a bigger discount to users who have rated few movies. Might MvMv be a better basis for the discount than the rating count? If a user hasn't rated any movies that are significantly involved with the feature, that seems worth a larger discount.
Apply a Feature:
    for each user:
        errMv = 0
        MvMv = 0
        for each training movie (for this user):
            mv = MovieVector[movie]
            err = ActualRating[user,movie] - CurrentEstimate[user,movie]
            errMv += err * mv
            MvMv += mv * mv
        uv = errMv / MvMv
        disc = (SOME_CONST + discount * TrainingMovieCount[user]) / (SOME_CONST + TrainingMovieCount[user])
        for each training and testing movie (for this user):
            mv = MovieVector[movie]
            CurrentEstimate[user,movie] += uv * mv * (1 - disc)
Probe RMSE about .9150. More progress.
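To see how the per-user discount behaves, here is the disc formula as a small Python function. The function and parameter names are made up for illustration, and the numbers in the usage note are arbitrary; neither SOME_CONST nor discount is given in these notes.

```python
def per_user_discount(movie_count, discount, some_const):
    """Blend between a full discount (1.0) and the base discount.

    movie_count: number of training movies the user has rated
    some_const:  must be positive so a user with no ratings does not divide by zero
    """
    return (some_const + discount * movie_count) / (some_const + movie_count)
```

With an arbitrary discount of 0.2 and SOME_CONST of 10, a user with no ratings gets disc = 1.0 (the feature is ignored), a user with 10 ratings gets 0.6, and disc approaches 0.2 as the count grows.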