Sunday, May 04, 2008

Programming Collective Intelligence

       Item-based filtering is significantly faster than user-based filtering when getting a list of recommendations for a large dataset, but it does have the additional overhead of maintaining the item similarity table. Also, there is a difference in accuracy that depends on how "sparse" the dataset is. In the movie example (MovieLens), since every critic has rated nearly every movie, the dataset is dense. On the other hand, it would be unlikely to find two people with the same set of del.icio.us bookmarks: most bookmarks are saved by a small group of people, leading to a sparse dataset. Item-based filtering usually outperforms user-based filtering on sparse datasets, and the two perform about equally on dense datasets.
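To make the trade-off above concrete, here is a minimal sketch of item-based filtering in Python. It is not the book's code: the toy `ratings` data and the function names (`invert`, `similarity`, `build_item_table`, `recommend`) are made up for illustration, and the inverse-Euclidean similarity is just one reasonable choice. The point is that the item similarity table is built once, which is the maintenance overhead, and each recommendation is then a cheap lookup.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}. Not taken from the book.
ratings = {
    "ann": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob": {"A": 4.0, "B": 2.0, "C": 5.0, "D": 4.0},
    "cat": {"A": 1.0, "B": 5.0, "D": 2.0},
}


def invert(prefs):
    """Flip user -> item -> rating into item -> user -> rating."""
    by_item = {}
    for user, items in prefs.items():
        for item, rating in items.items():
            by_item.setdefault(item, {})[user] = rating
    return by_item


def similarity(a, b):
    """Inverse-Euclidean similarity over shared raters; 0.0 if there are none."""
    shared = [user for user in a if user in b]
    if not shared:
        return 0.0
    dist_sq = sum((a[user] - b[user]) ** 2 for user in shared)
    return 1.0 / (1.0 + sqrt(dist_sq))


def build_item_table(prefs):
    """Precompute item -> {other item: similarity}. This is the table that
    has to be rebuilt from time to time as new ratings come in."""
    by_item = invert(prefs)
    return {
        i: {j: similarity(by_item[i], by_item[j]) for j in by_item if j != i}
        for i in by_item
    }


def recommend(prefs, table, user):
    """Score each unseen item as the similarity-weighted average of the
    user's own ratings, using only the precomputed table."""
    seen = prefs[user]
    totals, weights = {}, {}
    for item, rating in seen.items():
        for other, sim in table[item].items():
            if other in seen or sim == 0.0:
                continue
            totals[other] = totals.get(other, 0.0) + sim * rating
            weights[other] = weights.get(other, 0.0) + sim
    return sorted(((totals[i] / weights[i], i) for i in totals), reverse=True)


table = build_item_table(ratings)   # done once, offline
recs = recommend(ratings, table, "ann")  # fast per-user lookup
```

Here `recs` is a list of `(predicted rating, item)` pairs for items the user has not rated yet. A user-based approach would instead compare "ann" against every other user at query time, which is why it slows down as the dataset grows.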

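       This book is written for programmers, so the theory is kept fairly simple. But it uses a lot of live, real-world data, from Facebook and the like.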
