Monday, May 19, 2008

Naive Bayes Classifier

Several key points:
1. Bayes' theorem;
2. The conditional independence assumption;
3. MAP parameter estimation.
I think it's a simple algorithm, not hard to understand.
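To make the three points concrete, here is a minimal sketch of a Naive Bayes text classifier. It is not taken from the slides listed below; the function names (`train`, `predict`), the bag-of-words representation, and the toy spam/ham data are all assumptions made for illustration. Smoothing with `alpha` is one common choice: it corresponds to a MAP estimate of the word probabilities under a symmetric Dirichlet prior with parameter alpha + 1.

```python
import math
from collections import Counter, defaultdict


def train(samples, alpha=1.0):
    """Count classes and words. samples is a list of (words, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in samples:
        class_counts[label] += 1
        for w in words:
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab, alpha


def predict(model, words):
    """Return the label with the highest posterior (up to a constant)."""
    class_counts, word_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Bayes' theorem: log P(label) + log P(words | label), dropping P(words).
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for w in words:
            if w not in vocab:
                continue  # ignore words never seen in training
            # Conditional independence: each word contributes its own factor.
            # Add-alpha smoothing (a MAP estimate) keeps probabilities nonzero.
            p = (word_counts[label][w] + alpha) / (total_words + alpha * len(vocab))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, for illustration only.
toy_samples = [
    (["cheap", "pills"], "spam"),
    (["cheap", "offer"], "spam"),
    (["meeting", "today"], "ham"),
    (["lunch", "today"], "ham"),
]
toy_model = train(toy_samples)
print(predict(toy_model, ["cheap", "pills"]))
```

Working in log space avoids underflow when many small probabilities are multiplied together.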
Some resources:
1. Anthony's slides
2. A much easier-to-read set of slides on NBC:
http://homepages.inf.ed.ac.uk/keller/teaching/connectionism/lecture10_4up.pdf
3. ........

Monday, May 05, 2008

Daily log_05.05

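A few articles I came across today:
1. A series of articles on research and education by Yu-Chi Ho (何毓琦, www.sciencenet.cn/blog/user_index.aspx?userid=1565): http://www.sciencenet.cn/blog/user_content.aspx?id=23451 These deserve a very detailed set of reading notes.
2. An article by Meng Yan (孟岩) on web 2.0 APIs.
3. Several survey articles on community QnA sites.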

Sunday, May 04, 2008

Programming Collective Intelligence

       Item-based filtering is significantly faster than user-based filtering when getting a list of recommendations for a large dataset, but it has the additional overhead of maintaining the item similarity table. There is also a difference in accuracy that depends on how 'sparse' the dataset is. In the book's MovieLens movie example, since every critic has rated nearly every movie, the dataset is dense. On the other hand, it would be unlikely to find two people with the same set of del.icio.us bookmarks: most bookmarks are saved by a small group of people, leading to a sparse dataset. Item-based filtering usually outperforms user-based filtering on sparse datasets, and the two perform about equally on dense datasets.
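The trade-off described above (an expensive similarity table built ahead of time, then cheap lookups per user) can be sketched as follows. This is not the book's code: the function names, the use of cosine similarity over co-rating users, and the toy ratings are assumptions chosen to keep the example short.

```python
import math
from collections import defaultdict


def item_similarities(prefs):
    """Precompute the item similarity table from {user: {item: rating}}."""
    by_item = defaultdict(dict)
    for user, ratings in prefs.items():
        for item, rating in ratings.items():
            by_item[item][user] = rating
    sims = defaultdict(dict)
    items = list(by_item)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            shared = by_item[a].keys() & by_item[b].keys()
            if not shared:
                continue  # no user rated both items
            dot = sum(by_item[a][u] * by_item[b][u] for u in shared)
            norm_a = math.sqrt(sum(by_item[a][u] ** 2 for u in shared))
            norm_b = math.sqrt(sum(by_item[b][u] ** 2 for u in shared))
            if norm_a == 0 or norm_b == 0:
                continue
            sims[a][b] = sims[b][a] = dot / (norm_a * norm_b)
    return sims


def recommend(prefs, sims, user):
    """Score unrated items by a similarity-weighted average of the user's ratings."""
    rated = prefs[user]
    totals = defaultdict(float)
    weights = defaultdict(float)
    for item, rating in rated.items():
        for other, sim in sims.get(item, {}).items():
            if other in rated or sim <= 0:
                continue
            totals[other] += sim * rating
            weights[other] += sim
    ranked = [(totals[item] / weights[item], item) for item in totals]
    return sorted(ranked, reverse=True)


# Hypothetical toy ratings, for illustration only.
toy_prefs = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "C": 2},
    "dan": {"A": 5, "C": 1},
}
toy_sims = item_similarities(toy_prefs)  # the table that must be maintained
print(recommend(toy_prefs, toy_sims, "dan"))
```

Only `recommend` runs per request, and it touches just the items the user has rated, which is why the lookup stays fast on large datasets; the cost is that `item_similarities` has to be rerun as ratings change.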

       This book is written for programmers, so the theory is kept fairly shallow. But it uses a lot of real, live data, from Facebook and the like.