Introduction to Information Retrieval

Class Hours: Monday/Wednesday 8:05-9:25am, SFEBB 3170


Qingyao Ai

Office Hours: Wednesday 9:30am-10:30am, MEB 2172


Text Books (optional):



Grades are computed from the assessments in the following proportions (tentative and subject to change):

Tentative Class Schedule

Week, Date Topic Reminder Slides Readings
Week 01, 08/24 - 08/28 ▪ Course Overview
Introduce information retrieval and the course organization.
▪ The life of a query
Modern search engine architecture overview and how information is retrieved for a given search query.
Week 02, 08/31 - 09/04 ▪ Evaluation
Introduce the concept of ranking and how to evaluate IR systems with ranking metrics.
▪ Crawls and feeds
How to collect and create information retrieval collections.
    NDCG Järvelin and Kekäläinen (TOIS 2002)
ERR Chapelle et al. (CIKM 2009)
BigTable Chang et al. (OSDI 2006)
Week 03, 09/07 - 09/11 ▪ Processing text
Basic techniques for text processing, such as stemming and stop-word removal.
▪ Processing Web
Link analysis and spam detection.
    Zipf’s Law and Heaps’ Law Sano et al. (2012)
Topic-sensitive PageRank Haveliwala (WWW 2002)
Week 04, 09/14 - 09/18 ▪ Indexing
Introduce basic indexing techniques (e.g., inverted index) for efficient information retrieval.
▪ Compression
Data compression and index compression algorithms.
    BitFunnel Goodwin et al. (SIGIR 2017)
Week 05, 09/21 - 09/25 ▪ Queries and Interfaces
Interface design and query processing techniques, including suggestions and reformulation.
▪ Retrieval Model (1)
Vector space models.
Assignment 1 Due (02/09)
Local vs. Global Analysis Xu and Croft (SIGIR 1996)
Week 06, 09/28 - 10/02 ▪ Retrieval Model (2)
Classic probabilistic models.
▪ Retrieval Model (3)
Language modeling approaches and smoothing.
    2-Poisson Model Robertson (SIGIR 1994)
Query Likelihood model Ponte and Croft (SIGIR 1998)
KL-divergence model Lafferty and Zhai (SIGIR 2001)
Language smoothing Lafferty and Zhai (SIGIR 2001)
Week 07, 10/05 - 10/09 ▪ Retrieval Model (4)
Enhanced language modeling approaches.
▪ Relevance Feedback
The concept and basic techniques of relevance feedback and pseudo relevance feedback.
    Dependence models Metzler and Croft (SIGIR 2005)
Huston and Croft (CIKM 2014)
Relevance Model Lavrenko and Croft (SIGIR 2001)
Model-based feedback model Zhai and Lafferty (CIKM 2001)
Week 08, 10/12 - 10/16 ▪ Search Result Diversification
The motivation of result diversification and classic models.
▪ Learning to Rank (1)
The motivation and basic concepts of learning to rank, including query-document pairs, feature vectors, etc.
    Maximal Marginal Relevance Carbonell and Goldstein (SIGIR 1998)
Diversity evaluation Clarke et al. (SIGIR 2008)
LTR textbook Li.
Week 09, 10/19 - 10/23 ▪ Learning to Rank (2)
The optimization paradigms for learning-to-rank models, e.g. pointwise, pairwise, and listwise methods.
▪ Clustering and Search
The concepts and algorithms for text clustering and their applications in search.
Assignment 2 Due (03/08)
Cluster-based LM Liu and Croft (SIGIR 2004)
RankNet Burges et al. (ICML 2005)
ListMLE Xia et al. (ICML 2008)
LambdaMART Burges.
DLCM Ai et al. (SIGIR 2018)
Week 10, 10/26 - 10/30 ▪ Class Project Proposal Presentation
Proposal presentation and mutual evaluation
▪ Beyond bag-of-words (1)
Latent space models and distributed representations.
Project Proposal Due (03/17)
LSI Deerwester et al. (JASIS 1990)
LDA-LM Wei and Croft (SIGIR 2006)
Week 11, 11/02 - 11/06 ▪ Beyond bag-of-words (2)
Neural Information Retrieval.
▪ Recommendation systems (1)
The basic concepts and naive collaborative filtering algorithms.
    Neural Ranking Models Guo et al. (IPM 2019)
DSSM Huang et al. (CIKM 2013)
DRMM Guo et al. (CIKM 2016)
K-NRM Xiong et al. (SIGIR 2017)
Week 12, 11/09 - 11/13 ▪ Recommendation systems (2)
Matrix factorization and latent representation learning techniques.
▪ Adv. User Study and Crowdsourcing
The study and applications of user modeling and crowdsourcing in information retrieval.
Assignment 3 Due (04/05)
NCF He et al. (WWW 2017)
JRL Zhang et al. (CIKM 2017)
User experience Crescenzi et al. (CHIIR 2016)
Crowdsourcing Kittur et al. (CHI 2008)
Week 13, 11/16 - 11/20 ▪ Adv. Click Models and Unbiased Learning
The idea of biased user behavior and the algorithms for unbiased optimization.
▪ Adv. Personal Search and Product Search
Introducing IR techniques for domain-specific applications such as email search and e-commerce search.
    Cascade Model Craswell et al. (WSDM 2008)
DBN Chapelle and Zhang (WWW 2009)
UBM Dupret and Piwowarski (SIGIR 2008)
IPW Joachims et al. (WSDM 2017)
Regression EM Wang et al. (WSDM 2018)
DLA Ai et al. (SIGIR 2018)
Stuff I’ve seen Dumais et al. (SIGIR 2003)
Email Re-finding Ai et al. (WWW 2017)
HEM Ai et al. (SIGIR 2017)
E-commerce Search Taxonomy Sondhi et al. (SIGIR 2018)
Week 14, 11/23 - 11/27 ▪ Adv. Question Answering
An introduction to classic and state-of-the-art methods for open-domain question answering.
▪ Class Project Final Presentation (1)
The presentation and evaluation of course projects.
    QA Performance Moldovan et al. (TOIS 2003)
Non-factoid QA with LTR Liu et al. (ECIR 2016)
Risk of QA Su et al. (SIGIR 2019)
Week 15, 11/30 - 12/04 ▪ Class Project Final Presentation (2)
The presentation and evaluation of course projects.
Project and Paper Presentation Slides Due (04/21)    
Week 16, 12/07 - 12/11 ▪ Class Ended
Project Final Report Due (04/29)


Special thanks to Dr. Hamed Zamani from Microsoft Research, Prof. Jiepu Jiang from Virginia Tech University, Prof. Yongfeng Zhang from Rutgers University, and Prof. James Allan from the University of Massachusetts Amherst. Some teaching materials are borrowed from their courses CS656: Information Retrieval and CS691: Recommender System.