CS4960-6550

Introduction to Information Retrieval

Class Hours: Monday/Wednesday 8:05am-9:25am, IVC (Zoom Link on Canvas)

Note that CS4960 and CS6550 have been cross-listed on Canvas as CS 6550-001.

Instructor

Qingyao Ai

Office Hours: Monday 9:30am-10:30am, Zoom Link on Canvas

TA

Tao Yang

Office Hours: Wednesday 4:00pm-4:40pm, Zoom Link on Canvas

Prerequisites

Text Books (optional):

Resources

Grading

The final grade weights the assessments in the following proportions (tentative and subject to change):

Tentative Class Schedule

Week, Date Topic Reminder Slides Readings
Week 01, 08/24 - 08/28 ▪ Course Overview
Introduce information retrieval and the course organization.
▪ The life of a query
Modern search engine architecture overview and how information is retrieved for a given search query.
     
Week 02, 08/31 - 09/04 ▪ Evaluation
Introduce the concept of ranking and how to evaluate IR systems with ranking metrics.
▪ Crawls and feeds
How to collect and create information retrieval collections.
    NDCG: Järvelin and Kekäläinen (TOIS 2002)
ERR Chapelle et al. (CIKM 2009)
BigTable Chang et al. (OSDI 2006)
Week 03, 09/07 - 09/11 ▪ Labor Day
▪ Processing text
The basic techniques for text processing, such as stemming and stop word removal.
    Zipf’s Law and Heaps’ Law Sano et al. (2012)
Week 04, 09/14 - 09/18 ▪ Processing Web
Link analysis and spam detection.
▪ Indexing
Introduce basic indexing techniques (e.g., inverted index) for efficient information retrieval.
    Topic-sensitive PageRank Haveliwala (WWW 2002)
BitFunnel Goodwin et al. (SIGIR 2017)
Week 05, 09/21 - 09/25 ▪ Compression
Data compression and index compression algorithms.
▪ Queries and Interfaces
Interface design and query processing techniques, including suggestions, reformulation, etc.
Assignment 1 Due (09/27)   Local vs. Global Analysis Xu and Croft (SIGIR 1996)
Week 06, 09/28 - 10/02 ▪ Retrieval Model (1)
Vector space models.
▪ Retrieval Model (2)
Classic probabilistic models.
Lab 1 Due (10/04)   2-Poisson Model Robertson (SIGIR 1994)
Week 07, 10/05 - 10/09 ▪ Retrieval Model (3)
Language modeling approaches and smoothing.
▪ Retrieval Model (4)
Enhanced language modeling approaches.
    Query Likelihood model Ponte and Croft (SIGIR 1998)
KL-divergence model Lafferty and Zhai (SIGIR 2001)
Language smoothing Zhai and Lafferty (SIGIR 2001)
Dependence models Metzler and Croft (SIGIR 2005)
Huston and Croft (CIKM 2014)
Week 08, 10/12 - 10/16 ▪ Relevance Feedback
The concept and basic techniques of relevance feedback and pseudo relevance feedback.
▪ Search Result Diversification
The motivation of result diversification and classic models.
    Relevance Model Lavrenko and Croft (SIGIR 2001)
Model-based feedback model Zhai and Lafferty (CIKM 2001)
Maximal Marginal Relevance Carbonell and Goldstein (SIGIR 1998)
Diversity evaluation Clarke et al. (SIGIR 2008)
Week 09, 10/19 - 10/23 ▪ Learning to Rank (1)
The motivation and basic concepts of learning to rank, including query-document pairs, feature vectors, etc.
▪ Learning to Rank (2)
The optimization paradigms for learning-to-rank models, e.g. pointwise, pairwise, and listwise methods.
Assignment 2 Due (10/25)   LTR textbook Li.
RankNet Burges et al. (ICML 2005)
ListMLE Xia et al. (ICML 2008)
LambdaMART Burges.
DLCM Ai et al. (SIGIR 2018)
Week 10, 10/26 - 10/30 ▪ Class Project Proposal Presentation
Proposal presentation and mutual evaluation
▪ Clustering and Search
The concepts and algorithms for text clustering and their applications in search.
Project Proposal Due (10/27)
Lab 2 Due (11/01)
  Cluster-based LM Liu and Croft (SIGIR 2004)
Week 11, 11/02 - 11/06 ▪ Beyond bag-of-words (1)
Latent space models and distributed representations.
▪ Beyond bag-of-words (2)
Neural Information Retrieval.
    LSI Deerwester et al. (JASIS 1990)
LDA-LM Wei and Croft (SIGIR 2006)
Neural Ranking Models Guo et al. (IPM 2019)
DSSM Huang et al. (CIKM 2013)
DRMM Guo et al. (CIKM 2016)
K-NRM Xiong et al. (SIGIR 2017)
Week 12, 11/09 - 11/13 ▪ Recommendation systems (1)
The basic concepts and naive collaborative filtering algorithms.
▪ Recommendation systems (2)
Matrix factorization and latent representation learning techniques.
Assignment 3 Due (11/15)   NCF He et al. (WWW 2017)
JRL Zhang et al. (CIKM 2017)
Week 13, 11/16 - 11/20 ▪ Adv. User Study and Crowdsourcing
The study and applications of user modeling and crowdsourcing in information retrieval.
▪ Adv. Click Models and Unbiased Learning
The idea of biased user behavior and the algorithms for unbiased optimization.
Lab 3 Due (11/22)   User experience Crescenzi et al. (CHIIR 2016)
Crowdsourcing Kittur et al. (CHI 2008)
Cascade Model Craswell et al. (WSDM 2008)
DBN Chapelle and Zhang (WWW 2009)
UBM Dupret and Piwowarski (SIGIR 2008)
IPW Joachims et al. (WSDM 2017)
Regression EM Wang et al. (WSDM 2018)
DLA Ai et al. (SIGIR 2018)
Week 14, 11/23 - 11/27 ▪ Adv. Personal Search and Product Search
Introducing IR techniques for domain-specific applications such as email search and e-commerce search.
▪ Adv. Question Answering
Introduction to classic and state-of-the-art methods for question answering in open domains.
    Stuff I’ve seen Dumais et al. (SIGIR 2003)
Email Re-finding Ai et al. (WWW 2017)
HEM Ai et al. (SIGIR 2017)
E-commerce Search Taxonomy Sondhi et al. (SIGIR 2018)
QA Performance Moldovan et al. (TOIS 2003)
Non-factoid QA with LTR Liu et al. (ECIR 2016)
Risk of QA Su et al. (SIGIR 2019)
Week 15, 11/30 - 12/04 ▪ Class Project Final Presentation (1)
The presentation and evaluation of course projects (first half).
▪ Class Project Final Presentation (2)
The presentation and evaluation of course projects (second half).
Project and Paper Presentation Slides Due (12/06)    
Week 16, 12/07 - 12/11 ▪ Class Ended
Project Final Report Due (12/11)

Acknowledgements

Special thanks to Dr. Hamed Zamani from Microsoft Research, Prof. Jiepu Jiang from Virginia Tech, Prof. Yongfeng Zhang from Rutgers University, and Prof. James Allan from the University of Massachusetts Amherst. Some teaching materials are borrowed from their courses CS656: Information Retrieval and CS691: Recommender System.