Almost anyone who has deployed machine learning systems in the real world has encountered the task of domain adaptation: We train our models on some fixed source domain, but we wish to deploy them across one or more different target domains. For example, large-scale speech recognition systems need to work well on arbitrary speech, regardless of background noise or accent. Text processing systems trained on news often need to be applied to blogs or forum posts. Gene finders are trained on a particular organism, but often we wish to identify the genes of another organism or even a group of organisms. Face recognition systems might be trained under certain pose, lighting, and occlusion settings, but applied under arbitrary pose, lighting, and occlusion conditions. The purpose of this tutorial is to introduce participants to the problem of domain adaptation, the variety of forms it takes, the techniques that have been used to solve it, and our current understanding of when these techniques can and cannot work. We hope that our tutorial leads to new and interesting work on the open questions of domain adaptation.
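As a concrete taste of one such technique, here is a minimal sketch of the feature-augmentation method from Daumé's "Frustratingly Easy Domain Adaptation" (ACL 2007): each feature vector is copied into a shared block plus a domain-specific block, so that an ordinary linear classifier can learn weights common to both domains alongside per-domain corrections. The function name and NumPy-based representation here are illustrative choices, not part of the tutorial materials.

```python
import numpy as np

def augment_features(X, domain):
    """Feature augmentation for supervised domain adaptation.

    Maps each row x of X to a tripled representation:
        source examples: (x, x, 0)
        target examples: (x, 0, x)
    The first block is shared across domains; the second and third
    blocks let a linear model learn source- and target-specific
    corrections on top of the shared weights.
    """
    n, d = X.shape
    zeros = np.zeros((n, d))
    if domain == "source":
        return np.hstack([X, X, zeros])
    if domain == "target":
        return np.hstack([X, zeros, X])
    raise ValueError("domain must be 'source' or 'target'")

# Example: two source examples and one target example in 2 dimensions.
Xs = augment_features(np.array([[1.0, 2.0], [3.0, 4.0]]), "source")
Xt = augment_features(np.array([[5.0, 6.0]]), "target")
```

After augmentation, the source and target examples can simply be concatenated and fed to any standard classifier; no change to the learning algorithm itself is required, which is what makes the method "frustratingly easy."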
Slides from the tutorial [pptx] [pdf]
John Blitzer is a postdoctoral fellow at the University of California, Berkeley. He co-taught, with Jerry Zhu, an ACL 2008 tutorial on semi-supervised learning for natural language processing, and a summer course on supervised and semi-supervised learning for natural language processing at Harbin Institute of Technology, Harbin, China.
Hal Daumé is an assistant professor of computer science at the University of Utah. He has taught machine learning, artificial intelligence and natural language processing courses, and has also given a tutorial on Bayesian natural language processing at HLT/NAACL 2006.