|
At HIT (Harbin Institute of Technology, 哈尔滨工业大学)
Jan 15, 2008 Tuesday ---- Tutorial |
TIME |
ACTIVITIES |
VENUE |
09:00-12:00 |
Tutorial: Learning to Rank for Information Retrieval by Dr. Tie-Yan Liu |
Lecture Hall, Second Floor of ShaoGuan, HIT (哈尔滨工业大学邵馆二楼报告厅) |
13:30-16:30 |
Tutorial: Speech, Dialog and Ubiquitous Information Access by Prof. Gary Geunbae Lee |
Lecture Hall, Second Floor of ShaoGuan, HIT (哈尔滨工业大学邵馆二楼报告厅) |
At Sinoway Hotel (华融饭店)
Hai Tang (海棠厅) on the 6th floor of Sinoway Hotel
Ke Xue (珂雪厅) on the 6th floor of Sinoway Hotel
Shi Ji (世纪展厅) on the 6th floor of Sinoway Hotel
“Tian Yi” Coffee House (天怡咖啡厅) on the 2nd floor of Sinoway Hotel
Ballroom “Qian Xi” (千禧宴会厅) on the 5th floor of Sinoway Hotel
Jan 16, 2008 Wednesday ---- Main Conference |
TIME |
ACTIVITIES |
VENUE |
08:30-09:00 |
Opening |
Hai Tang |
09:00-10:00 |
Plenary Session: Invited Talk |
10:00-10:30 |
Break |
Shi Ji |
10:30-11:00 |
IR Models: Improving Expertise Recommender Systems by Odds Ratio -- Zhao Ru, Jun Guo, Wei-Ran Xu |
Hai Tang |
Image Retrieval: Semantic Discriminative Projections for Image Retrieval -- He-Ping Song, Qun-Sheng Yang, Yin-Wei Zhan |
Ke Xue |
11:00-11:30 |
IR Models: Exploring the Stability of IDF Term Weighting -- Xin Fu, Miao Chen |
Hai Tang |
Image Retrieval: Comparing Dissimilarity Measures for Content-Based Image Retrieval -- Hai-Ming Liu, Da-Wei Song, Stefan Rueger, Rui Hu, Victoria Uren |
Ke Xue |
11:30-12:00 |
IR Models: Completely-Arbitrary Passage Retrieval in Language Modeling Approach -- Seung-Hoon Na, In-Su Kang, Jong-Hyeok Lee |
Hai Tang |
Image Retrieval: A Semantic Content-Based Retrieval Method for Histopathology Images -- Juan Caicedo, Fabio Gonzalez, Eduardo Romero |
Ke Xue |
12:00-13:30 |
Lunch |
“Tian Yi” Coffee House |
13:30-14:00 |
Text Classification: Integrating Background Knowledge into RBF Networks for Text Classification -- Eric Jiang |
Hai Tang |
Chinese Language Processing: Fusion of Multiple Features for Chinese Named Entity Recognition based on CRF Model -- Yue-Jie Zhang, Zhi-Ting Xu, Tao Zhang |
Ke Xue |
14:00-14:30 |
Text Classification: An Extended Document Frequency Metric for Feature Selection in Text Categorization -- Yan Xu, Bin Wang |
Hai Tang |
Chinese Language Processing: Semi-joint Labeling for Chinese Named Entity Recognition -- Chia-Wei Wu, Wen-Lian Hsu, Richard Tzong-Han Tsai |
Ke Xue |
14:30-15:00 |
Text Classification: Smoothing LDA Model for Text Categorization -- Wen-Bo Li, Le Sun, Yuan-Yong Feng, Da-Kun Zhang |
Hai Tang |
Chinese Language Processing: On the Construction of a Large Scale Chinese Web Test Collection -- Hong-Fei Yan, Chong Chen, Bo Peng, Xiao-Ming Li |
Ke Xue |
15:00-15:30 |
Break |
Shi Ji |
15:30-16:00 |
Text Processing: Topic Tracking Based on Keywords Dependency Profile -- Wei Zheng, Yu Zhang, Yu Hong, Ji-Li Fan, Ting Liu |
Hai Tang |
Applications of IR: Discrimination of Ventricular Arrhythmias Using NEWFM -- Zhen-Xing Zhang, Sang-Hong Lee, Joon S. Lim |
Ke Xue |
16:00-16:30 |
Text Processing: A Dynamic Programming Model for Text Segmentation Based on Min-Max Similarity -- Na Ye, Jing-Bo Zhu, Yan Zheng, Matthew Ma, Bin Zhang |
Hai Tang |
Text Processing: Pronoun Resolution with Markov Logic Networks -- Ki Chan, Wai Lam |
Ke Xue |
16:30-17:30 |
Free Time |
|
17:30-18:30 |
Dinner |
Shi Ji |
18:30-19:10 |
Transfer to Zhao Lin Park (Ice Lantern Festival Exposition)
(Sightseeing will be organized as a group tour, and participants may decide whether to join. Tour costs and tickets are paid by each participant; payment will be collected on-site.) |
In the bus |
19:10-20:40 |
Visit the Ice Lantern Festival Exposition |
Zhao Lin Park |
20:40-21:20 |
Back to the hotel |
In the bus |
Jan 17, 2008 Thursday ---- Main Conference |
TIME |
ACTIVITIES |
VENUE |
09:00-10:00 |
Plenary Session: Invited Talk |
Ke Xue |
10:00-10:30 |
Break |
Shi Ji |
10:30-11:00 |
Machine Learning: Efficient Feature Selection in the Presence of Outliers and Noises -- Shuang-Hong Yang, Bao-Gang Hu |
Hai Tang |
Taxonomy: Combining WordNet and ConceptNet for Automatic Query Expansion: A Learning Approach -- Ming-Hung Hsu, Hsin-Hsi Chen, Ming-Feng Tsai |
Ke Xue |
11:00-11:30 |
Machine Learning: Domain Adaptation for Conditional Random Fields -- Qi Zhang, Xi-Peng Qiu, Xuan-Jing Huang, Li-De Wu |
Hai Tang |
Taxonomy: Improving Hierarchical Taxonomy Integration with Semantic Feature Expansion on Category-Specific Terms -- Cheng-Zen Yang, Ing-Xiang Chen, Cheng-Tse Hung, Ping-Jung Wu |
Ke Xue |
11:30-12:00 |
Machine Learning: Graph Mutual Reinforcement based Bootstrapping -- Qi Zhang, Ya-Qian Zhou, Xuan-Jing Huang, Li-De Wu |
Hai Tang |
Taxonomy: HOM: An Approach to Calculating Semantic Similarity Utilizing Relations between Ontologies -- Zhi-Zhong Liu, Huai-Min Wang, Bin Zhou, Hong-Bin Huang |
Ke Xue |
12:00-13:30 |
Lunch |
“Tian Yi” Coffee House |
13:30-14:00 |
IR Methods: Semi-Supervised Graph-Ranking for Text Retrieval -- Mao-Qiang Xie, Jin-Li Liu, Nan Zheng, Dong Li, Ya-Lou Huang, Yang Wang |
Hai Tang |
Information Extraction: Gram-Free Synonym Extraction via Suffix Arrays -- Minoru Yoshida |
Ke Xue |
14:00-14:30 |
IR Methods: Learnable Focused Crawling Based on Ontology -- Hai-Tao Zheng, Bo-Yeong Kang, Hong-Gee Kim |
Hai Tang |
Information Extraction: Synonyms Extraction Using Web Content Focused Crawling -- Chien-Hsing Chen, Chung-Chian Hsu |
Ke Xue |
14:30-15:00 |
Information Extraction: Blog Post and Comment Extraction Using Information Quantity of Web Format -- Dong-Lin Cao, Xiang-Wen Liao, Hong-Bo Xu, Shuo Bai |
Ke Xue |
15:00-15:30 |
Break |
Shi Ji |
15:30-17:30 |
Poster and Demo Session |
18:30-20:30 |
Banquet |
Ballroom “Qian Xi” |
Jan 18, 2008 Friday ---- Main Conference |
TIME |
ACTIVITIES |
VENUE |
09:00-09:30 |
Summarization: A Lexical Chain Approach for Update-style Query-focused Multi-document Summarization -- Jing Li, Le Sun |
Hai Tang |
Multimedia: An Ontology and SWRL Based 3D Model Retrieval System -- Xin-Ying Wang, Sheng-Sheng Wang |
Ke Xue |
09:30-10:00 |
Summarization: GSPSummary: A Graph-based Sub-topic Partition Algorithm for Summarization -- Jin Zhang |
Hai Tang |
Multimedia: Multi-Scale TextTiling for Automatic Story Segmentation in Chinese Broadcast News -- Lei Xie, Jia Zeng, Wei Feng |
Ke Xue |
10:00-10:30 |
Break |
Shi Ji |
10:30-11:00 |
Web IR: Improving Spamdexing Detection via a Two-Stage Classification Strategy -- Guang-Gang Geng, Chun-Heng Wang, Qiu-Dan Li |
Hai Tang |
Text Clustering: A Novel Reliable Negative Method Based on Clustering for Learning from Positive and Unlabeled Examples -- Bang-Zuo Zhang, Wan-Li Zuo |
Ke Xue |
11:00-11:30 |
Web IR: Clustering Deep Web Databases Semantically -- Ling Song, Jun Ma, Po Yan, Li Lian |
Hai Tang |
Text Clustering: Term Weighting Evaluation in Bipartite Partitioning for Text Clustering -- Chao Qu, Yong Li, Jia-Li Hou, Jun Zhu |
Ke Xue |
11:30-12:00 |
Web IR: PostingRank: Bringing Order to Web Forum Postings -- Zhi Chen, Li Zhang, Wei-Hua Wang |
Hai Tang |
Text Clustering: A Refinement Framework for Cross Language Text Categorization -- Ke Wu, Bao-Liang Lu |
Ke Xue |
12:00-12:10 |
Closing |
Hai Tang |
Invited Speech: Indexing and retrieving structured and semi-structured information
- A key problem in the context of the Web
Speaker: Prof. Yves Chiaramella
Abstract:
During the last two decades the domain of Information Retrieval has gained fast-growing strategic importance, due in particular to the Internet revolution. However, standard Information Retrieval models consider documents as atomic units of information, which means that any document, whatever its size and structure, is indexed and retrieved as a whole. On the one hand, this is no longer suited to the modern evolution of document design, storage and access, which has long offered more advanced representations and interpretations of document content through standards such as SGML first, then HTML and XML. On the other hand, the Internet revolution and its billions of Web pages trigger new requirements for information retrieval: looking for needles in an enormous, fast-growing "haystack of information", modern users need precision more than recall. Moreover, they need focused answers; in other words, when considering system responses they tend to prefer specificity over exhaustivity. In this context, retrieving structured documents refers to indexing and retrieving electronic documents according to some explicit structure of these documents. An example of such explicit structure is the hierarchical structure of sections, chapters, paragraphs, etc., called the logical structure of textual documents (and of semi-structured web pages). Another important example is the graph-like structure of hypermedia documents such as web sites which, by nature, are highly structured. Complex, structured documents may no longer be considered as atomic entities, but as aggregates of interrelated objects that can be retrieved separately according to users’ needs. Corresponding retrieval techniques would allow retrieval systems to focus on the smallest components of documents, instead of returning much larger and more complex embedding documents as answers, hence providing significant help to the end user.
Considering this notion of specificity of retrieved information implies dealing with the implicit relationship existing between document content and document structure. In this lecture we shall present and discuss some approaches dealing with this problem.
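As a rough illustration of preferring specificity, the sketch below walks a document's logical structure and returns the deepest element that still contains every query term. It is a toy under stated assumptions, not one of the approaches presented in the lecture: the `Node` type, the `most_specific` function, the sample document and the plain term-containment test are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One element of a document's logical structure (hypothetical type)."""
    label: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)

    def full_text(self) -> str:
        # Text of this element plus everything nested inside it.
        return " ".join([self.text] + [c.full_text() for c in self.children])


def most_specific(node: Node, query_terms: List[str]) -> Optional[Node]:
    """Return the deepest element whose text contains every query term."""
    words = set(node.full_text().lower().split())
    if not all(t.lower() in words for t in query_terms):
        return None  # this element does not cover the query at all
    for child in node.children:
        hit = most_specific(child, query_terms)
        if hit is not None:
            return hit  # a smaller component already answers the query
    return node  # no single child suffices, so this element is the answer


# Hypothetical sample document, assumed purely for illustration.
doc = Node("article", children=[
    Node("section 1", children=[
        Node("paragraph 1.1", "indexing structured documents"),
        Node("paragraph 1.2", "retrieval of xml elements"),
    ]),
    Node("section 2", "web pages and hypermedia"),
])

print(most_specific(doc, ["xml", "retrieval"]).label)
print(most_specific(doc, ["indexing", "xml"]).label)
```

The first query is answered by a single paragraph, while the second needs the enclosing section because its terms are spread over two paragraphs. A real system would rank elements with a retrieval model rather than require exact term containment.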
Bio:
Yves Chiaramella is a Professor in the Department of Computer Science and Applied Mathematics of Université Joseph Fourier, Grenoble, France. In 1983 he founded the first French academic research group entirely dedicated to Information Retrieval (IR), and he has since been involved in a number of national research projects and EEC-funded international collaborations. His major research topic is multimedia information retrieval. In this domain, his research is mainly focused on the design and experimentation of logic-based models for multimedia IR, with a more specific interest in image retrieval and structured-document retrieval. Alongside his own research, Prof. Yves Chiaramella has also been very active in promoting the domain of IR in France and Europe. In 1985 he organized the first RIAO conference, and in 1988 he was chairman of the annual ACM-SIGIR international conference, which was also held in Grenoble. Over the past twelve years he has also taken on a number of academic responsibilities, including the direction of two computer science laboratories (one of which he founded: the CLIPS-IMAG laboratory, dedicated to Man-Machine Communication). Until this year he was director of the IMAG Institute, a federation of eight academic laboratories in Computer Science and Applied Mathematics in Grenoble.
Invited Speech: Entropy of Search Logs: How Hard is Search?
Speaker: Dr. Kenneth Church
Abstract:
How many pages are there on the Web? 5B? 20B? More? Fewer? Big bets on clusters in the clouds could be wiped out if a small cache of a few million URLs could capture much of the value. Language modeling techniques are applied to Live search logs to estimate entropy. The perplexity is surprisingly small: millions, not billions. Entropy is a powerful tool for sizing challenges and opportunities. How hard is search? How hard are query suggestion mechanisms like auto-complete and spelling correction? How much does personalization help? Does backoff help? Can we personalize based on other users like you? Joint work with Qiaozhu Mei (http://sifaka.cs.uiuc.edu/~qmei2/). See our paper in WSDM 2008 for technical details. I’ll use this as a foil for a more high-level pitch for the use of entropy and language modeling techniques for search apps.
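To make the link between entropy and perplexity concrete, here is a minimal sketch that computes both for the empirical distribution of a toy query log. It is not the method of the WSDM 2008 paper: the function names and the sample log are hypothetical, and the simple plug-in estimate used here tends to underestimate entropy on small samples.

```python
import math
from collections import Counter


def entropy_bits(items):
    """Shannon entropy (in bits) of the empirical distribution over items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def perplexity(items):
    """Perplexity is 2 ** entropy: the effective number of equally likely choices."""
    return 2 ** entropy_bits(items)


# Hypothetical toy log: four distinct queries, each issued five times.
toy_log = ["weather", "news", "maps", "mail"] * 5

# Hypothetical skewed log: one query dominates, so the log is easier to predict.
skewed_log = ["weather"] * 17 + ["news", "maps", "mail"]

print(entropy_bits(toy_log), perplexity(toy_log))
print(entropy_bits(skewed_log), perplexity(skewed_log))
```

With four equally likely queries the entropy is 2 bits and the perplexity is 4. The skewed log contains the same four distinct queries but has lower entropy, which is the sense in which a small perplexity means a small cache could capture much of the value.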
Bio:
Kenneth Church is currently working on search problems at Microsoft Research, in Redmond, WA, USA. Ken likes to work with lots of data. Before joining Microsoft in 2003, he was the head of a data mining department in AT&T Labs-Research (formerly AT&T Bell Labs, Murray Hill, NJ). Before mining web logs and telephone call detail feeds, Ken was using large datasets (corpora) in computational linguistics, an approach that has since become standard. Education: MIT (undergraduate and graduate). Awards: AT&T Fellow (2001).
Tutorial: Learning to Rank for Information Retrieval
Speaker: Dr. Tie-Yan Liu (MSRA, China)
Abstract:
Ranking is a central problem in information retrieval (IR). Many different ranking models have been proposed in the IR literature, such as the Boolean model, the vector space model, the Okapi BM25 model, and the language model. In recent years, machine learning technologies have gradually been used to create ranking models, in a supervised fashion, from training examples. Empirical studies have shown that learning to rank methods can consider more ranking factors, leverage human judgments, and thus outperform conventional IR models. In this tutorial, we will first give a brief review of conventional IR models, and then introduce three types of learning to rank approaches. 1) Pointwise approaches, which solve the problem of ranking by means of regression or classification on single documents. 2) Pairwise approaches, which transform the problem of ranking into that of classification on document pairs. Examples include Ranking SVM, RankBoost, and RankNet. For this kind of approach, we will also introduce some recent work that incorporates the characteristics of IR into the learning process, such as Ranking SVM for IR and the multiple hyperplane ranker (MHR). 3) Listwise approaches, which solve the problem of ranking at the level of the ranked list. The listwise approaches can be further divided into two categories. One is to optimize IR evaluation measures directly. Typical algorithms include AdaRank, SVM-MAP, SoftRank and RankGP. The other is to define a listwise loss function. Example algorithms include RankCosine and ListNet. In particular, the ListNet algorithm defines the listwise loss function as the K-L divergence between the permutation probability distribution of the ground truth and that of the ranking model. After discussing these three types of approaches, we will also introduce LETOR, a benchmark dataset for learning to rank, and then propose some future research directions regarding learning to rank for IR.
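The listwise loss described above can be sketched in a few lines. The full permutation distribution over n documents has n! outcomes, so this sketch assumes the tractable top-one simplification, in which each score list is turned into the probability that each document is ranked first. The function names and score values are hypothetical, and this is an illustration rather than the tutorial's reference implementation.

```python
import math


def top_one_probs(scores):
    """Softmax over scores: probability that each document is ranked first."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def listwise_kl_loss(true_scores, model_scores):
    """K-L divergence KL(P_true || P_model) between top-one distributions."""
    p = top_one_probs(true_scores)
    q = top_one_probs(model_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical relevance labels for three documents returned for one query.
truth = [3.0, 2.0, 1.0]

print(listwise_kl_loss(truth, [3.0, 2.0, 1.0]))  # model agrees with the truth
print(listwise_kl_loss(truth, [1.0, 2.0, 3.0]))  # model reverses the order
```

The loss is zero when the model's scores induce the same distribution as the ground truth and grows as the two disagree. In training, the model scores would come from a parameterized scoring function and the loss would be minimized by gradient descent.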
Bio:
Tie-Yan Liu is a lead researcher at Microsoft Research Asia. His current research interests include learning to rank for information retrieval, and infrastructure and algorithms for large-scale graph mining. So far, Dr. Liu has published 60 quality papers in refereed international conferences and journals, including SIGIR(6), WWW(2), KDD(2), ICML, IEEE TKDE, etc. He has over 30 filed US / international patents or pending applications. He is the winner of the Most Cited Paper Award for the Journal of Visual Communication and Image Representation (2004~2006). He has served on the program committees of more than 20 international conferences, such as WWW, SIGIR, ICDM, and ICIP. He is the co-chair of the SIGIR 2007 workshop on learning to rank for information retrieval (LR4IR 2007), and a Senior Program Committee member of SIGIR 2008. Prior to joining Microsoft, Dr. Liu obtained his Ph.D. from Tsinghua University in 2003, where his research efforts were devoted to video content analysis. During his studies at Tsinghua University, he also worked as a research assistant for the City University of Hong Kong and the Hong Kong Polytechnic University. He has been a member of IEEE since 1999.
Tutorial: Speech, Dialog and Ubiquitous Information Access
Speaker: Prof. Gary Geunbae Lee (POSTECH, Korea)
Abstract:
Information retrieval nowadays extends its horizon from plain text to multimedia data. Retrieval interfaces have also diversified to include speech and natural language dialog interfaces. These trends make ubiquitous access to multilingual, multimedia and multimodal information possible. This tutorial discusses how speech and dialog technology has been developed for ubiquitous information access. First, I will discuss basic technology for speech recognition and language processing. Second, spoken dialog technology for information access will be presented, including automatic knowledge acquisition and expansion. As time permits, I will also briefly discuss dialog translation and multimodal information access. Finally, I will cover spoken document retrieval techniques, which play a major role in video and UCC data retrieval. I will present many of the research results from my group at POSTECH, including several demos, in this tutorial.
Bio:
Gary Geunbae Lee has been a professor in the CSE department at POSTECH in Korea since 1991. He is the director of the Intelligent Software (ISoft) Laboratory, which focuses on human language technology research including natural language processing, speech recognition/synthesis, speech translation, question answering and web/text mining. Professor Lee has authored more than 100 papers in international journals and conferences, and has served as a technical committee member and reviewer for several international conferences such as ACL, COLING, IJCAI, ACM SIGIR, AIRS, ACM IUI, Interspeech-ICSLP/EUROSPEECH, EMNLP and IJCNLP. He is currently leading several national and industry projects on robust spoken dialog systems, spoken dialog translation and expressive TTS. Professor Lee holds a Ph.D. in computer science from UCLA, and a BS/MS in computer engineering from Seoul National University.
|