Last Update: Dec. 05, 2007
Organized by:
HIT
CIPSC
AIRS2008_info@racemind.net

At HIT (Harbin Institute of Technology, 哈尔滨工业大学)

Jan 15, 2008  Tuesday ---- Tutorial

TIME

ACTIVITIES

VENUE

09:00-12:00

Tutorial: Learning to Rank for Information Retrieval by Dr. Tie-Yan Liu

Lecture hall on the second floor of ShaoGuan, HIT (哈尔滨工业大学邵馆二楼报告厅)

13:30-16:30

Tutorial: Speech, Dialog and Ubiquitous Information Access by Prof. Gary Geunbae Lee

Lecture hall on the second floor of ShaoGuan, HIT (哈尔滨工业大学邵馆二楼报告厅)

At Sinoway Hotel (华融饭店)

Hai Tang (海棠厅), 6th floor of Sinoway Hotel
Ke Xue (珂雪厅), 6th floor of Sinoway Hotel
Shi Ji (世纪展厅), 6th floor of Sinoway Hotel
“Tian Yi” Coffee House (天怡咖啡厅), 2nd floor of Sinoway Hotel
Ballroom “Qian Xi” (千禧宴会厅), 5th floor of Sinoway Hotel

Jan 16, 2008  Wednesday ---- Main Conference

TIME

ACTIVITIES

VENUE

08:30-09:00

Opening

Hai Tang

09:00-10:00

Plenary Session: Invited Talk

10:00-10:30

Break

Shi Ji

10:30-11:00

IR Models: Improving Expertise Recommender Systems by Odds Ratio -- Zhao Ru, Jun Guo, Wei-Ran Xu

Hai Tang

Image Retrieval: Semantic Discriminative Projections for Image Retrieval -- He-Ping Song, Qun-Sheng Yang, Yin-Wei Zhan

Ke Xue

11:00-11:30

IR Models: Exploring the Stability of IDF Term Weighting -- Xin Fu, Miao Chen

Hai Tang

Image Retrieval: Comparing Dissimilarity Measures for Content-Based Image Retrieval -- Hai-Ming Liu, Da-Wei Song, Stefan Rueger, Rui Hu, Victoria Uren

Ke Xue

11:30-12:00

IR Models: Completely-Arbitrary Passage Retrieval in Language Modeling Approach -- Seung-Hoon Na, In-Su Kang, Jong-Hyeok Lee

Hai Tang

Image Retrieval: A Semantic Content-Based Retrieval Method for Histopathology Images -- Juan Caicedo, Fabio Gonzalez, Eduardo Romero

Ke Xue

12:00-13:30

Lunch

“Tian Yi” Coffee House

13:30-14:00

Text Classification: Integrating Background Knowledge into RBF Networks for Text Classification -- Eric Jiang

Hai Tang

Chinese Language Processing: Fusion of Multiple Features for Chinese Named Entity Recognition based on CRF Model -- Yue-Jie Zhang, Zhi-Ting Xu, Tao Zhang

Ke Xue

14:00-14:30

Text Classification: An Extended Document Frequency Metric for Feature Selection in Text Categorization -- Yan Xu, Bin Wang

Hai Tang

Chinese Language Processing: Semi-joint Labeling for Chinese Named Entity Recognition -- Chia-Wei Wu, Wen-Lian Hsu, Richard Tzong-Han Tsai

Ke Xue

14:30-15:00

Text Classification: Smoothing LDA Model for Text Categorization -- Wen-Bo Li, Le Sun, Yuan-Yong Feng, Da-Kun Zhang

Hai Tang

Chinese Language Processing: On the Construction of a Large Scale Chinese Web Test Collection -- Hong-Fei Yan, Chong Chen, Bo Peng, Xiao-Ming Li

Ke Xue

15:00-15:30

Break

Shi Ji

15:30-16:00

Text Processing: Topic Tracking Based on Keywords Dependency Profile -- Wei Zheng, Yu Zhang, Yu Hong, Ji-Li Fan, Ting Liu

Hai Tang

Applications of IR: Discrimination of Ventricular Arrhythmias Using NEWFM -- Zhen-Xing Zhang, Sang-Hong Lee, Joon S. Lim

Ke Xue

16:00-16:30

Text Processing: A Dynamic Programming Model for Text Segmentation Based on Min-Max Similarity -- Na Ye, Jing-Bo Zhu, Yan Zheng, Matthew Ma, Bin Zhang

Hai Tang

Text Processing: Pronoun Resolution with Markov Logic Networks -- Ki Chan, Wai Lam

Ke Xue

16:30-17:30

Free time

 

17:30-18:30

Dinner

Shi Ji

18:30-19:10

Transfer to Zhao Lin Park (The Ice Lantern Festival Exposition)
(Sightseeing will be organized as a group tour, and participants can decide whether or not to join. Participants pay for their own tour costs and tickets; payment will be collected on-site.)

On the bus

19:10-20:40

Visit the Ice Lantern Festival Exposition

Zhao Lin Park

20:40-21:20

Back to the hotel

On the bus


Jan 17, 2008  Thursday ---- Main Conference

TIME

ACTIVITIES

VENUE

09:00-10:00

Plenary Session: Invited Talk

Ke Xue

10:00-10:30

Break

Shi Ji

10:30-11:00

Machine Learning: Efficient Feature Selection in the Presence of Outliers and Noises -- Shuang-Hong Yang, Bao-Gang Hu

Hai Tang

Taxonomy: Combining WordNet and ConceptNet for Automatic Query Expansion: A Learning Approach -- Ming-Hung Hsu, Hsin-Hsi Chen, Ming-Feng Tsai

Ke Xue

11:00-11:30

Machine Learning: Domain Adaptation for Conditional Random Fields -- Qi Zhang, Xi-Peng Qiu, Xuan-Jing Huang, Li-De Wu

Hai Tang

Taxonomy: Improving Hierarchical Taxonomy Integration with Semantic Feature Expansion on Category-Specific Terms -- Cheng-Zen Yang, Ing-Xiang Chen, Cheng-Tse Hung, Ping-Jung Wu

Ke Xue

11:30-12:00

Machine Learning: Graph Mutual Reinforcement based Bootstrapping -- Qi Zhang, Ya-Qian Zhou, Xuan-Jing Huang, Li-De Wu

Hai Tang

Taxonomy: HOM: An Approach to Calculating Semantic Similarity Utilizing Relations between Ontologies -- Zhi-Zhong Liu, Huai-Min Wang, Bin Zhou, Hong-Bin Huang

Ke Xue

12:00-13:30

Lunch

“Tian Yi” Coffee House

13:30-14:00

IR Methods: Semi-Supervised Graph-Ranking for Text Retrieval -- Mao-Qiang Xie, Jin-Li Liu, Nan Zheng, Dong Li, Ya-Lou Huang, Yang Wang

Hai Tang

Information Extraction: Gram-Free Synonym Extraction via Suffix Arrays -- Minoru Yoshida

Ke Xue

14:00-14:30

IR Methods: Learnable Focused Crawling Based on Ontology -- Hai-Tao Zheng, Bo-Yeong Kang, Hong-Gee Kim

Hai Tang

Information Extraction: Synonyms Extraction Using Web Content Focused Crawling -- Chien-Hsing Chen, Chung-Chian Hsu

Ke Xue

14:30-15:00

Information Extraction: Blog Post and Comment Extraction Using Information Quantity of Web Format -- Dong-Lin Cao, Xiang-Wen Liao, Hong-Bo Xu, Shuo Bai

Ke Xue

15:00-15:30

Break

Shi Ji

15:30-17:30

Poster and Demo Session

18:30-20:30

Banquet

Ballroom “Qian Xi”


Jan 18, 2008  Friday ---- Main Conference

TIME

ACTIVITIES

VENUE

09:00-09:30

Summarization: A Lexical Chain Approach for Update-style Query-focused Multi-document Summarization -- Jing Li, Le Sun

Hai Tang

Multimedia: An Ontology and SWRL Based 3D Model Retrieval System -- Xin-Ying Wang, Sheng-Sheng Wang

Ke Xue

09:30-10:00

Summarization: GSPSummary: A Graph-based Sub-topic Partition Algorithm for Summarization -- Jin Zhang

Hai Tang

Multimedia: Multi-Scale TextTiling for Automatic Story Segmentation in Chinese Broadcast News -- Lei Xie, Jia Zeng, Wei Feng

Ke Xue

10:00-10:30

Break

Shi Ji

10:30-11:00

Web IR: Improving Spamdexing Detection via a Two-Stage Classification Strategy -- Guang-Gang Geng, Chun-Heng Wang, Qiu-Dan Li

Hai Tang

Text Clustering: A Novel Reliable Negative Method Based on Clustering for Learning from Positive and Unlabeled Examples -- Bang-Zuo Zhang, Wan-Li Zuo

Ke Xue

11:00-11:30

Web IR: Clustering Deep Web Databases Semantically -- Ling Song, Jun Ma, Po Yan, Li Lian

Hai Tang

Text Clustering: Term Weighting Evaluation in Bipartite Partitioning for Text Clustering -- Chao Qu, Yong Li, Jia-Li Hou, Jun Zhu

Ke Xue

11:30-12:00

Web IR: PostingRank: Bringing Order to Web Forum Postings -- Zhi Chen, Li Zhang, Wei-Hua Wang

Hai Tang

Text Clustering: A Refinement Framework for Cross Language Text Categorization -- Ke Wu, Bao-Liang Lu

Ke Xue

12:00-12:10

Closing

Hai Tang

 

Invited Speech: Indexing and retrieving structured and semi-structured information
- A key problem in the context of the Web
Speaker: Prof. Yves Chiaramella
Abstract:
Over the last two decades, Information Retrieval has rapidly gained strategic importance, due in particular to the Internet revolution. However, standard Information Retrieval models consider documents as atomic units of information, which means that any document, whatever its size and structure, is indexed and retrieved as a whole. On the one hand, this no longer fits the modern evolution of document design, storage and access, which has long offered more advanced representations and interpretations of document content through standards such as SGML first, then HTML and XML. On the other hand, the Internet revolution and its billions of Web pages trigger new requirements for information retrieval: looking for needles in an enormous, fast-growing "haystack of information", modern users need precision more than recall. Moreover, they need focused answers; in other words, when considering system responses they tend to prefer specificity over exhaustivity.
In this context, retrieving structured documents means indexing and retrieving electronic documents according to some explicit structure of these documents. One example of such explicit structure is the hierarchical structure of sections, chapters, paragraphs, etc., called the logical structure of textual documents (and of semi-structured web pages). Another important example is the graph-like structure of hypermedia documents such as web sites which, by nature, are highly structured. Complex, structured documents may no longer be considered as atomic entities, but as aggregates of interrelated objects that can be retrieved separately according to users’ needs. Corresponding retrieval techniques would allow retrieval systems to focus on the smallest components of documents, instead of returning much larger and more complex embedding documents as answers, hence providing significant help to the end user.
Considering this notion of specificity of retrieved information implies dealing with the implicit relationship existing between document content and document structure. In this lecture we shall present and discuss some approaches dealing with this problem.


Bio:
Yves Chiaramella is a Professor in the Department of Computer Science and Applied Mathematics of Université Joseph Fourier, Grenoble, France. In 1983 he founded the first French academic research group entirely dedicated to Information Retrieval (IR), and he has since been involved in a number of national research projects and EEC-funded international collaborations. His major research topic is multimedia information retrieval. In this domain, his research focuses mainly on the design and experimentation of logic-based models for multimedia IR, with a more specific interest in image retrieval and structured-document retrieval. Alongside his own research, Prof. Chiaramella has also been very active in promoting the domain of IR in France and Europe. In 1985 he organized the first RIAO conference, and in 1988 he was chairman of the annual ACM-SIGIR international conference, which was also held in Grenoble. For the past twelve years he has also taken on a number of academic responsibilities, including the direction of two computer science laboratories (one of which he founded: the CLIPS-IMAG laboratory, dedicated to Man-Machine Communication). Until this year he was director of the IMAG Institute, a federation of eight academic laboratories in Computer Science and Applied Mathematics in Grenoble.

Invited Speech: Entropy of Search Logs: How Hard is Search?
Speaker: Dr. Kenneth Church
Abstract:
How many pages are there on the Web? 5B? 20B? More? Fewer? Big bets on clusters in the clouds could be wiped out if a small cache of a few million URLs could capture much of the value. Language modeling techniques are applied to Live search logs to estimate entropy. The perplexity is surprisingly small: millions, not billions. Entropy is a powerful tool for sizing challenges and opportunities. How hard is search? How hard are query suggestion mechanisms like auto-complete and spelling correction? How much does personalization help? Does backoff help? Can we personalize based on other users like you? Joint work with Qiaozhu Mei (http://sifaka.cs.uiuc.edu/~qmei2/). See our paper in WSDM 2008 for technical details. I’ll use this as a foil for a more high-level pitch for the use of entropy and language modeling techniques for search apps.
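To make the link between entropy and perplexity concrete, here is a minimal sketch that is not taken from the talk. It computes a plain plug-in (empirical) estimate over a tiny hypothetical query log, whereas the abstract refers to language modeling techniques applied to real search logs. The names `entropy_bits`, `perplexity`, and `toy_log` are illustrative assumptions.

```python
import math
from collections import Counter


def entropy_bits(items):
    """Empirical Shannon entropy, in bits, of a sequence of hashable items."""
    counts = Counter(items)
    total = sum(counts.values())
    # H = -sum_x p(x) * log2 p(x), with p(x) estimated by relative frequency.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def perplexity(items):
    """Perplexity is 2 raised to the entropy (when entropy is in bits)."""
    return 2.0 ** entropy_bits(items)


# Hypothetical toy log: one query issued twice, two queries issued once.
toy_log = ["weather", "weather", "news", "maps"]

print(entropy_bits(toy_log))  # 1.5 bits
print(perplexity(toy_log))    # about 2.83 equally likely choices
```

Perplexity can be read as the number of equally likely alternatives that would yield the same entropy, which is why a figure in the millions suggests a far smaller effective search space than the raw number of pages.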


Bio:
Kenneth Church is currently working on search problems at Microsoft Research, in Redmond, WA, USA. Ken likes to work with lots of data. Before joining Microsoft in 2003, he was the head of a data mining department in AT&T Labs-Research (formerly AT&T Bell Labs, Murray Hill, NJ). Before mining web logs and telephone call detail feeds, Ken was using large datasets (corpora) in computational linguistics, an approach that has since become standard. Education: MIT (undergraduate and graduate). Awards: AT&T Fellow (2001).

Tutorial: Learning to Rank for Information Retrieval
Speaker: Dr. Tie-Yan Liu (MSRA, China)
Abstract:
Ranking is a central problem in information retrieval (IR). Many different ranking models have been proposed in the IR literature, such as the Boolean model, the vector space model, the Okapi BM25 model, and the language model. In recent years, machine learning technologies have increasingly been used to create ranking models, in a supervised fashion, from training examples. Empirical studies have shown that learning-to-rank methods can consider more ranking factors and leverage human judgments, and thus can outperform conventional IR models. In this tutorial, we will first give a brief review of conventional IR models, and then introduce three types of learning-to-rank approaches. 1) Pointwise approaches, which solve the ranking problem by means of regression or classification on single documents. 2) Pairwise approaches, which transform the ranking problem into classification on document pairs. Examples include Ranking SVM, RankBoost, and RankNet. For this kind of approach, we will also introduce some recent work that incorporates the characteristics of IR into the learning process, such as Ranking SVM for IR and the multiple hyperplane ranker (MHR). 3) Listwise approaches, which solve the ranking problem at the level of the ranked list. The listwise approaches can be further divided into two categories. One optimizes IR evaluation measures directly; typical algorithms include AdaRank, SVM-MAP, SoftRank and RankGP. The other defines a listwise loss function; example algorithms include RankCosine and ListNet. In particular, the ListNet algorithm defines the listwise loss function as the K-L divergence between the permutation probability distribution of the ground truth and that of the ranking model. After discussing these three types of approaches, we will also introduce LETOR, a benchmark dataset for learning to rank, and then propose some future research directions regarding learning to rank for IR.
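As an illustration of the listwise loss described above, the following is a minimal sketch rather than the tutorial's own material. A distribution over all permutations of n documents has n! outcomes, so the sketch assumes the simpler "top-one" form, in which each document's probability of being ranked first is a softmax over the scores. The function names and example scores are hypothetical.

```python
import math


def top_one_probs(scores):
    """Softmax over scores: probability that each document is ranked first."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def listwise_kl_loss(true_scores, model_scores):
    """K-L divergence between ground-truth and model top-one distributions.

    Assumption: top-one probabilities stand in for the full permutation
    probability distribution mentioned in the abstract.
    """
    p = top_one_probs(true_scores)
    q = top_one_probs(model_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical relevance labels for three documents of one query.
labels = [3.0, 2.0, 1.0]

print(listwise_kl_loss(labels, [3.0, 2.0, 1.0]))  # 0.0: same distribution
print(listwise_kl_loss(labels, [1.0, 2.0, 3.0]))  # positive: reversed order
```

The loss is computed once per query over the whole list of documents, in contrast to pointwise losses (one document at a time) and pairwise losses (one document pair at a time).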


Bio:
Tie-Yan Liu is a lead researcher at Microsoft Research Asia. His current research interests include learning to rank for information retrieval, and infrastructure and algorithms for large-scale graph mining. So far, Dr. Liu has published 60 quality papers in refereed international conferences and journals, including SIGIR(6), WWW(2), KDD(2), ICML, IEEE TKDE, etc. He has over 30 filed US / international patents or pending applications. He is the winner of the Most Cited Paper Award for the Journal of Visual Communication and Image Representation (2004~2006). He has served on the program committees for more than 20 international conferences, such as WWW, SIGIR, ICDM, and ICIP. He is the co-chair of the SIGIR 2007 workshop on learning to rank for information retrieval (LR4IR 2007), and a Senior Program Committee member of SIGIR 2008. Prior to joining Microsoft, Dr. Liu obtained his Ph.D. from Tsinghua University in 2003, where his research efforts were devoted to video content analysis. During his studies at Tsinghua University, he also worked as a research assistant for the City University of Hong Kong and the Hong Kong Polytechnic University. He has been a member of IEEE since 1999.

Tutorial: Speech, Dialog and Ubiquitous Information Access
Speaker: Prof. Gary Geunbae Lee (POSTECH, Korea)
Abstract:
Information retrieval is extending its horizon from plain text to multimedia data. Retrieval interfaces are also diversifying to include speech and natural language dialog interfaces. These trends make ubiquitous access to multilingual, multimedia and multimodal information possible. This tutorial discusses how speech and dialog technology has been developed for ubiquitous information access. First, I will discuss basic technology for speech recognition and language processing. Second, spoken dialog technology for information access will be presented, including automatic knowledge acquisition and expansion. As time permits, I will also briefly discuss dialog translation and multimodal information access. Finally, I will cover spoken document retrieval techniques, which play a major role in video and UCC data retrieval. I will present many of the research results from my group at POSTECH, including several demos, in this tutorial.


Bio:
Gary Geunbae Lee has been a professor in the CSE department at POSTECH in Korea since 1991. He is the director of the Intelligent Software (ISoft) Laboratory, which focuses on human language technology research including natural language processing, speech recognition/synthesis, speech translation, question answering and web/text mining. Professor Lee has authored more than 100 papers in international journals and conferences, and has served as a technical committee member and reviewer for several international conferences such as ACL, COLING, IJCAI, ACM SIGIR, AIRS, ACM IUI, Interspeech-ICSLP/EUROSPEECH, EMNLP and IJCNLP. He is currently leading several national and industry projects on robust spoken dialog systems, spoken dialog translation and expressive TTS. Professor Lee holds a Ph.D. in computer science from UCLA, and a BS and MS in computer engineering from Seoul National University.

 

Harbin Institute of Technology
No. 92, West Da-Zhi Street, Harbin, Heilongjiang, China, 150001