Disfluency Detection (文本顺滑)

Spoken-language transcripts produced by automatic speech recognition (ASR) typically contain many disfluencies, such as repetitions, filler words, and redundant segments. These disfluencies seriously interfere with downstream natural language understanding tasks such as syntactic parsing, machine translation, and text summarization. The goal of disfluency detection is to identify these disfluent segments and remove them, providing cleaner input for downstream tasks. For example:

ASR output: 帮我订一张去北京上海的飞机票 ("Book me a flight ticket to Beijing Shanghai")

After disfluency removal: 帮我订一张去上海的飞机票 ("Book me a flight ticket to Shanghai")

Note: here "北京" is a redundancy (the speaker corrects it to "上海"), so it is identified and removed. Other common disfluency types include filler words and repetitions.
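The example above can be sketched as a per-token labeling problem: each token is labeled fluent or disfluent, and the disfluent tokens are deleted. This is a minimal illustration, not the method of any of the papers below; the labels here are hand-assigned, whereas a real system predicts them with a trained model.

```python
def remove_disfluencies(tokens, labels):
    """Keep only the tokens labeled fluent ('O'); drop disfluent ('D') ones."""
    return [tok for tok, lab in zip(tokens, labels) if lab == "O"]

# The example from the text: "北京" is a redundant segment corrected to "上海".
tokens = ["帮", "我", "订", "一张", "去", "北京", "上海", "的", "飞机票"]
labels = ["O", "O", "O", "O", "O", "D", "O", "O", "O"]

print("".join(remove_disfluencies(tokens, labels)))
# → 帮我订一张去上海的飞机票
```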

Traditional disfluency detection methods rely heavily on annotated data, which is difficult to obtain and expensive to label in practice, making them hard to apply to new domains. Learning paradigms based on large-scale pre-trained models can exploit large amounts of unlabeled data to reduce this dependence. This research therefore leverages deep learning and large-scale pre-trained models to improve disfluency detection along two directions: model representation and the use of unlabeled data. It conducts targeted studies on several key challenges and characteristics of the task, including the similarity between disfluent chunks, long-distance dependencies, the syntactic well-formedness of the output, and reducing the dependence on annotated data. This work significantly improved the performance of disfluency detection while alleviating the need for annotated data. Five CCF A/B papers have been published, and the results have been integrated into industrial disfluency detection systems.

Publications

Combining Self-Training and Self-Supervised Learning for Unsupervised Disfluency Detection

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1813--1822, 2020.

Wang, Shaolei and Wang, Zhongyuan and Che, Wanxiang and Liu, Ting

Multi-task self-supervised learning for disfluency detection

Proceedings of the AAAI Conference on Artificial Intelligence, 9193--9200, 2020.

Wang, Shaolei and Che, Wanxiang and Liu, Qi and Qin, Pengda and Liu, Ting and Wang, William Yang

Transition-Based Disfluency Detection using LSTMs

Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2785--2794, 2017.

Wang, Shaolei and Che, Wanxiang and Zhang, Yue and Zhang, Meishan and Liu, Ting

A Neural Attention Model for Disfluency Detection

Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, 278--287, 2016.

Wang, Shaolei and Che, Wanxiang and Liu, Ting

Enhancing Neural Disfluency Detection with Hand-Crafted Features

Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, 336--347, 2016.

Wang, Shaolei and Che, Wanxiang and Liu, Yijia and Liu, Ting

Combining Self-supervised Learning and Active Learning for Disfluency Detection

arXiv preprint, abs/10.1145, 2010.

Wang, Shaolei and Wang, Zhongyuan and Che, Wanxiang and Zhao, Sendong and Liu, Ting
