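Abstract (translated from the Chinese):
This paper builds on PARADISE (Platform for Applying, Researching And Developing Intelligent Search Engine), the search engine platform of the Tianwang (天网) Laboratory. Using more than 2,500 computer networking papers crawled from portal.acm.org as data, we build a paper search system. The ultimate goal is to use the citation relations between papers to collect what the authors of citing papers say about a given paper, forming a short evaluation paragraph together with Impact-based Summaries, so that readers can judge the paper's content and its strengths and weaknesses from an expert perspective. We first crawl the citation relations between articles from portal.acm.org. We then apply an algorithm to obtain a set of candidate sentences that comment on a given article, and rank these sentences by importance to produce a short evaluation passage. We also build a language model that uses the candidate sentence set to score the sentences of the original paper; the few highest-scoring sentences form the impact-based summary of the original paper.
Keywords
Search engine, paper evaluation, language model, KL-divergence algorithm, impact-based summaries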
Abstract
Building on PARADISE (Platform for Applying, Researching and Developing Intelligent Search Engine) and a data set of more than 2,500 papers in the area of computer networks, we construct a paper search engine. Our goal is to obtain comments on a paper, together with its impact-based summary, from the citation relations between papers. We first extract candidate sentences from citing papers that comment on the cited paper and assemble them into a citation context. We then construct a language model from this citation context, use it to score the sentences of the cited paper, and take the highest-scoring sentences as the impact-based summary.
Key words
Search Engine, Paper Comment, Language Model, KL-divergence Scoring, Impact-based Summaries
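
The scoring step named in the abstract (a language model built from the citation context, used to score sentences of the cited paper by KL-divergence) can be illustrated with a minimal sketch. The abstract does not specify the model's details, so everything below is an assumption made for illustration: unigram models, add-one smoothing over a shared vocabulary, the whitespace tokenizer, and the hypothetical function names `tokenize`, `unigram_model`, `neg_kl`, and `impact_summary`.

```python
import math
from collections import Counter

PUNCT = ".,;:()\"'"

def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]

def unigram_model(tokens, vocab, mu=1.0):
    """Unigram distribution over vocab with add-mu smoothing (an assumption)."""
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocab)
    return {w: (counts[w] + mu) / total for w in vocab}

def neg_kl(p, q):
    """Negative KL-divergence -KL(p || q); a higher value means q is closer to p."""
    return -sum(p[w] * math.log(p[w] / q[w]) for w in p)

def impact_summary(paper_sentences, citation_sentences, k=2):
    """Rank sentences of the cited paper against the citation-context model."""
    cite_tokens = [t for s in citation_sentences for t in tokenize(s)]
    sent_tokens = [tokenize(s) for s in paper_sentences]
    # Shared vocabulary, so that every smoothed probability is non-zero.
    vocab = set(cite_tokens).union(*sent_tokens)
    impact_model = unigram_model(cite_tokens, vocab)
    scored = [
        (neg_kl(impact_model, unigram_model(toks, vocab)), sent)
        for toks, sent in zip(sent_tokens, paper_sentences)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sent for _, sent in scored[:k]]

# Toy data, invented purely for illustration.
paper = [
    "We propose a congestion control scheme for wireless networks.",
    "The rest of this paper is organized as follows.",
    "Experiments were run on a small testbed.",
]
citations = [
    "Their congestion control scheme works well in wireless networks.",
    "The congestion control approach of that paper is widely used.",
]
summary = impact_summary(paper, citations, k=1)
```

In this toy run the sentence that shares the most vocabulary with the citing sentences ranks first. A real system would need a more careful tokenizer and smoothing scheme than the ones assumed here.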