Automated Biomedical Text Fragmentation: In Support of Biomedical Sentence Fragment Classification
The past decade has seen tremendous growth in the amount of biomedical literature, particularly in bioinformatics. As a result, biomedical text categorization has become a central task in providing researchers with literature suited to their specific information needs. Pan et al. have explored a method that automatically identifies information-bearing sentence fragments within scientific text. Their method aims to automatically classify sentence fragments into sets of categories, each defined to satisfy a specific type of information need. The categories are grouped into five dimensions: Focus, Polarity, Certainty, Evidence, and Trend. Fragments, rather than whole sentences, are used as the unit of classification because the class value along any of these dimensions can change mid-sentence. Automatically annotating sentence fragments along the five dimensions therefore requires first breaking sentences into fragments automatically. In this study, we investigate the problem of automatically fragmenting biomedical sentences, which is a fundamental layer of the multi-dimensional fragment classification.