Scientific text mining applies natural language processing (NLP) to academic literature. It typically builds on domain-specific pretrained models such as SciBERT (Beltagy et al., 2019), a BERT model pretrained on scientific text, and SPECTER (Cohan et al., 2020), which produces document-level embeddings informed by citations. From full-text papers or abstracts it automatically extracts hypotheses, methodologies, findings, and scholarly contributions, which supports systematic review automation, research-trend analysis, and science mapping at scale.
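To make the extraction step concrete, here is a minimal sketch of tagging the sentences of an abstract by category. It deliberately swaps the pretrained-model classifier for a hypothetical cue-phrase lookup so that it runs with the Python standard library alone; the cue lists, the function names, and the naive sentence splitter are all illustrative assumptions, not part of SciBERT or SPECTER.

```python
import re
from collections import defaultdict

# Hypothetical cue phrases (an assumption for illustration only). A real system
# would replace this lookup with a classifier fine-tuned on a scientific
# language model.
CUES = {
    "hypothesis": ("we hypothesize", "we hypothesise", "we posit"),
    "methodology": ("we propose", "we train", "we use", "our method"),
    "finding": ("we find", "we show", "results show", "outperforms"),
    "contribution": ("we release", "we introduce", "our contribution"),
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tag_sentence(sentence):
    """Return the first category whose cue phrase occurs in the sentence, else None."""
    lowered = sentence.lower()
    for label, phrases in CUES.items():
        if any(phrase in lowered for phrase in phrases):
            return label
    return None


def mine_abstract(text):
    """Group the sentences of one abstract by extracted category."""
    extracted = defaultdict(list)
    for sentence in split_sentences(text):
        label = tag_sentence(sentence)
        if label is not None:
            extracted[label].append(sentence)
    return dict(extracted)
```

Calling mine_abstract on an abstract string returns a dictionary that maps each detected category to its sentences; sentences without a cue phrase are dropped. In a model-based pipeline, tag_sentence is the piece that would presumably be replaced by a learned classifier, while the surrounding split-and-group structure could stay the same.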
Beltagy, I., Lo, K., & Cohan, A. (2019). SciBERT: A Pretrained Language Model for Scientific Text. EMNLP 2019.
Cohan, A., Feldman, S., Beltagy, I., Downey, D., & Weld, D. (2020). SPECTER: Document-Level Representation Learning using Citation-Informed Transformers. ACL 2020.