MOTIVATION
A fascinating aspect of science is how different fields of study interact and influence each other. Many significant advances have emerged from the synergistic interaction of multiple disciplines. For example, quantum mechanics emerged as a theory that brought together Planck’s idea of quantized energy levels, Einstein’s explanation of the photoelectric effect, and Bohr’s model of the atom.
The degree to which the ideas and artifacts of a field of study are helpful to the world is a measure of its influence.
Developing a better sense of the influence of a field has several benefits, such as understanding what fosters innovation and what stifles it, what a field succeeds at understanding and what remains elusive, and which stakeholders benefit most and which are being left behind.
Mechanisms of field-to-field influence are complex, but one notable marker of scientific influence is citations. The extent to which a source field cites a target field is a rough indicator of the degree of influence of the target on the source. We note here, though, that not all citations are equal, and that citations are subject to various biases. Nonetheless, meaningful inferences can be drawn at an aggregate level; for example, if the proportion of citations from field x to a target field y has markedly increased as compared to the proportion of citations from other fields to the target, then it is likely that the influence of y on x has grown.
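As a minimal sketch of the aggregate comparison described above, the snippet below computes, for one target field, the share of its incoming citations that come from each source field, and compares that share across two time periods. The function name `citation_shares`, the pair-based input format, and the toy counts are all assumptions made for illustration; they are not taken from the study's data or code.

```python
from collections import Counter


def citation_shares(citations, target):
    """Return each source field's share of the citations received by `target`.

    `citations` is an iterable of (citing_field, cited_field) pairs
    (a hypothetical input format chosen for this sketch).
    """
    counts = Counter(src for src, dst in citations if dst == target)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {src: n / total for src, n in counts.items()}


# Toy data, invented for illustration only.
early = [("x", "y")] * 1 + [("z", "y")] * 9   # x supplies 1 of 10 citations to y
late = [("x", "y")] * 4 + [("z", "y")] * 6    # x supplies 4 of 10 citations to y

early_shares = citation_shares(early, "y")
late_shares = citation_shares(late, "y")

# A positive change means x accounts for a larger proportion of the
# citations to y in the later period than in the earlier one.
change_x = late_shares["x"] - early_shares["x"]
```

Working with proportions rather than raw counts keeps the comparison meaningful when the total number of citations to the target grows over time.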
WHY NLP?
While studying influence is useful for any field of study, we focus on Natural Language Processing (NLP) research for one critical reason.
NLP is at an inflection point. Recent developments in large language models have captured the imagination of the scientific world, industry, and the general public.
Thus, NLP is poised to exert substantial influence despite significant risks. Further, language is social, and its applications have complex social implications. Therefore, responsible research and development need engagement with a wide swathe of literature (arguably, more so for NLP than other fields).
By tracing hundreds of thousands of citations, we systematically and quantitatively examine broad trends in the influence of various fields of study on NLP and NLP’s influence on them.
We use Semantic Scholar’s field of study attribute to categorize papers into 23 fields, such as math, medicine, or computer science. A paper can belong to one or many fields. For example, a paper that targets a medical application using computer algorithms might be in medicine and computer science. NLP itself is an interdisciplinary subfield of computer science, machine learning, and linguistics. We categorize a paper as NLP when it is in the ACL Anthology, which is arguably the largest repository of NLP literature (albeit not a complete set of all NLP papers).
- 209M papers and 2.5B citations from various fields (Semantic Scholar); for each citation, the fields of study of the citing and cited papers.
- Semantic Scholar’s field of study attribute to categorize papers into 23 fields, such as math, medicine, or computer science.
- 77K NLP papers from 1965 to 2022 (ACL Anthology)
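The labeling scheme described above can be sketched roughly as follows. This is an illustrative sketch only: the `Paper` record, the `in_acl_anthology` flag, and the choice to expand a citation between multi-field papers into every (citing field, cited field) pair are assumptions, not a description of the actual pipeline.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Paper:
    # Hypothetical record; field names are assumptions for this sketch.
    paper_id: str
    fields: frozenset               # field-of-study labels, one or many
    in_acl_anthology: bool = False


def labels(paper):
    """Field labels for a paper, adding 'NLP' when it is in the ACL Anthology."""
    extra = {"NLP"} if paper.in_acl_anthology else set()
    return set(paper.fields) | extra


def field_pairs(citing, cited):
    """All (citing_field, cited_field) pairs for one citation.

    Taking the cross product is an assumed convention for papers that
    belong to several fields.
    """
    return sorted(product(labels(citing), labels(cited)))


# Toy example, invented for illustration.
nlp_paper = Paper("a", frozenset({"Computer Science"}), in_acl_anthology=True)
med_paper = Paper("b", frozenset({"Medicine", "Computer Science"}))
pairs = field_pairs(nlp_paper, med_paper)
```

Here a single citation from the NLP paper to the medical paper yields four field-level pairs, which is one way to let a paper count toward each field it belongs to.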