Counting mathematical diagrams with machine learning
Henrik Kragh Sørensen and Mikkel Willum Johansen
The role and use of diagrams in mathematical research have recently attracted increasing attention within the philosophy of mathematics, leading to a number of in-depth case studies of how diagrams are used in mathematical practice. Though these case studies are highly interesting, the study of diagrams still largely lacks quantitative investigations that can provide vital background information about variations in, for example, the frequency or type of diagrams used in mathematics publications over time. A first attempt at providing such quantitative background information has recently been made, and it made clear that the manual labour required to identify and code diagrams is a major limiting factor in large-scale investigations of diagram use in mathematics. To overcome this limiting factor, we have developed a machine learning tool that can identify and count mathematical diagrams in large corpora of mathematics texts. In this paper we report on our experiences with this first attempt to bring machine learning tools to the aid of the philosophy of mathematics. We describe how we developed the tool, the choices we made along the way, and how reliable the tool is in identifying mathematical diagrams in corpora outside of its training set. On the basis of these experiences we discuss how machine learning tools can be used to inform philosophical discussions, and we suggest some new and valuable research questions that these novel tools may help answer.