The second focal point uses and develops natural language processing techniques within the methodology of Distant Reading. As the digitization initiative of the BHMPI library will by then have added special collections (see 3.1.1), the doctoral students will be able to develop new methods and technologies for recognizing linguistic patterns in the textual data. The main goal of this axis of research is the development of a computational historiography that allows researchers to reconstruct the history and geography of artistic and art-historical concepts and ideas by tracing art-historical terminology and argumentation in context. Indeed, scholarly experience shows that studying the history of Art History becomes increasingly challenging after 1900 due to the large quantity of textual material. At the same time, textual scholarship itself has become digital since the 1990s. In this light, digital research methods become not only necessary (due to the scale and complexity of the material) but also contextually appropriate (given the increasingly born-digital scientific publications) for mapping and understanding the development of the intellectual history and geography of the discipline over the last century.
This shift towards a corpus historiography echoes similar earlier shifts in corpus linguistics. Such a historiography, however, requires computational methods different from those developed for linguistic and literary phenomena. Novel computational problems include multilingualism (the interrelation between language groups and schools of scientific thought), contextualization (particularly in relation to intertextual and intervisual references), and polysemy, although work on the latter has progressed well over the last two decades thanks to the Getty Research Institute and others, including the BHMPI. New textual search engines will significantly change the virtual structure of the BHMPI library and its scholarly use, whilst the semantic analysis of text can lead towards a recommendation engine that suggests secondary references for links made in the BHMPI’s knowledge graph. Specific tools for recognizing handwriting will be necessary for the identification of the digitized photo material. The developed technologies will also be of particular interest to the MPIWG and other MPIs of the Section.
Examples of possible research topics driven by the new infrastructures and tools:
- study of the semantic drift of artistic vocabulary in time and space;
- visualizing the diffusion of art historical terms and trends in social networks;
- exploring intertextual references across publications;
- automatic transcription, comparison and categorization of hand-written and early printed documents;
- context-aware intelligent ‘diff’ tools for intratextual comparisons, i.e. between different editions, translations or transcriptions of the same text;
- analysis of aesthetic terms in texts denoting color, space, forms etc. and topological representation of their interrelations and values in comparison to the object they describe;
- stylometric analysis of art historical writings and their developments and affiliations;
- machine learning-driven and thesaurus-based multilingual text search, in order to reach foreign-language publications and sources more easily;
- research-trend analysis and forecasting in relation to times, places, terms, actors and networks, in order to study the contemporary history of Art History;
- statistical and archival research into the provenance, the value development and mobility of works of art;
- use of digital tools for exploring and correlating visual images in texts, furthering recent technical developments in image-text relations, e.g. in automatic image caption generation;
- cross-lingual text mining and joint text and image mining from art historical publications.
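
To give a concrete sense of the starting point for one of these topics, the comparison of different editions of the same text, the following minimal sketch uses Python's standard `difflib` module to list word-level changes between two versions of a sentence. It is only a baseline and not the context-aware tool envisaged above: the function names `tokenize` and `edition_diff` and the two sample sentences are hypothetical and chosen purely for illustration.

```python
import difflib
import re


def tokenize(text):
    """Split a text into lowercase word tokens, ignoring punctuation."""
    return re.findall(r"\w+", text.lower())


def edition_diff(old, new):
    """Return (operation, old_tokens, new_tokens) for every changed span.

    Hypothetical helper: a plain word-level diff with no linguistic context.
    """
    a, b = tokenize(old), tokenize(new)
    # autojunk=False keeps frequent words from being skipped in long texts.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, a[i1:i2], b[j1:j2]))
    return changes


# Invented sample sentences standing in for two editions of one passage.
first_edition = "The painter renders space through linear perspective."
second_edition = "The painter constructs pictorial space through linear perspective."

for change in edition_diff(first_edition, second_edition):
    print(change)
```

A diff of this kind only registers surface substitutions. A context-aware tool would additionally have to decide whether a change such as "renders" to "constructs pictorial" reflects a stylistic revision, a translation choice or a conceptual shift, which is where the semantic methods outlined in this section would come in.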