Description
Historical newspapers have traditionally been popular sources for studying public mentalities and collective cultures within historical scholarship. At the same time, they are notoriously time-consuming and complex to analyze. The recent digitization of newspapers, and the use of computers to access the growing mass of digital corpora of historical news media, are fundamentally altering the historian's heuristic process.
The large-scale digitization project currently run by the Dutch National Library (KB) illustrates this. To date, the KB has made over 80 million historical newspaper articles from the last four centuries publicly available. Researchers (as well as the wider public) can run full-text searches across the entire repository through the KB's own online search interface, Delpher. Instead of manually skimming through a selected number of editions or volumes, users can search the entire corpus for particular keywords or strings of keywords. Basic as it may seem, full-text searching completely overturns the way historians are used to approaching newspapers. Historians traditionally made successive top-down selections to gradually isolate potentially interesting material. Keyword searching, by contrast, treats the corpus as a single bag of words and therefore lets researchers dive immediately into the texts that meet their search criteria.
At the same time, keyword searching has serious shortcomings for (cultural) historical research. Historians commonly work with texts, but are rarely interested in language per se. Rather, they use written or spoken sources (be it correspondence, literature, diaries, or news media) to gain access to past cultures, ideas, or mentalities. The things historians are most interested in are often not made explicit (e.g. the Enlightenment attitude, generational conflicts) and are difficult to reduce to single keywords (e.g. modernity, secularization). Doing historical research with keyword searching is like painting a canvas with felt-tip pens: all subtlety is lost.
The goal of this project was to develop software that overcomes this problem. The 'Keyword Generator' tool, developed in cooperation with Juliette Lonij of the KB Lab, implements a technique of dictionary extraction. Searching with dictionaries (lists of related terms) rather than single keywords can bring greater subtlety and diversity to digital historical scholarship. The more elaborate these dictionaries are, the better they overcome the contingency that comes with using single keywords in search strategies.
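The description does not say how the Keyword Generator builds its dictionaries. As a rough illustration only, the Python sketch below uses simple document-level co-occurrence counting as a stand-in: it proposes candidate dictionary terms that often appear alongside a seed keyword, then retrieves every article containing any term in the dictionary. The function names (`tokenize`, `expand_dictionary`, `search`), the toy corpus, and the stopword list are hypothetical, and this is not the tool's actual method.

```python
import re
from collections import Counter

# Hypothetical sketch: co-occurrence counting stands in for whatever
# method the Keyword Generator actually uses to extract dictionaries.


def tokenize(text: str) -> list[str]:
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def expand_dictionary(seed: str, docs: list[str], size: int = 3,
                      stopwords: frozenset[str] = frozenset()) -> list[str]:
    """Return the seed plus the `size` words that co-occur with it in
    the most documents (ties keep first-encountered order)."""
    counts: Counter[str] = Counter()
    for doc in docs:
        # Deduplicate while preserving order, so each word counts once per document.
        tokens = list(dict.fromkeys(tokenize(doc)))
        if seed in tokens:
            counts.update(t for t in tokens if t != seed and t not in stopwords)
    return [seed] + [word for word, _ in counts.most_common(size)]


def search(terms: list[str], docs: list[str]) -> list[int]:
    """Return the indices of documents containing at least one term."""
    wanted = set(terms)
    return [i for i, doc in enumerate(docs) if wanted & set(tokenize(doc))]


# Toy corpus and stopword list, invented for illustration.
DOCS = [
    "The railway brought steam and speed to the town",
    "A steam locomotive left the railway station",
    "The railway station was crowded with travellers",
    "The harvest festival was held in the village",
]
STOPWORDS = frozenset({"the", "a", "and", "to", "was", "in", "with"})

if __name__ == "__main__":
    dictionary = expand_dictionary("railway", DOCS, size=2, stopwords=STOPWORDS)
    print(dictionary)                 # ['railway', 'steam', 'station']
    print(search(["steam"], DOCS))    # [0, 1]
    print(search(dictionary, DOCS))   # [0, 1, 2]
```

On this toy corpus, searching for "steam" alone returns articles 0 and 1, whereas the expanded dictionary also retrieves article 2, which mentions the railway but not steam. A real dictionary-extraction tool would work on far larger corpora and likely use more sophisticated measures than raw co-occurrence counts.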
Website
https://pimhuijnen.com/2015/12/04/from-keyword-searching-to-concept-mining/