Authors:
Gatherer, Derek
Abstract:
The data file is a spreadsheet used to record queries made via CQPweb (https://cqpweb.lancs.ac.uk).

Search Terms
For clarity, in the ensuing descriptions, we use bold font for search terms and italic font for collocates and other quotations. Based on clinical descriptions of COVID-19 (reviewed by Cevik et al., 2020), we identified the following search terms: 1) “cough”, 2) “fever”, 3) “pneumonia”. To avoid confusion with years when influenza pandemics may have occurred, we added 4) “influenza” and 5) “epidemic”. Any combination of terms 1 to 3 co-occurring with term 4 alone, or with terms 4 and 5 together, would indicate a respiratory outbreak caused by, or at the least attributed to, influenza. By contrast, any combination of terms 1 to 3 co-occurring with term 5 alone, or without either of terms 4 and 5, would suggest a respiratory disease that was not confidently identified as influenza at the time. Such an outbreak would provide a candidate coronavirus epidemic for further investigation.

Newspapers
Newspapers and years searched were as follows: Belfast Newsletter (1828-1900), The Era (1838-1900), Glasgow Herald (1820-1900), Hampshire & Portsmouth Telegraph (1799-1900), Ipswich Journal (1800-1900), Liverpool Mercury (1811-1900), Northern Echo (1870-1900), Pall Mall Gazette (1865-1900), Reynold’s Daily (1850-1900), Western Mail (1869-1900) and The Times (1785-2009). The search in The Times was extended to 2009 to provide a comparison with the 20th century. Searches were performed using Lancaster University’s instance of the CQPweb (Corpus Query Processor) corpus analysis software (https://cqpweb.lancs.ac.uk/; Hardie, 2012). CQPweb’s database is populated from the newspapers listed using optical character recognition (OCR), so some errors may be present, particularly for older publications (McEnery et al., 2019).
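The decision rule given under Search Terms can be sketched in code. This is only an illustration: the function name, the return labels, and the reading of "co-occurring" as "elevated in the same year" are assumptions made here and are not taken from the data file.

```python
# Illustrative sketch of the search-term decision rule.
# Names and labels are hypothetical; "co-occurring" is assumed to mean
# that the terms are elevated in the same year for the same newspaper.

SYMPTOM_TERMS = {"cough", "fever", "pneumonia"}  # search terms 1 to 3


def classify_year(elevated_terms):
    """Classify one year from the set of search terms elevated in it."""
    elevated = set(elevated_terms)
    if not elevated & SYMPTOM_TERMS:
        # None of terms 1 to 3 is present, so the rule does not apply.
        return "no respiratory signal"
    if "influenza" in elevated:
        # Term 4 alone, or terms 4 and 5 together.
        return "attributed to influenza"
    # Term 5 alone, or neither term 4 nor term 5.
    return "candidate for further investigation"
```

For example, `classify_year({"cough", "epidemic"})` falls in the candidate category, whereas adding "influenza" to the same set moves it to the influenza category.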

Statistics
The occurrence of each of the five search terms was calculated in CQPweb, per million words, within the annual output of each publication. This was compared to a background distribution consisting of the corresponding frequency per million words for each search term over the total year range for each newspaper. Within the annual distributions, for each search term and each newspaper, we determined the years lying in the top 1% (i.e. p<0.05 after application of a Bonferroni correction), following Gabrielatos et al. (2012). These are deemed to be years when that search term was in statistically significant usage above its background level for the newspaper in which it occurs.

For years when search terms were significantly elevated, we also calculated collocates at range n. Collocates, in corpus linguistics, are other words found at statistically significant usage, over their own background levels, in a window from n positions to the left to n positions to the right of the search term. In other words, they are found in significant proximity to the search term. A default value of n=10 was used throughout, unless specified. Collocation analysis therefore helps to show how a search term associates with other words within a corpus, providing information about the context in which that search term is used. CQPweb provides a log ratio method for quantifying the strength of collocation.
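
The quantities described under Statistics can be sketched as follows. CQPweb performs these calculations itself, so this is not its implementation: the function names are hypothetical, the top 1% is read here as a simple rank-based cut over the annual values, and the log ratio is assumed to be the binary logarithm of the ratio of relative frequencies inside and outside the collocation window.

```python
import math

# Illustrative sketch only; all names are hypothetical and the
# calculations are assumptions about the quantities described above.


def per_million(hits, total_words):
    """Frequency of a search term per million words."""
    return hits * 1_000_000 / total_words


def top_fraction_years(freq_by_year, fraction=0.01):
    """Return the years whose frequency lies in the top `fraction`.

    Assumption: a rank-based cut, keeping at least one year.
    """
    keep = max(1, math.ceil(fraction * len(freq_by_year)))
    ranked = sorted(freq_by_year, key=freq_by_year.get, reverse=True)
    return sorted(ranked[:keep])


def log_ratio(window_hits, window_size, rest_hits, rest_size):
    """Binary log of the ratio of relative frequencies.

    Assumption: compares the collocate's relative frequency inside the
    window around the search term with that in the rest of the corpus.
    Zero counts are not handled in this sketch.
    """
    if min(window_hits, window_size, rest_hits, rest_size) <= 0:
        raise ValueError("all counts must be positive in this sketch")
    return math.log2((window_hits * rest_size) / (window_size * rest_hits))
```

Under this reading, each increase of one in the log ratio corresponds to a doubling of the collocate's relative frequency near the search term compared with elsewhere in the corpus.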