The Janus Intertextuality Search Engine: A Research Tool of (and for) the Electronic Manipulus florum Project

This article demonstrates how the search engine developed for this online edition not only serves the research purposes of users of this digital resource but is also a valuable tool for refining and improving the edition and for the author's own research on how the text was constructed. An example of its utility for the edition project is provided, which calls into question previous theories regarding the influence John of Wales may have had on this collection of Latin quotations.

§ 1 The Manipulus florum is a collection of authoritative Latin quotations that was compiled in Paris by Thomas of Ireland at the beginning of the fourteenth century. Its popularity and influence are attested by the survival of over 180 manuscripts and the publication of at least fifty printed editions between 1483 and 1887. It contains some 6000 textual excerpts and proverbs organized under 266 alphabetically ordered topics, though the actual number of entries is about 5700 because Thomas often combined multiple proverbs attributed to Seneca in a single entry. As explained in the text's Preface (http://web.wlu.ca/history/cnighman/Preface.pdf), Thomas created a cross-referencing system for this collection. He assigned each entry a unique reference letter, or a pair of letters for later entries in lemmata that contain more than twenty-three quotations, and after the last quotation of nearly every topic he provided a list of quotations of similar interest under unrelated topics, as well as entire topics that are closely related to that particular lemma. Thomas' use of these cutting-edge information technologies explains why the Manipulus was so useful, and thus so influential, during the late medieval and early modern periods, when the citation of authoritative quotations was fundamental to academic, pastoral and literary composition.
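The letter scheme described in § 1 can be illustrated with a short Python sketch that converts an entry's position within its topic into a reference and back again. It rests on two assumptions that the text does not state: that the letters are drawn from the 23-letter Latin alphabet without j, v and w, and that the pairs run aa, ab, ... az, ba, and so on. References cited later in this article, such as Detractio ak and Correctio da, fit this pattern, but the function names and the ordering are hypothetical and do not describe the edition's actual data.

```python
# Assumption: the 23-letter Latin alphabet (no j, v or w) supplies the reference letters.
ALPHABET = "abcdefghiklmnopqrstuxyz"
BASE = len(ALPHABET)  # 23


def reference_letters(position: int) -> str:
    """Map a 1-based entry position within a topic to its reference letter(s).

    Entries 1-23 receive a single letter; later entries are assumed to receive
    pairs in the order aa, ab, ..., az, ba, ...
    """
    if position < 1:
        raise ValueError("positions start at 1")
    if position <= BASE:
        return ALPHABET[position - 1]
    first, second = divmod(position - BASE - 1, BASE)
    if first >= BASE:
        raise ValueError("position too large for a two-letter reference")
    return ALPHABET[first] + ALPHABET[second]


def entry_position(letters: str) -> int:
    """Inverse of reference_letters under the same assumptions."""
    if len(letters) == 1:
        return ALPHABET.index(letters) + 1
    if len(letters) == 2:
        first, second = (ALPHABET.index(ch) for ch in letters)
        return BASE + first * BASE + second + 1
    raise ValueError("references have one or two letters")


print(reference_letters(24), entry_position("ak"), entry_position("da"))  # aa 33 93
```

Under these assumptions a single topic could hold at most 23 + 23 × 23 = 552 lettered entries before a third letter would be needed.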
The online critical edition project

§ 2 This florilegium was extensively studied in the 1970s by Mary and Richard Rouse, whose seminal book on the subject includes their edition of Thomas' preface to his collection and an annotated edition of the list of original sources that he appended to it (Rouse and Rouse 1979, 236-8, 251-310); the Rouses did not, however, attempt a modern critical edition of the text itself. The Electronic Manipulus florum Project originated in October 2000, when I began transcribing the text of the second printed edition of the Manipulus, published in Venice in 1493/1495. The online resource was launched in May 2001, when I began publishing individual transcribed topics as PDF files on the website that I created for this project (www.manipulusflorum.com). In 2002, after the transcription work had been completed and the text of the entire Venice edition had been made available online, I began compiling a critical edition of the text that is based on three early manuscripts and collated against five early imprints. As each edited topic is completed, the old transcription file for that topic is replaced online by the new edition files. At present (December 2010), the online edition is approximately 65 percent complete.

§ 3 The edited quotations are provided online in HTML files created with MS Publisher. The critical apparatus for each edited entry is provided in three types of PDF documents that are linked to the HTML pages:
1. Every entry has a Varia document, which notes any significant textual variants among the three source manuscripts and five early printed editions that have been collated for this edition.
2. About 95 percent of the entries are also linked to a Fons primus or Fontes primi document, which displays the source or sources from the best modern edition of the original text, and sometimes also from the actual manuscript presumably used by Thomas of Ireland, in a parallel column beside the edited text of the quotation from the Manipulus. Variants are indicated by gaps in the underscoring of the original text.
3. About 10 percent of the entries are also linked to a Fons proximus or Fontes proximi document, which provides the text of the intermediate source or sources for the quotation, again with variants indicated by breaks in the underscoring, and sometimes also the text from the actual manuscript copy that Thomas probably used.

§ 4 This system of broken underscoring to show variants in the Fons primus/Fontes primi and Fons proximus/Fontes proximi documents is also employed in paragraphs 14, 15, 18, 19, 28 and 30 of this article.
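The broken underscoring described in § 3 and § 4 can be approximated with Python's standard difflib module: align the words of the source passage with those of the Manipulus quotation, mark each run of words they share, and leave the divergent words unmarked. This is only a sketch, assuming that _..._ markers stand in for underlining and that words are compared after lower-casing and stripping punctuation; it does not describe how the edition's Fontes documents were actually prepared. The sample strings come from the De officiis passage and Honestas c quoted in §§ 27-28.

```python
import difflib


def underscore_matches(source: str, quotation: str) -> str:
    """Return the source text with each run of words it shares with the quotation
    wrapped in _..._ markers; unmarked words are where the two texts diverge."""
    src_words = source.split()
    quo_words = quotation.split()
    # Compare lower-cased words with surrounding punctuation removed.
    matcher = difflib.SequenceMatcher(
        None,
        [w.lower().strip(".,;:") for w in src_words],
        [w.lower().strip(".,;:") for w in quo_words],
        autojunk=False,
    )
    out: list[str] = []
    pos = 0
    for block in matcher.get_matching_blocks():
        out.extend(src_words[pos:block.a])  # unmatched source words stay plain
        if block.size:
            out.append("_" + " ".join(src_words[block.a:block.a + block.size]) + "_")
        pos = block.a + block.size
    return " ".join(out)


SOURCE = "si omnes deos hominesque celare possimus, nihil tamen avare, nihil iniuste"  # De officiis 3.38
QUOTATION = "si omnes deos hominesque celare possimus. Nichil tamen in nobis auare, nichil iniuste"  # Honestas c

print(underscore_matches(SOURCE, QUOTATION))
# _si omnes deos hominesque celare possimus,_ nihil _tamen_ avare, nihil _iniuste_
```

Because this sketch applies no orthographic normalization, spelling differences such as nihil/nichil and avare/auare show up as breaks in the marking alongside genuine textual variants.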
The online search engine

§ 5 While the texts in the HTML files of the online edition cannot be read by Internet search engines such as Google, the PDF files in which the old transcription files and the new critical apparatus files are provided can be. In fact, a number of scholars have stumbled onto this online project by searching the Internet for a Latin phrase and hitting one or more of the PDF files. [1] However, given the problem of spelling variants and the awkwardness of using an Internet search engine to search this particular online text, it became clear that the project website should be equipped with a customized search engine that would account for orthographical variants. For instance, the diphthongs "ae" and "oe" would be read as "e"; "u" and "v", as well as "i" and "j", would be read interchangeably, as would a number of common spelling variants such as "nihil/nichil" and "mihi/michi".

§ 6 At first I envisaged this search engine as having a standard keyword search function, with Boolean capabilities to allow for variations in the conjugation of verbs and the declension of nouns and adjectives. However, inspired by the example of the online anti-plagiarism service turnitin.com, I later realized that an intertextuality search engine, which would also account for the most common spelling variants, would be of much greater utility to users of the online edition, who would be able to determine quickly and easily whether, and to what extent, the author of a particular late medieval or early modern Latin text employed the Manipulus as a reference work. Once a long passage or even an entire text has been pasted into an expandable search field, the search engine would generate an intertextuality report (similar to the "originality report" generated by turnitin.com when an instructor suspects plagiarism), indicating likely matches between the search text and the Manipulus.
Not only would such a research tool save scholars much time and effort by making conventional searches of keywords or short phrases unnecessary, but it would also have the potential to reveal uncited quotations embedded in the text under study that might otherwise have gone unnoticed as such.

§ 7 In 2007, I contacted Dr. Frank Tompa, who had been involved in several important humanities computing projects, including the Electronic Oxford English Dictionary and the MARGOT Project (http://margot.uwaterloo.ca/), among others. After I explained my idea for the intertextuality search engine, Frank agreed to supervise its development by Mr. Andrew Kane, a doctoral candidate at the David Cheriton School of Computer Science at the University of Waterloo.

§ 8 The intertextuality search engine of the Electronic Manipulus florum Project (http://web.wlu.ca/history/cnighman/page13.html) that Frank and Andrew created has been operational on the project website since November 2008 and is freely available to the public (see Kane and Tompa 2011 for a technical report on the development of this engine). Its database contains the edited quotations from the Manipulus florum, but not the transcribed texts that have not yet been edited. The database, which is periodically updated as the edition work progresses, currently contains about 63 percent of the text of the Manipulus.
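The two ideas in §§ 5-6, orthographic normalization and an intertextuality report, can be combined in a short Python sketch. It is not the algorithm that Andrew Kane and Frank Tompa implemented (their technical report describes Janus itself); it assumes a simple approach in which both texts are normalized with the rules listed in § 5, split into overlapping runs of n words, and each Manipulus entry is ranked by how many such runs it shares with the search text. The variant table, the function names and the choice of n = 4 are all hypothetical.

```python
import re

# Hypothetical table of whole-word spelling variants; only the pairs named in § 5.
WORD_VARIANTS = {"nichil": "nihil", "michi": "mihi"}


def normalize(text: str) -> list[str]:
    """Lower-case, apply the § 5 orthographic rules, and split into words."""
    t = text.lower()
    t = t.replace("ae", "e").replace("oe", "e")  # diphthongs read as "e"
    t = t.replace("v", "u").replace("j", "i")  # u/v and i/j read interchangeably
    words = re.findall(r"[a-z]+", t)
    return [WORD_VARIANTS.get(w, w) for w in words]


def shingles(words: list[str], n: int) -> set[tuple[str, ...]]:
    """All runs of n consecutive words."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def intertextuality_report(search_text: str, entries: dict[str, str], n: int = 4) -> list[tuple[int, str]]:
    """Return (shared n-gram count, entry name) for every overlapping entry, best first."""
    query = shingles(normalize(search_text), n)
    hits = []
    for name, quotation in entries.items():
        shared = query & shingles(normalize(quotation), n)
        if shared:
            hits.append((len(shared), name))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits


entries = {
    "Honestas c (excerpt)": "Nichil tamen in nobis auare, nichil iniuste, nichil libidinose, "
                            "nichil incontinenter esse faciendum.",
    "Fama c (excerpt)": "Duo sunt tibi necessaria, scilicet consciencia et fama.",
}
search_text = ("si omnes deos hominesque celare possimus, nihil tamen avare, nihil iniuste, "
               "nihil libidinose, nihil incontinenter esse faciendum")
print(intertextuality_report(search_text, entries))  # [(6, 'Honestas c (excerpt)')]
```

Here the De officiis passage quoted in § 28 is matched to the Honestas c excerpt despite the nihil/nichil and avare/auare spellings, because both texts are normalized before comparison; the Fama c excerpt shares no four-word run and is not reported. Raising n reduces coincidental "false" hits at the cost of missing short or heavily altered quotations, the same trade-off as the parameter tuning described in § 12.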
Using Janus to determine intermediate sources

§ 9 Although the primary purpose of this search engine is to enable scholars to compare a late medieval or early modern Latin text that they are studying with the edited quotations from the Manipulus, it will also be a very useful tool for my editorial work as the project enters its final phase, especially in identifying intermediate medieval sources that Thomas mined for quotations from classical and patristic authors. This is why the title of this article refers to the search engine as a research tool both "of" and "for" the online edition project. In the same vein, I have named the search engine after Janus, the two-faced Roman god of thresholds, who looks both forward and backward, because it can reveal both the influence of the Manipulus on texts composed after its completion in 1306 and the influence on the Manipulus of intermediate sources, written before that date, that Thomas of Ireland used. In what follows, I will explain the potential uses of this intertextuality search engine for refining the online critical edition and demonstrate how it has already been used in my own research on this text.
Intermediate sources for the Manipulus florum

§ 10 The Rouses noted that Thomas cites three intermediate sources from which he extracted patristic and classical quotations: the Glossa ordinaria, Gratian's Decretum and an anonymous twelfth-century florilegium known as the Flores angelica, which Thomas usually cited as "Prouerbia philosophorum". They also made the important discovery that Thomas made extensive use of two other earlier florilegia, characterized by the Rouses as "major sources", which he did not cite or otherwise acknowledge: the Flores paradisi and the Liber exceptionum (Rouse and Rouse 1979, 126-56). In the process of editing the Manipulus I have discovered several other intermediate sources that Thomas did not acknowledge. The most important of these Fontes proximi are the Secunda secundae of Thomas Aquinas' Summa theologiae, Albertanus Brixiensis' Liber consolationis, and Pseudo-Guillaume de Conches' Moralium dogma philosophorum. Copies of all three of these works were available to Thomas when he was compiling the Manipulus at the Sorbonne; in fact, as the Rouses pointed out, he actually owned a copy of Aquinas' Secunda secundae, which survives among the former Sorbonne manuscripts now in the Bibliothèque Nationale de France (Rouse and Rouse 1979, 96).

§ 11 Thomas' use of Aquinas' Secunda secundae became apparent because the Summa theologiae, of which it is a part, is included in one of the most important research tools for this project: the Cetedoc Library of Christian Latin Texts (CLCLT-6) database, published by Brepols. His use of the Liber consolationis and the Moralium dogma philosophorum was discovered through Internet searches that led to the digital transcriptions of these two public-domain texts provided on Angus Graham's website (http://freespace.virgin.net/angus.graham/Albertano.htm).
While the extent of Thomas' use of the Liber consolationis and the Secunda secundae remains to be determined, I have already conducted a close analysis of the Moralium dogma philosophorum and found that Thomas derived about fifty quotations in the Manipulus from two different copies of the Moralium that were in the library of the Sorbonne in 1306. This analysis was essentially complete before Andrew and Frank began developing Janus, so we were able to use the electronic text of the Moralium, downloaded from Graham's website, to test prototype versions of the search engine and to adjust its search parameters so that all of the expected hits in the Moralium were reported with as few coincidental "false" hits as possible.

Using Janus to determine whether the Communiloquium was an intermediate source for the Manipulus florum

§ 22 I decided to test Swanson's theory by using Janus to compare the entire text of the Communiloquium with the edited portion of the Manipulus, the first use of this intertextuality search engine for such a purpose. Because there was no digital version of the Communiloquium, nor even a modern critical edition that could be scanned, a research assistant and I transcribed the entire long recension of that text from the 1475 Augsburg edition (over 125,000 words), [4] the same edition Swanson had consulted for her book. In August 2009, once the transcription was complete, I conducted a Janus search with the entire text of the Communiloquium, when the database contained about 50 percent of the edited text of the Manipulus florum. Although the search produced numerous hits indicating possible intertextual influence besides Correctio da, I determined that they were merely coincidental and that there were no other examples in the edited portion of the Manipulus to support Swanson's contention that Thomas had used the Communiloquium as an intermediate source in compiling his florilegium. This finding led me to conclude that Swanson was probably incorrect and that there must be some other explanation for the close correspondence between these paraphrases of the passage from Aulus Gellius. I therefore decided against adding a Fons proximus document to the online edition that would cite the Communiloquium as an intermediate source for Correctio da.

§ 23 I repeated this search just prior to the submission of this article, after the Janus database had been updated and expanded to its current 63 percent level. The results page reported 285 hits, including Correctio da.
To facilitate comparative textual analysis, the commonalities on the results page are highlighted in both the excerpts from the search text and the matched quotations from the Manipulus florum, with variants indicated by breaks in the highlighting in the same way that underscoring is used in the Fontes documents of the critical apparatus of the online edition. Moreover, the name of the Manipulus entry, which appears in the left-hand column, is linked to the Fons primus/Fontes primi PDF document for that quotation, or to the Varia PDF document if the original source has not yet been found. Thus, without leaving the intertextuality report screen, users of the Janus search engine can call up the PDF as a pop-up in order to compare a particular passage in their search text with both the version in the Manipulus and the version in the best modern edition of the original source (or sources), where one has been identified. These features are shown in the following screenshot from the recent Janus search of the Communiloquium:

Figure 1: A screenshot from a Janus search of the Communiloquium

§ 24 Four Manipulus florum quotations (Ebrietas z, Clericus l, Electio u and Coniugium m) rank higher than Correctio da on the Janus search results page because each shares more textual content with the Communiloquium. However, all of these quotations have been dismissed as evidence of Thomas' supposed use of the Communiloquium as an intermediate source:
1. Ebrietas z is unlikely to derive from the Communiloquium because the quotation in the Manipulus comprises two non-contiguous passages from the source that have been spliced together, whereas they appear separately in the Communiloquium. Also, the text of the first part of Ebrietas z is identical to the modern edition of its source, but the Communiloquium has some variants.
2. Clericus l can be ruled out because that version of the quotation and the parallel passage in the Communiloquium both vary significantly from the original source, but in different ways.
3. Electio u can be ruled out on the same grounds as Clericus l.
4. Coniugium m can be ruled out because Thomas attributed this quotation to his actual intermediate source, Hugh of Folieto's De nuptiis (which he in turn misattributed to Hugh of St. Victor), rather than to the original source, Jerome's Aduersus Iovinianum, to which it is correctly attributed in the Communiloquium.

§ 25 On grounds such as these, many of the remaining legitimate hits for this search (as distinct from the "false" hits, which are merely coincidental) have been rejected as evidence for Thomas' supposed use of the Communiloquium as an intermediate source. Another disqualifying factor, which applies to many of the other hits, is that the quotation in the Manipulus is longer than the excerpt in the Communiloquium. There are also a number of parallel passages that cannot be used as evidence in support of Swanson's theory because both the textual and the attribution evidence are inconclusive; in other words, the passage in the original source is identical to the versions in both the Communiloquium and the Manipulus, and there is no variant attribution evidence (as in the case of Coniugium m) to be considered.

§ 26 Besides Correctio da, only one quotation was found in this search that may indicate that the Communiloquium was used by Thomas of Ireland as an intermediate source for the Manipulus:

§ 27 Honestas c in the Manipulus florum: "Nichil turpe faciendum bono uiro, eciam si ex omni parte lateat, eciam si omnes deos hominesque celare possimus. Nichil tamen in nobis auare, nichil iniuste, nichil libidinose, nichil incontinenter esse faciendum. Sapientis enim est proprium, nichil quod penitere possit facere, nichil iniuste, sed splendide, constanter grauiter honeste, omnia. Tullius libro III. de officiis."
§ 28 Fontes primi for part of Honestas c: "Atque etiam ex omni deliberatione celandi et occultandi spes opinioque removenda est; satis enim nobis, si modo in philosophia aliquid profecimus, persuasum esse debet, si omnes deos hominesque celare possimus, nihil tamen avare, nihil iniuste, nihil libidinose, nihil incontinenter esse faciendum" (Cicero, De officiis, 3.38). "Sapientis est enim proprium nihil quod poenitere possit facere, nihil invitum, splendide, constanter, graviter, honeste omnia, nihil ita exspectare quasi certo futurum, nihil cum acciderit admirari, ut inopinatum ac novum accidisse videatur, omnia ad suum arbitrium referre, suis stare iudiciis; quo quid sit beatius mihi certe in mentem venire non potest" (Cicero, Tusculanae disputationes, 5.28.81).

§ 29 While most of Honestas c is derived from these two sources, the opening line has not been found in any classical or medieval source except the Communiloquium:

§ 30 Possible Fons proximus for part of Honestas c: "Quantum autem gentiles detestati sunt peccata ob suam turpitudinem quia nichil turpe est faciendum bono uiro, etiam si ex omni parte lateat, nichil iniuste, nichil libidinose, nichil inconuenienter esse faciendum philosophia persuadet, prout ait Tullius iii. de officiis capitulo x" (Iohannes Galensis, Communiloquium, 3.5.1).

§ 31 However, this apparent case of intertextual influence is weakened considerably by the fact that the brief Ciceronian sententia (nichil iniuste…esse faciendum) that follows in the Communiloquium corresponds to only part of the excerpt from De officiis that appears in Honestas c, and the passage from the Tusculan Disputations does not follow at all; nor does it appear elsewhere in the Communiloquium. One possible explanation for this anomaly is that the text in the 1475 edition of the Communiloquium may differ significantly from the manuscript that Thomas presumably used, BnF MS lat. 15451 (Swanson MS 319), if he did in fact employ John of Wales' tract as "a classical quarry". I have not yet had an opportunity to check that manuscript to determine whether this is the case, but I suspect that it is not, because the apparent truncation of the Ciceronian sententia in the 1475 Augsburg edition would only make sense if that version were the short recension of the Communiloquium, which it is not (Swanson 1989, 64).

§ 32 The results of this second Janus search of the Communiloquium therefore seem to confirm the results of the first search, which led me to conclude that Swanson was probably incorrect in suggesting that Thomas of Ireland used the Communiloquium as an intermediate source for classical quotations. But how, then, do we explain the Aulus Gellius paraphrase, which appears in virtually identical versions in the Communiloquium and in Correctio da and has been found nowhere else, and the single line in Honestas c that is apparently original to the Communiloquium, even though the rest of Honestas c was very likely not derived from John of Wales' tract?

§ 33 In claiming a connection between the Communiloquium and the Manipulus florum, Swanson did not consider the old theory that John of Wales began the Manipulus and Thomas of Ireland later completed it. This claim is made in the colophons of several early manuscript copies of the Manipulus; it was picked up by an Italian bibliographer in the early fifteenth century and perpetuated by subsequent scholars until it was rejected by the Rouses, who argued that Thomas of Ireland was probably the sole creator of the Manipulus florum (Rouse and Rouse 1979, 106-10). However, in light of Correctio da and Honestas c and their parallel passages in the Communiloquium, it appears that the old tradition of John of Wales' early involvement in the creation of the Manipulus may, in fact, be correct.
For if Thomas of Ireland did use the Communiloquium as an intermediate source for classical quotations, he did so in a manner completely different from his use of the Moralium dogma philosophorum, the Secunda secundae and the other intermediate sources that he extensively pillaged. It is also uncertain whether Thomas even had access to a copy of the Communiloquium when he was compiling the Manipulus. Given all of these circumstances, it seems more plausible that the rare instances of intertextuality between the Manipulus and the Communiloquium are actually relics of a project initiated by John of Wales and left incomplete until Thomas of Ireland took it up a few years later. This scenario would also explain certain inconsistencies that have become apparent while editing the Manipulus, such as the careful citation of the Decretum and the Glossa as intermediate sources for some quotations but the absence of such citations in other cases where one of those texts was surely used as an intermediate source. Further research may provide a definitive answer to the question of John of Wales' purported involvement in the compilation of the Manipulus florum.
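The comparison behind §§ 29-31, deciding which candidate source a given stretch of a Manipulus quotation most resembles, can be sketched with the similarity ratio from Python's difflib module. This hypothetical example assumes the § 5 normalization rules and uses the passages quoted in §§ 27, 28 and 30; the function names are illustrative only. A ratio of this kind is merely a screening aid: it cannot weigh the textual, attribution and length criteria set out in §§ 24-25.

```python
import difflib
import re

VARIANTS = {"nichil": "nihil", "michi": "mihi"}  # the § 5 examples only (assumed table)


def normalize(text: str) -> list[str]:
    """Lower-case, apply the § 5 spelling rules, and split into words."""
    t = text.lower().replace("ae", "e").replace("oe", "e").replace("v", "u").replace("j", "i")
    return [VARIANTS.get(w, w) for w in re.findall(r"[a-z]+", t)]


def rank_candidates(quotation: str, candidates: dict[str, str]) -> list[tuple[float, str]]:
    """Return (similarity ratio, candidate name) pairs, most similar first."""
    q = normalize(quotation)
    scores = []
    for name, text in candidates.items():
        # autojunk=False disables difflib's heuristic that ignores very frequent words in long texts.
        matcher = difflib.SequenceMatcher(None, q, normalize(text), autojunk=False)
        scores.append((round(matcher.ratio(), 3), name))
    return sorted(scores, reverse=True)


CANDIDATES = {
    "De officiis 3.38": (
        "Atque etiam ex omni deliberatione celandi et occultandi spes opinioque removenda est; "
        "satis enim nobis, si modo in philosophia aliquid profecimus, persuasum esse debet, "
        "si omnes deos hominesque celare possimus, nihil tamen avare, nihil iniuste, "
        "nihil libidinose, nihil incontinenter esse faciendum"
    ),
    "Communiloquium 3.5.1": (
        "Quantum autem gentiles detestati sunt peccata ob suam turpitudinem quia nichil turpe est "
        "faciendum bono uiro, etiam si ex omni parte lateat, nichil iniuste, nichil libidinose, "
        "nichil inconuenienter esse faciendum philosophia persuadet, prout ait Tullius iii. de officiis capitulo x"
    ),
}
OPENING = "Nichil turpe faciendum bono uiro, eciam si ex omni parte lateat"
CICERONIAN_PART = ("si omnes deos hominesque celare possimus. Nichil tamen in nobis auare, "
                   "nichil iniuste, nichil libidinose, nichil incontinenter esse faciendum")

for label, segment in [("opening line", OPENING), ("De officiis part", CICERONIAN_PART)]:
    print(label, rank_candidates(segment, CANDIDATES))
```

Run as written, the opening line ranks the Communiloquium first and the second segment ranks De officiis first, mirroring the division described in § 29.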

Future plans for using Janus to determine intermediate sources for the Manipulus

§ 34 Once the initial edition work has been completed, the Janus search engine will be used in a similar manner to refine the critical edition further, through systematic searches that determine Thomas' use of intermediate sources more thoroughly. For example, instances of Thomas' use of Aquinas' Secunda secundae have so far been detected by chance, when a search of the Cetedoc database for a quotation produced two hits: the original source by a patristic author and the Secunda, with the latter version being more similar to the passage in the Manipulus than the original. An example may be seen by comparing the Fons primus and Fons proximus documents for Luxuria d:
http://web.wlu.ca/history/cnighman/MFfontes/LuxuriaD.pdf
http://web.wlu.ca/history/cnighman/MFfontesprox/LuxuriaD.pdf

§ 35 But surely there must be other cases that have not been detected in this random manner. When the edition work enters its final phase, a systematic Janus search of the Secunda secundae will make use of the digital text provided online in Roberto Busa's transcription of Thomas Aquinas' Opera omnia (http://www.corpusthomisticum.org/iopera.html). The same will be done with Gratian's Decretum, if an electronic copy of the entire text can be obtained. Similarly, digital copies of the other intermediate sources that have been identified (and perhaps of others that have not yet been found) will be systematically searched with the Janus search engine, and it is expected that the final version of the Manipulus florum edition will be significantly refined and improved as a result.

§ 36 Another important intermediate source that will eventually be checked through Janus is the Glossa ordinaria.
The Rouses, noting that there are only fifty-five citations of the Glossa in the Manipulus, categorized it as a "minor source", though they also suggested that, on the basis of how Thomas cited the source, about seventy-five other quotations were probably also extracted from the Ordinary Gloss (Rouse and Rouse 1979, 151). However, in the process of compiling the critical edition, it has become clear that Thomas used the Glossa in still more cases. This became apparent through searches of another very important tool for this project, the online Patrologia Latina (PL), published by ProQuest/Chadwyck-Healey. Unfortunately, the nineteenth-century edition of the Glossa in the PL (PL 113-114) is seriously flawed, so the PL database is of only limited use for this text. Although there is a published critical edition of the Glossa ordinaria on the Canticum canticorum (Dove 1997) in a Brepols series, it will probably be decades before critical editions of the Ordinary Gloss on the rest of the Bible appear and digital versions of those texts become available for Janus searches. Much more promising in the short term is the Glossae Net project (www.glossae.net), which seeks to digitize and edit the marginal and interlinear glosses from Adolph Rusch's 1479/80 Strasbourg edition of the Vulgate, and to provide this text freely online. Once the Glossae Net project is complete, I will request a digital copy of the text in order to conduct a Janus search that determines all of the instances in which Thomas mined the Glossa ordinaria for patristic quotations. Indeed, this analysis may result in the Glossa being reclassified as a "major source", along with the two uncited florilegia that the Rouses identified as important intermediate sources used by Thomas of Ireland.
Future plans for using Janus to determine duplicate quotations in the manuscript and later print traditions of the Manipulus

§ 37 Another way in which Janus will be used in the final phase of the edition project will be to identify all instances of duplication of quotations, or portions of quotations, within the original collection of the Manipulus. For example, I have discovered that Detractio ak and part of Detractio al are repeated under other topics (Paciencia be and Inuidia z, respectively), and so cross-reference links to those repetitions have been added to these entries on their respective HTML edition pages:
http://web.wlu.ca/history/cnighman/MFedition/Detractio/page4.html (for Detractio ak)
http://web.wlu.ca/history/cnighman/MFedition/Detractio/page5.html (for Detractio al)
http://web.wlu.ca/history/cnighman/MFedition/Paciencia/page6.html (for Paciencia be)
http://web.wlu.ca/history/cnighman/MFedition/Inuidia/page4.html (for Inuidia z)

§ 38 So far about forty such cases of duplication in the original collection have been discovered by chance, either when Googling a short phrase from a quotation that was not found in the PL or CLCLT-6 databases and hitting a PDF document for a different Manipulus florum entry, or when checking that the critical apparatus files of a recently completed topic are searchable by Google before removing the old transcription PDF file for that topic from the project's server. Janus is ideally suited to identifying all such instances of internal duplication, and I expect that many other complete and partial repetitions will be revealed simply by pasting the text of each edited topic into the Janus search window. The intertextuality report for this type of search will display not only all of the quotations in that topic but also possible duplications under other topics.
Once these have been identified, links to the duplications will be added to the relevant HTML pages, as exemplified above with Detractio, Inuidia and Paciencia.

§ 39 Janus will also be employed after the edition is completed to identify any quotations that appear both under a particular topic in the original collection of quotations in the Manipulus and under a different topic among the additional quotations found in the early printed versions of the Manipulus. In the case of the 1483 Piacenza edition, there are only a handful of new quotations, which were either added at some point in the manuscript tradition or introduced by the printer of that first edition (Rouse and Rouse 1979, 182); so far, none of those quotations has been found elsewhere in the original collection of the Manipulus. However, Tibault Payen's 1567 Lyon edition includes hundreds of added quotations, most of which are indicated by an asterisk in the margin, and these additional quotations were perpetuated in most subsequent editions (Rouse and Rouse 1979, 184). Several of Payen's added quotations have been found to be repetitions from elsewhere in the original collection, as in the case of Fama c, part of which appears in the 1567 edition under the topic Conscientia:

§ 40 Fama c in the Manipulus florum: "Duo sunt tibi necessaria, scilicet consciencia et fama, consciencia propter te, fama propter proximum. Qui consciencie sue confidens famam negligit, crudelis est. Augustinus libro de communi sermone clericorum."

§ 41 Conscientia* in Payen's Lyon edition (Hibernicus 1567, 170): "Duo sunt necessaria, conscientia & fama: conscientia, propter te: fama, propter proximum. Idem" (Ambrosius in epistola ad Constantinum).

§ 42 No doubt a systematic Janus search will reveal other cases of added quotations in Payen's edition that are partial or complete repetitions from other topics in the original collection.
In such cases a cross-reference notation will be added to the PDF document for the 1567 additiones (http://web.wlu.ca/history/cnighman/1567Additiones.pdf), but not to the online edition of the original collection.
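The systematic search for repetitions proposed in §§ 37-42 amounts to comparing every entry with every entry filed under a different topic. The Python sketch below is a hypothetical illustration, not the procedure Janus follows: it assumes the § 5 spelling rules, uses runs of three words as the unit of comparison, and flags any pair of entries from different topics that shares at least two such runs. Its sample data are the texts of Fama c and Payen's Conscientia* from §§ 40-41 (without their attributions), plus an excerpt from Honestas c as a control; all names in the code are illustrative.

```python
import re
from itertools import combinations

VARIANTS = {"nichil": "nihil", "michi": "mihi"}  # the § 5 examples only (assumed table)


def normalize(text: str) -> list[str]:
    """Lower-case, apply the § 5 spelling rules, and split into words."""
    t = text.lower().replace("ae", "e").replace("oe", "e").replace("v", "u").replace("j", "i")
    return [VARIANTS.get(w, w) for w in re.findall(r"[a-z]+", t)]


def trigrams(text: str) -> set[tuple[str, ...]]:
    """All runs of three consecutive normalized words."""
    words = normalize(text)
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def find_repetitions(entries: dict[str, tuple[str, str]], min_shared: int = 2) -> list[tuple[str, str, int]]:
    """entries maps an entry name to (topic, text).

    Returns (name, other name, shared trigram count) for pairs filed under different topics.
    """
    grams = {name: trigrams(text) for name, (_, text) in entries.items()}
    found = []
    for (a, (topic_a, _)), (b, (topic_b, _)) in combinations(entries.items(), 2):
        if topic_a == topic_b:
            continue  # only repetitions under a different topic are of interest here
        shared = len(grams[a] & grams[b])
        if shared >= min_shared:
            found.append((a, b, shared))
    return found


ENTRIES = {
    "Fama c": ("Fama", "Duo sunt tibi necessaria, scilicet consciencia et fama, consciencia propter te, "
                       "fama propter proximum. Qui consciencie sue confidens famam negligit, crudelis est."),
    "Conscientia* (1567)": ("Conscientia", "Duo sunt necessaria, conscientia & fama: conscientia, "
                                           "propter te: fama, propter proximum."),
    "Honestas c (excerpt)": ("Honestas", "Nichil tamen in nobis auare, nichil iniuste, nichil libidinose."),
}
print(find_repetitions(ENTRIES))  # [('Fama c', 'Conscientia* (1567)', 3)]
```

The shared runs here all fall within "propter te, fama propter proximum". Because the § 5 rules do not equate consciencia and conscientia, those words do not match in this sketch; a fuller variant table would presumably catch more of the overlap.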
Conclusion

§ 43 The Janus Intertextuality Search Engine clearly has great potential as a research tool, and it has already proven very useful both in the ongoing edition work and in my own textual research on the Manipulus florum. Thanks to the efforts of Andrew Kane and Frank Tompa, the final version of the online edition will be much more complete, and thus more useful to scholars, than it would otherwise have been, since Janus will allow a thorough determination of Thomas' use (or probable non-use, in the case of John of Wales' Communiloquium) of intermediate sources, of full and partial repetitions within the original collection, and of full and partial repetitions in later printed versions, most notably the 1567 Lyon edition.
Notes

[1] For example, see the articles by Boyer and Steggle cited in the Annotated Bibliography (http://web.wlu.ca/history/cnighman/Bibliography.pdf) on the project website.