the fragment corpora - [book]
"The air itself is one vast library on whose pages are forever written all that
man has ever said or woman whispered"
Charles Babbage - Inventor of the Difference Engine (1821)
The title of this book, The Fragment Corpora, originates from on-line dictionaries which, in addition to supplying definitions and synonyms, also often provide a Corpus - a set of examples of how, and in which context, words are used. Originally, this was for the benefit of linguists to help track changes in language and meaning over time. The Corpus for the word “Fragment” provides exhaustive examples of the use of the word and its derivations from across the sciences and humanities as well as from fiction and the arts.
The chapter headings in the book derive from the Corpus for the word ‘fragment’ and are taken from a number of on-line dictionaries.
The prose texts are embedded in continuous blocks of code (JavaScript). For the general reader they occur as islands of meaning in a sea of glyphs. They are chosen as representing texts which might be found in searches either on-line or in libraries, during which process a researcher might sift items of interest from a sea of irrelevant material. The range and type of these inclusions set the tone of the book. Often, either directly or indirectly, they form resonances with the images, which in turn provide the texts with context and atmosphere.
In combination, they reflect the autobiographical nature of retained knowledge, as defined by the author's education, interests and cultural origins – conditioned as they are as much by inclusions as by omissions. The selected texts and images become pieces of a puzzle which might be re-assembled to construct the world through the opticks of a particular author.
In Artificial Intelligence, Large Language Models (LLMs) require a Corpus, containing as much material as possible relating to the particular subject in which the AI is to be trained. A corpus could contain, for example, all the works of a particular author. The Shakespeare corpus would not only contain all the writer's plays, sonnets and letters, but also the related commentaries, reviews and details of productions. Everything, in fact, pertaining to the oeuvre, so that searches can be made and questions answered. More controversially, it also offers the possibility of creating texts “in the style of” based on machine learning of the corpus content.
The word “Fragment” and its derivations suggest the divisions and distinctions made in descriptions of the world to assign meaning to its composite parts – parts which are sometimes under the auspices of specialist interests and therefore come with their own discourses and proprietary languages. This implies a divergence from the pre-Enlightenment notion of the world as a seamlessly interconnected whole. The fragment is such a universal theme in contemporary culture that we frequently describe ourselves as “living in a fragmented world” – which leads to the question of what a “whole world” might look like.
The images and texts in The Fragment Corpora include a variety of seemingly random elements relating to encyclopaedias, aerial views, coding, allegories, references to art history, micro and macro environments, mathematical figures; images of wholeness and fragmentation. We know that creating an image of ‘the whole world’ is an impossibility, although we are constantly obliged to inhabit worlds created by others. Our environment is, and continues to be, conditioned by these strictures and imposed descriptions.
The Fragment Corpora embraces the limitations of the methodologies commonly used to construct meaningful images of the world we inhabit. Theoretical physicists, photographers, astronomers and mathematicians all have to grapple with systems which work convincingly within certain parameters but which eventually encounter environments in which they no longer usefully function. As James Gleick reminds us in his 1987 classic Chaos:
Each scientist had a private constellation of intellectual parents. Each had his own picture of the landscape of ideas, and each picture was limited in its own way. Knowledge was imperfect. Scientists were biased by the customs of their disciplines or by the accidental paths of their own education. The scientific world can be surprisingly finite. No committee of scientists pushed history into a new channel —a handful of individuals did it, with individual perceptions and individual goals.
The same could be said for Artificial Intelligence, itself a product conditioned by the minds which define its purpose and assemble its composite parts. Efforts to construct an ‘image of the world’, or an exclusive description of it, have been frequent in historical memory, and such attempts generally fall far short of the hopes and expectations which inspired them, sometimes with disastrous consequences.
Early collections of knowledge about the world such as libraries and encyclopaedias might also include Cabinets of Curiosities, popular with aristocrats and rulers of the early modern era. These collections prefigured the museums of modern times, when collectors began to favour taxonomies based on science and history. Cabinets of Curiosities differ from encyclopaedias and other kinds of compendiums in the sense that they were perhaps more personalised and contained a kind of autobiographical portrait of the interests and world view of the collector. The digital world, AI included, also belongs in this story of ‘completist’ projects which attempt to lay the world out before us.
The connections between texts and images in The Fragment Corpora are purposefully tenuous, but maintain an overall identity through their curation and aesthetic. The book can be seen as a kind of cartoon internet on paper, for an audience of one. A random collection of elements from unrecognised corners of the world, marking out its height and breadth and indicating both its unimaginable extent and the limitations of its descriptors.
Apart from existing as a book in its own right, The Fragment Corpora can be seen both as a catalogue and as a starting point for its own expansion into other media, presented either on-line or in physical spaces. One spin-off project, the ORACLE CORPUS, is an LLM to be presented in a three-dimensional space and trained on a wide selection of texts originating from the pre-(European) Enlightenment era. Thus it presents a picture of ‘the world’ up to approximately 1660. The Corpus includes classical texts as well as material related to alchemy, astronomy, poetry, geography, history, science and the taxonomy of the period.
The European Enlightenment, under the banner of scientific supremacy, borrowed heavily from the Golden Age of Islam and is now seen by many as having been one of the main engines of colonialism. It could be argued that a similar trajectory is being followed by large parts of the internet today.
Allan Forrester Parker 08/25