Projects

Ubiqu+Ity

Ubiqu+Ity is an online text-tagging service that generates statistics and web-based tagged-text views for your texts, using the Docuscope dictionary or your own rules. A preliminary version of the tool can be found here.
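To make the idea of rule-based tagging concrete, here is a minimal sketch of how a dictionary of phrases might be applied to a text to produce tagged tokens and per-category counts. It is not Ubiqu+Ity's implementation: the rule entries, category names, and the function `tag_text` are hypothetical, and the longest-match-first lookup is an assumption chosen for simplicity.

```python
import re
from collections import Counter

# Hypothetical user-supplied rules (phrase -> category); not Docuscope entries.
RULES = {
    "i think": "FirstPerson",
    "must": "Insistence",
    "perhaps": "Uncertainty",
}

def tag_text(text, rules=RULES):
    """Tag a text with rule categories, preferring the longest matching phrase.

    Returns a list of (phrase, category-or-None) pairs and a Counter of categories.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    max_len = max(len(key.split()) for key in rules)
    tagged, counts, i = [], Counter(), 0
    while i < len(tokens):
        for n in range(max_len, 0, -1):
            if i + n > len(tokens):
                continue  # not enough tokens left for an n-word phrase
            phrase = " ".join(tokens[i:i + n])
            if phrase in rules:
                tagged.append((phrase, rules[phrase]))
                counts[rules[phrase]] += 1
                i += n
                break
        else:
            # No rule matched at this position: keep the token untagged.
            tagged.append((tokens[i], None))
            i += 1
    return tagged, counts

tagged, counts = tag_text("I think we must go, perhaps.")
```

The per-category counts correspond to the kind of statistics described above, and the tagged pairs could be rendered as a tagged text view.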

Serendip+Ity

In this project, we are investigating multi-scale exploration of large text corpora guided by probabilistic topic models. Unlike prior work that focuses on visualizing topic models, we treat the models as a lens through which the original documents can be viewed, rather than as an end to be visualized in and of themselves. Through this lens, the reader can observe trends and build hypotheses at multiple scales (ranging from across a corpus to within a single text) and support these hypotheses with both algorithmic data and textual examples. Supporting this workflow requires a multi-tiered framework that affords comparisons at many levels, from multiple documents to specific passages to individual words. In doing so, we must overcome challenges including the scale of the corpus, the density of the models, and the overlapping nature of topic distributions.

We tackle these challenges in our implementation of Serendip, a tool that combines view-coordinated re-orderable matrices, small-multiples displays, and tagged text to allow readers to develop insight at multiple levels and carry that insight into their analysis of other levels. Serendip uses metadata and reader interaction to highlight trends and areas of potential interest.
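As a rough illustration of what a re-orderable matrix does, the sketch below sorts the rows of a small document-by-topic matrix by the weight of one chosen topic. It is not taken from Serendip: the data, the document names, and the function `reorder_by_topic` are hypothetical, and sorting by a single topic is only one of many possible orderings.

```python
def reorder_by_topic(matrix, doc_names, topic_index):
    """Sort documents (rows) by descending weight of one topic (column)."""
    order = sorted(
        range(len(matrix)),
        key=lambda row: matrix[row][topic_index],
        reverse=True,
    )
    return [doc_names[r] for r in order], [matrix[r] for r in order]

# Hypothetical topic proportions (rows: documents, columns: topics).
doc_names = ["doc_a", "doc_b", "doc_c"]
doc_topic = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
]

# Bring the documents that use topic 1 most heavily to the top.
names, rows = reorder_by_topic(doc_topic, doc_names, topic_index=1)
```

Re-ordering in this way groups documents with similar topic usage together, which is one way a reader might spot a trend worth inspecting at the passage level.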

TextDNA

In this project, we are investigating multi-scale exploration of large text corpora through the affordances of the genomics sequencing system, Sequence Surveyor. With TextDNA, users can compare word usage between document collections, between individual documents, or between elements within a document. Word usage can be explored across raw texts, i.e., text documents not subject to processing. Additionally, word usage can be explored across different metrics, such as the frequency with which words appear in a document.
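The frequency metric mentioned above can be sketched in a few lines. This is an illustrative example only, not TextDNA's code: the tokenization rule and the function names `word_frequencies` and `compare_usage` are assumptions made for the sketch.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each lowercase word appears in a raw text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def compare_usage(text_a, text_b):
    """List (word, count in A, count in B), most used overall first.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    freq_a, freq_b = word_frequencies(text_a), word_frequencies(text_b)
    words = sorted(
        set(freq_a) | set(freq_b),
        key=lambda w: (-(freq_a[w] + freq_b[w]), w),
    )
    return [(w, freq_a[w], freq_b[w]) for w in words]

usage = compare_usage("the cat and the dog", "the bird")
```

The same comparison could be run between two collections by concatenating the documents in each, or between sections of one document by passing the sections as the two texts.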