# Our Ontological Future

Recently, the [National Institute for Standards and Technology][nist] hosted a conference to establish a word/logic bank for thinking machines. Out of that conference came an agreement:

> Information scientists announced an agreement last month on a “concept bank” programmers could use to build thinking machines that reason about complex problems at the frontiers of knowledge from advanced manufacturing to biomedicine.

> The agreement by ontologists — experts in word meanings and in using appropriate words to build actionable machine commands — outlined the critical functions of such a bank. It was reached at a two-day Ontology Summit held during NIST’s Interoperability Week in Gaithersburg, Md. The decision to create a unique Internet facility called the Open Ontology Repository (OOR) culminated more than three months of Internet discussion.
(Quote taken from [Science Blog report][sb]. The OOR proposal is [here][oor].)

When I was an undergraduate in college, I was both an English and Philosophy major. (I know, what hope for me, eh?) When I was studying philosophy in the 1980s, before the rise of artificial intelligence (AI), *ontology* meant only one thing: the study of existence to determine what entities (we called them *phenomena*) were present, into what categories (or types) those entities fell, and what relationships held between them.

With the rise of AI, there has been a need to re-use ontology with a different vector: *ontology* can also be *a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents*. So far as I know, Tom Gruber and his colleagues were the first to re-use *ontology* in this way. Their use is not as far from the philosophical usage as they might believe: their goal is to establish a set of concept definitions expressly for knowledge sharing and re-use. To my mind, such a project isn’t that far from what philosophers were doing, especially within the phenomenological tradition. Their goal, at least in my reading of Heidegger and Bachelard and others, was a kind of concise mapping of the universe as humans understood it in order to understand the very principles of human understanding. (Levi-Strauss’ *structuralism* operated in much the same manner. Again, to my mind, which may now be proving itself divergent and/or errant.)

The point, for Gruber *et al.*, is that one commits to an ontology — their term is in fact “ontological commitment” — in order to create agents that can then engage in knowledge sharing.

There were several levels (layers, dimensions) to what Project Bamboo participants aspired to, but one was definitely at the deep infrastructural level of *meta-data*. One of the groups in which I participated was tasked with teasing out the notion of *foraging*, which the larger group perceived as a *commonality* among humanities practitioners. We go out. We search for data. Faced with the forest of data to be found in libraries, which really do feel like being lost in the woods sometimes (in a good way), and on-line, we forage. Sometimes we find what we wanted. Sometimes we find not the berries we were looking for but a root, which is even better. That is the nature of foraging. All of us, however, yearn for better breadcrumbs through our proverbial forests.

Better search devices would seem to be a key to better, more efficient searches — though one does have to wonder if efficiency, within the humanities paradigm, doesn’t also lead to impoverishment. Building better searches would seem to be founded not only on the data being accessible, which is one of the shiniest promises of the digital age, but also on the data being searchable. Often what we want to know about something isn’t contained within the object itself. Take, for instance, a digital audio file of a performance by Varise Conner of his “Lake Arthur Stomp.” Nothing in the file itself will tell you the name of the tune — there are no words, no refrain. Nothing will tell you who originated the song or who is playing it now, unless you can recognize the tune and/or the style of its performance. Or that it’s a melody in the Cajun repertoire. Or that its author was of Irish descent. Or that he lived in Vermilion Parish. Or that he was also a sawyer. None of this is in the data itself.

It’s all **meta-data**, and it turns out that meta-data is sometimes more important than the data itself, especially when it comes to finding the data. The problem is committing to a meta-data set. Some of you live in places where there is a university library, which usually adheres to the Library of Congress call system, and a local public library, many of which still use the Dewey Decimal system. It’s not as easy as switching gears from letters to numbers, from PS to 800. The ordering of entities and their groupings are different. Philosophy, for example, occupies a different place and is near different things in the two systems. And that’s just to catalog — in order to house and then to find* — books and other printed materials. What do you do with other kinds of objects? Especially objects that will never actually be housed in a physical facility? (We begin to border on an infinite regression here, since what we are dealing with is housing data about objects, which, it turns out, is really meta-data itself. Oh. My.)

But that is the grail that humanists seek, because we really would like to get as much of the human universe into a form that is searchable and accessible. Why is that important? Well, precisely because so much human activity still lies outside the scope of libraries and archives. And that includes not only the majority of humanity on this planet but the majority of the lives of even the hyper-connected. One could easily argue that any assessment of what humans are, based on what is currently available, is really only a small part of the story. And we’re looking to tell a big story.

\* I am the father of a toddler, after all, and so the idea of putting things away in a place where you can later find them is central to my existence.

[nist]: http://www.nist.gov/
[sb]: http://www.scienceblog.com/cms/wordlogic-bank-help-build-“thinking”-machines-16567.html
[oor]: http://ontolog.cim3.net/cgi-bin/wiki.pl?OntologySummit2008_Communique

# Specs for a New Computer

I need a new computer in order to play *Half-Life 2* and *Mass Effect*. The latter states that the minimum specifications to play it are:

* OS: Windows XP/Vista
* Processor: Single Core: 2.4 GHz or faster / Multiple Cores: 2.0 GHz or faster
* Memory: XP: 1.0 GB RAM / Vista: 2.0 GB RAM
* Hard Drive: 15 GB (30 GB for Digital Download)
* DVD Drive: 1 SPEED (Not required for Digital Download)
* Video Card: 256 MB with Pixel Shader 3.0 support*
* Sound Card: DirectX 9.0c compatible

I’m guessing that would also work for HL2.

# Read the Novel, *then* Play the Game

I’m a little late to the party, and I have neither read the novel nor played the game. (Our home PC isn’t up to the task just yet.) But I did just finish reading the review for _Mass Effect: Revelation_, the novel that came out in June 2007, and the review for the game _Mass Effect_, which came out in October 2007.

Yes, once upon a time, films were renditions of novels or short stories. Then, later, in the wake of a (or “the” for some) blockbuster _Star Wars_, novels were commissioned after a successful vehicle was established. Where movies blazed the trail, games followed, and so we already have a trilogy of novels set in the _Halo_ universe. (The notion of a “story universe” is something that I think the science fiction genre established rather early on, but I could be wrong and would love to hear from anyone who has a better sense of the history.) It is indeed the case that _Mass Effect_ was conceived first as a game, but in an effort to help build up interest in the game — so, a form of marketing — and in an effort to provide more backstory in order to make game play more interesting, the game’s makers commissioned a novel to precede the game. Not a prequel after the fact, but a prequel before the fact. (Imagine that.)

To my mind, this opens up a *huuuuge* new and fascinating landscape for fiction. (I hate the term “storytelling” if only because it gets not only over-used but often misused. Not everything is telling a story. Sometimes you’re describing. Sometimes you’re arguing. Please, let’s not confuse everything because it’s fun to say, or safe to say, you’re telling a story.) In this landscape, or network or nexus or whatever, you can allow each medium to do what it does best. Literary texts are often the best way to provide a rich description or provide a backstory — why else does voiceover work so well in so many flashbacks? Game play is great for immersive action. The same goes for audio, video, and images. It’s all so cool. I can’t wait to beef up the home PC and take a crack at the game, but I’ll be sure to read the book first…

# Updating Gems

I wasn’t sure if I had ever updated Rails on my MBP, and so, as I begin developing my first real project, I thought it was time. A regular `gem update` didn’t work. I had to use `sudo`:

    sudo gem update rails --include-dependencies

The same applied for `gem cleanup`, which deleted the following items — after asking me for confirmation:

    Successfully uninstalled rails-1.2.6
    Successfully uninstalled rake-0.7.3
    Successfully uninstalled actionwebservice-1.2.3
    Successfully uninstalled activerecord-1.15.3
    Successfully uninstalled actionmailer-1.3.3
    Successfully uninstalled actionpack-1.13.3
    Successfully uninstalled activesupport-1.4.2

I now have to do the same on the iMac.

# Problem Space

In problem solving, the **problem space** is the set of all possible operations that can be performed in an attempt to reach a solution. The idea is credited, at least in one place, to A. Newell, who defined the *problem space principle* as “The rational activity in which people engage to solve a problem can be described in terms of (1) a set of states of knowledge, (2) operators for changing one state into another, (3) constraints on applying operators and (4) control knowledge for deciding which operator to apply next.”
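Newell’s four components can be made concrete with a small sketch. The following is an illustrative toy only, not anything from Newell: the states are integers, the operators are “add one” and “double,” the constraint is that no state may exceed the goal, and the control knowledge is breadth-first search, which always finds the shortest operator sequence.

```python
from collections import deque

# Toy problem space: transform the state 2 into 11.
# Operators for changing one state into another:
OPERATORS = {
    "add1": lambda s: s + 1,
    "double": lambda s: s * 2,
}

def applicable(op, state):
    # Constraint on applying operators: never overshoot 11.
    return OPERATORS[op](state) <= 11

def solve(start, goal):
    # Control knowledge: breadth-first search over states of
    # knowledge, so the first solution found is the shortest.
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for op in OPERATORS:
            if applicable(op, state):
                nxt = OPERATORS[op](state)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [op]))
    return None

print(solve(2, 11))  # → ['double', 'add1', 'double', 'add1']
```

Swapping the control knowledge (say, best-first search with a heuristic) changes which path through the same problem space is explored first, which is exactly the separation Newell’s principle draws.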


From Tom Gruber’s web page:

> Short answer: an ontology is a specification of a conceptualization.

> The word “ontology” seems to generate a lot of controversy in discussions about AI. It has a long history in philosophy, in which it refers to the subject of existence. It is also often confused with epistemology, which is about knowledge and knowing.

> In the context of knowledge sharing, I use the term ontology to mean a specification of a conceptualization. That is, an ontology is a description (like a formal specification of a program) of the concepts and relationships that can exist for an agent or a community of agents. This definition is consistent with the usage of ontology as set-of-concept-definitions, but more general. And it is certainly a different sense of the word than its use in philosophy.

> What is important is what an ontology is for. My colleagues and I have been designing ontologies for the purpose of enabling knowledge sharing and reuse. In that context, an ontology is a specification used for making ontological commitments. The formal definition of ontological commitment is given below. For pragmatic reasons, we choose to write an ontology as a set of definitions of formal vocabulary. Although this isn’t the only way to specify a conceptualization, it has some nice properties for knowledge sharing among AI software (e.g., semantics independent of reader and context). Practically, an ontological commitment is an agreement to use a vocabulary (i.e., ask queries and make assertions) in a way that is consistent (but not complete) with respect to the theory specified by an ontology. We build agents that commit to ontologies. We design ontologies so we can share knowledge with and among these agents.

> This definition is given in the article: T. R. Gruber. A translation approach to portable ontologies. _Knowledge Acquisition_ 5(2): 199–220, 1993. Available [on line](http://tomgruber.org/writing/ontolingua-kaj-1993.htm). A more detailed description is given in T. R. Gruber. 1995. Toward principles for the design of ontologies used for knowledge sharing. Presented at the Padua workshop on Formal Ontology, March 1993, later published in _International Journal of Human-Computer Studies_ 43(4–5): 907–928. Available [online](http://tomgruber.org/writing/onto-design.htm).

### Ontologies as a specification mechanism

A body of formally represented knowledge is based on a conceptualization: the objects, concepts, and other entities that are assumed to exist in some area of interest and the relationships that hold among them (Genesereth & Nilsson, 1987). A conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose. Every knowledge base, knowledge-based system, or knowledge-level agent is committed to some conceptualization, explicitly or implicitly.

An ontology is an explicit specification of a conceptualization. The term is borrowed from philosophy, where an Ontology is a systematic account of Existence. For AI systems, what “exists” is that which can be represented. When the knowledge of a domain is represented in a declarative formalism, the set of objects that can be represented is called the universe of discourse. This set of objects, and the describable relationships among them, are reflected in the representational vocabulary with which a knowledge-based program represents knowledge. Thus, in the context of AI, we can describe the ontology of a program by defining a set of representational terms. In such an ontology, definitions associate the names of entities in the universe of discourse (e.g., classes, relations, functions, or other objects) with human-readable text describing what the names mean, and formal axioms that constrain the interpretation and well-formed use of these terms. Formally, an ontology is the statement of a logical theory.[^1]

We use common ontologies to describe ontological commitments for a set of agents so that they can communicate about a domain of discourse without necessarily operating on a globally shared theory. We say that an agent commits to an ontology if its observable actions are consistent with the definitions in the ontology. The idea of ontological commitments is based on the Knowledge-Level perspective (Newell, 1982).

The Knowledge Level is a level of description of the knowledge of an agent that is independent of the symbol-level representation used internally by the agent. Knowledge is attributed to agents by observing their actions; an agent “knows” something if it acts as if it had the information and is acting rationally to achieve its goals. The “actions” of agents — including knowledge base servers and knowledge-based systems — can be seen through a tell and ask functional interface (Levesque, 1984), where a client interacts with an agent by making logical assertions (tell) and posing queries (ask).

Pragmatically, a common ontology defines the vocabulary with which queries and assertions are exchanged among agents. Ontological commitments are agreements to use the shared vocabulary in a coherent and consistent manner. The agents sharing a vocabulary need not share a knowledge base; each knows things the other does not, and an agent that commits to an ontology is not required to answer all queries that can be formulated in the shared vocabulary. In short, a commitment to a common ontology is a guarantee of consistency, but not completeness, with respect to queries and assertions using the vocabulary defined in the ontology.
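The tell-and-ask interface described above can be sketched in a few lines. This is my own minimal illustration, not any published API: the class and method names are assumptions, and the shared “vocabulary” is reduced to (subject, relation, object) triples. The point it tries to show is the consistency-without-completeness idea: two agents commit to the same vocabulary without sharing a knowledge base.

```python
class Agent:
    """A knowledge-level agent seen only through tell and ask."""

    def __init__(self):
        self.kb = set()  # this agent's private knowledge base

    def tell(self, subject, relation, obj):
        # Client makes a logical assertion in the shared vocabulary.
        self.kb.add((subject, relation, obj))

    def ask(self, subject, relation, obj=None):
        # Client poses a query; None acts as a wildcard slot.
        return [t for t in self.kb
                if t[0] == subject and t[1] == relation
                and (obj is None or t[2] == obj)]

# Two agents share the vocabulary but not a knowledge base.
a, b = Agent(), Agent()
a.tell("Lake Arthur Stomp", "composed-by", "Varise Conner")
b.tell("Varise Conner", "lived-in", "Vermilion Parish")

print(a.ask("Lake Arthur Stomp", "composed-by"))
# b can parse the same query but simply knows nothing about it:
# consistency with the vocabulary, not completeness.
print(b.ask("Lake Arthur Stomp", "composed-by"))  # → []
```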

[^1]: Ontologies are often equated with taxonomic hierarchies of classes, class definitions, and the subsumption relation, but ontologies need not be limited to these forms. Ontologies are also not limited to conservative definitions, that is, definitions in the traditional logic sense that only introduce terminology and do not add any knowledge about the world (Enderton, 1972). To specify a conceptualization one needs to state axioms that do constrain the possible interpretations for the defined terms.

# My Scholarly Practices

This Project Bamboo workshop is certainly emphasizing the “work” in “workshop”: we have several homework assignments to complete before we get to Chicago. The first is, obviously, to read through the proposal. The second is to read some items out of the working bibliography. The third is to “identify [our] scholarly practices.”

The proposal is an interesting document. It weighs in at 39 pages, from the front page listing quite a collection of individuals to the final page which lists a budget of $2,431,000. The meat of the document is really in the third section, focused on the “perspectives of the five communities.” Those communities are:

1. Arts and Humanities Scholars
2. Computer Science
3. Information Science
4. Library and Scholarly Communications
5. Central Information Technology Organizations

That particular listing is interesting in itself, no? The first thing I noticed was that the parallelism is off: only the arts and humanities are embodied. The next two are disciplines. The fourth is a process. And the fifth? Well, it’s an abstraction of an abstraction. If you dig into the sections themselves, the one focusing on humanities scholars is the longest and most … the word I want to use is “splintered” or “fractured.” There are just so many impulses and directions. It feels like someone tried to tame either a wide-ranging discussion or an argument into something with a bit more cohesion. Bakhtin would have called this “multivocal.” The remaining four sections are much more cohesive and concise. (Compare this first section with the one on “Information Science.”)

My initial conclusion is that humanities scholars were (1) the hardest to get focused and/or (2) perhaps the farthest from the authors’ own perspective. The latter case seems like a fair way to account for the diversity of that section: perhaps they simply didn’t feel comfortable synthesizing what they were getting. (But I won’t leave out the fact that humanities scholars can be a fractious bunch who are difficult to keep on task. Heck, I can be just as bad — if not worse — as anyone else when it comes to behaving badly. [Dreadful having to admit that.]) Reading the proposal closely is perhaps better left for another time.

More important is my other bit of homework: defining a scholarly practice. This was a really interesting activity and one I think I will take into the classroom with me. That is probably because I have already begun making it a part of my own life and making it, in some way, part of my pedagogy. My interest in breaking practices into tasks is based on my having climbed aboard the Geek Express, where they serve mighty helpings of [GTD][gtd].

GTD, for those who have not already been so inundated by references to it that your eyes have rolled into the back of your head, is short for “Getting Things Done.” It is a time/life management system developed by David Allen and packaged into a reasonably easily digested book titled, surprise, _[Getting Things Done: The Art of Stress-Free Productivity][gtdb]_. In the book, Allen argues for what he calls the “natural planning” process, whereby we all break complex tasks down into more manageable chunks. So, to use a version of one of his favorite examples, if you want to go out to dinner, then you have to:

* decide on a restaurant
* call to make reservations
* get ready to go
* drive there
* *etc.*

Project Bamboo is asking workshop participants to do much the same thing, but instead of looking at a particular project, they are asking us to imagine a particular practice that is, itself, part of a compound practice. (More on this later.)

Here is their definition of practices versus tasks:

> A **scholarly practice** can be defined as a set of tasks that accomplishes a scholarly goal or objective. A practice is typically a collection of tasks AND, most significantly, has a scholarly purpose that can be broadly understood by other scholars.

> For our purpose, a **task** can be defined as a unit of work often completed in a set period of time. It typically does not have a scholarly purpose in and of itself.

Their examples are:

* *Booking travel* is a task / *Attending a conference* is a practice
* *Finding a book* is a task / *Locating source materials* is a practice

I like to play with the edges of things and some part of me must have picked up on the repetition of *book* in their examples, and so I decided that the practice I would break down would be **fieldwork**, and that, for the sake of the assignment, it consisted of the following tasks:

* checking equipment and supplies (charging equipment, clearing cards)
* calling individuals to set up interviews
* traveling to site
* interviewing individuals / documenting an event
* making notes
* making drawings
* taking photographs
* returning from site
* logging miles, notes, images
* uploading images
* making summaries of day

There’s more to say about so much of this — like how “artists” keep dropping out of the proposal, which really wants to focus on the humanities (and that might be a good thing because the humanities themselves are already so diverse).

[gtd]: http://www.davidco.com/what_is_gtd.php
[gtdb]: http://www.amazon.com/gp/product/0142000280?ie=UTF8&tag=johnlaudun-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0142000280

# Notes for Project Bamboo Workshop 1

Clai and I sat down to talk a bit more about what we are already thinking about as a way to start clearing ground for building something new in dialogue with Project Bamboo:

**Modularity and the Mature Platform**. One of the things we worry about is the problem with mature platforms that is sometimes known as “software bloat” or “feature creep.” The humanities, let alone the arts and humanities, represent, as some of the sources in the bibliography make clear, a very diverse audience. That diversity is not only in terms of needs/wants but also in terms of interests and abilities. Given such diversity, how does one develop a program or platform that affords the majority of users what they consider essential functionality without, in the process, becoming so full/cluttered that it is unusable?

**Granularity is multi-dimensional.** Another thing we discussed was the fact that the fine-grained analysis that the digital era promises also means different things to different people. To highlight one dimension of what we mean by the multi-dimensionality of granularity, we would point only to the current discussions about meta-data. For some users, meta-data are the tags, or descriptors, associated with an item — e.g., the Dublin Core’s suggestive list of tags/meta-data. On the one hand, this notion of meta-data is foundational. On the other, this implementation does not go far enough: some users would like to be able to tag content within digital artifacts — texts, images, audio, video. For a linguist interested in pronoun usage from a previous era, being able to distinguish between “the” as an article and “the” as an alternate spelling of the pronoun “thee” is crucial. Perhaps another way to say this is that one person’s beach is another person’s coastline. We think it is ineluctable that data will get described in more sophisticated, “fine-grained” if you like, ways as we move forward, and that the important thing is to establish a base-line from which everyone starts and upon which everyone can depend.
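To make the base-line idea concrete, here is a sketch of item-level meta-data for the Varise Conner audio file discussed in an earlier post. The field names come from the Dublin Core simple element set; the values, and the little search helper, are my own illustrative assumptions.

```python
# An item-level meta-data record using Dublin Core element names
# (title, creator, type, format, subject, coverage). The values
# here are illustrative, not from any actual catalog.
record = {
    "title": "Lake Arthur Stomp",
    "creator": "Varise Conner",
    "type": "Sound",
    "format": "audio/wav",
    "subject": ["Cajun music", "fiddle tunes"],
    "coverage": "Vermilion Parish, Louisiana",
}

# None of this is in the audio data itself, but it is what makes
# the file findable: we search records, not waveforms.
def search(records, field, term):
    for r in records:
        value = r.get(field, "")
        values = value if isinstance(value, list) else [value]
        if any(term.lower() in v.lower() for v in values):
            yield r

print([r["title"] for r in search([record], "subject", "cajun")])
# → ['Lake Arthur Stomp']
```

Finer granularity — tagging spans *within* a text or regions within an image — would need a richer structure than this flat record, which is exactly the base-line-versus-depth tension described above.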

**Achieving platform sophistication means going both ways.** One way around the current, and perhaps only temporary, limitations on carrying content with tags attached both to the whole and to parts within it would be to make users more capable. In the example above, the number of “thes” that would amount to false positives could be significantly reduced by better searching, e.g., the use of regular expressions. Regular expressions, while somewhat different across various scripting and programming languages, are fairly consistent and not that difficult to use. They are not, however, part of any humanities computing course of which we are aware.
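A rough illustration of the regular-expression point, using Python. The heuristic, which is my own assumption rather than an established method, is that an article almost always precedes another word, while the pronoun “thee” (spelled “the”) can close a clause, so a lookahead for following punctuation or end of line filters out many article uses. It is a way of reducing false positives, not a real disambiguator.

```python
import re

# Match "the" only when followed by punctuation or end of line:
# likely the pronoun "thee" in its alternate spelling, since an
# article would precede a noun phrase. A heuristic, not a rule.
pronoun_the = re.compile(
    r"\bthe\b(?=\s*(?:[,.;:!?]|$))",
    re.IGNORECASE | re.MULTILINE,
)

text = "I say unto the, the hour is come."
matches = [m.start() for m in pronoun_the.finditer(text)]
print(matches)  # only the first "the" (the pronoun) matches
```

Even this small pattern shows why a little regular-expression literacy goes a long way: the user supplies the linguistic knowledge the tagging scheme lacks.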

# Some Ideas I’m Taking with Me to Chicago

I’m not quite sure what I am going to encounter in Chicago, but if I were to dream up a digital infrastructure *right now* I think I would build my dreams on the following:

* A more fully realized version of the [Louisiana Survey][ls] not only in terms of its current contents and scope but expanding that scope to a national level. What the Louisiana Survey does, in its current form, is harness the wiki methodology to allow individuals to contribute to the project’s attempt to document Louisiana’s contemporary folk cultures. I think the kind of indexing and cross-indexing that we’re doing is a somewhat unusual harnessing of the wiki engine/methodology. See: [http://code.google.com/p/louisianasurvey][ls].

* A step toward realizing the full potential of the Archives of Cajun and Creole Folklore in terms of delivering its contents — text, audio, images, and video — on-line and at the same time, like the Louisiana Survey, making it possible to contribute to the Archives.

I see both these projects as a chance to engage an audience which would otherwise not have access to, or interest in, a university campus and which would, I hope, widen our own disciplinary conventions, perspectives, and assumptions. A very distinct use of interdisciplinary work that would also call upon a fair amount of computing power would be:

* An architectural survey that, a la the Historic American Buildings Survey (HABS), would document extant structures but would expand the range of the “historical” to be *all* of history. Currently, HABS’ notion of “historical” means “homes of the wealthy,” which means the HABS survey of the South focuses on the plantation landscape. That has changed in the last decade or so, but there is still so much we don’t know about most architectural forms. Louisiana has some particularly interesting forms because of the shotgun house. The shotgun’s transformation into the Louisiana bungalow has been given some attention, but nothing has been done on the Louisiana ranch that followed — it’s something I have only sketched out in my own notes — or on the forms that followed in the rest of the twentieth century. What I would like to do, one day, is pair the ability of architecture students to take accurate measurements and produce accurate 3D CAD renderings with the documentary capabilities of humanities students, to produce amazing 3D virtual models — potentially walkable a la LITE — that are not empty structures but are filled with objects and individuals and their descriptions and narratives.

[ls]: http://code.google.com/p/louisianasurvey