Friday, May 11, 2007

The Living Encyclopedia - MyEoL


The greatest challenge facing the engineers and architects of the Encyclopedia of Life (EoL) will be the proposed MyEoL, part of which will be the workbench for authors to contribute content. Without this, EoL degrades to a search engine with a spider (aka EoLbot) and a bit of Catalogue of Life name-smarts. In MyEoL, material must be made accessible through some form of drag-and-drop interface, along with a WYSIWYG text box for writing content. Images and snazzery (my new word-of-the-day) aside, it's ultimately the textual content that will drive discoverability. After all, text is the basis for any local or remote search engine index because image and video metadata are terrible. So how do you get contributors to sit down and type content? Do you first create a politically messy granting scheme by getting public funding agencies on board to fund such manual efforts? Or do you create something beyond the catch-phrase Web 2.0?

Like Rod Page of iSpecies fame, I have been following the progress of mash-up technologies like Yahoo's Pipes, Dapper, OpenKapow and similar emerging tools. Nick Gonzalez has a nice overview of these. The one that stands out in my mind is RSSBus. There are two reasons it hasn't quite caught on like Yahoo's Pipes and the others: 1. there is no slick user interface, and 2. it is not yet cross-platform. However, don't sell it short, because it is far more powerful than most have given it credit for. What really attracts me to RSSBus is its server-to-desktop-and-back (push-pull) architecture, with the ability to use or create any sort of connector. One can pull XML data from an online resource, mix it with local Excel data or other data objects, then churn it out as an RSS feed if so desired. Here then is a superb opportunity for the systematics community (heck, any biological community) to leverage this great work. Coincidentally, Donald Hobern (GBIF) has already coined the term Biodiversity Data Bus for EoL's server-to-server communications.
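To make the pull-mix-publish idea concrete, here is a minimal sketch in plain Python rather than RSSBus itself, whose connector API I am not reproducing here. Everything in it is an assumption for illustration: the remote XML layout, the CSV columns (standing in for an Excel sheet exported as CSV), the placeholder names and the `build_feed` helper. A real version would fetch the XML over HTTP instead of using an inline string.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Stand-in for XML pulled from an online resource (hypothetical structure).
REMOTE_XML = """<records>
  <record><name>Examplea nova</name><source>remote-db</source></record>
  <record><name>Examplea altera</name><source>remote-db</source></record>
</records>"""

# Stand-in for a local spreadsheet exported as CSV (hypothetical columns).
LOCAL_CSV = """name,locality
Examplea nova,site A
"""


def build_feed(remote_xml: str, local_csv: str) -> str:
    """Join remote records with local rows by name and return RSS 2.0 text."""
    # Index the local rows by scientific name for quick lookup.
    local = {row["name"]: row for row in csv.DictReader(io.StringIO(local_csv))}

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Merged species records"
    ET.SubElement(channel, "link").text = "http://example.org/feed"
    ET.SubElement(channel, "description").text = "Remote XML mixed with local data"

    for record in ET.fromstring(remote_xml).iter("record"):
        name = record.findtext("name")
        source = record.findtext("source")
        # Records with no local match still appear, flagged as "unknown".
        locality = local.get(name, {}).get("locality", "unknown")
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = name
        ET.SubElement(item, "description").text = (
            f"source: {source}; locality: {locality}"
        )

    return ET.tostring(rss, encoding="unicode")


feed = build_feed(REMOTE_XML, LOCAL_CSV)
```

The join key here is the bare name string, which is exactly where a names service would need to step in once real data with synonyms and misspellings is involved.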

But why stop at the server environment? What would truly change the way we conduct biological research, thus building the EoL dream, is if this data bus were extended to the desktop. Wouldn't you like to mix your local data with that pulled from external resources? I sure would. Better yet, imagine creating a Facebook-like community of colleagues when preparing data for a manuscript. Each co-author contributes his or her data via a desktop RSSBus and leverages the great work on names management undertaken by the Catalogue of Life and uBio, thus making a great first crack at merging data sets. The co-authors in this little invite-only community can then collectively work on analyses and presentation for the manuscript. What we typically have today with co-authored manuscripts is one or a few individuals responsible for the grunt work of merging data sets and making sense of them. At the end of a much simpler, RSSBus-like communal data merging effort, I would be very much inclined to click a button and push such a creation, or parts of it, back out to EoL.
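As a rough sketch of that "first crack" at merging, the snippet below groups records from two co-authors under an accepted name. The `ACCEPTED_NAME` table is a hypothetical stand-in for a lookup against a names service such as the Catalogue of Life or uBio (I am not calling either service's real API), and the taxon names and counts are invented placeholders.

```python
from collections import defaultdict

# Hypothetical synonym table standing in for a names-service lookup.
# Keys are names as typed by contributors; values are the accepted names.
ACCEPTED_NAME = {
    "Examplea antiqua": "Examplea nova",
    "Examplea nova": "Examplea nova",
}


def merge_by_accepted_name(*datasets):
    """Group records from several contributors under their accepted name."""
    merged = defaultdict(list)
    for dataset in datasets:
        for record in dataset:
            name = record["name"]
            # Names the lookup does not know are kept as given.
            accepted = ACCEPTED_NAME.get(name, name)
            merged[accepted].append(record)
    return dict(merged)


# Two co-authors recorded the same taxon under different names.
author_a = [{"name": "Examplea antiqua", "count": 3}]
author_b = [{"name": "Examplea nova", "count": 5}]
merged = merge_by_accepted_name(author_a, author_b)
```

Keeping the original records intact under each accepted name, rather than summing them, leaves the judgment calls to the co-authors.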

What also hasn't been effectively discussed is how EoL will acquire content to feed its pages. Will it be an EoLbot, as was hinted at in a few press announcements, or will it be something like DiGIR with canned or modularized Darwin Core-like elements? That may work for existing species page providers who serve their pages from a backend, but what about all that great, flat HTML content out there that only traditional search engines like Google, Yahoo, MSN and other big players have been scouring? Does EoL intend to use Google, Yahoo, and others and scrape their results to feed the initial EoL species pages? Yikes. That scares me because these engines can and often do produce erroneous results; they haven't got biological intelligence. Google Images is particularly bad at handling nomenclature and image associations, as I discovered with some of the indexed content from The Nearctic Spider Database (e.g. HERE). Here is also where we might be creative with RSSBus, if it could be married with something like OpenSearch and a spidering and indexing tool that content providers run against their own served content. One such example is the inexpensive Zoom Search, which has lots of great plug-ins to read image metadata and ultimately produces an index through its spidering algorithms for use in a template-driven search portal. This sort of system with a UDDI registry would be really cool because attribution is then possible without any great deal of effort. Stripping canned results from Google or Yahoo to build the initial content does not come bundled with attribution for the source. EoL can essentially create a search engine for content providers and freely hand it out, and content providers can pretty up their search portal and spider their content as they want. This is great value because, as with Zoom Search, content providers can log search queries and get a sense of what people are actually searching for on their pages.
Behind the scenes, EoL pulls content via OpenSearch to feed the RSSBus scaffolding in MyEoL.
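
A small sketch of what that pull might look like from EoL's side, assuming each content provider publishes an OpenSearch URL template. The provider URL and the `fill_template` helper are hypothetical; only the `{searchTerms}` and `{startPage?}` placeholders (with `?` marking an optional parameter) come from the OpenSearch 1.1 template syntax.

```python
import re
from urllib.parse import quote_plus


def fill_template(template: str, **params) -> str:
    """Fill an OpenSearch-style URL template with URL-encoded values."""

    def substitute(match):
        name = match.group(1)
        optional = match.group(2) == "?"
        if name in params:
            return quote_plus(str(params[name]))
        if optional:
            # Optional parameters that are not supplied become empty strings.
            return ""
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{([A-Za-z:]+)(\??)\}", substitute, template)


# Hypothetical template, as a provider might advertise it.
TEMPLATE = (
    "http://provider.example.org/search"
    "?q={searchTerms}&page={startPage?}&format=rss"
)

url = fill_template(TEMPLATE, searchTerms="Araneus diadematus")
```

Because each request goes to a known provider's own endpoint, the source of every returned item is known up front, which is the attribution that scraped Google or Yahoo results would lack.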

Though this video doesn't really give RSSBus its deserved credit, and it's tough to see the relevance to biology, it nonetheless provides a glimpse of what I have been talking about with the "Living Encyclopedia" as opposed to merely an "Encyclopedia of Life".
