It ain’t the data that was unstructured.

Stone Tech

Photo from wallyg on Flickr under Creative Commons license

The search engine project takes time because eBay’s online marketplace has so much variable information from millions of listings that are described differently by each seller – something known as unstructured data in the tech world.

So we’re told in a report on e-commerce search engines from, incredibly, May 2012. On the surface, the reporter got it wrong, as they’re wont to do when reporting outside their areas of expertise. The eBay data clearly was structured in a database rather than distributed across random, idiosyncratically formatted documents, which is what unstructured means in this context. The lack of structure was in eBay’s use of its data. Can a company that size and THAT dependent on search really have been limping along with an algorithm that “takes a query and matches it faithfully against the title of items”? Effing eBay was content with SQL string matching in a single effing field??? Apparently technology isn’t a key determinant of success if a company can post solid financial results over a long time even with the limitations of systems as dusty as that.
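For anyone who wants to see why matching a query "faithfully against the title" is such a blunt instrument, here is a minimal sketch. Everything in it is made up for illustration: the listings, the field names (`category`, `color`) and both functions are hypothetical, and none of it claims to show how eBay's actual system works. It only contrasts literal title matching with a lookup that uses fields a database would already hold.

```python
# Hypothetical listings; titles, fields, and values are invented for illustration.
LISTINGS = [
    {"title": "Red dress size 8", "category": "dresses", "color": "red"},
    {"title": "Scarlet evening gown, worn once", "category": "dresses", "color": "red"},
    {"title": "Dress shoes, red leather", "category": "shoes", "color": "red"},
]


def title_match(query, listings):
    """Literal matching: every query word must appear as a substring of the title."""
    words = query.lower().split()
    return [
        item["title"]
        for item in listings
        if all(word in item["title"].lower() for word in words)
    ]


def structured_match(category, color, listings):
    """Match on the structured fields instead of the seller's free-text title."""
    return [
        item["title"]
        for item in listings
        if item["category"] == category and item["color"] == color
    ]


if __name__ == "__main__":
    print(title_match("red dress", LISTINGS))
    print(structured_match("dresses", "red", LISTINGS))
```

With this toy data, the title search for "red dress" drags in the shoes and misses the gown, while the structured lookup returns both dresses and nothing else. The structure was sitting there the whole time; the search just never touched it.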

Threat to Google?

The article asks, but it also answers: No, probably not, given the head start Google has on these e-commerce pretenders. And the reasons aren’t just technical, or even mostly so. The use case for eBay’s or WalMart’s search starts with someone happy to get results from within those silos, so they’ve already missed the key context for people’s product searches: Buyers don’t want a product specifically from one catalog or another, they just want the damn product. Google gets that. (In some ways it has invented that, or at least nurtured it, because its pioneering reach across silos has opened gaping holes in brand loyalty just by making the options so easy to find.) If the search people at one or another e-commerce outfit really believe people will start with their own proprietary single-box interfaces, they’re making the terrible mistake of believing their own marketing messaging.

But the more interesting stuff in the article is in the discussion of that “red dress” search on the Goog. They’re the ones dealing with the unstructured data scattered across the DIVs and TDs of the Web, but they’ve obviously managed to read the semantic cues well enough to suss out some underlying structure. For all the complaining we hear about the indiscriminate, blunt-force tactics of single-box search, Google has managed to infer categories and act on them to structure a display relevant to users. And, as reported elsewhere, they’re building out that structure in a rather disciplined and targeted way, in part through their integration of Freebase. They’ve done and are doing more with the truly unstructured data of the web than are competitors with long histories of carefully structuring their information, in part because those competitors haven’t bothered to build anything user-facing on top of those carefully constructed foundations.

So when people complain about searchers satisficing within the unstructured mess of Google results (and I’m looking at YOU, librarians), maybe they should take a peek under the hood themselves, possibly learning a lesson or two in the process about how to define a use case and what to do about it after you’re finished. Library systems, AND vendors, are a lot closer to the stone-age technology that eBay is only now replacing than they are to anything current.