Apple’s digital assistant Siri, which ships with the new iPhone 4S, has received a lot of attention over the last few days. The initial reaction to Apple’s new flagship phone wasn’t that enthusiastic, but this changed once people actually got the chance to try this new user interaction technology. An experiment in which Siri was fed awkward requests showed that she(*) can give smart and funny responses. Wired also gave the iPhone 4S a rave review, with Siri as the main reason. Together with Google Voice, Siri was heralded as the voice-powered artificial intelligence that is ‘shaping up to become the next-generation user interface’.
The adoption of Siri as the next-generation user interface
The use of voice to control devices could indeed provide a whole new level of ease of use, beyond the intuitive interaction that multi-touch gave us. However, that depends heavily on how well three processing steps work: the voice recognition (from audio to the words in a sentence), the natural language processing (the meaning of the sentence) and the inference of context (e.g. what is the current time, place and activity of the user, and how does that relate to the request?). If any one of these steps fails too often, users won’t get the desired result, which leaves them frustrated and likely to abandon the technology.
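To make the point about chained steps concrete, here is a minimal sketch of how errors compound. The per-step success rates are made-up numbers for illustration, not measurements of Siri, and the calculation assumes the three steps fail independently of each other.

```python
from math import prod

# Hypothetical per-step success rates; illustrative only, not measured values.
step_success = {
    "voice_recognition": 0.95,
    "language_processing": 0.90,
    "context_inference": 0.90,
}


def end_to_end_success(rates):
    """Chance that every step succeeds, assuming independent failures."""
    return prod(rates.values())


overall = end_to_end_success(step_success)
print(f"Requests handled correctly end to end: {overall:.0%}")
```

Under these assumed rates roughly one request in four goes wrong somewhere, even though each individual step looks quite reliable on its own.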
However, if Siri really is as convenient as Wired’s review suggests, wide use of this technology can be foreseen, especially as the technology matures. If it works properly, the convenience of completing tasks quickly (finding information, booking a flight, reserving a table in a restaurant, etc.) will compel many users to adopt it.
And if users want it, other manufacturers will come out with their own versions of a digital assistant. (By the way, this could spark a new ‘patent war’ in the future. I think all the big industry players hold patents in the area of voice control systems.)
How Siri and her offspring will bring the semantic web to life
Currently Siri can only interact with one (?) external service, Yelp. Based on the concepts it understood from the voice command, and on its interpretation of what the request is specifically about, it uses the Yelp API to, for example, find a restaurant.
Now, that’s a great deal for Yelp (and the businesses listed on it): its service will likely be used much more extensively, which implies more business. So in short, Siri means business!
Since Siri must have an understanding of concepts and of what a user request is about, it could in principle use any service through an API in the future. However, without semantics (information about the meaning) describing what can be queried through an API, this knowledge would need to be hardcoded, just like specific knowledge about the Yelp API must have been added to Siri. For Siri and other digital agents to make use of all the information and services on the web, and to leverage its potential to the full, they would need information about the meaning of the content of web pages, the meaning of the API calls they can make to a specific service, what they get back when calling those APIs, and how this information should be represented.
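The difference between hardcoded knowledge and semantically described services can be sketched in a few lines. This is a rough illustration under my own assumptions, not how Siri or the Yelp API actually work: the ServiceDescription format, the service names and the fake_* functions are all hypothetical stand-ins, and no real API is called.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ServiceDescription:
    """Hypothetical machine-readable description of what a service offers."""
    name: str
    action: str                      # concept the service can fulfil, e.g. "find"
    entity_type: str                 # kind of thing it returns, e.g. "Restaurant"
    parameters: List[str]            # meaning of each input the call accepts
    call: Callable[..., List[str]]   # stand-in for the real API request


@dataclass
class Request:
    """What the assistant understood from the user's voice command."""
    action: str
    entity_type: str
    slots: Dict[str, str]


def fake_restaurant_search(location: str, cuisine: str) -> List[str]:
    # Placeholder for a real API call; the result is made up.
    return [f"{cuisine} restaurant near {location}"]


def fake_flight_search(origin: str, destination: str) -> List[str]:
    # Placeholder for a real API call; the result is made up.
    return [f"flight from {origin} to {destination}"]


# Services publish descriptions; the agent holds no service-specific code.
REGISTRY = [
    ServiceDescription("restaurant-service", "find", "Restaurant",
                       ["location", "cuisine"], fake_restaurant_search),
    ServiceDescription("flight-service", "find", "Flight",
                       ["origin", "destination"], fake_flight_search),
]


def select_service(request: Request,
                   registry: List[ServiceDescription]) -> Optional[ServiceDescription]:
    """Pick the first service whose described meaning matches the request."""
    for service in registry:
        if (service.action == request.action
                and service.entity_type == request.entity_type):
            return service
    return None


def fulfil(request: Request, registry: List[ServiceDescription]) -> List[str]:
    """Call the matching service, mapping understood concepts onto its parameters."""
    service = select_service(request, registry)
    if service is None:
        return []
    args = {name: request.slots[name] for name in service.parameters}
    return service.call(**args)


request = Request("find", "Restaurant",
                  {"location": "Amsterdam", "cuisine": "Italian"})
print(fulfil(request, REGISTRY))
```

The point of the design is that adding a new service only means adding a new description to the registry; the agent's own code does not change, which is exactly what hardcoding per-API knowledge cannot offer.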
In short, what Siri and her offspring would need is a semantic web, in its broadest sense. It would not only be useful to have information on what the content of web pages means and how the different pieces of information and concepts are linked; it would also be useful to have semantics for the API calls of web services.
So why would we all integrate semantics into our web content and service APIs? That is a question that didn’t have a strong answer until now. I already gave my new answer earlier in this post: Siri means business! Sure, several businesses have already seen the potential of the semantic web. But I think that Siri, its future improvements and its competitors could become the first artificially intelligent agents to be used on a scale not seen before, and that they will prove to be a driving force for businesses and individuals to build the semantic web together on a large scale.
– Freddy Snijder
(*) Referring to Siri as a she feels completely natural to me, but that could be cultural bias. I don’t mean to offend anyone.