2011-11-23

Thoughts on Siri

After using the iPhone 4S for a couple of weeks, I have made a few observations about Apple’s voice-based assistant Siri.

Siri is very good at determining which words you have spoken. However, she is less adept at understanding what those words mean. This becomes clear in two ways: first, the range of topics and tasks she can handle is severely limited; second, her ability to parse even moderately complex syntax is quite limited. For example, Siri doesn’t understand this simple statement:
“Remind me the day before my brother’s birthday.”
The biggest stumbling block appeared to be the phrase “the day before.” Siri does not seem to know that this phrase, combined with the birthday already stored in my brother’s address book entry, encodes the specific date needed to set up the reminder.
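
To show how small the missing piece is once the meaning is understood, here is a minimal Python sketch of the date arithmetic. It assumes the sentence has already been parsed into a relation (“brother”) and an offset of one day, and it uses a hypothetical in-memory address book; it is not a description of how Siri works internally. Treating February 29 birthdays as February 28 in non-leap years is also my own assumption.

```python
from datetime import date, timedelta

# Hypothetical address book keyed by relation; a real one would be richer.
contacts = {"brother": {"name": "David Smith", "birthday": date(1975, 4, 2)}}


def day_before_next_birthday(relation: str, today: date) -> date:
    """Resolve 'the day before my <relation>'s birthday' to a concrete date."""
    birthday = contacts[relation]["birthday"]
    # Checking this year and next year is always enough to find a future date.
    for year in (today.year, today.year + 1):
        try:
            candidate = birthday.replace(year=year)
        except ValueError:
            # Feb 29 birthday in a non-leap year: assume Feb 28 instead.
            candidate = date(year, 2, 28)
        reminder = candidate - timedelta(days=1)
        if reminder >= today:
            return reminder
    raise AssertionError("unreachable: next year's reminder is never in the past")


print(day_before_next_birthday("brother", today=date(2011, 11, 23)))  # 2012-04-01
```

Once “the day before” and “my brother’s birthday” are understood, creating the reminder is a lookup and a subtraction.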

The other big limitation is that Siri’s conversation flows are very shallow. She sometimes keeps some context in mind and asks a follow-up question to clarify an ambiguous detail; however, it’s currently not possible to have an exchange like this:
User: Siri, what is David’s birthday?
Siri: I don’t know when David Smith’s birthday is.
User: It’s April 2, 1975.
Siri: Would you like me to remember that David Smith’s birthday is April 2, 1975?
First, Siri doesn’t know how to add a birthday to an existing contact. Second, by the time you tell her the date, she has already forgotten what you were talking about. Because she doesn’t maintain the conversational context, she has already discarded the information that would tell her the antecedent of “it” in the second user statement above.
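
Keeping that context doesn’t have to be elaborate. Below is a minimal Python sketch of an assistant that remembers which contact, and which missing detail, the last turn was about, so that “It’s April 2, 1975” can be attached to the right person. Everything here is hypothetical: the class names, the methods, and the assumption that speech has already been turned into a contact name or a date. It illustrates the idea, not Apple’s design.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


def fmt(d: date) -> str:
    """Format a date like the dialogue above, e.g. 'April 2, 1975'."""
    return f"{d:%B} {d.day}, {d.year}"


@dataclass
class DialogueContext:
    """Hypothetical record of what the previous turn was about."""
    contact: Optional[str] = None         # person under discussion
    missing_field: Optional[str] = None   # detail the assistant lacked
    pending_value: Optional[date] = None  # value offered by the user, awaiting confirmation


@dataclass
class Assistant:
    birthdays: dict = field(default_factory=dict)
    context: DialogueContext = field(default_factory=DialogueContext)

    def ask_birthday(self, contact: str) -> str:
        """'What is David's birthday?' (contact name already resolved)."""
        if contact in self.birthdays:
            self.context = DialogueContext(contact=contact)
            return f"{contact}'s birthday is {fmt(self.birthdays[contact])}."
        # Remember what we were talking about instead of discarding it.
        self.context = DialogueContext(contact=contact, missing_field="birthday")
        return f"I don't know when {contact}'s birthday is."

    def state_date(self, value: date) -> str:
        """'It's April 2, 1975': resolve 'it' against the stored context."""
        ctx = self.context
        if ctx.contact is None or ctx.missing_field != "birthday":
            return "I'm not sure what you're referring to."
        ctx.pending_value = value
        return f"Would you like me to remember that {ctx.contact}'s birthday is {fmt(value)}?"

    def confirm(self) -> str:
        """'Yes': store the pending value and clear the context."""
        ctx = self.context
        if ctx.pending_value is None:
            return "There's nothing to confirm."
        self.birthdays[ctx.contact] = ctx.pending_value
        self.context = DialogueContext()
        return "OK, I'll remember that."


siri = Assistant()
print(siri.ask_birthday("David Smith"))
print(siri.state_date(date(1975, 4, 2)))
print(siri.confirm())
```

The only real trick is that the first answer records the open question rather than throwing it away.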

Next, consider the following example:
User: Siri, remember my appointment tomorrow evening?
Siri: Yes. You have dinner with Karen Jones at 18:30 tomorrow.
User: Please change the time to 8 P.M.
Siri: Alright. I have changed dinner with Karen Jones to 8 P.M. tomorrow.
User: Please add a reminder too. Remind me one hour before.
Siri: OK. I have added your reminder.
This is another conversation that Siri cannot currently have, because there is no straightforward way to set Siri’s context so that she knows that you want to modify an existing appointment. Put simply, there is no way to tell Siri, “Hey, I want to work with item x.”
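
One way to picture “work with item x” is a session that keeps a single appointment in focus, so later commands like “change the time” and “remind me one hour before” apply to it. The Python sketch below is hypothetical throughout: the class names are mine, I assume “tomorrow evening” has already been turned into a time window (17:00 to 23:00 here), and I simply pick the first match where a fuller design would ask which appointment you meant.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Appointment:
    title: str
    start: datetime
    reminders: List[datetime] = field(default_factory=list)


class CalendarSession:
    """Hypothetical session that keeps one appointment 'in focus' across turns."""

    def __init__(self, appointments: List[Appointment]):
        self.appointments = appointments
        self.focus: Optional[Appointment] = None

    def recall(self, window_start: datetime, window_end: datetime) -> Appointment:
        """'Remember my appointment tomorrow evening?': find it and put it in focus."""
        matches = [a for a in self.appointments if window_start <= a.start < window_end]
        if not matches:
            raise LookupError("no appointment in that time window")
        # A fuller design would ask a follow-up question when several match.
        self.focus = matches[0]
        return self.focus

    def _focused(self) -> Appointment:
        if self.focus is None:
            raise LookupError("no appointment is in focus")
        return self.focus

    def change_time(self, hour: int, minute: int = 0) -> None:
        """'Change the time to 8 P.M.': move the focused item, same day."""
        appt = self._focused()
        appt.start = appt.start.replace(hour=hour, minute=minute)

    def add_reminder_before(self, lead: timedelta) -> None:
        """'Remind me one hour before': relative to the focused item's current start."""
        appt = self._focused()
        appt.reminders.append(appt.start - lead)


dinner = Appointment("Dinner with Karen Jones", datetime(2011, 11, 24, 18, 30))
session = CalendarSession([dinner])
session.recall(datetime(2011, 11, 24, 17, 0), datetime(2011, 11, 24, 23, 0))
session.change_time(20)
session.add_reminder_before(timedelta(hours=1))
print(dinner.start, dinner.reminders)
```

Because the reminder is computed from the focused appointment after the time change, it lands at 19:00 rather than 17:30.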

Siri’s inability to remember or set the conversational context makes the service feel somewhat rigid and restrictive. Adding this ability will be essential to making conversations with Siri more natural and flexible. After using Siri for a while, one begins to feel that one’s conversations are scripted. It’s like the difference between taking a car and taking the train. The car can drive almost anywhere; the train has to stay on the tracks. As soon as you ask something of Siri that is outside of her known conversation flows, she gracefully admits defeat. Then it begins to feel less like talking to HAL, and more like talking to an airline’s telephone reservation system. (The many Easter eggs people have uncovered are indeed humorous, intended no doubt to distract whimsically from the fact that Siri is not HAL.)

Like Amazon’s Silk browser, Apple’s Siri is a hybrid application that combines programs run locally on the mobile device with operations distributed to a cloud of servers. By splitting up the work between on-device and in-cloud operations, Apple gains several advantages. First of all, the computationally expensive work is offloaded to powerful servers better suited to these tasks. Every operation that can be performed on a server in Apple’s data centers is an operation that doesn’t have to be performed on the iPhone’s A5 CPU. Because computationally intensive operations consume more power, it makes sense to offload them to the cloud, where battery life is not a concern. This allows Apple to find the right balance between good voice recognition performance and decent battery life. However, this design requires that Siri have network access, and using the wireless transceivers also consumes power. So there is a trade-off between spending battery power to perform calculations locally and spending it to send the data to the cloud for processing. I expect Apple will adjust and fine-tune this balance in the future, as batteries and mobile CPUs improve.
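
A common back-of-the-envelope way to think about this trade-off: offload when the energy spent transmitting the input plus idling while the server works is less than the energy the CPU would spend doing the work itself, i.e. P_radio × (D / B) + P_idle × t_server < P_cpu × (C / f), where D is the payload size, B the uplink bandwidth, C the CPU cycles needed, and f the clock rate. The Python sketch below encodes that simplified model. It ignores downloading the result and radio “tail” energy, and the numbers in the usage lines are invented for illustration, not measurements of the iPhone 4S or of Siri.

```python
def local_energy_j(cycles: float, cpu_hz: float, cpu_power_w: float) -> float:
    """Energy (joules) to do the work on the phone: CPU time x CPU power."""
    return (cycles / cpu_hz) * cpu_power_w


def offload_energy_j(payload_bits: float, uplink_bps: float, radio_power_w: float,
                     server_wait_s: float, idle_power_w: float) -> float:
    """Energy (joules) to send the input to a server and wait for the answer.

    Simplified: ignores downloading the result and radio 'tail' energy.
    """
    transmit_s = payload_bits / uplink_bps
    return transmit_s * radio_power_w + server_wait_s * idle_power_w


def should_offload(cycles, cpu_hz, cpu_power_w,
                   payload_bits, uplink_bps, radio_power_w,
                   server_wait_s, idle_power_w) -> bool:
    """Offload only if shipping the work costs less battery than doing it locally."""
    return (offload_energy_j(payload_bits, uplink_bps, radio_power_w,
                             server_wait_s, idle_power_w)
            < local_energy_j(cycles, cpu_hz, cpu_power_w))


# Made-up numbers: 2.0 J locally versus about 0.21 J to offload on a fast link.
print(should_offload(cycles=4e9, cpu_hz=1e9, cpu_power_w=0.5,
                     payload_bits=160_000, uplink_bps=1e6, radio_power_w=1.0,
                     server_wait_s=0.5, idle_power_w=0.1))  # True
# Same job on a very slow link: transmitting alone costs 16 J.
print(should_offload(cycles=4e9, cpu_hz=1e9, cpu_power_w=0.5,
                     payload_bits=160_000, uplink_bps=1e4, radio_power_w=1.0,
                     server_wait_s=0.5, idle_power_w=0.1))  # False
```

Faster CPUs shrink the right-hand side and faster networks shrink the left, which is why the balance point should keep moving as hardware improves.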

Another benefit of Siri’s hybrid design is that because the recognition portion of Siri is done in the cloud, many improvements can be made transparently, behind the scenes. Users will not have to install an update on their iPhones to benefit from updates to the voice recognition engine.

The most important benefit of Siri’s hybrid design, though, is that Apple is collecting huge numbers of (one hopes, anonymized) Siri conversations. Every question an iPhone user asks Siri is a data point Apple will use to refine the system. Voice recognition quality can be expected to improve over time as Apple collects and analyzes millions of voice samples. Even more importantly, Apple is amassing a wealth of information about what users are asking Siri. When Siri gets the same type of request from thousands of users, this is a clear indication that this is something users want Siri to be able to do. Popular requests that Siri cannot yet handle will be given priority when it comes time to add new features. We can expect an expansion of the topics about which Siri is conversant, the depth of her conversations, and the actions of which she is capable.
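
The prioritization step itself is simple counting. This Python sketch ranks request types that went unhandled by how often they occur. It assumes the hard part, classifying each utterance into an intent label, has already been done; the log and the intent names are invented for illustration and are not real Siri data.

```python
from collections import Counter

# Invented, anonymized log: (intent label, whether the assistant handled it).
log = (
    [("set_reminder", True)] * 4
    + [("add_birthday_to_contact", False)] * 3
    + [("edit_existing_appointment", False)] * 2
    + [("translate_phrase", False)]
)


def top_unmet_requests(entries, n=3):
    """Rank unhandled request types by frequency, most common first."""
    unmet = Counter(intent for intent, handled in entries if not handled)
    return unmet.most_common(n)


print(top_unmet_requests(log, n=2))
```

At Apple’s scale the counts would come from millions of requests rather than ten, but the signal is the same: the most frequent failures are the features to build next.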

Already, Siri is fun and quite useful in some limited situations. It’s just a first step, though. Siri 1.0 is but a tantalizing taste of what is to come.
