Apple researchers develop AI framework that leverages users’ intents to automatically label data
Personal assistants like Apple’s Siri accomplish tasks through natural language commands.
However, their underlying components often rely on supervised algorithms requiring large amounts of hand-annotated training data. To reduce the time and effort required to collect this data, researchers at Apple developed a framework that leverages user engagement signals to automatically create labels that augment training data. They report that when the annotated data is incorporated using strategies like multi-task learning and validation against an external knowledge base, it significantly improves accuracy in a production deep learning system.
“We believe this is the first use of user engagement signals to help generate training data for a sequence labeling task on a large scale, and can be applied in practical settings to speed up new feature deployment when little human-annotated data is available,” wrote the researchers in a preprint paper. “Moreover … user engagement signals can help us to identify where the digital assistant needs improvement by learning from its own mistakes.”
The researchers used a range of heuristics to identify behaviors indicating either positive or negative engagement. A few included tapping on content to engage with it further (a positive response), listening to a song for a long duration (another positive response), or interrupting content provided by an intelligent assistant and manually selecting different content (a negative response). Those signals were selectively harvested in a “privacy-preserving manner” to automatically produce ground truth annotations, and they were subsequently combined with coarse-grained labels provided by human annotators.
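As a rough illustration of how such heuristics might be turned into weak labels, the toy sketch below maps a single interaction to a positive or negative engagement signal. The field names and the listening-time threshold are assumptions of mine; the paper does not publish its exact heuristics.

```python
# Toy illustration only: the interaction fields and threshold below are
# hypothetical, not the heuristics actually used in the paper.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Interaction:
    query: str                 # e.g. "Play something by the Beatles"
    tapped_content: bool       # user tapped the returned content to engage further
    listen_seconds: float      # how long the returned song was played
    switched_manually: bool    # user interrupted and picked different content

def engagement_label(event: Interaction,
                     long_listen_threshold: float = 30.0) -> Optional[str]:
    """Map one interaction to 'positive', 'negative', or None (no clear signal)."""
    if event.switched_manually:
        return "negative"
    if event.tapped_content or event.listen_seconds >= long_listen_threshold:
        return "positive"
    return None  # ambiguous; better to skip the example than to guess
```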
To incorporate the coarse-grained labels and the inferred fine-grained labels into an AI model, the paper’s coauthors devised a multi-task learning framework that treats coarse-grained and fine-grained entity labeling as two tasks. Additionally, they incorporated an external knowledge base validator consisting of entities and their relations. Given the prediction “something” as a music title and “the Beatles” as a music artist for the query “Play something by the Beatles,” the validator would perform a lookup for the top label alternatives and send them to a component that would re-rank the predictions and return the best alternative.
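The multi-task setup can be pictured roughly as a shared encoder feeding two tagging heads, one per label granularity. The sketch below is a minimal PyTorch illustration under assumptions of my own; the architecture, dimensions, and loss weighting are not taken from the paper.

```python
# Minimal sketch of a two-task sequence tagger: a shared encoder with one head
# for coarse-grained entity labels (human-annotated) and one for fine-grained
# labels (inferred from engagement signals). All names and sizes are illustrative.
import torch
import torch.nn as nn

class MultiTaskTagger(nn.Module):
    def __init__(self, vocab_size, n_coarse_labels, n_fine_labels,
                 embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Shared BiLSTM encoder over the query tokens.
        self.encoder = nn.LSTM(embed_dim, hidden_dim // 2,
                               batch_first=True, bidirectional=True)
        # Separate classification heads, one per labeling task.
        self.coarse_head = nn.Linear(hidden_dim, n_coarse_labels)
        self.fine_head = nn.Linear(hidden_dim, n_fine_labels)

    def forward(self, token_ids):
        hidden, _ = self.encoder(self.embed(token_ids))
        return self.coarse_head(hidden), self.fine_head(hidden)

def multi_task_loss(coarse_logits, fine_logits, coarse_tags, fine_tags,
                    fine_weight=0.5, ignore_index=-100):
    # Fine-grained tags come from weak (engagement-derived) supervision, so
    # tokens without an inferred label can be masked out via ignore_index.
    ce = nn.CrossEntropyLoss(ignore_index=ignore_index)
    coarse_loss = ce(coarse_logits.flatten(0, 1), coarse_tags.flatten())
    fine_loss = ce(fine_logits.flatten(0, 1), fine_tags.flatten())
    return coarse_loss + fine_weight * fine_loss
```

The knowledge base validation step can likewise be pictured as a simple lookup over the model’s top hypotheses, keeping the highest-ranked one whose predicted entities actually exist in the catalog. The helper below is hypothetical and only meant to illustrate the idea.

```python
# Hypothetical re-ranking step: accept the best-scoring hypothesis whose
# (artist, title) pair is found in an external catalog of known entities.
def rerank_with_knowledge_base(hypotheses, catalog):
    """hypotheses: list of (score, {"artist": ..., "title": ...}), sorted by score.
    catalog: set of (artist, title) pairs from the knowledge base."""
    for score, slots in hypotheses:
        if (slots.get("artist"), slots.get("title")) in catalog:
            return slots  # first hypothesis validated by the knowledge base
    return hypotheses[0][1]  # fall back to the top model prediction
```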
The researchers compiled two separate test sets to evaluate the tasks performed by the multi-task model, sampling them randomly from the production system and hand-annotating them with ground truth labels. They say that across 21 model runs, adding 260,000 training examples “consistently” reduced the coarse-grained entity error rate on a prediction task compared with the baseline for all amounts of human-annotated data. Moreover, they report that adding weakly supervised fine-grained data had a larger impact when there was a relatively small amount of human-annotated data (5,000 examples). Lastly, they report that for examples where any of the top model hypotheses passed the knowledge base validator, the fine-grained entity error rate dropped by around 50%. […]