Teaching AI Assistants to Use Smartphones

Researchers from Microsoft and Peking University have developed a training environment called AndroidArena to study how large language models (LLMs) such as GPT-4 can autonomously interact with and control operating systems.


  • Researchers found that LLMs struggle with tasks that require manipulating an OS, owing to vast action spaces, the need for inter-app cooperation, and the difficulty of identifying optimal solutions.
  • They developed AndroidArena as a training environment in which LLMs explore an Android-like OS, and identified lack of understanding, reasoning, exploration, and reflection as the key deficiencies.
  • Simple prompt engineering that supplies context about past attempts raised success rates by 27% by addressing the reflection deficiency.
  • This research could be significant for building better AI assistants that can operate autonomously to complete tasks within an OS environment.
  • The capabilities tested include understanding, reasoning, exploration, and reflection when interacting with the AndroidArena training environment.
  • LLMs tested include Meta's Llama 2 70B and OpenAI's GPT-3.5 and GPT-4; none performed particularly well on tasks requiring OS manipulation.
  • Operating systems pose a distinct challenge compared to games and simulations because of their vast action spaces, the need for inter-app cooperation, and the difficulty of identifying optimal solutions.
  • Overall goal is to enable LLMs like GPT-4 to interact with and control OSes like Android autonomously by overcoming current deficiencies.
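The reflection fix described above (feeding the model context about its earlier failed attempts) can be sketched as a prompt-construction step. This is a minimal, hypothetical illustration, not the authors' actual implementation: the function name, the attempt-record fields, and the example task are all assumptions.

```python
# Hypothetical sketch of the "reflection" prompt strategy: before each new
# attempt, the agent's prompt is augmented with a summary of past failed
# attempts so the model can avoid repeating the same mistakes.
# The function name and record fields are illustrative, not from the paper.

def build_reflective_prompt(task: str, past_attempts: list[dict]) -> str:
    """Compose a prompt that includes context about earlier failed attempts."""
    lines = [f"Task: {task}", ""]
    if past_attempts:
        lines.append("Previous attempts (avoid repeating these mistakes):")
        for i, attempt in enumerate(past_attempts, 1):
            lines.append(
                f"{i}. Actions: {attempt['actions']} -> Failure: {attempt['failure']}"
            )
        lines.append("")
    lines.append("Propose the next sequence of UI actions:")
    return "\n".join(lines)


# Example: a second attempt after one failure.
prompt = build_reflective_prompt(
    "Turn on airplane mode",
    [{"actions": ["open Camera app"], "failure": "wrong app, no network settings"}],
)
print(prompt)
```

The resulting string would then be sent to the LLM in place of a bare task description, which is the kind of simple prompt change the paper credits with the 27% improvement.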

