EVA OS
Intelligent Multimodal Core
Our Advantages
EVA OS is AutoArk’s core multimodal system. It combines speech, vision, and video with ASR/TTS and system orchestration to enable natural, real-time interaction. We welcome developers to join as Maintainers.
Multimodal Interaction
Voice, vision, and text work seamlessly together for natural real-time interaction.
Low-Latency Speech & Vision
ASR, translation, and TTS deliver fast, stable, and interruption-free communication.
Custom Voices & Emotion
Create unique voice styles with fine-grained emotional expression.
Zero-Code Multimodal Agents
Build powerful multimodal agents effortlessly with simple drag-and-drop tools.
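The speech features above describe a streaming hand-off: ASR transcribes audio, an optional translation stage converts the text, and TTS speaks the result. The sketch below illustrates that flow only; every class here is a hypothetical stand-in (none are EVA OS APIs), showing how the stages chain rather than how EVA OS implements them.

```python
# Hypothetical sketch of an ASR -> translation -> TTS hand-off.
# All classes are illustrative stubs, not EVA OS interfaces.
from dataclasses import dataclass, field


@dataclass
class StubASR:
    """Stand-in for a speech recognizer: audio bytes in, text out."""
    def transcribe(self, audio_chunk: bytes) -> str:
        return f"transcript[{len(audio_chunk)} bytes]"


@dataclass
class StubTranslator:
    """Stand-in for a translation stage."""
    target_lang: str = "en"

    def translate(self, text: str) -> str:
        return f"{self.target_lang}:{text}"


@dataclass
class StubTTS:
    """Stand-in for a synthesizer: text in, audio bytes out."""
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


@dataclass
class SpeechPipeline:
    """Chains the three stages per incoming audio chunk, so each chunk
    can be processed as it arrives instead of waiting for full utterances."""
    asr: StubASR = field(default_factory=StubASR)
    mt: StubTranslator = field(default_factory=StubTranslator)
    tts: StubTTS = field(default_factory=StubTTS)

    def process(self, audio_chunk: bytes) -> bytes:
        text = self.asr.transcribe(audio_chunk)
        translated = self.mt.translate(text)
        return self.tts.synthesize(translated)


pipeline = SpeechPipeline()
# One 10 ms frame of 16 kHz / 16-bit mono silence (320 bytes).
out = pipeline.process(b"\x00" * 320)
print(out)
```

Processing per chunk like this is what keeps latency low: each stage forwards its partial result immediately rather than buffering the whole utterance.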
EVA OS
Full Hardware Development Support
Complete open-source firmware for all supported boards:
ESP32, RK3562/3568...
EVA OS
Full Hardware Development Solutions
Hardware Product Use Cases

Aqi Companion Robot
Powered by EVA OS with full-stack multimodal capabilities, delivering sub-350ms real-time voice and vision interaction.

Kidodo Early Education Robot
Designed for children aged 0–10, combining emotional companionship and science-based learning with AI-driven interactive stories.

Confidential Smartphone Assistant Project
A next-generation AI assistant built on EVA’s multimodal model, enabling natural dialogue, real-time perception, and intelligent decision-making. Currently in confidential development.

Confidential AR Glasses Project
Lightweight AR intelligent eyewear with visual understanding, spatial sensing, and multimodal interaction. Designed for seamless real-world integration. Technical details remain confidential.

Confidential In-Vehicle Assistant Project
An intelligent in-car assistant combining voice and visual capabilities for real-time perception and multi-scene assistance. Developed in partnership with automotive teams.
An open-source, real-time multimodal agent engine that lets your devices hear, see, and act.