Tutorial: Claude Computer Use & Gemini Live

Anthropic shipped 74 releases in 52 days — and two stand out for real workflow impact. This tutorial shows you how to set up Claude Computer Use for autonomous desktop control and run live multimodal Gemini conversations in Google AI Studio. The verified docs layer flags where the video and official sources diverge, including a key model name correction for Gemini Live.



Automate Your Desktop with Claude Computer Use and Explore Gemini Live Multimodal Conversations

Anthropic shipped 74 releases across 52 days heading into late March 2026, and one feature from that cadence has real workflow implications: Claude’s ability to take autonomous control of your desktop. This walkthrough pairs it with a Google release from early 2026: the Gemini Live model, which watches your screen and talks you through what it sees. By the end, you’ll know how to enable Claude computer use, set up a Co-work project with custom instructions, and run a live multimodal session inside Google AI Studio.

The full Anthropic release calendar: every Claude Code, Cowork, and model update shipped between Feb 1 and Mar 24, 2026, mapped day by day.

Part 1: Claude Computer Use

  1. Open the Claude desktop app, navigate to Settings → General → Desktop App, and flip the Computer Use toggle on. Without this step, the feature remains dormant regardless of your subscription tier — it requires at minimum a paid plan.
Anthropic's official announcement page for Claude computer use via Cowork and Claude Code — point, click, and assign tasks from your phone.
  2. Type a natural-language instruction into the Co-work chat panel. The video demonstrates: “Open DaVinci Resolve and show me where the Magic Mask feature is.” Click Let’s go and take your hands off the keyboard; Claude takes over mouse and keyboard control from that point forward. A faint orange glow borders the screen while it’s active.

  3. Watch Claude work autonomously. It opens the application, navigates to the Color page, and surfaces the relevant control without any further input from you.

Warning: this step may differ from current official documentation — see the verified version below.

The video shows Claude’s first attempt timing out entirely before succeeding on retry. Expect the full sequence to take approximately five minutes for tasks a human would complete in ten seconds.
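
The timeout-then-retry behavior described above can be sketched as a small wrapper. This is a minimal illustration, not part of any Claude API: `run_with_retries` and `flaky_task` are hypothetical names, and the stand-in task only simulates one timeout followed by a success. The last line works out the slowdown implied by the five-minute versus ten-second figures.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(task: Callable[[], T], attempts: int = 2, pause_s: float = 0.0) -> T:
    """Call `task` up to `attempts` times; re-raise the last TimeoutError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except TimeoutError as err:
            last_error = err
            if attempt < attempts:
                time.sleep(pause_s)  # optional pause before the next try
    raise last_error


# Hypothetical stand-in for a desktop task: times out once, then succeeds.
calls = {"count": 0}


def flaky_task() -> str:
    calls["count"] += 1
    if calls["count"] == 1:
        raise TimeoutError("first attempt timed out")
    return "done"


result = run_with_retries(flaky_task)

# Five minutes for the agent versus ten seconds for a human.
slowdown = (5 * 60) / 10
```

With these numbers the agent is roughly 30 times slower than doing the task by hand, which is why the video suggests running such tasks when you are not watching.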

Claude Cowork in action: the AI narrates its plan to locate DaVinci Resolve's Magic Mask feature using on-screen tools and touch gestures.
  4. To trigger computer use tasks while away from your desk, use Claude’s Dispatch feature from the mobile app. You compose the instruction on your phone; your desktop executes it. The latency tradeoff matters far less when you’re not watching it happen in real time.

  5. Inside Co-work, open the Work in a project dropdown and select Create new project. Give it a name, write a system prompt in the custom instructions field to shape Claude’s behavior for that context, then attach any relevant files before clicking Create.

Anthropic's Auto mode for Claude Code: a new permissions model where Claude monitors its own actions before executing — no more --dangerously-skip-permissions.

Part 2: Gemini Live in Google AI Studio

  6. Go to aistudio.google.com, select the Real-time tile from the Playground home screen, and switch the model dropdown to Gemini 3.1 Flash Live.
Google AI Studio Playground: the entry point for Gemini Live — select 'Real-time' to access live voice and video multimodal conversations.
  7. Grant webcam access when prompted, then ask the model what it currently sees. It will describe the live video feed in natural language; in the demo, it correctly identifies a recording studio setup, microphone, and display behind the host.
Gemini 3.1 Flash Live streaming in Google AI Studio: the model analyzes a live video feed and describes the host's studio setup in real time.
  8. Stop the webcam feed, switch to Screen Share, and select any open application window (the video uses OBS Studio). Ask Gemini to identify what it sees. It accurately lists visible scenes, the audio mixer panel, and the controls column, then asks what you want to accomplish inside the app.

  9. On an Android or iOS device, open the Google app, tap the icon next to AI Mode (the star icon), and start speaking. The session runs as a live multimodal conversation using the same Gemini 3.1 Flash Live model, accessed without opening a browser.

  10. Open the Gemini app separately and tap Create Music to access the Lyria 3 Pro long-form song generation feature. The demo cuts off at this point before a full generation completes.


How does this compare to the official docs?

The video moves fast across a dense feature set, but the official documentation for Claude computer use, Co-work projects, and Gemini Live contains permission models, API parameters, and limitations that the demo doesn’t surface.

Here’s What the Official Docs Show

The video covers a genuinely useful slice of what shipped from Anthropic and Google in early 2026, and most of the core demonstrations hold up. What the docs add is precision — correct model names, accurate product branding, and a handful of technical constraints worth knowing before you build anything on top of these features.


Part 1: Claude Computer Use

Step 1 — Enabling Computer Use

No official documentation was found for this step — proceed using the video’s approach and verify independently.

Check your plan tier before toggling anything. The current claude.ai pricing tiers are Free ($0), Pro ($17/month billed annually or $20/month), and Max (from $100/month) — and the docs do not specify on the pricing page which tier gates Computer Use access.

Claude.ai pricing page showing Free, Pro ($17/mo), and Max (from $100/mo) individual plan tiers — plan-level access to Computer Use is not specified on this page.

Verify which plan includes Computer Use at claude.ai before proceeding.


Step 2 — Giving Claude a Natural-Language Instruction

The video’s approach here matches the current docs exactly. DaVinci Resolve is an active, maintained application under Blackmagic Design’s support umbrella — the current release visible in official documentation is DaVinci Resolve Studio 20.2.

Blackmagic Design support listing showing DaVinci Resolve Studio 20.2 release notes and New Features Guide as current documentation.

One useful note: Magic Mask does not appear by name in any visible support article title in the Blackmagic documentation — so if Claude needs reference material to locate it, pointing it at the New Features Guide for your installed version is the most reliable path.


Step 3 — Watching Claude Work Autonomously

No official documentation was found for this step — proceed using the video’s approach and verify independently.


Step 4 — Remote Triggering via Dispatch

No official documentation was found for this step — proceed using the video’s approach and verify independently.

“Claude’s Dispatch feature” does not appear by name in any Claude.ai or Claude Code documentation captured in the screenshots. The Claude Code docs do list “Schedule recurring tasks” as a supported capability, but that refers to Claude Code’s development workflow automation — not desktop Computer Use triggered from a mobile device. If Dispatch exists, it is not yet represented in official documentation.

Claude Code Docs 'What you can do' section listing eight capability categories including recurring task scheduling and agentic automation — scoped to software development, not general desktop control.

Step 5 — Creating a Cowork Project

The video’s approach here matches the current docs exactly, with one branding correction worth noting plainly. As of March 27, 2026, the official product name on claude.ai is Cowork — one word, no hyphen. The video refers to it as “Co-work” and “Co-work Projects”; neither matches the interface labeling.

Claude.ai showing the Cowork interface with Context panel, folder structure (Meeting Transcripts, Quarterly Reports), and Progress tracker — the product is branded 'Cowork' throughout.

The interface itself also doesn’t use the word “Projects” — what you’re creating is a Cowork context with custom instructions and attached files. Anthropic’s official framing positions Cowork as autonomous background task execution: the tagline is “Let Claude power through tasks so you can focus on what matters most.”

Claude.ai 'Meet Cowork' section with Anthropic's official product description emphasizing autonomous task execution as the core capability.

Part 2: Gemini Live in Google AI Studio

Step 6 — Selecting the Live Model in AI Studio

As of March 27, 2026, the correct model to select is Gemini 3.1 Flash Live — the video instructs selecting “Gemini 2.0 Flash Live,” which reflects an earlier version. No Gemini 2.0 series model appears in the current model lineup on the Gemini API documentation page. The entire current generation is 3.x.

Gemini API docs homepage with banner promoting Gemini 3.1 Flash Live as the current audio-to-audio Live model, and sidebar listing current model families — no 2.0 series is present.
Gemini API model card grid showing current 'New' models: Gemini 3.1 Pro, Gemini 3 Flash, Gemini 3.1 Flash-Lite — all in the 3.x generation.

No additional official documentation was found for the AI Studio UI navigation described in this step — use the corrected model name above and verify the interface path independently.
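
The generation check described in this step can be sketched as a small helper that reads the version number out of a display name and flags anything below the 3.x lineup. This is an illustrative sketch only: `parse_version`, `is_current_generation`, and `MIN_GENERATION` are hypothetical names, the strings are the display names quoted in this article rather than API model identifiers, and the threshold of 3 simply encodes the article's observation as of March 27, 2026.

```python
import re
from typing import Optional, Tuple

# Assumption taken from the article: the current lineup is entirely 3.x.
MIN_GENERATION = 3


def parse_version(model_name: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a display name such as 'Gemini 3.1 Flash Live'."""
    match = re.search(r"(\d+)(?:\.(\d+))?", model_name)
    if match is None:
        return None
    major = int(match.group(1))
    minor = int(match.group(2) or 0)  # 'Gemini 3 Flash' has no minor part
    return major, minor


def is_current_generation(model_name: str) -> bool:
    """Return True when the name's major version is at least MIN_GENERATION."""
    version = parse_version(model_name)
    return version is not None and version[0] >= MIN_GENERATION


video_choice_ok = is_current_generation("Gemini 2.0 Flash Live")
docs_choice_ok = is_current_generation("Gemini 3.1 Flash Live")
```

A check like this is only as good as the threshold it hard-codes, so it needs revisiting whenever the documented lineup changes.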


Step 7 — Webcam Input: Ask What Gemini Sees

The video’s approach here matches the current docs exactly. The Gemini Live API officially supports real-time voice and vision interactions, with webcam confirmed as a valid input modality.

Gemini Live API overview showing real-time voice and vision capability description and the App-to-Live-API WebSocket architecture — webcam and screen share are both confirmed input types.

One technical detail the tutorial skips: image input via the Live API is limited to JPEG at ≤1 FPS. Gemini is processing approximately one frame per second from your webcam — not a continuous video stream. This doesn’t affect what you can do in AI Studio, but it matters if you’re building anything on top of the API.
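
If you do build on the API, one way to respect the ≤1 FPS limit is to drop frames on the client before they are encoded and sent. The sketch below shows only that throttling logic under stated assumptions: `FrameThrottle` is a hypothetical helper, no Gemini SDK calls are made, and a fake clock stands in for a 30 FPS camera so the behavior is reproducible.

```python
import time
from typing import Callable, Optional


class FrameThrottle:
    """Lets through at most `max_fps` frames per second and drops the rest."""

    def __init__(self, max_fps: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if max_fps <= 0:
            raise ValueError("max_fps must be positive")
        self._min_interval = 1.0 / max_fps
        self._clock = clock
        self._last_sent: Optional[float] = None

    def should_send(self) -> bool:
        """Return True if enough time has passed since the last sent frame."""
        now = self._clock()
        if self._last_sent is None or now - self._last_sent >= self._min_interval:
            self._last_sent = now
            return True
        return False


# Simulate two seconds of a 30 FPS camera using a fake clock.
fake_now = [0.0]
throttle = FrameThrottle(max_fps=1.0, clock=lambda: fake_now[0])
sent = 0
for frame_index in range(60):
    fake_now[0] = frame_index / 30.0
    if throttle.should_send():
        sent += 1  # in a real client, this is where a JPEG would be encoded and sent
```

Of the 60 simulated frames, only the ones at 0.0 s and 1.0 s pass the throttle; the other 58 are dropped.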

Gemini Live API technical spec table showing image input at ≤1 FPS JPEG, audio specs (16-bit PCM, 16kHz input / 24kHz output), and WSS protocol.
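
The audio figures in that table translate directly into buffer sizes. The arithmetic below is a sketch that assumes mono audio, which the source does not state; `pcm_bytes` is a hypothetical helper, and the 100 ms chunk length is an arbitrary example value rather than anything the API requires.

```python
BYTES_PER_SAMPLE = 2  # 16-bit PCM means two bytes per sample
CHANNELS = 1          # assumption: mono; the source gives no channel count


def pcm_bytes(sample_rate_hz: int, duration_ms: int, channels: int = CHANNELS) -> int:
    """Size in bytes of raw PCM audio of the given duration."""
    return sample_rate_hz * BYTES_PER_SAMPLE * channels * duration_ms // 1000


input_chunk = pcm_bytes(16_000, 100)      # 100 ms of 16 kHz microphone input
output_second = pcm_bytes(24_000, 1000)   # one second of 24 kHz model output
```

Under the mono assumption, a 100 ms input chunk is 3,200 bytes and one second of output is 48,000 bytes.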

Step 8 — Screen Share with OBS Studio

The video’s approach here matches the current docs exactly. OBS Studio 32.1.0 is the current release, and its interface — Scenes, Sources, Audio Mixer, Scene Transitions, Controls — is exactly the kind of multi-panel UI the Live API is designed to parse.

OBS Studio interface showing Scenes panel, Sources (Image, Window Capture, Video, Browser, Color Source), Audio Mixer, and Controls — the interface Gemini Live analyzes in step 8.

The same ≤1 FPS image input constraint from step 7 applies here. Also worth knowing: the Live API supports barge-in (you can interrupt the model mid-response), tool use (function calling and Google Search), and audio transcriptions — none of which the tutorial demonstrates but all of which are available in the same AI Studio session.

Gemini Live API key features section listing multilingual support (70+ languages), barge-in, tool use, and audio transcriptions as core capabilities not covered in the tutorial.

Step 9 — Google App Live AI on Mobile

No official documentation was found for this step — proceed using the video’s approach and verify independently.

On the desktop google.com homepage, only a single AI Mode button is visible inside the search bar — no adjacent “Live AI” button appears. The mobile Google app UI may differ, but it is not represented in the available documentation screenshots.

Google.com desktop homepage showing 'AI Mode' integrated into the search bar — no separate 'Live AI' button is present on the desktop version.

Step 10 — Gemini App Music Generation via Lyria 3

No official documentation was found for this step — proceed using the video’s approach and verify independently.

One important distinction: the screenshots showing an “AI Music” button are from Genspark (genspark.ai) — a third-party AI workspace that runs on Claude Opus 4.6, not a Google product. The Gemini API docs do list Lyria 3 in the model sidebar, but no “Create Music” button path inside the Gemini app is confirmed by any available documentation.

Genspark AI Workspace 3.0 homepage showing AI Music tool — a Genspark-native feature on a separate platform, not the Gemini app path described in step 10.

  1. Claude — Official Claude.ai homepage where Cowork is accessible via the Chat/Cowork tab toggle, with plan pricing and desktop app download
  2. Claude Code overview – Claude Code Docs — Official documentation for Claude Code as a distinct agentic coding tool, separate from the Computer Use desktop automation feature
  3. Gemini API | Google AI for Developers — Gemini API documentation hub listing current model families (all 3.x generation) and capabilities including Voice Agents with Live API
  4. Gemini Live API overview | Gemini API | Google AI for Developers — Full technical specification for the Live API including the ≤1 FPS image input constraint, barge-in, tool use, and audio transcription features
  5. Google — Google.com desktop homepage showing the current AI Mode button placement inside the search bar
  6. Support Center | Blackmagic Design — Official Blackmagic Design support portal for DaVinci Resolve, showing Studio 20.2 as the current release
  7. Open Broadcaster Software | OBS — OBS Studio homepage confirming version 32.1.0 as the current release with download links for Windows, macOS, and Linux
  8. Genspark – Your All-in-One AI Workspace — Genspark AI Workspace 3.0, a third-party platform distinct from the Gemini app — its AI Music tool does not confirm the step 10 navigation path
