#19246: feat(media): add Google Vertex AI media provider

by ronaldslc open 2026-02-17 15:22

size: M

# PR: Add Google Vertex AI Media Provider

## Summary

- **Problem**: Lack of a native Google Vertex AI provider for media understanding.
- **Why it matters**: Enterprise users of Google Cloud Gemini models needed native ADC support for transcribing audio and describing images/videos without manual API key management.
- **What changed**: Added `google-vertex` media provider using the `pi-ai` SDK, integrated it into auto-resolution logic, and added unit tests.
- **What did NOT change**: Core orchestration logic for other providers remains stable.

## Change Type (select all)

- [ ] Bug fix
- [x] Feature
- [ ] Docs (Removed from this PR)
- [x] Testing (Added unit tests for Vertex provider)

## Scope (select all touched areas)

- [x] Integrations
- [x] API / contracts
- [x] Testing

## Linked Issue/PR

- Related # (Insert relevant issue number if applicable)

## User-visible / Behavior Changes

- New media provider `google-vertex` available for `tools.media.audio`, `tools.media.image`, and `tools.media.video`.
- Media pipeline now auto-resolves to `google-vertex` if the agent's primary model is a Vertex model.

## Security Impact (required)

- New permissions/capabilities? (`No`)
- Secrets/tokens handling changed? (`No` - uses standard GCP ADC)
- New/changed network calls? (`Yes` - calls to Google Vertex AI endpoints)
- Command/tool execution surface changed? (`No`)
- Data access scope changed? (`No`)
- If any `Yes`, explain risk + mitigation: Media data is sent to Google Vertex AI. Users must opt in by enabling the provider and configuring GCP credentials.

## Repro + Verification

### Environment

- OS: Linux
- Model/provider: Google Vertex AI (Gemini 3 Flash Preview)

### Steps

1. Set `GOOGLE_APPLICATION_CREDENTIALS`.
2. Configure agent with `google-vertex` model.
3. Send media (audio/image) to the agent.

### Expected

- Automatic processing via Vertex Gemini models.

### Actual

- Media processed successfully; confirmed via local logs and human verification.

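
Before running the repro steps above, it can help to check which local ADC source is visible. The following is a minimal sketch, not part of this PR: `detectLocalAdcSource` and its parameters are hypothetical names, and the well-known file path assumes the Linux/macOS default location written by `gcloud auth application-default login`. A result of `none-found-locally` is not necessarily an error, since ADC can also come from a metadata server on Google Cloud compute.

```typescript
// Hypothetical preflight helper (an illustration, not code from this PR).
type AdcSource = "env-file" | "gcloud-well-known-file" | "none-found-locally";

function detectLocalAdcSource(
  env: Record<string, string | undefined>,
  fileExists: (path: string) => boolean,
  homeDir: string,
): AdcSource {
  // 1. An explicit key file referenced by GOOGLE_APPLICATION_CREDENTIALS.
  const explicit = env["GOOGLE_APPLICATION_CREDENTIALS"];
  if (explicit && fileExists(explicit)) {
    return "env-file";
  }

  // 2. The file created by `gcloud auth application-default login`
  //    (assumed Linux/macOS default location).
  const wellKnown = `${homeDir}/.config/gcloud/application_default_credentials.json`;
  if (fileExists(wellKnown)) {
    return "gcloud-well-known-file";
  }

  // 3. Nothing visible on disk; a metadata server may still provide credentials.
  return "none-found-locally";
}
```

In Node, one would likely call it as `detectLocalAdcSource(process.env, fs.existsSync, os.homedir())`; the dependencies are injected so the check can be exercised without touching the real filesystem.
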
## Evidence

- [x] New unit tests in [src/media-understanding/providers/google-vertex/index.test.ts](src/media-understanding/providers/google-vertex/index.test.ts)

## Human Verification (required)

Verified audio transcription and image description scenarios using a live Google Cloud project with Application Default Credentials. Tested the auto-resolution logic by setting the primary model to `google-vertex`.

## Compatibility / Migration

- Backward compatible? (`Yes`)
- Config/env changes? (`Yes` - requires standard GCP environment variables)
- Migration needed? (`No`)

## Failure Recovery (if this breaks)

- How to disable/revert this change quickly: Revert the commit or unset Vertex configuration in `openclaw.json`.
- Files/config to restore: N/A.
- Known bad symptoms reviewers should watch for: 403 errors if Vertex AI API is not enabled in the GCP console.

## Risks and Mitigations

- Risk: User might have misconfigured ADC.
  - Mitigation: Users should check GCP project settings; 401/403 errors will appear in debug logs.

---

**AI-Assisted PR**: This PR was prepared with assistance from GitHub Copilot (Gemini 3 Flash Preview).

https://docs.openclaw.ai/nodes/audio

<h3>Greptile Summary</h3>

Adds a new `google-vertex` media provider for audio transcription, image description, and video description using the `pi-ai` SDK with Google Cloud Application Default Credentials (ADC). The provider is registered in the media understanding pipeline and added to all auto-resolution arrays.

- **Bug**: `google-vertex` is added to `AUTO_IMAGE_KEY_PROVIDERS` but missing from `DEFAULT_IMAGE_MODELS`, which causes image auto-resolution to silently skip this provider. Add `"google-vertex": "gemini-3-flash-preview"` to `DEFAULT_IMAGE_MODELS` to fix.
- The provider implementation itself is clean, with a shared `completeVertexMedia` helper handling timeout/abort and error propagation across all three media types.
- Test coverage includes audio, image, empty response, and timeout scenarios but omits `describeVideo` and uses minimal mock params that skip required type fields.

<h3>Confidence Score: 3/5</h3>

- This PR has a bug where image auto-resolution silently skips `google-vertex`; audio and video paths should work correctly.
- The missing `DEFAULT_IMAGE_MODELS` entry for `google-vertex` is a logic bug that prevents the advertised image auto-resolution feature from working. The provider implementation itself is sound, and audio/video auto-resolution should work since they don't require entries in a default models map. The score of 3 reflects a real but non-breaking bug: the feature is partially functional.
- Pay close attention to `src/media-understanding/defaults.ts`: the missing `DEFAULT_IMAGE_MODELS` entry for `google-vertex` needs to be added before merge.

<sub>Last reviewed commit: f4426b3</sub>
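
The silent-skip failure mode described in the review can be illustrated with a simplified model. This is a sketch under assumptions, not the code in `src/media-understanding/defaults.ts`: `resolveImageProvider`, `findProvidersWithoutDefault`, and the `example-fallback` entries are hypothetical, and the real resolution logic may differ. It only shows how a provider listed as a candidate but absent from a defaults map gets passed over without an error, and how a small consistency check could surface that.

```typescript
// Simplified, hypothetical model of image auto-resolution.
type ProviderId = string;

interface ResolvedImageProvider {
  provider: ProviderId;
  model: string;
}

// Returns the first candidate that has a default image model configured.
function resolveImageProvider(
  candidates: readonly ProviderId[],
  defaultModels: Readonly<Record<ProviderId, string>>,
): ResolvedImageProvider | undefined {
  for (const provider of candidates) {
    const model = defaultModels[provider];
    if (!model) {
      continue; // No default model: the candidate is skipped without any error.
    }
    return { provider, model };
  }
  return undefined;
}

// Consistency check: candidates that could never be auto-resolved.
function findProvidersWithoutDefault(
  candidates: readonly ProviderId[],
  defaultModels: Readonly<Record<ProviderId, string>>,
): ProviderId[] {
  return candidates.filter((provider) => !defaultModels[provider]);
}

// Illustrative data only; "example-fallback" is a placeholder provider.
const candidates: ProviderId[] = ["google-vertex", "example-fallback"];
const defaultsBeforeFix: Record<ProviderId, string> = {
  "example-fallback": "example-model",
};
const defaultsAfterFix: Record<ProviderId, string> = {
  ...defaultsBeforeFix,
  "google-vertex": "gemini-3-flash-preview", // the entry suggested in the review
};

const resolvedBeforeFix = resolveImageProvider(candidates, defaultsBeforeFix);
const resolvedAfterFix = resolveImageProvider(candidates, defaultsAfterFix);
const missingBeforeFix = findProvidersWithoutDefault(candidates, defaultsBeforeFix);
```

In this model, resolution falls through to the placeholder provider before the fix and picks `google-vertex` after it. A unit test asserting that `findProvidersWithoutDefault` returns an empty list for the real constants would be one way to keep the two structures in sync.
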