Code
cookbook/05_agent_os/interfaces/whatsapp/agent_with_media.py
from agno.agent import Agent
from agno.db.sqlite import SqliteDb
from agno.models.google import Gemini
from agno.os.app import AgentOS
from agno.os.interfaces.whatsapp import Whatsapp

agent_db = SqliteDb(db_file="tmp/persistent_memory.db")

media_agent = Agent(
    name="Media Agent",
    model=Gemini(id="gemini-3-flash-preview"),
    db=agent_db,
    add_history_to_context=True,
    num_history_runs=3,
    add_datetime_to_context=True,
    markdown=True,
)

agent_os = AgentOS(
    agents=[media_agent],
    interfaces=[Whatsapp(agent=media_agent)],
)
app = agent_os.get_app()

if __name__ == "__main__":
    agent_os.serve(app="agent_with_media:app", reload=True)
Usage
Set up your virtual environment
uv venv --python 3.12
source .venv/bin/activate
Set Environment Variables
export WHATSAPP_ACCESS_TOKEN=your_access_token
export WHATSAPP_PHONE_NUMBER_ID=your_phone_number_id
export WHATSAPP_VERIFY_TOKEN=your_verify_token
export WHATSAPP_SKIP_SIGNATURE_VALIDATION=true # For local dev
export GOOGLE_API_KEY=your_google_api_key
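For context on why `WHATSAPP_SKIP_SIGNATURE_VALIDATION` should only be enabled for local development: Meta signs each webhook payload with HMAC-SHA256 and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The following is a minimal sketch of that kind of check, not Agno's actual implementation. The function name `is_valid_signature` and the sample values are hypothetical, and the signing key is assumed to be the Meta app secret, which is not among the variables listed above.

```python
import hashlib
import hmac


def is_valid_signature(payload: bytes, signature_header: str, app_secret: str) -> bool:
    """Return True if the header equals 'sha256=' + HMAC-SHA256(app_secret, payload)."""
    prefix = "sha256="
    if not signature_header.startswith(prefix):
        return False
    expected = hmac.new(app_secret.encode(), payload, hashlib.sha256).hexdigest()
    received = signature_header[len(prefix):]
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())


# Hypothetical values, for illustration only.
body = b'{"entry": []}'
secret = "example_app_secret"
header = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

print(is_valid_signature(body, header, secret))         # signature matches the body
print(is_valid_signature(b"tampered", header, secret))  # body changed after signing
```

With validation skipped, any client that can reach the webhook URL can post messages to the agent, so the flag is best left unset outside local testing.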
See the WhatsApp Bot setup guide for how to get these values from the Meta Developer Dashboard.
Install dependencies
uv pip install -U agno google-genai
Run Example
python cookbook/05_agent_os/interfaces/whatsapp/agent_with_media.py
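Once the server is running and its webhook URL is registered in the Meta Developer Dashboard, Meta sends a one-time verification GET request carrying `hub.mode`, `hub.verify_token`, and `hub.challenge` query parameters, and expects the challenge to be echoed back when the token matches. The sketch below shows that handshake logic in isolation, assuming the expected token is the value of `WHATSAPP_VERIFY_TOKEN`. The helper `verify_subscription` is hypothetical and is not part of Agno's API.

```python
from typing import Mapping, Optional


def verify_subscription(params: Mapping[str, str], expected_token: str) -> Optional[str]:
    """Return the challenge to echo back, or None if verification should fail."""
    mode = params.get("hub.mode")
    token = params.get("hub.verify_token")
    challenge = params.get("hub.challenge")
    if mode == "subscribe" and token == expected_token and challenge is not None:
        return challenge
    return None


# Hypothetical query parameters, for illustration only.
query = {
    "hub.mode": "subscribe",
    "hub.verify_token": "your_verify_token",
    "hub.challenge": "1158201444",
}

print(verify_subscription(query, "your_verify_token"))  # challenge is returned
print(verify_subscription(query, "some_other_token"))   # token mismatch gives None
```

If verification fails in the dashboard, a mismatch between the token entered there and the exported `WHATSAPP_VERIFY_TOKEN` is a likely first thing to check.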
Key Features
- Multimodal AI: Gemini Flash for image, video, and audio processing
- Image Analysis: Object recognition, scene understanding, text extraction
- Video Processing: Content analysis and summarization
- Audio Support: Voice message transcription and response
- Context Integration: Combines media analysis with conversation history