browser-use/quickstart/llm
Source: https://docs.browser-use.com/customize/actor/all-parameters
Complete API reference for Browser Actor classes, methods, and parameters including BrowserSession, Page, Element, and Mouse
Main browser session manager.
from browser_use import Browser
browser = Browser()
await browser.start()
# Page management
page = await browser.new_page("https://example.com")
pages = await browser.get_pages()
current = await browser.get_current_page()
await browser.close_page(page)
# To stop the browser session
await browser.stop()
See Browser Parameters for complete configuration options.
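The session above ends with an explicit browser.stop(). One way to make sure that call also runs when an action raises is an async context manager. The sketch below is only an illustration: `running` and `FakeBrowser` are hypothetical names, and `FakeBrowser` stands in for any object exposing the async start() and stop() methods shown above. It does not import browser_use.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def running(browser):
    """Start the browser, yield it, and always stop it afterwards."""
    await browser.start()
    try:
        yield browser
    finally:
        # Runs on normal exit and when the body raises.
        await browser.stop()


class FakeBrowser:
    """Hypothetical stand-in that records calls instead of launching Chrome."""

    def __init__(self):
        self.calls = []

    async def start(self):
        self.calls.append("start")

    async def stop(self):
        self.calls.append("stop")


async def demo():
    browser = FakeBrowser()
    try:
        async with running(browser):
            raise RuntimeError("simulated failure mid-session")
    except RuntimeError:
        pass
    return browser.calls


calls = asyncio.run(demo())
print(calls)  # ['start', 'stop']
```

With a real session the same wrapper would presumably be used as `async with running(Browser()) as browser:`, assuming Browser keeps the start()/stop() pair documented above.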
Browser tab/iframe for page-level operations.
- goto(url: str) - Navigate to URL
- go_back(), go_forward(), reload() - History navigation
- get_elements_by_css_selector(selector: str) -> list[Element] - CSS selector
- get_element(backend_node_id: int) -> Element - By CDP node ID
- get_element_by_prompt(prompt: str, llm) -> Element | None - AI-powered
- must_get_element_by_prompt(prompt: str, llm) -> Element - AI (raises if not found)
- evaluate(page_function: str, *args) -> str - Execute JS (arrow function format)
- press(key: str) - Send keyboard input ("Enter", "Control+A")
- set_viewport_size(width: int, height: int) - Set viewport
- screenshot(format='jpeg', quality=None) -> str - Take screenshot
- get_url() -> str, get_title() -> str - Page info
- mouse -> Mouse - Get mouse interface
- extract_content(prompt: str, structured_output: type[T], llm) -> T - Extract data
Individual DOM element interactions.
- click(button='left', click_count=1, modifiers=None) - Click element
- fill(text: str, clear=True) - Fill input
- hover(), focus() - Mouse/focus actions
- check() - Toggle checkbox/radio
- select_option(values: str | list[str]) - Select dropdown options
- drag_to(target: Element | Position) - Drag and drop
- get_attribute(name: str) -> str | None - Get attribute
- get_bounding_box() -> BoundingBox | None - Position/size
- get_basic_info() -> ElementInfo - Complete element info
- screenshot(format='jpeg') -> str - Element screenshot
Coordinate-based mouse operations.
- click(x: int, y: int, button='left', click_count=1) - Click at coordinates
- move(x: int, y: int, steps=1) - Move mouse
- down(button='left'), up(button='left') - Press/release buttons
- scroll(x=0, y=0, delta_x=None, delta_y=None) - Scroll at coordinates
Source: https://docs.browser-use.com/customize/actor/basics
Low-level Playwright-like browser automation with direct and full CDP control and precise element interactions
graph TD
A[Browser] --> B[Page]
B --> C[Element]
B --> D[Mouse]
B --> E[AI Features]
C --> F[DOM Interactions]
D --> G[Coordinate Operations]
E --> H[LLM Integration]
import asyncio

from browser_use import Browser, Agent
from browser_use.llm.openai import ChatOpenAI

async def main():
    llm = ChatOpenAI(api_key="your-api-key")
    browser = Browser()
    await browser.start()

    # 1. Actor: Precise navigation and element interactions
    page = await browser.new_page("https://github.com/login")
    email_input = await page.must_get_element_by_prompt("username field", llm=llm)
    await email_input.fill("your-username")

    # 2. Agent: AI-driven complex tasks (the task is passed to Agent, not to run())
    agent = Agent(task="Complete login and navigate to my repositories", browser=browser, llm=llm)
    await agent.run()

    await browser.stop()

asyncio.run(main())
- get_elements_by_css_selector() doesn't wait for visibility
- evaluate() requires arrow function format: () => {}
Source: https://docs.browser-use.com/customize/actor/examples
Comprehensive examples for Browser Actor automation tasks including forms, JavaScript, mouse operations, and AI features
from browser_use import Browser
browser = Browser()
await browser.start()
# Create pages
page = await browser.new_page() # Blank tab
page = await browser.new_page("https://example.com") # With URL
# Get all pages
pages = await browser.get_pages()
current = await browser.get_current_page()
# Close page
await browser.close_page(page)
await browser.stop()
page = await browser.new_page('https://github.com')
# CSS selectors (immediate return)
elements = await page.get_elements_by_css_selector("input[type='text']")
buttons = await page.get_elements_by_css_selector("button.submit")
# Element actions
await elements[0].click()
await elements[0].fill("Hello World")
await elements[0].hover()
# Page actions
await page.press("Enter")
screenshot = await page.screenshot()
from browser_use.llm.openai import ChatOpenAI
from pydantic import BaseModel
llm = ChatOpenAI(api_key="your-api-key")
# Find elements using natural language
button = await page.get_element_by_prompt("login button", llm=llm)
await button.click()
# Extract structured data
class ProductInfo(BaseModel):
    name: str
    price: float

product = await page.extract_content(
    "Extract product name and price",
    ProductInfo,
    llm=llm
)
# Simple JavaScript evaluation
title = await page.evaluate('() => document.title')
# JavaScript with arguments
result = await page.evaluate('(x, y) => x + y', 10, 20)
# Complex operations
stats = await page.evaluate('''() => ({
url: location.href,
links: document.querySelectorAll('a').length
})''')
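Because evaluate() expects the arrow function format used in the calls above, a quick client-side check can catch a plain expression before it reaches the browser. This is a rough sketch: `looks_like_arrow_function` is a hypothetical helper, not part of browser_use, and the regular expression only covers simple parameter lists (no nested parentheses in defaults).

```python
import re

# Rough pattern: optional "async", then "(params)" or a bare identifier, then "=>".
_ARROW_RE = re.compile(r"^\s*(?:async\s+)?(?:\([^)]*\)|[A-Za-z_$][\w$]*)\s*=>")


def looks_like_arrow_function(source: str) -> bool:
    """Return True if the JS source appears to start with an arrow function."""
    return _ARROW_RE.match(source) is not None


print(looks_like_arrow_function("() => document.title"))      # True
print(looks_like_arrow_function("(x, y) => x + y"))           # True
print(looks_like_arrow_function("document.title"))            # False
print(looks_like_arrow_function("function () { return 1 }"))  # False
```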
mouse = await page.mouse
# Click at coordinates
await mouse.click(x=100, y=200)
# Drag and drop
await mouse.down()
await mouse.move(x=500, y=600)
await mouse.up()
# Scroll
await mouse.scroll(x=0, y=100, delta_y=-500)
- Use asyncio.sleep() after actions that trigger navigation
- Call browser.stop() to clean up resources
Source: https://docs.browser-use.com/customize/agent/all-parameters
Complete reference for all agent configuration options
- tools: Registry of tools the agent can call.
- browser: Browser object where you can specify the browser settings.
- output_model_schema: Pydantic model class for structured output validation.
- use_vision (default: "auto"): Vision mode - "auto" includes screenshot tool but only uses vision when requested, True always includes screenshots, False never includes screenshots and excludes screenshot tool
- vision_detail_level (default: 'auto'): Screenshot detail level - 'low', 'high', or 'auto'
- page_extraction_llm: Separate LLM model for page content extraction. You can choose a small & fast model because it only needs to extract text from the page (default: same as llm)
- initial_actions: List of actions to run before the main task without LLM.
- max_actions_per_step (default: 10): Maximum actions per step, e.g. for form filling the agent can output 10 fields at once. We execute the actions until the page changes.
- max_failures (default: 3): Maximum retries for steps with errors
- final_response_after_failure (default: True): If True, attempt to force one final model call with intermediate output after max_failures is reached
- use_thinking (default: True): Controls whether the agent uses its internal "thinking" field for explicit reasoning steps.
- flash_mode (default: False): Fast mode that skips evaluation, next goal and thinking and only uses memory. If flash_mode is enabled, it overrides use_thinking and disables the thinking process entirely.
- override_system_message: Completely replace the default system prompt.
- extend_system_message: Add additional instructions to the default system prompt.
- save_conversation_path: Path to save complete conversation history
- save_conversation_path_encoding (default: 'utf-8'): Encoding for saved conversations
- available_file_paths: List of file paths the agent can access
- sensitive_data: Dictionary of sensitive data to handle carefully.
- generate_gif (default: False): Generate GIF of agent actions. Set to True or string path
- include_attributes: List of HTML attributes to include in page analysis
- max_history_items: Maximum number of last steps to keep in the LLM memory. If None, we keep all steps.
- llm_timeout (default: 90): Timeout in seconds for LLM calls
- step_timeout (default: 120): Timeout in seconds for each step
- directly_open_url (default: True): If we detect a url in the task, we directly open it.
- calculate_cost (default: False): Calculate and track API costs
- display_files_in_done_text (default: True): Show file information in completion messages
- controller: Alias for tools for backwards compatibility.
- browser_session: Alias for browser for backwards compatibility.
Source: https://docs.browser-use.com/customize/agent/basics
from browser_use import Agent, ChatBrowserUse

agent = Agent(
    task="Search for latest news about AI",
    llm=ChatBrowserUse(),
)

async def main():
    history = await agent.run(max_steps=100)
- task: The task you want to automate.
- llm: Your favorite LLM. See Supported Models.
The agent is executed using the async run() method:
- max_steps (default: 100): Maximum number of steps an agent can take.
Check out all customizable parameters here.
Source: https://docs.browser-use.com/customize/agent/output-format
The run() method returns an AgentHistoryList object with the complete execution history:
history = await agent.run()
# Access useful information
history.urls() # List of visited URLs
history.screenshot_paths() # List of screenshot paths
history.screenshots() # List of screenshots as base64 strings
history.action_names() # Names of executed actions
history.extracted_content() # List of extracted content from all actions
history.errors() # List of errors (with None for steps without errors)
history.model_actions() # All actions with their parameters
history.model_outputs() # All model outputs from history
history.last_action() # Last action in history
# Analysis methods
history.final_result() # Get the final extracted content (last step)
history.is_done() # Check if the agent finished its run
history.is_successful() # Check if agent completed successfully (returns None if not done)
history.has_errors() # Check if any errors occurred
history.model_thoughts() # Get the agent's reasoning process (AgentBrain objects)
history.action_results() # Get all ActionResult objects from history
history.action_history() # Get truncated action history with essential fields
history.number_of_steps() # Get the number of steps in the history
history.total_duration_seconds() # Get total duration of all steps in seconds
# Structured output (when using output_model_schema)
history.structured_output # Property that returns parsed structured output
See all helper methods in the AgentHistoryList source code.
For structured output, use the output_model_schema parameter with a Pydantic model. Example.
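The helper methods listed above combine naturally into a short run summary. The sketch below is an assumption-laden illustration: `summarize_history` and `FakeHistory` are hypothetical names, and `FakeHistory` is a stub that only mimics a few of the method names from the listing so the example runs without browser_use installed.

```python
def summarize_history(history) -> dict:
    """Collect a few fields from an AgentHistoryList-like object."""
    # errors() contains None for steps without errors, so filter those out.
    errors = [e for e in history.errors() if e is not None]
    return {
        "done": history.is_done(),
        "steps": history.number_of_steps(),
        "urls": history.urls(),
        "errors": errors,
        "result": history.final_result(),
    }


class FakeHistory:
    """Hypothetical stub exposing the same method names as the listing above."""

    def is_done(self):
        return True

    def number_of_steps(self):
        return 3

    def urls(self):
        return ["https://example.com"]

    def errors(self):
        return [None, "timeout", None]

    def final_result(self):
        return "ok"


summary = summarize_history(FakeHistory())
print(summary["errors"])  # ['timeout']
```

In real use the argument would be the object returned by `await agent.run()`.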
Source: https://docs.browser-use.com/customize/agent/prompting-guide
Tips and tricks
Prompting can drastically improve performance and solve existing limitations of the library.
✅ Specific (Recommended)
task = """
1. Go to https://quotes.toscrape.com/
2. Use extract action with the query "first 3 quotes with their authors"
3. Save results to quotes.csv using write_file action
4. Do a google search for the first quote and find when it was written
"""
❌ Open-Ended
task = "Go to web and make money"
When you know exactly what the agent should do, reference actions by name:
task = """
1. Use search action to find "Python tutorials"
2. Use click to open first result in a new tab
3. Use scroll action to scroll down 2 pages
4. Use extract to extract the names of the first 5 items
5. Wait for 2 seconds if the page is not loaded, refresh it and wait 10 sec
6. Use send_keys action with "Tab Tab ArrowDown Enter"
"""
See Available Tools for the complete list of actions.
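When the steps come from code rather than a hand-written string, the numbered format used above can be generated. A minimal sketch, assuming nothing beyond the task format shown in this guide; `build_task` is a hypothetical helper, not a browser_use function.

```python
def build_task(steps: list[str], title: str = "") -> str:
    """Join steps into a numbered, multi-line task string."""
    lines = [title] if title else []
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines)


task = build_task([
    'Use search action to find "Python tutorials"',
    "Use click to open first result in a new tab",
    "Use scroll action to scroll down 2 pages",
])
print(task)
```

The resulting string can be passed as the task argument of Agent in the same way as the literal strings above.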
Sometimes a button can't be clicked. That is a bug in the library, so please open an issue. In the meantime, you can often work around it with keyboard navigation:
task = """
If the submit button cannot be clicked:
1. Use send_keys action with "Tab Tab Enter" to navigate and activate
2. Or use send_keys with "ArrowDown ArrowDown Enter" for form submission
"""
# When you have custom actions
@controller.action("Get 2FA code from authenticator app")
async def get_2fa_code():
    # Your implementation
    pass
task = """
Login with 2FA:
1. Enter username/password
2. When prompted for 2FA, use get_2fa_code action
3. NEVER try to extract 2FA codes from the page manually
4. ALWAYS use the get_2fa_code action for authentication codes
"""
task = """
Robust data extraction:
1. Go to openai.com to find their CEO
2. If navigation fails due to anti-bot protection:
   - Use google search to find the CEO
3. If page times out, use go_back and try alternative approach
"""
The key to effective prompting is being specific about actions.
Source: https://docs.browser-use.com/customize/browser/all-parameters
Complete reference for all browser configuration options
The Browser instance also provides all Actor methods for direct browser control (page management, element interactions, etc.).
- cdp_url: CDP URL for connecting to existing browser instance (e.g., "http://localhost:9222")
- headless (default: None): Run browser without UI. Auto-detects based on display availability (True/False/None)
- window_size: Browser window size for headful mode. Use dict {'width': 1920, 'height': 1080} or ViewportSize object
- window_position (default: {'width': 0, 'height': 0}): Window position from top-left corner in pixels
- viewport: Content area size, same format as window_size. Use {'width': 1280, 'height': 720} or ViewportSize object
- no_viewport (default: None): Disable viewport emulation, content fits to window size
- device_scale_factor: Device scale factor (DPI). Set to 2.0 or 3.0 for high-resolution screenshots
- keep_alive (default: None): Keep browser running after agent completes
- allowed_domains: Restrict navigation to specific domains. Domain pattern formats:
  - 'example.com' - Matches only https://example.com/*
  - '*.example.com' - Matches https://example.com/* and any subdomain https://*.example.com/*
  - 'http*://example.com' - Matches both http:// and https:// protocols
  - 'chrome-extension://*' - Matches any Chrome extension URL
  - Wildcards in the TLD (e.g. example.*) are not allowed for security
  - Example: ['*.google.com', 'https://example.com', 'chrome-extension://*']
  - www.example.com and example.com variants are checked automatically.
- prohibited_domains: Block navigation to specific domains. Uses same pattern formats as allowed_domains. When both allowed_domains and prohibited_domains are set, allowed_domains takes precedence. Examples:
  - ['pornhub.com', '*.gambling-site.net'] - Block specific sites and all subdomains
  - ['https://explicit-content.org'] - Block specific protocol/domain combination
  - [...] allowed_domains)
- enable_default_extensions (default: True): Load automation extensions (uBlock Origin, cookie handlers, ClearURLs)
- cross_origin_iframes (default: False): Enable cross-origin iframe support (may cause complexity)
- is_local (default: True): Whether this is a local browser instance. Set to False for remote browsers. If an executable_path is set, it is automatically set to True. This can affect your download behavior.
- user_data_dir (default: auto-generated temp): Directory for browser profile data. Use None for incognito mode
- profile_directory (default: 'Default'): Chrome profile subdirectory name ('Profile 1', 'Work Profile', etc.)
- storage_state: Browser storage state (cookies, localStorage). Can be file path string or dict object
- proxy: Proxy configuration using ProxySettings(server='http://host:8080', bypass='localhost,127.0.0.1', username='user', password='pass')
- permissions (default: ['clipboardReadWrite', 'notifications']): Browser permissions to grant. Use list like ['camera', 'microphone', 'geolocation']
- headers: Additional HTTP headers for connect requests (remote browsers only)
- executable_path: Path to browser executable for custom installations. Platform examples:
  - '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'
  - 'C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe'
  - '/usr/bin/google-chrome'
- channel: Browser channel ('chromium', 'chrome', 'chrome-beta', 'msedge', etc.)
- args: Additional command-line arguments for the browser. Use list format: ['--disable-gpu', '--custom-flag=value', '--another-flag']
- env: Environment variables for browser process. Use dict like {'DISPLAY': ':0', 'LANG': 'en_US.UTF-8', 'CUSTOM_VAR': 'test'}
- chromium_sandbox (default: True except in Docker): Enable Chromium sandboxing for security
- devtools (default: False): Open DevTools panel automatically (requires headless=False)
- ignore_default_args: List of default args to disable, or True to disable all. Use list like ['--enable-automation', '--disable-extensions']
- minimum_wait_page_load_time (default: 0.25): Minimum time to wait before capturing page state in seconds
- wait_for_network_idle_page_load_time (default: 0.5): Time to wait for network activity to cease in seconds
- wait_between_actions (default: 0.5): Time to wait between agent actions in seconds
- highlight_elements (default: True): Highlight interactive elements for AI vision
- paint_order_filtering (default: True): Enable paint order filtering to optimize DOM tree by removing elements hidden behind others. Slightly experimental
- accept_downloads (default: True): Automatically accept all downloads
- downloads_path: Directory for downloaded files. Use string like './downloads' or Path object
- auto_download_pdfs (default: True): Automatically download PDFs instead of viewing in browser
- user_agent: Custom user agent string. Example: 'Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X)'
- screen: Screen size information, same format as window_size
- record_video_dir: Directory to save video recordings as .mp4 files
- record_video_size (default: ViewportSize): The frame size (width, height) of the video recording.
- record_video_framerate (default: 30): The framerate to use for the video recording.
- record_har_path: Path to save network trace files as .har format
- traces_dir: Directory to save complete trace files for debugging
- record_har_content (default: 'embed'): HAR content mode ('omit', 'embed', 'attach')
- record_har_mode (default: 'full'): HAR recording mode ('full', 'minimal')
- disable_security (default: False): ⚠️ NOT RECOMMENDED - Disables all browser security features
- deterministic_rendering (default: False): ⚠️ NOT RECOMMENDED - Forces consistent rendering but reduces performance
For backward compatibility, you can pass all the parameters from above to the BrowserProfile and then to the Browser.
from browser_use import Browser, BrowserProfile
profile = BrowserProfile(headless=False)
browser = Browser(browser_profile=profile)
Browser is an alias for BrowserSession; they are exactly the same class.
Use Browser for cleaner, more intuitive code.
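The domain pattern formats described for allowed_domains in this reference can be made concrete with a small matcher. This is a sketch of the stated rules only, not the library's actual implementation: `matches_domain_pattern` and `is_allowed` are hypothetical names, and the automatic www/non-www variant check mentioned in the reference is deliberately left out.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse


def matches_domain_pattern(url: str, pattern: str) -> bool:
    """Check a URL against one pattern, following the documented formats."""
    # Split an optional scheme off the pattern; bare domains imply https.
    if "://" in pattern:
        scheme_pattern, host_pattern = pattern.split("://", 1)
    else:
        scheme_pattern, host_pattern = "https", pattern

    # Wildcards in the TLD position are rejected outright.
    if host_pattern.endswith(".*"):
        raise ValueError(f"wildcard TLD not allowed: {pattern!r}")

    parsed = urlparse(url)
    host = parsed.hostname or ""
    if not fnmatch(parsed.scheme, scheme_pattern):
        return False
    if host_pattern == "*":
        return True  # e.g. 'chrome-extension://*'
    if host_pattern.startswith("*."):
        # '*.example.com' covers example.com itself and any subdomain.
        base = host_pattern[2:]
        return host == base or host.endswith("." + base)
    return host == host_pattern


def is_allowed(url: str, allowed_domains: list[str]) -> bool:
    """Return True if any pattern in the list matches the URL."""
    return any(matches_domain_pattern(url, p) for p in allowed_domains)


allowed = ['*.google.com', 'https://example.com', 'chrome-extension://*']
print(is_allowed("https://mail.google.com/inbox", allowed))  # True
print(is_allowed("http://example.com/", allowed))            # False
```

The second call returns False because 'https://example.com' pins the scheme; the 'http*://example.com' form would accept both protocols.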
Source: https://docs.browser-use.com/customize/browser/basics
from browser_use import Agent, Browser, ChatBrowserUse

browser = Browser(
    headless=False,  # Show browser window
    window_size={'width': 1000, 'height': 700},  # Set window size
)

agent = Agent(
    task='Search for Browser Use',
    browser=browser,
    llm=ChatBrowserUse(),
)

async def main():
    await agent.run()
Source: https://docs.browser-use.com/customize/browser/real-browser
Connect your existing Chrome browser to preserve authentication.
from browser_use import Agent, Browser, ChatOpenAI

# Connect to your existing Chrome browser
browser = Browser(
    executable_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
    user_data_dir='~/Library/Application Support/Google/Chrome',
    profile_directory='Default',
)

agent = Agent(
    task='Visit https://duckduckgo.com and search for "browser-use founders"',
    browser=browser,
    llm=ChatOpenAI(model='gpt-4.1-mini'),
)

async def main():
    await agent.run()
Note: You need to fully close chrome before running this example. Also, Google blocks this approach currently so we use DuckDuckGo instead.
- executable_path - Path to your Chrome installation
- user_data_dir - Your Chrome profile folder (keeps cookies, extensions, bookmarks)
- profile_directory - Specific profile name (Default, Profile 1, etc.)
# macOS
executable_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'
user_data_dir='~/Library/Application Support/Google/Chrome'
# Windows
executable_path='C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe'
user_data_dir='%LOCALAPPDATA%\\Google\\Chrome\\User Data'
# Linux
executable_path='/usr/bin/google-chrome'
user_data_dir='~/.config/google-chrome'
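The per-platform values above can be selected at runtime. The sketch below is a hedged illustration: `chrome_paths` and `CHROME_PATHS` are hypothetical names, the paths are copied from the platform examples above, and expanding `~` and `%LOCALAPPDATA%` before passing them on is a cautious assumption, since the source does not say whether Browser expands them itself.

```python
import os
import platform
from typing import Optional

# Paths copied from the platform examples above, keyed by platform.system().
CHROME_PATHS = {
    "Darwin": (
        "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
        "~/Library/Application Support/Google/Chrome",
    ),
    "Windows": (
        "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
        "%LOCALAPPDATA%\\Google\\Chrome\\User Data",
    ),
    "Linux": ("/usr/bin/google-chrome", "~/.config/google-chrome"),
}


def chrome_paths(system: Optional[str] = None) -> dict:
    """Return executable_path and user_data_dir for the given platform."""
    system = system or platform.system()
    executable_path, user_data_dir = CHROME_PATHS[system]  # KeyError if unknown
    # Expand "~" (all platforms) and %VAR% / $VAR (where the OS supports it).
    user_data_dir = os.path.expandvars(os.path.expanduser(user_data_dir))
    return {"executable_path": executable_path, "user_data_dir": user_data_dir}


print(chrome_paths("Linux"))
```

The result could then be unpacked into the constructor, for example `Browser(**chrome_paths(), profile_directory='Default')`, using the parameters documented above.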
Source: https://docs.browser-use.com/customize/browser/remote
The easiest way to use a cloud browser is with the built-in Browser-Use cloud service:
from browser_use import Agent, Browser, ChatBrowserUse

# Simple: Use Browser-Use cloud browser service
browser = Browser(
    use_cloud=True,  # Automatically provisions a cloud browser
)

# Advanced: Configure cloud browser parameters
# Using these settings can bypass any captcha protection on any website
browser = Browser(
    cloud_profile_id='your-profile-id',  # Optional: specific browser profile
    cloud_proxy_country_code='us',  # Optional: proxy location (us, uk, fr, it, jp, au, de, fi, ca, in)
    cloud_timeout=30,  # Optional: session timeout in minutes (MAX free: 15min, paid: 240min)
)

# Or use a CDP URL from any cloud browser provider
browser = Browser(
    cdp_url="http://remote-server:9222"  # Get a CDP URL from any provider
)

agent = Agent(
    task="Your task here",
    llm=ChatBrowserUse(),
    browser=browser,
)
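The limits noted in the comments above (supported proxy country codes, 15-minute and 240-minute timeout caps) can be checked before a cloud browser is requested. A minimal sketch under stated assumptions: `cloud_browser_kwargs` and its `plan` argument are hypothetical, and whether the service itself rejects or clamps out-of-range values is not stated in the source.

```python
# Values taken from the comments in the example above.
ALLOWED_COUNTRIES = {"us", "uk", "fr", "it", "jp", "au", "de", "fi", "ca", "in"}
MAX_TIMEOUT_MINUTES = {"free": 15, "paid": 240}


def cloud_browser_kwargs(country_code, timeout_minutes, plan="free", profile_id=None):
    """Validate cloud settings and return them as keyword arguments."""
    if country_code not in ALLOWED_COUNTRIES:
        raise ValueError(f"unsupported proxy country code: {country_code!r}")
    limit = MAX_TIMEOUT_MINUTES[plan]  # hypothetical plan names: 'free', 'paid'
    if not 0 < timeout_minutes <= limit:
        raise ValueError(f"cloud_timeout must be 1..{limit} minutes on the {plan} plan")
    kwargs = {
        "cloud_proxy_country_code": country_code,
        "cloud_timeout": timeout_minutes,
    }
    if profile_id is not None:
        kwargs["cloud_profile_id"] = profile_id
    return kwargs


print(cloud_browser_kwargs("us", 30, plan="paid"))
```

Note that cloud_timeout=30, as used in the example above, exceeds the documented 15-minute maximum for free users, so this check would reject it with plan="free".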
Prerequisites:
Cloud Browser Parameters:
- cloud_profile_id: UUID of a browser profile (optional, uses default if not specified)
- cloud_proxy_country_code: Country code for proxy location - supports: us, uk, fr, it, jp, au, de, fi, ca, in
- cloud_timeout: Session timeout in minutes (free users: max 15 min, paid users: max 240 min)
Benefits:
You can pass in a CDP URL from any remote browser
from browser_use import Agent, Browser, ChatBrowserUse
from browser_use.browser import ProxySettings

browser = Browser(
    headless=False,
    proxy=ProxySettings(
        server="http://proxy-server:8080",
        username="proxy-user",
        password="proxy-pass"
    ),  # comma was missing here, which made the call a syntax error
    cdp_url="http://remote-server:9222"
)

agent = Agent(
    task="Your task here",
    llm=ChatBrowserUse(),
    browser=browser,
)
Source: https://docs.browser-use.com/customize/code-agent/all-parameters
Complete reference for all CodeAgent configuration options
- task: Task description string that defines what the agent should accomplish
- llm: LLM instance for code generation (required: ChatBrowserUse)
- browser: Browser session object for automation
- tools: Registry of tools the agent can call
- max_steps (default: 20): Maximum number of execution steps before termination
- max_failures (default: 8): Maximum consecutive errors before termination
- max_validations (default: 0): Maximum number of times to run the validator agent
- use_vision (default: "auto"): Vision mode - "auto" includes screenshot tool but only uses vision when requested, True always includes screenshots, False never includes screenshots and excludes screenshot tool
- page_extraction_llm: Separate LLM model for page content extraction. You can choose a small & fast model because it only needs to extract text from the page (default: same as llm)
- use_thinking (default: True): Controls whether the agent uses its internal "thinking" field for explicit reasoning steps
- available_file_paths: List of file paths the agent can access
- sensitive_data: Dictionary of sensitive data to handle carefully
- calculate_cost (default: False): Calculate and track API costs (see ... to track costs)
Source: https://docs.browser-use.com/customize/code-agent/basics
Write Python code locally with browser automation
CodeAgent writes and executes Python code locally with browser automation capabilities. It's designed for repetitive data extraction tasks where the agent can write reusable functions.
CodeAgent executes Python code on your local machine like Claude Code.
import asyncio
from browser_use import CodeAgent
from dotenv import load_dotenv

load_dotenv()

async def main():
    task = "Extract all products from example.com and save to products.csv"
    agent = CodeAgent(task=task)
    await agent.run()

asyncio.run(main())
BROWSER_USE_API_KEY=your-api-key
CodeAgent currently only works with ChatBrowserUse which is optimized for this use case. Don't have one? We give you $10 to try it out here.
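Since the example relies on BROWSER_USE_API_KEY being present after load_dotenv() runs, it can help to fail early with a clear message when the key is missing. A small sketch; `require_env` is a hypothetical helper and not part of browser_use.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise if it is unset/empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your environment or .env file")
    return value


try:
    require_env("BROWSER_USE_API_KEY")
except RuntimeError as exc:
    print(exc)
```

Calling it right after load_dotenv() and before creating the agent surfaces a missing key before any browser is started.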
Best for:
Performance:
Output: