Building an Automated Web Browser with Asyncio, Aiohttp, and Playwright
Introduction
In the fast-paced digital world, automation is crucial for efficiency and scalability. Web automation, specifically, allows businesses and developers to streamline repetitive tasks, gather data, and interact with web applications programmatically. This article delves into an advanced Python script that leverages asyncio, aiohttp, and playwright to create an automated browser experience, complete with proxy rotation and URL navigation.
Full Code (Simple Copy And Paste):
import asyncio
import aiohttp
from playwright.async_api import Playwright, async_playwright
import random


async def make_api_request():
    api_url = "https://api.prestigeproxies.com/rotate/vodafone.trial?port=01&secret=mIHVrT85ux"
    async with aiohttp.ClientSession() as session:
        async with session.get(api_url) as response:
            if response.status == 200:
                print("Proxy rotated successfully.")
            else:
                print(f"Failed to rotate proxy. Status: {response.status}")


async def close_and_reopen_browser(playwright: Playwright):
    global browser, context, page
    # Close the browser
    await context.close()
    await browser.close()
    # Make API request to rotate the proxy
    await make_api_request()
    # Non-blocking pause so the rotation can take effect
    await asyncio.sleep(5)
    # Reopen the browser
    await run(playwright)


async def run(playwright: Playwright) -> None:
    global browser, context, page
    proxy_info = {
        "server": "http://yawer:[email protected]:55401",
        "username": "yawer",
        "password": "mIHVrT85ux"
    }
    browser = await playwright.chromium.launch(
        headless=False,
        proxy=proxy_info
    )
    context = await browser.new_context()
    # List of URLs to open
    urls = [
        "https://tastycherrygames.com/games/btc/bitcoinclicker.html",
        "https://blokette.com",
        "https://cryptoellen.com",
        "https://cryptochemy.com",
        "https://circlekgame.com",
        "https://cosmobulletin.com",
        "https://jebate.com",
        "https://queentakes.com",
        "https://cryptotaxy.com",
        "https://chipste.com",
        "https://bit.ly/coverga",
        "https://tastycherrygames.com/games/feb/aads.html",
        "https://staggereddam.com/vc4vs0gxcb?key=7b529dfba7708e88def930dd1c4666d9"
    ]
    # Shuffle URLs to open them in random order
    random.shuffle(urls)
    pages = []
    for url in urls:
        page = await context.new_page()
        await page.goto(url, timeout=0)
        pages.append(page)
    # Keep the pages open for 15 seconds
    await asyncio.sleep(15)
    # Close and reopen the browser (run() is re-entered here, so the open/rotate/reopen cycle repeats)
    await close_and_reopen_browser(playwright)
    # Sleep for 10 minutes before closing the context and browser
    await asyncio.sleep(600)
    await context.close()
    await browser.close()


async def main() -> None:
    async with async_playwright() as playwright:
        await run(playwright)


asyncio.run(main())
Understanding the Code
Before diving into the code’s functionality, let’s break down the key components and libraries involved:
- Asyncio: A library to write concurrent code using the async/await syntax.
- Aiohttp: An asynchronous HTTP client/server framework for making HTTP requests.
- Playwright: A library to automate Chromium, Firefox, and WebKit with a single API.
Code Breakdown
Importing Necessary Libraries
import asyncio
import aiohttp
from playwright.async_api import Playwright, async_playwright
import random
Here, we import the required libraries. asyncio handles the asynchronous programming, aiohttp manages HTTP requests, playwright.async_api provides the tools for browser automation, and random is used to shuffle the URL list.
Function: make_api_request
async def make_api_request():
    api_url = "https://api.prestigeproxies.com/rotate/vodafone.trial?port=01&secret=mIHVrT85ux"
    async with aiohttp.ClientSession() as session:
        async with session.get(api_url) as response:
            if response.status == 200:
                print("Proxy rotated successfully.")
            else:
                print(f"Failed to rotate proxy. Status: {response.status}")
This function performs an asynchronous HTTP GET request to a proxy rotation API. If the response status is 200, it indicates a successful proxy rotation.
Function: close_and_reopen_browser
async def close_and_reopen_browser(playwright: Playwright):
    global browser, context, page
    await context.close()
    await browser.close()
    await make_api_request()
    await asyncio.sleep(5)
    await run(playwright)
This function closes the current browser and its context, makes an API call to rotate the proxy, and then reopens the browser. The await asyncio.sleep(5) pauses briefly, without blocking the event loop, so the proxy rotation has time to complete before the browser is relaunched.
Function: run
async def run(playwright: Playwright) -> None:
    global browser, context, page
    proxy_info = {
        "server": "http://yawer:[email protected]:55401",
        "username": "yawer",
        "password": "mIHVrT85ux"
    }
    browser = await playwright.chromium.launch(
        headless=False,
        proxy=proxy_info
    )
    context = await browser.new_context()
    urls = [
        "https://tastycherrygames.com/games/btc/bitcoinclicker.html",
        "https://blokette.com",
        "https://cryptoellen.com",
        "https://cryptochemy.com",
        "https://circlekgame.com",
        "https://cosmobulletin.com",
        "https://jebate.com",
        "https://queentakes.com",
        "https://cryptotaxy.com",
        "https://chipste.com",
        "https://bit.ly/coverga",
        "https://tastycherrygames.com/games/feb/aads.html",
        "https://staggereddam.com/vc4vs0gxcb?key=7b529dfba7708e88def930dd1c4666d9"
    ]
    random.shuffle(urls)
    pages = []
    for url in urls:
        page = await context.new_page()
        await page.goto(url, timeout=0)
        pages.append(page)
    await asyncio.sleep(15)
    await close_and_reopen_browser(playwright)
    await asyncio.sleep(600)
    await context.close()
    await browser.close()
This function sets up the Playwright browser with proxy settings and opens a series of URLs. Key steps include:
- Proxy Configuration: The browser is launched with specific proxy settings.
- URL Navigation: The list of URLs is shuffled and each one is opened in a new page (a more defensive variant of this loop is sketched after this list).
- Closing and Reopening: After a brief pause, the browser is closed and reopened to refresh the proxy settings.
- Sleep Intervals: The script includes sleep intervals to simulate user behavior and allow for proxy rotation.
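One detail worth flagging: timeout=0 disables Playwright's navigation timeout, so a single unresponsive site can stall the loop indefinitely. Below is a small sketch of the navigation step that assumes a 30-second limit and simply skips sites that fail to load; neither choice is part of the original script.

from playwright.async_api import BrowserContext, Page

async def open_urls(context: BrowserContext, urls: list[str]) -> list[Page]:
    # Open each URL in its own tab, skipping any site that does not
    # finish navigating within 30 seconds.
    pages: list[Page] = []
    for url in urls:
        page = await context.new_page()
        try:
            await page.goto(url, timeout=30_000)  # Playwright timeouts are in milliseconds
            pages.append(page)
        except Exception as exc:
            # Skip sites that fail to load instead of stalling the whole run.
            print(f"Skipping {url}: {exc}")
            await page.close()
    return pages

The loop inside run could call a helper like this instead, leaving the rest of the flow unchanged.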
Main Function
async def main() -> None:
    async with async_playwright() as playwright:
        await run(playwright)


asyncio.run(main())
The main function initializes the asynchronous Playwright context and runs the primary automation function.
Detailed Explanation and Use Cases
1. Asynchronous Programming with Asyncio
asyncio enables concurrent code execution, which is crucial for web scraping and automation tasks where waiting for network responses or browser actions would otherwise block the program flow. By using async/await, the script can handle multiple tasks, such as making HTTP requests and controlling the browser, without being sequentially blocked.
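As a minimal illustration of that pattern (separate from the script itself, and using placeholder URLs), asyncio.gather can schedule independent awaitables concurrently instead of awaiting them one after another:

import asyncio
import aiohttp

async def fetch_status(session: aiohttp.ClientSession, url: str) -> int:
    async with session.get(url) as response:
        return response.status

async def demo() -> None:
    async with aiohttp.ClientSession() as session:
        # Both requests run concurrently; gather waits until all complete.
        statuses = await asyncio.gather(
            fetch_status(session, "https://example.com"),
            fetch_status(session, "https://example.org"),
        )
        print(statuses)

# asyncio.run(demo())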
2. Handling HTTP Requests with Aiohttp
Aiohttp is used to manage HTTP requests efficiently. In this script, it’s employed to interact with the proxy rotation API, ensuring that the browser uses a fresh proxy for each session, thereby reducing the risk of being blocked by the target websites.
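If the rotation endpoint is ever slow to respond, aiohttp can also enforce a client-side timeout and raise on error statuses. The variant below is a sketch; the 10-second budget and the _strict suffix are assumptions, not part of the original script.

import aiohttp

async def make_api_request_strict() -> None:
    api_url = "https://api.prestigeproxies.com/rotate/vodafone.trial?port=01&secret=mIHVrT85ux"
    timeout = aiohttp.ClientTimeout(total=10)  # abort the request after 10 seconds
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get(api_url) as response:
            response.raise_for_status()  # raise for 4xx/5xx responses
            print("Proxy rotated successfully.")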
3. Automating Browser Actions with Playwright
Playwright provides a robust API for browser automation. It supports multiple browsers (Chromium, Firefox, and WebKit), and its asynchronous API fits seamlessly with asyncio. Key functionalities used in this script include:
- Launching Browser with Proxy: The browser is configured to use a proxy server for all network requests.
- Context Management: Playwright’s context feature allows the script to manage multiple browser contexts, each with isolated storage and settings (see the sketch after this list).
- Page Navigation: The script opens multiple URLs in separate pages, simulating user interaction with the websites.
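To make the context-isolation point concrete, here is a minimal sketch (not part of the original script) that opens two contexts in one browser, each with its own cookies, cache, and storage; the user-agent strings are placeholders:

from playwright.async_api import async_playwright

async def demo_contexts() -> None:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        # Each context gets its own cookies, cache, and local storage.
        context_a = await browser.new_context(user_agent="agent-a (placeholder)")
        context_b = await browser.new_context(user_agent="agent-b (placeholder)")
        page_a = await context_a.new_page()
        page_b = await context_b.new_page()
        await page_a.goto("https://example.com")
        await page_b.goto("https://example.com")
        await browser.close()

# asyncio.run(demo_contexts())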
4. Proxy Rotation
Proxy rotation is vital for tasks like web scraping to avoid detection and blocking. By regularly changing the proxy server, the script mimics multiple users accessing the websites, thus evading anti-bot mechanisms.
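The script delegates rotation to a remote API behind a fixed endpoint. Purely as an alternative illustration of the same idea, and not the script's own method, one could hold a local list of proxy endpoints and pick one at random for each browser launch; the addresses and credentials below are placeholders:

import random
from playwright.async_api import async_playwright

PROXIES = [
    # Placeholder endpoints for illustration only.
    {"server": "http://proxy-1.example:8000", "username": "user", "password": "pass"},
    {"server": "http://proxy-2.example:8000", "username": "user", "password": "pass"},
]

async def browse_with_random_proxy(url: str) -> None:
    # Pick a proxy at random for this session, so repeated runs
    # appear to come from different network locations.
    proxy = random.choice(PROXIES)
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True, proxy=proxy)
        page = await browser.new_page()
        await page.goto(url)
        await browser.close()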
5. Practical Use Cases
- Web Scraping: Automate data extraction from multiple websites without getting blocked.
- Automated Testing: Test web applications under different network conditions and proxy settings.
- SEO Monitoring: Check website rankings and performance from various locations using different proxies.
Challenges and Solutions
1. Managing Asynchronous Tasks
Handling multiple asynchronous tasks can be complex. Ensuring that tasks like making HTTP requests and controlling the browser are performed efficiently requires careful management of the event loop and task scheduling.
Solution: Using asyncio and aiohttp together with Playwright’s asynchronous API helps streamline these tasks, allowing for efficient and concurrent execution.
2. Proxy Reliability
Proxies can sometimes be unreliable or slow, affecting the performance of the script.
Solution: Implementing error handling and retries for proxy requests ensures that the script can recover from proxy failures and continue executing.
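A small retry helper with exponential backoff is one way to implement this. The sketch below is an assumption layered on top of the script (the with_retries name and its parameters are illustrative); it catches network-level aiohttp errors and timeouts and retries before giving up:

import asyncio
import aiohttp

async def with_retries(coro_factory, attempts: int = 3, base_delay: float = 1.0):
    # Call coro_factory() up to `attempts` times, doubling the delay
    # after each network-level failure before giving up.
    for attempt in range(1, attempts + 1):
        try:
            return await coro_factory()
        except (aiohttp.ClientError, asyncio.TimeoutError) as exc:
            if attempt == attempts:
                raise
            wait = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc!r}); retrying in {wait:.0f}s")
            await asyncio.sleep(wait)

# Example usage, retrying the script's existing rotation call:
# await with_retries(make_api_request)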
3. Browser Automation Stability
Automating browsers can be prone to crashes or unexpected behavior, especially when dealing with multiple tabs and contexts.
Solution: Regularly closing and reopening the browser, as done in the script, helps maintain stability. Additionally, monitoring for exceptions and implementing retries can improve reliability.
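A minimal supervision pattern is to wrap the browser work in try/except/finally and relaunch on failure. The sketch below is illustrative only; do_work is a hypothetical stand-in for the script's navigation logic, and the restart limit is an arbitrary choice:

from playwright.async_api import async_playwright, Page

async def do_work(page: Page) -> None:
    # Hypothetical placeholder for the script's real navigation logic.
    await page.goto("https://example.com")

async def supervised_session(max_restarts: int = 3) -> None:
    async with async_playwright() as p:
        for attempt in range(1, max_restarts + 1):
            browser = await p.chromium.launch(headless=True)
            try:
                page = await browser.new_page()
                await do_work(page)
                return  # finished cleanly, no restart needed
            except Exception as exc:
                print(f"Browser session failed ({exc!r}); restart {attempt}/{max_restarts}")
            finally:
                await browser.close()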
Conclusion
This Python script showcases the power of combining asyncio, aiohttp, and Playwright for advanced web automation. By incorporating asynchronous programming, efficient HTTP request handling, and robust browser automation, the script achieves reliable and scalable web interactions. Whether for web scraping, automated testing, or SEO monitoring, this approach provides a solid foundation for building sophisticated automation tools.