Introduction to Web Automation with Python
Web automation has become a crucial aspect of modern software development, enabling tasks such as testing, data extraction, and interaction with web services. One of the powerful tools for web automation is Playwright, a framework developed by Microsoft that provides an extensive API for automating web browsers. When combined with asyncio, Python’s library for asynchronous programming, and GoLogin, a service for managing browser profiles, developers can create robust and flexible automation scripts. This article delves into a specific code example that leverages these tools to perform continuous scrolling on a webpage.
If this is your first time and you are not a coder, you can get up to speed on setting up Python with this tutorial.
Once that is done, which takes about two minutes, install Playwright. After that, copy and paste the following code, change the URL, and you are good to go.
Note that the bot keeps the browser open for 25 seconds (the interval variable) and then keeps closing and reopening it until total_time has elapsed.
Full Code:
import asyncio
from playwright.async_api import async_playwright
from gologin import GoLogin

async def main():
    gl = GoLogin({
        "token": "YOUR_GOLOGIN_TOKEN",
        "profile_id": "YOUR_PROFILE_ID",
    })
    total_time = 12000
    interval = 25
    while total_time > 0:
        try:
            debugger_address = gl.start()
            #gl.getRandomFingerprint(gl)
            #gl.getRandomFingerprint()
            async with async_playwright() as p:
                browser = await p.chromium.connect_over_cdp("http://" + debugger_address)
                default_context = browser.contexts[0]
                page = default_context.pages[0]
                await page.goto('https://blokette.com')

                async def scroll_page():
                    while True:
                        await page.evaluate('window.scrollBy(0, window.innerHeight)')
                        await asyncio.sleep(1)  # Adjust the scroll interval as needed

                scroll_task = asyncio.create_task(scroll_page())
                await asyncio.sleep(min(interval, total_time))  # Keep scrolling for interval or remaining time
                scroll_task.cancel()  # Stop the scrolling task
                #await page.screenshot(path=f"gologin_{total_time}.png")
                await page.close()
                await browser.close()
            total_time -= interval
        except Exception as e:
            print("Failed to connect to debugger. Retrying...")
            await asyncio.sleep(1)

asyncio.get_event_loop().run_until_complete(main())
Dependencies and Their Roles
Playwright
Playwright is a browser automation framework, originally released as a Node.js library, that drives Chromium, Firefox, and WebKit through a single API. It can be used for testing, scraping, and automating interactions on web pages. The Playwright Python package offers both synchronous and asynchronous APIs; this article uses the asynchronous one, which suits concurrent tasks.
Installation:
pip install playwright
Asyncio
Asyncio is a Python library used to write concurrent code using the async/await syntax. It provides tools to handle asynchronous programming, enabling efficient handling of tasks such as I/O-bound operations.
Installation:
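Asyncio is included in Python’s standard library (from Python 3.4 onwards), so no separate installation is required. The async/await syntax used here requires Python 3.5 or later, and asyncio.create_task requires Python 3.7 or later.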
GoLogin
GoLogin is a tool that allows users to manage multiple browser profiles and automate browser actions while preventing detection and bans from websites. It is particularly useful for scenarios where multiple browser instances with different configurations are needed.
Installation:
pip install gologin
The Code Example
The script is reproduced below, followed by a step-by-step explanation:
import asyncio
from playwright.async_api import async_playwright
from gologin import GoLogin

async def main():
    gl = GoLogin({
        "token": "YOUR_GOLOGIN_TOKEN",
        "profile_id": "YOUR_PROFILE_ID",
    })
    total_time = 12000
    interval = 25
    while total_time > 0:
        try:
            debugger_address = gl.start()
            async with async_playwright() as p:
                browser = await p.chromium.connect_over_cdp("http://" + debugger_address)
                default_context = browser.contexts[0]
                page = default_context.pages[0]
                await page.goto('https://blokette.com')

                async def scroll_page():
                    while True:
                        await page.evaluate('window.scrollBy(0, window.innerHeight)')
                        await asyncio.sleep(1)  # Adjust the scroll interval as needed

                scroll_task = asyncio.create_task(scroll_page())
                await asyncio.sleep(min(interval, total_time))  # Keep scrolling for interval or remaining time
                scroll_task.cancel()  # Stop the scrolling task
                await page.close()
                await browser.close()
            total_time -= interval
        except Exception as e:
            print("Failed to connect to debugger. Retrying...")
            await asyncio.sleep(1)

asyncio.get_event_loop().run_until_complete(main())
Explanation of the Code
Initialization of GoLogin
gl = GoLogin({
    "token": "YOUR_GOLOGIN_TOKEN",
    "profile_id": "YOUR_PROFILE_ID",
})
In this snippet, we initialize a GoLogin instance with a token and profile ID. The token authenticates the user, and the profile ID specifies which browser profile to use. This setup is essential for managing browser configurations and preventing detection by websites.
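Hard-coding the token in the script makes it easy to leak when the file is shared. The following is a minimal sketch of one alternative, assuming the credentials are exported as environment variables. The variable names GOLOGIN_TOKEN and GOLOGIN_PROFILE_ID and the helper load_gologin_config are hypothetical choices for this illustration, not part of GoLogin's API.

```python
import os


def load_gologin_config(env=os.environ):
    """Build the options dict for GoLogin(...) from environment variables.

    GOLOGIN_TOKEN and GOLOGIN_PROFILE_ID are hypothetical names chosen for
    this sketch; use whatever names fit your setup.
    """
    required = ("GOLOGIN_TOKEN", "GOLOGIN_PROFILE_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "token": env["GOLOGIN_TOKEN"],
        "profile_id": env["GOLOGIN_PROFILE_ID"],
    }
```

The returned dict has the same shape as the one in the snippet above, so it could be passed along as gl = GoLogin(load_gologin_config()).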
Loop Control Variables
total_time = 12000
interval = 25
Here, total_time represents the total duration (in seconds) for which the script will run, and interval specifies the duration (in seconds) of each scrolling session. These variables control how long the script will perform actions on the target webpage.
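To make these numbers concrete, the short calculation below works out how many scrolling sessions the default values produce. It only counts the time spent scrolling; because the loop subtracts interval rather than the real elapsed time, profile start-up and page loading come on top, so the actual run takes longer.

```python
import math

total_time = 12000  # total scrolling budget in seconds, as in the script
interval = 25       # length of one scrolling session in seconds

# Each pass through the loop uses up one interval of the budget.
sessions = math.ceil(total_time / interval)
minutes = total_time / 60

print(sessions)  # 480
print(minutes)   # 200.0
```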
Starting the Browser Profile
debugger_address = gl.start()
async with async_playwright() as p:
    browser = await p.chromium.connect_over_cdp("http://" + debugger_address)
    default_context = browser.contexts[0]
    page = default_context.pages[0]
    await page.goto('https://blokette.com')
This block of code starts the GoLogin profile and connects to it using Playwright. connect_over_cdp connects to the browser via the Chrome DevTools Protocol, allowing for remote control of the browser instance. The script then navigates to the specified URL (https://blokette.com).
Scrolling the Webpage
async def scroll_page():
    while True:
        await page.evaluate('window.scrollBy(0, window.innerHeight)')
        await asyncio.sleep(1)  # Adjust the scroll interval as needed

scroll_task = asyncio.create_task(scroll_page())
await asyncio.sleep(min(interval, total_time))  # Keep scrolling for interval or remaining time
scroll_task.cancel()  # Stop the scrolling task
The scroll_page function continuously scrolls the webpage by one window height every second. This scrolling behavior simulates user interaction, which can be useful for tasks like loading new content dynamically or preventing idle timeouts.
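Calling cancel() only requests cancellation; the task actually stops the next time it reaches an await. The sketch below shows one way to run a background task for a fixed time and then wait for it to finish unwinding. The helper run_for and the fake_scroll stand-in are hypothetical names for this illustration, and asyncio.sleep replaces the real page.evaluate call so the example runs without a browser.

```python
import asyncio
import contextlib


async def run_for(worker, seconds):
    """Run the coroutine function `worker` in the background, then cancel it."""
    task = asyncio.create_task(worker())
    try:
        await asyncio.sleep(seconds)
    finally:
        task.cancel()
        # Awaiting the cancelled task lets it finish unwinding before we go on.
        with contextlib.suppress(asyncio.CancelledError):
            await task


async def demo():
    ticks = 0

    async def fake_scroll():
        # Stand-in for the page.evaluate(...) call in scroll_page().
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    await run_for(fake_scroll, 0.05)
    return ticks


ticks = asyncio.run(demo())
print(ticks > 0)
```

Awaiting the task has a second benefit: if the worker failed with an ordinary exception before it was cancelled, that exception is raised here instead of being lost.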
Exception Handling and Loop Control
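except Exception as e:
    print("Failed to connect to debugger. Retrying...")
    await asyncio.sleep(1)
This part catches any exception raised inside the try block, whether it comes from starting the profile, connecting to the debugger, or interacting with the webpage. It prints an error message and retries after a one-second delay, so the script can recover from transient issues without crashing. Two caveats apply: the message always reports a debugger connection failure even when something else went wrong, and because total_time is only reduced after a successful session, a persistent failure makes the loop retry indefinitely.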
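One common way to keep retries from running forever is to cap the number of attempts. The following is a minimal sketch of that idea, not the script's own behavior; the helper retry and the flaky stand-in are hypothetical, and flaky simply simulates a session that fails twice before succeeding.

```python
import asyncio


async def retry(operation, max_attempts=3, delay=1.0):
    """Await `operation()` until it succeeds or `max_attempts` is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except Exception as exc:
            # Report the real error instead of a fixed message.
            print(f"Attempt {attempt} failed: {exc!r}")
            if attempt == max_attempts:
                raise
            await asyncio.sleep(delay)


async def demo():
    calls = 0

    async def flaky():
        # Hypothetical stand-in for one browser session.
        nonlocal calls
        calls += 1
        if calls < 3:
            raise ConnectionError("debugger not reachable")
        return "ok"

    result = await retry(flaky, max_attempts=5, delay=0.01)
    return result, calls


result, calls = asyncio.run(demo())
print(result, calls)
```

Re-raising after the last attempt means a permanent problem, such as an invalid token, stops the script instead of looping silently.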
Running the Script
asyncio.get_event_loop().run_until_complete(main())
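Finally, this line runs the main() coroutine on asyncio’s event loop until it completes. On Python 3.7 and later, asyncio.run(main()) is the recommended way to do the same thing, and recent Python versions deprecate calling asyncio.get_event_loop() when no loop is running. Running under an event loop lets the script await operations such as connecting to the browser while the background scrolling task keeps running.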
Detailed Analysis of Components
Asynchronous Programming with Asyncio
Asynchronous programming is crucial for efficiently handling I/O-bound tasks such as web interactions. By using async/await syntax, the script can perform multiple tasks concurrently without blocking the execution of other tasks. This approach is particularly beneficial in web automation, where operations like page loading and interaction can introduce delays.
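The effect is easy to see in a small, browser-free sketch. Here asyncio.sleep stands in for a network wait such as a page load, and the function name fake_io is a hypothetical placeholder. Two waits of 0.2 seconds each are started together with asyncio.gather, so the total elapsed time is close to 0.2 seconds rather than 0.4.

```python
import asyncio
import time


async def fake_io(name, seconds):
    # asyncio.sleep stands in for a network wait such as a page load.
    await asyncio.sleep(seconds)
    return name


async def demo():
    start = time.perf_counter()
    # Both coroutines wait at the same time instead of one after the other.
    results = await asyncio.gather(fake_io("a", 0.2), fake_io("b", 0.2))
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(demo())
print(results)
print(f"{elapsed:.2f} seconds")
```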
Playwright’s Role
Playwright provides a robust API for browser automation. It supports multiple browsers (Chromium, Firefox, WebKit) and offers features like handling multiple browser contexts, simulating user interactions, and intercepting network requests. In this script, Playwright is used to connect to a Chromium instance, navigate to a webpage, and perform scrolling actions.
GoLogin Integration
GoLogin enhances the script by providing a way to manage browser profiles. These profiles can be configured with different settings, cookies, and user agents, making it easier to bypass detection mechanisms employed by websites. By using GoLogin, the script can simulate realistic browsing sessions, reducing the risk of being flagged as automated traffic.
Error Handling and Robustness
The script includes error handling to manage potential issues such as failed connections or browser crashes. By catching exceptions and retrying operations, the script ensures it can run for extended periods without manual intervention. This robustness is crucial for tasks that require continuous or long-duration interactions with web pages.
Practical Applications
- Web Scraping: Continuous scrolling can be used to scrape content from websites that load data dynamically as the user scrolls.
- Automated Testing: The script can simulate user behavior to test web applications, ensuring that features like infinite scrolling work correctly.
- Data Monitoring: Automating the browsing of specific pages can help monitor changes or updates on those pages over time.
- SEO Analysis: Continuous scrolling and interaction with a webpage can help analyze user experience metrics, such as load times and content visibility.
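For the web scraping case, a fixed scrolling time is often less useful than scrolling until no more content loads. The sketch below shows that variation under stated assumptions: the helper scroll_until_stable is hypothetical, and it only relies on the page object having an async evaluate(expression) method, as Playwright's Page does.

```python
import asyncio


async def scroll_until_stable(page, pause=1.0, max_rounds=50):
    """Scroll to the bottom until the document height stops growing.

    `page` must offer an async `evaluate(expression)` method, as Playwright's
    Page does. Returns the number of scroll rounds performed.
    """
    last_height = await page.evaluate("document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        await asyncio.sleep(pause)  # give lazy-loaded content time to arrive
        height = await page.evaluate("document.body.scrollHeight")
        if height == last_height:
            return rounds
        last_height = height
    return max_rounds
```

Inside the script this could be called as await scroll_until_stable(page) after page.goto. The max_rounds cap guards against pages that never stop loading new content.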
Conclusion
This code example demonstrates the powerful combination of Playwright, asyncio, and GoLogin for web automation tasks. By leveraging asynchronous programming, robust browser automation, and profile management, developers can create scripts that perform complex interactions with web pages efficiently and reliably. Whether for web scraping, automated testing, or other applications, these tools provide a flexible and scalable solution for modern web automation challenges.