Full Code with simple proxy setup

import asyncio
from playwright.async_api import Playwright, async_playwright

async def run(playwright: Playwright) -> None:
    proxy_info = {
        # Replace <proxy-host> with your provider's hostname.
        # Credentials go in the separate fields below, not in the URL.
        "server": "http://<proxy-host>:12032",
        "username": "11672165-all-country-US-state-5128638-city-5128581",
        "password": "2ie6bl7813"
    }

    browser = await playwright.chromium.launch(
        headless=False,
        proxy=proxy_info
    )
    context = await browser.new_context()
    page = await context.new_page()

    # Open whoer.net, which reports the IP address and connection details it sees
    await page.goto('https://whoer.net/', timeout=0)  # timeout=0 disables the navigation timeout
    try:
        # Wait up to 43 seconds (43000 ms) for the selector to appear
        await page.wait_for_selector('.your-selector', timeout=43000)
    except Exception as e:
        print(f"Selector not found within 43 seconds: {e}")

    # Your code logic goes here

    # ---------------------
    
    # Sleep for 10 minutes before closing the context and browser
    await asyncio.sleep(600)

    await context.close()
    await browser.close()

async def main() -> None:
    async with async_playwright() as playwright:
        await run(playwright)

asyncio.run(main())

Automating Web Interactions with Playwright in Python

Introduction

Automating web interactions is an essential task in fields such as web scraping, testing, and data collection. One of the modern tools for this purpose is Playwright, an open-source library for automating browsers. This article guides you through setting up Playwright and using it from Python to automate a simple web task through a proxy. We will cover the prerequisites, the installation steps, and a detailed explanation of the script shown above.

Prerequisites

Before diving into the code, ensure you have the following prerequisites:

  1. Python 3.8 or newer: Make sure you have Python installed on your system. Recent Playwright releases no longer support Python 3.7, so check the release notes for the current minimum. You can download Python from the official Python website.
  2. Playwright: The Playwright library for Python. Installation instructions are provided below. A separate Node.js installation is not needed, because the pip package bundles the driver it uses.
  3. asyncio: This is part of the Python standard library, used for writing concurrent code using the async/await syntax.

Setting Up the Environment

Installing Playwright

First, you need to install Playwright and its dependencies. Run the following commands in your terminal:

pip install playwright
python -m playwright install

The first command installs the Playwright library, and the second command downloads the necessary browser binaries.
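As a quick sanity check after installation, a short script can confirm that the package is importable. This is a minimal sketch: the helper name playwright_is_installed is made up for illustration, and it only checks the Python package, not whether the browser binaries were downloaded.

```python
import importlib.util


def playwright_is_installed() -> bool:
    """Return True if the `playwright` package can be found in this environment."""
    return importlib.util.find_spec("playwright") is not None


if __name__ == "__main__":
    if playwright_is_installed():
        print("playwright package found")
    else:
        print("playwright package missing; run: pip install playwright")
```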

Detailed Explanation of the Code

Let’s break down the provided code snippet and understand each part in detail.

Importing Required Libraries

import asyncio
from playwright.async_api import Playwright, async_playwright

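Here, we import asyncio for handling asynchronous operations. From the Playwright library we import async_playwright, the entry point that starts Playwright, and Playwright, the class used only as a type annotation for the run function's parameter.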

Defining the run Function

async def run(playwright: Playwright) -> None:
    proxy_info = {
        # Replace <proxy-host> with your provider's hostname.
        # Credentials go in the separate fields below, not in the URL.
        "server": "http://<proxy-host>:12032",
        "username": "11672165-all-country-US-state-5128638-city-5128581",
        "password": "2ie6bl7813"
    }

In the run function, we define the proxy information. This includes the proxy server URL, username, and password for authentication. This is useful for accessing web pages from different IP addresses, which can help bypass geographic restrictions or avoid IP bans.
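Hard-coding proxy credentials in a script makes them easy to leak. One possible alternative, sketched below, reads them from environment variables and builds the same dictionary shape. The function name proxy_from_env and the variable names PROXY_SERVER, PROXY_USERNAME and PROXY_PASSWORD are assumptions chosen for this example, not part of Playwright.

```python
import os
from typing import Dict, Mapping


def proxy_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build a proxy dict with the keys used in the article's proxy_info.

    PROXY_SERVER, PROXY_USERNAME and PROXY_PASSWORD are hypothetical
    variable names chosen for this sketch.
    """
    server = env.get("PROXY_SERVER")
    if not server:
        raise ValueError("PROXY_SERVER is not set")

    proxy = {"server": server}
    # Username and password are optional: unauthenticated proxies omit them.
    username = env.get("PROXY_USERNAME")
    password = env.get("PROXY_PASSWORD")
    if username:
        proxy["username"] = username
    if password:
        proxy["password"] = password
    return proxy
```

The returned dictionary can then be passed as the proxy argument in place of proxy_info.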

Launching the Browser

    browser = await playwright.chromium.launch(
        headless=False,
        proxy=proxy_info
    )

We launch the Chromium browser in non-headless mode (i.e., a visible browser window) and configure it to use the specified proxy.

Creating a Browser Context and Page

    context = await browser.new_context()
    page = await context.new_page()

A browser context is akin to an incognito session where no data is shared with other contexts. We create a new page within this context.
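To make the isolation concrete, the sketch below sets a cookie in one context and checks that a second context does not see it. The function name contexts_are_isolated and the cookie values are invented for illustration. Running it requires Playwright and its Chromium binary to be installed.

```python
import asyncio


async def contexts_are_isolated() -> bool:
    """Set a cookie in one context and confirm a second context cannot see it."""
    # Imported inside the function so this file still loads where Playwright is absent.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        context_a = await browser.new_context()
        context_b = await browser.new_context()

        # Hypothetical cookie, scoped to example.com, stored only in context_a.
        await context_a.add_cookies(
            [{"name": "demo", "value": "1", "url": "https://example.com"}]
        )
        cookies_a = await context_a.cookies()
        cookies_b = await context_b.cookies()

        await browser.close()
        return len(cookies_a) == 1 and cookies_b == []


if __name__ == "__main__":
    print(asyncio.run(contexts_are_isolated()))
```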

Navigating to a Web Page

    await page.goto('https://whoer.net/', timeout=0)

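The goto method navigates to the specified URL. Here, we navigate to whoer.net, a site that shows information about your current IP address and connection details, which makes it easy to confirm that traffic is going through the proxy. Passing timeout=0 disables the navigation timeout, so the call waits as long as the page needs to load.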

Waiting for a Selector

    try:
        await page.wait_for_selector('.your-selector', timeout=43000)
    except Exception as e:
        print(f"Selector not found within 43 seconds: {e}")

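We use wait_for_selector to wait for a specific element (defined by a CSS selector) to appear on the page. The timeout is given in milliseconds, so 43000 means 43 seconds, and '.your-selector' is a placeholder to replace with a selector that exists on the target page. If the element doesn’t appear within the timeout, the exception is caught and a message is printed, so the script continues instead of crashing.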
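The wait-and-report pattern can be wrapped in a small helper so the printed message is always derived from the actual timeout value. This is a sketch under assumptions: wait_for_optional is a hypothetical name, and the broad except mirrors the article's code. Playwright raises its own TimeoutError, which could be caught more narrowly.

```python
import asyncio
from typing import Any, Optional


async def wait_for_optional(page: Any, selector: str, timeout_ms: int = 43000) -> Optional[Any]:
    """Wait for `selector`; return the element handle, or None if the wait fails."""
    try:
        return await page.wait_for_selector(selector, timeout=timeout_ms)
    except Exception as exc:  # broad catch, mirroring the article's code
        seconds = timeout_ms / 1000
        print(f"Selector {selector!r} not found within {seconds:g} seconds: {exc}")
        return None
```

Inside run, this would be called as `element = await wait_for_optional(page, '.your-selector')`, with a None result meaning the element never appeared.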

Adding Custom Logic

    # Your code logic goes here

This is a placeholder for any custom logic you want to add. For example, interacting with page elements, extracting information, or performing automated tests.

Closing the Browser

    await asyncio.sleep(600)

    await context.close()
    await browser.close()

After executing the custom logic, the script sleeps for 10 minutes (600 seconds) before closing the browser context and the browser itself. This delay can be adjusted or removed based on your requirements.
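If the script is interrupted during the long sleep, the close calls in the original are skipped. One way to guard against that, sketched here with a hypothetical helper named hold_then_close, is to put the cleanup in a finally block. The context and browser parameters are assumed to be the objects created earlier in run.

```python
import asyncio
from typing import Any


async def hold_then_close(context: Any, browser: Any, hold_seconds: float = 600) -> None:
    """Keep the browser open for `hold_seconds`, then close context and browser."""
    try:
        await asyncio.sleep(hold_seconds)
    finally:
        # Runs even if the sleep is cancelled, for example by Ctrl+C.
        await context.close()
        await browser.close()
```

Passing a smaller hold_seconds value, or 0, shortens or removes the delay without changing the cleanup.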

Main Function to Run the Script

async def main() -> None:
    async with async_playwright() as playwright:
        await run(playwright)

asyncio.run(main())

The main function starts Playwright with an asynchronous context manager (async with), which also shuts Playwright down cleanly when the block exits, and then awaits the run function. The asyncio.run(main()) call starts the asynchronous event loop and executes the main function.
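The same entry-point pattern can be tried without a browser. The sketch below uses a stand-in run coroutine (its label parameter and return value are invented for illustration) and adds the common `if __name__ == "__main__":` guard, so the event loop starts only when the file is executed directly and not when it is imported.

```python
import asyncio


async def run(label: str) -> str:
    """Stand-in for the article's run(); yields to the event loop once, then returns."""
    await asyncio.sleep(0)
    return f"{label} finished"


async def main() -> str:
    return await run("demo task")


if __name__ == "__main__":
    # asyncio.run creates the event loop, runs main() to completion, then closes the loop.
    print(asyncio.run(main()))
```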

Conclusion

This article provided a comprehensive explanation of a Python script that uses Playwright for web automation. We covered the prerequisites, installation steps, and detailed breakdown of the code. Playwright is a powerful tool for web automation, offering support for multiple browsers and robust features for handling complex web interactions. With this guide, you should be able to set up Playwright and start automating your web tasks efficiently.

By admin
