# Asynchronous Web Scraping with aiohttp and BeautifulSoup

This snippet demonstrates how to efficiently scrape multiple web pages asynchronously using `aiohttp` (for HTTP requests) and `BeautifulSoup` (for HTML parsing). This is useful for tasks like data aggregation, monitoring, or research where speed and efficiency matter.

```python
import asyncio

from aiohttp import ClientSession
from bs4 import BeautifulSoup


async def fetch_html(url, session):
    """Fetch HTML content asynchronously."""
    async with session.get(url) as response:
        return await response.text()


async def parse_page(url, session):
    """Parse a webpage and extract data (example: extracting titles)."""
    html = await fetch_html(url, session)
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.string if soup.title else "No title found"
    return (url, title)


async def scrape_urls(urls):
    """Scrape multiple URLs concurrently using asyncio."""
    async with ClientSession() as session:
        # One task per URL; all requests share the same session.
        tasks = [parse_page(url, session) for url in urls]
        results = await asyncio.gather(*tasks)
        return results


if __name__ == "__main__":
    # Example: Scrape titles from multiple websites
    urls = [
        "https://python.org",
        "https://github.com",
        "https://stackoverflow.com",
    ]

    results = asyncio.run(scrape_urls(urls))
    for url, title in results:
        print(f"URL: {url}\nTitle: {title}\n")
```

## Explanation

### What It Does

- **Fetches HTML pages asynchronously** using `aiohttp`, which is much faster than issuing synchronous requests one at a time when dealing with multiple URLs.
- **Parses the HTML** with `BeautifulSoup` to extract structured data (here, the page title).
- **Runs all requests concurrently** using `asyncio.gather`, so the requests overlap instead of queuing behind each other.

### Why It's Useful

- Traditional synchronous scraping is slow because each request waits for the previous response; with asynchronous scraping the network waits overlap, so total runtime is closer to the slowest single request than to the sum of all of them.
- Useful for applications like competitive monitoring, content aggregation, or large-scale data collection.

### How to Run

1. Install dependencies:
   ```
   pip install aiohttp beautifulsoup4
   ```
2. Save the script (e.g., `async_scraper.py`) and modify the `urls` list.
3. Execute:
   ```
   python async_scraper.py
   ```
4. The output will show each URL and its corresponding title.

For more complex parsing, modify the `parse_page` function to extract other elements (e.g., links, metadata).
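
As a rough sketch of that kind of extension, the variant below (a hypothetical `parse_links` helper, not part of the original script) collects every anchor's `href` and the page's meta description instead of the title; it reuses `fetch_html` from above and the rest of the pipeline is unchanged.

```python
# Sketch only: a hypothetical alternative to parse_page that gathers links
# and the meta description rather than the <title>. Assumes fetch_html and
# BeautifulSoup are already imported/defined as in the script above.
async def parse_links(url, session):
    html = await fetch_html(url, session)
    soup = BeautifulSoup(html, "html.parser")

    # Collect the href of every <a> tag that actually has one.
    links = [a["href"] for a in soup.find_all("a", href=True)]

    # Grab the <meta name="description"> content, if the page provides it.
    meta = soup.find("meta", attrs={"name": "description"})
    description = meta["content"] if meta and meta.has_attr("content") else None

    return (url, description, links)
```

To try it, swap `parse_links` for `parse_page` inside `scrape_urls` (or gather both coroutines per URL if you want titles and links together).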