# Web Scraping with Python using BeautifulSoup and Requests

This snippet demonstrates how to scrape a webpage to extract useful information, such as headlines from a news website. Web scraping is a common task for data collection, research, and automation.

```python
import requests
from bs4 import BeautifulSoup

def scrape_headlines(url):
    # Send a GET request to the URL (with a timeout so the call can't hang)
    response = requests.get(url, timeout=10)

    # Check if the request was successful
    if response.status_code == 200:
        # Parse the HTML content
        soup = BeautifulSoup(response.text, 'html.parser')
        # Extract headlines (example: looking for <h2> tags)
        headlines = [h.get_text().strip() for h in soup.find_all('h2')]
        return headlines
    else:
        print(f"Failed to fetch page. Status code: {response.status_code}")
        return []

# Example usage
url = "https://example-news-website.com"
headlines = scrape_headlines(url)
for idx, headline in enumerate(headlines, 1):
    print(f"{idx}. {headline}")
```

### Explanation

1. **Imports**:
   - `requests` fetches the webpage.
   - `BeautifulSoup` parses the HTML for data extraction.
2. **Function `scrape_headlines`**:
   - Sends a `GET` request to the provided URL.
   - Checks whether the request succeeded (status code 200).
   - Uses `BeautifulSoup` to parse the HTML and extract text from `<h2>` tags (commonly used for headlines).
3. **Example Usage**:
   - Replace `https://example-news-website.com` with a real news site URL (ensure compliance with its `robots.txt`).
   - Prints enumerated headlines for readability.

### Why It's Useful

- Automates data collection from websites.
- Useful for aggregating news, monitoring updates, or building datasets.

### How to Run

1. Install dependencies:
   ```sh
   pip install requests beautifulsoup4
   ```
2. Save the script (e.g., `scraper.py`) and run:
   ```sh
   python scraper.py
   ```

### Note

Always respect website terms of service and `robots.txt` rules. Use delays (`time.sleep`) between requests to avoid overloading servers.