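What is Beautiful Soup?
From my understanding, Beautiful Soup is a Python library for pulling data out of HTML and XML. It doesn't parse documents on its own: it sits on top of a parser (such as Python's built-in html.parser or lxml), builds a tree from the document, and lets you search that tree for the specific parts you're looking for.
TL;DR: Makes web scraping a lot easier.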
Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Quick Start
Installation
Python:
pip install beautifulsoup4 lxml  # lxml is a separate package; it is needed for the 'lxml' and 'xml' parsers
Import
from bs4 import BeautifulSoup
Creating Soup Object
soup = BeautifulSoup(html_content, 'lxml')  # html_content is an HTML string or open file; to parse XML, pass 'xml' instead of 'lxml'
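A minimal end-to-end sketch of creating a soup object. The HTML string is made up for illustration, and it uses Python's built-in 'html.parser' instead of 'lxml' so it runs without installing anything beyond beautifulsoup4.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document standing in for html_content
html_content = """
<html>
  <head><title>Example Page</title></head>
  <body><p>Hello, soup!</p></body>
</html>
"""

# 'html.parser' ships with Python, so no extra install is needed
soup = BeautifulSoup(html_content, 'html.parser')

print(soup.title.string)   # Example Page
print(soup.p.get_text())   # Hello, soup!
```

Swapping 'html.parser' for 'lxml' gives the same result here; lxml is mainly faster and more lenient with broken markup.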
Use Cases
title_tag = soup.title  # Gets the first <title> tag (None if there isn't one)
all_paragraphs = soup.find_all('p')  # Gets all <p> tags as a list
first_paragraph = soup.find('p')  # Gets the first <p> tag (None if there isn't one)
all_links = soup.select('a')  # Uses a CSS selector; returns a list of all matching elements
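The finder methods above can be tried side by side on a small document. This is a sketch with invented markup; it also shows filtering by class with class_ (the trailing underscore avoids clashing with Python's class keyword) and select_one(), the CSS counterpart of find().

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup used only for this illustration
html_content = """
<html><head><title>Links</title></head>
<body>
  <p class="intro">First paragraph</p>
  <p>Second paragraph</p>
  <a href="https://example.com/a">A</a>
  <a href="https://example.com/b" class="external">B</a>
</body></html>
"""

soup = BeautifulSoup(html_content, 'html.parser')

first_paragraph = soup.find('p')             # first <p>, or None if there is none
all_paragraphs = soup.find_all('p')          # every <p>, as a list
intro = soup.find('p', class_='intro')       # first <p> with class "intro"
all_links = soup.select('a')                 # CSS selector, returns every match
external = soup.select_one('a.external')     # CSS selector, returns first match or None

print(first_paragraph.get_text())            # First paragraph
print(len(all_paragraphs))                   # 2
print(intro.get_text())                      # First paragraph
print([a.get_text() for a in all_links])     # ['A', 'B']
print(external['href'])                      # https://example.com/b
```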
Extracting Information
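text = title_tag.get_text()  # Extracts the text inside the tag, including text from nested tags
link = soup.find('a')  # Gets the first <a> tag
href = link['href']  # Reads an attribute; raises KeyError if it is missing (link.get('href') returns None instead)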
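A small sketch of text and attribute extraction, using made-up markup: get_text() collects text from nested tags, square brackets read an attribute, and .get() is the safer choice when an attribute might not exist.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second link has no href attribute
html_content = """
<div>
  <a href="/docs">Read the <b>docs</b></a>
  <a name="top">Back to top</a>
</div>
"""

soup = BeautifulSoup(html_content, 'html.parser')
first, second = soup.find_all('a')

print(first.get_text())         # Read the docs  (includes the nested <b> text)
print(first['href'])            # /docs
print(first.attrs)              # {'href': '/docs'}
print(second.get('href'))       # None, no KeyError
print(second.get('href', '#'))  # # (the fallback value)

# Square brackets on a missing attribute raise KeyError
try:
    second['href']
except KeyError:
    print('second link has no href')
```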