What is BeautifulSoup?

From my understanding, BeautifulSoup is a Python library that parses HTML and XML (using an underlying parser such as lxml) and lets you pull out the specific parts you’re looking for.

TL;DR: It makes web scraping easier.

Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Quick Start

Installation

Python:

pip install beautifulsoup4

Import

from bs4 import BeautifulSoup

Creating Soup Object

soup = BeautifulSoup(html_content, 'lxml') # Parses HTML. To parse XML, pass 'xml' instead of 'lxml'. Both need the lxml package (pip install lxml)
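To try this without downloading a page, here is a minimal sketch that builds a soup from an inline string. The sample HTML and the name page_title are made up for illustration, and it assumes Python's built-in 'html.parser' instead of 'lxml' so that it runs with only beautifulsoup4 installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, standing in for HTML you would normally download
html_content = "<html><head><title>Demo page</title></head><body><p>Hello</p></body></html>"

# 'html.parser' ships with Python; swap in 'lxml' if you have it installed
soup = BeautifulSoup(html_content, "html.parser")

# .string gives the text inside <title>
page_title = soup.title.string
print(page_title)
```

The parser choice mostly affects speed and how broken markup gets repaired, so for well-formed pages either one should give the same tree.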

Use Cases

title_tag = soup.title # Gets first instance of the tag "<title>"

all_paragraphs = soup.find_all('p') # Gets all instances of the tag <p>

first_paragraph = soup.find('p') # Gets first instance of tag "<p>"

all_links = soup.select('a') # Uses a CSS selector and gets all matching elements
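Here is a small sketch of those lookups running against a made-up page, so you can see what each call returns. The sample HTML, the "intro" class, and the variables intro and missing are assumptions for illustration; it also uses 'html.parser' rather than 'lxml' so nothing extra needs installing.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page for illustration
html_content = """
<html><head><title>Demo page</title></head>
<body>
  <p class="intro">First paragraph</p>
  <p>Second paragraph</p>
  <a href="/about">About</a>
  <a href="/contact">Contact</a>
</body></html>
"""
soup = BeautifulSoup(html_content, "html.parser")

all_paragraphs = soup.find_all("p")   # list of every <p> tag
first_paragraph = soup.find("p")      # first <p> tag
intro = soup.select_one("p.intro")    # first match for a CSS selector
all_links = soup.select("a")          # list of every match for a CSS selector
missing = soup.find("table")          # the sample has no <table>, so this is None

print(len(all_paragraphs), len(all_links), missing)
```

Note that find() and select_one() return None when nothing matches, while find_all() and select() return an empty list, so it is worth checking for None before using a single result.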

Extracting Information

text = title_tag.get_text() # Extracts the text content of the tag

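link = soup.find('a') # Gets first instance of tag "<a>", or None if there isn't one
href = link['href'] # Gets the value of the href attribute (raises KeyError if the tag has no href)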
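Real pages often have links without an href, so here is a hedged sketch of pulling out text and attributes a bit more defensively. The sample HTML and the names paragraph_text and hrefs are made up for illustration, and 'html.parser' is assumed in place of 'lxml' to keep it dependency-free.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the second <a> deliberately has no href
html_content = '<p>Read the <a href="/docs">docs</a> or the <a>FAQ</a>.</p>'
soup = BeautifulSoup(html_content, "html.parser")

# get_text() returns the text of the tag and everything nested inside it
paragraph_text = soup.find("p").get_text()

hrefs = []
for link in soup.find_all("a"):
    # .get() returns None when the attribute is missing,
    # whereas link["href"] would raise KeyError
    href = link.get("href")
    if href is not None:
        hrefs.append(href)

print(paragraph_text)
print(hrefs)
```

Using .get() keeps the loop going when a tag is missing the attribute, which is usually what you want when scraping many links at once.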