BeautifulSoup is a Python package that lets you extract data from HTML files. It is easy to use and intuitive.
Let us assume you have already fetched an HTML page and its raw content is available as response.content.
First, let us assume you want the title from that HTML page.
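from bs4 import BeautifulSoup
# response is assumed to be an already fetched HTTP response (for example from requests.get)
mysoup = BeautifulSoup(response.content, 'html.parser')
title = mysoup.title.string if mysoup.title else "No title found"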
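To try the title lookup without fetching anything, here is a minimal sketch that parses an inline HTML string instead of a real response. The helper name extract_title and the two sample strings are made up for illustration; only BeautifulSoup, the 'html.parser' backend and the .title / .string attributes come from the library itself.

```python
from bs4 import BeautifulSoup


def extract_title(html):
    """Return the page title, or a fallback string when there is none."""
    soup = BeautifulSoup(html, "html.parser")
    # soup.title is None when the document has no <title> tag,
    # and .string is None when the tag is empty.
    if soup.title and soup.title.string:
        return soup.title.string.strip()
    return "No title found"


# Hypothetical sample pages, used only for this illustration
with_title = "<html><head><title>My Page</title></head><body><p>Hi</p></body></html>"
without_title = "<html><body><p>Hi</p></body></html>"

print(extract_title(with_title))     # My Page
print(extract_title(without_title))  # No title found
```

Checking both soup.title and soup.title.string covers a page with no title tag as well as a page with an empty one.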
Now, assuming you only want the readable text of the page, you can remove the elements that do not contribute to it (scripts, styles, images and inputs) with this short snippet, then put whatever is left in a variable called text:
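# Calling a tag with a list of names is shorthand for find_all()
for irrelevant in mysoup.body(["script", "style", "img", "input"]):
    irrelevant.decompose()  # remove the tag and its contents from the tree
text = mysoup.body.get_text(separator="\n", strip=True)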
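The cleanup step above can be sketched as a small self-contained function. The name visible_text, its drop parameter and the sample page are assumptions made for this example; the library calls it relies on (find_all via calling a tag, decompose, get_text) are the same ones used in the snippet. It also guards against pages without a body tag, which the short version does not.

```python
from bs4 import BeautifulSoup


def visible_text(html, drop=("script", "style", "img", "input")):
    """Return the text of <body> after removing the tags listed in drop."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.body is None:
        # html.parser does not invent a <body> when the markup has none
        return ""
    # find_all returns a list, so removing tags while looping is safe
    for tag in soup.body(list(drop)):
        tag.decompose()
    return soup.body.get_text(separator="\n", strip=True)


# Hypothetical sample page, used only for this illustration
sample = """
<html><head><title>Demo</title></head>
<body>
  <h1>Hello</h1>
  <script>console.log("hidden");</script>
  <style>p { color: red; }</style>
  <p>World</p>
  <img src="x.png"><input type="text">
</body></html>
"""

print(visible_text(sample))
# Hello
# World
```

With strip=True, whitespace-only strings are skipped, so each remaining piece of text ends up on its own line.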