What is Web Scraping? How do you achieve it in Python?

Question

David L. · Accepted Answer

Web scraping means loading a specific web page, or a set of linked web pages, and sifting through the HTML for the data you want. Python is a common choice for writing scrapers, but it is far from the only one. I once helped someone do web scraping with Excel VBA, and it is my understanding that Java, PHP, C++, JavaScript, Go, Ruby, Perl, and Rust can all be used for it as well.

To illustrate, I'll use an example. A big fan of the WNBA hired me to help him extract the full history of basketball games from the WNBA website, www.wnba.com/schedule?season=2023&month=all&hidepast=true. If you go to that web page, you can see a schedule of all the future games for the current season, and if you turn off "Hide Previous Games" at the top, you can also see the scores of the games already played this season. You can examine the page's HTML to see how the visible information is laid out in it. Then you can write a program that goes through the entire page, finds the HTML elements that hold the information you want, pulls out that information, and writes it to a file. A web scraping program can also follow links automatically, navigating through a series of web pages and extracting content from all of them.
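The find-extract-write loop described above can be sketched as follows. This is a minimal illustration using only Python's standard library, not the program I wrote for that client. The HTML snippet, the class names (`game`, `date`, `away`, `home`, `score`), the team names, and the scores are all made up for the example; the real WNBA page is structured differently and would need its own selectors.

```python
import csv
import io
from html.parser import HTMLParser

# Made-up markup and data standing in for a downloaded schedule page.
SAMPLE_HTML = """
<table id="games">
  <tr class="game">
    <td class="date">2023-05-19</td><td class="away">Team A</td>
    <td class="home">Team B</td><td class="score">70-61</td>
  </tr>
  <tr class="game">
    <td class="date">2023-05-20</td><td class="away">Team C</td>
    <td class="home">Team D</td><td class="score">85-80</td>
  </tr>
</table>
"""

FIELDS = ["date", "away", "home", "score"]


class GameTableParser(HTMLParser):
    """Collect the text of the known <td> cells in every <tr class="game">."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per game row
        self._row = None    # dict being filled, or None outside a game row
        self._field = None  # class of the <td> currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr" and attrs.get("class") == "game":
            self._row = {}
        elif tag == "td" and self._row is not None:
            cell_class = attrs.get("class")
            # Ignore cells we do not know how to store.
            self._field = cell_class if cell_class in FIELDS else None

    def handle_data(self, data):
        if self._row is not None and self._field:
            previous = self._row.get(self._field, "")
            self._row[self._field] = previous + data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_games(html):
    """Return a list of dicts, one per game row found in the HTML."""
    parser = GameTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def write_csv(rows, out):
    """Write the extracted rows to any file-like object as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


games = scrape_games(SAMPLE_HTML)
buffer = io.StringIO()  # stands in for open("games.csv", "w", newline="")
write_csv(games, buffer)
print(buffer.getvalue())
```

In a real scraper, the HTML string would come from downloading the page rather than from a constant, and the output would go to a file on disk.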

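There are two Python libraries I've used for web scraping: Beautiful Soup and Selenium. Beautiful Soup is the easier of the two, but it is only an HTML parser. It works on whatever static HTML you hand it: it does not run JavaScript, so anything a script adds to the page after it loads will be missing, and it does not download pages or manage cookies itself. One workaround for JavaScript is to open the page in a browser, let the scripts run, save the resulting page to a file, and then read that file with Beautiful Soup. Logging in to a website to access password-protected content is not possible with Beautiful Soup on its own; that has to be handled by whatever fetches the page, such as an HTTP library or a browser.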
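A small sketch of the static-HTML limitation, assuming the `beautifulsoup4` package is installed. The page below is invented for the example: its score is filled in by a script, so it only exists once a browser has run that script.

```python
from bs4 import BeautifulSoup

# Made-up page: the score is written in by JavaScript, not stored in the HTML.
PAGE = """
<html><body>
  <h1 id="title">Schedule</h1>
  <div id="score"></div>
  <script>document.getElementById("score").textContent = "70-61";</script>
</body></html>
"""

soup = BeautifulSoup(PAGE, "html.parser")

# Text that is present in the static HTML is easy to extract.
title = soup.find(id="title").get_text(strip=True)

# The script is parsed as text but never executed, so this element stays empty.
score = soup.find(id="score").get_text(strip=True)

print(repr(title))
print(repr(score))
```

Here `title` comes back as the heading text, while `score` is an empty string because Beautiful Soup never executes the script that would have filled it in.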

Selenium is the other Python library I've used for web scraping (there are also Selenium bindings for Java, C#, Ruby, and JavaScript, and a third-party wrapper for VBA). Rather than parsing HTML itself, Selenium drives a real browser, such as Chrome, Firefox, or Edge, through a small driver program. Your program launches the browser, navigates and clicks as a user would, and scrapes the HTML that the browser receives and constructs. Because a real browser is doing the work, cookies and JavaScript are handled for you, so Selenium can cope with password-protected websites and with web pages where JavaScript dynamically builds at least some of the HTML.
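A minimal sketch of that approach, assuming Selenium 4.6 or later (which can locate a matching driver by itself) and a local Chrome installation. The function name `fetch_rendered_html` and its `wait_css` parameter are my own for this illustration, not part of Selenium, and the right CSS selector to wait for depends on the page being scraped.

```python
def fetch_rendered_html(url, wait_css, timeout=15):
    """Load url in headless Chrome and return the HTML after scripts have run.

    wait_css is a CSS selector for an element that only exists once the
    page's JavaScript has finished building the part you care about.
    """
    # Imported here so this module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the element appears, or raise TimeoutException.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors
```

A call such as `fetch_rendered_html("https://www.wnba.com/schedule?season=2023&month=all&hidepast=true", "table")` would return the rendered page as a string, where `"table"` is only a guess at a suitable selector. That string can then be handed to an HTML parser to pull out the data.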

Virgilio D. · Answer

Web scraping lets you retrieve the body of the page at a given URL and then search it for specific items. Beautiful Soup does a great job with this.

