Asked • 12/27/19

Python: using a while loop to parse multiple pages

Using the URL passed to the function call at the bottom of the code, I want to parse all of the quotes listed on all of the pages. However, this code only returns the first page's quotes and the URL of the next page. I want to use a while loop to parse the quotes from the next page and any subsequent pages that have a next button.


import requests, bs4, urllib.parse

def process(url):
    page = requests.get(url)
    soup = bs4.BeautifulSoup(page.text, 'html5lib')
    quotes = []
    for quote in soup.select('div[class="quote"] > span[class="text"]'):
        quotes.append(quote.getText())
    next_button = soup.select('li[class="next"] > a')
    if next_button != []:
        next_url = urllib.parse.urljoin(page.url, next_button[0]['href'])
    else:
        next_url = None
    return quotes, next_url

process('http://quotes.toscrape.com/page/1/')


Putting a while statement just before the return statement gives the same output, while putting a while statement on the first line after the function definition returns nothing. I have a feeling it has something to do with the page number, since the site goes up to page/10/, but I can't quite figure it out. http://quotes.toscrape.com/ also returns the first page.

1 Expert Answer

Mitchell F. answered • 12/31/19

Software Engineer Specializing in Python
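
A minimal sketch of the while-loop approach the question asks about. It is not the code from the original answer: scrape_all is a hypothetical name, and it assumes a process function like the one in the question, which returns (quotes, next_url) with next_url set to None on the last page. The loop lives outside process, so each pass fetches one page and then moves on to whatever URL that page reported.

```python
def scrape_all(start_url, process):
    """Follow next-page links until there are none, collecting every quote.

    `process` is assumed to behave like the function in the question:
    it takes a URL and returns (list_of_quotes, next_url_or_None).
    """
    all_quotes = []
    url = start_url
    while url is not None:
        # Fetch one page; `url` becomes the next page's URL, or None at the end.
        quotes, url = process(url)
        all_quotes.extend(quotes)
    return all_quotes

# Hypothetical usage with the question's function (needs network access):
# all_quotes = scrape_all('http://quotes.toscrape.com/page/1/', process)
```

Placing the while loop inside process, before the return, probably changes nothing because the page is only requested once per call; the loop has to wrap the call itself so that a new URL is fetched on each pass.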

Rei T.

Much thanks. I'm also trying to compile the quotes into a txt file. Using your function, I've been able to write the quotes into a file, but I want each quote to appear on a new line. What I've tried so far writes all of the quotes as one large block of text, and the code is only accepted if the data is bytes, not a string or list.

f = open('quotes.txt', 'wb')
for quote in process('http://quotes.toscrape.com/page/{}/'):
    f.write(quote.encode())
01/01/20
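
One possible way to get one quote per line, sketched under the assumption that the quotes are already available as an iterable of strings. The file in the comment above is opened with 'wb' (binary mode), which is why write() only accepts bytes; opening it in text mode with 'w' lets write() take strings directly, and adding '\n' after each quote puts each one on its own line. write_quotes is a hypothetical helper name.

```python
def write_quotes(quotes, path):
    """Write each quote on its own line to a UTF-8 text file."""
    # Text mode ('w') accepts str, so no .encode() call is needed.
    # The with-block closes the file even if a write fails.
    with open(path, 'w', encoding='utf-8') as f:
        for quote in quotes:
            f.write(quote + '\n')

# Hypothetical usage, where all_quotes is a list of quote strings:
# write_quotes(all_quotes, 'quotes.txt')
```

Specifying encoding='utf-8' is a choice made here because scraped quotes often contain curly quotation marks, which some platform default encodings may not handle.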
