Advance your web scraping skills by programmatically extracting book titles and prices from a structured demo website. Gain practical experience using Python's requests library and Beautiful Soup to navigate and parse HTML content.
Key Insights
- The article introduces books.toscrape.com, a demo website designed specifically for practicing web scraping techniques, offering structured data such as book titles, prices, and ratings.
- Readers are encouraged to use Python's requests library and Beautiful Soup to systematically find and extract specific elements, such as full book titles (non-truncated) and book prices converted into decimal point numbers.
- The practice exercise emphasizes solving a realistic web scraping problem, reinforcing previously demonstrated techniques and encouraging learners to independently identify HTML elements and retrieve the desired data.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
We've been playing around a bunch with scraping. I'd like to give you folks a real challenge here. So, this page, books.toscrape.com, is made just for this purpose.
Some kind person made a demo website for web scraping practice. They just assigned some random prices and ratings to books they had data on, and now it's up to you to practice your web scraping with it. So, inspect the page, inspect elements, and try to find the identifying features you might need.
And now, here's your task. First, find the titles for all the books on this page and print them. So, A Light in the Attic, Tipping the Velvet, Submission, and so on. Again, you could copy and paste, sure, but write something programmatic that does it.
And then do the same thing for the prices: 51.77, 53.74, 50.10, et cetera. There are some nice bonuses.
One is getting the non-truncated version of each title, with a little hint. Another is converting the prices to decimal point numbers, with another little hint. And you shouldn't have to do anything we didn't do already.
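As a rough illustration of the two bonuses, here is a minimal sketch that runs on a tiny hand-written snippet rather than the live page. It assumes that the link around each title keeps the full title in a `title` attribute and that each price sits in an element with the class `price_color`; both are assumptions about the markup, so confirm them by inspecting the page yourself. The `SNIPPET` string and the `price_to_float` helper are hypothetical names made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one book entry; the real page markup may differ.
SNIPPET = (
    '<h3><a href="#" title="A Light in the Attic">A Light in the ...</a></h3>'
    '<p class="price_color">£51.77</p>'
)

soup = BeautifulSoup(SNIPPET, "html.parser")

# The visible link text may be cut off; an attribute can hold the full title.
link = soup.find("a")
shown_title = link.get_text()
full_title = link.get("title", shown_title)  # fall back to the visible text


def price_to_float(text: str) -> float:
    """Keep only digits and the decimal point, then convert to a float."""
    cleaned = "".join(ch for ch in text if ch.isdigit() or ch == ".")
    return float(cleaned)


price = price_to_float(soup.find("p", class_="price_color").get_text())

print(shown_title)
print(full_title)
print(price)
```

Filtering characters instead of slicing off the first one keeps the helper working even if the currency symbol comes through as more than one character because of an encoding mismatch.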
Hit up the page with the requests library. Create a querying object variable with Beautiful Soup. Use that variable to query the HTML for the element you want.
This is the hard part, not because the querying is particularly hard, but because you have to figure out where, in this Beautiful Soup of markup code, the words A Light in the Attic actually are. And then call get_text() on it if it's a single element, or create a list where you call get_text() on every element. All right, I'm gonna let you folks get started on this. And I recommend you give this a really strong go.
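The overall flow described above (request the page, build a Beautiful Soup object, query it, call `get_text()` on each match) might be sketched like this. It is a generic outline, not the worked solution: the `URL` value, the `SAMPLE_HTML` string, the helper names, and the CSS selectors are all assumptions for illustration, and the queries run on the sample string so the block works offline.

```python
import requests
from bs4 import BeautifulSoup

# Assumed address of the demo site; adjust it to the page you are practicing on.
URL = "https://books.toscrape.com/"

# Hand-written stand-in markup (an assumption, not a copy of the real page).
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a href="#" title="A Light in the Attic">A Light in the ...</a></h3>
  <p class="price_color">£51.77</p>
</article>
<article class="product_pod">
  <h3><a href="#" title="Tipping the Velvet">Tipping the Velvet</a></h3>
  <p class="price_color">£53.74</p>
</article>
"""


def fetch_html(url: str) -> str:
    """Step 1: hit the page with requests and return its HTML text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on a 4xx/5xx response
    return response.text


def extract_texts(html: str, selector: str) -> list[str]:
    """Steps 2-4: build the soup, query it, call get_text() on every match."""
    soup = BeautifulSoup(html, "html.parser")
    return [element.get_text(strip=True) for element in soup.select(selector)]


# Querying the sample markup; pass fetch_html(URL) instead to try the live page.
titles = extract_texts(SAMPLE_HTML, "h3 a")
prices = extract_texts(SAMPLE_HTML, "p.price_color")

print(titles)
print(prices)
```

The selectors are the part you would work out for yourself by inspecting elements in the browser; everything else stays the same from page to page.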
You'll learn a lot if you really engage with a project like this. It's a carefully structured assignment: if you look back through all the stuff that we have covered, it uses the exact same techniques. See if you can generalize them to a new situation. And in the next video, we'll go over how I might do this.
See you folks there.