Web Scraping using Python
Venturing into Machine Learning, I quickly realized the need for good datasets. Web Scraping can come in very handy when datasets aren’t easily available.
It is important to use these web scraping bots in moderation and in accordance with the terms and conditions of the websites being scraped.
Static websites
Here’s how to use requests and BeautifulSoup to get started scraping websites using Python.
Install these packages if needed: pip install requests beautifulsoup4
Here’s a code snippet that fetches five headlines from BBC News
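A minimal sketch of such a scraper might look like the following. It assumes the BBC News front page lives at https://www.bbc.com/news and that each headline is marked up as an h2 element; both are assumptions on my part, and the selector in particular will likely need adjusting whenever the site's markup changes. The function names extract_headlines and fetch_headlines are illustrative, not part of either library.

```python
import requests
from bs4 import BeautifulSoup

# Assumption: each headline on the page is an <h2> element.
# Inspect the live page and adjust this CSS selector as needed.
HEADLINE_SELECTOR = "h2"


def extract_headlines(html, limit=5):
    """Return the text of the first `limit` distinct headlines in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    headlines = []
    for tag in soup.select(HEADLINE_SELECTOR):
        text = tag.get_text(strip=True)
        # Skip empty tags and headlines that were already collected.
        if text and text not in headlines:
            headlines.append(text)
        if len(headlines) == limit:
            break
    return headlines


def fetch_headlines(url="https://www.bbc.com/news", limit=5):
    """Download `url` (assumed BBC News front page) and extract headlines."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    return extract_headlines(response.text, limit)
```

Calling fetch_headlines() performs the network request and returns a list of up to five strings. Keeping the parsing in extract_headlines means it can be tried out on a saved HTML file without hitting the site at all.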
Here are the docs for BeautifulSoup.
Dynamic websites
These days, with many websites fetching data after the initial page load, the above method won’t cut it anymore. The requests library can’t execute JavaScript the way a browser does. Enter Selenium. A WebDriver is needed for Selenium to work. I use ChromeDriver which can be downloaded here.
Install selenium if needed: pip install selenium
Here’s a code snippet that also fetches five headlines from BBC News
Here are the unofficial docs for Selenium.