In today's data-driven world, scraping and crawling web content to create datasets is a crucial skill to have in your portfolio. This workshop gives attendees a brief introduction to the process, from scraping data off web pages to cleaning the scraped data. By the end of the workshop, attendees will have gained hands-on experience by creating their very own datasets.
Attendees will learn about the entire data collection and preparation stage of a machine learning pipeline. We will use Scrapy (a Python web-crawling framework) to scrape content from web pages, and various Python libraries to preprocess the data. Specifically, attendees will create two datasets: a news headlines dataset (text data) for a sentiment analysis task, and a binary image classification dataset (paintings vs. photographs).
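To give a flavor of the preprocessing step, here is a minimal sketch of how scraped headlines might be cleaned before they go into a dataset. It uses only the Python standard library and is not the workshop's actual material: the function names `clean_headline` and `build_dataset` and the sample strings are hypothetical, chosen for illustration.

```python
import html
import re
import unicodedata

# Hypothetical helpers for illustration; not taken from the workshop material.
TAG_RE = re.compile(r"<[^>]+>")   # crude match for leftover HTML tags
WS_RE = re.compile(r"\s+")        # any run of whitespace


def clean_headline(raw: str) -> str:
    """Strip tags, decode HTML entities, and normalize whitespace."""
    text = TAG_RE.sub(" ", raw)                 # drop leftover markup
    text = html.unescape(text)                  # "&amp;" becomes "&"
    text = unicodedata.normalize("NFKC", text)  # e.g. non-breaking space to space
    return WS_RE.sub(" ", text).strip()


def build_dataset(raw_headlines):
    """Clean each headline, then drop empties and case-insensitive duplicates."""
    seen = set()
    cleaned = []
    for raw in raw_headlines:
        headline = clean_headline(raw)
        key = headline.casefold()
        if headline and key not in seen:
            seen.add(key)
            cleaned.append(headline)
    return cleaned


if __name__ == "__main__":
    # Made-up sample input standing in for scraped text.
    scraped = ["<h2>Stocks rise</h2>", "stocks rise ", "", "Rain&nbsp;expected"]
    print(build_dataset(scraped))
```

A regex-based tag stripper like this is only adequate for short text fragments; for full pages, a proper HTML parser or the scraper's own selectors would be the more robust choice.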