Data Cleaning & Data Scraping

Image credit: NTU OSS Publicity Material


In today's data-driven world, scraping and crawling web content to build datasets is a valuable skill to have in your portfolio. This workshop gives attendees a brief introduction to the whole process, from scraping data from web pages to cleaning the scraped data. By the end of the workshop, attendees should have hands-on experience with the topic, having created their very own datasets.

Attendees will learn about the entire data collection and preparation stage of a machine learning pipeline. We will use Scrapy (a Python web-crawling framework) to scrape content from web pages and various Python libraries to preprocess the data. Attendees will create two datasets: a news headlines dataset (text data) for a sentiment analysis task, and a binary image classification dataset (paintings vs. photographs).
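The exact preprocessing steps used in the workshop are not listed here. As a rough illustration of what "cleaning" scraped headlines can involve, the sketch below uses only the Python standard library (`html`, `re`, `unicodedata`). It unescapes HTML entities, strips leftover tags, normalizes Unicode, collapses whitespace, and drops empty or duplicate headlines. The function names `clean_headline` and `clean_headlines` and the sample strings are hypothetical, not taken from the workshop materials.

```python
import html
import re
import unicodedata


def clean_headline(raw: str) -> str:
    """Normalize one scraped headline string (hypothetical helper)."""
    text = html.unescape(raw)                    # "&amp;" -> "&"
    text = re.sub(r"<[^>]+>", "", text)          # drop leftover inline tags like <b>
    text = unicodedata.normalize("NFKC", text)   # e.g. non-breaking space -> plain space
    text = re.sub(r"\s+", " ", text).strip()     # collapse runs of whitespace, trim ends
    return text


def clean_headlines(raw_headlines):
    """Clean each headline, drop empties, and de-duplicate (case-insensitive), keeping order."""
    seen = set()
    cleaned = []
    for raw in raw_headlines:
        text = clean_headline(raw)
        key = text.lower()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned


if __name__ == "__main__":
    # Made-up examples of messy strings a scraper might return.
    scraped = [
        "  Markets rally &amp; bonds slip ",
        "<b>Markets rally &amp; bonds slip</b>",
        "\n",
        "Tech stocks\u00a0fall",
    ]
    print(clean_headlines(scraped))
```

In a Scrapy project, a function like this could be called from an item pipeline's `process_item` method so that headlines are cleaned as they are scraped; cleaning in a separate pass after the crawl works just as well.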

Sep 25, 2020 6:30 PM – Sep 25, 2020 8:30 PM