Personal projects

Detox

Detox addresses the increasing toxicity of the internet with an easy-to-integrate web service that offers toxicity filtering to any web-based platform. We trained a multilingual model to classify and filter toxic content in messages or webpages, exposed it through an API endpoint, and deployed it on an EC2 instance. To handle high traffic, we attached a load balancer to the instance, which scales based on the traffic.
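
The description above does not specify how a request is processed. As a hedged illustration, the sketch below assumes the service scores each sentence and masks the ones above a threshold. `filter_toxic`, `threshold`, `mask`, and the `score` callable (a stand-in for the trained multilingual model) are all hypothetical names. In the deployed service, this logic would presumably sit behind a Flask route.

```python
import re
from typing import Callable

# Hypothetical scorer: maps a piece of text to a toxicity probability in [0, 1].
# In the real service this would be the trained multilingual model.
Scorer = Callable[[str], float]


def filter_toxic(text: str, score: Scorer, threshold: float = 0.5, mask: str = "***") -> dict:
    """Split text into sentences and mask those the scorer flags as toxic."""
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept, flagged = [], []
    for sentence in sentences:
        probability = score(sentence)
        if probability >= threshold:
            flagged.append({"sentence": sentence, "score": probability})
            kept.append(mask)
        else:
            kept.append(sentence)
    return {"filtered": " ".join(kept), "flagged": flagged}
```

Returning both the filtered text and the flagged sentences lets a client platform choose between hiding content and surfacing it for moderation.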

Language: Python (3.8.0), Java
Tech stack: Flask, TensorFlow, Transformers, SentencePiece

Matterix


MatterIx is a deep learning framework built from scratch using only NumPy. I started this project to understand the fundamental concepts of autodiff, optimizers, and loss functions from first principles. It provides automatic differentiation (autodiff), optimizers, loss functions, and basic modules for building your own neural networks.
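
To illustrate the core idea behind a NumPy-only autodiff engine, here is a minimal reverse-mode sketch. The `Tensor` class and its methods are hypothetical and are not MatterIx's actual API. To keep the sketch short, it assumes both operands have the same shape, so it does no broadcasting.

```python
import numpy as np


class Tensor:
    """Minimal reverse-mode autodiff node (illustrative, not MatterIx's API)."""

    def __init__(self, data, parents=(), backward_fn=None):
        self.data = np.asarray(data, dtype=float)
        self.grad = np.zeros_like(self.data)
        self._parents = parents          # tensors this one was computed from
        self._backward_fn = backward_fn  # pushes self.grad into the parents

    def __add__(self, other):
        out = Tensor(self.data + other.data, (self, other))

        def backward_fn():
            # d(a + b)/da = 1 and d(a + b)/db = 1 (same shapes assumed)
            self.grad += out.grad
            other.grad += out.grad

        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Tensor(self.data * other.data, (self, other))

        def backward_fn():
            # Elementwise: d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward_fn = backward_fn
        return out

    def sum(self):
        out = Tensor(self.data.sum(), (self,))

        def backward_fn():
            # Every element contributes to the sum with slope 1
            self.grad += np.ones_like(self.data) * out.grad

        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = np.ones_like(self.data)  # d(output)/d(output) = 1
        for node in reversed(order):
            if node._backward_fn is not None:
                node._backward_fn()
```

In this sketch, gradients accumulate across calls to `backward`. A training step would update parameters in place (for example `w.data -= lr * w.grad`) and reset the gradients before the next backward pass.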

Language: Python (3.8.0)
Tech stack: NumPy

Newscast API

Newscast API is a simple REST API that returns news articles for any given query word.
The API provides headlines, article sources, publication timestamps, URLs, and other useful data, which could support many practical use cases, such as tracking news sentiment about a specific person or searching for buzzwords.
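
Here is a minimal sketch of the query step behind such an endpoint. It assumes stored records with `headline`, `source`, `published`, and `url` fields. These field names, the sample records, and `search_articles` / `to_response` are hypothetical and do not reflect the project's actual MongoDB schema. The sketch matches the query word against headlines and returns the newest articles first. In the real service, this would run behind a Flask route rather than over an in-memory list.

```python
import json
from datetime import datetime

# Hypothetical stored records (placeholder data, not real articles).
SAMPLE_ARTICLES = [
    {"headline": "Tesla opens new factory", "source": "Example Times",
     "published": "2021-03-01T09:30:00+00:00", "url": "https://example.com/a"},
    {"headline": "Markets rally on tech earnings", "source": "Example Post",
     "published": "2021-03-02T12:00:00+00:00", "url": "https://example.com/b"},
    {"headline": "Tesla recalls vehicles", "source": "Example Daily",
     "published": "2021-03-03T08:15:00+00:00", "url": "https://example.com/c"},
]


def search_articles(articles, query, limit=10):
    """Return articles whose headline contains `query` (case-insensitive), newest first."""
    needle = query.casefold()
    hits = [a for a in articles if needle in a["headline"].casefold()]
    # Parse ISO 8601 timestamps so ordering does not depend on string formatting.
    hits.sort(key=lambda a: datetime.fromisoformat(a["published"]), reverse=True)
    return hits[:limit]


def to_response(query, articles):
    """Shape the results like a JSON API response body."""
    return json.dumps({"query": query, "count": len(articles), "articles": articles})
```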

Language: Python (3.7.9)
Tech stack: Flask, Scrapy, APScheduler, Pandas, MongoDB, Heroku

Learnify.AI

A web application that lets students visualize their notes as a knowledge graph and helps them revise topics based on the amount of study time available for a given course.
Visualizing the connections between topics in a lecture improves understanding of the concepts and increases retention.
I also worked on a feature that suggests the most important topics that can be learned within a given study duration, which helps with time management during revision.
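
The description above does not say how the topic suggestions are computed. One natural framing, shown here as an assumption rather than the project's method, is a 0/1 knapsack. Each topic has an importance score and an estimated study time in whole minutes (both hypothetical inputs), and the goal is to maximize total importance within the available time. `plan_revision` is an illustrative name.

```python
def plan_revision(topics, minutes_available):
    """0/1 knapsack: choose topics maximizing total importance within a time budget.

    topics: list of (name, importance, minutes) tuples; minutes must be integers.
    Returns (chosen topic names in input order, total importance).
    """
    n = len(topics)
    # best[i][t] = max importance using only the first i topics within t minutes
    best = [[0.0] * (minutes_available + 1) for _ in range(n + 1)]
    for i, (_, importance, minutes) in enumerate(topics, start=1):
        for t in range(minutes_available + 1):
            best[i][t] = best[i - 1][t]  # option 1: skip topic i
            if minutes <= t:             # option 2: study topic i if it fits
                best[i][t] = max(best[i][t], best[i - 1][t - minutes] + importance)
    # Walk back through the table to recover which topics were chosen.
    chosen, t = [], minutes_available
    for i in range(n, 0, -1):
        if best[i][t] != best[i - 1][t]:
            name, _, minutes = topics[i - 1]
            chosen.append(name)
            t -= minutes
    return list(reversed(chosen)), best[n][minutes_available]
```

The table costs O(topics × minutes) in time and memory, which is modest for a single course's topics at minute granularity.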

Language: Python, JavaScript
Tech stack: React, Flask, Pandas, NumPy, PyTorch, Bcrypt, YAKE, scikit-learn, MongoDB

Multilingual Toxic Comment Classification Model

This project was based on the Kaggle competition Jigsaw Multilingual Toxic Comment Classification. It is a binary classification problem, but the tricky part is that it essentially tests the model in a zero-shot setting: the training data is in English, while the test data is in several different languages. Given the size of the data, I had to train the model on Kaggle's TPUs.

Language: Python
Tech stack: PyTorch, PyTorch XLA

Indie-Threads

A free platform for gamers to explore the indie gaming community, with discussion and promotion by and for developers and players. The aim is to spread word of mouth and thereby reduce the need for large advertising budgets. We will not collect any personal data from users or visitors. Big gaming companies spend much of their funds on advertising, paying streamers to spread the word among their viewers and beyond; this platform would give indie developers the same kind of exposure without the high budget.

Language: JavaScript
Tech stack: React.js, Firebase, Netlify

DataAnnotated

Data annotation is a crucial task in creating custom models for very specific tasks, but it is tedious to manage data across the annotation platform and the development environment.
DataAnnotated is a platform that offers data annotation for various tasks and integrates seamlessly with the development environment through a Python package. The data is updated in real time and can be fetched directly as preprocessed inputs using the package. The project aims to reduce the latency and hassle of passing and managing data across many experimental environments.
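
To illustrate the "fetch as preprocessed inputs" idea, here is a sketch of the client-side conversion step. It assumes annotations arrive as JSON with `text` and `label` fields. The payload shape, `to_model_inputs`, and `Dataset` are hypothetical, since the real package's API is not described here. The network fetch is also omitted.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Dataset:
    texts: List[str]        # normalized input texts
    labels: List[int]       # integer label ids, aligned with texts
    label_names: List[str]  # label_names[i] is the name of label id i


def to_model_inputs(payload: str) -> Dataset:
    """Convert a (hypothetical) JSON annotation export into model-ready inputs.

    Assumed payload shape: {"items": [{"text": "...", "label": "..." or null}, ...]}
    """
    items = json.loads(payload)["items"]
    # Skip items that have not been annotated yet.
    done = [item for item in items if item.get("label") is not None]
    # Sorting the names gives a deterministic label-to-id mapping for a given label set.
    label_names = sorted({item["label"] for item in done})
    label_ids = {name: i for i, name in enumerate(label_names)}
    # Light normalization: collapse whitespace and lowercase.
    texts = [" ".join(item["text"].split()).lower() for item in done]
    labels = [label_ids[item["label"]] for item in done]
    return Dataset(texts, labels, label_names)
```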

Language: Python, JavaScript
Tech stack: React.js, Express.js, Node.js, MongoDB, Heroku

Multicategorical Recipe Classification Model

This project's goal is to classify a given recipe into one of several cuisines. I started with exploratory data analysis (EDA), which revealed a class imbalance, so I balanced the class distribution and used GloVe embeddings to encode the text data. I then trained a recurrent neural network with TensorFlow 2.0 to classify the recipes and validated it using K-fold cross-validation, achieving a 98.27% accuracy score on the validation data. Finally, I used the TensorFlow Embedding Projector to visualize the word embeddings in 3D space.
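
The description above does not say how the classes were balanced. The sketch below assumes random oversampling of minority cuisines and pairs it with a plain K-fold split. Both functions (`oversample`, `k_fold_indices`) are illustrative stand-ins written in pure Python, not the project's TensorFlow code.

```python
import random
from collections import defaultdict


def oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw (with replacement) enough extra copies to reach the target count.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each index appears in exactly one validation fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]
```

Oversampling should be applied to each fold's training indices only. Oversampling before splitting would place duplicated recipes in both the training and validation folds, which inflates the validation score.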

Language: Python
Tech stack: TensorFlow 2.0, Pandas, NumPy, TensorFlow Embedding Projector

Photographs Vs Painting Classifier

This project covers an entire machine learning pipeline, from data collection to model deployment. First, I used Scrapy to crawl images for both classes and preprocessed them. I then trained an image classification model (ResNet-50) with the fastai library, achieving a 96% accuracy score on the validation data despite noise in the dataset. Finally, I built the client and server side using Starlette (an ASGI toolkit), JavaScript, HTML, and CSS, and deployed the model on a local server.
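
Starlette is built on the ASGI interface, so the serving side reduces to an async callable that receives request messages and sends response messages. The sketch below uses raw ASGI without a Starlette dependency. It relies on a hypothetical `make_app(predict)` factory, where `predict` stands in for the trained fastai ResNet-50. The route, the JSON response shape, and the `run_request` test helper are also assumptions.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Dict

Message = Dict[str, Any]


def make_app(predict: Callable[[bytes], dict]):
    """Build a bare ASGI app that responds with predict(request_body) as JSON."""

    async def app(scope: Dict[str, Any],
                  receive: Callable[[], Awaitable[Message]],
                  send: Callable[[Message], Awaitable[None]]) -> None:
        if scope["type"] != "http":
            return  # lifespan and websocket scopes are not handled in this sketch
        # The request body may arrive in chunks; read until more_body is False.
        body, more_body = b"", True
        while more_body:
            message = await receive()
            body += message.get("body", b"")
            more_body = message.get("more_body", False)
        payload = json.dumps(predict(body)).encode("utf-8")
        await send({
            "type": "http.response.start",
            "status": 200,
            "headers": [(b"content-type", b"application/json")],
        })
        await send({"type": "http.response.body", "body": payload})

    return app


def run_request(app, body: bytes) -> list:
    """Drive the app once with an in-memory POST request and collect sent messages."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": body, "more_body": False}

    async def send(message):
        sent.append(message)

    asyncio.run(app({"type": "http", "method": "POST", "path": "/predict"}, receive, send))
    return sent
```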

Language: Python
Tech stack: Scrapy, fastai, Pandas, NumPy