Detox addresses the increasing toxicity of the internet with an easy-to-integrate web service that offers toxicity filtering to any web-based platform. We trained a multilingual model to classify and filter toxic content in messages or webpages. We created an API endpoint for the model and deployed it on an EC2 instance. To handle high traffic, we attached a load balancer to the instance, which scales based on traffic.
Language: Python (3.8.0), Java
Tech stack: Flask, TensorFlow, Transformers, SentencePiece
MatterIx is a deep learning framework built from scratch with just NumPy. I started this project to understand the fundamental concepts of autodiff, optimizers and loss functions from first principles. It provides automatic differentiation (autodiff), optimizers, loss functions and basic modules for building your own neural networks.
Language: Python (3.8.0)
Tech stack: NumPy
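To give a feel for what reverse-mode autodiff involves, here is a minimal sketch. It is not MatterIx's code or API: the `Value` class and its methods are hypothetical names chosen for illustration, and it works on plain Python scalars rather than NumPy arrays to keep the example short.

```python
class Value:
    """A scalar that remembers how it was computed (hypothetical helper, not MatterIx's API)."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # replaced by the operation that creates this node

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule from output to inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


x = Value(2.0)
w = Value(-3.0)
z = x * w + x  # x is used twice, so its gradient accumulates
z.backward()
print(x.grad, w.grad)  # dz/dx = w + 1 = -2.0, dz/dw = x = 2.0
```

Gradients are accumulated with `+=` so that a value used in several places, like `x` here, receives the sum of the contributions from every path through the graph.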
Newscast API is a simple REST API that returns news articles for any given query word.
The API provides headlines, article sources, published timestamps, URLs and other useful data. This has many practical uses, such as tracking the sentiment around a specific person in the news or searching for buzzwords.
Tech stack: Flask, Scrapy, APScheduler, Pandas, MongoDB, Heroku
A web application that lets students visualize their notes as a knowledge graph and helps them revise topics based on the amount of study time available for a given course.
Visualizing the connections between lecture topics improves understanding of the concepts and increases retention.
I also worked on a feature that suggests the most important topics that can be learned within a given study duration, which helps with time management during revision.
Tech stack: React, Flask, Pandas, NumPy, PyTorch, Bcrypt, YAKE, scikit-learn, MongoDB
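One way to frame "the most important topics that fit in the available time" is as a 0/1 knapsack problem. The sketch below is only an illustration under that assumption; it is not necessarily how the application does it. The function name `pick_topics`, the topic names, the study times and the importance scores are all hypothetical.

```python
def pick_topics(topics, minutes_available):
    """Choose topics that maximize total importance within a time budget.

    topics: list of (name, minutes, importance) tuples with integer minutes.
    Returns (total_importance, chosen_names).
    """
    # best[t] holds the best (score, names) found so far using at most t minutes.
    best = [(0.0, []) for _ in range(minutes_available + 1)]
    for name, minutes, importance in topics:
        # Walk the budget downwards so each topic is used at most once.
        for t in range(minutes_available, minutes - 1, -1):
            score, names = best[t - minutes]
            if score + importance > best[t][0]:
                best[t] = (score + importance, names + [name])
    return best[minutes_available]


# Hypothetical topics: (name, estimated minutes, importance score)
topics = [
    ("Graphs", 40, 8),
    ("Sorting", 30, 5),
    ("Hashing", 20, 4),
    ("Heaps", 50, 10),
]
score, chosen = pick_topics(topics, 90)
print(score, chosen)  # 18.0 ['Graphs', 'Heaps']
```

In practice the importance scores could come from the knowledge graph itself (for example, how many other topics depend on a topic), but that choice is outside this sketch.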
The project was based on the Kaggle competition Jigsaw Multilingual Toxic Comment Classification. It is a binary classification problem, but the tricky part is that the competition essentially tests your model on zero-shot learning, i.e. the training data is in English but the test data is in several different languages. Given the size of the data, I had to train the model on Kaggle's TPUs.
Language: Python
Tech stack: PyTorch, PyTorch XLA
Tech stack: React.js, Firebase, Netlify
Data annotation is a crucial task in creating custom models for very specific tasks, but it's really tedious to manage all the data across the annotation platform and the development environment.
DataAnnotated is a platform that offers a data annotation service for various tasks and integrates seamlessly with the development environment through a Python package. The data is updated in real time and can be fetched directly as pre-processed inputs using the package. The project aims to reduce the latency and trouble of passing and managing data across many experimental environments.
Tech stack: React.js, Express.js, Node.js, MongoDB, Heroku
This project's goal is to classify a given recipe by cuisine. To start, I performed exploratory data analysis (EDA) and observed a class imbalance, so I balanced the class distribution and used GloVe embeddings to encode the text data. I then trained a recurrent neural network using TensorFlow 2.0 to classify recipes and validated it using K-fold cross-validation. I achieved 98.27% accuracy on the validation data. Finally, I used the TensorFlow projector to visualize the word embeddings in 3D space.
Language: Python
Tech stack: TensorFlow 2.0, Pandas, NumPy, TensorFlow projector
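The K-fold cross-validation mentioned above can be sketched as follows. This is a minimal, library-free illustration of how the folds are formed, not the project's actual code; `k_fold_indices` is a hypothetical helper, and the model training step is left as a comment. In a real project, `sklearn.model_selection.KFold` provides the same splitting.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


for fold_number, (train_idx, val_idx) in enumerate(k_fold_indices(10, 5), start=1):
    # Here one would train a fresh model on train_idx and evaluate it on val_idx,
    # then average the k validation scores.
    print(fold_number, len(train_idx), len(val_idx))
```

Each sample lands in the validation set exactly once, so the averaged score uses every example for validation without ever validating on data the model was trained on in that fold.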
Tech stack: Scrapy, FastAI, Pandas, Numpy