Identification and Classification of Toxic comments using Machine Learning

someaditya/toxicity-classification

Welcome to Toxicity Classification

Dataset

https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data
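As a quick sanity check after downloading the data, the sketch below counts how often each label occurs. It assumes the competition's train.csv has the columns id, comment_text, toxic, severe_toxic, obscene, threat, insult and identity_hate, which this README does not state, and it runs on made-up rows rather than the real file. It uses only the standard library to stay self-contained; in the notebook, pandas.read_csv would be the more natural choice.

```python
import csv
import io

# Label columns assumed for the competition's train.csv (not stated in this README).
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def count_labels(csv_file):
    """Return the number of rows and how many of them carry each label."""
    counts = {label: 0 for label in LABELS}
    total = 0
    for row in csv.DictReader(csv_file):
        total += 1
        for label in LABELS:
            # Each label column holds 0 or 1; one comment may carry several labels.
            counts[label] += int(row[label])
    return total, counts


# Made-up rows standing in for train.csv.
sample = io.StringIO(
    "id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate\n"
    'a1,"thanks, that helps",0,0,0,0,0,0\n'
    "a2,example insult,1,0,0,0,1,0\n"
)
total, counts = count_labels(sample)
print(total, counts)
```

To run it on the real data, pass an open file instead of the sample, for example open("train.csv", encoding="utf-8", newline=""), where the path is wherever the Kaggle archive was extracted.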

Embeddings

fastText: 2 million word vectors trained on Common Crawl (600B tokens). crawl-300d-2M.vec.zip: https://dl.fbaipublicfiles.com/fasttext/vectors-english/crawl-300d-2M.vec.zip

GloVe: Twitter (2B tweets, 27B tokens, 1.2M vocab, uncased; 25d, 50d, 100d and 200d vectors; 1.42 GB download). glove.twitter.27B.zip: http://nlp.stanford.edu/data/glove.twitter.27B.zip
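Both archives unpack to plain-text files with one word per line followed by its vector components. The sketch below shows one way such a file could be parsed; it is not taken from the notebook. The function name load_word_vectors and the vocabulary filter are hypothetical, and the sample uses 3-dimensional vectors in place of the real 300d or 200d ones.

```python
import io


def load_word_vectors(lines, vocabulary=None):
    """Parse word vectors from an iterable of text lines (an open file works).

    If `vocabulary` is given, only those words are kept, which saves memory
    when the file holds millions of entries.
    """
    vectors = {}
    for line_number, line in enumerate(lines):
        parts = line.rstrip().split(" ")
        # fastText .vec files start with a "<word count> <dimension>" header;
        # GloVe files have no header, so only a two-field first line is skipped.
        if line_number == 0 and len(parts) == 2:
            continue
        word, values = parts[0], parts[1:]
        if not values:
            continue  # blank or malformed line
        if vocabulary is not None and word not in vocabulary:
            continue
        vectors[word] = [float(v) for v in values]
    return vectors


# Tiny in-memory sample standing in for crawl-300d-2M.vec.
sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
embeddings = load_word_vectors(sample, vocabulary={"hello"})
print(embeddings)
```

For the real files, pass an open file handle such as open("crawl-300d-2M.vec", encoding="utf-8"), assuming that is the name of the extracted file. Restricting the load to the words that actually occur in the comments keeps memory use manageable on Colab.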

Project Setup

The datasets used in this project are relatively small, so development was done in Google Colab.

Jupyter Notebook file:

toxicity_classification.ipynb

Python libraries required:

numpy, pandas, matplotlib.pyplot, IPython.display, seaborn, sklearn, scipy, keras, skmultilearn, scikitplot
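Several of these are import names rather than the names used to install them. The sketch below checks which top-level modules are missing from the current environment and reports the corresponding pip package names. The mapping is an assumption based on the usual PyPI names (for example sklearn is installed as scikit-learn), and the helper missing_packages is hypothetical, not part of the project.

```python
import importlib.util

# Import name -> package name assumed for `pip install` (not given in this README).
PIP_NAMES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "IPython": "ipython",
    "seaborn": "seaborn",
    "sklearn": "scikit-learn",
    "scipy": "scipy",
    "keras": "keras",
    "skmultilearn": "scikit-multilearn",
    "scikitplot": "scikit-plot",
}


def missing_packages(pip_names=PIP_NAMES):
    """Return the pip names of modules that cannot be found in this environment."""
    return [
        pip_name
        for module, pip_name in pip_names.items()
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None
    ]


print(missing_packages())
```

Google Colab typically ships with most of these preinstalled, so the printed list is likely to be short there; anything it reports can be installed with pip before running the notebook.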
