# Awesome Deep Learning Libraries

I have listed some awesome libraries that I have found useful in everyday machine learning practice. They can make things easier and boost your productivity.

## General

### Computer Vision

- mmcv: OpenMMLab Computer Vision Foundation
- MMClassification: OpenMMLab Image Classification Toolbox and Benchmark
- MMDetection: OpenMMLab Detection Toolbox and Benchmark (see the inference sketch after this list)
- MMAction2: OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
- MMSegmentation: OpenMMLab Semantic Segmentation Toolbox and Benchmark
- OpenSelfSup: Self-Supervised Learning Toolbox and Benchmark
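
As a taste of the OpenMMLab style, here is a minimal MMDetection inference sketch; the config and checkpoint paths are placeholders you would swap for real files from the MMDetection model zoo.

```python
# Minimal MMDetection inference sketch (paths are hypothetical placeholders).
from mmdet.apis import init_detector, inference_detector

config_file = "configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py"  # placeholder
checkpoint_file = "checkpoints/faster_rcnn_r50_fpn_1x_coco.pth"     # placeholder

# Build the detector from a config file and load pretrained weights.
model = init_detector(config_file, checkpoint_file, device="cuda:0")

# Single-image inference; the result holds per-class detection boxes.
result = inference_detector(model, "demo.jpg")
```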

### Natural Language Processing

- autonlp: Train state-of-the-art natural language processing models and deploy them in a scalable environment, automatically
- HuggingFaceTransformers: State-of-the-art natural language processing for PyTorch, TensorFlow, and JAX (see the pipeline sketch after this list)
- fairseq: Facebook AI Research Sequence-to-Sequence Toolkit written in Python
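
The Transformers `pipeline` API gives one-line access to pretrained models; a minimal sketch:

```python
# One-liner inference with Hugging Face Transformers' pipeline API.
from transformers import pipeline

# Downloads a default pretrained sentiment model on first use.
classifier = pipeline("sentiment-analysis")
print(classifier("These libraries really boost my productivity."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```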

## Data

- DALI: A GPU-accelerated library of highly optimized building blocks and an execution engine for data processing, built to accelerate deep learning training and inference
- AugLy: A data augmentation library for audio, image, text, and video
- Open3D: A modern library for 3D data processing
- HuggingFaceTokenizers: Fast, state-of-the-art tokenizers optimized for research and production
- HuggingFaceDatasets: The largest hub of ready-to-use NLP datasets for ML models, with fast, easy-to-use, and efficient data manipulation tools (see the sketch after this list)
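
A minimal Hugging Face Datasets sketch, loading a public dataset and applying a toy preprocessing step:

```python
# Loading and inspecting a ready-made dataset with Hugging Face Datasets.
from datasets import load_dataset

# Downloads the IMDB reviews dataset on first use and caches it locally.
dataset = load_dataset("imdb")
print(dataset["train"][0])  # one example: {'text': ..., 'label': ...}

# map() applies fast, batched-friendly preprocessing; here a toy length feature.
dataset = dataset.map(lambda ex: {"n_chars": len(ex["text"])})
```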

## Accelerating Training

- Apex: A PyTorch extension with tools for easy mixed-precision and distributed training
- ApexDataPrefetcher: Prefetches data to hide data I/O cost
- Horovod: Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet
- Checkpoint: A PyTorch function that implements activation checkpointing (see the sketch after this list)
- TorchPipe: A GPipe implementation in PyTorch
- PowerSGD Communication Hook: PowerSGD (Vogels et al., NeurIPS 2019) is a gradient compression algorithm that can provide very high compression rates and accelerate bandwidth-bound distributed training
- Accelerate: A simple way to train and use PyTorch models with multi-GPU, TPU, and mixed precision
- lightseq: A high-performance library for sequence processing and generation
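
To illustrate activation checkpointing, a minimal PyTorch sketch: the block's intermediate activations are discarded during the forward pass and recomputed during backward, trading extra compute for lower memory.

```python
import torch
from torch.utils.checkpoint import checkpoint

# A block whose intermediate activations we do not want to keep in memory.
block = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 512),
)

x = torch.randn(8, 512, requires_grad=True)
# Forward runs without storing intermediates; backward recomputes them.
y = checkpoint(block, x)
y.sum().backward()
```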

## Large-Scale Distributed Training

- Megatron: Ongoing research on training transformer language models at scale, including BERT and GPT-2
- DeepSpeed: A deep learning optimization library that makes distributed training easy, efficient, and effective
- Ray: An open-source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library (see the sketch after this list)
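
Ray's core idea fits in a few lines: decorate a function with `@ray.remote` and it becomes a distributed task.

```python
# Ray's core API in a nutshell: turn a function into a distributed task.
import ray

ray.init()  # starts a local Ray runtime (or connects to a cluster)

@ray.remote
def square(x):
    return x * x

# .remote() schedules tasks asynchronously; ray.get() collects the results.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]
```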

## Utilities

- Tensorboard: TensorFlow's visualization toolkit (see the logging sketch after this list)
- KnockKnock: Get notified when your training ends, with only two additional lines of code
- Neptune: Lightweight experiment tracking tool for AI/ML individuals and teams; fits any workflow
- netron: Visualizer for neural network, deep learning, and machine learning models
- scalene: A high-performance, high-precision CPU, GPU, and memory profiler for Python
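
TensorBoard is also usable straight from PyTorch via `SummaryWriter`; a minimal logging sketch (the log directory and the fake loss are illustrative):

```python
# Logging scalars to TensorBoard from PyTorch.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/demo")  # hypothetical log directory

for step in range(100):
    fake_loss = 1.0 / (step + 1)     # stand-in for a real training loss
    writer.add_scalar("train/loss", fake_loss, step)

writer.close()
# Then inspect the curves with: tensorboard --logdir runs
```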