Deep Vision

Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan L. Yuille, Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images, arXiv:1504.06692

UML / UT

Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Translating Videos to Natural Language Using Deep Recurrent Neural Networks, NAACL-HLT, 2015.

CMU / Microsoft

Xinlei Chen, C. Lawrence Zitnick, Learning a Recurrent Visual Representation for Image Caption Generation, arXiv:1411.5654.

MS + Berkeley

MS + City Univ. of Hong Kong

UCLA / Baidu

Deep Visual-Semantic Alignments for Generating Image Descriptions, CVPR, 2015.

Explain Images with Multimodal Recurrent Neural Networks, arXiv:1410.1090.

Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models, arXiv:1411.2539.

Show and Tell: A Neural Image Caption Generator, arXiv:1411.4555.

Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation, CVPR, 2015.

From Captions to Visual Concepts and Back, CVPR, 2015.

Phrase-based Image Captioning, arXiv:1502.03671 / ICML 2015

Language Models for Image Captioning: The Quirks and What Works, arXiv:1505.01809

Image Captioning with an Intermediate Attributes Layer, arXiv:1506.01144

Learning language through pictures, arXiv:1506.03694

Image Representations and New Domains in Neural Image Captioning, arXiv:1508.02091

Learning Query and Image Similarities with Ranking Canonical Correlation Analysis, ICCV, 2015

Univ. Montreal / Univ. Toronto

Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard S. Zemel, Yoshua Bengio, Show, Attend, and Tell: Neural Image Caption Generation with Visual Attention, arXiv:1502.03044 / ICML 2015

Video Captioning

Univ. Montreal / Univ. Sherbrooke

Univ. Toronto / MIT

TAU / USC

Long-term Recurrent Convolutional Networks for Visual Recognition and Description, CVPR, 2015.

Translating Videos to Natural Language Using Deep Recurrent Neural Networks, arXiv:1412.4729.

Describing Multimedia Content using Attention-based Encoder-Decoder Networks, arXiv:1507.01053

Joint Modeling Embedding and Translation to Bridge Video and Language, arXiv:1505.01861.

Sequence to Sequence--Video to Text, arXiv:1505.00487.

Describing Videos by Exploiting Temporal Structure, arXiv:1502.08029

The Long-Short Story of Movie Description, arXiv:1506.01698

Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books, arXiv:1506.06724

Temporal Tessellation for Video Annotation and Summarization, arXiv:1612.06950.

Image Generation

Convolutional / Recurrent Networks

Aäron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu, "Conditional Image Generation with PixelCNN Decoders", arXiv:1606.05328. [[Paper]](https://arxiv.org/pdf/1606.05328v2.pdf)

Adversarial Networks

Jun-Yan Zhu, Philipp Krahenbuhl, Eli Shechtman, and Alexei A. Efros, "Generative Visual Manipulation on the Natural Image Manifold", ECCV 2016.

Visual Attention and Saliency

Mr-CNN

Predicting Eye Fixations using Convolutional Neural Networks, CVPR, 2015.

Learning a Sequential Search for Landmarks, CVPR, 2015.

Multiple Object Recognition with Visual Attention, ICLR, 2015.

Recurrent Models of Visual Attention, NIPS, 2014.

Human Pose Estimation

BVLC Caffe

A general-purpose deep learning framework from the Berkeley Vision and Learning Center (BVLC).

Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields, CVPR, 2017.

Stacked Hourglass Networks for Human Pose Estimation, ECCV, 2016.

Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation, NIPS, 2014.

Convolutional Pose Machines

Official implementation of Convolutional Pose Machines.

Stacked Hourglass Networks

Official implementation of Stacked Hourglass Networks for human pose estimation.

Flowing ConvNets

Official implementation of Flowing ConvNets for human pose estimation in videos.

Object Recognition

FV-CNN

Deep Filter Banks for Texture Recognition and Segmentation, CVPR, 2015.

Sequence Labeling

Seqeval

A Python framework for sequence labeling evaluation.

Text-to-Image Synthesis

Stable Diffusion

A latent text-to-image diffusion model.

Latent Diffusion Models

Latent diffusion models for high-resolution image synthesis.

Image-Text Representation Learning

CLIP

Contrastive Language-Image Pre-training from OpenAI.

NLP Models

Hugging Face Transformers

State-of-the-art Natural Language Processing for TensorFlow 2.x and PyTorch.

UniLM

Unified Language Model Pre-training for Natural Language Understanding and Generation.

BERT

Bidirectional Encoder Representations from Transformers.

Image Classification

Vision Transformer

A PyTorch implementation of the Vision Transformer (ViT) model.

Deep Learning Examples

This repository provides examples of training deep learning models on NVIDIA GPUs.

torchvision

Datasets, models, and transforms for computer vision in PyTorch.

Sequence Modeling

Fairseq (archived)

A sequence modeling toolkit for training custom models for translation, summarization, language modeling, and other text generation tasks.

Deep Learning Framework

TensorFlow

An open source machine learning framework for everyone.

Keras

Deep Learning for humans.

PyTorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration.

Computer Vision Models

cvnets

A versatile Python library for implementing and training state-of-the-art computer vision models.

3D Computer Vision

PyTorch3D

A library for deep 3D computer vision research, built off PyTorch.

Generative Models

StyleGAN

Official TensorFlow implementation of StyleGAN.

StyleGAN2

Official TensorFlow implementation of StyleGAN2.

StyleGAN3

Official PyTorch implementation of StyleGAN3.

StyleGAN2-PyTorch

A PyTorch implementation of StyleGAN2.

Self-Attention GAN

Official implementation of the paper 'Self-Attention Generative Adversarial Networks'.

Self-Attention GAN (PyTorch)

A PyTorch implementation of Self-Attention GAN.

GANs

A collection of Generative Adversarial Network implementations.

StyleGAN2-ADA

Official PyTorch implementation of StyleGAN2-ADA.

Image-to-Image Translation

pix2pixHD

High-Resolution Image-to-Image Synthesis and Semantic Manipulation.

CycleGAN

Official implementations of CycleGAN, pix2pix, and related image-to-image translation models.

Music and Art Generation

Magenta (archived)

Magenta is a research project by Google exploring the role of machine learning in creating art and music.

Adversarial Attacks

Adversarial Examples

A repository containing PyTorch implementations of various adversarial attack methods against CNNs.

Reinforcement Learning

Deep Reinforcement Learning

A toolkit for deep reinforcement learning, featuring implementations of popular algorithms.

Baselines

OpenAI Baselines: reliable implementations of reinforcement learning algorithms.

Distributed Computing

Ray

Ray is a fast and simple framework for building distributed applications.

Dask

Parallel computing with Python.

Big Data Processing

Apache Spark

Unified analytics engine for large-scale data processing.

Monitoring

Prometheus

Prometheus Helm charts maintained by the community.

Grafana

The leading tool for visualizing, monitoring and discovering your metrics.

Search and Analytics

Elasticsearch

Open source, distributed, RESTful search and analytics engine.

Log Analysis

Logloki

An advanced log analysis tool based on deep learning.

Log Datasets

Loghub

A large collection of system log datasets for AI-driven log analytics.

Inference Engine

ONNX Runtime

A performance-focused, cross-platform machine learning inference and training accelerator.

Deep Learning Compiler

Apache TVM

An open deep learning compiler stack for CPUs, GPUs, and specialized accelerators.

Model Serving

TensorFlow Serving

A flexible, high-performance serving system for machine learning models, designed for production environments.

TorchServe (archived)

A flexible and easy-to-use tool for serving PyTorch models.

Model Format

ONNX

Open Neural Network Exchange format.

Image Datasets

ImageNet

A dataset of millions of labeled images for object recognition.

CIFAR Datasets

A collection of datasets for image classification tasks (CIFAR-10, CIFAR-100).

Object Detection and Segmentation Datasets

COCO Dataset

Common Objects in Context dataset for object detection, segmentation, and captioning.

Image-Text Datasets

LAION-5B

An open dataset of about 5.85 billion CLIP-filtered image-text pairs.

Reinforcement Learning Environments

ViZDoom

A toolkit for experimenting with AI agents in the Doom game.

DeepMind Control Suite

Physics-based simulation and reinforcement learning environments built on MuJoCo.

OpenAI Gym (archived)

A toolkit for developing and comparing reinforcement learning algorithms.

Mixed Precision Training

Apex

A PyTorch extension for distributed and mixed precision training.

Software

Applications

Source code for the paper "Holistically-Nested Edge Detection", ICCV 2015.

Code and hyperparameters for the paper "Generative Adversarial Networks"

Source code for "Understanding Deep Image Representations by Inverting Them," CVPR, 2015.

Source code for the paper "Rich feature hierarchies for accurate object detection and semantic segmentation," CVPR, 2014.

Source code for the paper "Fully Convolutional Networks for Semantic Segmentation," CVPR, 2015.

Image Super-Resolution for Anime-Style-Art

Source code for the paper "DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection," CVPR, 2015.

Framework

torchnet (archived)

Blocks

Lasagne

Deepgaze

A computer vision library for human-computer interaction based on CNNs

Tutorials

Applied Deep Learning for Computer Vision with Torch

Question Answering

Virginia Tech / MSR

VQA: Visual Question Answering, CVPR, 2015 SUNw: Scene Understanding workshop.

MPI / Berkeley

Ask Your Neurons: A Neural-based Approach to Answering Questions about Images, arXiv:1505.01121.

Toronto

Image Question Answering: A Visual Semantic Embedding Model and a New Dataset, arXiv:1505.02074 / ICML 2015 deep learning workshop.

Baidu / UCLA

Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering, arXiv:1505.05612.

POSTECH

Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction, arXiv:1511.05765

CMU / Microsoft Research

Stacked Attention Networks for Image Question Answering, arXiv:1511.02274.

MetaMind

Dynamic Memory Networks for Visual and Textual Question Answering, arXiv:1603.01417 (2016).

SNU + NAVER

Multimodal Residual Learning for Visual QA, arXiv:1606.01455

UC Berkeley + Sony

Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding, arXiv:1606.01847

POSTECH

Training Recurrent Answering Units with Joint Loss Minimization for VQA, arXiv:1606.03647

SNU + NAVER

Hadamard Product for Low-rank Bilinear Pooling, arXiv:1610.04325.

Contents

Papers

Object Detection

Object Tracking

Other Applications

Semantic Segmentation

Understanding CNN

Image Captioning

Video Captioning

Image Generation

Other Topics

Visual Attention and Saliency

Human Pose Estimation

Object Recognition

Sequence Labeling

Text-to-Image Synthesis

Image-Text Representation Learning

NLP Models

Image Classification

Sequence Modeling

Deep Learning Framework

Computer Vision Models

3D Computer Vision

Generative Models

Image-to-Image Translation

Music and Art Generation

Adversarial Attacks

Reinforcement Learning

Distributed Computing

Big Data Processing

Monitoring

Search and Analytics

Log Analysis

Log Datasets

Inference Engine

Deep Learning Compiler

Model Serving

Model Format

Image Datasets

Object Detection and Segmentation Datasets

Image-Text Datasets

Reinforcement Learning Environments

Mixed Precision Training

Software

Applications

Framework

Tutorials

Question Answering