Deep Vision
Contents
- Object Detection
- Object Tracking
- Other Applications
- Semantic Segmentation
- Understanding CNN
- Image Captioning
- Video Captioning
- Question Answering
- Image Generation
- Other Topics
- Visual Attention and Saliency
- Human Pose Estimation
- Object Recognition
- Sequence Labeling
- Text-to-Image Synthesis
- Image-Text Representation Learning
- NLP Models
- Image Classification
- Sequence Modeling
- Deep Learning Framework
- Computer Vision Models
- 3D Computer Vision
- Generative Models
- Image-to-Image Translation
- Music and Art Generation
- Adversarial Attacks
- Reinforcement Learning
- Distributed Computing
- Big Data Processing
- Monitoring
- Search and Analytics
- Log Analysis
- Log Datasets
- Inference Engine
- Deep Learning Compiler
- Model Serving
- Model Format
- Image Datasets
- Object Detection and Segmentation Datasets
- Image-Text Datasets
- Reinforcement Learning Environments
- Mixed Precision Training
Papers
Object Detection
Detectron2 is FAIR's next-generation library that enables state-of-the-art computer vision research.
A framework for building and training object detection models in TensorFlow.
Implementation of Mask R-CNN, an algorithm for object instance segmentation.
Object Tracking
Other Applications
Semantic Segmentation
Understanding CNN
Matthew Zeiler, Rob Fergus, Visualizing and Understanding Convolutional Networks, ECCV, 2014.
[[Paper]](https://www.cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf)
Image Captioning
Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan L. Yuille, Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images, arXiv:1504.06692.
Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Translating Videos to Natural Language Using Deep Recurrent Neural Networks, NAACL-HLT, 2015.
Xinlei Chen, C. Lawrence Zitnick, Learning a Recurrent Visual Representation for Image Caption Generation, arXiv:1411.5654.
arXiv:1502.03044 / ICML 2015
Video Captioning
Question Answering
Image Generation
Other Topics
Visual Attention and Saliency
Predicting Eye Fixations using Convolutional Neural Networks, CVPR, 2015.
Human Pose Estimation
The deep learning framework used for object detection and recognition.
CVPR, 2017.
NIPS, 2014.
Text-to-Image Synthesis
Image-Text Representation Learning
NLP Models
Image Classification
Sequence Modeling
Deep Learning Framework
Computer Vision Models
3D Computer Vision
Generative Models
Image-to-Image Translation
Music and Art Generation
Adversarial Attacks
Reinforcement Learning
Distributed Computing
Big Data Processing
Monitoring
Search and Analytics
Inference Engine
Deep Learning Compiler
Model Serving
Image Datasets
Object Detection and Segmentation Datasets
Image-Text Datasets
Reinforcement Learning Environments
Mixed Precision Training
Software
Applications
Question Answering
VQA: Visual Question Answering, CVPR, 2015 SUNw: Scene Understanding Workshop.
Ask Your Neurons: A Neural-based Approach to Answering Questions about Images, arXiv:1505.01121.
Image Question Answering: A Visual Semantic Embedding Model and a New Dataset, arXiv:1505.02074 / ICML 2015 deep learning workshop.
Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering, arXiv:1505.05612.
Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction, arXiv:1511.05765.
Dynamic Memory Networks for Visual and Textual Question Answering, arXiv:1603.01417, 2016.