This repository has been archived by the owner on Oct 9, 2023. It is now read-only.
Extending the Question Answering Models to Visual Question Answering #1382
Labels: enhancement (new feature or request), help wanted (extra attention is needed), won't fix (this will not be worked on)
🚀 Feature
Extending the idea of Question Answering to Visual Question Answering
Motivation
I was going through the examples and became interested in using transformers for Visual Question Answering, but I could not find many code resources on the topic, so I thought of contributing my own PyTorch implementation. I believe the implementation is simple enough to fine-tune quickly on any dataset.
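To make the idea concrete, here is a minimal sketch (not the author's actual implementation) of how a multi-modal VQA model is typically structured: a text encoder for the question and a projection of precomputed image features are fused and fed to an answer classifier. All names, dimensions, and the fusion strategy here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SimpleVQA(nn.Module):
    """Illustrative VQA sketch: fuse question and image features, classify the answer.

    Hypothetical module, not the implementation referenced in this issue.
    """

    def __init__(self, vocab_size=1000, embed_dim=128,
                 img_feat_dim=2048, hidden=256, num_answers=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Question encoder: an LSTM over token embeddings (a transformer
        # encoder would work here just as well).
        self.text_rnn = nn.LSTM(embed_dim, hidden, batch_first=True)
        # Project precomputed image features (e.g. from a CNN backbone)
        # into the same hidden size as the question representation.
        self.img_proj = nn.Linear(img_feat_dim, hidden)
        self.classifier = nn.Sequential(
            nn.Linear(hidden * 2, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_answers),
        )

    def forward(self, img_feats, question_ids):
        _, (h, _) = self.text_rnn(self.embed(question_ids))
        q = h[-1]                                  # (B, hidden) final question state
        v = torch.relu(self.img_proj(img_feats))   # (B, hidden) projected image features
        # Simple concatenation fusion; attention-based fusion is a common alternative.
        return self.classifier(torch.cat([q, v], dim=-1))  # (B, num_answers) logits

model = SimpleVQA()
logits = model(torch.randn(2, 2048), torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # torch.Size([2, 10])
```

Fine-tuning on a VQA dataset would then reduce to standard cross-entropy training over the answer vocabulary.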
Pitch
I am not sure how best to pitch this, but I have implemented the model and obtained fair results with it. I would like to extend its applicability to any dataset, and since it is a multi-modal model, it should be useful to the research community as well.
Alternatives
None that I am aware of, since this is a model contribution.
Additional context
The implementation is available here.
What does this implementation contain?