I used the FLAVA model example code from this link:
https://huggingface.co/docs/transformers/model_doc/flava#transformers.FlavaModel.forward.example
But I am getting the following error:
'FlavaModelOutput' object has no attribute 'contrastive_logits_per_image'
I tried using the FlavaForPreTraining model instead, so the updated code was:
from PIL import Image
import requests
from transformers import FlavaProcessor, FlavaForPreTraining

model = FlavaForPreTraining.from_pretrained("facebook/flava-full")
processor = FlavaProcessor.from_pretrained("facebook/flava-full")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(
    text=["a photo of a cat"],
    images=image,
    return_tensors="pt",
    padding=True,
    return_codebook_pixels=True,
)
inputs.update(
    {
        "input_ids_masked": inputs.input_ids,
    }
)

outputs = model(**inputs)
logits_per_image = outputs.contrastive_logits_per_image  # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1)  # we can take the softmax to get the label probabilities
but I'm still getting this error:
/usr/local/lib/python3.7/dist-packages/transformers/modeling_utils.py:714: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
"The `device` argument is deprecated and will be removed in v5 of Transformers.", FutureWarning
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-44-bdb428b8184a> in <module>()
----> 1 outputs = model(**inputs)
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1129 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130 return forward_call(*input, **kwargs)
1131 # Do not call functions when jit is used
1132 full_backward_hooks, non_full_backward_hooks = [], []
/usr/local/lib/python3.7/dist-packages/transformers/models/flava/modeling_flava.py in forward(self, input_ids, input_ids_masked, pixel_values, codebook_pixel_values, attention_mask, token_type_ids, bool_masked_pos, position_ids, image_attention_mask, skip_unmasked_multimodal_encoder, mlm_labels, mim_labels, itm_labels, output_attentions, output_hidden_states, return_dict, return_loss)
1968 if mim_labels is not None:
1969 mim_labels = self._resize_to_2d(mim_labels)
-> 1970 bool_masked_pos = self._resize_to_2d(bool_masked_pos)
1971 mim_labels[bool_masked_pos.ne(True)] = self.ce_ignore_index
1972
/usr/local/lib/python3.7/dist-packages/transformers/models/flava/modeling_flava.py in _resize_to_2d(self, x)
1765
1766 def _resize_to_2d(self, x: torch.Tensor):
-> 1767 if x.dim() > 2:
1768 x = x.view(x.size(0), -1)
1769 return x
AttributeError: 'NoneType' object has no attribute 'dim'
Can anyone suggest what's going wrong?
CodePudding user response:
FLAVA's author here.
Can you please add the following arguments to your processor call:
return_codebook_pixels=True, return_image_mask=True
With return_image_mask=True, the processor also returns bool_masked_pos, which is the tensor that shows up as None in your traceback.
Here is an example colab if you want to see how to call the FLAVA model: https://colab.research.google.com/drive/1c3l4r4cEA5oXfq9uXhrJibddwRkcBxzP?usp=sharing#scrollTo=xtkrSjfhCdv-