```python
# only use last token for input_ids if the state is passed along.
if cache_params is not None:
    input_ids = input_ids[:, -1].unsqueeze(-1)

if inputs_embeds is not None and cache_params is None:
    model_inputs = {"inputs_embeds": inputs_embeds}
else:
    model_inputs = {"input_ids": input_ids}

model_inputs["cache_params"] = cache_params
return model_inputs
```
Hi :)
I think that `use_cache` is supposed to be passed through here as well:
transformers/src/transformers/models/mamba/modeling_mamba.py
Lines 634 to 647 in 9fd606d
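The fix suggested above could look roughly like this. This is a minimal, self-contained sketch of the idea, not the actual `transformers` method; the class name is hypothetical, and `input_ids` is assumed to be a tensor-like object supporting `[:, -1]` and `.unsqueeze(-1)`:

```python
class MambaForCausalLMSketch:
    """Hypothetical stand-in for the Mamba model class; only sketches
    prepare_inputs_for_generation with the proposed use_cache forwarding."""

    def prepare_inputs_for_generation(
        self, input_ids, cache_params=None, inputs_embeds=None, use_cache=None, **kwargs
    ):
        # Only the last token is needed once a cache state is passed along.
        if cache_params is not None:
            input_ids = input_ids[:, -1].unsqueeze(-1)

        if inputs_embeds is not None and cache_params is None:
            model_inputs = {"inputs_embeds": inputs_embeds}
        else:
            model_inputs = {"input_ids": input_ids}

        model_inputs["cache_params"] = cache_params
        # Proposed addition: forward use_cache so the forward pass
        # actually builds and returns cache_params.
        model_inputs["use_cache"] = use_cache
        return model_inputs
```

With this change, `use_cache=True` passed to `model.generate` would reach the model's forward call instead of being dropped.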
I noticed this when I wanted to get the cache when using `model.generate`, but it was not there although I set `use_cache=True`.

Edit: Just saw that `GenerateDecoderOnlyOutput` would have to be adjusted as well. It would need to contain `cache_params` similarly to `past_key_values`. I don't know if it's okay for you to bloat that even more.

transformers/src/transformers/generation/utils.py
Lines 2244 to 2251 in 9fd606d
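The output-class change could be sketched like this. The class below is a hypothetical stand-in, not the real `GenerateDecoderOnlyOutput` from `transformers.generation.utils`; only `sequences`, `scores`, and `past_key_values` mirror existing fields, and `cache_params` is the proposed addition:

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class GenerateDecoderOnlyOutputSketch:
    """Sketch of GenerateDecoderOnlyOutput with a proposed cache_params
    field, analogous to past_key_values but holding the Mamba SSM state."""
    sequences: Any = None
    scores: Optional[Tuple[Any, ...]] = None
    past_key_values: Optional[Tuple[Any, ...]] = None
    cache_params: Optional[Any] = None  # proposed: returned when use_cache=True
```

Callers could then read `output.cache_params` after `generate` the same way they read `output.past_key_values` today.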
Cheers and thanks for the great work!