transformers-like-implementation

#1

Use the SigLIP implementation from the HF SiglipModel, add flash-attn 2 support, and load the model.safetensors weights from google/siglip-so400m-patch14-384.

Leyo changed pull request status to open

Not that it matters much, but is there a reason to use nn.init.normal_ instead of nn.init.xavier_uniform_?

Thank you for this, looks good!

Because with nn.init.xavier_uniform_ I would get ValueError("Fan in and fan out can not be computed for tensor with fewer than 2 dimensions"). I think this is due to DeepSpeed ZeRO-3, but since I was not getting the error with nn.init.normal_, and we load a pretrained checkpoint anyway, I thought it was simpler to just switch to normal.
Alternatively, I could wrap the init in a context manager or get rid of it altogether.
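A standalone repro of the error being discussed (plain PyTorch, not tied to this repo's init code): Xavier init needs a fan-in and fan-out, which are undefined for tensors with fewer than 2 dimensions — such as biases, or the flattened parameter shards ZeRO-3 can expose — while a plain normal init works on any shape:

```python
import torch
import torch.nn as nn

# A 1-D tensor, e.g. a bias (or a ZeRO-3 flattened shard); 1152 is just an
# illustrative size, not taken from the model config.
bias_like = torch.empty(1152)

try:
    nn.init.xavier_uniform_(bias_like)
except ValueError as err:
    # "Fan in and fan out can not be computed for tensor with fewer than 2 dimensions"
    print(err)

nn.init.normal_(bias_like, mean=0.0, std=0.02)  # succeeds regardless of rank
```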

Leyo changed pull request status to merged
