Xiaoxia Wu
xiaoxiawu123
1 follower · 0 following
xwuShirley
AI & ML interests
None yet
Recent Activity
upvoted a paper 12 days ago
APOLLO: SGD-like Memory, AdamW-level Performance
Organizations
xiaoxiawu123's activity
New activity in meta-llama/Llama-3.1-405B-Instruct, 5 months ago
why "num_key_value_heads": 16,
#14 opened 5 months ago by xiaoxiawu123