data-leaderboard

Weyaxi committed on Feb 28, 2024

Commit

6ad2cd0

1 Parent(s): 43cdccf

fixes

Files changed (1)

app.py CHANGED

@@ -40,12 +40,14 @@ df_author_copy = df.copy()
 df["author"] = df["author"].apply(lambda x: clickable(x))
 df['Total Usage'] = df[['models', 'datasets', 'spaces']].sum(axis=1)
-df = df[['Serial Number', "author", "Total Usage", "models", "datasets", "spaces"]]
 df = df.sort_values(by='Total Usage', ascending=False)
+sum_all_author = naturalsize(sum(merged_df['models'].tolist()+merged_df['datasets'].tolist()+merged_df['spaces'].tolist()))
 naturalsize_columns = ['Total Usage', 'models', 'datasets', 'spaces']
 df[naturalsize_columns] = df[naturalsize_columns].applymap(naturalsize)
+df = df[['Serial Number', "author", "Total Usage", "models", "datasets", "spaces"]]
 df['Serial Number'] = [i for i in range(1, len(df)+1)]
 df = apply_headers(df, ["🔢 Serial Number", "👤 Author", "⚡️ Total Usage", "🏛️ Models", "📊 Datasets", "🚀 Spaces"])
@@ -65,6 +67,17 @@ These 125k authors have been selected based on their [🤗 Huggingface Leaderboa
 - 🚀 Top 50k authors in the spaces category
+## 📒 Notes
+Note that these numbers may not be entirely accurate due to the following reasons:
+- I only calculated the data usage from the main branch and did not include deleted files that cannot be directly seen.
+- There may be large datasets/models to which I don't have access (either private or gated).
+# 📶 Total Data Usage From All Authors
+According to this leaderboard, there is a total of {sum_all_author} of data on this platform.
 """
 # Write note maybe?
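
In the first hunk, `sum_all_author` is computed before `applymap(naturalsize)` turns the numeric columns into display strings. The general pattern (aggregate raw byte counts first, format for display afterwards) can be sketched without pandas as below. This is only an illustration under stated assumptions: `human_size` is a hypothetical stand-in for the `naturalsize` call in the diff, and the sample rows are made up, not data from the leaderboard.

```python
# Hypothetical stand-in for the naturalsize() call in the diff,
# written out so the sketch has no third-party dependencies.
def human_size(num_bytes: int) -> str:
    """Format a byte count using decimal (power-of-1000) units."""
    units = ["Bytes", "kB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below 1000,
        # or at the largest unit available.
        if value < 1000 or unit == units[-1]:
            if unit == "Bytes":
                return f"{int(value)} {unit}"
            return f"{value:.1f} {unit}"
        value /= 1000


# Made-up per-author byte counts (assumption: the real app holds
# these values in DataFrame columns of the same names).
rows = [
    {"author": "alice", "models": 1_500_000, "datasets": 2_000_000_000, "spaces": 0},
    {"author": "bob", "models": 750, "datasets": 0, "spaces": 40_000},
]

# Step 1: aggregate while the values are still integers.
total_bytes = sum(r["models"] + r["datasets"] + r["spaces"] for r in rows)
sum_all_author = human_size(total_bytes)

# Step 2: only then convert each cell to a display string.
display_rows = [
    {"author": r["author"],
     **{key: human_size(r[key]) for key in ("models", "datasets", "spaces")}}
    for r in rows
]

print(sum_all_author)
print(display_rows)
```

Once step 2 has run, the cells are strings such as `"1.5 MB"` and can no longer be summed numerically, which is why the total is taken in step 1.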