spacemanidol committed on
Commit 6ae1a8a
1 Parent(s): 4bee6ff

Update README.md

Files changed (1):
  1. README.md +21 -23

README.md CHANGED
@@ -2801,7 +2801,6 @@ model-index:
    - type: v_measure
      value: 81.46426354153643
  ---
- ---
  <h1 align="center">Snowflake's Artic-embed-m</h1>
  <h4 align="center">
  <p>
@@ -2837,10 +2836,10 @@ The models are trained by leveraging existing open-source text representation mo
 
  | Name | MTEB Retrieval Score (NDCG @ 10) | Parameters (Millions) | Embedding Dimension |
  | ----------------------------------------------------------------------- | -------------------------------- | --------------------- | ------------------- |
- | [arctic-embed-s](https://huggingface.co/Snowflake/arctic-embed-xs/) | 50.15 | 22 | 384 |
+ | [arctic-embed-xs](https://huggingface.co/Snowflake/arctic-embed-xs/) | 50.15 | 22 | 384 |
  | [arctic-embed-s](https://huggingface.co/Snowflake/arctic-embed-s/) | 51.98 | 33 | 384 |
- | [arctic-embed-s](https://huggingface.co/Snowflake/arctic-embed-m/) | 54.90 | 110 | 768 |
- | [arctic-embed-s](https://huggingface.co/Snowflake/arctic-embed-m-long/) | 54.83 | 137 | 768 |
+ | [arctic-embed-m](https://huggingface.co/Snowflake/arctic-embed-m/) | 54.90 | 110 | 768 |
+ | [arctic-embed-m-long](https://huggingface.co/Snowflake/arctic-embed-m-long/) | 54.83 | 137 | 768 |
  | [arctic-embed-s](https://huggingface.co/Snowflake/arctic-embed-l/) | 55.98 | 335 | 1024 |
 
 
@@ -2849,32 +2848,32 @@ Aside from being great open-source models, the largest model, [arctic-embed-l](h
 
  | Model Name | MTEB Retrieval Score (NDCG @ 10) |
  | ------------------------------------------------------------------ | -------------------------------- |
- | [arctic-embed-s](https://huggingface.co/Snowflake/arctic-embed-l/) | 55.98 |
+ | [arctic-embed-l](https://huggingface.co/Snowflake/arctic-embed-l/) | 55.98 |
  | Google-gecko-text-embedding | 55.7 |
  | text-embedding-3-large | 55.44 |
  | Cohere-embed-english-v3.0 | 55.00 |
  | bge-large-en-v1.5 | 54.29 |
 
 
- ### [arctic-embed-xs](https://huggingface.co/Snowflake/arctic-embed-xs/)
+ ### [Arctic-embed-xs](https://huggingface.co/Snowflake/arctic-embed-xs)
 
 
- This tiny model packs quite the punch based on the [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) model. With only 22m parameters and 384 dimensions, this model should meet even the strictest latency/TCO budgets. Despite its size, its retrieval accuracy is closer to that of models with 100m paramers.
+ This tiny model packs quite the punch. Based on the [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) model with only 22m parameters and 384 dimensions, this model should meet even the strictest latency/TCO budgets. Despite its size, its retrieval accuracy is closer to that of models with 100m paramers.
 
 
  | Model Name | MTEB Retrieval Score (NDCG @ 10) |
  | ------------------------------------------------------------------- | -------------------------------- |
- | [arctic-embed-s](https://huggingface.co/Snowflake/arctic-embed-xs/) | 50.15 |
+ | [arctic-embed-xs](https://huggingface.co/Snowflake/arctic-embed-xs/) | 50.15 |
  | GIST-all-MiniLM-L6-v2 | 45.12 |
  | gte-tiny | 44.92 |
  | all-MiniLM-L6-v2 | 41.95 |
  | bge-micro-v2 | 42.56 |
 
 
- ### Arctic-embed-s
+ ### [Arctic-embed-s](https://huggingface.co/Snowflake/arctic-embed-s)
 
 
- Based on the [all-MiniLM-L12-v2](https://huggingface.co/intfloat/e5-base-unsupervised) model, this small model does not trade off retrieval accuracy for its small size. With only 33m parameters and 384 dimensions, this model should easily allow scaling to large datasets.
+ Based on the [all-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2) model, this small model does not trade off retrieval accuracy for its small size. With only 33m parameters and 384 dimensions, this model should easily allow scaling to large datasets.
 
 
  | Model Name | MTEB Retrieval Score (NDCG @ 10) |
@@ -2886,34 +2885,33 @@ Based on the [all-MiniLM-L12-v2](https://huggingface.co/intfloat/e5-base-unsuper
  | e5-small-v2 | 49.04 |
 
 
- ### [arctic-embed-m-long](https://huggingface.co/Snowflake/arctic-embed-m-long/)
+ ### [Arctic-embed-m](https://huggingface.co/Snowflake/arctic-embed-m/)
 
 
- Based on the [nomic-embed-text-v1](https://huggingface.co/nomic-ai/nomic-embed-text-v1) model, this long-context variant of our medium-sized model is perfect for workloads that can be constrained by the regular 512 token context of our other models. Without the use of RPE, this model supports up to 2048 tokens. With RPE, it can scale to 8192!
+ Based on the [intfloat/e5-base-unsupervised](https://huggingface.co/intfloat/e5-base-unsupervised) model, this medium model is the workhorse that provides the best retrieval performance without slowing down inference.
 
 
  | Model Name | MTEB Retrieval Score (NDCG @ 10) |
  | ------------------------------------------------------------------ | -------------------------------- |
- | [arctic-embed-s](https://huggingface.co/Snowflake/arctic-embed-m/) | 54.90 |
+ | [arctic-embed-m](https://huggingface.co/Snowflake/arctic-embed-m/) | 54.90 |
  | bge-base-en-v1.5 | 53.25 |
- | nomic-embed-text-v1.5 | 53.01 |
+ | nomic-embed-text-v1.5 | 53.25 |
  | GIST-Embedding-v0 | 52.31 |
  | gte-base | 52.31 |
 
-
- ### Arctic-embed-m
+ ### [arctic-embed-m-long](https://huggingface.co/Snowflake/arctic-embed-m-long/)
 
 
- Based on the [intfloat/e5-base-unsupervised](https://huggingface.co/intfloat/e5-base-unsupervised) model, this medium model is the workhorse that provides the best retrieval performance without slowing down inference.
+ Based on the [nomic-embed-text-v1](https://huggingface.co/nomic-ai/nomic-embed-text-v1) model, this long-context variant of our medium-sized model is perfect for workloads that can be constrained by the regular 512 token context of our other models. Without the use of RPE, this model supports up to 2048 tokens. With RPE, it can scale to 8192!
 
 
  | Model Name | MTEB Retrieval Score (NDCG @ 10) |
  | ------------------------------------------------------------------ | -------------------------------- |
- | [arctic-embed-s](https://huggingface.co/Snowflake/arctic-embed-m/) | 54.90 |
- | bge-base-en-v1.5 | 53.25 |
- | nomic-embed-text-v1.5 | 53.25 |
- | GIST-Embedding-v0 | 52.31 |
- | gte-base | 52.31 |
+ | [arctic-embed-m-long](https://huggingface.co/Snowflake/arctic-embed-m-long/) | 54.83 |
+ | nomic-embed-text-v1.5 | 53.01 |
+ | nomic-embed-text-v1 | 52.81 |
+
+
 
 
  ### [arctic-embed-l](https://huggingface.co/Snowflake/arctic-embed-l/)
@@ -2924,7 +2922,7 @@ Based on the [intfloat/e5-large-unsupervised](https://huggingface.co/intfloat/e5
 
  | Model Name | MTEB Retrieval Score (NDCG @ 10) |
  | ------------------------------------------------------------------ | -------------------------------- |
- | [arctic-embed-s](https://huggingface.co/Snowflake/arctic-embed-l/) | 55.98 |
+ | [arctic-embed-l](https://huggingface.co/Snowflake/arctic-embed-l/) | 55.98 |
  | UAE-Large-V1 | 54.66 |
  | bge-large-en-v1.5 | 54.29 |
  | mxbai-embed-large-v1 | 54.39 |
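
Every table in this diff reports retrieval quality as NDCG @ 10. For readers unfamiliar with the metric, here is a minimal sketch of how it is computed for a single query, assuming binary relevance labels; the function name is illustrative and not part of MTEB's code:

```python
import math

def ndcg_at_k(relevances, k=10):
    """NDCG@k for one query: `relevances` lists the relevance label of
    each retrieved document in ranked order (highest-scored first)."""
    # Discounted cumulative gain: each document's gain is divided by
    # log2(rank + 1), so relevant hits near the top count for more.
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))
    # Ideal DCG: the same sum with the labels in the best possible order.
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# A ranking that puts the only relevant document first scores 1.0;
# burying it at rank 3 discounts it by 1/log2(4).
print(ndcg_at_k([1, 0, 0, 0]))  # 1.0
print(ndcg_at_k([0, 0, 1, 0]))  # 0.5
```

The benchmark scores above are this quantity averaged over all queries of the MTEB retrieval datasets.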