Update README.md
README.md CHANGED

@@ -31,6 +31,9 @@ It achieves the following results on the evaluation set:
 
 ## usage
 
+> [!TIP]
+> BART supports several speedups for inference on GPU, including [flash-attention2](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2) and [torch SDPA](https://huggingface.co/docs/transformers/perf_infer_gpu_one#pytorch-scaled-dot-product-attention)
+
 an example of aggregating summaries from chunks of a long document:
 
 ```py
@@ -66,6 +69,7 @@ res = pipe(
 print(res[0]["generated_text"])
 ```
 
+
 ## Training procedure
 
 ### Training hyperparameters
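The "aggregating summaries from chunks" workflow the README references can be sketched independently of the model call (which the diff truncates): split the document into overlapping word windows, summarize each window, and join the partial summaries. The `chunk_text` and `aggregate_summaries` helpers below are illustrative, not part of the card, and the window sizes are assumptions.

```python
def chunk_text(text: str, max_words: int = 600, overlap: int = 50) -> list[str]:
    """Split a long document into overlapping word windows so each
    chunk fits within the summarizer's input limit (sizes are assumed)."""
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks


def aggregate_summaries(text: str, summarize) -> str:
    """Summarize each chunk independently, then join the partial
    summaries into one aggregated draft."""
    partials = [summarize(chunk) for chunk in chunk_text(text)]
    return " ".join(partials)


# With the card's pipeline, `summarize` would wrap it, e.g.:
# summarize = lambda c: pipe(c)[0]["generated_text"]
```

For very long inputs, the joined draft can itself be passed back through the summarizer for a second, shorter pass.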