Triangle104 committed
Commit 53c57c1
1 Parent(s): 56d5821

Update README.md

Files changed (1)
  1. README.md +83 -0
README.md CHANGED
@@ -23,6 +23,89 @@ tags:
  This model was converted to GGUF format from [`prithivMLmods/Llama-Chat-Summary-3.2-3B`](https://huggingface.co/prithivMLmods/Llama-Chat-Summary-3.2-3B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/prithivMLmods/Llama-Chat-Summary-3.2-3B) for more details on the model.

+ ---
+ ## Model details
+
+ Llama-Chat-Summary-3.2-3B: Context-Aware Summarization Model
+
+ Llama-Chat-Summary-3.2-3B is a fine-tuned model for generating context-aware summaries of long conversational or text-based inputs. Built on the meta-llama/Llama-3.2-3B-Instruct foundation, it is optimized to process structured and unstructured conversational data for summarization tasks.
+
+ ### Key Features
+
+ - Conversation Summarization: generates concise, meaningful summaries of long chats, discussions, or threads.
+ - Context Preservation: maintains critical points so that important details are not omitted.
+ - Text Summarization: works beyond chats, supporting articles, documents, and reports.
+ - Fine-Tuned Efficiency: trained on the Context-Based-Chat-Summary-Plus dataset for accurate summarization of chat and conversational data.
+
+ ### Training Details
+
+ - Base Model: meta-llama/Llama-3.2-3B-Instruct
+ - Fine-Tuning Dataset: prithivMLmods/Context-Based-Chat-Summary-Plus, which contains 98.4k structured and unstructured conversations, summaries, and contextual inputs for robust training (a loading sketch follows below).
+
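+ To inspect that fine-tuning data, the dataset can be pulled with the `datasets` library. This is a minimal sketch, not part of the original card; it assumes the dataset repo is public and exposes a "train" split:
+
+ from datasets import load_dataset
+
+ # Assumption: repo id taken from the card; the "train" split name is assumed.
+ ds = load_dataset("prithivMLmods/Context-Based-Chat-Summary-Plus", split="train")
+ print(ds)        # row count and column names
+ print(ds[0])     # one conversation/summary example
+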
+ ### Applications
+
+ - Customer Support Logs: summarize chat logs or support tickets for insights and reporting.
+ - Meeting Notes: generate concise summaries of meeting transcripts.
+ - Document Summarization: create short summaries for lengthy reports or articles.
+ - Content Generation Pipelines: automate summarization for newsletters, blogs, or email digests.
+ - Context Extraction for AI Systems: preprocess chat or conversation logs for downstream AI applications (see the sketch after this list).
+
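+ As a hedged illustration of that last preprocessing point, a small helper can flatten structured chat turns into the plain-text prompt format used in the examples below; the helper name and input shape are hypothetical, not from the original card:
+
+ def build_summary_prompt(turns):
+     """turns: list of (speaker, message) pairs from a chat log (hypothetical input shape)."""
+     lines = [f"{speaker}: {message}" for speaker, message in turns]
+     return "Summarize the following conversation:\n" + "\n".join(lines)
+
+ # Example: a two-turn support chat rendered into the prompt format.
+ prompt = build_summary_prompt([("User1", "My order hasn't arrived."),
+                                ("User2", "Let me check the status for you.")])
+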
+ ### Load the Model
+
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Download the tokenizer and weights from the Hugging Face Hub.
+ model_name = "prithivMLmods/Llama-Chat-Summary-3.2-3B"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(model_name)
+
+ ### Generate a Summary
+
+ prompt = """
+ Summarize the following conversation:
+ User1: Hey, I need help with my order. It hasn't arrived yet.
+ User2: I'm sorry to hear that. Can you provide your order number?
+ User1: Sure, it's 12345.
+ User2: Let me check... It seems there was a delay. It should arrive tomorrow.
+ User1: Okay, thank you!
+ """
+
+ # Tokenize the prompt and generate up to 100 new tokens of summary text.
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
+
+ # Decode the generated ids back to text (note: this includes the prompt itself).
+ summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print("Summary:", summary)
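+
+ The snippet above feeds the instruction as plain text and decodes the whole sequence, prompt included. Because the base model is an Instruct checkpoint, wrapping the request in the tokenizer's chat template and slicing the prompt off the output may behave more predictably. This is a hedged sketch, assuming the repo ships the standard Llama 3.2 chat template (not stated in this card), and it reuses the prompt string defined above:
+
+ messages = [{"role": "user", "content": prompt}]
+ # Build the chat-formatted prompt and return token ids directly.
+ input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
+ outputs = model.generate(input_ids, max_new_tokens=100, do_sample=True, temperature=0.7)
+ # Keep only the newly generated tokens (everything after the prompt).
+ summary = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
+ print("Summary:", summary)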
+
+ ### Expected Output
+
+ "The user reported a delayed order (12345), and support confirmed it will arrive tomorrow."
+
+ ### Deployment Notes
+
+ - Serverless API: this model currently lacks sufficient usage for serverless endpoints; use dedicated endpoints for deployment.
+ - Performance Requirements: a GPU with sufficient memory is recommended, and optimization techniques such as quantization can improve inference efficiency (see the sketch below).
+
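+ As one hedged illustration of the quantization point, the model could be loaded in 4-bit via bitsandbytes. This sketch is not from the original card; it assumes a CUDA GPU and that the `bitsandbytes` package is installed:
+
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ # Assumption: 4-bit quantization via bitsandbytes; other schemes (e.g. GGUF) also work.
+ bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
+
+ model_name = "prithivMLmods/Llama-Chat-Summary-3.2-3B"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     quantization_config=bnb_config,
+     device_map="auto",  # place the quantized layers on the available GPU(s)
+ )
+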
+
108
+ ---
  ## Use with llama.cpp
  Install llama.cpp through brew (works on Mac and Linux)