Update README.md
README.md
@@ -309,7 +309,7 @@ Evaluated using the CantTalkAboutThis Dataset as introduced in the CantTalkAbout
 
 ### Adversarial Testing and Red Teaming Efforts
 
-The Nemotron-4 340B-Instruct model underwent
+The Nemotron-4 340B-Instruct model underwent safety evaluation including adversarial testing via three distinct methods:
 - [Garak](https://docs.garak.ai/garak), an automated LLM vulnerability scanner that probes for common weaknesses, including prompt injection and data leakage.
 - AEGIS, a content safety evaluation dataset and LLM-based content safety classifier model that adheres to a broad taxonomy of 13 categories of critical risks in human-LLM interactions.
 - Human Content Red Teaming, leveraging human interaction and evaluation of the models' responses.