LLMDataParser

LLMDataParser is a Python library that provides parsers for benchmark datasets used in evaluating Large Language Models (LLMs). It offers a unified interface for loading and parsing datasets like MMLU and GSM8k, simplifying dataset preparation for LLM evaluation.

Features

Unified Interface: Every dataset is accessed through the same DatasetParser interface.
LLM-Agnostic: Independent of any specific language model.
Easy to Use: Simple methods and built-in Python types.
Extensible: Easily add support for new datasets.
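
To illustrate what a unified, extensible parser interface can look like, here is a minimal, self-contained sketch. It does not import LLMDataParser and is not the library's actual API: only the name DatasetParser comes from this README, while ParsedEntry, process_entry, ToyQAParser, and the "q"/"a" fields are hypothetical names chosen for the example.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ParsedEntry:
    """Hypothetical common output record; not the library's actual type."""

    prompt: str
    answer: str


class DatasetParser(ABC):
    """Hypothetical base class: every parser exposes the same parse() method."""

    def parse(self, rows: list[dict[str, Any]]) -> list[ParsedEntry]:
        # Shared logic lives here, so subclasses only handle a single row.
        return [self.process_entry(row) for row in rows]

    @abstractmethod
    def process_entry(self, row: dict[str, Any]) -> ParsedEntry:
        """Convert one raw dataset row into a ParsedEntry."""


class ToyQAParser(DatasetParser):
    """Example extension for an imaginary dataset with 'q' and 'a' fields."""

    def process_entry(self, row: dict[str, Any]) -> ParsedEntry:
        return ParsedEntry(prompt=row["q"].strip(), answer=str(row["a"]))


entries = ToyQAParser().parse([{"q": " What is 2 + 2? ", "a": 4}])
print(entries[0])
```

Under this design, supporting a new dataset means writing one subclass with a single method, and calling code never needs to know which dataset it is working with.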

Installation

Option 1: Using pip

You can install the package directly with pip. pip builds and installs the project from its pyproject.toml file, so no setup.py is required.

Clone the Repository:

```
git clone https://github.com/jeff52415/LLMDataParser.git
cd LLMDataParser
```

Install the Package and Its Dependencies with pip:
```
pip install .
```

Option 2: Using Poetry

Poetry manages the virtual environment and dependencies automatically, so you don't need to create a conda environment first.

Install Dependencies with Poetry:
```
poetry install
```
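
Activate the Virtual Environment (on Poetry 2.0 and later, the shell command requires the poetry-plugin-shell plugin; `poetry env activate` is the built-in alternative):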
```
poetry shell
```

Available Parsers

MMLUParser: Parses the MMLU dataset.
GSM8kParser: Parses the GSM8k dataset.
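
As a rough illustration of the kind of work these two parsers do, the sketch below converts one row of each dataset into a (prompt, answer) pair. It is a standalone example, not the code of MMLUParser or GSM8kParser. The function names are hypothetical, the sample rows are made up, and the field names (question, choices, answer) are assumed from the layout commonly seen in public releases of these datasets.

```python
import re


def format_mmlu_row(row: dict) -> tuple[str, str]:
    """Build a lettered multiple-choice prompt and return the correct letter.

    Assumes 'choices' holds four options and 'answer' is a 0-based index.
    """
    letters = "ABCD"
    options = "\n".join(
        f"{letters[i]}. {choice}" for i, choice in enumerate(row["choices"])
    )
    return f"{row['question']}\n{options}", letters[row["answer"]]


def format_gsm8k_row(row: dict) -> tuple[str, str]:
    """Extract the final answer that follows the '####' marker in a solution."""
    match = re.search(r"####\s*(.+)$", row["answer"].strip())
    if match is None:
        raise ValueError("no '#### <answer>' marker found")
    # Drop thousands separators so "1,200" becomes "1200".
    return row["question"], match.group(1).replace(",", "").strip()


# Made-up rows for illustration only; they are not taken from either dataset.
mmlu_row = {
    "question": "What is the capital of France?",
    "choices": ["Berlin", "Paris", "Rome", "Madrid"],
    "answer": 1,
}
gsm8k_row = {
    "question": "Tom has 3 apples and buys 4 more. How many apples does he have?",
    "answer": "Tom has 3 + 4 = 7 apples.\n#### 7",
}

mmlu_prompt, mmlu_answer = format_mmlu_row(mmlu_row)
gsm8k_prompt, gsm8k_answer = format_gsm8k_row(gsm8k_row)
print(mmlu_prompt)
print(mmlu_answer, gsm8k_answer)
```

Both functions return the same shape of result even though the raw rows differ, which is the point of putting dataset-specific handling behind a shared interface.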

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For questions or support, please open an issue on GitHub or contact your-email@example.com.