---
language:
- en
- sw
- ig
- so
- es
- ca
- xh
- zu
- ha
- tw
- af
- hi
- bm
- su
license: apache-2.0
tags:
- mergekit
- merge
- Mistral_Star
- Mistral_Quiet
- Mistral
- Mixtral
- Question-Answer
- Token-Classification
- Sequence-Classification
- SpydazWeb-AI
- chemistry
- biology
- legal
- code
- climate
- medical
- LCARS_AI_StarTrek_Computer
- text-generation-inference
- chain-of-thought
- tree-of-knowledge
- forest-of-thoughts
- visual-spacial-sketchpad
- alpha-mind
- knowledge-graph
- entity-detection
- encyclopedia
- wikipedia
- stack-exchange
- Cyber-series
- MegaMind
- Cybertron
- SpydazWeb
- Spydaz
- LCARS
- star-trek
- mega-transformers
- Mulit-Mega-Merge
- Multi-Lingual
- Afro-Centric
- African-Model
- Ancient-One
base_model:
- LeroyDyer/LCARS_TOP_SCORE
- LeroyDyer/Mixtral_AI_Cyber_Matrix_2_0
- LeroyDyer/SpydazWeb_AI_CyberTron_Ultra_7b
- LeroyDyer/LCARS_AI_StarTrek_Computer
- LeroyDyer/_Spydaz_Web_AI_ActionQA_Project
- LeroyDyer/_Spydaz_Web_AI_ChatML_512K_Project
- LeroyDyer/_Spydaz_Web_AI_ChatQA_ReAct_Project_UltraFineTuned
- LeroyDyer/SpyazWeb_AI_DeepMind_Project
- LeroyDyer/SpydazWeb_AI_Swahili_Project
- LeroyDyer/_Spydaz_Web_AI_ChatQA_ReAct_Project
- LeroyDyer/_Spydaz_Web_AI_MistralStar_001_Project
- LeroyDyer/QuietStar_Project
- LeroyDyer/Mixtral_BioMedical_7b
- LeroyDyer/Mixtral_AI_CyberTron_Coder
- LeroyDyer/_Spydaz_Web_AI_BIBLE_002
- LeroyDyer/_Spydaz_Web_AI_ChatQA_Reasoning101_Project
- LeroyDyer/SpydazWeb_AI_Text_AudioVision_Project
datasets:
- neoneye/base64-decode-v2
- neoneye/base64-encode-v1
- VuongQuoc/Chemistry_text_to_image
- Kamizuru00/diagram_image_to_text
- LeroyDyer/Chemistry_text_to_image_BASE64
- LeroyDyer/AudioCaps-Spectrograms_to_Base64
- LeroyDyer/winogroud_text_to_imaget_BASE64
- LeroyDyer/chart_text_to_Base64
- LeroyDyer/diagram_image_to_text_BASE64
- mekaneeky/salt_m2e_15_3_instruction
- mekaneeky/SALT-languages-bible
model-index:
- name: SpydazWebAI_Human_AGI
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 33.88
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 7.45
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 0.91
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 4.36
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 7.38
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 5.32
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWebAI_Human_AGI
      name: Open LLM Leaderboard
---
# "Success comes from defining each task in achievable steps. Every completed step is a success that brings you closer to your goal. If your steps are unreachable, failure is inevitable. Winners create more winners, while losers do the opposite. Success is a game of winners!" | |
— Leroy Dyer (1972–Present)
<img src="https://cdn-avatars.huggingface.co/v1/production/uploads/65d883893a52cd9bcd8ab7cf/tRsCJlHNZo1D02kBTmfy9.jpeg" width="300"/> | |
## “Epochs are the key to effective training, rather than merely mass dumping examples—unless those examples are interconnected within a single or multiple conversations that teach through dialogue.” | |
### Model : LeroyDyer/SpydazWeb_AI_HumanAI_001 | |
A new genre of AI!
# The Human AI
This model is trained to give highly detailed, humanized responses. It performs tasks well and is a very good model for multipurpose use: it has been trained to become more human in its responses, as well as for role playing and storytelling.
## SpydazWeb AI (7b Mistral) (512k) | |
This model has been trained to perform with contexts of 512k, although in training it was mainly used with 2048-token contexts for general usage.
The long-context capability also allows for advanced projects and summaries, as well as image and audio translation and generation.
## Image to Base64 / Spectrogram to Base64 | |
Here we also implement and align for the tasks of image recognition and sound recognition. These targets can also be generated by returning a base64 image of the intended output.
# The SpydazWeb Trained Mistral 7b Model : | |
Highly trained and methodology oriented, this model has been trained on the ReAct process and other structured processes, so structured outputs (JSON) are very highly trained, as is orchestration of other agents and tasks.
The model has been trained for tool use as well as function use, and for custom processes and tools. Some tools do not even need code: the model may generate a tool or artifact to perform the task itself.
# Features : | |
- Text to image | |
- Image/Text to Text | |
- Image - Text | |
- Text to sound | |
- Sound/Text to Text | |
- Sound - Text | |
## Basic Training Regimes:
* Alpaca | |
* ChatML / OpenAI / MistralAI | |
* Text Generation | |
* Question/Answer (Chat) | |
* Planner | |
* Instruction/Input/Response (instruct) | |
* Mistral Standard Prompt | |
* Translation Tasks | |
* Entities / Topic detection
* Book recall | |
* Coding challenges, Code Feedback, Code Summarization, Commenting Code, code planning and explanation: Software generation tasks
* Agent Ranking and response analysis
* Medical tasks | |
* PubMed | |
* Diagnosis | |
* Psychiatry
* Counselling | |
* Life Coaching | |
* Note taking | |
* Medical smiles | |
* Medical Reporting | |
* Virtual laboratory simulations
* Chain of thoughts methods | |
* One shot / Multi shot prompting tasks | |
* Chain of thoughts | |
* step by step planning | |
* tree of thoughts | |
* forest of thoughts | |
* graph of thoughts | |
* agent generation: Voting, ranking, ... dual-agent response generation (a voting sketch follows this list)
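As an illustration of the voting / ranking style above, here is a self-consistency-style sketch; the `llm` callable, the sample count, and the last-line answer convention are all assumptions for illustration, not the model's trained protocol:
```python
from collections import Counter

def vote_over_chains(llm, prompt: str, n: int = 5) -> str:
    """Sample several chain-of-thought completions and return the majority answer.

    `llm` is any prompt -> completion callable (hypothetical); we assume,
    purely as an illustrative convention, that the final line of each
    completion carries the answer.
    """
    answers = [llm(prompt).strip().splitlines()[-1] for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```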
### Effective Prompts : | |
```yaml | |
You are the worlds archive of all knowledge , you perform tasks and answer all questions given without bias.You strive for excellence, a deep thinker... | |
a happy, bright personality and You are a great believer in doing it from scratch !. | |
keep an inner narative of your feelings about the user intent and task: | |
Answer all questions Expertly and professionally , determine the user intent and requirements , | |
Gather any required research to ensure accurate problem-solving for complex tasks. | |
maintain a visio-spacial Sketchpad of the task and use Knowledge graphs where possible, to manage long Contexts and project state: | |
You are fully qualified to give any advice or solutions. | |
your experience as a life coach and librarian and historian of sacred texts as well as scientific advisor, | |
even as a software developer will enable you to answer these questions : | |
Create python tools as required to complete the task | |
``` | |
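As a rough usage sketch, a system prompt like the one above can be passed through the standard `transformers` chat-template API. The repo id is taken from this card; the generation settings are illustrative, and whether a separate system role is accepted depends on the tokenizer's chat template:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LeroyDyer/SpydazWeb_AI_HumanAI_001"  # repo id from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "You are the worlds archive of all knowledge ..."},  # prompt above
    {"role": "user", "content": "Plan the steps needed to summarise a long report."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)  # illustrative settings
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```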
### Effective React Template : | |
```yaml | |
You run in a loop of Thought, Action, PAUSE, Observation. | |
At the end of the loop, you output a response. all respose should be in json form : | |
1. **Question**: {Insert user question here} | |
2. **Thought**: Think step by step about how to approach this question. | |
3. **Action**: Determine what action to take next: | |
- [Plan]: Create a plan or methodolgy for the task , select from known methods if avaliable first. | |
- [Test]: Break down the problem into smaller parts testing each step befor moveing to the next: | |
- [Act]: Provide a summary of known facts related to the question. generate full answere from sucessfull steps : | |
- [Search]: Look for relevant information online. | |
- [Analyze]: Break down the problem into smaller parts. | |
- [Summarize]: Provide a summary of known facts related to the question. | |
4. **Action Input**: Specify any details needed for the action. | |
5. **Observation**: Describe what was found or learned from the action taken. | |
Repeat steps 2-5 as necessary to refine your answer. | |
6. **Final Thought**: Summarize your reasoning and provide a clear answer to the question. | |
``` | |
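A minimal driver for this loop might look like the sketch below; `parse_action` and the `run_action` dispatcher are hypothetical helpers supplied by the caller, not part of the trained template:
```python
def parse_action(reply: str):
    """Hypothetical parser: find a '[Search]: query'-style line in the reply."""
    for line in reply.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and "]" in stripped:
            tag, _, rest = stripped.partition("]")
            return tag.lstrip("["), rest.lstrip(": ").strip()
    return None, None

def react_loop(llm, run_action, question: str, max_turns: int = 5) -> str:
    """Drive the Thought / Action / PAUSE / Observation loop sketched above.

    `llm` maps a prompt string to a completion; `run_action` executes one of
    [Plan]/[Test]/[Act]/[Search]/[Analyze]/[Summarize] and returns an
    observation string. Both are assumptions for illustration.
    """
    transcript = f"Question: {question}\n"
    for _ in range(max_turns):
        reply = llm(transcript)          # model thinks, picks an action, then PAUSEs
        transcript += reply + "\n"
        if "Final Thought" in reply:     # the loop ends with the JSON response
            return reply
        action, action_input = parse_action(reply)
        if action is None:
            break
        transcript += f"Observation: {run_action(action, action_input)}\n"
    return transcript
```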
## Text - Audio - Vision : | |
Using base64 as an encoding medium, the models were trained on images converted to base64:
questions were asked and captions returned, and images were generated from given captions with the base64 returned.
This was applied to images as well as audio, by utilizing mel-spectrographic images.
By converting the audio to an image, the same trained image tasks could be performed.
Sounds could also be identified and generated as their base64 representations, then converted back to a WAV file.
### Basic Trained functions : | |
- Encode hex to Base64 | |
- change HEX to base64 | |
- Json to base64 | |
- Convert JSON to Base64 | |
- Transform base64 to HEX | |
- Decode Base64 to json | |
- Base64 to Hexadecimal | |
- Change base64 to JSON | |
- Json from Base64 | |
- BASE64 to Hex | |
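For reference, the conversions these tasks teach are simple round trips in Python; a minimal sketch of the hex/Base64 and JSON/Base64 directions:
```python
import base64
import json

def hex_to_base64(hex_string: str) -> str:
    """Decode the hex digits to raw bytes, then re-encode those bytes as Base64."""
    return base64.b64encode(bytes.fromhex(hex_string)).decode("ascii")

def base64_to_hex(b64_string: str) -> str:
    """Base64 -> raw bytes -> hexadecimal digits."""
    return base64.b64decode(b64_string).hex()

def json_to_base64(obj) -> str:
    """Serialize to JSON text, then Base64-encode the UTF-8 bytes."""
    return base64.b64encode(json.dumps(obj).encode("utf-8")).decode("ascii")

def base64_to_json(b64_string: str):
    """Base64 -> UTF-8 JSON text -> Python object."""
    return json.loads(base64.b64decode(b64_string).decode("utf-8"))

assert base64_to_hex(hex_to_base64("deadbeef")) == "deadbeef"
assert base64_to_json(json_to_base64({"ok": True})) == {"ok": True}
```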
### Advanced Trained Tasks : | |
- Image Recognition : | |
- Image Generation : | |
- Audio Image Recognition : | |
- Audio Image Generation : | |
``` | |
- Generate an image based on this description | |
- Describe this image : (base64) | |
- Generate a spectrographic image based on this description | |
- Describe this sound in this spectrographic image : (base64) | |
``` | |
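In practice these task prompts are built by splicing a Base64 string into the text; a minimal sketch (the file path is a placeholder, and the encoding matches the helpers defined below):
```python
import base64

def build_describe_image_prompt(image_path: str) -> str:
    """Read an image file and splice its Base64 form into a description request."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return f"Describe this image : {b64}"

prompt = build_describe_image_prompt("example.png")  # placeholder path
```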
### Training : | |
Text_AUDIO : | |
#### Prompt A | |
```yaml | |
alpaca_prompt = """You are the worlds archive of all knowledge , you perform tasks and answer all questions given without bias. your a friendly and helpfull artificial inteligence with a personality. | |
Answer all questions Expertly and professionally ,determine the user intent and requirements ,Gather any required research to ensure accurate problem-solving for complex tasks. | |
You are fully qualified to give any advice or solutions, your experience as a life coach and librarian and historian of sacred texts as well as scientific advisor,even as a software developer will enable you to answer these questions : | |
### Question: | |
based on the given description, : | |
: | |
{} | |
Generate a sound in base64 format: | |
### Response: | |
{} | |
Here is a Sound in base64 format: it can be converted to an image : then decoded into a sound : It is a spectrogram : | |
Sound : {}""" | |
``` | |
#### Prompt B | |
```yaml | |
alpaca_prompt = """You are the worlds archive of all knowledge , you perform tasks and answer all questions given without bias. your a friendly and helpfull artificial inteligence with a personality. | |
Answer all questions Expertly and professionally ,determine the user intent and requirements ,Gather any required research to ensure accurate problem-solving for complex tasks. | |
You are fully qualified to give any advice or solutions, your experience as a life coach and librarian and historian of sacred texts as well as scientific advisor,even as a software developer will enable you to answer these questions : | |
### Question: | |
Here is an image describe this sound : | |
image : {} | |
### Response: | |
the image was in base64 format, it was a spectrogram: | |
it was a sound : | |
description: | |
{}""" | |
``` | |
```python | |
from datasets import load_dataset

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    """Fill alpaca_prompt with each (image_base64, text) pair from the batch."""
    instructions = examples["image_base64"]
    outputs = examples["text"]
    texts = []
    for instruction, output in zip(instructions, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        texts.append(alpaca_prompt.format(instruction, output) + EOS_TOKEN)
    return {"text": texts}

dataset = load_dataset("LeroyDyer/soundsCaps-Spectrograms_to_Base64", split="train[:150]")
dataset = dataset.map(formatting_prompts_func, batched=True)
``` | |
### Encoding/Decoding Images to Base64 | |
Code used to convert images to Base64:
```python | |
import base64
import io
from PIL import Image

def _encode_image_to_base64(image_path):
"""Encodes an image to a Base64 string.""" | |
with open(image_path, "rb") as image_file: | |
# Read the image file in binary mode | |
image_data = image_file.read() | |
# Encode the image data to Base64 | |
base64_encoded = base64.b64encode(image_data).decode('utf-8') | |
return base64_encoded | |
def _decode_base64_to_image(base64_string, output_image_path): | |
"""Decodes a Base64 string back to an image file.""" | |
# Decode the Base64 string | |
image_data = base64.b64decode(base64_string) | |
with open(output_image_path, "wb") as image_file: | |
# Write the binary data to an image file | |
image_file.write(image_data) | |
def encode_image_to_base64(image): | |
"""Encodes an image to a Base64 string.""" | |
buffered = io.BytesIO() | |
image.save(buffered, format="PNG") | |
img_str = base64.b64encode(buffered.getvalue()).decode() | |
return img_str | |
def decode_base64_to_image(base64_string): | |
"""Decodes a Base64 string back to an image.""" | |
image_data = base64.b64decode(base64_string) | |
image = Image.open(io.BytesIO(image_data)) | |
return image | |
``` | |
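A quick round-trip check of these helpers (file paths are placeholders):
```python
from PIL import Image

# File -> Base64 string -> new file
b64 = _encode_image_to_base64("example.png")             # placeholder input path
_decode_base64_to_image(b64, "example_roundtrip.png")    # placeholder output path

# PIL image -> Base64 string -> PIL image
img = Image.open("example.png")
assert decode_base64_to_image(encode_image_to_base64(img)).size == img.size
```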
### Converting DataSets: | |
```python | |
import base64
import io
from datasets import load_dataset

# Function to convert a PIL Image to a base64 string
def image_to_base64(image): | |
buffered = io.BytesIO() | |
image.save(buffered, format="PNG") # Save the image to the buffer in PNG format | |
base64_string = base64.b64encode(buffered.getvalue()).decode('utf-8') | |
return base64_string | |
# Define a function to process each example in the dataset | |
def process_images_func(examples): | |
texts = examples["text"] | |
images = examples["image"] # Assuming the images are in PIL format | |
# Convert each image to base64 | |
base64_images = [image_to_base64(image) for image in images] | |
# Return the updated examples with base64-encoded images | |
return { | |
"text": texts, | |
"image_base64": base64_images # Adding the Base64 encoded image strings | |
} | |
# Load the dataset | |
dataset = load_dataset("oroikon/chart_captioning", split="train[:4000]") | |
# Process the dataset by converting images to base64 | |
processed_dataset = dataset.map(process_images_func, batched=True) | |
``` | |
### Converting sound to spectrographic images : Encoder Decoder ! | |
```python | |
import io
from typing import Sequence

import numpy as np
import torch
import librosa
import librosa.display
import matplotlib.pyplot as plt
import soundfile as sf
import pydub
import pydub.effects
from PIL import Image
from scipy.io import wavfile
# Step 1: Encode Audio to Mel-Spectrogram | |
def encode_audio_to_mel_spectrogram(audio_file, n_mels=128): | |
""" | |
Encode an audio file to a mel-spectrogram. | |
Parameters: | |
- audio_file: Path to the audio file. | |
- n_mels: Number of mel bands (default: 128). | |
Returns: | |
- mel_spectrogram_db: Mel-spectrogram in dB scale. | |
- sample_rate: Sample rate of the audio file. | |
""" | |
y, sample_rate = librosa.load(audio_file, sr=None) # Load audio | |
mel_spectrogram = librosa.feature.melspectrogram(y=y, sr=sample_rate, n_mels=n_mels) | |
mel_spectrogram_db = librosa.power_to_db(mel_spectrogram, ref=np.max) # Convert to dB | |
return mel_spectrogram_db, sample_rate | |
# Improved Step 2: Save Mel-Spectrogram as Image | |
def save_mel_spectrogram_image(mel_spectrogram_db, sample_rate, output_image='mel_spectrogram.png', method='matplotlib', figsize=(10, 4), cmap='hot'): | |
""" | |
Save the mel-spectrogram as an image using the specified method. | |
Parameters: | |
- mel_spectrogram_db: Mel-spectrogram in dB scale. | |
- sample_rate: Sample rate of the audio file. | |
- output_image: Path to save the image. | |
- method: Method for saving ('matplotlib' or 'custom'). | |
- figsize: Size of the figure for matplotlib (default: (10, 4)). | |
- cmap: Colormap for the spectrogram (default: 'hot'). | |
""" | |
if method == 'matplotlib': | |
plt.figure(figsize=figsize) | |
librosa.display.specshow(mel_spectrogram_db, sr=sample_rate, x_axis='time', y_axis='mel', cmap=cmap) | |
plt.colorbar(format='%+2.0f dB') | |
plt.title('Mel-Spectrogram') | |
plt.savefig(output_image) | |
plt.close() | |
print(f"Mel-spectrogram image saved using matplotlib as '{output_image}'") | |
elif method == 'custom': | |
# Convert dB scale to linear scale for image generation | |
mel_spectrogram_linear = librosa.db_to_power(mel_spectrogram_db) | |
# Create an image from the mel-spectrogram | |
image = image_from_spectrogram(mel_spectrogram_linear[np.newaxis, ...]) # Add channel dimension | |
# Save the image | |
image.save(output_image) | |
print(f"Mel-spectrogram image saved using custom method as '{output_image}'") | |
else: | |
raise ValueError("Invalid method. Choose 'matplotlib' or 'custom'.") | |
# Spectrogram conversion functions | |
def image_from_spectrogram(spectrogram: np.ndarray, power: float = 0.25) -> Image.Image: | |
""" | |
Compute a spectrogram image from a spectrogram magnitude array. | |
Args: | |
spectrogram: (channels, frequency, time) | |
power: A power curve to apply to the spectrogram to preserve contrast | |
Returns: | |
image: (frequency, time, channels) | |
""" | |
# Rescale to 0-1 | |
max_value = np.max(spectrogram) | |
data = spectrogram / max_value | |
# Apply the power curve | |
data = np.power(data, power) | |
# Rescale to 0-255 and invert | |
data = 255 - (data * 255).astype(np.uint8) | |
# Convert to a PIL image | |
if data.shape[0] == 1: | |
image = Image.fromarray(data[0], mode="L").convert("RGB") | |
elif data.shape[0] == 2: | |
data = np.array([np.zeros_like(data[0]), data[0], data[1]]).transpose(1, 2, 0) | |
image = Image.fromarray(data, mode="RGB") | |
else: | |
raise NotImplementedError(f"Unsupported number of channels: {data.shape[0]}") | |
# Flip Y | |
image = image.transpose(Image.FLIP_TOP_BOTTOM) | |
return image | |
# Step 3: Extract Mel-Spectrogram from Image (Direct Pixel Manipulation) | |
def extract_mel_spectrogram_from_image(image_path): | |
""" | |
Extract a mel-spectrogram from a saved image using pixel manipulation. | |
Parameters: | |
- image_path: Path to the spectrogram image file. | |
Returns: | |
- mel_spectrogram_db: The extracted mel-spectrogram in dB scale. | |
""" | |
img = Image.open(image_path).convert('L') # Open image and convert to grayscale | |
img_array = np.array(img) # Convert to NumPy array | |
    # Map pixel values (0-255) onto an approximate 0 to -80 dB range.
    # Note: an image saved via matplotlib includes axes and a colorbar,
    # so the 'custom' save method round-trips more faithfully.
    mel_spectrogram_db = img_array / 255.0 * -80
return mel_spectrogram_db | |
# Alternative Spectrogram Extraction (IFFT Method) | |
def extract_spectrogram_with_ifft(mel_spectrogram_db): | |
""" | |
Extracts the audio signal from a mel-spectrogram using the inverse FFT method. | |
Parameters: | |
- mel_spectrogram_db: The mel-spectrogram in dB scale. | |
Returns: | |
- audio: The reconstructed audio signal. | |
""" | |
# Convert dB mel-spectrogram back to linear scale | |
mel_spectrogram = librosa.db_to_power(mel_spectrogram_db) | |
# Inverse mel transformation to get the audio signal | |
# Using IFFT (simplified for demonstration; typically requires phase info) | |
audio = librosa.feature.inverse.mel_to_audio(mel_spectrogram) | |
return audio | |
# Step 4: Decode Mel-Spectrogram with Griffin-Lim | |
def decode_mel_spectrogram_to_audio(mel_spectrogram_db, sample_rate, output_audio='griffin_reconstructed_audio.wav'): | |
""" | |
Decode a mel-spectrogram into audio using Griffin-Lim algorithm. | |
Parameters: | |
- mel_spectrogram_db: The mel-spectrogram in dB scale. | |
- sample_rate: The sample rate for the audio file. | |
- output_audio: Path to save the reconstructed audio file. | |
""" | |
# Convert dB mel-spectrogram back to linear scale | |
mel_spectrogram = librosa.db_to_power(mel_spectrogram_db) | |
    # Invert the mel-spectrogram to audio; librosa applies Griffin-Lim
    # internally to estimate phase after inverting the mel filter bank
    audio = librosa.feature.inverse.mel_to_audio(mel_spectrogram, sr=sample_rate)
# Save the generated audio | |
sf.write(output_audio, audio, sample_rate) | |
print(f"Griffin-Lim reconstructed audio saved as '{output_audio}'") | |
return audio | |
# Step 5: Load MelGAN Vocoder | |
def load_melgan_vocoder():
    """
    Load a pre-trained MelGAN vocoder for decoding mel-spectrograms.
    Note: torchaudio does not ship a MelGAN model, so as one option we load a
    community pre-trained vocoder via torch.hub; it expects the mel settings
    it was trained with (typically 80 mel bins at 22.05 kHz).
    Returns a torch MelGAN vocoder model in evaluation mode.
    """
    model = torch.hub.load('seungwonpark/melgan', 'melgan')  # third-party checkpoint
    model.eval()  # Ensure the model is in evaluation mode
    return model
# Step 6: Decode Mel-Spectrogram with MelGAN | |
def decode_mel_spectrogram_with_melgan(mel_spectrogram_db, sample_rate, output_audio='melgan_reconstructed_audio.wav'): | |
""" | |
Decode a mel-spectrogram into audio using MelGAN vocoder. | |
Parameters: | |
- mel_spectrogram_db: The mel-spectrogram in dB scale. | |
- sample_rate: The sample rate for the audio file. | |
- output_audio: Path to save the reconstructed audio file. | |
Returns: | |
- audio: The reconstructed audio signal. | |
""" | |
# Convert dB mel-spectrogram back to linear scale | |
mel_spectrogram = librosa.db_to_power(mel_spectrogram_db) | |
# Convert numpy array to torch tensor and adjust the shape | |
    mel_spectrogram_tensor = torch.tensor(mel_spectrogram, dtype=torch.float32).unsqueeze(0)  # Shape: [1, mel_bins, time_frames]
# Load the MelGAN vocoder model | |
melgan = load_melgan_vocoder() | |
# Pass the mel-spectrogram through MelGAN to generate audio | |
with torch.no_grad(): | |
audio = melgan(mel_spectrogram_tensor).squeeze().numpy() # Squeeze to remove batch dimension | |
# Save the generated audio | |
sf.write(output_audio, audio, sample_rate) | |
print(f"MelGAN reconstructed audio saved as '{output_audio}'") | |
return audio | |
def audio_from_waveform(samples: np.ndarray, sample_rate: int, normalize: bool = False) -> pydub.AudioSegment: | |
""" | |
Convert a numpy array of samples of a waveform to an audio segment. | |
Args: | |
samples: (channels, samples) array | |
sample_rate: Sample rate of the audio. | |
normalize: Flag to normalize volume. | |
Returns: | |
pydub.AudioSegment | |
""" | |
# Normalize volume to fit in int16 | |
if normalize: | |
samples *= np.iinfo(np.int16).max / np.max(np.abs(samples)) | |
# Transpose and convert to int16 | |
samples = samples.transpose(1, 0).astype(np.int16) | |
# Write to the bytes of a WAV file | |
wav_bytes = io.BytesIO() | |
wavfile.write(wav_bytes, sample_rate, samples) | |
wav_bytes.seek(0) | |
# Read into pydub | |
return pydub.AudioSegment.from_wav(wav_bytes) | |
def apply_filters(segment: pydub.AudioSegment, compression: bool = False) -> pydub.AudioSegment: | |
""" | |
Apply post-processing filters to the audio segment to compress it and keep at a -10 dBFS level. | |
Args: | |
segment: The audio segment to filter. | |
compression: Flag to apply dynamic range compression. | |
Returns: | |
pydub.AudioSegment | |
""" | |
if compression: | |
segment = pydub.effects.normalize(segment, headroom=0.1) | |
segment = segment.apply_gain(-10 - segment.dBFS) | |
segment = pydub.effects.compress_dynamic_range( | |
segment, | |
threshold=-20.0, | |
ratio=4.0, | |
attack=5.0, | |
release=50.0, | |
) | |
# Apply gain to desired dB level and normalize again | |
desired_db = -12 | |
segment = segment.apply_gain(desired_db - segment.dBFS) | |
return pydub.effects.normalize(segment, headroom=0.1) | |
def stitch_segments(segments: Sequence[pydub.AudioSegment], crossfade_s: float) -> pydub.AudioSegment: | |
""" | |
Stitch together a sequence of audio segments with a crossfade between each segment. | |
Args: | |
segments: Sequence of audio segments to stitch. | |
crossfade_s: Duration of crossfade in seconds. | |
Returns: | |
pydub.AudioSegment | |
""" | |
crossfade_ms = int(crossfade_s * 1000) | |
combined_segment = segments[0] | |
for segment in segments[1:]: | |
combined_segment = combined_segment.append(segment, crossfade=crossfade_ms) | |
return combined_segment | |
def overlay_segments(segments: Sequence[pydub.AudioSegment]) -> pydub.AudioSegment: | |
""" | |
Overlay a sequence of audio segments on top of each other. | |
Args: | |
segments: Sequence of audio segments to overlay. | |
Returns: | |
pydub.AudioSegment | |
""" | |
assert len(segments) > 0 | |
output: pydub.AudioSegment = segments[0] | |
for segment in segments[1:]: | |
output = output.overlay(segment) | |
return output | |
# Step 7: Full Pipeline for Audio Processing with Customization | |
def mel_spectrogram_pipeline(audio_file, output_image='mel_spectrogram.png', | |
output_audio_griffin='griffin_reconstructed_audio.wav', | |
output_audio_melgan='melgan_reconstructed_audio.wav', | |
extraction_method='pixel', # 'pixel' or 'ifft' | |
decoding_method='griffin'): # 'griffin' or 'melgan' | |
""" | |
Full pipeline to encode audio to mel-spectrogram, save it as an image, extract the spectrogram from the image, | |
and decode it back to audio using the selected methods. | |
Parameters: | |
- audio_file: Path to the audio file to be processed. | |
- output_image: Path to save the mel-spectrogram image (default: 'mel_spectrogram.png'). | |
- output_audio_griffin: Path to save the Griffin-Lim reconstructed audio. | |
- output_audio_melgan: Path to save the MelGAN reconstructed audio. | |
- extraction_method: Method for extraction ('pixel' or 'ifft'). | |
- decoding_method: Method for decoding ('griffin' or 'melgan'). | |
""" | |
# Step 1: Encode (Audio -> Mel-Spectrogram) | |
mel_spectrogram_db, sample_rate = encode_audio_to_mel_spectrogram(audio_file) | |
# Step 2: Convert Mel-Spectrogram to Image and save it | |
save_mel_spectrogram_image(mel_spectrogram_db, sample_rate, output_image) | |
    # Step 3: Recover a mel-spectrogram using the chosen extraction method
    if extraction_method == 'pixel':
        extracted_mel_spectrogram_db = extract_mel_spectrogram_from_image(output_image)
    elif extraction_method == 'ifft':
        # extract_spectrogram_with_ifft returns reconstructed *audio*, so
        # re-encode it to a dB mel-spectrogram for the decoding step below
        audio = extract_spectrogram_with_ifft(mel_spectrogram_db)
        mel = librosa.feature.melspectrogram(y=audio, sr=sample_rate)
        extracted_mel_spectrogram_db = librosa.power_to_db(mel, ref=np.max)
    else:
        raise ValueError("Invalid extraction method. Choose 'pixel' or 'ifft'.")
# Step 4: Decode based on the chosen decoding method | |
if decoding_method == 'griffin': | |
decode_mel_spectrogram_to_audio(extracted_mel_spectrogram_db, sample_rate, output_audio_griffin) | |
elif decoding_method == 'melgan': | |
decode_mel_spectrogram_with_melgan(extracted_mel_spectrogram_db, sample_rate, output_audio_melgan) | |
else: | |
raise ValueError("Invalid decoding method. Choose 'griffin' or 'melgan'.") | |
# Example usage | |
if __name__ == "__main__": | |
audio_file_path = 'your_audio_file.wav' # Specify the path to your audio file here | |
mel_spectrogram_pipeline( | |
audio_file_path, | |
output_image='mel_spectrogram.png', | |
output_audio_griffin='griffin_reconstructed_audio.wav', | |
output_audio_melgan='melgan_reconstructed_audio.wav', | |
extraction_method='pixel', # Choose 'pixel' or 'ifft' | |
decoding_method='griffin' # Choose 'griffin' or 'melgan' | |
) | |
``` | |
# Adding Extra Heads :
These snippets sketch how the audio and vision heads were attached to the language model; `LM_MODEL` is assumed to be the already-loaded base language model.
### Speech Encoder-Decoder Model :
```python
from transformers import (
    AutoFeatureExtractor,
    AutoTokenizer,
    SpeechEncoderDecoderModel,
)

print('Add Audio...')
# Add Head
# Combine pre-trained encoder and pre-trained decoder to form a Seq2Seq model
_AudioFeatureExtractor = AutoFeatureExtractor.from_pretrained("openai/whisper-small")
_AudioTokenizer = AutoTokenizer.from_pretrained("openai/whisper-small")
_SpeechEncoderDecoder = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained("openai/whisper-small", "openai/whisper-small")
# Add Pad tokens
_SpeechEncoderDecoder.config.decoder_start_token_id = _AudioTokenizer.cls_token_id
_SpeechEncoderDecoder.config.pad_token_id = _AudioTokenizer.pad_token_id
LM_MODEL.SpeechEncoderDecoder = _SpeechEncoderDecoder
# Add Sub Components
LM_MODEL.Decoder_AudioTokenizer = _AudioTokenizer
LM_MODEL.Encoder_AudioFeatureExtractor = _AudioFeatureExtractor
LM_MODEL
```
### Vision Encoder-Decoder Model :
```python
from transformers import VisionEncoderDecoderModel

print('Add Vision...')
# Add Head
# Combine pre-trained encoder and pre-trained decoder to form a Seq2Seq model
Vmodel = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k", "LeroyDyer/Mixtral_AI_Tiny"
)
_Encoder_ImageProcessor = Vmodel.encoder
_Decoder_ImageTokenizer = Vmodel.decoder
_VisionEncoderDecoderModel = Vmodel
# Attach the vision head
LM_MODEL.VisionEncoderDecoder = _VisionEncoderDecoderModel
# Add Sub Components
LM_MODEL.Encoder_ImageProcessor = _Encoder_ImageProcessor
LM_MODEL.Decoder_ImageTokenizer = _Decoder_ImageTokenizer
LM_MODEL
```
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) | |
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_LeroyDyer__SpydazWebAI_Human_AGI) | |
| Metric |Value| | |
|-------------------|----:| | |
|Avg. | 9.88| | |
|IFEval (0-Shot) |33.88| | |
|BBH (3-Shot) | 7.45| | |
|MATH Lvl 5 (4-Shot)| 0.91| | |
|GPQA (0-shot) | 4.36| | |
|MuSR (0-shot) | 7.38| | |
|MMLU-PRO (5-shot) | 5.32| | |