hhsavich committed d38979f (1 parent: 8a57a67)

Update README.md

Files changed (1): README.md (+140, -3)
# Model Card for LatAm Accent Determination

A Wav2Vec2 model that classifies audio by the speaker's accent as Puerto Rican, Colombian, Venezuelan, Peruvian, or Chilean Spanish.

# Table of Contents

- [Model Card for LatAm Accent Determination](#model-card-for-latam-accent-determination)
- [Table of Contents](#table-of-contents)
- [Model Details](#model-details)
  - [Model Description](#model-description)
- [Uses](#uses)
  - [Direct Use](#direct-use)
  - [Out-of-Scope Use](#out-of-scope-use)
- [Bias, Risks, and Limitations](#bias-risks-and-limitations)
- [Training Details](#training-details)
  - [Training Data](#training-data)
  - [Training Procedure](#training-procedure)
    - [Preprocessing](#preprocessing)
    - [Speeds, Sizes, Times](#speeds-sizes-times)
- [Evaluation](#evaluation)
  - [Testing Data, Factors & Metrics](#testing-data-factors--metrics)
    - [Testing Data](#testing-data)
    - [Factors](#factors)
    - [Metrics](#metrics)
  - [Results](#results)
- [Model Examination](#model-examination)
- [Technical Specifications](#technical-specifications)
  - [Model Architecture and Objective](#model-architecture-and-objective)
  - [Compute Infrastructure](#compute-infrastructure)
    - [Hardware](#hardware)
    - [Software](#software)
- [Model Card Authors](#model-card-authors)
- [Model Card Contact](#model-card-contact)

# Model Details

## Model Description

A Wav2Vec2 model that classifies audio by the speaker's accent as Puerto Rican, Colombian, Venezuelan, Peruvian, or Chilean Spanish.

- **Developed by:** Henry Savich
- **Shared by:** Henry Savich
- **Model type:** Audio classification (accent identification)
- **Language(s) (NLP):** es
- **License:** openrail
- **Parent Model:** Wav2Vec2 Base
- **Resources for more information:**
  - [GitHub Repo](https://github.com/HSavich/dialect_discrimination)

# Uses

## Direct Use

Classify an audio clip as Puerto Rican, Peruvian, Venezuelan, Colombian, or Chilean Spanish.
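
A minimal inference sketch follows; it is an illustration, not an official snippet. It assumes the checkpoint is published as a `Wav2Vec2ForSequenceClassification` model together with its feature extractor, and the repo id `hhsavich/latam-accent-determination` is a placeholder.

```python
# Hypothetical usage sketch: load the fine-tuned checkpoint and classify one clip.
# The model id below is a placeholder; substitute the actual Hugging Face repo id.
import torch
import librosa
from transformers import AutoFeatureExtractor, Wav2Vec2ForSequenceClassification

model_id = "hhsavich/latam-accent-determination"  # assumed repo id
extractor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id)
model.eval()

# Wav2Vec2 Base expects 16 kHz mono audio.
speech, _ = librosa.load("clip.wav", sr=16_000, mono=True)
inputs = extractor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted = model.config.id2label[int(logits.argmax(dim=-1))]
print(predicted)  # label names depend on how the checkpoint was saved
```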

## Out-of-Scope Use

The model was trained on speakers reciting pre-chosen sentences, so it captures no knowledge of lexical differences between dialects.

# Bias, Risks, and Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.

# Training Details

## Training Data

OpenSLR 71, 72, 73, 74, 75, 76
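
As a sketch of how these corpora could be assembled (not necessarily the pipeline used here), the Hugging Face `openslr` dataset exposes them as configurations `SLR71`–`SLR76`; the `trust_remote_code` flag and the added `accent` column are assumptions.

```python
# Sketch: pull OpenSLR 71-76 from the Hugging Face `openslr` dataset and tag each
# clip with its source corpus. The script-based loader may need an older `datasets`
# release or trust_remote_code=True; treat this as illustrative only.
from datasets import concatenate_datasets, load_dataset

configs = ["SLR71", "SLR72", "SLR73", "SLR74", "SLR75", "SLR76"]
parts = []
for cfg in configs:
    ds = load_dataset("openslr", cfg, split="train", trust_remote_code=True)
    ds = ds.add_column("accent", [cfg] * len(ds))  # hypothetical label column
    parts.append(ds)

corpus = concatenate_datasets(parts)
print(corpus)
```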

## Training Procedure

### Preprocessing

The data was train-test split by speaker, so that the model could not achieve high test accuracy simply by recognizing individual voices.
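
One way to realize such a speaker-disjoint split is a grouped split on a speaker identifier, sketched below with scikit-learn's `GroupShuffleSplit`; the `speaker_id` column name and the 80/20 ratio are assumptions, not details taken from this card.

```python
# Sketch: split clips so that no speaker appears in both train and test.
# The speaker_id column name is hypothetical; adapt it to the corpus metadata.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def speaker_disjoint_split(dataset, speaker_column="speaker_id", test_size=0.2, seed=42):
    groups = dataset[speaker_column]
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(np.zeros(len(groups)), groups=groups))
    return dataset.select(train_idx), dataset.select(test_idx)

# train_ds, test_ds = speaker_disjoint_split(corpus)
```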

### Speeds, Sizes, Times

Trained on ~3,000 five-second audio clips. Training is lightweight, taking under an hour on Google Colaboratory premium GPUs.
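
For orientation, here is a hedged fine-tuning sketch; the hyperparameters, label set, and column names are illustrative assumptions rather than the settings actually used.

```python
# Illustrative fine-tuning sketch: Wav2Vec2 Base with a classification head.
# Hyperparameters and label names are assumptions, not the author's settings.
from transformers import (AutoFeatureExtractor, Trainer, TrainingArguments,
                          Wav2Vec2ForSequenceClassification)

labels = ["chilean", "colombian", "peruvian", "puerto_rican", "venezuelan"]  # assumed
extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2ForSequenceClassification.from_pretrained(
    "facebook/wav2vec2-base",
    num_labels=len(labels),
    label2id={l: i for i, l in enumerate(labels)},
    id2label={i: l for i, l in enumerate(labels)},
)

def preprocess(batch):
    # Pad/truncate each clip to 5 s at 16 kHz, matching the clip length above.
    audio = [x["array"] for x in batch["audio"]]
    out = extractor(audio, sampling_rate=16_000, max_length=16_000 * 5,
                    truncation=True, padding="max_length")
    out["labels"] = batch["label"]  # assumes an integer label column was added
    return out

# train_ds = train_ds.map(preprocess, batched=True, remove_columns=train_ds.column_names)
# test_ds = test_ds.map(preprocess, batched=True, remove_columns=test_ds.column_names)

args = TrainingArguments(output_dir="accent-clf", per_device_train_batch_size=8,
                         num_train_epochs=3, learning_rate=3e-5, fp16=True)
# trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=test_ds)
# trainer.train()
```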

# Evaluation

## Testing Data, Factors & Metrics

### Testing Data

OpenSLR 71, 72, 73, 74, 75, 76 ([openslr on the Hugging Face Hub](https://huggingface.co/datasets/openslr))

### Factors

Audio quality: the training and testing data are higher quality than can be expected from found audio.

### Metrics

Accuracy
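
The sketch below shows how the reported accuracy and the confusion matrix discussed under Model Examination could be computed from test-set predictions; `trainer` and `test_ds` refer to the hypothetical fine-tuning sketch above.

```python
# Sketch: test accuracy and confusion matrix from the (hypothetical) trainer above.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

pred = trainer.predict(test_ds)
y_true = pred.label_ids
y_pred = np.argmax(pred.predictions, axis=-1)

print("accuracy:", accuracy_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
```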

## Results

98.0% accuracy on the held-out test set.

# Model Examination

Even when splitting on speakers, our model achieves excellent accuracy on the test set. This is interesting because it indicates that accent classification, at least at this granularity, is an easier task than voice identification, which could just as easily have satisfied the training objective.

The confusion matrix shows that Basque is the most easily distinguished, which should be expected, as it is the only language that isn't Spanish. Puerto Rican was the hardest to identify in the test set, but I think this has more to do with Puerto Rican having the least data than with anything about the accent itself.

I think that if a dataset of the same size were used for this experiment but with more speakers (and therefore less fitting to individual voices), we could expect near-perfect accuracy.

# Technical Specifications

## Model Architecture and Objective

Wav2Vec2 (base) with a classification head, trained to predict the speaker's accent.

## Compute Infrastructure

Google Colaboratory Pro+

### Hardware

Google Colaboratory Pro+ premium GPUs

### Software

PyTorch via Hugging Face Transformers

# Model Card Authors

Henry Savich

# Model Card Contact

henry.h.savich@vanderbilt.edu