Spaces · Build error
ahmedghani committed
Commit 8235b4f
Parent(s): f8fec52
initial commit
Browse files
- CODE_OF_CONDUCT.md +78 -0
- CONTRIBUTING.md +25 -0
- LICENSE +437 -0
- README.md +123 -13
- app.py +100 -0
- packages.txt +2 -0
- requirements.txt +18 -0
- svoice/__init__.py +5 -0
- svoice/data/__init__.py +5 -0
- svoice/data/audio.py +89 -0
- svoice/data/data.py +207 -0
- svoice/data/preprocess.py +74 -0
- svoice/distrib.py +95 -0
- svoice/evaluate.py +212 -0
- svoice/evaluate_auto_select.py +184 -0
- svoice/executor.py +85 -0
- svoice/models/__init__.py +5 -0
- svoice/models/sisnr_loss.py +124 -0
- svoice/models/swave.py +294 -0
- svoice/separate.py +174 -0
- svoice/solver.py +227 -0
- svoice/utils.py +241 -0
CODE_OF_CONDUCT.md
ADDED
@@ -0,0 +1,78 @@
# Code of Conduct

## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to make participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
  advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
  address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
  professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies within all project spaces, and it also applies when
an individual is representing the project or its community in public spaces.
Examples of representing a project or community include using an official
project e-mail address, posting via an official social media account, or acting
as an appointed representative at an online or offline event. Representation of
a project may be further defined and clarified by project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at <opensource-conduct@fb.com>. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq
CONTRIBUTING.md
ADDED
@@ -0,0 +1,25 @@
# Contributing to Denoiser

## Pull Requests

In order to accept your pull request, we need you to submit a CLA. You only need
to do this once to work on any of Facebook's open source projects.

Complete your CLA here: <https://code.facebook.com/cla>

Demucs is the implementation of a research paper.
Therefore, we do not plan on accepting many pull requests for new features.
We certainly welcome them for bug fixes.

## Issues

We use GitHub issues to track public bugs. Please ensure your description is
clear and has sufficient instructions to be able to reproduce the issue.
Please first check existing issues as well as the README for existing solutions.

## License
By contributing to this repository, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.
LICENSE
ADDED
@@ -0,0 +1,437 @@
Attribution-NonCommercial-ShareAlike 4.0 International

=======================================================================

Creative Commons Corporation ("Creative Commons") is not a law firm and
does not provide legal services or legal advice. Distribution of
Creative Commons public licenses does not create a lawyer-client or
other relationship. Creative Commons makes its licenses and related
information available on an "as-is" basis. Creative Commons gives no
warranties regarding its licenses, any material licensed under their
terms and conditions, or any related information. Creative Commons
disclaims all liability for damages resulting from their use to the
fullest extent possible.

Using Creative Commons Public Licenses

Creative Commons public licenses provide a standard set of terms and
conditions that creators and other rights holders may use to share
original works of authorship and other material subject to copyright
and certain other rights specified in the public license below. The
following considerations are for informational purposes only, are not
exhaustive, and do not form part of our licenses.

     Considerations for licensors: Our public licenses are
     intended for use by those authorized to give the public
     permission to use material in ways otherwise restricted by
     copyright and certain other rights. Our licenses are
     irrevocable. Licensors should read and understand the terms
     and conditions of the license they choose before applying it.
     Licensors should also secure all rights necessary before
     applying our licenses so that the public can reuse the
     material as expected. Licensors should clearly mark any
     material not subject to the license. This includes other CC-
     licensed material, or material used under an exception or
     limitation to copyright. More considerations for licensors:
     wiki.creativecommons.org/Considerations_for_licensors

     Considerations for the public: By using one of our public
     licenses, a licensor grants the public permission to use the
     licensed material under specified terms and conditions. If
     the licensor's permission is not necessary for any reason--for
     example, because of any applicable exception or limitation to
     copyright--then that use is not regulated by the license. Our
     licenses grant only permissions under copyright and certain
     other rights that a licensor has authority to grant. Use of
     the licensed material may still be restricted for other
     reasons, including because others have copyright or other
     rights in the material. A licensor may make special requests,
     such as asking that all changes be marked or described.
     Although not required by our licenses, you are encouraged to
     respect those requests where reasonable. More considerations
     for the public:
     wiki.creativecommons.org/Considerations_for_licensees

=======================================================================

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International
Public License

By exercising the Licensed Rights (defined below), You accept and agree
to be bound by the terms and conditions of this Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International Public License
("Public License"). To the extent this Public License may be
interpreted as a contract, You are granted the Licensed Rights in
consideration of Your acceptance of these terms and conditions, and the
Licensor grants You such rights in consideration of benefits the
Licensor receives from making the Licensed Material available under
these terms and conditions.


Section 1 -- Definitions.

  a. Adapted Material means material subject to Copyright and Similar
     Rights that is derived from or based upon the Licensed Material
     and in which the Licensed Material is translated, altered,
     arranged, transformed, or otherwise modified in a manner requiring
     permission under the Copyright and Similar Rights held by the
     Licensor. For purposes of this Public License, where the Licensed
     Material is a musical work, performance, or sound recording,
     Adapted Material is always produced where the Licensed Material is
     synched in timed relation with a moving image.

  b. Adapter's License means the license You apply to Your Copyright
     and Similar Rights in Your contributions to Adapted Material in
     accordance with the terms and conditions of this Public License.

  c. BY-NC-SA Compatible License means a license listed at
     creativecommons.org/compatiblelicenses, approved by Creative
     Commons as essentially the equivalent of this Public License.

  d. Copyright and Similar Rights means copyright and/or similar rights
     closely related to copyright including, without limitation,
     performance, broadcast, sound recording, and Sui Generis Database
     Rights, without regard to how the rights are labeled or
     categorized. For purposes of this Public License, the rights
     specified in Section 2(b)(1)-(2) are not Copyright and Similar
     Rights.

  e. Effective Technological Measures means those measures that, in the
     absence of proper authority, may not be circumvented under laws
     fulfilling obligations under Article 11 of the WIPO Copyright
     Treaty adopted on December 20, 1996, and/or similar international
     agreements.

  f. Exceptions and Limitations means fair use, fair dealing, and/or
     any other exception or limitation to Copyright and Similar Rights
     that applies to Your use of the Licensed Material.

  g. License Elements means the license attributes listed in the name
     of a Creative Commons Public License. The License Elements of this
     Public License are Attribution, NonCommercial, and ShareAlike.

  h. Licensed Material means the artistic or literary work, database,
     or other material to which the Licensor applied this Public
     License.

  i. Licensed Rights means the rights granted to You subject to the
     terms and conditions of this Public License, which are limited to
     all Copyright and Similar Rights that apply to Your use of the
     Licensed Material and that the Licensor has authority to license.

  j. Licensor means the individual(s) or entity(ies) granting rights
     under this Public License.

  k. NonCommercial means not primarily intended for or directed towards
     commercial advantage or monetary compensation. For purposes of
     this Public License, the exchange of the Licensed Material for
     other material subject to Copyright and Similar Rights by digital
     file-sharing or similar means is NonCommercial provided there is
     no payment of monetary compensation in connection with the
     exchange.

  l. Share means to provide material to the public by any means or
     process that requires permission under the Licensed Rights, such
     as reproduction, public display, public performance, distribution,
     dissemination, communication, or importation, and to make material
     available to the public including in ways that members of the
     public may access the material from a place and at a time
     individually chosen by them.

  m. Sui Generis Database Rights means rights other than copyright
     resulting from Directive 96/9/EC of the European Parliament and of
     the Council of 11 March 1996 on the legal protection of databases,
     as amended and/or succeeded, as well as other essentially
     equivalent rights anywhere in the world.

  n. You means the individual or entity exercising the Licensed Rights
     under this Public License. Your has a corresponding meaning.


Section 2 -- Scope.

  a. License grant.

       1. Subject to the terms and conditions of this Public License,
          the Licensor hereby grants You a worldwide, royalty-free,
          non-sublicensable, non-exclusive, irrevocable license to
          exercise the Licensed Rights in the Licensed Material to:

            a. reproduce and Share the Licensed Material, in whole or
               in part, for NonCommercial purposes only; and

            b. produce, reproduce, and Share Adapted Material for
               NonCommercial purposes only.

       2. Exceptions and Limitations. For the avoidance of doubt, where
          Exceptions and Limitations apply to Your use, this Public
          License does not apply, and You do not need to comply with
          its terms and conditions.

       3. Term. The term of this Public License is specified in Section
          6(a).

       4. Media and formats; technical modifications allowed. The
          Licensor authorizes You to exercise the Licensed Rights in
          all media and formats whether now known or hereafter created,
          and to make technical modifications necessary to do so. The
          Licensor waives and/or agrees not to assert any right or
          authority to forbid You from making technical modifications
          necessary to exercise the Licensed Rights, including
          technical modifications necessary to circumvent Effective
          Technological Measures. For purposes of this Public License,
          simply making modifications authorized by this Section 2(a)
          (4) never produces Adapted Material.

       5. Downstream recipients.

            a. Offer from the Licensor -- Licensed Material. Every
               recipient of the Licensed Material automatically
               receives an offer from the Licensor to exercise the
               Licensed Rights under the terms and conditions of this
               Public License.

            b. Additional offer from the Licensor -- Adapted Material.
               Every recipient of Adapted Material from You
               automatically receives an offer from the Licensor to
               exercise the Licensed Rights in the Adapted Material
               under the conditions of the Adapter's License You apply.

            c. No downstream restrictions. You may not offer or impose
               any additional or different terms or conditions on, or
               apply any Effective Technological Measures to, the
               Licensed Material if doing so restricts exercise of the
               Licensed Rights by any recipient of the Licensed
               Material.

       6. No endorsement. Nothing in this Public License constitutes or
          may be construed as permission to assert or imply that You
          are, or that Your use of the Licensed Material is, connected
          with, or sponsored, endorsed, or granted official status by,
          the Licensor or others designated to receive attribution as
          provided in Section 3(a)(1)(A)(i).

  b. Other rights.

       1. Moral rights, such as the right of integrity, are not
          licensed under this Public License, nor are publicity,
          privacy, and/or other similar personality rights; however, to
          the extent possible, the Licensor waives and/or agrees not to
          assert any such rights held by the Licensor to the limited
          extent necessary to allow You to exercise the Licensed
          Rights, but not otherwise.

       2. Patent and trademark rights are not licensed under this
          Public License.

       3. To the extent possible, the Licensor waives any right to
          collect royalties from You for the exercise of the Licensed
          Rights, whether directly or through a collecting society
          under any voluntary or waivable statutory or compulsory
          licensing scheme. In all other cases the Licensor expressly
          reserves any right to collect such royalties, including when
          the Licensed Material is used other than for NonCommercial
          purposes.


Section 3 -- License Conditions.

Your exercise of the Licensed Rights is expressly made subject to the
following conditions.

  a. Attribution.

       1. If You Share the Licensed Material (including in modified
          form), You must:

            a. retain the following if it is supplied by the Licensor
               with the Licensed Material:

                 i. identification of the creator(s) of the Licensed
                    Material and any others designated to receive
                    attribution, in any reasonable manner requested by
                    the Licensor (including by pseudonym if
                    designated);

                ii. a copyright notice;

               iii. a notice that refers to this Public License;

                iv. a notice that refers to the disclaimer of
                    warranties;

                 v. a URI or hyperlink to the Licensed Material to the
                    extent reasonably practicable;

            b. indicate if You modified the Licensed Material and
               retain an indication of any previous modifications; and

            c. indicate the Licensed Material is licensed under this
               Public License, and include the text of, or the URI or
               hyperlink to, this Public License.

       2. You may satisfy the conditions in Section 3(a)(1) in any
          reasonable manner based on the medium, means, and context in
          which You Share the Licensed Material. For example, it may be
          reasonable to satisfy the conditions by providing a URI or
          hyperlink to a resource that includes the required
          information.
       3. If requested by the Licensor, You must remove any of the
          information required by Section 3(a)(1)(A) to the extent
          reasonably practicable.

  b. ShareAlike.

     In addition to the conditions in Section 3(a), if You Share
     Adapted Material You produce, the following conditions also apply.

       1. The Adapter's License You apply must be a Creative Commons
          license with the same License Elements, this version or
          later, or a BY-NC-SA Compatible License.

       2. You must include the text of, or the URI or hyperlink to, the
          Adapter's License You apply. You may satisfy this condition
          in any reasonable manner based on the medium, means, and
          context in which You Share Adapted Material.

       3. You may not offer or impose any additional or different terms
          or conditions on, or apply any Effective Technological
          Measures to, Adapted Material that restrict exercise of the
          rights granted under the Adapter's License You apply.


Section 4 -- Sui Generis Database Rights.

Where the Licensed Rights include Sui Generis Database Rights that
apply to Your use of the Licensed Material:

  a. for the avoidance of doubt, Section 2(a)(1) grants You the right
     to extract, reuse, reproduce, and Share all or a substantial
     portion of the contents of the database for NonCommercial purposes
     only;

  b. if You include all or a substantial portion of the database
     contents in a database in which You have Sui Generis Database
     Rights, then the database in which You have Sui Generis Database
     Rights (but not its individual contents) is Adapted Material,
     including for purposes of Section 3(b); and

  c. You must comply with the conditions in Section 3(a) if You Share
     all or a substantial portion of the contents of the database.

For the avoidance of doubt, this Section 4 supplements and does not
replace Your obligations under this Public License where the Licensed
Rights include other Copyright and Similar Rights.


Section 5 -- Disclaimer of Warranties and Limitation of Liability.

  a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
     EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
     AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
     ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
     IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
     WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
     PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
     ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
     KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
     ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.

  b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
     TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
     NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
     INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
     COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
     USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
     ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
     DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
     IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.

  c. The disclaimer of warranties and limitation of liability provided
     above shall be interpreted in a manner that, to the extent
     possible, most closely approximates an absolute disclaimer and
     waiver of all liability.


Section 6 -- Term and Termination.

  a. This Public License applies for the term of the Copyright and
     Similar Rights licensed here. However, if You fail to comply with
     this Public License, then Your rights under this Public License
     terminate automatically.

  b. Where Your right to use the Licensed Material has terminated under
     Section 6(a), it reinstates:

       1. automatically as of the date the violation is cured, provided
          it is cured within 30 days of Your discovery of the
          violation; or

       2. upon express reinstatement by the Licensor.

     For the avoidance of doubt, this Section 6(b) does not affect any
     right the Licensor may have to seek remedies for Your violations
     of this Public License.

  c. For the avoidance of doubt, the Licensor may also offer the
     Licensed Material under separate terms or conditions or stop
     distributing the Licensed Material at any time; however, doing so
     will not terminate this Public License.

  d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
     License.


Section 7 -- Other Terms and Conditions.

  a. The Licensor shall not be bound by any additional or different
     terms or conditions communicated by You unless expressly agreed.

  b. Any arrangements, understandings, or agreements regarding the
     Licensed Material not stated herein are separate from and
     independent of the terms and conditions of this Public License.


Section 8 -- Interpretation.

  a. For the avoidance of doubt, this Public License does not, and
     shall not be interpreted to, reduce, limit, restrict, or impose
     conditions on any use of the Licensed Material that could lawfully
     be made without permission under this Public License.

  b. To the extent possible, if any provision of this Public License is
     deemed unenforceable, it shall be automatically reformed to the
     minimum extent necessary to make it enforceable. If the provision
     cannot be reformed, it shall be severed from this Public License
     without affecting the enforceability of the remaining terms and
     conditions.

  c. No term or condition of this Public License will be waived and no
     failure to comply consented to unless expressly agreed to by the
     Licensor.

  d. Nothing in this Public License constitutes or may be interpreted
     as a limitation upon, or waiver of, any privileges and immunities
     that apply to the Licensor or You, including from the legal
     processes of any jurisdiction or authority.

=======================================================================

Creative Commons is not a party to its public
licenses. Notwithstanding, Creative Commons may elect to apply one of
its public licenses to material it publishes and in those instances
will be considered the "Licensor." The text of the Creative Commons
public licenses is dedicated to the public domain under the CC0 Public
Domain Dedication. Except for the limited purpose of indicating that
material is shared under a Creative Commons public license or as
otherwise permitted by the Creative Commons policies published at
creativecommons.org/policies, Creative Commons does not authorize the
use of the trademark "Creative Commons" or any other trademark or logo
of Creative Commons without its prior written consent including,
without limitation, in connection with any unauthorized modifications
to any of its public licenses or any other arrangements,
understandings, or agreements concerning use of licensed material. For
the avoidance of doubt, this paragraph does not form part of the
public licenses.

Creative Commons may be contacted at creativecommons.org.
README.md
CHANGED
@@ -1,13 +1,123 @@
# Speaker Voice Separation using Neural Nets Gradio Demo

## Installation

```bash
git clone https://github.com/Muhammad-Ahmad-Ghani/svoice_demo.git
cd svoice_demo
conda create -n svoice python=3.7 -y
conda activate svoice
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch -y
pip install -r requirements.txt
```

| Pretrained-Model | Dataset | Epochs | Train Loss | Valid Loss |
|:------------:|:------------:|:------------:|:------------:|:------------:|
| [checkpoint.th](https://drive.google.com/drive/folders/1WzhvH1oIB9LqoTyItA6jViTRai5aURzJ?usp=sharing) | Librimix-7 (16k-mix_clean) | 31 | 0.04 | 0.64 |

This is an intermediate checkpoint, provided for demo purposes only.

Create the directory ```outputs/exp_``` and save the checkpoint there:
```
svoice_demo
├── outputs
│   └── exp_
│       └── checkpoint.th
...
```

## Running the End-to-End Project
#### Terminal 1
```bash
conda activate svoice
python demo.py
```

## Training
Create the ```mix_clean``` dataset with a sample rate of ```16K``` using the [librimix](https://github.com/shakeddovrat/librimix) repo.

Dataset structure:
```
svoice_demo
├── Libri7Mix_Dataset
│   └── wav16k
│       └── min
│       │   └── dev
│       │   └── ...
│       │   └── test
│       │   └── ...
│       │   └── train-360
│       │   └── ...
...
```

#### Create ```metadata``` files
For the Librimix7 dataset:
```
bash create_metadata_librimix7.sh
```

For the Librimix10 dataset:
```
bash create_metadata_librimix10.sh
```

Change ```conf/config.yaml``` according to your setup. Set ```C: 10``` at line 66 for the number of speakers.

```
python train.py
```
This will automatically read all the configurations from the `conf/config.yaml` file.
To learn more about training, refer to the original [svoice](https://github.com/facebookresearch/svoice) repo.

#### Distributed Training

```
python train.py ddp=1
```

### Evaluating

```
python -m svoice.evaluate <path to the model> <path to folder containing mix.json and all target separated channels json files s<ID>.json>
```

### Citation

The svoice code is borrowed from the original [svoice](https://github.com/facebookresearch/svoice) repository. All rights to the code are reserved by [META Research](https://github.com/facebookresearch).

```
@inproceedings{nachmani2020voice,
  title={Voice Separation with an Unknown Number of Multiple Speakers},
  author={Nachmani, Eliya and Adi, Yossi and Wolf, Lior},
  booktitle={Proceedings of the 37th international conference on Machine learning},
  year={2020}
}
```
```
@misc{cosentino2020librimix,
  title={LibriMix: An Open-Source Dataset for Generalizable Speech Separation},
  author={Joris Cosentino and Manuel Pariente and Samuele Cornell and Antoine Deleforge and Emmanuel Vincent},
  year={2020},
  eprint={2005.11262},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}
```
## License
This repository is released under the CC-BY-NC-SA 4.0 license, as found in the [LICENSE](LICENSE) file.

The files `svoice/models/sisnr_loss.py` and `svoice/data/preprocess.py` were adapted from the [kaituoxu/Conv-TasNet][convtas] repository. It is an unofficial implementation of the [Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation][convtas-paper] paper, released under the MIT License.
Additionally, several input manipulation functions were borrowed and modified from the [yluo42/TAC][tac] repository, released under the CC BY-NC-SA 3.0 License.

[icml]: https://arxiv.org/abs/2003.01531.pdf
[icassp]: https://arxiv.org/pdf/2011.02329.pdf
[web]: https://enk100.github.io/speaker_separation/
[pytorch]: https://pytorch.org/
[hydra]: https://github.com/facebookresearch/hydra
[hydra-web]: https://hydra.cc/
[convtas]: https://github.com/kaituoxu/Conv-TasNet
[convtas-paper]: https://arxiv.org/pdf/1809.07454.pdf
[tac]: https://github.com/yluo42/TAC
[nprirgen]: https://github.com/ty274/rir-generator
[rir]: https://asa.scitation.org/doi/10.1121/1.382599
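For quick checks outside the Gradio app, the same entry points that `app.py` (below) relies on can be driven from a short script. This is a minimal sketch, not part of the committed README: it assumes `load_model()` and `separate_demo(mix_dir=...)` are importable by name and behave as they are used in `app.py` (checkpoint read from `outputs/exp_/checkpoint.th`, per-speaker tracks written to `./separated`).

```python
# Sketch: run separation on a folder of mixtures without the Gradio UI.
# Assumes svoice.separate exposes load_model/separate_demo as used in app.py.
import os
from glob import glob

from svoice.separate import load_model, separate_demo

os.makedirs("input", exist_ok=True)   # place one or more mixture .wav files here
load_model()                          # loads the checkpoint from outputs/exp_/
separate_demo(mix_dir="./input")      # writes separated tracks to ./separated

for path in sorted(glob("separated/*.wav")):
    print("separated track:", path)
```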
app.py
ADDED
@@ -0,0 +1,100 @@
from svoice.separate import *
import scipy.io as sio
from scipy.io.wavfile import write
import gradio as gr
import os
from transformers import AutoProcessor, pipeline
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq
from glob import glob
load_model()

BASE_PATH = os.path.dirname(os.path.abspath(__file__))
os.makedirs('input', exist_ok=True)
os.makedirs('separated', exist_ok=True)
os.makedirs('whisper_checkpoint', exist_ok=True)

print("Loading ASR model...")
processor = AutoProcessor.from_pretrained("openai/whisper-small")
if not os.path.exists("whisper_checkpoint"):
    model = ORTModelForSpeechSeq2Seq.from_pretrained("openai/whisper-small", from_transformers=True)
    speech_recognition_pipeline = pipeline(
        "automatic-speech-recognition",
        model=model,
        feature_extractor=processor.feature_extractor,
        tokenizer=processor.tokenizer,
    )
    model.save_pretrained("whisper_checkpoint")
else:
    model = ORTModelForSpeechSeq2Seq.from_pretrained("whisper_checkpoint", from_transformers=False)
    speech_recognition_pipeline = pipeline(
        "automatic-speech-recognition",
        model=model,
        feature_extractor=processor.feature_extractor,
        tokenizer=processor.tokenizer,
    )
print("Whisper ASR model loaded.")

def separator(audio, rec_audio):
    outputs = {}

    if audio:
        write('input/original.wav', audio[0], audio[1])
    elif rec_audio:
        write('input/original.wav', rec_audio[0], rec_audio[1])

    separate_demo(mix_dir="./input")
    separated_files = glob(os.path.join('separated', "*.wav"))
    separated_files = [f for f in separated_files if "original.wav" not in f]
    outputs['transcripts'] = []
    for file in sorted(separated_files):
        separated_audio = sio.wavfile.read(file)
        outputs['transcripts'].append(speech_recognition_pipeline(separated_audio[1])['text'])
    return sorted(separated_files) + outputs['transcripts']

def set_example_audio(example: list) -> dict:
    return gr.Audio.update(value=example[0])

demo = gr.Blocks()
with demo:
    gr.Markdown('''
    <center>
    <h1>Multiple Voice Separation with Transcription DEMO</h1>
    <div style="display:flex;align-items:center;justify-content:center;"><iframe src="https://streamable.com/e/0x8osl?autoplay=1&nocontrols=1" frameborder="0" allow="autoplay"></iframe></div>
    <p>
    This is a demo for the multiple voice separation algorithm. The algorithm is trained on the LibriMix7 dataset and can be used to separate multiple voices from a single audio file.
    </p>
    </center>
    ''')

    with gr.Row():
        input_audio = gr.Audio(label="Input audio", type="numpy")
        rec_audio = gr.Audio(label="Record Using Microphone", type="numpy", source="microphone")

    with gr.Row():
        output_audio1 = gr.Audio(label='Speaker 1', interactive=False)
        output_text1 = gr.Text(label='Speaker 1', interactive=False)
        output_audio2 = gr.Audio(label='Speaker 2', interactive=False)
        output_text2 = gr.Text(label='Speaker 2', interactive=False)

    with gr.Row():
        output_audio3 = gr.Audio(label='Speaker 3', interactive=False)
        output_text3 = gr.Text(label='Speaker 3', interactive=False)
        output_audio4 = gr.Audio(label='Speaker 4', interactive=False)
        output_text4 = gr.Text(label='Speaker 4', interactive=False)

    with gr.Row():
        output_audio5 = gr.Audio(label='Speaker 5', interactive=False)
        output_text5 = gr.Text(label='Speaker 5', interactive=False)
        output_audio6 = gr.Audio(label='Speaker 6', interactive=False)
        output_text6 = gr.Text(label='Speaker 6', interactive=False)

    with gr.Row():
        output_audio7 = gr.Audio(label='Speaker 7', interactive=False)
        output_text7 = gr.Text(label='Speaker 7', interactive=False)

    outputs_audio = [output_audio1, output_audio2, output_audio3, output_audio4, output_audio5, output_audio6, output_audio7]
    outputs_text = [output_text1, output_text2, output_text3, output_text4, output_text5, output_text6, output_text7]
    button = gr.Button("Separate")
    button.click(separator, inputs=[input_audio, rec_audio], outputs=outputs_audio + outputs_text)

demo.launch()
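The `separator` callback above can also be exercised without clicking through the UI, which is handy for debugging the separation/transcription chain. A hypothetical snippet, assuming it runs in a session where the definitions above are loaded (for example in a REPL before `demo.launch()` is called) and that a 16 kHz mixture exists on disk; the `(rate, data)` tuple mirrors what `gr.Audio(type="numpy")` passes to the callback:

```python
# Sketch: call the Gradio callback directly with a local file (illustrative only).
from scipy.io import wavfile

rate, data = wavfile.read("input/original.wav")   # any 16 kHz mixture wav
results = separator((rate, data), None)           # same signature as the button callback

# separator returns the separated file paths followed by one transcript per file
n = len(results) // 2
for path, text in zip(results[:n], results[n:]):
    print(path, "->", text)
```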
packages.txt
ADDED
@@ -0,0 +1,2 @@
ffmpeg
libsndfile1-dev
requirements.txt
ADDED
@@ -0,0 +1,18 @@
pesq==0.0.2
tqdm
hydra_core==1.0.3
hydra_colorlog==1.0.0
pystoi==0.3.3
librosa==0.7.1
numba==0.48
numpy
flask
flask-cors
uvicorn[standard]
asgiref
gradio
transformers==4.24.0
torch
torchvision
torchaudio
optimum[onnxruntime]==1.5.0
svoice/__init__.py
ADDED
@@ -0,0 +1,5 @@
# Copyright (c) Facebook, Inc. and its affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
svoice/data/__init__.py
ADDED
@@ -0,0 +1,5 @@
# Copyright (c) Facebook, Inc. and its affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
svoice/data/audio.py
ADDED
@@ -0,0 +1,89 @@
# Copyright (c) Facebook, Inc. and its affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
# Author: Alexandre Défossez @adefossez, 2020

import json
from pathlib import Path
import math
import os
import tqdm
import sys

import torchaudio
torchaudio.set_audio_backend("sox_io")
import soundfile as sf
import torch as th
from torch.nn import functional as F


# If used, this should be saved somewhere as it takes quite a bit
# of time to generate
def find_audio_files(path, exts=[".wav"], progress=True):
    audio_files = []
    for root, folders, files in os.walk(path, followlinks=True):
        for file in files:
            file = Path(root) / file
            if file.suffix.lower() in exts:
                audio_files.append(str(os.path.abspath(file)))
    meta = []
    if progress:
        audio_files = tqdm.tqdm(audio_files, ncols=80)
    for file in audio_files:
        # With the sox_io backend set above, torchaudio.info returns an
        # AudioMetaData object whose num_frames is already per channel
        # (it is not a (signal_info, encoding_info) pair as in the old API).
        siginfo = torchaudio.info(file)
        length = siginfo.num_frames
        meta.append((file, length))
    meta.sort()
    return meta


class Audioset:
    def __init__(self, files, length=None, stride=None, pad=True, augment=None):
        """
        files should be a list [(file, length)]
        """
        self.files = files
        self.num_examples = []
        self.length = length
        self.stride = stride or length
        self.augment = augment
        for file, file_length in self.files:
            if length is None:
                examples = 1
            elif file_length < length:
                examples = 1 if pad else 0
            elif pad:
                examples = int(
                    math.ceil((file_length - self.length) / self.stride) + 1)
            else:
                examples = (file_length - self.length) // self.stride + 1
            self.num_examples.append(examples)

    def __len__(self):
        return sum(self.num_examples)

    def __getitem__(self, index):
        for (file, _), examples in zip(self.files, self.num_examples):
            if index >= examples:
                index -= examples
                continue
            num_frames = 0
            offset = 0
            if self.length is not None:
                offset = self.stride * index
                num_frames = self.length
            # out = th.Tensor(sf.read(str(file), start=offset, frames=num_frames)[0]).unsqueeze(0)
            out = torchaudio.load(str(file), frame_offset=offset,
                                  num_frames=num_frames)[0]
            if self.augment:
                out = self.augment(out.squeeze(0).numpy()).unsqueeze(0)
            if num_frames:
                out = F.pad(out, (0, num_frames - out.shape[-1]))
            return out[0]


if __name__ == "__main__":
    json.dump(find_audio_files(sys.argv[1]), sys.stdout, indent=4)
    print()
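A short sketch of how `find_audio_files` and `Audioset` compose: the metadata list of `(path, num_frames)` pairs feeds the dataset, and `length`/`stride` control how each file is cut into fixed-size training segments. The folder path and the 4-second / 16 kHz values below are illustrative, not part of the commit.

```python
# Sketch: index 4-second segments (1-second hop) from a folder of 16 kHz wavs.
from svoice.data.audio import find_audio_files, Audioset

meta = find_audio_files("Libri7Mix_Dataset/wav16k/min/dev/mix_clean")  # [(path, n_frames), ...]
segments = Audioset(meta, length=4 * 16000, stride=1 * 16000, pad=True)

print(len(segments), "segments")
first = segments[0]      # 1-D tensor of 64000 samples, zero-padded if the file is short
print(first.shape)
```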
svoice/data/data.py
ADDED
@@ -0,0 +1,207 @@
# Copyright (c) Facebook, Inc. and its affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
# Authors: Yossi Adi (adiyoss) and Alexandre Défossez (adefossez)

import json
import logging
import math
from pathlib import Path
import os
import re

import librosa
import numpy as np
import torch
import torch.utils.data as data

from .preprocess import preprocess_one_dir
from .audio import Audioset

logger = logging.getLogger(__name__)


def sort(infos): return sorted(
    infos, key=lambda info: int(info[1]), reverse=True)


class Trainset:
    def __init__(self, json_dir, sample_rate=16000, segment=4.0, stride=1.0, pad=True):
        mix_json = os.path.join(json_dir, 'mix.json')
        s_jsons = list()
        s_infos = list()
        sets_re = re.compile(r's[0-9]+.json')
        print(os.listdir(json_dir))
        for s in os.listdir(json_dir):
            if sets_re.search(s):
                s_jsons.append(os.path.join(json_dir, s))

        with open(mix_json, 'r') as f:
            mix_infos = json.load(f)
        for s_json in s_jsons:
            with open(s_json, 'r') as f:
                s_infos.append(json.load(f))

        length = int(sample_rate * segment)
        stride = int(sample_rate * stride)

        kw = {'length': length, 'stride': stride, 'pad': pad}
        self.mix_set = Audioset(sort(mix_infos), **kw)

        self.sets = list()
        for s_info in s_infos:
            self.sets.append(Audioset(sort(s_info), **kw))

        # verify all sets have the same size
        for s in self.sets:
            assert len(s) == len(self.mix_set)

    def __getitem__(self, index):
        mix_sig = self.mix_set[index]
        tgt_sig = [self.sets[i][index] for i in range(len(self.sets))]
        return self.mix_set[index], torch.LongTensor([mix_sig.shape[0]]), torch.stack(tgt_sig)

    def __len__(self):
        return len(self.mix_set)


class Validset:
    """
    Load entire wavs.
    """

    def __init__(self, json_dir):
        mix_json = os.path.join(json_dir, 'mix.json')
        s_jsons = list()
        s_infos = list()
        sets_re = re.compile(r's[0-9]+.json')
        for s in os.listdir(json_dir):
            if sets_re.search(s):
                s_jsons.append(os.path.join(json_dir, s))
        with open(mix_json, 'r') as f:
            mix_infos = json.load(f)
        for s_json in s_jsons:
            with open(s_json, 'r') as f:
                s_infos.append(json.load(f))
        self.mix_set = Audioset(sort(mix_infos))
        self.sets = list()
        for s_info in s_infos:
            self.sets.append(Audioset(sort(s_info)))
        for s in self.sets:
            assert len(s) == len(self.mix_set)

    def __getitem__(self, index):
        mix_sig = self.mix_set[index]
        tgt_sig = [self.sets[i][index] for i in range(len(self.sets))]
        return self.mix_set[index], torch.LongTensor([mix_sig.shape[0]]), torch.stack(tgt_sig)

    def __len__(self):
        return len(self.mix_set)


# The following piece of code was adapted from https://github.com/kaituoxu/Conv-TasNet
# released under the MIT License.
# Author: Kaituo XU
# Created on 2018/12
class EvalDataset(data.Dataset):

    def __init__(self, mix_dir, mix_json, batch_size, sample_rate=8000):
        """
        Args:
            mix_dir: directory including mixture wav files
            mix_json: json file including mixture wav files
        """
        super(EvalDataset, self).__init__()
        assert mix_dir != None or mix_json != None
        if mix_dir is not None:
            # Generate mix.json given mix_dir
            preprocess_one_dir(mix_dir, mix_dir, 'mix',
                               sample_rate=sample_rate)
            mix_json = os.path.join(mix_dir, 'mix.json')
        with open(mix_json, 'r') as f:
            mix_infos = json.load(f)
        # sort it by #samples (impl bucket)
        def sort(infos): return sorted(
            infos, key=lambda info: int(info[1]), reverse=True)
        sorted_mix_infos = sort(mix_infos)
        # generate minibatch information
        minibatch = []
        start = 0
        while True:
            end = min(len(sorted_mix_infos), start + batch_size)
            minibatch.append([sorted_mix_infos[start:end],
                              sample_rate])
            if end == len(sorted_mix_infos):
                break
            start = end
        self.minibatch = minibatch

    def __getitem__(self, index):
        return self.minibatch[index]

    def __len__(self):
        return len(self.minibatch)


class EvalDataLoader(data.DataLoader):
    """
    NOTE: just use batchsize=1 here, so drop_last=True makes no sense here.
    """

    def __init__(self, *args, **kwargs):
        super(EvalDataLoader, self).__init__(*args, **kwargs)
        self.collate_fn = _collate_fn_eval


def _collate_fn_eval(batch):
    """
    Args:
        batch: list, len(batch) = 1. See AudioDataset.__getitem__()
    Returns:
        mixtures_pad: B x T, torch.Tensor
        ilens : B, torch.Tensor
        filenames: a list containing B strings
    """
    # batch should be located in list
    assert len(batch) == 1
    mixtures, filenames = load_mixtures(batch[0])

    # get batch of lengths of input sequences
    ilens = np.array([mix.shape[0] for mix in mixtures])

    # perform padding and convert to tensor
    pad_value = 0
    mixtures_pad = pad_list([torch.from_numpy(mix).float()
                             for mix in mixtures], pad_value)
    ilens = torch.from_numpy(ilens)
    return mixtures_pad, ilens, filenames


def load_mixtures(batch):
    """
    Returns:
        mixtures: a list containing B items, each item is T np.ndarray
        filenames: a list containing B strings
        T varies from item to item.
    """
    mixtures, filenames = [], []
    mix_infos, sample_rate = batch
    # for each utterance
    for mix_info in mix_infos:
        mix_path = mix_info[0]
        # read wav file
        mix, _ = librosa.load(mix_path, sr=sample_rate)
        mixtures.append(mix)
        filenames.append(mix_path)
    return mixtures, filenames


def pad_list(xs, pad_value):
    n_batch = len(xs)
    max_len = max(x.size(0) for x in xs)
    pad = xs[0].new(n_batch, max_len, *xs[0].size()[1:]).fill_(pad_value)
    for i in range(n_batch):
        pad[i, :xs[i].size(0)] = xs[i]
    return pad
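How the evaluation pieces above are meant to be wired together, as a sketch: `EvalDataset` buckets the mixture files into mini-batches of metadata, and `EvalDataLoader` (batch size 1 over those buckets) applies `_collate_fn_eval`, which loads the waveforms and zero-pads them with `pad_list`. The directory name below is illustrative.

```python
# Sketch: iterate padded mixture batches for evaluation.
from svoice.data.data import EvalDataset, EvalDataLoader

dataset = EvalDataset(mix_dir="./input", mix_json=None, batch_size=4, sample_rate=16000)
loader = EvalDataLoader(dataset, batch_size=1, num_workers=0)

for mixtures_pad, ilens, filenames in loader:
    # mixtures_pad: (B, T_max) float tensor, ilens: original lengths, filenames: wav paths
    print(mixtures_pad.shape, ilens.tolist(), filenames)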
svoice/data/preprocess.py
ADDED
@@ -0,0 +1,74 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# The following piece of code was adapted from https://github.com/kaituoxu/Conv-TasNet
# released under the MIT License.
# Author: Kaituo XU
# Created on 2018/12

# Revised by: Eliya Nachmani (enk100), Yossi Adi (adiyoss), Lior Wolf

import argparse
import json
import os

import librosa
from tqdm import tqdm


def preprocess_one_dir(in_dir, out_dir, out_filename, sample_rate=8000):
    file_infos = []
    in_dir = os.path.abspath(in_dir)
    wav_list = os.listdir(in_dir)
    for wav_file in tqdm(wav_list):
        if not wav_file.endswith('.wav'):
            continue
        wav_path = os.path.join(in_dir, wav_file)
        samples, _ = librosa.load(wav_path, sr=sample_rate)
        file_infos.append((wav_path, len(samples)))
    if not os.path.exists(out_dir):
        os.makedirs(out_dir)
    with open(os.path.join(out_dir, out_filename + '.json'), 'w') as f:
        json.dump(file_infos, f, indent=4)


def preprocess(args):
    for data_type in ['tr', 'cv', 'tt']:
        for signal in ['noisy', 'clean']:
            preprocess_one_dir(os.path.join(args.in_dir, data_type, signal),
                               os.path.join(args.out_dir, data_type),
                               signal,
                               sample_rate=args.sample_rate)


def preprocess_alldirs(args):
    for d in os.listdir(args.in_dir):
        local_dir = os.path.join(args.in_dir, d)
        if os.path.isdir(local_dir):
            preprocess_one_dir(os.path.join(args.in_dir, local_dir),
                               os.path.join(args.out_dir),
                               d,
                               sample_rate=args.sample_rate)


if __name__ == "__main__":
    parser = argparse.ArgumentParser("WSJ0 data preprocessing")
    parser.add_argument('--in_dir', type=str, default=None,
                        help='Directory path of wsj0 including tr, cv and tt')
    parser.add_argument('--out_dir', type=str, default=None,
                        help='Directory path to put output files')
    parser.add_argument('--sample_rate', type=int, default=16000,
                        help='Sample rate of audio file')
    parser.add_argument("--one_dir", action="store_true",
                        help="Generate json files from specific directory")
    parser.add_argument("--all_dirs", action="store_true",
                        help="Generate json files from all dirs in specific directory")
    parser.add_argument('--json_name', type=str, default=None,
                        help='The name of the json to be generated. '
                        'To be used only with one-dir option.')
    args = parser.parse_args()
    print(args)
    if args.all_dirs:
        preprocess_alldirs(args)
    elif args.one_dir:
        preprocess_one_dir(args.in_dir, args.out_dir,
                           args.json_name, sample_rate=args.sample_rate)
    else:
        preprocess(args)

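A minimal sketch of how `preprocess_one_dir` could be driven from Python rather than the CLI; the directory paths here are placeholders, not part of the commit:

from svoice.data.preprocess import preprocess_one_dir

# writes egs/mydata/tr/mix.json listing (wav_path, num_samples) pairs
preprocess_one_dir('/path/to/wavs', 'egs/mydata/tr', 'mix', sample_rate=8000)
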
svoice/distrib.py
ADDED
@@ -0,0 +1,95 @@
# Copyright (c) Facebook, Inc. and its affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
# author: adefossez

import logging
import os

import torch
from torch.utils.data.distributed import DistributedSampler
from torch.utils.data import DataLoader, Subset
from torch.nn.parallel.distributed import DistributedDataParallel

logger = logging.getLogger(__name__)
rank = 0
world_size = 1


def init(args):
    """init.
    Initialize DDP using the given rendezvous file.
    """
    global rank, world_size
    if args.ddp:
        assert args.rank is not None and args.world_size is not None
        rank = args.rank
        world_size = args.world_size
    if world_size == 1:
        return
    torch.cuda.set_device(rank)
    torch.distributed.init_process_group(
        backend=args.ddp_backend,
        init_method='file://' + os.path.abspath(args.rendezvous_file),
        world_size=world_size,
        rank=rank)
    logger.debug("Distributed rendezvous went well, rank %d/%d", rank, world_size)


def average(metrics, count=1.):
    """average.
    Average all the relevant metrics across processes.
    `metrics` should be a 1D float32 vector. Returns the average of `metrics`
    over all hosts. You can use `count` to control the weight of each worker.
    """
    if world_size == 1:
        return metrics
    tensor = torch.tensor(list(metrics) + [1], device='cuda', dtype=torch.float32)
    tensor *= count
    torch.distributed.all_reduce(tensor, op=torch.distributed.ReduceOp.SUM)
    return (tensor[:-1] / tensor[-1]).cpu().numpy().tolist()


def wrap(model):
    """wrap.
    Wrap a model with DDP if distributed training is enabled.
    """
    if world_size == 1:
        return model
    else:
        return DistributedDataParallel(
            model,
            device_ids=[torch.cuda.current_device()],
            output_device=torch.cuda.current_device())


def barrier():
    if world_size > 1:
        torch.distributed.barrier()


def loader(dataset, *args, shuffle=False, klass=DataLoader, **kwargs):
    """loader.
    Create a dataloader properly in case of distributed training.
    If a gradient is going to be computed you must set `shuffle=True`.
    :param dataset: the dataset to be parallelized
    :param args: relevant args for the loader
    :param shuffle: shuffle examples
    :param klass: loader class
    :param kwargs: relevant args
    """

    if world_size == 1:
        return klass(dataset, *args, shuffle=shuffle, **kwargs)

    if shuffle:
        # train means we will compute backward, we use DistributedSampler
        sampler = DistributedSampler(dataset)
        # We ignore shuffle, DistributedSampler already shuffles
        return klass(dataset, *args, **kwargs, sampler=sampler)
    else:
        # We make a manual shard, as DistributedSampler otherwise replicates some examples
        dataset = Subset(dataset, list(range(rank, len(dataset), world_size)))
        return klass(dataset, *args, shuffle=shuffle)

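In the single-process case (world_size == 1) the helper above simply builds a plain DataLoader. A minimal sketch, assuming any map-style dataset; the random tensor below is only an illustration:

import torch
from svoice import distrib

dataset = torch.utils.data.TensorDataset(torch.randn(8, 16000))
# with world_size == 1 this is equivalent to DataLoader(dataset, batch_size=2, shuffle=True)
loader = distrib.loader(dataset, batch_size=2, shuffle=True)
for (batch,) in loader:
    print(batch.shape)  # torch.Size([2, 16000])
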
svoice/evaluate.py
ADDED
@@ -0,0 +1,212 @@
# Copyright (c) Facebook, Inc. and its affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

# Authors: Eliya Nachmani (enk100), Yossi Adi (adiyoss), Lior Wolf and Alexandre Defossez (adefossez)

import argparse
from concurrent.futures import ProcessPoolExecutor
import json
import logging
import sys

import numpy as np
from pesq import pesq
from pystoi import stoi
import torch

from .models.sisnr_loss import cal_loss
from .data.data import Validset
from . import distrib
from .utils import bold, deserialize_model, LogProgress


logger = logging.getLogger(__name__)

parser = argparse.ArgumentParser(
    'Evaluate separation performance using MulCat blocks')
parser.add_argument('model_path',
                    help='Path to model file created by training')
parser.add_argument('data_dir',
                    help='directory including mix.json, s1.json, s2.json, ... files')
parser.add_argument('--device', default="cuda")
parser.add_argument('--sdr', type=int, default=0)
parser.add_argument('--sample_rate', default=16000,
                    type=int, help='Sample rate')
parser.add_argument('--num_workers', type=int, default=5)
parser.add_argument('-v', '--verbose', action='store_const', const=logging.DEBUG,
                    default=logging.INFO, help="More logging")


def evaluate(args, model=None, data_loader=None, sr=None):
    total_sisnr = 0
    total_pesq = 0
    total_stoi = 0
    total_cnt = 0
    updates = 5

    # Load model
    if not model:
        pkg = torch.load(args.model_path, map_location=args.device)
        if 'model' in pkg:
            model = pkg['model']
        else:
            model = pkg
        model = deserialize_model(model)
        if 'best_state' in pkg:
            model.load_state_dict(pkg['best_state'])
    logger.debug(model)
    model.eval()
    model.to(args.device)
    # Load data
    if not data_loader:
        dataset = Validset(args.data_dir)
        data_loader = distrib.loader(
            dataset, batch_size=1, num_workers=args.num_workers)
        sr = args.sample_rate
    pendings = []
    with ProcessPoolExecutor(args.num_workers) as pool:
        with torch.no_grad():
            iterator = LogProgress(logger, data_loader, name="Eval estimates")
            for i, data in enumerate(iterator):
                # Get batch data
                mixture, lengths, sources = [x.to(args.device) for x in data]
                # Forward
                with torch.no_grad():
                    mixture /= mixture.max()
                    estimate = model(mixture)[-1]
                sisnr_loss, snr, estimate, reorder_estimate = cal_loss(
                    sources, estimate, lengths)
                reorder_estimate = reorder_estimate.cpu()
                sources = sources.cpu()
                mixture = mixture.cpu()

                pendings.append(
                    pool.submit(_run_metrics, sources, reorder_estimate, mixture, None,
                                sr=sr))
                total_cnt += sources.shape[0]

        for pending in LogProgress(logger, pendings, updates, name="Eval metrics"):
            sisnr_i, pesq_i, stoi_i = pending.result()
            total_sisnr += sisnr_i
            total_pesq += pesq_i
            total_stoi += stoi_i

    metrics = [total_sisnr, total_pesq, total_stoi]
    sisnr, pesq, stoi = distrib.average(
        [m/total_cnt for m in metrics], total_cnt)
    logger.info(
        bold(f'Test set performance: SISNRi={sisnr:.2f} PESQ={pesq}, STOI={stoi}.'))
    return sisnr, pesq, stoi


def _run_metrics(clean, estimate, mix, model, sr, pesq=False):
    if model is not None:
        torch.set_num_threads(1)
        # parallel evaluation here
        with torch.no_grad():
            estimate = model(estimate)[-1]
    estimate = estimate.numpy()
    clean = clean.numpy()
    mix = mix.numpy()
    sisnr = cal_SISNRi(clean, estimate, mix)
    if pesq:
        pesq_i = cal_PESQ(clean, estimate, sr=sr)
        stoi_i = cal_STOI(clean, estimate, sr=sr)
    else:
        pesq_i = 0
        stoi_i = 0
    return sisnr.mean(), pesq_i, stoi_i


def cal_SISNR(ref_sig, out_sig, eps=1e-8):
    """Calculate Scale-Invariant Source-to-Noise Ratio (SI-SNR)
    Args:
        ref_sig: numpy.ndarray, [B, T]
        out_sig: numpy.ndarray, [B, T]
    Returns:
        SISNR
    """
    assert len(ref_sig) == len(out_sig)
    B, T = ref_sig.shape
    ref_sig = ref_sig - np.mean(ref_sig, axis=1).reshape(B, 1)
    out_sig = out_sig - np.mean(out_sig, axis=1).reshape(B, 1)
    ref_energy = (np.sum(ref_sig ** 2, axis=1) + eps).reshape(B, 1)
    proj = (np.sum(ref_sig * out_sig, axis=1).reshape(B, 1)) * \
        ref_sig / ref_energy
    noise = out_sig - proj
    ratio = np.sum(proj ** 2, axis=1) / (np.sum(noise ** 2, axis=1) + eps)
    sisnr = 10 * np.log(ratio + eps) / np.log(10.0)
    return sisnr.mean()


def cal_PESQ(ref_sig, out_sig, sr):
    """Calculate PESQ.
    Args:
        ref_sig: numpy.ndarray, [B, C, T]
        out_sig: numpy.ndarray, [B, C, T]
    Returns:
        PESQ
    """
    B, C, T = ref_sig.shape
    ref_sig = ref_sig.reshape(B*C, T)
    out_sig = out_sig.reshape(B*C, T)
    pesq_val = 0
    for i in range(len(ref_sig)):
        pesq_val += pesq(sr, ref_sig[i], out_sig[i], 'nb')
    return pesq_val / (B*C)


def cal_STOI(ref_sig, out_sig, sr):
    """Calculate STOI.
    Args:
        ref_sig: numpy.ndarray, [B, C, T]
        out_sig: numpy.ndarray, [B, C, T]
    Returns:
        STOI
    """
    B, C, T = ref_sig.shape
    ref_sig = ref_sig.reshape(B*C, T)
    out_sig = out_sig.reshape(B*C, T)
    try:
        stoi_val = 0
        for i in range(len(ref_sig)):
            stoi_val += stoi(ref_sig[i], out_sig[i], sr, extended=False)
        return stoi_val / (B*C)
    except:
        return 0


def cal_SISNRi(src_ref, src_est, mix):
    """Calculate Scale-Invariant Source-to-Noise Ratio improvement (SI-SNRi)
    Args:
        src_ref: numpy.ndarray, [B, C, T]
        src_est: numpy.ndarray, [B, C, T], reordered by best PIT permutation
        mix: numpy.ndarray, [T]
    Returns:
        average_SISNRi
    """
    avg_SISNRi = 0.0
    B, C, T = src_ref.shape
    for c in range(C):
        sisnr = cal_SISNR(src_ref[:, c], src_est[:, c])
        sisnrb = cal_SISNR(src_ref[:, c], mix)
        avg_SISNRi += (sisnr - sisnrb)
    avg_SISNRi /= C
    return avg_SISNRi


def main():
    args = parser.parse_args()
    logging.basicConfig(stream=sys.stderr, level=args.verbose)
    logger.debug(args)
    sisnr, pesq, stoi = evaluate(args)
    json.dump({'sisnr': sisnr,
               'pesq': pesq, 'stoi': stoi}, sys.stdout)
    sys.stdout.write('\n')


if __name__ == '__main__':
    main()

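The SI-SNR metric above is scale-invariant: rescaling the estimate does not change the score, because both the projection and the residual noise scale together. A small self-contained check using `cal_SISNR` from this file (toy signals, invented for illustration):

import numpy as np

ref = np.random.randn(1, 8000)                       # [B, T] reference source
est = 0.5 * ref + 0.01 * np.random.randn(1, 8000)    # rescaled, slightly noisy estimate
print(cal_SISNR(ref, est))                           # high SI-SNR; identical for est and 2 * est
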
svoice/evaluate_auto_select.py
ADDED
@@ -0,0 +1,184 @@
# Copyright (c) Facebook, Inc. and its affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

# Authors: Yossi Adi (adiyoss)

import argparse
from concurrent.futures import ProcessPoolExecutor
import json
import logging
import sys

import numpy as np
from pesq import pesq
from pystoi import stoi
import torch

from .models.sisnr_loss import cal_loss
from .data.data import Validset
from . import distrib
from .utils import bold, deserialize_model, LogProgress
from .evaluate import _run_metrics


logger = logging.getLogger(__name__)

parser = argparse.ArgumentParser(
    'Evaluate model automatic selection performance')
parser.add_argument('model_path_2spk',
                    help='Path to 2spk model file created by training')
parser.add_argument('model_path_3spk',
                    help='Path to 3spk model file created by training')
parser.add_argument('model_path_4spk',
                    help='Path to 4spk model file created by training')
parser.add_argument('model_path_5spk',
                    help='Path to 5spk model file created by training')
parser.add_argument(
    'data_dir', help='directory including mix.json, s1.json and s2.json files')
parser.add_argument('--device', default="cuda")
parser.add_argument('--sample_rate', default=8000,
                    type=int, help='Sample rate')
parser.add_argument('--thresh', default=0.001,
                    type=float, help='Threshold for model auto selection')
parser.add_argument('--num_workers', type=int, default=5)
parser.add_argument('-v', '--verbose', action='store_const', const=logging.DEBUG,
                    default=logging.INFO, help="More logging")


# test pairwise matching
def pair_wise(padded_source, estimate_source):
    pair_wise = torch.sum(padded_source.unsqueeze(
        1)*estimate_source.unsqueeze(2), dim=3)
    if estimate_source.shape[1] != padded_source.shape[1]:
        idxs = pair_wise.argmax(dim=1)
        new_src = torch.FloatTensor(padded_source.shape)
        for b, idx in enumerate(idxs):
            new_src[b:, :, ] = estimate_source[b][idx]
        padded_source_pad = padded_source
        estimate_source_pad = new_src.cuda()
    else:
        padded_source_pad = padded_source
        estimate_source_pad = estimate_source
    return estimate_source_pad


def evaluate_auto_select(args):
    total_sisnr = 0
    total_pesq = 0
    total_stoi = 0
    total_cnt = 0
    updates = 5

    models = list()
    paths = [args.model_path_2spk, args.model_path_3spk,
             args.model_path_4spk, args.model_path_5spk]

    for path in paths:
        # Load model
        pkg = torch.load(path)
        if 'model' in pkg:
            model = pkg['model']
        else:
            model = pkg
        model = deserialize_model(model)
        if 'best_state' in pkg:
            model.load_state_dict(pkg['best_state'])
        logger.debug(model)

        model.eval()
        model.to(args.device)
        models.append(model)

    # Load data
    dataset = Validset(args.data_dir)
    data_loader = distrib.loader(
        dataset, batch_size=1, num_workers=args.num_workers)
    sr = args.sample_rate
    y_hat = torch.zeros((4))

    pendings = []
    with ProcessPoolExecutor(args.num_workers) as pool:
        with torch.no_grad():
            iterator = LogProgress(logger, data_loader, name="Eval estimates")
            for i, data in enumerate(iterator):
                # Get batch data
                mixture, lengths, sources = [x.to(args.device) for x in data]
                estimated_sources = list()
                reorder_estimated_sources = list()

                for model in models:
                    # Forward
                    with torch.no_grad():
                        raw_estimate = model(mixture)[-1]

                    estimate = pair_wise(sources, raw_estimate)
                    sisnr_loss, snr, estimate, reorder_estimate = cal_loss(
                        sources, estimate, lengths)
                    estimated_sources.insert(0, raw_estimate)
                    reorder_estimated_sources.insert(0, reorder_estimate)

                # =================== DETECT NUM. NON-ACTIVE CHANNELS ============== #
                selected_idx = 0
                thresh = args.thresh
                max_spk = 5
                mix_spk = 2
                ground = (max_spk - mix_spk)
                while (selected_idx <= ground):
                    no_sils = 0
                    vals = torch.mean(
                        (estimated_sources[selected_idx]/torch.abs(estimated_sources[selected_idx]).max())**2, axis=2)
                    new_selected_idx = max_spk - len(vals[vals > thresh])
                    if new_selected_idx == selected_idx:
                        break
                    else:
                        selected_idx = new_selected_idx
                if selected_idx < 0:
                    selected_idx = 0
                elif selected_idx > ground:
                    selected_idx = ground

                y_hat[ground - selected_idx] += 1
                reorder_estimate = reorder_estimated_sources[selected_idx].cpu(
                )
                sources = sources.cpu()
                mixture = mixture.cpu()

                pendings.append(
                    pool.submit(_run_metrics, sources, reorder_estimate, mixture, None,
                                sr=sr))
                total_cnt += sources.shape[0]

        for pending in LogProgress(logger, pendings, updates, name="Eval metrics"):
            sisnr_i, pesq_i, stoi_i = pending.result()
            total_sisnr += sisnr_i
            total_pesq += pesq_i
            total_stoi += stoi_i

    metrics = [total_sisnr, total_pesq, total_stoi]
    sisnr, pesq, stoi = distrib.average(
        [m/total_cnt for m in metrics], total_cnt)
    logger.info(bold(f'Test set performance: SISNRi={sisnr:.2f} '
                     f'PESQ={pesq}, STOI={stoi}.'))
    logger.info(f'Two spks prob: {y_hat[0]/(total_cnt)}')
    logger.info(f'Three spks prob: {y_hat[1]/(total_cnt)}')
    logger.info(f'Four spks prob: {y_hat[2]/(total_cnt)}')
    logger.info(f'Five spks prob: {y_hat[3]/(total_cnt)}')
    return sisnr, pesq, stoi


def main():
    args = parser.parse_args()
    logging.basicConfig(stream=sys.stderr, level=args.verbose)
    logger.debug(args)
    sisnr, pesq, stoi = evaluate_auto_select(args)
    json.dump({'sisnr': sisnr,
               'pesq': pesq, 'stoi': stoi}, sys.stdout)
    sys.stdout.write('\n')


if __name__ == '__main__':
    main()

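The selection loop above counts "active" output channels by their mean normalized energy against the threshold. A toy sketch of that single test on a fake 5-channel estimate (all values invented for illustration):

import torch

est = torch.zeros(1, 5, 8000)          # [B, C, T] fake output of the 5spk model
est[:, :2] = torch.randn(1, 2, 8000)   # only two channels carry signal
vals = torch.mean((est / est.abs().max()) ** 2, axis=2)
print((vals > 0.001).sum().item())     # ~2 active channels -> the 2spk model would be selected
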
svoice/executor.py
ADDED
@@ -0,0 +1,85 @@
# Copyright (c) Facebook, Inc. and its affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

# Author: Alexandre Defossez (adefossez)

"""
Start multiple processes locally for DDP.
"""

import logging
import subprocess as sp
import sys

from hydra import utils

logger = logging.getLogger(__name__)


class ChildrenManager:
    def __init__(self):
        self.children = []
        self.failed = False

    def add(self, child):
        child.rank = len(self.children)
        self.children.append(child)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_value is not None:
            logger.error(
                "An exception happened while starting workers %r", exc_value)
            self.failed = True
        try:
            while self.children and not self.failed:
                for child in list(self.children):
                    try:
                        exitcode = child.wait(0.1)
                    except sp.TimeoutExpired:
                        continue
                    else:
                        self.children.remove(child)
                        if exitcode:
                            logger.error(
                                f"Worker {child.rank} died, killing all workers")
                            self.failed = True
        except KeyboardInterrupt:
            logger.error(
                "Received keyboard interrupt, trying to kill all workers.")
            self.failed = True
        for child in self.children:
            child.terminate()
        if not self.failed:
            logger.info("All workers completed successfully")


def start_ddp_workers():
    import torch as th

    world_size = th.cuda.device_count()
    if not world_size:
        logger.error(
            "DDP is only available on GPU. Make sure GPUs are properly configured with cuda.")
        sys.exit(1)
    logger.info(f"Starting {world_size} worker processes for DDP.")
    with ChildrenManager() as manager:
        for rank in range(world_size):
            kwargs = {}
            argv = list(sys.argv)
            argv += [f"world_size={world_size}", f"rank={rank}"]
            if rank > 0:
                kwargs['stdin'] = sp.DEVNULL
                kwargs['stdout'] = sp.DEVNULL
                kwargs['stderr'] = sp.DEVNULL
                log = utils.HydraConfig().cfg.hydra.job_logging.handlers.file.filename
                log += f".{rank}"
                argv.append("hydra.job_logging.handlers.file.filename=" + log)
            manager.add(sp.Popen([sys.executable] + argv,
                                 cwd=utils.get_original_cwd(), **kwargs))
    sys.exit(int(manager.failed))

svoice/models/__init__.py
ADDED
@@ -0,0 +1,5 @@
# Copyright (c) Facebook, Inc. and its affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

svoice/models/sisnr_loss.py
ADDED
@@ -0,0 +1,124 @@
# The following piece of code was adapted from https://github.com/kaituoxu/Conv-TasNet
# released under the MIT License.
# Author: Kaituo XU
# Created on 2018/12

from itertools import permutations

import torch
import torch.nn.functional as F

EPS = 1e-8


def cal_loss(source, estimate_source, source_lengths):
    """
    Args:
        source: [B, C, T], B is batch size
        estimate_source: [B, C, T]
        source_lengths: [B]
    """
    max_snr, perms, max_snr_idx, snr_set = cal_si_snr_with_pit(source,
                                                               estimate_source,
                                                               source_lengths)
    B, C, T = estimate_source.shape
    loss = 0 - torch.mean(max_snr)

    reorder_estimate_source = reorder_source(
        estimate_source, perms, max_snr_idx)
    return loss, max_snr, estimate_source, reorder_estimate_source


def cal_si_snr_with_pit(source, estimate_source, source_lengths):
    """Calculate SI-SNR with PIT training.
    Args:
        source: [B, C, T], B is batch size
        estimate_source: [B, C, T]
        source_lengths: [B], each item is between [0, T]
    """
    assert source.size() == estimate_source.size()
    B, C, T = source.size()
    # mask padding position along T
    mask = get_mask(source, source_lengths)
    estimate_source *= mask

    # Step 1. Zero-mean norm
    num_samples = source_lengths.view(-1, 1, 1).float()  # [B, 1, 1]
    mean_target = torch.sum(source, dim=2, keepdim=True) / num_samples
    mean_estimate = torch.sum(estimate_source, dim=2,
                              keepdim=True) / num_samples
    zero_mean_target = source - mean_target
    zero_mean_estimate = estimate_source - mean_estimate
    # mask padding position along T
    zero_mean_target *= mask
    zero_mean_estimate *= mask

    # Step 2. SI-SNR with PIT
    # reshape to use broadcast
    s_target = torch.unsqueeze(zero_mean_target, dim=1)  # [B, 1, C, T]
    s_estimate = torch.unsqueeze(zero_mean_estimate, dim=2)  # [B, C, 1, T]
    # s_target = <s', s>s / ||s||^2
    pair_wise_dot = torch.sum(s_estimate * s_target,
                              dim=3, keepdim=True)  # [B, C, C, 1]
    s_target_energy = torch.sum(
        s_target ** 2, dim=3, keepdim=True) + EPS  # [B, 1, C, 1]
    pair_wise_proj = pair_wise_dot * s_target / s_target_energy  # [B, C, C, T]
    # e_noise = s' - s_target
    e_noise = s_estimate - pair_wise_proj  # [B, C, C, T]
    # SI-SNR = 10 * log_10(||s_target||^2 / ||e_noise||^2)
    pair_wise_si_snr = torch.sum(
        pair_wise_proj ** 2, dim=3) / (torch.sum(e_noise ** 2, dim=3) + EPS)
    pair_wise_si_snr = 10 * torch.log10(pair_wise_si_snr + EPS)  # [B, C, C]
    pair_wise_si_snr = torch.transpose(pair_wise_si_snr, 1, 2)

    # Get max_snr of each utterance
    # permutations, [C!, C]
    perms = source.new_tensor(list(permutations(range(C))), dtype=torch.long)
    # one-hot, [C!, C, C]
    index = torch.unsqueeze(perms, 2)
    perms_one_hot = source.new_zeros((*perms.size(), C)).scatter_(2, index, 1)
    # [B, C!] <- [B, C, C] einsum [C!, C, C], SI-SNR sum of each permutation
    snr_set = torch.einsum('bij,pij->bp', [pair_wise_si_snr, perms_one_hot])
    max_snr_idx = torch.argmax(snr_set, dim=1)  # [B]
    # max_snr = torch.gather(snr_set, 1, max_snr_idx.view(-1, 1))  # [B, 1]
    max_snr, _ = torch.max(snr_set, dim=1, keepdim=True)
    max_snr /= C
    return max_snr, perms, max_snr_idx, snr_set / C


def reorder_source(source, perms, max_snr_idx):
    """
    Args:
        source: [B, C, T]
        perms: [C!, C], permutations
        max_snr_idx: [B], each item is between [0, C!)
    Returns:
        reorder_source: [B, C, T]
    """
    B, C, *_ = source.size()
    # [B, C], permutation whose SI-SNR is max of each utterance
    # for each utterance, reorder estimate source according this permutation
    max_snr_perm = torch.index_select(perms, dim=0, index=max_snr_idx)
    # print('max_snr_perm', max_snr_perm)
    # maybe use torch.gather()/index_select()/scatter() to impl this?
    reorder_source = torch.zeros_like(source)
    for b in range(B):
        for c in range(C):
            reorder_source[b, c] = source[b, max_snr_perm[b][c]]
    return reorder_source


def get_mask(source, source_lengths):
    """
    Args:
        source: [B, C, T]
        source_lengths: [B]
    Returns:
        mask: [B, 1, T]
    """
    B, _, T = source.size()
    mask = source.new_ones((B, 1, T))
    for i in range(B):
        mask[i, :, source_lengths[i]:] = 0
    return mask

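A minimal sanity check of the PIT loss above (random tensors only, invented for illustration): with C sources the loss searches all C! channel permutations and also returns the estimates reordered to best match the references, so an estimate that is just a channel-swapped copy of the sources is recovered exactly:

import torch

B, C, T = 2, 2, 8000
src = torch.randn(B, C, T)
est = src[:, [1, 0], :].clone()              # estimates handed over in swapped order
lengths = torch.full((B,), T, dtype=torch.int)
loss, max_snr, _, reordered = cal_loss(src, est, lengths)
print(loss.item())                           # large negative value: near-perfect SI-SNR
print(torch.allclose(reordered, src))        # True: PIT undid the channel swap
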
svoice/models/swave.py
ADDED
@@ -0,0 +1,294 @@
# Copyright (c) Facebook, Inc. and its affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

# Authors: Eliya Nachmani (enk100), Yossi Adi (adiyoss), Lior Wolf

import sys
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

from ..utils import overlap_and_add
from ..utils import capture_init


class MulCatBlock(nn.Module):

    def __init__(self, input_size, hidden_size, dropout=0, bidirectional=False):
        super(MulCatBlock, self).__init__()

        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_direction = int(bidirectional) + 1

        self.rnn = nn.LSTM(input_size, hidden_size, 1, dropout=dropout,
                           batch_first=True, bidirectional=bidirectional)
        self.rnn_proj = nn.Linear(hidden_size * self.num_direction, input_size)

        self.gate_rnn = nn.LSTM(input_size, hidden_size, num_layers=1,
                                batch_first=True, dropout=dropout, bidirectional=bidirectional)
        self.gate_rnn_proj = nn.Linear(
            hidden_size * self.num_direction, input_size)

        self.block_projection = nn.Linear(input_size * 2, input_size)

    def forward(self, input):
        output = input
        # run rnn module
        rnn_output, _ = self.rnn(output)
        rnn_output = self.rnn_proj(rnn_output.contiguous(
        ).view(-1, rnn_output.shape[2])).view(output.shape).contiguous()
        # run gate rnn module
        gate_rnn_output, _ = self.gate_rnn(output)
        gate_rnn_output = self.gate_rnn_proj(gate_rnn_output.contiguous(
        ).view(-1, gate_rnn_output.shape[2])).view(output.shape).contiguous()
        # apply gated rnn
        gated_output = torch.mul(rnn_output, gate_rnn_output)
        gated_output = torch.cat([gated_output, output], 2)
        gated_output = self.block_projection(
            gated_output.contiguous().view(-1, gated_output.shape[2])).view(output.shape)
        return gated_output


class ByPass(nn.Module):
    def __init__(self):
        super(ByPass, self).__init__()

    def forward(self, input):
        return input


class DPMulCat(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_spk,
                 dropout=0, num_layers=1, bidirectional=True, input_normalize=False):
        super(DPMulCat, self).__init__()

        self.input_size = input_size
        self.output_size = output_size
        self.hidden_size = hidden_size
        self.in_norm = input_normalize
        self.num_layers = num_layers

        self.rows_grnn = nn.ModuleList([])
        self.cols_grnn = nn.ModuleList([])
        self.rows_normalization = nn.ModuleList([])
        self.cols_normalization = nn.ModuleList([])

        # create the dual path pipeline
        for i in range(num_layers):
            self.rows_grnn.append(MulCatBlock(
                input_size, hidden_size, dropout, bidirectional=bidirectional))
            self.cols_grnn.append(MulCatBlock(
                input_size, hidden_size, dropout, bidirectional=bidirectional))
            if self.in_norm:
                self.rows_normalization.append(
                    nn.GroupNorm(1, input_size, eps=1e-8))
                self.cols_normalization.append(
                    nn.GroupNorm(1, input_size, eps=1e-8))
            else:
                # used to disable normalization
                self.rows_normalization.append(ByPass())
                self.cols_normalization.append(ByPass())

        self.output = nn.Sequential(
            nn.PReLU(), nn.Conv2d(input_size, output_size * num_spk, 1))

    def forward(self, input):
        batch_size, _, d1, d2 = input.shape
        output = input
        output_all = []
        for i in range(self.num_layers):
            row_input = output.permute(0, 3, 2, 1).contiguous().view(
                batch_size * d2, d1, -1)
            row_output = self.rows_grnn[i](row_input)
            row_output = row_output.view(
                batch_size, d2, d1, -1).permute(0, 3, 2, 1).contiguous()
            row_output = self.rows_normalization[i](row_output)
            # apply a skip connection
            if self.training:
                output = output + row_output
            else:
                output += row_output

            col_input = output.permute(0, 2, 3, 1).contiguous().view(
                batch_size * d1, d2, -1)
            col_output = self.cols_grnn[i](col_input)
            col_output = col_output.view(
                batch_size, d1, d2, -1).permute(0, 3, 1, 2).contiguous()
            col_output = self.cols_normalization[i](col_output).contiguous()
            # apply a skip connection
            if self.training:
                output = output + col_output
            else:
                output += col_output

            output_i = self.output(output)
            if self.training or i == (self.num_layers - 1):
                output_all.append(output_i)
        return output_all


class Separator(nn.Module):
    def __init__(self, input_dim, feature_dim, hidden_dim, output_dim, num_spk=2,
                 layer=4, segment_size=100, input_normalize=False, bidirectional=True):
        super(Separator, self).__init__()

        self.input_dim = input_dim
        self.feature_dim = feature_dim
        self.hidden_dim = hidden_dim
        self.output_dim = output_dim

        self.layer = layer
        self.segment_size = segment_size
        self.num_spk = num_spk
        self.input_normalize = input_normalize

        self.rnn_model = DPMulCat(self.feature_dim, self.hidden_dim,
                                  self.feature_dim, self.num_spk, num_layers=layer, bidirectional=bidirectional, input_normalize=input_normalize)

    # ======================================= #
    # The following code block was borrowed and modified from https://github.com/yluo42/TAC
    # ================ BEGIN ================ #
    def pad_segment(self, input, segment_size):
        # input is the features: (B, N, T)
        batch_size, dim, seq_len = input.shape
        segment_stride = segment_size // 2
        rest = segment_size - (segment_stride + seq_len %
                               segment_size) % segment_size
        if rest > 0:
            pad = Variable(torch.zeros(batch_size, dim, rest)
                           ).type(input.type())
            input = torch.cat([input, pad], 2)

        pad_aux = Variable(torch.zeros(
            batch_size, dim, segment_stride)).type(input.type())
        input = torch.cat([pad_aux, input, pad_aux], 2)
        return input, rest

    def create_chuncks(self, input, segment_size):
        # split the feature into chunks of segment size
        # input is the features: (B, N, T)
        input, rest = self.pad_segment(input, segment_size)
        batch_size, dim, seq_len = input.shape
        segment_stride = segment_size // 2

        segments1 = input[:, :, :-segment_stride].contiguous().view(batch_size,
                                                                     dim, -1, segment_size)
        segments2 = input[:, :, segment_stride:].contiguous().view(
            batch_size, dim, -1, segment_size)
        segments = torch.cat([segments1, segments2], 3).view(
            batch_size, dim, -1, segment_size).transpose(2, 3)
        return segments.contiguous(), rest

    def merge_chuncks(self, input, rest):
        # merge the split features into a full utterance
        # input is the features: (B, N, L, K)
        batch_size, dim, segment_size, _ = input.shape
        segment_stride = segment_size // 2
        input = input.transpose(2, 3).contiguous().view(
            batch_size, dim, -1, segment_size*2)  # B, N, K, L

        input1 = input[:, :, :, :segment_size].contiguous().view(
            batch_size, dim, -1)[:, :, segment_stride:]
        input2 = input[:, :, :, segment_size:].contiguous().view(
            batch_size, dim, -1)[:, :, :-segment_stride]

        output = input1 + input2
        if rest > 0:
            output = output[:, :, :-rest]
        return output.contiguous()  # B, N, T
    # ================= END ================= #

    def forward(self, input):
        # create chunks
        enc_segments, enc_rest = self.create_chuncks(
            input, self.segment_size)
        # separate
        output_all = self.rnn_model(enc_segments)

        # merge back audio files
        output_all_wav = []
        for ii in range(len(output_all)):
            output_ii = self.merge_chuncks(
                output_all[ii], enc_rest)
            output_all_wav.append(output_ii)
        return output_all_wav


class SWave(nn.Module):
    @capture_init
    def __init__(self, N, L, H, R, C, sr, segment, input_normalize):
        super(SWave, self).__init__()
        # hyper-parameter
        self.N, self.L, self.H, self.R, self.C, self.sr, self.segment = N, L, H, R, C, sr, segment
        self.input_normalize = input_normalize
        self.context_len = 2 * self.sr / 1000
        self.context = int(self.sr * self.context_len / 1000)
        self.layer = self.R
        self.filter_dim = self.context * 2 + 1
        self.num_spk = self.C
        # similar to the dprnn paper, setting chunk size to sqrt(2*L)
        self.segment_size = int(
            np.sqrt(2 * self.sr * self.segment / (self.L/2)))

        # model sub-networks
        self.encoder = Encoder(L, N)
        self.decoder = Decoder(L)
        self.separator = Separator(self.filter_dim + self.N, self.N, self.H,
                                   self.filter_dim, self.num_spk, self.layer, self.segment_size, self.input_normalize)
        # init
        for p in self.parameters():
            if p.dim() > 1:
                nn.init.xavier_normal_(p)

    def forward(self, mixture):
        mixture_w = self.encoder(mixture)
        output_all = self.separator(mixture_w)

        # fix time dimension, might change due to convolution operations
        T_mix = mixture.size(-1)
        # generate wav after each RNN block and optimize the loss
        outputs = []
        for ii in range(len(output_all)):
            output_ii = output_all[ii].view(
                mixture.shape[0], self.C, self.N, mixture_w.shape[2])
            output_ii = self.decoder(output_ii)

            T_est = output_ii.size(-1)
            output_ii = F.pad(output_ii, (0, T_mix - T_est))
            outputs.append(output_ii)
        return torch.stack(outputs)


class Encoder(nn.Module):
    def __init__(self, L, N):
        super(Encoder, self).__init__()
        self.L, self.N = L, N
        # setting 50% overlap
        self.conv = nn.Conv1d(
            1, N, kernel_size=L, stride=L // 2, bias=False)

    def forward(self, mixture):
        mixture = torch.unsqueeze(mixture, 1)
        mixture_w = F.relu(self.conv(mixture))
        return mixture_w


class Decoder(nn.Module):
    def __init__(self, L):
        super(Decoder, self).__init__()
        self.L = L

    def forward(self, est_source):
        est_source = torch.transpose(est_source, 2, 3)
        est_source = nn.AvgPool2d((1, self.L))(est_source)
        est_source = overlap_and_add(est_source, self.L//2)

        return est_source

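To make the tensor flow concrete, a small forward pass with made-up hyperparameters; these values are purely illustrative and much smaller than any trained configuration:

import torch

# N=encoder filters, L=kernel size, H=hidden size, R=MulCat blocks, C=speakers (toy values)
model = SWave(N=64, L=8, H=128, R=2, C=2, sr=8000, segment=2, input_normalize=False)
mix = torch.randn(1, 8000)   # one second of audio at 8 kHz
out = model(mix)
print(out.shape)             # torch.Size([2, 1, 2, 8000]): one [B, C, T] output per block in training mode
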
svoice/separate.py
ADDED
@@ -0,0 +1,174 @@
import argparse
import logging
import os
import sys

import librosa
import torch
import tqdm

from .data.data import EvalDataLoader, EvalDataset
from . import distrib
from .utils import remove_pad

from .utils import bold, deserialize_model, LogProgress
logger = logging.getLogger(__name__)


def load_model():
    global device
    global model
    global pkg
    print("Loading svoice model if available...")
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    pkg = torch.load('checkpoint.th')
    if 'model' in pkg:
        model = pkg['model']
    else:
        model = pkg
    model = deserialize_model(model)
    logger.debug(model)
    model.eval()
    model.to(device)
    print("svoice model loaded.")
    print("Device: {}".format(device))


parser = argparse.ArgumentParser("Speech separation using MulCat blocks")
parser.add_argument("model_path", type=str, help="Model name")
parser.add_argument("out_dir", type=str, default="exp/result",
                    help="Directory putting enhanced wav files")
parser.add_argument("--mix_dir", type=str, default=None,
                    help="Directory including mix wav files")
parser.add_argument("--mix_json", type=str, default=None,
                    help="Json file including mix wav files")
parser.add_argument('--device', default="cuda")
parser.add_argument("--sample_rate", default=8000,
                    type=int, help="Sample rate")
parser.add_argument("--batch_size", default=1, type=int, help="Batch size")
parser.add_argument('-v', '--verbose', action='store_const', const=logging.DEBUG,
                    default=logging.INFO, help="More logging")


def save_wavs(estimate_source, mix_sig, lengths, filenames, out_dir, sr=16000):
    # Remove padding and flat
    flat_estimate = remove_pad(estimate_source, lengths)
    mix_sig = remove_pad(mix_sig, lengths)
    # Write result
    for i, filename in enumerate(filenames):
        filename = os.path.join(
            out_dir, os.path.basename(filename).strip(".wav"))
        write(mix_sig[i], filename + ".wav", sr=sr)
        C = flat_estimate[i].shape[0]
        # future support for wave playing
        for c in range(C):
            write(flat_estimate[i][c], filename + f"_s{c + 1}.wav", sr=sr)


def write(inputs, filename, sr=8000):
    librosa.output.write_wav(filename, inputs, sr, norm=True)


def separate_demo(mix_dir='mix/', batch_size=1, sample_rate=16000):
    mix_dir, mix_json = mix_dir, None
    out_dir = 'separated'
    # Load data
    eval_dataset = EvalDataset(
        mix_dir,
        mix_json,
        batch_size=batch_size,
        sample_rate=sample_rate,
    )
    eval_loader = distrib.loader(
        eval_dataset, batch_size=1, klass=EvalDataLoader)

    if distrib.rank == 0:
        os.makedirs(out_dir, exist_ok=True)
    distrib.barrier()

    with torch.no_grad():
        for i, data in enumerate(tqdm.tqdm(eval_loader, ncols=120)):
            # Get batch data
            mixture, lengths, filenames = data
            mixture = mixture.to(device)
            lengths = lengths.to(device)
            # Forward
            estimate_sources = model(mixture)[-1]
            # save wav files
            save_wavs(estimate_sources, mixture, lengths,
                      filenames, out_dir, sr=sample_rate)

    separated_files = [os.path.join(out_dir, f) for f in os.listdir(out_dir)]
    separated_files = [os.path.abspath(f) for f in separated_files]
    separated_files = [f for f in separated_files if not f.endswith('original.wav')]
    return separated_files


def get_mix_paths(args):
    mix_dir = None
    mix_json = None
    # fix mix dir
    try:
        if args.dset.mix_dir:
            mix_dir = args.dset.mix_dir
    except:
        mix_dir = args.mix_dir

    # fix mix json
    try:
        if args.dset.mix_json:
            mix_json = args.dset.mix_json
    except:
        mix_json = args.mix_json
    return mix_dir, mix_json


def separate(args, model=None, local_out_dir=None):
    mix_dir, mix_json = get_mix_paths(args)
    if not mix_json and not mix_dir:
        logger.error("Must provide mix_dir or mix_json! "
                     "When providing mix_dir, mix_json is ignored.")
    # Load model
    if not model:
        # model
        pkg = torch.load(args.model_path)
        if 'model' in pkg:
            model = pkg['model']
        else:
            model = pkg
        model = deserialize_model(model)
        logger.debug(model)
    model.eval()
    model.to(args.device)
    if local_out_dir:
        out_dir = local_out_dir
    else:
        out_dir = args.out_dir

    # Load data
    eval_dataset = EvalDataset(
        mix_dir,
        mix_json,
        batch_size=args.batch_size,
        sample_rate=args.sample_rate,
    )
    eval_loader = distrib.loader(
        eval_dataset, batch_size=1, klass=EvalDataLoader)

    if distrib.rank == 0:
        os.makedirs(out_dir, exist_ok=True)
    distrib.barrier()

    with torch.no_grad():
        for i, data in enumerate(tqdm.tqdm(eval_loader, ncols=120)):
            # Get batch data
            mixture, lengths, filenames = data
            mixture = mixture.to(args.device)
            lengths = lengths.to(args.device)
            # Forward
            estimate_sources = model(mixture)[-1]
            # save wav files
            save_wavs(estimate_sources, mixture, lengths,
                      filenames, out_dir, sr=args.sample_rate)


if __name__ == "__main__":
    args = parser.parse_args()
    logging.basicConfig(stream=sys.stderr, level=args.verbose)
    logger.debug(args)
    separate(args, local_out_dir=args.out_dir)

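A minimal sketch of how the two demo helpers above could be driven together; it assumes a checkpoint.th in the working directory (the path hard-coded in load_model) and a mix/ folder of wav files, both of which are placeholders here:

from svoice import separate

separate.load_model()                                  # reads ./checkpoint.th and sets the global model/device
files = separate.separate_demo(mix_dir='mix/', sample_rate=8000)
print(files)                                           # absolute paths of the separated source wavs
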
svoice/solver.py
ADDED
@@ -0,0 +1,227 @@
# Copyright (c) Facebook, Inc. and its affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

# Author: Eliya Nachmani (enk100), Yossi Adi (adiyoss), Lior Wolf

import json
import logging
from pathlib import Path
import os
import time

import numpy as np
import torch
import torch.nn.functional as F
from torch.optim.lr_scheduler import ReduceLROnPlateau, StepLR

from . import distrib
from .separate import separate
from .evaluate import evaluate
from .models.sisnr_loss import cal_loss
from .models.swave import SWave
from .utils import bold, copy_state, pull_metric, serialize_model, swap_state, LogProgress


logger = logging.getLogger(__name__)


class Solver(object):
    def __init__(self, data, model, optimizer, args):
        self.tr_loader = data['tr_loader']
        self.cv_loader = data['cv_loader']
        self.tt_loader = data['tt_loader']
        self.model = model
        self.dmodel = distrib.wrap(model)
        self.optimizer = optimizer
        if args.lr_sched == 'step':
            self.sched = StepLR(
                self.optimizer, step_size=args.step.step_size, gamma=args.step.gamma)
        elif args.lr_sched == 'plateau':
            self.sched = ReduceLROnPlateau(
                self.optimizer, factor=args.plateau.factor, patience=args.plateau.patience)
        else:
            self.sched = None

        # Training config
        self.device = args.device
        self.epochs = args.epochs
        self.max_norm = args.max_norm

        # Checkpoints
        self.continue_from = args.continue_from
        self.eval_every = args.eval_every
        self.checkpoint = Path(
            args.checkpoint_file) if args.checkpoint else None
        if self.checkpoint:
            logger.debug("Checkpoint will be saved to %s",
                         self.checkpoint.resolve())
        self.history_file = args.history_file

        self.best_state = None
        self.restart = args.restart
        # keep track of losses
        self.history = []

        # Where to save samples
        self.samples_dir = args.samples_dir

        # logging
        self.num_prints = args.num_prints

        # for separation tests
        self.args = args
        self._reset()

    def _serialize(self, path):
        package = {}
        package['model'] = serialize_model(self.model)
        package['optimizer'] = self.optimizer.state_dict()
        package['history'] = self.history
        package['best_state'] = self.best_state
        package['args'] = self.args
        torch.save(package, path)

    def _reset(self):
        load_from = None
        # Reset
        if self.checkpoint and self.checkpoint.exists() and not self.restart:
            load_from = self.checkpoint
        elif self.continue_from:
            load_from = self.continue_from

        if load_from:
            logger.info(f'Loading checkpoint model: {load_from}')
            package = torch.load(load_from, 'cpu')
            if load_from == self.continue_from and self.args.continue_best:
                self.model.load_state_dict(package['best_state'])
            else:
                self.model.load_state_dict(package['model']['state'])

            if 'optimizer' in package and not self.args.continue_best:
                self.optimizer.load_state_dict(package['optimizer'])
            self.history = package['history']
            self.best_state = package['best_state']

    def train(self):
        # Optimizing the model
        if self.history:
            logger.info("Replaying metrics from previous run")
            for epoch, metrics in enumerate(self.history):
                info = " ".join(f"{k}={v:.5f}" for k, v in metrics.items())
                logger.info(f"Epoch {epoch}: {info}")

        for epoch in range(len(self.history), self.epochs):
            # Train one epoch
            self.model.train()  # Turn on BatchNorm & Dropout
            start = time.time()
            logger.info('-' * 70)
            logger.info("Training...")
            train_loss = self._run_one_epoch(epoch)
            logger.info(bold(f'Train Summary | End of Epoch {epoch + 1} | '
                             f'Time {time.time() - start:.2f}s | Train Loss {train_loss:.5f}'))

            # Cross validation
            logger.info('-' * 70)
            logger.info('Cross validation...')
            self.model.eval()  # Turn off Batchnorm & Dropout
            with torch.no_grad():
                valid_loss = self._run_one_epoch(epoch, cross_valid=True)
            logger.info(bold(f'Valid Summary | End of Epoch {epoch + 1} | '
                             f'Time {time.time() - start:.2f}s | Valid Loss {valid_loss:.5f}'))

            # learning rate scheduling
            if self.sched:
                if self.args.lr_sched == 'plateau':
                    self.sched.step(valid_loss)
                else:
                    self.sched.step()
                logger.info(
                    f'Learning rate adjusted: {self.optimizer.state_dict()["param_groups"][0]["lr"]:.5f}')

            best_loss = min(pull_metric(self.history, 'valid') + [valid_loss])
            metrics = {'train': train_loss,
                       'valid': valid_loss, 'best': best_loss}
            # Save the best model
            if valid_loss == best_loss or self.args.keep_last:
                logger.info(bold('New best valid loss %.4f'), valid_loss)
                self.best_state = copy_state(self.model.state_dict())

            # evaluate and separate samples every 'eval_every' argument number of epochs
            # also evaluate on last epoch
            if (epoch + 1) % self.eval_every == 0 or epoch == self.epochs - 1:
                # Evaluate on the testset
                logger.info('-' * 70)
                logger.info('Evaluating on the test set...')
                # We switch to the best known model for testing
                with swap_state(self.model, self.best_state):
                    sisnr, pesq, stoi = evaluate(
                        self.args, self.model, self.tt_loader, self.args.sample_rate)
                metrics.update({'sisnr': sisnr, 'pesq': pesq, 'stoi': stoi})

                # separate some samples
                logger.info('Separate and save samples...')
                separate(self.args, self.model, self.samples_dir)

            self.history.append(metrics)
            info = " | ".join(
                f"{k.capitalize()} {v:.5f}" for k, v in metrics.items())
            logger.info('-' * 70)
            logger.info(bold(f"Overall Summary | Epoch {epoch + 1} | {info}"))

            if distrib.rank == 0:
                json.dump(self.history, open(self.history_file, "w"), indent=2)
                # Save model each epoch
                if self.checkpoint:
                    self._serialize(self.checkpoint)
                    logger.debug("Checkpoint saved to %s",
                                 self.checkpoint.resolve())

    def _run_one_epoch(self, epoch, cross_valid=False):
        total_loss = 0
        data_loader = self.tr_loader if not cross_valid else self.cv_loader

        # get a different order for distributed training, otherwise this will get ignored
        data_loader.epoch = epoch

        label = ["Train", "Valid"][cross_valid]
        name = label + f" | Epoch {epoch + 1}"
        logprog = LogProgress(logger, data_loader,
                              updates=self.num_prints, name=name)
        for i, data in enumerate(logprog):
            mixture, lengths, sources = [x.to(self.device) for x in data]
            estimate_source = self.dmodel(mixture)

            # only eval last layer
            if cross_valid:
                estimate_source = estimate_source[-1:]

            loss = 0
            cnt = len(estimate_source)
            # apply a loss function after each layer
            with torch.autograd.set_detect_anomaly(True):
                for c_idx, est_src in enumerate(estimate_source):
                    coeff = ((c_idx+1)*(1/cnt))
                    loss_i = 0
                    # SI-SNR loss
                    sisnr_loss, snr, est_src, reorder_est_src = cal_loss(
                        sources, estimate_source[c_idx], lengths)
                    loss += (coeff * sisnr_loss)
                loss /= len(estimate_source)

                if not cross_valid:
                    # optimize model in training mode
                    self.optimizer.zero_grad()
                    loss.backward()
                    torch.nn.utils.clip_grad_norm_(self.model.parameters(),
                                                   self.max_norm)
                    self.optimizer.step()

            total_loss += loss.item()
            logprog.update(loss=format(total_loss / (i + 1), ".5f"))

        # Just in case, clear some memory
        del loss, estimate_source
+
return distrib.average([total_loss / (i + 1)], i + 1)[0]
|
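Note on the loss above: each output of the multi-scale model contributes its own SI-SNR term, weighted linearly so that deeper outputs count more (`(c_idx + 1) / cnt`), and the weighted sum is then divided by the number of outputs. A minimal standalone sketch of just that weighting, using made-up per-layer loss values (the values and names below are illustrative only, not part of this repository):

# Sketch of the layer-weighted loss used in Solver._run_one_epoch.
# `layer_losses` is hypothetical example data; in the solver each entry comes from cal_loss().
layer_losses = [1.20, 0.95, 0.80, 0.70]    # one SI-SNR loss per model output, shallow -> deep

cnt = len(layer_losses)
loss = 0.0
for c_idx, layer_loss in enumerate(layer_losses):
    coeff = (c_idx + 1) * (1 / cnt)        # 0.25, 0.5, 0.75, 1.0 for four outputs
    loss += coeff * layer_loss
loss /= cnt                                # average over outputs -> single scalar to backprop

print(f"weighted multi-scale loss: {loss:.5f}")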
svoice/utils.py
ADDED
@@ -0,0 +1,241 @@
# Copyright (c) Facebook, Inc. and its affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

# Authors: Yossi Adi (adiyoss) and Alexandre Defossez (adefossez)

import functools
import logging
from contextlib import contextmanager
import inspect
import os
import time
import math
import torch

logger = logging.getLogger(__name__)


def capture_init(init):
    """
    Decorate `__init__` with this, and you can then
    recover the *args and **kwargs passed to it in `self._init_args_kwargs`
    """
    @functools.wraps(init)
    def __init__(self, *args, **kwargs):
        self._init_args_kwargs = (args, kwargs)
        init(self, *args, **kwargs)

    return __init__


def deserialize_model(package, strict=False):
    klass = package['class']
    if strict:
        model = klass(*package['args'], **package['kwargs'])
    else:
        sig = inspect.signature(klass)
        kw = package['kwargs']
        for key in list(kw):
            if key not in sig.parameters:
                logger.warning("Dropping nonexistent parameter %s", key)
                del kw[key]
        model = klass(*package['args'], **kw)
    model.load_state_dict(package['state'])
    return model


def copy_state(state):
    return {k: v.cpu().clone() for k, v in state.items()}


def serialize_model(model):
    args, kwargs = model._init_args_kwargs
    state = copy_state(model.state_dict())
    return {"class": model.__class__, "args": args, "kwargs": kwargs, "state": state}


@contextmanager
def swap_state(model, state):
    old_state = copy_state(model.state_dict())
    model.load_state_dict(state)
    try:
        yield
    finally:
        model.load_state_dict(old_state)


@contextmanager
def swap_cwd(cwd):
    old_cwd = os.getcwd()
    os.chdir(cwd)
    try:
        yield
    finally:
        os.chdir(old_cwd)


def pull_metric(history, name):
    out = []
    for metrics in history:
        if name in metrics:
            out.append(metrics[name])
    return out


class LogProgress:
    """
    Sort of like tqdm, but using log lines rather than a real-time progress bar.
    """

    def __init__(self, logger, iterable, updates=5, total=None,
                 name="LogProgress", level=logging.INFO):
        self.iterable = iterable
        self.total = total or len(iterable)
        self.updates = updates
        self.name = name
        self.logger = logger
        self.level = level

    def update(self, **infos):
        self._infos = infos

    def __iter__(self):
        self._iterator = iter(self.iterable)
        self._index = -1
        self._infos = {}
        self._begin = time.time()
        return self

    def __next__(self):
        self._index += 1
        try:
            value = next(self._iterator)
        except StopIteration:
            raise
        else:
            return value
        finally:
            log_every = max(1, self.total // self.updates)
            # logging is delayed by 1 iteration, in order to have the metrics from update
            if self._index >= 1 and self._index % log_every == 0:
                self._log()

    def _log(self):
        self._speed = (1 + self._index) / (time.time() - self._begin)
        infos = " | ".join(f"{k.capitalize()} {v}" for k,
                           v in self._infos.items())
        if self._speed < 1e-4:
            speed = "oo sec/it"
        elif self._speed < 0.1:
            speed = f"{1/self._speed:.1f} sec/it"
        else:
            speed = f"{self._speed:.1f} it/sec"
        out = f"{self.name} | {self._index}/{self.total} | {speed}"
        if infos:
            out += " | " + infos
        self.logger.log(self.level, out)


def colorize(text, color):
    code = f"\033[{color}m"
    restore = "\033[0m"
    return "".join([code, text, restore])


def bold(text):
    return colorize(text, "1")


def calculate_grad_norm(model):
    total_norm = 0.0
    is_first = True
    for p in model.parameters():
        param_norm = p.data.grad.flatten()
        if is_first:
            total_norm = param_norm
            is_first = False
        else:
            total_norm = torch.cat((total_norm.unsqueeze(
                1), p.data.grad.flatten().unsqueeze(1)), dim=0).squeeze(1)
    return total_norm.norm(2) ** (1. / 2)


def calculate_weight_norm(model):
    total_norm = 0.0
    is_first = True
    for p in model.parameters():
        param_norm = p.data.flatten()
        if is_first:
            total_norm = param_norm
            is_first = False
        else:
            total_norm = torch.cat((total_norm.unsqueeze(
                1), p.data.flatten().unsqueeze(1)), dim=0).squeeze(1)
    return total_norm.norm(2) ** (1. / 2)


def remove_pad(inputs, inputs_lengths):
    """
    Args:
        inputs: torch.Tensor, [B, C, T] or [B, T], B is batch size
        inputs_lengths: torch.Tensor, [B]
    Returns:
        results: a list containing B items, each item is [C, T], T varies
    """
    results = []
    dim = inputs.dim()
    if dim == 3:
        C = inputs.size(1)
    for input, length in zip(inputs, inputs_lengths):
        if dim == 3:  # [B, C, T]
            results.append(input[:, :length].view(C, -1).cpu().numpy())
        elif dim == 2:  # [B, T]
            results.append(input[:length].view(-1).cpu().numpy())
    return results


def overlap_and_add(signal, frame_step):
    """Reconstructs a signal from a framed representation.

    Adds potentially overlapping frames of a signal with shape
    `[..., frames, frame_length]`, offsetting subsequent frames by `frame_step`.
    The resulting tensor has shape `[..., output_size]` where

        output_size = (frames - 1) * frame_step + frame_length

    Args:
        signal: A [..., frames, frame_length] Tensor. All dimensions may be unknown, and rank must be at least 2.
        frame_step: An integer denoting overlap offsets. Must be less than or equal to frame_length.

    Returns:
        A Tensor with shape [..., output_size] containing the overlap-added frames of signal's inner-most two dimensions.
        output_size = (frames - 1) * frame_step + frame_length

    Based on https://github.com/tensorflow/tensorflow/blob/r1.12/tensorflow/contrib/signal/python/ops/reconstruction_ops.py
    """
    outer_dimensions = signal.size()[:-2]
    frames, frame_length = signal.size()[-2:]

    # gcd = Greatest Common Divisor
    subframe_length = math.gcd(frame_length, frame_step)
    subframe_step = frame_step // subframe_length
    subframes_per_frame = frame_length // subframe_length
    output_size = frame_step * (frames - 1) + frame_length
    output_subframes = output_size // subframe_length

    subframe_signal = signal.view(*outer_dimensions, -1, subframe_length)

    frame = torch.arange(0, output_subframes).unfold(
        0, subframes_per_frame, subframe_step)
    frame = frame.clone().detach().long().to(signal.device)
    # frame = signal.new_tensor(frame).clone().long()  # signal may be on GPU or CPU
    frame = frame.contiguous().view(-1)

    result = signal.new_zeros(
        *outer_dimensions, output_subframes, subframe_length)
    result.index_add_(-2, frame, subframe_signal)
    result = result.view(*outer_dimensions, -1)
    return result
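The `overlap_and_add` docstring gives the output length as output_size = (frames - 1) * frame_step + frame_length. A small sanity check of that relation (a toy example, not part of this commit; it assumes the script is run from the repository root so that `svoice.utils` is importable):

import torch
from svoice.utils import overlap_and_add

# Hypothetical toy input: a batch of 1 signal cut into 3 overlapping frames of
# length 4 with a hop (frame_step) of 2, all filled with ones.
frames = torch.ones(1, 3, 4)

out = overlap_and_add(frames, frame_step=2)
print(out.shape)  # torch.Size([1, 8]) -> (3 - 1) * 2 + 4 = 8
print(out)        # overlapping regions sum, so the middle samples are 2.0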