Error when running

#2
by juhoinkinen - opened

Hi and thanks for the tool! It seemed just what I was looking for.

However, when I try to use the tool to migrate the Annif-corpora GitHub repository to a Hugging Face repository, an error occurs after a few minutes. The destination HF repo is created (https://huggingface.co/datasets/juhoinkinen/Annif-corpora-via-migration-tool) but no files are transferred to it.

The UI shows only a generic error message (screenshot: image.png).

(I have also tried other ways to sync the GitHub repo to the Hugging Face Hub. A GitHub Action using huggingface-cli upload in the source repo fails with 504 Server Error, e.g. this run. Running huggingface-cli upload on a local clone of the repo ends with the process being killed after it has used several gigabytes of RAM. This might be related to the problem with this tool.)

Any help is appreciated!

Actually, after internal discussion, we decided that Annif-corpora in its current form is best not published on the Hugging Face Hub. But the 504 Server Error might still be worth looking at.

Librarian Bots org

Thanks for reporting it, @juhoinkinen. I will take a look at why this might be happening! Also, if you don't mind sharing, is there a particular reason you decided not to publish on the Hugging Face Hub?

We want to do some cleaning and restructuring of the corpora before publishing on the HF Hub :) The repository actually contains multiple datasets, and some of them contain only links to PDF files, which need to be downloaded and converted in a separate step. So the dataset as it stands is not very compatible with the HF Hub, I think.

And the 504 Server Error is probably not related to this app, but maybe to the HF Hub limit/recommendation for the number of files per folder (in the repo there are over 27k files in the /subjects/ subdirectories), because the same error occurs when using huggingface-cli.
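
For anyone wanting to check a local clone against a files-per-folder limit, here is a minimal sketch that counts the files directly inside each directory. The function names are hypothetical, and the default limit of 10,000 is an assumption about the Hub recommendation, so adjust it to whatever the current documentation says.

```python
import os
from collections import Counter


def files_per_directory(root):
    """Count the files directly inside each directory under root."""
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git metadata; it is not uploaded as dataset content.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        counts[os.path.relpath(dirpath, root)] = len(filenames)
    return counts


def crowded_directories(root, limit=10_000):
    """Return {relative_dir: file_count} for directories holding more than limit files.

    The default limit is an assumed value, not one confirmed in this thread.
    """
    return {d: n for d, n in files_per_directory(root).items() if n > limit}


if __name__ == "__main__":
    # "." stands for the root of a local clone of the repository.
    for directory, count in sorted(crowded_directories(".").items()):
        print(f"{directory}: {count} files")
```

If this lists the /subjects/ subdirectories, splitting them into smaller subfolders before uploading would be one thing to try.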
