Paco Nathan


Build Instructions

API by Adnen Kadri from the Noun Project

!!! note
    In most cases you won't need to build this package locally.

Unless you're doing development work on the textgraphs library itself, simply install the package by following the instructions in "Getting Started".

Setup

To set up the build environment locally:

python3 -m venv venv
source venv/bin/activate
python3 -m pip install -U pip wheel setuptools

python3 -m pip install -e .
python3 -m pip install -r requirements-dev.txt
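The install steps above assume the virtual environment is active. As a minimal sketch (the helper name `require_venv` is hypothetical, not part of the textgraphs repo), a shell guard can check this first: the `activate` script that `python3 -m venv` creates exports `VIRTUAL_ENV`, so an empty or unset value suggests the venv is not active.

```shell
# Hypothetical helper (not part of the textgraphs repo): fail fast
# when no virtual environment is active, before installing packages.
require_venv() {
  # venv/bin/activate exports VIRTUAL_ENV; empty or unset means not active
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "error: activate the venv first (source venv/bin/activate)" >&2
    return 1
  fi
  echo "using virtual environment: $VIRTUAL_ENV"
}

# Usage, right before the install steps:
# require_venv && python3 -m pip install -e .
```

Checking `VIRTUAL_ENV` is a simple heuristic; `python3 -c 'import sys; print(sys.prefix != sys.base_prefix)'` asks Python directly instead.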

We use pre-commit hooks managed by the pre-commit framework. To install them locally:

pre-commit install --hook-type pre-commit

Test Coverage

This project uses pytest for unit test coverage. Source for unit tests is in the tests subdirectory.

To run the unit tests:

python3 -m pytest

Note that these tests run as part of the CI workflow whenever code is updated on the GitHub repo.

Online Documentation

To generate the documentation pages, you will also need to download the ChromeDriver release that matches your version of the Chrome browser, and save it as chromedriver in this directory.
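Since the docs build depends on that binary, a quick preflight check can catch a missing or non-executable `chromedriver` before running the build steps. This is a sketch under assumptions: the helper name `check_chromedriver` is hypothetical, and the default path `./chromedriver` assumes you run it from the same directory as the build commands.

```shell
# Hypothetical preflight check (not part of the textgraphs repo):
# confirm the ChromeDriver binary exists and is executable.
check_chromedriver() {
  # Default path is an assumption; pass another path as $1 if needed
  local driver="${1:-./chromedriver}"
  if [ ! -f "$driver" ]; then
    echo "missing: $driver (download ChromeDriver matching your Chrome)" >&2
    return 1
  fi
  if [ ! -x "$driver" ]; then
    echo "not executable: $driver (try: chmod +x $driver)" >&2
    return 1
  fi
  echo "found: $driver"
}
```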

Source for the documentation is in the docs subdirectory.

To build the documentation:

./bin/nb_md.sh
./pkg_doc.py docs/ref.md
mkdocs build

Then run ./bin/preview.py and load http://127.0.0.1:8000/docs/ in your browser to preview the generated microsite locally.

To package the generated microsite for deployment on a web server:

tar cvzf txg.tgz site/
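Before copying the archive to a server, it can help to confirm that the microsite actually landed under `site/`. The sketch below lists the archive with standard `tar` flags (`t` to list, `z` for gzip, `f` for the file name). The helper name `verify_archive` is hypothetical, and the default entry `site/index.html` is an assumption based on MkDocs' usual output layout.

```shell
# Hypothetical helper: confirm an expected file is inside a .tgz archive.
# Default entry "site/index.html" assumes MkDocs' usual output layout.
verify_archive() {
  local archive="${1:-txg.tgz}"
  local entry="${2:-site/index.html}"
  # List members, then require an exact whole-line match for the entry
  if tar tzf "$archive" | grep -x "$entry" >/dev/null; then
    echo "ok: $entry"
  else
    echo "missing: $entry" >&2
    return 1
  fi
}

# Usage, after packaging:
# tar cvzf txg.tgz site/ && verify_archive txg.tgz
```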

Remote Repo Updates

To update the source code repo on GitHub:

git remote set-url origin https://github.com/DerwenAI/textgraphs.git
git push

Create new releases on GitHub, then run git pull locally before updating Hugging Face or making a new package release.
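After `git pull`, one way to confirm that the new GitHub release tag is present locally is `git describe --tags --abbrev=0`, which prints the most recent tag reachable from `HEAD`. A minimal sketch, where the helper name `latest_tag` is hypothetical:

```shell
# Hypothetical helper: print the most recent tag reachable from HEAD,
# to confirm the GitHub release tag arrived after `git pull`.
latest_tag() {
  git describe --tags --abbrev=0 2>/dev/null || {
    echo "no tags found; try: git fetch --tags" >&2
    return 1
  }
}

# Usage:
# git pull && latest_tag
```

If the expected tag does not show up, `git fetch --tags` retrieves all tags from the remote.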

To update the source code repo and demo on Hugging Face:

git remote set-url origin https://huggingface.co/spaces/DerwenAI/textgraphs
git push
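Because this workflow repoints `origin` between GitHub and Hugging Face, it is easy to push to the wrong place. A hedged sketch of a guard (the name `push_to` is hypothetical) compares `git remote get-url origin` against the intended URL and only runs `git push` on a match:

```shell
# Hypothetical guard: only push when origin points at the expected URL.
push_to() {
  local expected="$1"
  local current
  current="$(git remote get-url origin)" || return 1
  if [ "$current" != "$expected" ]; then
    echo "origin is $current, expected $expected; not pushing" >&2
    return 1
  fi
  git push
}

# Usage:
# git remote set-url origin https://huggingface.co/spaces/DerwenAI/textgraphs
# push_to https://huggingface.co/spaces/DerwenAI/textgraphs
```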

Package Release

To update the release on PyPI:

./bin/push_pypi.sh

Packaging

Both the spaCy and PyPI teams induce packaging errors, since they hold "opinionated" views that conflict with each other and also don't quite follow the Python packaging standards.

Moreover, the dependencies here use a wide range of approaches for model downloads. Quite appropriately, the spaCy team does not want to package their language models on PyPI. However, they don't use a more contemporary means of model download either, such as the approach used by HF transformers, and that triggers logging problems. Overall, the logging approaches these dependencies use for errors and warnings are mostly ad hoc.

Together, these three issues (packaging, model downloads, logging) make managing Python library packaging downstream a small nightmare. To cope with them, this project implements several workarounds so that applications can install it from PyPI.

Meanwhile, keep watch on developments in the following dependencies, in case they introduce breaking changes or move toward more standard packaging practices:

  • spaCy -- model downloads, logging
  • OpenNRE -- PyPI packaging, logging
  • HF transformers and tokenizers -- logging
  • Wikimedia APIs -- SSL certificate expiry
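For the HF transformers and tokenizers logging noise mentioned above, both libraries read environment variables that an application can set before it starts. This is a sketch, not necessarily the workaround textgraphs itself uses: `TRANSFORMERS_VERBOSITY` sets the default transformers log level, and setting `TOKENIZERS_PARALLELISM` explicitly silences the tokenizers warning emitted after a process fork. The helper name `quiet_hf_logging` and the `app.py` entry point are hypothetical.

```shell
# Sketch (assumption, not the textgraphs workaround): quiet HF logging
# via environment variables before starting an application.
quiet_hf_logging() {
  # transformers default log level: debug|info|warning|error|critical
  export TRANSFORMERS_VERBOSITY="${TRANSFORMERS_VERBOSITY:-error}"
  # tokenizers warns after a fork unless this is set explicitly
  export TOKENIZERS_PARALLELISM="${TOKENIZERS_PARALLELISM:-false}"
}

# Usage (app.py is a hypothetical entry point):
# quiet_hf_logging && python3 app.py
```

Using `${VAR:-default}` keeps any value a user has already exported, so the helper only fills in defaults.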