# Build Instructions

<img src="../assets/nouns/api.png" alt="API by Adnen Kadri from the Noun Project" />

!!! note
    In most cases you won't need to build this package locally.

Unless you're doing development work on the **textgraphs** library itself,
simply install it by following the instructions in
["Getting Started"](https://derwen.ai/docs/txg/start/).


## Setup

To set up the build environment locally:
```
python3 -m venv venv
source venv/bin/activate
python3 -m pip install -U pip wheel setuptools

python3 -m pip install -e .
python3 -m pip install -r requirements-dev.txt
```
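
Before installing, it can help to confirm that the virtual environment is actually active.
The following is a minimal sketch rather than part of this project: the helper name `in_virtualenv` is hypothetical, and it relies only on the standard-library comparison of `sys.prefix` and `sys.base_prefix`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

If this reports `False`, run `source venv/bin/activate` again before the `pip install` steps.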

We use *pre-commit hooks* based on [`pre-commit`](https://pre-commit.com/).
To configure them locally:
```
pre-commit install --hook-type pre-commit
```


## Test Coverage

This project uses
[`pytest`](https://docs.pytest.org/)
for *unit test* coverage.
Source for unit tests is in the
[`tests`](https://github.com/DerwenAI/textgraphs/tree/main/tests)
subdirectory.

To run the unit tests:
```
python3 -m pytest
```
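
As a rough illustration of the kind of test that `pytest` collects, here is a minimal sketch.
The `normalize_label` helper is hypothetical and defined inline only to keep the example self-contained; it is not part of the **textgraphs** API.
By default, `pytest` discovers functions whose names start with `test_` in files named `test_*.py`.

```python
def normalize_label(text: str) -> str:
    """Hypothetical helper: collapse runs of whitespace and lowercase."""
    return " ".join(text.split()).lower()


def test_normalize_label() -> None:
    # A plain assert is enough; pytest reports the compared values on failure.
    assert normalize_label("  Hello   World ") == "hello world"
```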

Note that these tests run as part of the CI workflow
whenever code is updated on the GitHub repo.


## Online Documentation

To generate documentation pages, you will also need to download
[`ChromeDriver`](https://googlechromelabs.github.io/chrome-for-testing/)
for your version of the `Chrome` browser, saved as `chromedriver` in
this directory.

Source for the documentation is in the
[`docs`](https://github.com/DerwenAI/textgraphs/tree/main/docs)
subdirectory.

To build the documentation:
```
./bin/nb_md.sh
./pkg_doc.py docs/ref.md
mkdocs build
```
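
The three steps above could also be chained from Python so that the build stops at the first failure.
This is only a sketch under stated assumptions: `build_docs` and its `run` parameter are hypothetical names, the commands are copied from the steps above, and the script is assumed to run from the repository root.

```python
import subprocess
from typing import Callable, Sequence

# Commands copied from the documented build steps.
DOC_STEPS: Sequence[Sequence[str]] = (
    ("./bin/nb_md.sh",),
    ("./pkg_doc.py", "docs/ref.md"),
    ("mkdocs", "build"),
)


def build_docs(run: Callable[..., object] = subprocess.run) -> int:
    """Run each step in order and return the number of steps executed."""
    for step in DOC_STEPS:
        # check=True makes subprocess.run raise CalledProcessError
        # on a non-zero exit code, which aborts the remaining steps.
        run(list(step), check=True)
    return len(DOC_STEPS)
```

Passing a different callable as `run` allows a dry run that records the commands without executing them.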

Then run `./bin/preview.py` and load <http://127.0.0.1:8000/docs/>
in your browser to preview the generated microsite locally.

To package the generated microsite for deployment on a
web server:
```
tar cvzf txg.tgz site/
```
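
On systems without a `tar` command, roughly the same archive could be produced with the standard-library `tarfile` module.
This is a sketch, not the project's tooling: the function name `package_site` is hypothetical, and the defaults mirror the `site/` directory and `txg.tgz` archive named above.

```python
import tarfile
from pathlib import Path


def package_site(site_dir: str = "site", archive: str = "txg.tgz") -> list[str]:
    """Write a gzip-compressed tarball of site_dir and return its member names."""
    src = Path(site_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"no such directory: {src}")

    # arcname stores members under the directory's own name,
    # so the archive unpacks into `site/` like the tar command above.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname=src.name)

    # Re-open the archive to list what was actually written.
    with tarfile.open(archive, "r:gz") as tar:
        return tar.getnames()
```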


## Remote Repo Updates

To update the source code repo on GitHub:

```
git remote set-url origin https://github.com/DerwenAI/textgraphs.git
git push
```

Create new releases on GitHub, then run `git pull` locally before
updating Hugging Face or making a new package release.

To update the source code repo and demo on Hugging Face:

```
git remote set-url origin https://huggingface.co/spaces/DerwenAI/textgraphs
git push
```


## Package Release

To update the [release on PyPI](https://pypi.org/project/textgraphs/):
```
./bin/push_pypi.sh
```


## Packaging

Both the spaCy and PyPI teams induce packaging errors, since they
have "opinionated" views which conflict with each other and also
don't quite follow the [Python packaging standards](https://peps.python.org/pep-0621/).

Moreover, the various dependencies here use a wide range of approaches
for model downloads: quite appropriately, the spaCy team does not want
to package their language models on PyPI.
However, they don't use more contemporary means of model download,
such as HF transformers, either -- and that triggers logging problems.
Overall, the logging approaches that the dependencies here use for
errors and warnings are mostly ad hoc.

These three issues (packaging, model downloads, logging) pose a small nightmare
for managing Python library packaging downstream.
To that point, this project implements several workarounds so that
applications can download from PyPI.
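
For the logging issue in particular, an application could raise the threshold on third-party loggers itself.
The sketch below is not this project's own workaround: the function name `quiet_dependencies` is hypothetical, and the logger names are assumptions that should be verified against each dependency.

```python
import logging
import warnings

# Assumed logger names; verify against each dependency before relying on them.
NOISY_LOGGERS = ("spacy", "transformers", "tokenizers")


def quiet_dependencies(level: int = logging.ERROR) -> None:
    """Raise the logging threshold on selected third-party loggers."""
    for name in NOISY_LOGGERS:
        logging.getLogger(name).setLevel(level)

    # Some libraries report through the warnings module instead of logging.
    warnings.filterwarnings("ignore", category=FutureWarning)
```

Calling `quiet_dependencies()` before importing the heavier dependencies keeps their import-time messages below the chosen level, provided they log through the named loggers.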

Meanwhile, keep watch on the following dependencies in case they
introduce breaking changes or move toward more standard
packaging practices:

  * `spaCy` -- model downloads, logging
  * `OpenNRE` -- PyPI packaging, logging
  * HF `transformers` and `tokenizers` -- logging
  * Wikimedia APIs -- SSL certificate expiry