web-search-api/documents/webpage_content_extractor.py

Commit History

:boom: [Fix] WebpageContentExtractor: UnicodeDecodeError
cff1afc

Hansimov committed

:gem: [Feature] New BatchWebpageContentExtractor: Extract webpage content from multiple html_paths concurrently
1db460d

Hansimov committed

:zap: [Enhance] ignore classes pattern, especially for 163.com
3dda344

Hansimov committed

:recycle: [Refactor] WebpageContentExtractor: Separate html and markdown processing
a636bcb

Hansimov committed

:recycle: [Refactor] Move hardcoded consts to network_configs
af2c647

Hansimov committed

:gem: [Feature] SearchAPIApp: New extract_content param
4d3e890

Hansimov committed

:gem: [Feature] New WebpageContentExtractor: extract webpage content as clean markdown
e773696

Hansimov committed