AI & ML interests

Information Retrieval, Natural Language Processing, Text Mining, Argumentation

Recent Activity

webis's activity

cschroeder
posted an update 6 days ago
💡 Looking for support: Have you ever had to overcome a lack of labeled data to deal with an NLP task?

Are you working on Natural Language Processing tasks and have faced the challenge of a lack of labeled data before? We are currently conducting a survey to explore the strategies used to address this bottleneck, especially in the context of recent advancements, including but not limited to large language models.

The survey is non-commercial and conducted solely for academic research purposes. The results will contribute to an open-access publication that also benefits the community.

👉 With only 5–15 minutes of your time, you would greatly help to investigate which strategies are used by the #NLP community to overcome a lack of labeled data.

❤️ How you can help even more: If you know others working on supervised learning and NLP, please share this survey with them. We'd really appreciate it!

Survey: https://bildungsportal.sachsen.de/umfragen/limesurvey/index.php/538271
Estimated time required: 5–15 minutes
Deadline for participation: January 12, 2025

#NLP #ML
christopher
posted an update 15 days ago
The folks at Foursquare released a dataset of 104.5 million places of interest (foursquare/fsq-os-places), and here are all of them on a plot.
christopher
posted an update 17 days ago
cschroeder
posted an update 28 days ago
๐Ÿฃ New release: small-text v2.0.0.dev1

With small language models on the rise, the new version of small-text has been long overdue! Despite the generative AI hype, many real-world tasks still rely on supervised learningโ€”which is reliant on labeled data.

Highlights:
- Four new query strategies: Try even more combinations than before.
- Vector indices integration: HNSW and KNN indices are now available via a unified interface and can easily be used within your code.
- Simplified installation: We dropped the torchtext dependency and cleaned up a lot of interfaces.

GitHub: https://github.com/webis-de/small-text

👂 Try it out for yourself! We are eager to hear your feedback.
🔧 Share your small-text applications and experiments in the newly added showcase section.
🌟 Support the project by leaving a star on the repo!

#activelearning #nlproc #machinelearning
cschroeder
posted an update about 1 month ago
#EMNLP2024 is happening soon! Unfortunately, I will not be on site, but I will present our poster virtually on Wednesday, Nov 13 (7:45 EST / 13:45 CET) in Virtual Poster Session 2.

In this work, we leverage self-training in an active learning loop in order to train small language models with even less data. Hope to see you there!
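
For readers unfamiliar with the idea, a generic sketch of self-training inside an active learning loop might look as follows. It is not the method from the poster: the 1-D nearest-centroid "model", the function names (fit, predict, active_learning_with_self_training), the confidence measure, and the threshold of 0.9 are all hypothetical choices made only for illustration.

```python
from statistics import mean


def fit(points):
    """Toy nearest-centroid model: one mean per class from (x, label) pairs."""
    classes = {y for _, y in points}
    return {c: mean(x for x, y in points if y == c) for c in classes}


def predict(model, x):
    """Return (label, confidence); with two classes, confidence lies in [0.5, 1]."""
    dist = {c: abs(x - m) for c, m in model.items()}
    label = min(dist, key=dist.get)
    total = sum(dist.values())
    conf = 1.0 if total == 0 else 1.0 - dist[label] / total
    return label, conf


def active_learning_with_self_training(pool, oracle, seed, rounds=3, threshold=0.9):
    """Alternate between querying a human label and adding confident pseudo-labels."""
    labeled = list(seed)  # human-labeled (x, y) pairs only
    pool = list(pool)     # unlabeled instances
    model = fit(labeled)
    for _ in range(rounds):
        if not pool:
            break
        # Active learning step: ask the oracle about the least confident instance.
        query = min(pool, key=lambda x: predict(model, x)[1])
        pool.remove(query)
        labeled.append((query, oracle(query)))
        model = fit(labeled)
        # Self-training step: pseudo-label instances the model is confident about.
        pseudo = []
        for x in pool:
            label, conf = predict(model, x)
            if conf >= threshold:
                pseudo.append((x, label))
        # Pseudo-labels are used for this round's model only, never stored as human labels.
        model = fit(labeled + pseudo)
    return model, labeled
```

In this sketch, only instances answered by the oracle count toward the labeling budget; pseudo-labels are recomputed each round so that an early mistake does not persist.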
cschroeder
posted an update 3 months ago
⚖️ AI Training is Copyright Infringement

This bold claim is not my opinion. It comes from a recent "report" by a group whose stance is recognizable from its name, which roughly translates to "Authors' Rights Initiative". According to the LinkedIn post below, the report was also presented before the EU Parliament.

I am not really interested in politics, but as an EU citizen I do care about a reasonable and practical version of the EU AI Act. I am not saying there should be no rules around data and AI, but this report is obviously very biased towards one side.

While I think the report itself does not deserve attention, I am posting it in the hope that you will find more examples where the authors did not address the issue adequately. Feel free to add them to my LinkedIn post (where the original authors will see them) or here.

[en] Executive summary: https://urheber.info/media/pages/diskurs/ai-training-is-copyright-infringement/3b900058e6-1725460935/executive-summary_engl_final_29-08-2024.pdf
[de] Full report: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4946214

LinkedIn: https://www.linkedin.com/posts/activity-7238912869268959232-6cFx

christopher
posted an update 4 months ago
4 million chess puzzles
cschroeder
posted an update 4 months ago
🌟 Liger Kernel: Efficient Triton Kernels for LLM Training

LIGER "is a [Hugging Face-compatible] collection of Triton kernels designed specifically for LLM training. It can effectively increase multi-GPU training throughput by 20% and reduces memory usage by 60%."

GitHub: https://github.com/linkedin/Liger-Kernel