Germany 100k.zip Apr 2026

Dataset Composition: Approximately 100,000 documents with titles, tables, and images removed to provide clean, plain text.

This dataset typically contains text extracted from German Wikipedia. It is widely used by researchers for tasks such as:

Summarization: Providing a large corpus for both extractive and abstractive summarization techniques.

Named Entity Recognition: Identifying specific locations, organizations, or names within German-language text.

These datasets often represent millions of individual word tokens, making them suitable for training small-to-medium scale language models.
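The document and token counts described above can be checked directly against a downloaded archive. The following is a minimal sketch, not a documented loader for this dataset. It assumes the archive stores one UTF-8 plain-text document per .txt member, which the source does not confirm, and the function name corpus_stats is hypothetical. Whitespace splitting is used only as a rough proxy for word tokens.

```python
import zipfile


def corpus_stats(zip_path):
    """Return (document_count, token_count) for the .txt members of a zip archive."""
    n_docs = 0
    n_tokens = 0
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Assumption: one plain-text document per .txt member;
            # any other member (directories, metadata) is skipped.
            if not name.lower().endswith(".txt"):
                continue
            # Assumption: UTF-8 encoding; undecodable bytes are replaced
            # rather than raising an error.
            text = archive.read(name).decode("utf-8", errors="replace")
            n_docs += 1
            # Whitespace splitting is a crude approximation of word tokens.
            n_tokens += len(text.split())
    return n_docs, n_tokens
```

Calling corpus_stats("Germany 100k.zip") would return the two counts as a tuple if the archive follows the assumed layout. A real tokenizer would give different token totals, so the result is best read as an order-of-magnitude estimate.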