explosion/spaCy · Language Support · Discussions · GitHub

Language Support Discussions

Discuss the language data and training models for new languages

Pinned to Language Support

Adding models for new languages master thread
enhancement (Feature requests and improvements) · lang / all (Global language data) · new language (Adding support for new languages to spaCy)
ines started Dec 16, 2018 in Language Support

141

Discussions

Arabic language support
lang / ar (Arabic language data and models)
jeknov started Feb 21, 2021 in Language Support

14

New Language: Classical Armenian

LilitKharatyan started May 20, 2024 in Language Support

0

Why is the German word "stark" always recognized as an ADV without a sentiment value?

Diapolo started May 13, 2024 in Language Support

0

Why does the German sentence tokenizer consider a semicolon a sentence ending?
lang / de (German language data and models) · feat / tokenizer (Feature: Tokenizer)
TamaraAtanasoska started Feb 26, 2024 in Language Support

2

Other Languages Support
models (Issues related to the statistical models)
firqaaa started Feb 13, 2024 in Language Support · Closed

0

Portuguese words starting with a capital letter are not correctly lemmatized
lang / pt (Portuguese language data and models) · feat / lemmatizer (Feature: Rule-based and lookup lemmatization)
dcaled started Apr 1, 2021 in Language Support

6

Adding support for Tibetan in spaCy
new language (Adding support for new languages to spaCy)
wienergm started Dec 24, 2023 in Language Support

0

Feedback on alpha Finnish, Korean and Swedish trained pipelines
enhancement (Feature requests and improvements) · lang / ko (Korean language data and models) · lang / sv (Swedish language data and models) · lang / fi (Finnish language data and models) · v3.3 (Related to v3.3)
adrianeboyd started Apr 5, 2022 in Language Support

16

English models' Accuracy Evaluation values
lang / en (English language data and models)
ojo4f3 started Dec 4, 2023 in Language Support

1

Update Russian library
lang / ru (Russian language data and models) · third-party (Third-party packages and services) · feat / lemmatizer (Feature: Rule-based and lookup lemmatization)
fitwist started Nov 15, 2023 in Language Support

1

Floret vectors for Italian
training (Training and updating models) · feat / vectors (Feature: Word vectors and similarity)
darioprencipe started Oct 30, 2023 in Language Support

1

Improving Bengali Stopwords collection and Exception
lang / bn (Bengali language data and models) · new language (Adding support for new languages to spaCy)
Debangan-MishraIIIT started Sep 7, 2023 in Language Support

1

Connected words in Portuguese
lang / pt (Portuguese language data and models)
ClioBrNl2023 started Aug 22, 2023 in Language Support

1

Training coreference resolver on Italian Ontonotes produces low scores
training (Training and updating models) · feat / coref (Feature: Coreference resolution)
ghidav started Aug 14, 2023 in Language Support

3

Chinese word segmentation model for spaCy
models (Issues related to the statistical models) · lang / zh (Chinese language data and models) · third-party (Third-party packages and services)
PythonCancer started Aug 18, 2023 in Language Support

1

Italian lemmatizer low performance on agglutinated verbs
lang / it (Italian language data and models) · feat / lemmatizer (Feature: Rule-based and lookup lemmatization)
ferrixio started Aug 14, 2023 in Language Support

2

How do I use url_match with Japanese?
lang / ja (Japanese language data and models) · feat / tokenizer (Feature: Tokenizer)
ryanheise started Jul 27, 2023 in Language Support

5

Thai Language not working properly
lang / th (Thai language data and models)
atawur started Jul 27, 2023 in Language Support

1

Losing POS Tagging & Other Token Attributes when Segmenting with Jieba or Pkuseg
usage (General spaCy usage) · feat / tokenizer (Feature: Tokenizer)
creolio started Jul 20, 2023 in Language Support

1

Chinese tokenization is bad
lang / zh (Chinese language data and models)
bittlingmayer started Dec 14, 2021 in Language Support

7

Support for Balochi
meta (Meta topics, e.g. repo organisation and issue management) · new language (Adding support for new languages to spaCy)
strickvl started Jun 8, 2023 in Language Support

1

Sentiment analysis for all standard spaCy language models
feat / textcat (Feature: Text Classifier)
thomasdhuw started May 22, 2023 in Language Support · Closed

1

xx_sent_ud_sm bad sentence split
models (Issues related to the statistical models) · lang / zh (Chinese language data and models) · lang / xx (Multi-language data and models) · feat / senter (Feature: Sentence Recognizer)
lance0108 started May 18, 2023 in Language Support

1

Support for Hebrew
models (Issues related to the statistical models) · lang / he (Hebrew language data and models)
Hilla-Merhav started Apr 4, 2023 in Language Support

13

Slovenian: Feedback on alpha trained pipelines for upcoming spaCy v3.6
models (Issues related to the statistical models) · lang / sl (Slovenian language data and models) · v3.6 (Related to v3.6)
adrianeboyd started May 16, 2023 in Language Support

0