Language Tools
Resiliparse language tools API documentation.
- resiliparse.parse.lang.detect_fast(text, cutoff=1200, n_results=1, langs=None)
Perform a very fast (linear-time) language detection on the input string.
The output is a tuple of the detected language name and the calculated out-of-place rank, which indicates how far the given text is from the closest-matching language profile. The higher the rank, the less accurate the detection is. Values above 1200 are usually false results.
The given Unicode string should be in composed normal form (NFC) for the best results.
- Parameters
text (str) – input text
cutoff (int) – OOP rank cutoff after which to return "unknown"
n_results (int) – if this is greater than one, a list of the n_results best matches will be returned
langs (list[str]) – restrict detection to these languages
- Returns
tuple of the detected language (or "unknown") and its out-of-place rank
- Return type
(str, int) | list[(str, int)]
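Because the return type depends on n_results and the input should be in NFC, a small wrapper can make calling code more uniform. The following is only a sketch: to_nfc, as_result_list and detect_candidates are hypothetical helper names, not part of Resiliparse, and it assumes that candidates beyond the cutoff are reported as "unknown" in list results as well.

```python
import unicodedata


def to_nfc(text):
    """Return the text in composed normal form (NFC)."""
    return unicodedata.normalize("NFC", text)


def as_result_list(result):
    """Wrap a single (language, rank) tuple in a list; pass lists through."""
    if isinstance(result, list):
        return result
    return [result]


def detect_candidates(text, n_results=3, cutoff=1200):
    """Hypothetical wrapper: NFC-normalize, detect, and drop "unknown" entries."""
    # Imported lazily so the helpers above work without Resiliparse installed.
    from resiliparse.parse.lang import detect_fast

    result = detect_fast(to_nfc(text), cutoff=cutoff, n_results=n_results)
    return [(lang, rank) for lang, rank in as_result_list(result) if lang != "unknown"]
```

Calling detect_candidates(some_text) should then always yield a list of (language, rank) tuples, which may be empty if nothing falls below the cutoff.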
- resiliparse.parse.lang.supported_langs()
Get a list of all languages that are supported by the fast language detector.
- Returns
list of supported languages
- Return type
list[str]
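One possible use of supported_langs() is to check a langs restriction before passing it to detect_fast(). The documentation above does not say how unsupported entries in langs are handled, so this sketch rejects them up front. split_supported and detect_restricted are hypothetical helper names, not part of Resiliparse.

```python
def split_supported(requested, supported):
    """Split requested languages into (known, unknown) lists, keeping order."""
    supported_set = set(supported)
    known = [lang for lang in requested if lang in supported_set]
    unknown = [lang for lang in requested if lang not in supported_set]
    return known, unknown


def detect_restricted(text, requested):
    """Hypothetical wrapper: detect only among requested, supported languages."""
    # Imported lazily so split_supported() works without Resiliparse installed.
    from resiliparse.parse.lang import detect_fast, supported_langs

    known, unknown = split_supported(requested, supported_langs())
    if unknown:
        raise ValueError(f"unsupported languages: {unknown}")
    return detect_fast(text, langs=known)
```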
- resiliparse.parse.lang.train_language_examples(examples, vec_len=256)
Train a language vector for fast language detection on a list of example texts.
- Parameters
examples (t.Iterable[str]) – list of example texts for this language
vec_len (int) – output vector length
- Returns
vector of trained values
- Return type
list[int]
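Since examples is typed as an iterable, the example texts can presumably be streamed from a generator rather than collected in a list first. The sketch below filters out empty texts before training; clean_examples and train_from_texts are hypothetical helper names, and whether empty strings would otherwise affect the result is not stated in the documentation above.

```python
def clean_examples(texts):
    """Yield stripped, non-empty example texts one at a time."""
    for text in texts:
        text = text.strip()
        if text:
            yield text


def train_from_texts(texts, vec_len=256):
    """Hypothetical wrapper: train a language vector from cleaned examples."""
    # Imported lazily so clean_examples() works without Resiliparse installed.
    from resiliparse.parse.lang import train_language_examples

    return train_language_examples(clean_examples(texts), vec_len=vec_len)
```

According to the parameter description, the returned list of integers should have vec_len entries.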