Language Tools
Resiliparse language tools API documentation.
- resiliparse.parse.lang.detect_fast(text, cutoff=1200, n_results=1, langs=None)
Perform a very fast (linear-time) language detection on the input string.
The output is a tuple of the detected language and its out-of-place rank, which measures how far the given text is from the closest-matching language profile. The higher the rank, the less reliable the detection. Ranks above 1200 (the default cutoff) usually indicate a false result.
For best results, the input string should be in Unicode Normalization Form C (NFC).
- Parameters:
text (str) – input text
cutoff (int) – out-of-place (OOP) rank cutoff above which "unknown" is returned
n_results (int) – if this is greater than one, a list of the n_results best matches will be returned
langs (list[str]) – restrict detection to these languages
- Returns:
tuple of the detected language (or "unknown") and its out-of-place rank, or a list of such tuples if n_results is greater than one
- Return type:
(str, int) | list[(str, int)]
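To make the out-of-place rank described above more concrete, here is a minimal pure-Python sketch of the classic out-of-place measure from Cavnar and Trenkle's n-gram text categorization, including NFC normalization and an "unknown" cutoff. It is an illustration under stated assumptions, not Resiliparse's implementation: the n-gram sizes, profile length, word padding, and the helper names `ngram_profile`, `out_of_place`, and `detect` are all hypothetical, and the rank scale differs from the one `detect_fast` uses, so the 1200 threshold does not carry over.

```python
import unicodedata
from collections import Counter


def ngram_profile(text: str, n_max: int = 3, top_k: int = 300) -> list[str]:
    """Return the top_k most frequent character n-grams (n = 1..n_max), most frequent first."""
    # NFC-normalize so that "e" + U+0301 and U+00E9 yield identical n-grams
    text = unicodedata.normalize("NFC", text.lower())
    counts = Counter()
    for word in text.split():
        padded = f"_{word}_"  # mark word boundaries (hypothetical choice)
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending frequency, breaking ties alphabetically for a stable order
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc: list[str], lang: list[str]) -> int:
    """Sum of rank displacements; n-grams absent from the language profile get the maximum penalty."""
    lang_rank = {gram: rank for rank, gram in enumerate(lang)}
    max_penalty = len(lang)
    return sum(
        abs(rank - lang_rank[gram]) if gram in lang_rank else max_penalty
        for rank, gram in enumerate(doc)
    )


def detect(text: str, profiles: dict[str, list[str]], cutoff: int) -> tuple[str, int]:
    """Return (best language, rank), or ("unknown", rank) if the best rank exceeds cutoff."""
    doc = ngram_profile(text)
    rank, lang = min((out_of_place(doc, profile), name) for name, profile in profiles.items())
    return (lang, rank) if rank <= cutoff else ("unknown", rank)
```

For example, with `profiles = {"en": ngram_profile("the cat sat"), "de": ngram_profile("der hund lief")}`, calling `detect("the cat sat", profiles, cutoff=100)` returns `("en", 0)`, because the document profile matches the English profile exactly. The maximum penalty for missing n-grams is what makes unrelated text score a high rank and fall past the cutoff.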
- resiliparse.parse.lang.supported_langs()
Get a list of all languages that are supported by the fast language detector.
- Returns:
list of supported languages
- Return type:
list[str]
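Because `detect_fast` accepts an optional `langs` restriction, it can help to check that list against `supported_langs()` once, before running detection over many documents, so a typo fails early with a clear message. The helper `check_langs` below is a hypothetical sketch, not part of Resiliparse; it takes the supported list as an argument so it runs without the library installed.

```python
from collections.abc import Iterable


def check_langs(requested: Iterable[str], supported: Iterable[str]) -> list[str]:
    """Return the requested languages (deduplicated, order kept); raise if any is unsupported."""
    supported_set = set(supported)
    unique = list(dict.fromkeys(requested))  # drop duplicates, keep first occurrence
    unknown = [lang for lang in unique if lang not in supported_set]
    if unknown:
        raise ValueError(f"unsupported language(s): {', '.join(unknown)}")
    return unique
```

In practice it might be called as `check_langs(my_langs, supported_langs())`, where `my_langs` is your own list, with the result passed as `langs=` to `detect_fast`.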
- resiliparse.parse.lang.train_language_examples(examples, vec_len=256)
Train a language vector for fast language detection on a list of example texts.
- Parameters:
examples (typing.Iterable[str]) – iterable of example texts for this language
vec_len (int) – output vector length
- Returns:
vector of trained values
- Return type:
list[int]
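This page does not describe how `train_language_examples` encodes a language into its `vec_len` integers. Purely to illustrate the general idea of a fixed-length language vector, the sketch below hashes lowercase, NFC-normalized character trigrams from the example texts into `vec_len` buckets and counts them. The function `hashed_trigram_vector` and its whole encoding are assumptions, not Resiliparse's algorithm, and its output is not interchangeable with the vectors the library produces.

```python
import unicodedata
import zlib
from collections.abc import Iterable


def hashed_trigram_vector(examples: Iterable[str], vec_len: int = 256) -> list[int]:
    """Count character trigrams of all examples in vec_len hash buckets (illustrative only)."""
    vec = [0] * vec_len
    for text in examples:
        # Lowercase and normalize to NFC so equivalent spellings hash identically
        text = unicodedata.normalize("NFC", text.lower())
        for i in range(len(text) - 2):
            # crc32 is stable across runs, unlike Python's randomized str hash()
            bucket = zlib.crc32(text[i:i + 3].encode("utf-8")) % vec_len
            vec[bucket] += 1
    return vec
```

Like the documented `examples` parameter, it accepts any iterable of strings, so a generator over a corpus works as input.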