Language Tools

Resiliparse language tools API documentation.

resiliparse.parse.lang.detect_fast(text, cutoff=1200, n_results=1, langs=None)

Perform a very fast (linear-time) language detection on the input string.

The output is a tuple of the detected language name and the calculated out-of-place (OOP) rank, which indicates how far the given text is from the closest-matching language profile. The higher the rank, the less reliable the detection. Ranks above 1200 are usually false results, which matches the default cutoff. If n_results is greater than one, a list of such tuples is returned instead.

The given Unicode string should be in composed normal form (NFC) for the best results.
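As a minimal sketch of the normalization step, the standard library's unicodedata module can bring a string into NFC before it is passed on. The helper name to_nfc is hypothetical and only used for illustration.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return the text in composed normal form (NFC).

    Hypothetical helper for illustration; not part of Resiliparse.
    """
    return unicodedata.normalize("NFC", text)


# "e" followed by a combining acute accent is two code points before
# normalization and a single composed code point afterwards.
decomposed = "cafe\u0301"
composed = to_nfc(decomposed)
```

The normalized string can then be used as the text argument of detect_fast.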

Parameters:
  • text (str) – input text

  • cutoff (int) – out-of-place (OOP) rank cutoff above which "unknown" is returned

  • n_results (int) – if greater than one, a list of the n_results best matches is returned instead of a single tuple

  • langs (list[str]) – restrict detection to these languages

Returns:

tuple of the detected language (or "unknown") and its out-of-place rank, or a list of such tuples if n_results is greater than one

Return type:

(str, int) | list[(str, int)]
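Because the return type depends on n_results, calling code may want to handle both shapes uniformly. The following is a sketch under stated assumptions: ranked_matches and its detector parameter are hypothetical names introduced here (the parameter exists only so the helper can be exercised without the library), and dropping "unknown" entries is a design choice of this example, not documented behaviour.

```python
def ranked_matches(text, n_results=3, cutoff=1200, langs=None, detector=None):
    """Return detection results as a list of (language, rank) tuples.

    Hypothetical wrapper for illustration. When no detector is given,
    resiliparse.parse.lang.detect_fast is imported and used.
    """
    if detector is None:
        from resiliparse.parse.lang import detect_fast as detector

    result = detector(text, cutoff=cutoff, n_results=n_results, langs=langs)

    # A single match comes back as one (language, rank) tuple; wrap it so
    # that callers always receive a list.
    matches = [result] if isinstance(result, tuple) else list(result)

    # Assumption of this sketch: entries reported as "unknown" are not useful
    # to the caller and are dropped.
    return [(lang, rank) for lang, rank in matches if lang != "unknown"]
```

A call such as ranked_matches("Dies ist ein kurzer Beispieltext.", n_results=3) would then always yield a list, which may be empty if nothing falls below the cutoff.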

resiliparse.parse.lang.supported_langs()

Get a list of all languages that are supported by the fast language detector.

Returns:

list of supported languages

Return type:

list[str]
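One plausible use of this list is to check a langs restriction before handing it to detect_fast. The sketch below assumes nothing about how detect_fast treats unsupported entries; split_supported and restricted_langs are hypothetical helpers, and the language identifiers in the usage note are placeholders.

```python
def split_supported(requested, supported):
    """Split requested languages into (supported, unsupported) lists.

    Hypothetical helper for illustration; the order of `requested` is kept.
    """
    supported_set = set(supported)
    known = [lang for lang in requested if lang in supported_set]
    unknown = [lang for lang in requested if lang not in supported_set]
    return known, unknown


def restricted_langs(requested):
    """Return only those requested languages the fast detector supports."""
    # Imported lazily so the pure helper above works without Resiliparse.
    from resiliparse.parse.lang import supported_langs

    known, _ = split_supported(requested, supported_langs())
    return known
```

The result of restricted_langs(["en", "de"]) could then be passed as the langs argument of detect_fast.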

resiliparse.parse.lang.train_language_examples(examples, vec_len=256)

Train a language vector for fast language detection on a list of example texts.

Parameters:
  • examples (t.Iterable[str]) – iterable of example texts for this language

  • vec_len (int) – output vector length

Returns:

vector of trained values

Return type:

list[int]
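A hedged sketch of preparing training input follows. It assumes that training texts benefit from the same NFC normalization recommended for detection, which the documentation above does not state explicitly. train_vector and its trainer parameter are hypothetical names; the parameter exists only so the helper can be tried without the library.

```python
import unicodedata


def train_vector(examples, vec_len=256, trainer=None):
    """Clean example texts and train a language vector from them.

    Hypothetical wrapper for illustration. When no trainer is given,
    resiliparse.parse.lang.train_language_examples is imported and used.
    """
    if trainer is None:
        from resiliparse.parse.lang import train_language_examples as trainer

    # Assumption of this sketch: skip blank texts and normalize to NFC so
    # training input resembles the recommended detection input.
    cleaned = [unicodedata.normalize("NFC", t) for t in examples if t.strip()]
    if not cleaned:
        raise ValueError("no non-empty example texts given")

    vector = trainer(cleaned, vec_len=vec_len)

    # vec_len is documented as the output vector length, so verify it.
    if len(vector) != vec_len:
        raise ValueError("unexpected vector length: %d" % len(vector))
    return vector
```

How a trained vector is subsequently registered with or used by the detector is not covered by this section, so the sketch stops at producing the vector.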