Language Tools

Resiliparse language tools API documentation.

resiliparse.parse.lang.detect_fast(text, cutoff=1200, n_results=1, langs=None)

Perform a very fast (linear-time) language detection on the input string.

The output is a tuple of the detected language name and the calculated out-of-place rank, which indicates how far the given text is from the closest-matching language profile. The higher the rank, the less reliable the detection. Ranks above 1200 usually indicate an incorrect result.

The given Unicode string should be in composed normal form (NFC) for the best results.
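Text from mixed sources may arrive in decomposed form, with a base letter followed by a combining mark. The following is a minimal sketch of normalizing such text to NFC with Python's standard `unicodedata` module before detection. The helper name `to_nfc` is chosen for illustration and is not part of Resiliparse.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in composed normal form (NFC)."""
    return unicodedata.normalize("NFC", text)


# "e" followed by U+0301 (combining acute accent) composes into U+00E9.
decomposed = "Caf\u0065\u0301"
composed = to_nfc(decomposed)
print(len(decomposed), len(composed))  # 5 4
```

The normalized string can then be passed as the text argument of detect_fast.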

Parameters
  • text (str) – input text

  • cutoff (int) – out-of-place (OOP) rank cutoff above which "unknown" is returned

  • n_results (int) – number of results to return; if greater than one, a list of the n_results best matches is returned

  • langs (list[str]) – restrict detection to these languages

Returns

tuple of the detected language (or "unknown") and its out-of-place rank, or a list of such tuples if n_results is greater than one

Return type

(str, int) | list[(str, int)]
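Because the return type depends on n_results, calling code may want a single shape to work with. The sketch below wraps a detect_fast-style callable and always returns a list. The helper `ranked_languages` is hypothetical, and the detector is passed in as the `detect` argument, so the snippet itself does not import Resiliparse. It assumes the callable accepts `cutoff` and `n_results` as keyword arguments, as in the signature above.

```python
from typing import Callable, List, Tuple

Result = Tuple[str, int]


def ranked_languages(text: str, detect: Callable, n_results: int = 3,
                     cutoff: int = 1200) -> List[Result]:
    """Call a detect_fast-style function and always return a list of
    (language, rank) tuples, dropping any "unknown" entries."""
    result = detect(text, cutoff=cutoff, n_results=n_results)
    # A single result comes back as a bare tuple; wrap it in a list.
    if isinstance(result, tuple):
        result = [result]
    return [(lang, rank) for lang, rank in result if lang != "unknown"]
```

With Resiliparse installed, this would be called as `ranked_languages(text, detect_fast)` after importing `detect_fast` from `resiliparse.parse.lang`.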

resiliparse.parse.lang.supported_langs()

Get a list of all languages that are supported by the fast language detector.

Returns

list of supported languages

Return type

list[str]
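One use for this list is to validate a langs restriction before passing it to detect_fast. The sketch below separates requested languages into supported and unsupported ones. The helper `split_supported` is hypothetical, and the `supported` argument stands in for the return value of supported_langs(), so the snippet runs without Resiliparse.

```python
from typing import Iterable, List, Tuple


def split_supported(requested: Iterable[str],
                    supported: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split requested language names into (supported, unsupported),
    preserving the requested order."""
    supported_set = set(supported)
    ok, missing = [], []
    for lang in requested:
        (ok if lang in supported_set else missing).append(lang)
    return ok, missing
```

With Resiliparse installed, the first element of `split_supported(wanted, supported_langs())` could be passed as the langs argument, and the second element reported to the caller.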

resiliparse.parse.lang.train_language_examples(examples, vec_len=256)

Train a language vector for fast language detection on a list of example texts.

Parameters
  • examples (t.Iterable[str]) – iterable of example texts for this language

  • vec_len (int) – output vector length

Returns

vector of trained values

Return type

list[int]
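When training vectors for several languages, it can help to collect them in one mapping and check their length. The sketch below does this with a train_language_examples-style callable. The helper `train_profiles` is hypothetical, and the trainer is passed in as the `train` argument, so the snippet does not import Resiliparse. It assumes the callable accepts `vec_len` as a keyword argument and returns exactly vec_len values, as the parameter description suggests.

```python
from typing import Callable, Dict, Iterable, List


def train_profiles(corpora: Dict[str, Iterable[str]], train: Callable,
                   vec_len: int = 256) -> Dict[str, List[int]]:
    """Train one vector per language with a train_language_examples-style
    function and check that each vector has the requested length."""
    profiles = {}
    for lang, examples in corpora.items():
        vector = list(train(examples, vec_len=vec_len))
        if len(vector) != vec_len:
            raise ValueError(
                f"{lang}: expected {vec_len} values, got {len(vector)}")
        profiles[lang] = vector
    return profiles
```

With Resiliparse installed, this would be called as `train_profiles(corpora, train_language_examples)`, where `corpora` maps each language name to its example texts.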