ChatNoir Resiliparse

Python Module Index

f
- fastwarc
    fastwarc.stream_io
    fastwarc.warc
 
r
- resiliparse
    resiliparse.beam.coders
    resiliparse.beam.elasticsearch
    resiliparse.beam.fileio
    resiliparse.beam.textio
    resiliparse.beam.warcio
    resiliparse.extract.html2text
    resiliparse.itertools
    resiliparse.parse.encoding
    resiliparse.parse.html
    resiliparse.parse.http
    resiliparse.parse.lang
    resiliparse.process_guard

© Copyright 2021, Janek Bevendorff. Revision 734ea12b.
