Utilities for working with files in Apache Beam.
- class resiliparse.beam.fileio.MatchFiles(file_pattern: str, empty_match_treatment: apache_beam.io.fileio.EmptyMatchTreatment = 'ALLOW_IF_WILDCARD', shuffle: bool = True)
Match files against a glob pattern.
Unlike the original Beam implementation, this file matcher enforces a fusion break by reshuffling the matched file names. This circumvents limitations in certain Beam runners that do not automatically distribute splits, such as the FlinkRunner.
file_pattern – file glob
empty_match_treatment – what to do with empty glob matches
shuffle – shuffle matches to break fusion (setting this to False effectively falls back to the original Beam implementation)