File I/O

Utilities for working with files in Apache Beam.

class resiliparse.beam.fileio.MatchFiles(file_pattern: str, empty_match_treatment: EmptyMatchTreatment = 'ALLOW_IF_WILDCARD', shuffle: bool = True)

Bases: PTransform

Match a file pattern using apache_beam.io.filesystems.FileSystems.match().

Unlike the original Beam implementation, this file matcher enforces a fusion break by reshuffling the matched file names, so that downstream steps are not fused with the matching step and the file names can be redistributed across workers. This works around a limitation of certain Beam runners, such as the FlinkRunner, that do not automatically distribute splits.

Parameters:
  • file_pattern – glob pattern of the files to match

  • empty_match_treatment – how to handle a pattern that matches no files

  • shuffle – shuffle matches to break fusion (setting this to False effectively falls back to the original Beam implementation)