dataconverter

Base DataConverter class(es) used in GraphNeT.

class graphnet.data.dataconverter.FileSet(i3_file: str, gcd_file: str)[source]

Bases: object

Parameters:
  • i3_file (str) –

  • gcd_file (str) –

i3_file: str
gcd_file: str
graphnet.data.dataconverter.init_global_index(index, output_files)[source]

Make global_index available to pool workers.

Return type:

None

Parameters:
  • index (Synchronized) –

  • output_files (List[str]) –

graphnet.data.dataconverter.cache_output_files(process_method)[source]

Decorate process_method to cache output file names.

Return type:

TypeVar(F, bound= Callable[..., Any])

Parameters:

process_method (F) –
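A decorator of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: it assumes that the decorated method returns the name of the file it wrote and that the instance keeps a list attribute, here called `_output_files`. The names `cache_output_files_sketch` and `ToyConverter` are hypothetical.

```python
import functools
from typing import Any, Callable, List, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])


def cache_output_files_sketch(process_method: F) -> F:
    """Record the file name returned by `process_method` on the instance."""

    @functools.wraps(process_method)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        output_file = process_method(self, *args, **kwargs)
        self._output_files.append(output_file)  # assumed attribute name
        return output_file

    return cast(F, wrapper)


class ToyConverter:
    """Hypothetical class used only to exercise the decorator."""

    def __init__(self) -> None:
        self._output_files: List[str] = []

    @cache_output_files_sketch
    def process_file(self, input_file: str) -> str:
        # Pretend to convert the file and return the output file name.
        return input_file.replace(".i3.zst", ".out")
```

After `ToyConverter().process_file("a.i3.zst")`, the instance's `_output_files` list holds the returned name, which a later merge step could use.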

class graphnet.data.dataconverter.DataConverter(extractors, outdir, gcd_rescue, *, nb_files_to_batch, sequential_batch_pattern, input_file_batch_pattern, workers, index_column, icetray_verbose)[source]

Bases: ABC, Logger

Base class for converting I3 files to an intermediate file format.

Construct DataConverter.

When using input_file_batch_pattern, regular expressions are used to group files according to their names. All files that match a certain pattern up to wildcards are grouped into the same output file. This output file has the same name as the input files that are grouped into it, with wildcards replaced with “x”. Periods (.) and wildcards (*) have a special meaning: periods are interpreted as literal periods, not as matching any character (as in standard regex), and wildcards are interpreted as “.*” in standard regex.

For instance, the pattern “[A-Z]{1}_[0-9]{5}*.i3.zst” will find all I3 files whose names contain:

  • one capital letter, followed by

  • an underscore, followed by

  • five digits, followed by

  • any string of characters ending in “.i3.zst”

This means that, e.g., the files:
  • upgrade_genie_step4_141020_A_000000.i3.zst

  • upgrade_genie_step4_141020_A_000001.i3.zst

  • upgrade_genie_step4_141020_A_000008.i3.zst

  • upgrade_genie_step4_141020_A_000009.i3.zst

would be grouped into the output file named “upgrade_genie_step4_141020_A_00000x.<suffix>” but the file

  • upgrade_genie_step4_141020_A_000010.i3.zst

would end up in a separate group, named “upgrade_genie_step4_141020_A_00001x.<suffix>”.
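The grouping rule described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the helper names `pattern_to_regex` and `group_files` are hypothetical, and the sketch assumes that the pattern contains exactly one wildcard and that the output name is the part of the file name before the wildcard match, followed by “x”.

```python
import re
from collections import defaultdict
from typing import Dict, List


def pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate a batch pattern into a standard regular expression."""
    head, _, tail = pattern.partition("*")  # assumes a single wildcard
    head = head.replace(".", r"\.")  # periods are literal periods
    tail = tail.replace(".", r"\.")
    # The wildcard becomes ".*" and is captured in its own named group.
    return re.compile(f"(?P<head>{head})(?P<wild>.*)(?P<tail>{tail})$")


def group_files(filenames: List[str], pattern: str) -> Dict[str, List[str]]:
    """Group file names by everything preceding the wildcard match."""
    regex = pattern_to_regex(pattern)
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        match = regex.search(name)
        if match is None:
            continue  # file does not match the pattern
        key = name[: match.start("wild")] + "x"
        groups[key].append(name)
    return dict(groups)


files = [
    "upgrade_genie_step4_141020_A_000000.i3.zst",
    "upgrade_genie_step4_141020_A_000001.i3.zst",
    "upgrade_genie_step4_141020_A_000008.i3.zst",
    "upgrade_genie_step4_141020_A_000009.i3.zst",
    "upgrade_genie_step4_141020_A_000010.i3.zst",
]
groups = group_files(files, "[A-Z]{1}_[0-9]{5}*.i3.zst")
```

Here `groups` has two keys, one ending in `A_00000x` with four files and one ending in `A_00001x` with a single file. The backend-specific suffix would be appended to each key to form the output file name.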

Parameters:
  • extractors (List[I3Extractor]) –

  • outdir (str) –

  • gcd_rescue (str | None) –

  • nb_files_to_batch (int | None) –

  • sequential_batch_pattern (str | None) –

  • input_file_batch_pattern (str | None) –

  • workers (int) –

  • index_column (str) –

  • icetray_verbose (int) –

abstract property file_suffix: str

Suffix to use on output files.

execute(filesets)[source]

General method for processing a set of I3 files.

The files are converted individually according to the inheriting class/intermediate file format.

Parameters:

filesets (List[FileSet]) – List of paths to I3 and corresponding GCD files.

Return type:

None

abstract save_data(data, output_file)[source]

Implementation-specific method for saving data to file.

Parameters:
  • data (List[OrderedDict]) – List of extracted features.

  • output_file (str) – Name of output file.

Return type:

None

abstract merge_files(output_file, input_files)[source]

Implementation-specific method for merging output files.

Parameters:
  • output_file (str) – Name of the output file containing the merged results.

  • input_files (Optional[List[str]], default: None) – Intermediate files to be merged, according to the specific implementation. Defaults to None, meaning that all files output by the current instance are merged.

Raises:

NotImplementedError – If the method has not been implemented for the backend in question.

Return type:

None
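The three abstract members (file_suffix, save_data, and merge_files) define what a concrete backend has to provide. The sketch below shows the shape of such a backend using a stand-in base class, so that it runs without GraphNeT or IceTray installed. `ConverterSketch` and `JsonLinesConverter` are hypothetical names, the JSON-lines format is chosen purely for illustration, and the real base class takes more constructor arguments than shown here.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from collections import OrderedDict
from typing import List, Optional


class ConverterSketch(ABC):
    """Stand-in exposing only the abstract members documented above."""

    def __init__(self, outdir: str) -> None:
        self._outdir = outdir
        self._output_files: List[str] = []

    @property
    @abstractmethod
    def file_suffix(self) -> str:
        """Suffix to use on output files."""

    @abstractmethod
    def save_data(self, data: List[OrderedDict], output_file: str) -> None:
        """Save extracted features to `output_file`."""

    @abstractmethod
    def merge_files(
        self, output_file: str, input_files: Optional[List[str]] = None
    ) -> None:
        """Merge intermediate files into `output_file`."""


class JsonLinesConverter(ConverterSketch):
    """Toy backend writing one JSON object per line."""

    @property
    def file_suffix(self) -> str:
        return "jsonl"

    def save_data(self, data: List[OrderedDict], output_file: str) -> None:
        with open(output_file, "w") as f:
            for event in data:
                f.write(json.dumps(event) + "\n")
        self._output_files.append(output_file)

    def merge_files(
        self, output_file: str, input_files: Optional[List[str]] = None
    ) -> None:
        # None means: merge everything this instance has written so far.
        if input_files is None:
            input_files = list(self._output_files)
        with open(output_file, "w") as out:
            for path in input_files:
                with open(path) as f:
                    out.write(f.read())
```

Instantiating `ConverterSketch` directly raises `TypeError`, because its abstract members are not implemented; only a subclass that defines all three can be constructed.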

get_map_function(nb_files, unit='I3 file(s)')[source]

Identify map function to use (pure python or multiprocess).

Return type:

Tuple[Any, Optional[Pool]]

Parameters:
  • nb_files (int) –

  • unit (str) –
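The choice between a pure-Python and a multiprocess map function can be sketched as below. This is an assumption-laden illustration, not the method's actual body: `choose_map_function` is a hypothetical free function, the number of workers is passed explicitly instead of being read from the instance, and `Pool.imap` is assumed as the parallel counterpart of the built-in `map`.

```python
import multiprocessing
from multiprocessing.pool import Pool
from typing import Any, Optional, Tuple


def choose_map_function(
    nb_files: int, workers: int
) -> Tuple[Any, Optional[Pool]]:
    """Return a map-like callable and the pool backing it, if any."""
    # There is no point in starting more workers than there are files.
    n_workers = min(workers, nb_files)
    if n_workers > 1:
        pool = multiprocessing.Pool(processes=n_workers)
        return pool.imap, pool
    # Fall back to the built-in, single-process map.
    return map, None


# With a single worker no pool is created and the built-in map is used.
map_fn, pool = choose_map_function(nb_files=3, workers=1)
lengths = list(map_fn(len, ["a.i3", "bb.i3", "ccc.i3"]))
```

Returning the pool alongside the callable lets the caller close and join it once all files have been processed.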