sqlite_dataconverter

DataConverter for the SQLite backend.

class graphnet.data.sqlite.sqlite_dataconverter.SQLiteDataConverter(extractors, outdir, gcd_rescue, *, nb_files_to_batch, sequential_batch_pattern, input_file_batch_pattern, workers, index_column, icetray_verbose)

Bases: DataConverter

Class for converting I3-file(s) to SQLite format.

Construct DataConverter.

When using input_file_batch_pattern, regular expressions are used to group files according to their names. All files that match a certain pattern up to wildcards are grouped into the same output file. This output file has the same name as the input files that are grouped into it, with wildcards replaced with “x”. Periods (.) and wildcards (*) have a special meaning: periods are interpreted as literal periods, not as matching any character (as in standard regex), and wildcards are interpreted as “.*” in standard regex.

For instance, the pattern “[A-Z]{1}_[0-9]{5}*.i3.zst” will find all I3 files whose names contain:

one capital letter, followed by

an underscore, followed by

five digits, followed by

any string of characters ending in “.i3.zst”

This means that, e.g., the files:

upgrade_genie_step4_141020_A_000000.i3.zst
upgrade_genie_step4_141020_A_000001.i3.zst
…
upgrade_genie_step4_141020_A_000008.i3.zst
upgrade_genie_step4_141020_A_000009.i3.zst

would be grouped into the output file named “upgrade_genie_step4_141020_A_00000x.<suffix>” but the file

upgrade_genie_step4_141020_A_000010.i3.zst

would end up in a separate group, named “upgrade_genie_step4_141020_A_00001x.<suffix>”.
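The grouping rule described above can be sketched in a few lines of Python. This only illustrates the stated semantics and is not graphnet's actual implementation: the helper names pattern_to_regex and group_name are hypothetical, and the sketch assumes the pattern itself contains no capture groups.

```python
import re
from collections import defaultdict


def pattern_to_regex(pattern):
    """Translate a batch pattern: '.' is literal, '*' becomes a captured '.*'."""
    pieces = [piece.replace(".", r"\.") for piece in pattern.split("*")]
    return re.compile("(.*)".join(pieces))


def group_name(filename, regex):
    """Return filename with each wildcard match replaced by 'x', or None."""
    match = regex.search(filename)
    if match is None:
        return None
    name = filename
    # Replace right-to-left so that earlier spans keep their positions.
    for index in range(regex.groups, 0, -1):
        start, end = match.span(index)
        name = name[:start] + "x" + name[end:]
    return name


regex = pattern_to_regex("[A-Z]{1}_[0-9]{5}*.i3.zst")
files = [
    "upgrade_genie_step4_141020_A_000000.i3.zst",
    "upgrade_genie_step4_141020_A_000009.i3.zst",
    "upgrade_genie_step4_141020_A_000010.i3.zst",
]

# Collect the input files under their shared group name.
groups = defaultdict(list)
for filename in files:
    groups[group_name(filename, regex)].append(filename)
```

In this sketch the first two files share one group name and the third gets its own. The documented output names carry the converter's <suffix> in place of the input extension, a step the sketch leaves out.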

Parameters:

extractors (List[I3Extractor]) –
outdir (str) –
gcd_rescue (str | None) –
nb_files_to_batch (int | None) –
sequential_batch_pattern (str | None) –
input_file_batch_pattern (str | None) –
workers (int) –
index_column (str) –
icetray_verbose (int) –

file_suffix = 'db'

save_data(data, output_file)

Save data to SQLite database.

Return type:

None

Parameters:

data (List[OrderedDict]) –
output_file (str) –

merge_files(output_file, input_files, max_table_size)

SQLite-specific method for merging output files/databases.

Parameters:

output_file (str) – Name of the output file containing the merged results.
input_files (Optional[List[str]], default: None) – Intermediate files/databases to be merged, according to the specific implementation. Defaults to None, meaning that all files/databases output by the current instance are merged.
max_table_size (Optional[int], default: None) – The maximum number of rows in any given table. If any one table exceeds this limit, a new database will be created.

Return type:

None
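As a rough illustration of what merging SQLite databases involves, the sketch below appends every table from a list of input databases to one output database using only the standard-library sqlite3 module. It is not the code behind merge_files: the function name merge_databases is hypothetical, all inputs are assumed to share the same table schemas, and the max_table_size splitting described above is not implemented.

```python
import sqlite3
from contextlib import closing


def merge_databases(output_file, input_files):
    """Append the rows of every table in each input database to output_file."""
    with closing(sqlite3.connect(output_file)) as out:
        for path in input_files:
            out.execute("ATTACH DATABASE ? AS src", (path,))
            tables = out.execute(
                "SELECT name, sql FROM src.sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            ).fetchall()
            for name, create_sql in tables:
                # Create the table in the output database the first time
                # it is seen, reusing the CREATE statement from the input.
                out.execute(
                    create_sql.replace(
                        "CREATE TABLE", "CREATE TABLE IF NOT EXISTS", 1
                    )
                )
                out.execute(
                    f'INSERT INTO main."{name}" SELECT * FROM src."{name}"'
                )
            # Commit first: SQLite refuses to detach a database that is
            # still part of an open transaction.
            out.commit()
            out.execute("DETACH DATABASE src")
```

Attaching each input lets the copy run as a single INSERT ... SELECT per table, so rows never pass through Python objects.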

any_pulsemap_is_non_empty(data_dict)

Check whether there are non-empty pulsemaps extracted from P frame.

Takes the data extracted from the P frame and retrieves the values, if any, stored under the pulsemap key(s) (e.g., SplitInIcePulses). Returns True if at least one of the pulsemaps is non-empty. If no pulsemaps exist, i.e., if no I3FeatureExtractor is called (for instance because I3GenericExtractor is used instead), always returns True.

Return type: bool
Parameters: data_dict (Dict[str, Dict]) –
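The rule in the description can be expressed as a small predicate. The sketch below is not the method's actual code: the function name has_non_empty_pulsemap, the explicit pulsemap_keys argument, and the assumption that each pulsemap entry maps column names to lists of values are all illustrative.

```python
from typing import Dict, List


def has_non_empty_pulsemap(
    data_dict: Dict[str, Dict], pulsemap_keys: List[str]
) -> bool:
    """Return True if any listed pulsemap holds values, or if none are listed."""
    if not pulsemap_keys:
        # No pulsemaps were requested, so there is nothing to filter on.
        return True
    return any(
        len(column) > 0
        for key in pulsemap_keys
        for column in data_dict.get(key, {}).values()
    )
```

Under these assumptions, an event whose only pulsemap has empty columns yields False, while an event with no pulsemap keys at all yields True.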

graphnet.data.sqlite.sqlite_dataconverter.construct_dataframe(extraction)

Convert extraction to pandas.DataFrame.

Parameters: extraction (Dict[str, Any]) – Dictionary with the extracted data.
Return type: DataFrame
Returns: Extraction as pandas.DataFrame.

graphnet.data.sqlite.sqlite_dataconverter.is_pulse_map(table_name)

Check whether table_name corresponds to a pulse map.

Return type: bool
Parameters: table_name (str) –

graphnet.data.sqlite.sqlite_dataconverter.is_mc_tree(table_name)

Check whether table_name corresponds to an MC tree.

Return type: bool
Parameters: table_name (str) –