How-to: HEPData Preparation
Reinterpretation materials are required of each analysis in EXO. These can be classified into:
HepData Sandbox - Signal generation fragments, cutflows, yields in paper plots, etc.
Full Statistical Models - Datacards, ROOT files, instructions to set up and run Combine, etc.
Although these materials are submitted for review on similar timelines, their preparation and approval processes differ slightly.
HepData
From HEPData webpage: The Durham High-Energy Physics Database (HEPData) has been built up over the past four decades as a unique open-access repository for scattering data from experimental particle physics. It currently comprises the data points from plots and tables related to several thousand publications including those from the Large Hadron Collider (LHC). HEPData is funded by a grant from the UK STFC and is based at the IPPP at Durham University.
The HepData Team in EXO
The HepData team in EXO is mainly the EXO HepData contact: Gianfranco de Castro (gdecastr@bu.edu). E-group address (including the MC&I conveners): cms-phys-exo-hepdata@cern.ch.
Submitting a HepData Entry
Analyzers reach out to the EXO HepData contact to create an entry around the time of their Approval talk and CWR.
Please do this well in advance to avoid any delays.
The EXO HepData contact will create a HepData entry for analyzers to submit the materials for review.
Please submit materials to the entry created by the HepData team and not your own personal entry.
Upon review, one of two things happens:
✅ The HepData entry gets the green light if all of the materials below are included.
❌ Feedback will be given to analyzers to include missing materials or improve the entry.
After the green light is given, the record is expected to go public around the time of the arXiv submission / INSPIRE entry creation.
The HepData link should be added to the paper draft before submission. It is made public after the paper is public.
Required HepData Materials
Please include all of these materials in the entry.
MC Generator inputs
Signal Gridpacks
Corresponding cross sections and k-factors
Cutflow tables
For each signal model used in the analysis, please include the relative and absolute selection efficiencies after each region's event-level cuts. Please provide either the statistical uncertainty associated with each efficiency or the gen-level yields for each signal sample.
This is a minimal requirement for reinterpretation; other inputs lose their value if this is not shared.
Scientists external to CMS reinterpreting our results usually do not have much computing power. As GEANT4 is computationally time consuming, they often use fast reconstruction (such as Delphes) and public analysis frameworks (such as MadAnalysis) to reinterpret our results. As a result, cutflow tables are necessary in order to validate their processing of signal samples (generation, reconstruction, and selection).
Note on NNs: The process of making neural network architectures/weights public is still being discussed internally by CMS higher-ups. In the meantime, providing the efficiency of a NN selection on the signal should still be included in these cutflow tables.
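As a minimal sketch of what such a table contains, the snippet below derives relative and absolute efficiencies with binomial statistical uncertainties from raw event counts. All cut names and yields here are hypothetical, purely for illustration:

```python
import math

# Hypothetical raw event counts for one signal sample, in cut order.
# The first entry is the gen-level yield before any selection.
cutflow = [
    ("gen-level", 100000),
    ("trigger", 81200),
    ("lepton selection", 64350),
    ("MET > 200 GeV", 20310),
]

n_gen = cutflow[0][1]
print(f"{'cut':<20}{'abs. eff.':>12}{'rel. eff.':>12}{'stat. unc.':>12}")
for (name, n_pass), (_, n_prev) in zip(cutflow[1:], cutflow[:-1]):
    eff_abs = n_pass / n_gen   # efficiency w.r.t. gen-level yield
    eff_rel = n_pass / n_prev  # efficiency w.r.t. previous cut
    # Binomial statistical uncertainty on the absolute efficiency
    unc = math.sqrt(eff_abs * (1 - eff_abs) / n_gen)
    print(f"{name:<20}{eff_abs:>12.4f}{eff_rel:>12.4f}{unc:>12.4f}")
```

Providing the raw yields alongside the efficiencies lets external users recompute the uncertainties themselves.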
Information from Relevant Figures
Numbers shown in plots from the paper should be included in the HepData entry if they contain:
Upper limits on the analysis' figure of merit
Background, signal and data yields in regions used to derive these ULs
Other relevant figures used for reinterpretation
Additional information from other figures can be included at the author's discretion but is not necessary.
Optional HepData Materials
Covariance matrices
Covariance matrices, which encode the correlated uncertainties of event yields across bins and regions. Now that Combine is public, we are moving away from encouraging these materials. However, in cases where the full likelihood cannot be provided, they are essential for constructing the simplified likelihood.
CMS Note on simplified likelihoods and how they use the covariance matrix in this process.
Combine Tutorial for generating covariance matrices in your analysis
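To illustrate why the covariance matrix matters, here is a minimal Gaussian approximation to a simplified likelihood fit. All yields and the covariance below are invented for illustration, and nuisance-parameter treatment beyond the bin-to-bin covariance is omitted:

```python
# Hypothetical two-bin region: observed data, expected signal (at mu = 1),
# expected background, and the background covariance matrix.
data = [12.0, 30.0]
sig = [3.0, 1.5]
bkg = [9.0, 28.0]
cov = [[4.0, 1.2],
       [1.2, 9.0]]

# Invert the 2x2 covariance by hand.
det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
inv = [[cov[1][1] / det, -cov[0][1] / det],
       [-cov[1][0] / det, cov[0][0] / det]]

def chi2(mu):
    """Gaussian chi^2 comparing data to mu*signal + background,
    with bin-to-bin correlations taken from the covariance matrix."""
    resid = [data[i] - mu * sig[i] - bkg[i] for i in range(2)]
    return sum(resid[i] * inv[i][j] * resid[j]
               for i in range(2) for j in range(2))

# Crude scan for the best-fit signal strength.
best_mu = min((i * 0.01 for i in range(301)), key=chi2)
```

Without the off-diagonal covariance terms, the same scan would mis-state how much the two bins independently constrain the signal.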
Full Statistical Models
There is a central CMS effort to publish full statistical models now that Combine is public. This includes publishing the datacards and ROOT files required to produce the main results of the analysis, along with a short tutorial on how to run the necessary fits with Combine.
The Full Statistical Model Team in EXO
The team for publishing the full statistical models in EXO includes the EXO HepData and Combine contacts. HepData: Gianfranco de Castro (gdecastr@bu.edu). Combine: Cesare Tiziano Cazzaniga (cesare.cazzaniga@cern.ch).
Submitting a Full Statistical Model - Setting up Datacards
Reach out to the Combine contact to make an entry for your analysis using its CADI here around the time of your Approval talk and CWR.
Follow this tutorial to set up your analysis' repository. Any datacards and ROOT files required to exactly reproduce results in the paper must be included.
Renaming systematics to the common style (more here)
Running checks on the datacards/ROOT files using the built-in CI
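For orientation, a single-channel counting-experiment datacard has the shape below. All process names, yields, and systematics here are illustrative, not taken from any real analysis:

```
imax 1  number of channels
jmax 1  number of background processes
kmax *  number of nuisance parameters
-------------------------------------
bin          SR
observation  10
-------------------------------------
bin          SR     SR
process      sig    bkg
process      0      1
rate         1.5    9.2
-------------------------------------
lumi_13TeV   lnN    1.016  1.016
bkg_norm     lnN    -      1.20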
Reach out to the Combine contact for review. One of two things happens:
✅ The repository gets the green light if everything looks good.
❌ Feedback will be given to analyzers if necessary.
Submitting a Full Statistical Model - Setting up Tutorials and Additional Materials
Populate the ReadMe in the repository approved by the Combine contact with instructions on:
Setting up and running Combine software, including commands (or helper scripts) needed to reproduce results quoted in the paper.
Optional: Tutorials on setting up and running any additional reinterpretation materials. This includes software for sample generation (ex: Pythia), public reconstruction software (ex: Delphes), and public analysis software (ex: MadAnalysis).
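The Combine instructions above typically boil down to a short command sequence along these lines (the file names are placeholders; adapt them to your repository):

```shell
# Build a RooWorkspace from the datacard (file name is a placeholder)
text2workspace.py datacard.txt -o workspace.root

# Expected and observed upper limits
combine -M AsymptoticLimits workspace.root

# Best-fit signal strength and post-fit diagnostics
combine -M FitDiagnostics workspace.root
```

Including the exact commands (or a helper script wrapping them) removes any guesswork for external users.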
Reach out to the HepData contact for review. One of two things happens:
✅ The materials get the green light if everything looks good.
❌ Feedback will be given to analyzers if necessary.
Submitting a Full Statistical Model - Publishing
Reach out to CAT saying your repository is ready to publish. A new page on repository.cern.ch will be created for your analysis.
The datacards, ROOT files, and any helper scripts from the CADI's entry here will be compressed into a tarball and included on your analysis' page.
The ReadMe from the CADI's entry here will be displayed in Markdown formatting.
Before the analysis is public, CAT can show you a preview of what the public analysis page will look like. Once your paper is made public, let them know and they can make your analysis' page public.
Relevant Talks
Talk on publishing full statistical models at the EXO General meeting during March 2024 link
Talk on HEPData at the EXO Welcoming Meeting during October 2024 link
Talk on HEPData preparations at the EXO General Meeting during June 2022 link
Talk on HEPData preparation at the CMS week during April 2022 link
Talk on reinterpretation at the CMS EXO Workshop 2021 link
Context on Additional Reinterpretation Materials
Sample Generation
Providing the Pythia fragments to generate samples used in the analysis is important for scientists external to our collaboration to be able to reinterpret our results. After samples are generated, passed through fast reconstruction, and have analysis-level selections applied, we can validate the entire process by cross-checking with the cutflow efficiencies provided in HepData.
Reconstruction
It's often difficult for reconstruction tools (such as Delphes) to fully reproduce the shape and normalization of analysis variables. Reconstruction software can try to mimic the behavior of the detector by parametrizing its effects in terms of variables such as kinematics (pT, eta, etc).
An example of this can be seen in the nominal Delphes card. Although optimized for RunI, the parametrization is a good rough estimate of the effects of different parts of the detector on the reconstruction of objects such as leptons and jets. Analyzers are encouraged to optimize/modify these Delphes cards to meet the needs of their analysis, so that scientists external to CMS can accurately reinterpret our results.
Central CMS efforts are underway to optimize these cards for detector conditions in RunII, RunIII, and Phase II.
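For a flavor of what such a parametrization looks like, Delphes cards express object efficiencies as piecewise formulas in pT and eta. The fragment below follows the structure of the nominal card, but the numbers are purely illustrative:

```tcl
module Efficiency MuonEfficiency {
  set InputArray MuonMomentumSmearing/muons
  set OutputArray muons

  # Illustrative muon efficiency vs pT and |eta| (not real detector numbers)
  set EfficiencyFormula {
      (pt <= 10.0)                                      * (0.00) +
      (abs(eta) <= 1.5)                   * (pt > 10.0) * (0.95) +
      (abs(eta) > 1.5 && abs(eta) <= 2.4) * (pt > 10.0) * (0.85) +
      (abs(eta) > 2.4)                                  * (0.00)
  }
}
```

Tuning these formulas to match the analysis' measured efficiencies is what makes an external Delphes-based reinterpretation trustworthy.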
Analysis Code
Analyzers also have to apply event-level and object selections on the reconstructed events. There exist public frameworks, such as MadAnalysis, which give analyzers the tools to apply these selections. After doing so, analyzers can validate that their workflow correctly replicates results from the paper in question by cross-checking the cutflow efficiencies reported in HepData with those derived from the generation, reconstruction, and selection workflow.
An example of an analysis doing this is the MadAnalysis materials created for EXO-20-004 by A. Albert. No Delphes card tuning was needed for this analysis, but they produced MadAnalysis code for scientists external to CMS to be able to replicate and reinterpret paper results with their own signal models.