How-to: Sample Production
How to produce your private samples
RunII-Legacy (Ultra Legacy) Campaign Steps
(wmLHE)GEN : Produces events at generator level (matrix element, parton shower) with the MC generators.\
GEN : Sample production with generator Python fragments (Pythia, Sherpa, ...).\
wmLHEGEN : Sample production with gridpacks (MadGraph, Powheg, ...) and hadronizer Python fragments (Pythia, Herwig, ...).
SIM : Simulates the energy deposited by the particles in the detectors they traverse. Beamspot and detector geometry are taken as parameters.\
DIGIPremix : The simulated detector signals are digitised and the prepared pileup events are overlaid/premixed onto the samples.\
HLT : The HLT algorithms perform regional reconstruction and make trigger decisions using HLT-level objects.\
RECO : Physics objects (e.g. muons, jets) are reconstructed. The output of this step is generally in AODSIM format.\
MiniAOD : The RECO-level events are skimmed and reduced in size by running the MiniAOD module, which saves physics-object information that can be used directly in analyses.\
NanoAOD : The MiniAOD-level events are further skimmed and reduced in size by running the NanoAOD module. The idea of the NanoAOD format is to have a plain ROOT file that can be analysed outside any CMSSW environment.
How to Produce Samples
Please send us an email (cms-exo-mci@cern.ch) if you happen to use this; we would like to know how helpful it is, and hopefully to make updates for RunIII if it does indeed help.
Keep in mind that this is not a CMS centrally provided feature! It is rather the somewhat private work of Sihyun Jeon (shjeon@cern.ch) to aid analysers who want to do their sample studies before the requested official samples arrive, which can take 4-6 months from time to time.
If you need a massive number of samples (because of many BSM parameters or a very large number of events), you should ask for centrally produced official samples. Even if you manage to produce all your samples privately with this, you still need to request official samples for validation purposes, to check your gridpacks, generator fragments, detector conditions, and pileup premixing scenarios.
These are automated scripts that help analysers produce their RunII-Legacy (Ultra Legacy) samples privately using CRAB job submissions.
Instructions
1. Git clone the EXO-MCsampleProductions GitLab repository.
2. Prepare a CSV file with inputs in the following order to run the (wmLHE)GEN step (comma-separated; see the example after the field list below) :
Warning note for the DIGIPremix step. Before every step, make sure you have initiated a grid proxy with the voms-proxy-init --voms cms command. In particular, if you skip it for the DIGIPremix step, you'll basically end up with zero-pileup samples.
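For example, requesting a week-long proxy (the --valid option is a standard voms-proxy-init flag; 192 hours is an arbitrary choice):

```bash
# Initiate a CMS VOMS proxy before each step; without it the DIGIPremix
# step cannot read the premixed pileup library and yields zero-pileup samples.
voms-proxy-init --voms cms --valid 192:00
voms-proxy-info --all   # verify the remaining lifetime and the cms VO
```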
DATASETNAME : Name of the dataset that will be used for DAS publication.\
GENFRAGMENT : Generator fragment Python file that will be used as the hadronizer or generator. It should be stored in the skeleton/genfragments directory.\
NEVENTS : Total number of events that will be produced. Jet matching or filter efficiencies should be taken into account, e.g. if the matching efficiency is 0.4 and you want 10000 events to be produced, write 10000 x 1/0.4 = 25000.\
NSPLITJOBS : Number of CRAB jobs the production will be split into, e.g. NEVENTS=25000 with NSPLITJOBS=25 will run 1000 events per CRAB job.\
GRIDPACK : Path to the gridpack if a gridpack is used. It should be in the CVMFS area.
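For illustration, a one-line CSV for this step might look like the following; the dataset name, fragment file, and gridpack path below are made-up placeholders, not real entries:

```bash
# Hypothetical example_wmLHEGEN.csv, one sample per line, fields in the order
# DATASETNAME,GENFRAGMENT,NEVENTS,NSPLITJOBS,GRIDPACK (all values invented):
cat <<'EOF' > example_wmLHEGEN.csv
ZPrimeToMuMu_M-1000_TuneCP5_13TeV-madgraph-pythia8,Hadronizer_TuneCP5_13TeV_generic_LHE_pythia8_cff.py,25000,25,/cvmfs/cms.cern.ch/phys_generator/gridpacks/UL/13TeV/madgraph/ZPrimeToMuMu_M-1000/ZPrimeToMuMu_M-1000_tarball.tar.xz
EOF
```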
Minor tip on the number of events per CRAB job. The number of events in your output file after the (wmLHE)GEN step should not exceed 2000-2500; otherwise the SIM step might face CRAB job failures with JobExitCode 50664 (wall time error) due to its long processing time. So please check the matching/filtering efficiencies before running the scripts here.
e.g. NEVENTS=100000, NSPLITJOBS=10, efficiency=50% => 5000 events survive out of the 10000 events per (wmLHE)GEN CRAB job => 5000 events get submitted to the SIM step, and this is NOT OKAY.
e.g. NEVENTS=100000, NSPLITJOBS=10, efficiency=15% => 1500 events survive out of the 10000 events per (wmLHE)GEN CRAB job => 1500 events get submitted to the SIM step, and this is OKAY.
e.g. NEVENTS=100000, NSPLITJOBS=100, efficiency=100% => 1000 events survive out of the 1000 events per (wmLHE)GEN CRAB job => 1000 events get submitted to the SIM step, and this is OKAY.
3. Execute the setup.py file to build CMSSW releases and choose the Tier2/3 sites to store your samples produced through CRAB jobs.
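A minimal sketch of this step, assuming setup.py is invoked directly and prompts interactively for the site choice (the exact invocation is an assumption, not documented here):

```bash
# Build the per-step CMSSW work areas and choose the Tier2/3 storage site.
python setup.py
```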
You should check that you have write permission at the Tier2/3 sites you chose by executing the command below :
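The standard CRAB3 client check is crab checkwrite; the site name below is a placeholder for the site you chose:

```bash
# Requires a valid VOMS proxy and the CRAB3 environment to be set up.
crab checkwrite --site=T2_CH_CERN
```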
4. Move to the <Simulation>/<Campaign>/(wmLHE)GEN__<CMSSW> directory and execute the config_(wmLHE)GEN.py file to build the cmsDriver commands and CRAB configuration files.
5. When everything goes well, a submit_crab_<CSV name>.sh script will be produced; execute it to submit the CRAB jobs.
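A sketch of steps 4-5, with hypothetical directory, release, and CSV names (the exact arguments accepted by the config script may differ):

```bash
# Directory layout follows <Simulation>/<Campaign>/(wmLHE)GEN__<CMSSW>;
# all concrete names below are invented for illustration.
cd FullSimulation/RunIISummer20UL18/wmLHEGEN__CMSSW_10_6_19
python config_wmLHEGEN.py example_wmLHEGEN.csv   # builds the cmsDriver commands and CRAB configs
sh submit_crab_example_wmLHEGEN.sh               # submits the CRAB tasks listed in the CSV
```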
6. Once the CRAB jobs for the (wmLHE)GEN step are finished, prepare a CSV file with inputs in the following order (comma-separated; see the example after the field list below) :
DATASETNAME : Name of the dataset that will be used for DAS publication.\
OUTPUTDATASET : Published DAS dataset path from the previous step. You can get this by executing the crab status command.
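For illustration, with invented task and dataset names (crab status prints the published DAS path on its "Output dataset" line):

```bash
# Look up the published DAS dataset path of the finished (wmLHE)GEN task;
# the CRAB project directory name below is hypothetical.
crab status -d crab_projects/crab_ZPrimeToMuMu_M-1000_wmLHEGEN

# Hypothetical example_SIM.csv, fields in the order DATASETNAME,OUTPUTDATASET:
cat <<'EOF' > example_SIM.csv
ZPrimeToMuMu_M-1000_TuneCP5_13TeV-madgraph-pythia8,/ZPrimeToMuMu_M-1000_TuneCP5_13TeV-madgraph-pythia8/myuser-wmLHEGEN-0123456789abcdef/USER
EOF
```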
7. The rest of the steps, from SIM to NanoAOD, work just the same: a CSV file with the dataset name and the path to the published DAS dataset from the previous step should be given.
If anything is unclear, please contact Sihyun Jeon (shjeon@cern.ch, Skype: sihyun_jeon) before starting the production.