How-to : Sample Production

How to produce your private samples

RunII-Legacy (Ultra Legacy) Campaign Steps

  1. (wmLHE)GEN : Produces events at generator level (matrix element, parton shower) with the MC generators.

    • GEN : Sample production with generator python fragments (Pythia, Sherpa, ...).

    • wmLHEGEN : Sample production with gridpacks (MadGraph, Powheg, ...) and hadronizer python fragments (Pythia, Herwig, ...).

  2. SIM : Simulates the energy deposited by the particles in the detector material they cross. The beamspot and detector geometry are taken as parameters.

  3. DIGIPremix : The simulated detector signals are digitised, and the prepared pileup events are overlaid/premixed onto the samples.

  4. HLT : Regional reconstruction is performed by the HLT algorithms, which make trigger decisions using HLT-level objects.

  5. RECO : Physics objects (e.g. muons, jets) are reconstructed. The output of this step is generally in the AODSIM format.

  6. MiniAOD : The RECO-level events are skimmed and reduced in size by running the MiniAOD module, which saves physics-object information that can be used directly in analyses.

  7. NanoAOD : The MiniAOD-level events are further skimmed and reduced in size by running the NanoAOD module. The idea of the NanoAOD format is to have a plain ROOT file that can be analysed outside any CMSSW environment.
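
Each of these steps corresponds to a cmsDriver.py configuration. As a rough sketch (the flags and values below are illustrative placeholders; the scripts described later generate the exact commands for each campaign), the wmLHEGEN step is built from a command of this shape :

# Illustrative cmsDriver.py command for the wmLHEGEN step; <...> values are placeholders.
cmsDriver.py Configuration/GenProduction/python/<GENFRAGMENT> \
    --mc --step LHE,GEN --eventcontent RAWSIM,LHE --datatier GEN,LHE \
    --conditions <GLOBALTAG> --era <ERA> -n <NEVENTS> --no_exec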

How to Produce Samples

Please send us an email (cms-exo-mci@cern.ch) if you happen to use this; we would like to know how helpful it is, and we will hopefully make updates for RunIII if it does indeed help.

Keep in mind that this is not a CMS centrally provided feature! It is rather a somewhat private effort by Sihyun Jeon (shjeon@cern.ch) to aid analysers who want to do their sample studies before the requested official samples arrive, which can take 4-6 months from time to time.

If you need a massive number of samples (because of many BSM parameter points or a very large number of events), you should ask for centrally produced official samples. Even if you are able to produce all your samples privately with this, you still need to ask for official samples for validation purposes, to check your gridpacks, generator fragments, detector conditions, and pileup premixing scenarios.

These are automated scripts that help analysers produce their RunII-Legacy (Ultra Legacy) samples privately using CRAB job submissions.

Instructions

1. Git clone the EXO-MCsampleProductions GitLab repository.

2. Prepare a CSV file with inputs in the following order to run the (wmLHE)GEN step (comma-separated) :

Warning note for the DIGIPremix step : for every step, make sure you have run the voms-proxy-init --voms cms command first. If you skip it for the DIGIPremix step in particular, you will basically end up with samples containing zero pileup.
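
For reference, these standard grid commands initialise and inspect the proxy :

voms-proxy-init --voms cms
voms-proxy-info --timeleft # Remaining proxy lifetime in seconds.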

  1. DATASETNAME : Name of the dataset that will be used for DAS publication.

  2. GENFRAGMENT : Generator fragment python file that will be used as the hadronizer or generator. It should be stored in the skeleton/genfragments directory.

  3. NEVENTS : Total number of events that will be generated. Jet matching or filter efficiencies should be taken into account: e.g. if the matching efficiency is 0.4 and you want 10000 events to be produced, write 10000 × 1/0.4 = 25000.

  4. NSPLITJOBS : Number of CRAB jobs the production will be split into. e.g. NEVENTS=25000 with NSPLITJOBS=25 will run 1000 events per CRAB job.

  5. GRIDPACK : Path to the gridpack, if one is used. It should be in the CVMFS area.

Minor tip on the number of events per CRAB job : the number of events in your output file after the (wmLHE)GEN step should not be bigger than 2000-2500 per job. Otherwise the SIM step might face CRAB job failures with JobExitCode 50664 (wall time error) due to its long processing time, so please check the matching/filtering efficiencies before running the scripts here. For example :

    • NEVENTS=100000, NSPLITJOBS=10, efficiency=50% => 5000 events survive from the 10000 events per (wmLHE)GEN CRAB job => 5000 events get submitted to the SIM step, and this is NOT OKAY.

    • NEVENTS=100000, NSPLITJOBS=10, efficiency=15% => 1500 events survive from the 10000 events per (wmLHE)GEN CRAB job => 1500 events get submitted to the SIM step, and this is OKAY.

    • NEVENTS=100000, NSPLITJOBS=100, efficiency=100% => 1000 events survive from the 1000 events per (wmLHE)GEN CRAB job => 1000 events get submitted to the SIM step, and this is OKAY.
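
A quick back-of-the-envelope check of the surviving events per job (plain shell arithmetic, not part of the scripts) :

# Events per job surviving the (wmLHE)GEN step; EFFICIENCY is in percent.
NEVENTS=100000; NSPLITJOBS=100; EFFICIENCY=100
echo $(( NEVENTS / NSPLITJOBS * EFFICIENCY / 100 )) # Keep this below ~2000-2500.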

# How your CSV file should look if using a gridpack (MCandI_wmLHEGENexample.csv).
WRtoNLtoLLJJ_WR4000_N1000_TuneCP2,Hadronizer_TuneCP2_13TeV_generic_LHE_pythia8_cff.py,100000,50,/cvmfs/cms.cern.ch/phys_generator/gridpacks/2017/13TeV/madgraph/V5_2.6.5/WRtoNLtoLLJJ/WRtoNLtoLLJJ_WR4000_N1000_slc6_amd64_gcc630_CMSSW_9_3_8_tarball.tar.xz
WRtoNLtoLLJJ_WR4000_N2000_TuneCP2,Hadronizer_TuneCP2_13TeV_generic_LHE_pythia8_cff.py,100000,50,/cvmfs/cms.cern.ch/phys_generator/gridpacks/2017/13TeV/madgraph/V5_2.6.5/WRtoNLtoLLJJ/WRtoNLtoLLJJ_WR4000_N2000_slc6_amd64_gcc630_CMSSW_9_3_8_tarball.tar.xz
WRtoNLtoLLJJ_WR4000_N3000_TuneCP2,Hadronizer_TuneCP2_13TeV_generic_LHE_pythia8_cff.py,100000,50,/cvmfs/cms.cern.ch/phys_generator/gridpacks/2017/13TeV/madgraph/V5_2.6.5/WRtoNLtoLLJJ/WRtoNLtoLLJJ_WR4000_N3000_slc6_amd64_gcc630_CMSSW_9_3_8_tarball.tar.xz

# How your CSV file should look if not using a gridpack (MCandI_GENexample.csv)
Estar_EG_L10000_M-250_TuneCP2,Estar_EG_L10000_M-250_TuneCP2_13TeV-pythia8.py,100000,50

3. Execute the setup.py file to build the CMSSW releases and choose the Tier2/3 site where your samples produced through CRAB jobs will be stored.

python setup.py
# Setting up Fast and Full simulation sample production workflows.
# What is your T2/T3 storage site [T2_CH_CERN,T3_KR_KNU,T3_US_FNALLPC,..]? TX_YY_ZZZZ

You should check that you have write permission at the Tier2/3 site you chose by executing the commands below :

# Make sure you have write permission by executing the commands below :
source /cvmfs/cms.cern.ch/cmsset_default.sh
cmsrel CMSSW_X_Y_Z # Any CMSSW release version will work.
cd CMSSW_X_Y_Z/src/
cmsenv
source /cvmfs/cms.cern.ch/crab3/crab.sh
crab checkwrite --site=TX_YY_ZZZZ

# Checkwrite test results should end like below if successful :
# Checkwrite Result:
# Success: Able to write in /store/user/CERNID on site TX_YY_ZZZZ

4. Move to the <Simulation>/<Campaign>/(wmLHE)GEN__<CMSSW> directory and execute the config_(wmLHE)GEN.py file to build the cmsDriver commands and CRAB configuration files.

# Example command to produce Full simulation, RunIISummer20UL16 campaign, wmLHEGEN samples.
cd FullSimulation/RunIISummer20UL16/wmLHEGEN__CMSSW_10_6_18/src/
python config_wmLHEGEN.py MCandI_wmLHEGENexample.csv

# python configuration command should end like below if successful :
# [INFO] cmsDriver build for datasets below have completed
# [INFO] >>      WRtoNLtoLLJJ_WR4000_N1000_TuneCP2
# [INFO] >>      WRtoNLtoLLJJ_WR4000_N2000_TuneCP2
# [INFO] >>      WRtoNLtoLLJJ_WR4000_N3000_TuneCP2
# [INFO] Execute the command to submit the jobs to CRAB
# [INFO] >>     voms-proxy-init
# [INFO] >>     source submit_crab_MCandIexample.sh

5. If everything went well, a submit_crab_<CSV name>.sh script will have been produced. Source it to submit the jobs to CRAB :

source /cvmfs/cms.cern.ch/crab3/crab.sh # Make sure you set up the CRAB environment.
source submit_crab_MCandI_wmLHEGENexample.sh

# How to check the status of your CRAB jobs and the published DAS dataset path.
cd MCandI_wmLHEGENexample/WRtoNLtoLLJJ_WR4000_N1000_TuneCP2/
crab status -d crab_projects/crab_WRtoNLtoLLJJ_WR4000_N1000_TuneCP2 --long
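
If some jobs fail, they can usually be recovered with the standard CRAB resubmission command (a generic CRAB feature, not specific to these scripts) :

# Resubmit the failed jobs of a task.
crab resubmit -d crab_projects/crab_WRtoNLtoLLJJ_WR4000_N1000_TuneCP2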

6. Once the CRAB jobs for the (wmLHE)GEN step are finished, prepare a CSV file with inputs in the following order (comma-separated) :

  1. DATASETNAME : Name of the dataset that will be used for DAS publication.

  2. OUTPUTDATASET : Published DAS dataset path from the previous step. You can get this by executing the crab status command.

# Published DAS output dataset path given from the previous wmLHEGEN step's CRAB status command.
# The path with RAWSIMoutput should be used for the next step.
# The path with LHEoutput is no longer needed.
# This one should be used :
# Output dataset:			/WRtoNLtoLLJJ_WR4000_N1000_TuneCP2/shjeon-RunIISummer20UL16_wmLHEGEN_RAWSIMoutput-bb4c54429f345d8180f3c54dc66a10f8/USER
# This one should not be used :
# Output dataset:			/WRtoNLtoLLJJ_WR4000_N1000_TuneCP2/shjeon-RunIISummer20UL16_wmLHEGEN_LHEoutput-bb4c54429f345d8180f3c54dc66a10f8/USER

# How your CSV file should look.
WRtoNLtoLLJJ_WR4000_N1000_TuneCP2,/WRtoNLtoLLJJ_WR4000_N1000_TuneCP2/shjeon-RunIISummer20UL16_wmLHEGEN_RAWSIMoutput-bb4c54429f345d8180f3c54dc66a10f8/USER
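
To double-check that the published dataset is visible, you can query DAS with dasgoclient (user datasets live in the prod/phys03 DBS instance) :

# Query DAS for the published dataset; dasgoclient ships with the CMS environment.
dasgoclient --query="dataset=/WRtoNLtoLLJJ_WR4000_N1000_TuneCP2/*/USER instance=prod/phys03"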

7. The rest of the steps, from SIM to NanoAOD, work in exactly the same way : a CSV file with the dataset name and the path to the published DAS dataset from the previous step should be given, as sketched below.
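
For instance, the SIM step would follow the same pattern as step 4 (the directory and script names below are assumed by analogy with the (wmLHE)GEN step; check the repository for the exact CMSSW release of each step) :

# Assumed naming, by analogy with the wmLHEGEN step; CMSSW_X_Y_Z is a placeholder.
cd FullSimulation/RunIISummer20UL16/SIM__CMSSW_X_Y_Z/src/
python config_SIM.py MCandI_SIMexample.csv
source submit_crab_MCandI_SIMexample.sh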

If anything is unclear, please contact Sihyun Jeon (shjeon@cern.ch, Skype : sihyun_jeon) before starting the production.
