IRTGJets2

From KIP Wiki
⧼kip-jumptonavigation⧽⧼kip-jumptosearch⧽

Jet Calibration

In the following, a simple in-situ jet energy calibration will be derived from simulated prompt photon production events. This calibration will then be applied to jets in the sample as a closure test to validate the method. This is in principle used as an example to touch the following subjects regarding jet based analyses:

  • Study of simple jet related variables
  • Overlap removal with other reconstructed objects
  • Study of correlations between variables to e.g. establish event selection criteria
  • Different levels of jet reconstruction
  • Truth jet matching
  • Jet Energy Scale
  • Deriving jet energy corrections in-situ
  • Application of jet corrections to data

Motivation

Jets are usually constructed by some algorithm on one of three levels:

  • Parton Level
    • The hard interaction at the level where one would usually draw Feynman Graphs. At leading order, one outgoing parton (gluon or quark) usually corresponds to one jet.
  • Particle/Hadron Level
    • On Particle Level jets are constructed from stable particles after hadronization. A jet usually consists of a group of particles that do not directly correspond to one parton in the hard interaction, but differs by e.g. out of cone particles or underlying event contributions
  • Calorimeter Level
    • The particles reached a calorimeter and deposited their energy in the form of hadronic and electromagnetic showers. Jets are reconstructed from energy depositions in calorimeter cells, or from clusters of cells corresponding to showers.

The calorimeter level is the only one available for experimental data, while the Parton Level is most easily described by theory. As jets on calorimeter level suffer from a large number of measuring inaccuracies and uncertainties, they need to be calibrated to correspond to either particle or parton level energies.

The simulated data used here consist of prompt photon events, where the final state on leading order parton level consists of one photon and one parton. As the photon can be measured very precisely, these events provide one opportunity to calibrate calorimeter level jets using data.

Acquiring the files and first look

The files required for this Tutorial can be downloaded from [1]

After the download put the file into a work directory and extract the contents, e.g. like:

mkdir $HOME/jet_tut
mv jet_calib.tar.gz $HOME/jet_tut/.
cd $HOME/jet_tut
tar xvzf jet_calib.tar.gz

3 files (plus one backup) contained in the archive:

phojet_sample.root

This is a flat root ntuple containing variables that describe 200000 events of simulated prompt photon production at an LHC experiment. There are two collections of jets stored in the file:

  • reconstructed jets based on a measurement of calorimeter showers
  • truth jets based on Monte Carlo particles after hadronization

plus a collection of photon candidates.

To have a quick look into the data structure and at first distributions, the Tree Viewer of Root can be used:

root -l phojet_sample.root
TTree *t = gDirectory->Get("CollectionTree")
t->StartViewer()

The Viewer is a graphical interface to the tree->Draw() command and allows to draw 1 to 3 dimensional histograms of variables by dragging them onto the axes and using the draw button (bottom left). The bulk of the data is stored as a group of vectors for each event, containing a single variable for several jets or particles. Possible tasks are:

  • have a look at several variables for both jet collections, e.g. transverse momenta Et of the jets and photon candidates (for this tutorial, we will neglect quark masses and assume Et = pT)
  • Monte Carlo truth and reconstructed (H1Topo) jet quantities can also be directly compared using a 2-dimensional histogram, e.g. the jet multiplicities (jetNum)

For 2-dimensional histograms, it is advisable to first do:

gStyle->SetPalette(1)

in the root command line first and use the draw option "colz" (at the top of the Viewer). More sophisticated things like applying event selection cuts can also be done using the Viewer.

jet_cal.C and jet_cal.h

The analysis and jet calibration are done using a root "Selector" class. You could create your own Selector by doing:

t->MakeSelector("somename")

but this was already done and some extras were added to the resulting files for convenience. In jet_cal.h the class jet_calib is defined. jet_cal.C contains the actual analysis code. Open it with an editor of your choice. Three functions created by MakeSelector will be used:

  • Begin(): This function is called once at the beginning and thus histograms and file access are set up here
    • Everything required in Begin() for this tutorial is already set up
  • Process(): This function is called for every event and will contain the main analysis code
  • Terminate(): This function is called once at the end of the analysis and thus useful to store outputs

The main part of the tutorial will consist of adding functionality to Process() plus using user defined functions.

The Analysis

First Steps

The analysis code can be run on a suitable ntuple out of the box. For convenience create a new file open.c with the following content:

{TChain chain("CollectionTree");
chain.Add("phojet_sample.root");}

and start root with:

root -l open.c

The analysis can now be run by:

chain.Process("jet_calib.C")

but currently does not do anything useful. As a first step simple plots studied using the tree-viewer can be added, e.g.:

h_leading_jet_pt->Fill(jetEtAntiKt4H1TopoJets->at(0)/1000.0);

in the Process() function below "jet_calib::GetEntry(entry);". The jet vectors are sorted by pT in the ntuple, allowing the simple use of at(0) for the jet with the highest pT. Rerun the analysis and draw h_leading_jet_pt.

Simple Event Selection

The final goal is to derive an in-situ calibration of the Jet Energy Scale (JES) using events where one precisely measured photon balances the leading jet in the transverse plane. The JES can then be obtained by comparing the jet and photon pT. To have clear event topologies, events with exactly one photon candidate are selected. Insert the following line into Process() at an appropriate place (after reading the event by getEntry and before filling any histograms):

if (PhoAOD_nc != 1) return kTRUE; //exit if no or more than one photon candidates are in the event

Doing event selection inside the analysis code is not the optimal solution, though, as rerunning the analysis to e.g. add a new histogram would require running over all events (additionally leave the above line in Process(), though, as a fail-safe).

Event Selection Using Event Lists

Create a new function outside of Process():

void jet_calib::doCuts(Long64_t entry)
{
  if (PhoAOD_nc != 1) return;
  elist->Enter(fChain->GetChainEntryNumber(entry));
  return;
}

This function is called for events and builds a list containing event numbers that have passed selection. Add the call of doCuts to Process() directly below getEntry:

// if option fillList, fill the event list and do nothing else
if (fillList) { doCuts(entry); return kTRUE;}

As the function should not be called every time the analysis is run, a switch fillList is used. This switched is controlled by a command line option fed to the analysis. Rerun the code with the following command:

chain.Process("jet_calib.C","fillList")

The necessary code to save the event list to a file for later use is already included in Terminate(). The number of events passing the selection and thus being in the list can be obtained by:

elist->Print()

To use the event list in Process(), add the following line directly above jet_calib::getEntry():

if ((useList)&&(!elist->Contains(entry))) return kTRUE;

If the analysis is started with the option useList, only events that have passed the selection will be processed:

chain.Process("jet_calib.C","useList")

Removal of the Fake Jet Caused by the Photon

The jet collections also contain the photon as a "fake jet", as no overlap removal between different physics object containers has been done. To avoid this double counting the jet faked by the photon has to be identified. A function, findOverlapJet(), is already in the code doing the following:

  • determine the minimum distance between the photon and any jet, by looping over the jets and the use of a distance metric in eta and phi
  • if the minimum distance is larger than some arbitrary value, the jet which is faked by the photon could not be determined
  • to decide on this arbitrary value, the histogram h_delta_r is filled

If there is no jet clearly corresponding to the photon, the event should be rejected. Add a call of findOverlapJet() to the Process() function after the call of doCuts():

// find the jet that corresponds to the photon, if there is none, exit this event
if (!findOverlapJet()) return kTRUE;

Rerun the analysis using the event list and have a look at h_delta_r. Apparently the majority of photons do have a corresponding fake jet within 0.02, motivating the cut on r_min that can be found in findOverlapJet(). Please comment the line filling h_delta_r in that function out now, as the histogram will be reused later. The new event selection criterion should also be added to doCuts():

if (!findOverlapJet()) return;

After rerunning with the option fillList, the event list should contain roughly 16000 events.

Access to the uncalibrated scale of the leading jet

H1Topo in the jet collection indicates that the JES has already been calibrated using a Monte Carlo based approach. For a purely data based calibration the raw, uncalibrated scale is required first. In addition the leading jet has to be determined after removal of the fake jet. Both jobs are done by the function findRawLeadingJet():

  • the vector jetPt_raw is used to store the uncalibrated transverse momentum
  • for each event it is first cleared and then elements are appended at the end for each jet in the event, based on the uncalibrated Px and Py
  • the leading (highest pT) non-fake jet is determined

This function is required to run in both doCuts() and Process(), so function calls need to be added directly after the respective overlap removal (findOverlapJet()).

JES Determination using Monte Carlo Truth Information

To determine the JES by comparing Truth to reconstructed jets, they have to be matched first, using matchTruthJet().

Again, a quick glance at this function is sufficient here:

  • the identifier of a reconstructed jet is handed to the function
  • looping over all truth jets, the best match using an eta-phi metric is found
  • if no good match was found, return -1

The function can be called in Process() e.g. like:

  for (Int_t i = 0; i < jetNumAntiKt4H1TopoJets; i++) {
     if (i == overlap_jet) continue;
     match[i] = matchTruthJet(i);
     // insert the filling of h_jet_response_truth here later
  }

The definition of a good enough match, a cut at 0.3 inside matchTruthJet(), can be motivated by looking at h_delta_r after running the analysis. To determine the JES, fill a histogram for matched jets inside the above loop over reconstructed jets in Process():

if (match[i] != -1) h_jet_response_truth->Fill(jetEtAntiKt4TruthJets->at(match[i])/1000.0, jetPt_raw.at(i)/jetEtAntiKt4TruthJets->at(match[i]));

This plots the measured over the true jet pT, called jet response, as a function of the true pT. A strong pT dependence of the JES is visible, and the reconstructed pT is generally 20 to 30 percent too low. The main cause of this difference is the non-compensating nature of the calorimeter used to measure the jets. The goal of the in-situ calibration is to correct for this using data only.

The photon pT as estimator for the true jet pT

Before using the pT of the photon in data to correct the jet pT, it has to be checked that the photon pT is a suitable estimator for the true jet pT, e.g. by adding the following line to Process():

if (match[leading_jet]!=-1) h_photon_vs_truth->Fill( PhoAOD_cl_et->at(0) / jetEtAntiKt4TruthJets->at(match[leading_jet]) );

to plot photon pT divided by the true jet pT.

Drawing h_photon_vs_truth, it is obvious that this is not the case without further event selection. On average the photon pT is 20% above the true jet pT, and the distribution has a significant tail towards high values. One option to improve the pT balance is to check if the photon and jet are back-to-back in the plane transverse to the beam axis. If they are not, some transverse momentum is missing in our picture and the event is not usable. This is done by the function checkBackToBack():

  • measures the deviation of the jet and photon axes being perfectly back-to-back in phi
  • if this deviation is smaller than some selection criterion, return true, else false

After adding a call of the function in Process() one can check a monitoring histogram:

h_photon_balance_vs_delta_r->Draw("colz")

If photon and leading jet are not perfectly back-to-back (y-value = 0) the distribution migrates towards higher values of the pT balance. This correlation motivates the cut already included in the function. The call of checkBackToBack() can thus be removed from Process() and added to doCuts() to improve the pT balance:

if (!checkBackToBack()) return;

Redo the event list and rerun the analysis. h_photon_vs_truth should now have significantly improved, especially regarding the tails towards high values. But the distribution is not sufficiently clean yet.

If there are additional jets in the event, but only the leading jet is being balanced with the photon, parts of the picture can be missing even if checkBackToBack() returns true. Thus it is sensible (due to time constraints without checking first) to add another criterion to the event selection in doCuts():

if (jetNumAntiKt4H1TopoJets > 2) return;

which now should look like:

void jet_calib::doCuts(Long64_t entry)
{
  if (PhoAOD_nc != 1) return;
  if (!findOverlapJet()) return;
  findRawLeadingJet();
  if (!checkBackToBack()) return;
  if (jetNumAntiKt4H1TopoJets > 2) return;
  elist->Enter(fChain->GetChainEntryNumber(entry));
  return;
}

Redoing the event list and analysis, h_photon_vs_jet should have further improved, while only 1350 events pass the selection. But there is still is a bias of ~8% and the distribution is not symmetric. Dominant factors are usually:

  • the reconstructed jet and photon having different sizes in the detector, they pick up different amounts of noise and underlying event
  • as the photon does not radiate gluons, the photon pT is an estimator of the parton level pT, while a reconstructed jet is an estimator of the particle level pT. To go to the parton level corrections for final state radiation and hadronization would be needed.

A bias of ~10% is acceptable for the proof of principle done in this exercise, though. Thus event selection is finally done.

Deriving and Applying an In-Situ JES Correction

The next step towards a calibration is simply plotting the estimator of true jet pT (the photon pT) vs the measured jet pT (in Process()):

h_rec_vs_photon->Fill(PhoAOD_cl_et->at(0)/1000.0, jetPt_raw.at(leading_jet)/1000.0);

This histogram is binned in photon pT to avoid bin migration effects due to the jet energy resolution. But it thus shows the uncalibrated pT for each calibrated value of the jet pT. To apply the correction to any jet, in any type of event, this needs to be turned around. In addition the histogram provides a step function. A smoother, at least continuous, calibration function would be preferable. For both this tasks, exchanging the x and y axis and a linear interpolation between bins, a TGraph is well suited. Add the following to Terminate():

  const Int_t n = 6;
  Double_t x[n], y[n];
  if (doCalib) {
     for (Int_t i=0; i<n; i++) {
        x[i] = h_rec_vs_photon->GetBinContent(i+1)*1000;
        y[i] = (bin_borders[i+1]+bin_borders[i])*1000/2;
     }
     grap = new TGraph(n, x, y);
     grap->SetName("calibration");
     gROOT->GetListOfSpecials()->Add(grap);
     TFile out("calib.root","recreate");
     grap->Write();
     out.Close();
  }
  • the bin contents are used to define x-y points for the function
  • the TGraph is created
  • its written out to file for later use

Redo the analysis including the option doCalib:

chain.Process("jet_calib.C","useListdoCalib") 

A TGraph is constructed to provide the corresponding calibrated pT for any uncalibrated pT, including a linear interpolation between bins. It can be drawn by:

grap = (TGraph*)gROOT->FindObject("calibration")
grap->Draw("AL")

and is also written to a file for later use. To apply the calibration add the following to Process() after findRawLeadingJet():

  if (applyCalib) {
     grap = (TGraph*)gROOT->FindObject("calibration");
     for (Int_t i = 0; i < jetNumAntiKt4H1TopoJets; i++) {
        if (i == overlap_jet) continue;
        jetPt_raw.at(i) = grap->Eval(jetPt_raw.at(i));
     }
  }
  • loop over all jets
  • grap is a function returning the calibrated pT for any raw pT, substitute the raw one by its output

and rerun with the applyCalib option, substituting calibrated values of jet pT for the uncalibrated ones.

In the function Begin() a variable max_event is set to 50000. As the sample contains 200000 events this can now be set higher. Afterwards rerun the analysis in the following order:

  • fillList
  • useListdoCalib
  • useListapplyCalib

The histogram h_jet_response_truth is the closure test to check the performance of the calibration. There should now be a positive bias of 5% to 15%, in rough agreement with the uncorrected bias in using the photon pT as true jet pT estimator with our cuts. While this is obviously not perfect, the JES is now roughly constant and its error has been reduced from 20%-30% down to 10% with the naive photon+jet calibration.

Congratulations, you have just derived an in-situ calibration from photon+jet events and applied it to pseudo-data!

Possible further tasks:

  • run the code with "applyCalib" only, without "useList". Is there a difference in response?
  • Is there a bias in the JES measurement due to the truth jet matching?
  • Can the binning used to derive the calibration, set via bin_borders(), be optimized?
  • Check for correlations between the phi difference of photon and jet and the number of reconstructed jets. Is there a way to increase statistics by adjusting the cuts?
  • Look at the pT distribution of the photons. As it is steeply falling, is it reasonable to take the simple bin centre while creating the TGraph?
  • ...

Author

(c) Frederik Ruehr