Fits

For each sample (i.e. experiment) we fit for the unknown parameters, which are the fractions of events in classes 1, 2, and 3. In the ideal situation the fit should return $\theta_i({\rm {FIT}}) \approx \frac{1}{3}$. To perform the fit we set the fitting PDF to be:

\begin{displaymath}
F= [\theta_1 P_1 (\vec{x}) + \theta_2 P_2 (\vec{x}) + \theta_3 P_3 (\vec{x})]   ,
\end{displaymath} (5)

where the joint PDFs are $P_k(\vec{x})$ ($k=1,2,3$). The fitting algorithm loops over all bins $j$ and minimizes the negative of the log likelihood function:
\begin{displaymath}
- \log L (\nu_{\rm {tot}}, \vec{\theta}) = \nu_{\rm {tot}}(\...
...sum_{j=1}^{\ensuremath {{\cal{N}}}} n_j \log n_{\rm {tot}} F_j
\end{displaymath} (6)
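
The fit described above can be illustrated with a minimal Python sketch. Eq. (6) is truncated in this copy, so the sketch assumes the standard extended binned form $-\log L = \nu_{\rm tot}\sum_k \theta_k - \sum_j n_j \log(\nu_{\rm tot} F_j)$, with $F_j$ the mixture of Eq. (5) evaluated in bin $j$. The function names, the three-bin templates, and the coarse grid scan (with $\nu_{\rm tot}$ held fixed and $\theta_1+\theta_2+\theta_3=1$) are hypothetical stand-ins for illustration only; the text does not say which minimizer was actually used.

```python
import math

def mixture(theta, templates):
    """F_j = sum_k theta_k * P_k[j] for every bin j, as in Eq. (5)."""
    n_bins = len(templates[0])
    return [sum(t * p[j] for t, p in zip(theta, templates)) for j in range(n_bins)]

def neg_log_likelihood(nu_tot, theta, counts, templates):
    """Assumed extended binned form: nu_tot*sum(theta) - sum_j n_j*log(nu_tot*F_j)."""
    f = mixture(theta, templates)
    nll = nu_tot * sum(theta)
    for n_j, f_j in zip(counts, f):
        if n_j > 0:  # empty bins contribute nothing to the sum
            nll -= n_j * math.log(nu_tot * f_j)
    return nll

def scan_fractions(nu_tot, counts, templates, steps=30):
    """Coarse grid scan over fractions summing to 1 (a stand-in for a real minimizer)."""
    best = None
    for i in range(1, steps):
        for k in range(1, steps - i):
            theta = (i / steps, k / steps, (steps - i - k) / steps)
            value = neg_log_likelihood(nu_tot, theta, counts, templates)
            if best is None or value < best[0]:
                best = (value, theta)
    return best[1]

# Hypothetical normalized templates P_k (three bins each) and the bin counts
# expected for 300 events with equal class fractions.
TEMPLATES = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]
COUNTS = [90, 120, 90]

best_theta = scan_fractions(300.0, COUNTS, TEMPLATES)
print(best_theta)
```

Because the toy counts are exactly the expectation for equal fractions, the scan settles on the grid point where each fraction is one third. With fluctuating counts the minimum would move, which is what the ensemble test probes.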

For each sample we perform three (3) different fits associated with the different methods used for the calculation of the joint PDF:

  1. Standard Technique (1D): The 1D fit is very straightforward and relies on Eq. (1). We used 100 bins in the calculation of the joint PDFs.

  2. Projection and Correlation Approximation (PCA): The transformed marginal PDFs are mapped onto Gaussians. Although not exact, it is a good approximation compared to the standard method when there are large correlations between the input variables. For each class one can compute the transformation matrix for $x \to y$ from the MC events. The caveat of the PCA method is that one must identify the canonical transformation $x \to y$ for the data. The PCA approach works very well for the calculation of likelihood ratios and was used at LEP for $WW$ event selection [2]. It was designed to classify events as signal ($S$) or background ($B$), where the signal to background ratio was large ($S/B \gg 1$). At LEP, the data were therefore transformed like the signal MC. Here, we have three classes with $\theta_1 \approx \theta_2 \approx \theta_3 \approx 1/3$, so there is an ambiguity in how to transform the data; we decided not to transform the data but to calculate the joint PDFs as described in Eq. (2).

  3. Multi-Dimensional Approach (Multi-D): The Multi-D fit is also straightforward and relies on Eq. (3). Since we have two input variables, the fit used a grid of 100 $\times$ 100 bins (i.e. a 2D fit).
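
The difference between methods 1 and 3 can be sketched in a few lines of Python. Eqs. (1) and (3) are not reproduced in this section, so the sketch assumes that the 1D technique builds the joint PDF as the product of the binned marginal distributions, while the Multi-D approach fills a 2D histogram directly. The function name, the variable range $[0,1]$, and the four toy events are hypothetical.

```python
def histogram_pdfs(events, n_bins, lo=0.0, hi=1.0):
    """Return (product-of-marginals table, 2D histogram table) of bin probabilities.

    Assumes two input variables per event, both lying in [lo, hi].
    """
    width = (hi - lo) / n_bins

    def bin_index(value):
        # Clamp the upper edge into the last bin.
        return min(int((value - lo) / width), n_bins - 1)

    joint = [[0.0] * n_bins for _ in range(n_bins)]
    marg_x = [0.0] * n_bins
    marg_y = [0.0] * n_bins
    weight = 1.0 / len(events)
    for x, y in events:
        i, j = bin_index(x), bin_index(y)
        joint[i][j] += weight   # Multi-D: keeps the correlation between x and y
        marg_x[i] += weight     # 1D: each variable is histogrammed on its own
        marg_y[j] += weight
    product = [[marg_x[i] * marg_y[j] for j in range(n_bins)] for i in range(n_bins)]
    return product, joint

# Four strongly correlated toy events on a 2 x 2 grid.
toy_events = [(0.1, 0.2), (0.3, 0.4), (0.6, 0.7), (0.8, 0.9)]
product_pdf, joint_pdf = histogram_pdfs(toy_events, n_bins=2)
print(product_pdf)
print(joint_pdf)
```

For these correlated toy events the 2D table is populated only on the diagonal, whereas the product of marginals spreads the probability evenly over all four cells. This is the loss of correlation information that the PCA and Multi-D methods are meant to address, at the cost of sparser bins in the 100 $\times$ 100 case.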

The results of each fit are stored and will be used for the ensemble test, which is described in the next section.


Alain Bellerive 2006-05-19