1 Machine Learning and its application in Muon Tomography

Machine-learning-inspired techniques can be applied to Muon Tomography (MT) to address many of its associated problems. The two problems of interest to us are low-resolution tomograms and data inefficiency. In this notebook, I discuss the use of ML in MT to solve the problem of data inefficiency.

1.1 Data Inefficiency Problem

We define "good muon events" as events with 4 by 4 coincidence hits. Such events have well-defined spatial points in both the top and bottom trays. For example, for the hit on the right, both points $A (x_1,y_1)$ and $B (x_2,y_2)$ are known. Due to the limited capabilities of our instruments, such events are rare (less than $40\%$ with our best setup), and many events instead follow the pattern of the hit on the left, where one coordinate of the second point is missing:

$C (x_3,y_3)$, $D (x_4,?)$

We know that $y_4 \in (y_{min}, y_{max})$ and that it can only be resolved to within $y_{res}$, the limited hardware spatial resolution of the telescope. The question, then, is: what is $y_4$ likely to be?
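Because the telescope resolves positions only to within $y_{res}$, the candidates $y_i$ form a discrete set. A minimal sketch of that set, assuming (hypothetically) that the range is an integer multiple of $y_{res}$ and that each candidate sits at the center of its bin; the function name `candidate_positions` is illustrative:

```python
import numpy as np


def candidate_positions(y_min: float, y_max: float, y_res: float) -> np.ndarray:
    """Discretize (y_min, y_max) into bins of width y_res and return bin centers.

    Assumes (y_max - y_min) is an integer multiple of y_res.
    """
    n_bins = int(round((y_max - y_min) / y_res))
    # Center of bin k is y_min + (k + 0.5) * y_res
    return y_min + y_res * (np.arange(n_bins) + 0.5)


# Example: a hypothetical 10 cm range read out at 2.5 cm resolution
print(candidate_positions(0.0, 10.0, 2.5))  # [1.25 3.75 6.25 8.75]
```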

Once we can answer this question, we can drastically reduce the amount of wasted data and improve the overall resolution and efficiency of our scheme.

1.2 Track reconstruction using non-perfect events

Each event is assigned a probability that describes the likelihood of using information from that event to reconstruct the muon track. Under this scheme, events with 4 by 4 coincidence hits have a probability of 1 and events with no hits have a probability of 0. Events with 3 by 4 hits are therefore assigned a probability between 0 and 1.
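One possible way to represent events for this scheme is sketched below: each tray point stores its $x$ and $y$ readouts, with `None` marking a missing coordinate. Perfect 4 by 4 events get weight 1, events with no hits get 0, and partial events are flagged as still needing an estimated probability (this sketch also flags 1 or 2 hits, which the text does not cover). The `Event` class, its field names, and `base_weight` are hypothetical illustrations, not part of an existing codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Hypothetical event record: one (x, y) point per tray; None = no readout."""
    x_top: Optional[float]
    y_top: Optional[float]
    x_bottom: Optional[float]
    y_bottom: Optional[float]

    def n_hits(self) -> int:
        """Number of the four coordinates that were actually read out."""
        coords = (self.x_top, self.y_top, self.x_bottom, self.y_bottom)
        return sum(c is not None for c in coords)


def base_weight(event: Event) -> Optional[float]:
    """Return 1.0 for 4-of-4 events, 0.0 for events with no hits, and
    None for partial events whose probability must still be estimated."""
    hits = event.n_hits()
    if hits == 4:
        return 1.0
    if hits == 0:
        return 0.0
    return None  # e.g. a 3 by 4 event such as C(x3, y3), D(x4, ?)
```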

Assigning this probability requires neural networks, specifically RNNs and LSTMs. The idea is as follows:

1. Contextualize the entire dataset in an RNN framework.
2. Calculate key statistics of the overall dataset and of individual events.
3. Locate events with missing information (i.e. events without perfect 4 by 4 coincidence hits).
4. Assign probabilities to candidates for the missing information using holistic statistics, TDC data, and the angular distribution of muons.
5. Use RNNs and LSTMs to predict the missing datum in 3 by 4 events using the probabilistic approach.

1.3 Probability distribution of the muon hit of a single spatial dimension

Using the point with full information ($C$ in our case) and the known coordinate of the other point ($x_4$), we assign a probability to each candidate $y_i \in (y_{min}, y_{max})$. Formally, this is the conditional probability below, which can be evaluated using Bayes' theorem.

$P(y_4 = y_i | x_3,y_3,x_4)$
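Spelled out with Bayes' theorem over the discrete candidates, this reads as follows, where $P(y_4 = y_i)$ is a prior over candidate positions, $P(x_3,y_3,x_4 \mid y_4 = y_i)$ is the likelihood of the observed coordinates given that candidate, and the sum in the denominator runs over all candidates $y_j$ so that the distribution is normalized:

$P(y_4 = y_i \mid x_3,y_3,x_4) = \frac{P(x_3,y_3,x_4 \mid y_4 = y_i)\,P(y_4 = y_i)}{\sum_j P(x_3,y_3,x_4 \mid y_4 = y_j)\,P(y_4 = y_j)}$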

Now, the question is how do we create such a probability distribution? What controlled/measured factors can help us make such decisions?

1.3.1 No information known

When we have no prior or extra information about the muon hit, the natural choice is a uniform distribution over the range of that spatial dimension.

$P(y_4 = y_i | x_3,y_3,x_4) \sim \mathcal{U} (y_{min}, y_{max})$
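Over the discrete candidates spaced $y_{res}$ apart (assuming, for illustration, that the range is an integer multiple of $y_{res}$), the uniform case reduces to a constant probability per candidate:

$P(y_4 = y_i | x_3,y_3,x_4) = \frac{1}{N}, \quad N = \frac{y_{max} - y_{min}}{y_{res}}$

For instance, a hypothetical 10 cm range read out at 2.5 cm resolution gives $N = 4$ and a probability of $0.25$ for each candidate.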

1.3.2 One spatial coordinate and one spatial dimension known

The distribution in this case is much more complicated. The following are several ways of constructing it.

1.3.2.1 Exact Aggregate Data Approach

In this approach, we assign the probabilities from the set of "good" events that share the exact coordinates $x_3, y_3, x_4$ with the event under consideration. The probability of each candidate is simply the ratio of two counts, where $E(\cdot)$ denotes the number of good events with the given coordinates:

$f_1(y_i) = P(y_4 = y_i) = \frac{E(y_i,x_3,y_3,x_4)}{E(x_3,y_3,x_4)}$
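A minimal sketch of $f_1$, assuming the good events have already been digitized to integer channel indices so that "exact" matching is well defined. The tuple layout `(x3, y3, x4, y4)` and the function name are hypothetical. Candidates never observed among the matching good events receive probability 0 (they are simply absent from the returned dictionary).

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical layout: each good event is (x3, y3, x4, y4) as integer channel indices
GoodEvent = Tuple[int, int, int, int]


def exact_aggregate(good_events: Iterable[GoodEvent],
                    x3: int, y3: int, x4: int) -> Dict[int, float]:
    """f_1: fraction of good events with the same (x3, y3, x4) that have y4 = y_i."""
    counts = Counter(y4 for (a, b, c, y4) in good_events if (a, b, c) == (x3, y3, x4))
    total = sum(counts.values())  # E(x3, y3, x4)
    if total == 0:
        return {}  # no matching good events: f_1 is undefined for this event
    # E(y_i, x3, y3, x4) / E(x3, y3, x4)
    return {y_i: n / total for y_i, n in counts.items()}


events = [(1, 2, 3, 4), (1, 2, 3, 4), (1, 2, 3, 5), (0, 2, 3, 4)]
print(exact_aggregate(events, 1, 2, 3))  # y4 = 4 with 2/3, y4 = 5 with 1/3
```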

1.3.2.2 Similar Aggregate Data Approach

This approach is similar to the Exact Aggregate Data Approach, but neighboring events are also allowed to influence the statistic. For a tolerance of $\chi$, the probability distribution is:

$f_2(y_i) = P(y_4 = y_i) = \frac{E(y_i,x_3 \pm \chi,y_3 \pm \chi,x_4 \pm \chi)}{E(x_3 \pm \chi,y_3 \pm \chi,x_4 \pm \chi)}$

where $x \pm \chi$ denotes any value within $\chi$ of $x$.
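The neighbor-tolerant version differs from $f_1$ only in its matching rule. This sketch reads the tolerance as a symmetric window of half-width $\chi$ on each known coordinate (our interpretation, assumed here); the event layout and function name are again hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

GoodEvent = Tuple[float, float, float, float]  # (x3, y3, x4, y4), hypothetical layout


def similar_aggregate(good_events: Iterable[GoodEvent],
                      x3: float, y3: float, x4: float,
                      chi: float) -> Dict[float, float]:
    """f_2: like f_1, but a good event matches if each known coordinate
    lies within chi of the corresponding coordinate of the event under study."""
    counts: Counter = Counter()
    for a, b, c, y4 in good_events:
        if abs(a - x3) <= chi and abs(b - y3) <= chi and abs(c - x4) <= chi:
            counts[y4] += 1
    total = sum(counts.values())
    if total == 0:
        return {}  # no neighbors within tolerance
    return {y_i: n / total for y_i, n in counts.items()}
```

With $\chi = 0$ this reduces to the exact-match behavior of $f_1$; larger $\chi$ trades specificity for more statistics.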

1.3.2.3 Shift of TDC Approach

Since we have demonstrated that TDC values are correlated with the transverse distance of the muon hit along the scintillator bar, it is possible to infer 2D information from these values. In this approach, we use this correlation to determine the missing spatial dimension.

Let's consider the following case of event D.

Note: the X and Y directions have been mistakenly reversed in the plots here.

We classify the muon hits along the readout channel of event D into four arbitrary groups: W, Q, Y, and T. There is a relationship between the average TDC measured at one of the channels of each group and the measured distance along the axis, as illustrated by the following figure.

The discernible peaks (means) here represent the different "classes", and the vector of these means is directly correlated with the value of the missing orthogonal dimension. Using the linear relationship $y_p = m\mu + c$, where $\mu$ is the mean TDC value of a class and $y_p$ is the predicted position, we can predict the missing dimension ($y_4$ in our case).
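The slope $m$ and intercept $c$ have to be calibrated, for example from good events where both the class mean TDC and the position are known. The sketch below fits them by ordinary least squares with NumPy's `np.polyfit`; the calibration values are made up purely to exercise the code, and the function names are hypothetical.

```python
import numpy as np


def fit_tdc_position(mu: np.ndarray, y: np.ndarray) -> tuple:
    """Least-squares fit of y = m * mu + c; returns (m, c)."""
    m, c = np.polyfit(mu, y, deg=1)  # degree-1 fit: slope first, then intercept
    return float(m), float(c)


def predict_position(mu_known: float, m: float, c: float) -> float:
    """Predicted missing coordinate y_p = m * mu_known + c."""
    return m * mu_known + c


# Hypothetical calibration data: mean TDC of each class vs. known position
mu_classes = np.array([10.0, 20.0, 30.0, 40.0])
y_classes = np.array([5.0, 15.0, 25.0, 35.0])
m, c = fit_tdc_position(mu_classes, y_classes)
print(round(m, 6), round(c, 6))  # 1.0 -5.0
```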

We define a function $g(y_4 = y_i)$ such that

$g(y_4 = y_i) = \frac{\lvert (m\mu_{known} + c)-y_i \rvert}{y_{max}-y_{min}}$

Thus,

$f_3(y_i) = P(y_4 = y_i | \mu = \mu_{known}) = \left\{ \begin{array}{ll} 1 - g(y_4 = y_i) & g(y_4 = y_i) \le 1 \\ 0 & g(y_4 = y_i) \ge 1 \\ \end{array} \right.$
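A vectorized sketch of $f_3$ over all candidates at once. It normalizes the distance by the width of the candidate range, $y_{max} - y_{min}$ (an assumption on our part about the intended normalization), which keeps $g$ within $[0, 1]$ whenever the predicted position lies inside the range; `m` and `c` are the calibration slope and intercept, and the function name is hypothetical.

```python
import numpy as np


def tdc_shift_prob(y_candidates: np.ndarray, mu_known: float,
                   m: float, c: float, y_min: float, y_max: float) -> np.ndarray:
    """f_3 = 1 - g(y_i), set to 0 wherever g(y_i) > 1."""
    y_pred = m * mu_known + c  # predicted position from the TDC calibration
    # Assumed normalization: width of the candidate range
    g = np.abs(y_pred - y_candidates) / (y_max - y_min)
    return np.where(g <= 1.0, 1.0 - g, 0.0)


y = np.array([1.25, 3.75, 6.25, 8.75])
print(tdc_shift_prob(y, mu_known=10.0, m=0.5, c=0.0, y_min=0.0, y_max=10.0))
```

Here the predicted position is 5.0, so the two middle candidates receive the highest values.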

1.3.2.4 Most Common Zenith Angle Approach

In this approach, we select the missing spatial dimension such that the resulting zenith angle is as close as possible to the typical zenith angle $\alpha_{mean}$, taken as the mean of $\vec \alpha$, the vector of zenith angles computed from the data set:

$\alpha_{mean} = E(\vec \alpha)$

Since the space of $y_i$ is discrete, it is possible to generate a zenith angle distribution over this space, given that $x_3, y_3, x_4$ are known, as is the case for the event of interest.
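Computing the zenith angle for each candidate requires the telescope geometry. The sketch below assumes flat, parallel top and bottom trays separated by a vertical distance `tray_gap` (a hypothetical parameter not given in the text), so that the angle follows from the horizontal displacement between $C$ and the candidate point $D(x_4, y_i)$.

```python
import numpy as np


def zenith_angles(y_candidates: np.ndarray, x3: float, y3: float, x4: float,
                  tray_gap: float) -> np.ndarray:
    """Zenith angle (radians) of the track C -> D for each candidate y_4 = y_i.

    Assumes flat, parallel trays separated vertically by tray_gap,
    with x and y measured in the same units as tray_gap.
    """
    horizontal = np.hypot(x4 - x3, y_candidates - y3)  # in-plane displacement
    return np.arctan2(horizontal, tray_gap)  # 0 rad corresponds to a vertical track
```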

For example, let's consider such an arbitrary distribution.

[Figure: Example]

We define a function $d(y_i)$ (given $x_3,y_3,x_4$ ) as follows:

$d(y_i | x_3,y_3,x_4) = \frac{\lvert \alpha_{mean} - \alpha_i(y_i) \rvert}{\alpha_{mean}}$

Using this function, we generate a probability distribution that favors candidates whose zenith angle is close to $\alpha_{mean}$.

$f_4(y_i) = P(y_4 = y_i) = 1 - d(y_i | x_3,y_3,x_4)$
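Given the candidate angles $\alpha_i(y_i)$ and the data-set mean $\alpha_{mean}$ (for example, `np.mean` over the zenith angles of the good events), $f_4$ is short to compute. The clipping at zero is our addition, not part of the formula above: without it, candidates with $\alpha_i > 2\alpha_{mean}$ would receive negative values. The function name is hypothetical.

```python
import numpy as np


def zenith_prob(alpha_candidates: np.ndarray, alpha_mean: float) -> np.ndarray:
    """f_4 = 1 - d(y_i), with d = |alpha_mean - alpha_i| / alpha_mean.

    Clipping at 0 is an added assumption (mirroring the cutoff used for f_3).
    """
    d = np.abs(alpha_mean - alpha_candidates) / alpha_mean
    return np.clip(1.0 - d, 0.0, None)
```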

The best method should be some combination of all these approaches, and this is where ML comes in: the ML scheme would solve for the coefficients of such a weighted sum so as to maximize the resolution of the tomogram and minimize data inefficiency.

1.4 Applying ML

1.4.1 Comprehensive Probability Function

$P(y_4 = y_i) = Af_1(y_i) + Bf_2(y_i) + Cf_3(y_i) + Df_4(y_i)$
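With the four component distributions stacked as rows of a matrix, the weighted sum is a single matrix-vector product. Note that $f_3$ and $f_4$ are not normalized over the candidates, so the sum is a score rather than a strict probability unless it is renormalized; the optional renormalization below is our addition. Names are hypothetical.

```python
import numpy as np


def combined_prob(f: np.ndarray, theta: np.ndarray, normalize: bool = False) -> np.ndarray:
    """Weighted sum P(y_i) = A f1 + B f2 + C f3 + D f4.

    f     : shape (4, n_candidates), rows f1..f4 evaluated at each y_i
    theta : shape (4,), the coefficients [A, B, C, D]
    """
    p = theta @ f  # (4,) @ (4, n) -> (n,)
    if normalize:
        total = p.sum()
        if total > 0:
            p = p / total  # rescale so the scores sum to 1 over the candidates
    return p
```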

Let $\vec \theta$ be the vector of the coefficients.

$\vec \theta = \begin{bmatrix} A \\ B \\ C \\ D \end{bmatrix}$

1.4.2 Objective Function for Training

Here, $E_\text{total}$ is the total number of events in the data set and $E_\text{used}(\theta)$ is the number of events used in the analysis for a given choice of $\theta$.

$\underset{\boldsymbol{\theta}} {{\text{minimize}}}\;\; E_\text{total} - E_\text{used}(\theta)$
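The objective leaves $E_\text{used}(\theta)$ abstract. As one illustrative, hypothetical choice, the sketch below counts an incomplete event as used when its best candidate's combined score reaches a threshold `tau`, keeps $\theta$ on the probability simplex (non-negative, summing to 1), and searches it by plain random sampling with `numpy.random.Generator.dirichlet` rather than with the RNN/LSTM approach described earlier. The acceptance rule, `tau`, and the search strategy are all assumptions.

```python
import numpy as np


def objective(theta: np.ndarray, event_scores: list, n_good: int,
              tau: float = 0.5) -> int:
    """E_total - E_used(theta) under a hypothetical acceptance rule.

    event_scores : one (4, n_candidates) array of f1..f4 per incomplete event
    n_good       : number of perfect 4 by 4 events (always used)
    tau          : hypothetical threshold; an incomplete event counts as used
                   if its best candidate scores at least tau
    """
    e_total = n_good + len(event_scores)
    e_used = n_good + sum(int((theta @ f).max() >= tau) for f in event_scores)
    return e_total - e_used


def random_search(event_scores: list, n_good: int, n_trials: int = 200,
                  seed: int = 0) -> tuple:
    """Naive random search over theta sampled uniformly from the simplex."""
    rng = np.random.default_rng(seed)
    best_theta, best_val = None, None
    for theta in rng.dirichlet(np.ones(4), size=n_trials):
        val = objective(theta, event_scores, n_good)
        if best_val is None or val < best_val:
            best_theta, best_val = theta, val
    return best_theta, best_val
```

Because this objective only counts events, on its own it rewards accepting events regardless of how accurate the recovered coordinate is; in practice it would need to be paired with the tomogram-resolution criterion mentioned above.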