MuonDataLib Tutorial 3: Sample Log Filtering
One of the key advantages of event data is the option to create histograms from a subset of the information. Typically this is done based on the values of sample logs.
Sample logs
The sample logs will automatically be loaded if they are present in the event nexus file. However, if they are missing it is possible to add them manually. In this example we will load some data, then create a pair of sample logs:
An oscillation in the temperature
A linear field
The first step is to define the functions that generate the simulated log values:
[1]:
import numpy as np
def linear(x, m, c):
    return m*x + c
def osc(x, amp, omega, phi):
    return amp*np.sin(omega*x + phi) + amp*1.1
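As a quick sanity check, the sketch below redefines the two functions and evaluates them without noise. The parameter values are the ones used later in this tutorial, assuming the parameter lists are passed in the same order as the function arguments; the time grid is an arbitrary choice for illustration only.

```python
import numpy as np

def linear(x, m, c):
    return m*x + c

def osc(x, amp, omega, phi):
    return amp*np.sin(omega*x + phi) + amp*1.1

# Arbitrary illustrative time grid (seconds), not taken from the data file
t = np.linspace(1.0, 4.0, 301)

field = linear(t, 3.1, 0.1)     # assumed order: m, c
temp = osc(t, 3.0, 6.1, 0.91)   # assumed order: amp, omega, phi

print(field.min(), field.max())
print(temp.min(), temp.max())
```

The offset amp*1.1 keeps the oscillation positive: the noise-free temperature always lies between 0.1*amp and 2.1*amp, which is 0.3 to 6.3 for amp = 3.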
Next we will need to load the data.
[2]:
from MuonDataLib.data.loader.load_events import load_events
from MuonDataLib.plot.basic import Figure
import os
file_name = 'HIFI00195790.nxs'
input_file = os.path.join('..', '..', '..', '..', 'test', 'data_files', file_name)
data = load_events(input_file, 64)
WARNING: The metadata **RUN** is missing. Using fallback values
WARNING: The metadata **TITLE** is missing. Using fallback values
WARNING: The metadata **EXPERIMENT IDENTIFIER** is missing. Using fallback values
To create simulated data that matches the actual data, we need to know when each frame of data starts. The get_frame_start_times method provides a list of the start times in seconds.
[3]:
frame_start_times = data.get_frame_start_times()
print(frame_start_times)
[0.99375522 1.01375986 1.05375966 1.07375982 1.09375972 1.11375948
1.1537596 1.17375944 1.1937592 1.2137593 1.25375926 1.2737591
1.293759 1.31375898 1.3537587 1.37375882 1.39375864 1.4137587
1.45375862 1.4737584 1.4937582 1.5137583 1.5537582 1.57375816
1.5937582 1.61375798 1.65375788 1.6737578 1.69375772 1.71375766
1.75375742 1.7737576 1.79375736 1.8137575 1.8537573 1.873757
1.89375702 1.91375692 1.95375682 1.9737567 1.9937567 2.01376154
2.05376114 2.07376132 2.0937613 2.1137612 2.15376084 2.17376082
2.19376088 2.2137608 2.25376056 2.2737605 2.29376058 2.31376048
2.35376032 2.37376016 2.39376014 2.41376014 2.45376014 2.4737599
2.49375992 2.51375984 2.55375956 2.57375972 2.59375972 2.61375954
2.65375926 2.67375942 2.69375924 2.71375926 2.75375916 2.77375892
2.79375886 2.81375892 2.85375876 2.87375874 2.89375862 2.91375866
2.95375842 2.97375834 2.99375838 3.01376312 3.05376272 3.07376278
3.09376266 3.11376256 3.1537625 3.1737624 ]
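The spacing between frames is not constant. A minimal sketch using np.diff on the first eight printed values (copied by hand so that the snippet runs without the data file) makes this visible:

```python
import numpy as np

# First eight start times copied from the printed output above
frame_start_times = np.array([0.99375522, 1.01375986, 1.05375966, 1.07375982,
                              1.09375972, 1.11375948, 1.1537596, 1.17375944])

# Time between the start of one frame and the start of the next
gaps = np.diff(frame_start_times)
print(np.round(gaps, 3))
```

Most gaps are about 0.02 seconds, with some gaps of about 0.04 seconds. With the loaded data the same check would be np.diff(data.get_frame_start_times()).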
From the frame start times we can see that the first frame runs from about \(0.994\) to \(1.014\) seconds and that the last frame starts at about \(3.174\) seconds. Now we will create the simulated sample logs, which include random noise and \(50\) measurements across the whole collection period.
[4]:
from MuonDataLib.data.utils import create_data_from_function
start = frame_start_times[0]
end = frame_start_times[-1]+1
step = (frame_start_times[-1]-frame_start_times[0])/40
x, y = create_data_from_function(start, end, step, [3, 6.1, 0.91], osc, seed=1)
data.add_sample_log("Sample Temp", x, y)
fig = Figure(y_label='Temperature (Kelvin)', x_label='Time (seconds)')
fig.plot(x, y, 'Temp data')
fig.show()
x, y = create_data_from_function(start, end, step, [3.1, 0.1], linear, seed=1)
data.add_sample_log("field", x, y)
fig2 = Figure(y_label='Field (MHz)', x_label='Time (seconds)')
fig2.plot(x, y, 'Field data')
fig2.show()
The sample logs were added with the data.add_sample_log command. Its arguments are the name of the sample log, the x values (time in seconds) and the y values. Now that we have the sample log data, we can look at the different types of filter. The two plots above will be helpful for choosing sensible filter values. First, let's look at some unfiltered histogram data.
[5]:
no_filter_hist, bins = data.histogram()
fig = Figure(y_label='Counts')
fig.plot_from_histogram(bins, no_filter_hist, [0])
fig.show()
Sample log filters - Keeping data above a threshold value
The first type of filter we will look at is applied directly to a sample log value. To start, let's remove all of the data corresponding to a field value of less than six.
[6]:
data.keep_data_sample_log_above('field', 6.)
The keep_data_sample_log_above command adds a filter. The first argument is the name of the sample log that the filter applies to and the second argument is the minimum value we want to keep. Let's compare the resulting histogram with the unfiltered data.
[7]:
hist_above_6, bins = data.histogram()
fig = Figure(y_label='Counts')
fig.plot_from_histogram(bins, no_filter_hist, [0], 'unfiltered, ')
fig.plot_from_histogram(bins, hist_above_6, [0], 'field >=6, ')
fig.show()
WARNING: The target 0.0 is before the first frame start time 0.99375522 seconds. Difference is 0.99375522 seconds
We can see that the filter has removed some counts, as expected. However, we may want to check that the filter has behaved correctly. To do this we plot the original and filtered sample log data:
[8]:
fig = Figure(y_label='Field (MHz)', x_label='Time (seconds)')
fig.plot_sample_log(data, 'field')
fig.show()
We can also check which data has been removed from the other sample log (Sample Temp):
[9]:
fig = Figure(y_label='Temperature (Kelvin)', x_label='Time (seconds)')
fig.plot_sample_log(data, 'Sample Temp')
fig.show()
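The idea behind a threshold filter can be sketched independently of MuonDataLib. The helper below is hypothetical and is not how the library implements the filter: it assumes that the log value at each frame start can be estimated by linear interpolation (np.interp) and keeps a frame only if that value is at or above the threshold. The times and values are made up for illustration.

```python
import numpy as np

def frames_above(frame_times, log_times, log_values, threshold):
    """Hypothetical helper: True for frames whose interpolated
    log value is greater than or equal to the threshold."""
    values_at_frames = np.interp(frame_times, log_times, log_values)
    return values_at_frames >= threshold

# Illustrative noise-free linear field log: y = 3.1*x + 0.1
log_times = np.array([1.0, 2.0, 3.0, 4.0])
log_values = 3.1*log_times + 0.1

# Illustrative frame start times (seconds)
frame_times = np.array([1.0, 1.5, 2.0, 2.5, 3.0])

mask = frames_above(frame_times, log_times, log_values, 6.0)
print(mask)  # [False False  True  True  True]
```

Only the frames starting at 2.0 seconds or later are kept, because the interpolated field first exceeds 6 between 1.5 and 2.0 seconds.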
If we add the wrong filter (e.g. because of a typo), we can remove it with the command
[10]:
data.delete_sample_log_filter('field')
where the argument is the name of the filter to be deleted. To check that it worked, let's plot the histogram (generating or saving a histogram will update all of the data).
[11]:
hist_check, bins = data.histogram()
fig = Figure(y_label='Counts')
fig.plot_from_histogram(bins, no_filter_hist, [0], 'unfiltered, ')
fig.plot_from_histogram(bins, hist_check, [0], 'Check filter removed, ')
fig.show()
Sample log filters - Keeping data below a threshold value
The second type of filter we will look at removes all of the data above a specific value. For this example, let's remove all of the data with a field value greater than 10.
[12]:
data.keep_data_sample_log_below('field', 10.)
The keep_data_sample_log_below command adds a filter. The first argument is the name of the sample log that the filter applies to and the second argument is the maximum value we want to keep. Let's compare the resulting histogram with the unfiltered data.
[13]:
hist_below_10, bins = data.histogram()
fig = Figure(y_label='Counts')
fig.plot_from_histogram(bins, no_filter_hist, [0], 'unfiltered, ')
fig.plot_from_histogram(bins, hist_below_10, [0], 'field <= 10, ')
fig.show()
WARNING: The target 3.224538981929381 is after the last frame start time 3.1737624 seconds. Difference is 0.050776581929380615 seconds
WARNING: The target 3.224538981929381 is after the last frame start time 3.1737624 seconds. Difference is 0.050776581929380615 seconds
WARNING: The target 4.147959927132639 is after the last frame start time 3.1737624 seconds. Difference is 0.9741975271326391 seconds
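For the noise-free linear field used in this tutorial, the time at which a threshold is crossed follows from inverting y = m*x + c. The sketch below uses a hypothetical helper and assumes the parameter list [3.1, 0.1] maps to (m, c) in that order.

```python
def crossing_time(threshold, m, c):
    """Time at which the noise-free line m*x + c equals threshold (m != 0)."""
    return (threshold - c) / m

m, c = 3.1, 0.1  # assumed order of the field parameters
t_6 = crossing_time(6.0, m, c)
t_10 = crossing_time(10.0, m, c)

last_frame_start = 3.1737624  # last value in the printed frame start times

print(round(t_6, 3), round(t_10, 3))  # 1.903 3.194
print(t_10 > last_frame_start)        # True
```

In the noise-free case the field only reaches 10 at about 3.194 seconds, which is after the last frame start time. This may explain why the warnings above report targets beyond the last frame, and it suggests that this particular filter removes little of the data; the random noise in the simulated log shifts the exact crossing.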
Next, let's plot the filtered sample log data:
[14]:
fig = Figure(y_label='Field (MHz)', x_label='Time (seconds)')
fig.plot_sample_log(data, 'field')
fig.show()
[15]:
fig = Figure(y_label='Temperature (Kelvin)', x_label='Time (seconds)')
fig.plot_sample_log(data, 'Sample Temp')
fig.show()
Let's remove the filter:
[16]:
data.delete_sample_log_filter('field')
Sample log filters - Keeping data within a range of values
The last type of sample log filter we will look at keeps data between two values. At present it is only possible to add one of these band filters per sample log. Let's start by adding the filter to the Sample Temp sample log:
[17]:
data.keep_data_sample_log_between('Sample Temp', 2, 5.5)
The keep_data_sample_log_between command adds a filter that keeps the data between two values. The first argument is the name of the sample log that the filter applies to, the second argument is the minimum value we want to keep and the final argument is the maximum value we want to keep. Let's compare the resulting histogram with the unfiltered data.
[18]:
hist_band, bins = data.histogram()
fig = Figure(y_label='Counts')
fig.plot_from_histogram(bins, no_filter_hist, [0], 'unfiltered, ')
fig.plot_from_histogram(bins, hist_band, [0], 'Temp band, ')
fig.show()
WARNING: The target 0.0 is before the first frame start time 0.99375522 seconds. Difference is 0.99375522 seconds
WARNING: The target 3.5009574226307287 is after the last frame start time 3.1737624 seconds. Difference is 0.3271950226307285 seconds
WARNING: The target 4.1001561053122115 is after the last frame start time 3.1737624 seconds. Difference is 0.9263937053122113 seconds
WARNING: The target 3.3321140251960304 is after the last frame start time 3.1737624 seconds. Difference is 0.1583516251960302 seconds
WARNING: The target 3.941556354150607 is after the last frame start time 3.1737624 seconds. Difference is 0.7677939541506067 seconds
WARNING: The target 4.147959927132639 is after the last frame start time 3.1737624 seconds. Difference is 0.9741975271326391 seconds
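An oscillating log can enter and leave the band several times, so a band filter generally keeps several separate time windows. The sketch below uses a hypothetical helper to find those windows from the log samples alone. It is an illustration of the idea and not the library's algorithm: it does not interpolate between samples, and the values are made up.

```python
import numpy as np

def in_band_intervals(times, values, low, high):
    """Hypothetical helper: (start, end) times of each run of
    consecutive samples with low <= value <= high."""
    inside = (values >= low) & (values <= high)
    # Pad with False so every run has a rising and a falling edge
    padded = np.concatenate(([False], inside, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts = edges[::2]       # index of the first sample in each run
    ends = edges[1::2] - 1    # index of the last sample in each run
    return [(float(times[s]), float(times[e])) for s, e in zip(starts, ends)]

# Illustrative log samples
times = np.arange(8, dtype=float)
values = np.array([1.0, 3.0, 4.0, 6.0, 5.0, 2.5, 1.5, 3.0])

print(in_band_intervals(times, values, 2, 5.5))
# [(1.0, 2.0), (4.0, 5.0), (7.0, 7.0)]
```

Three windows are kept here: the log rises through the band, overshoots it, falls back through it, drops below it and finally re-enters it at the last sample.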
Next, let's plot the filtered sample log data:
[19]:
fig = Figure(y_label='Temperature (Kelvin)', x_label='Time (seconds)')
fig.plot_sample_log(data, 'Sample Temp')
fig.show()
[20]:
fig = Figure(y_label='Field (MHz)', x_label='Time (seconds)')
fig.plot_sample_log(data, 'field')
fig.show()
Let's remove the filter:
[21]:
data.delete_sample_log_filter('Sample Temp')