MuonDataLib Tutorial 7: The Cache

In this tutorial we will discuss the cache, which is used to minimize the time spent on repeated calculations. When the code creates a histogram it saves a copy to the cache, so if you request it again later the calculation can be skipped. If the filters or the resolution of the histograms change, the cache is cleared and the calculation is repeated the next time the histograms are requested.
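
The behaviour described above can be sketched as a small class that stores one result and discards it whenever the filters or the resolution change. This is a hypothetical illustration written for this tutorial: the class `CachedHistograms`, its methods and the binning code are not part of MuonDataLib, and the "calculation" is only a stand-in for the real histogram code.

```python
class CachedHistograms:
    """Hypothetical sketch of a cache that is cleared when its inputs change.

    Not MuonDataLib's implementation; it only mimics the behaviour
    described in the tutorial text.
    """

    def __init__(self, event_times, resolution):
        self.event_times = list(event_times)
        self.resolution = resolution
        self.filters = []          # functions returning True for events to drop
        self._cache = None         # stored histogram, or None when empty
        self.n_calculations = 0    # how often the expensive step has run

    def _calculate(self):
        # Stand-in for the expensive step: bin the events that survive the filters.
        self.n_calculations += 1
        counts = {}
        for t in self.event_times:
            if any(drop(t) for drop in self.filters):
                continue
            bin_index = int(t // self.resolution)
            counts[bin_index] = counts.get(bin_index, 0) + 1
        return counts

    def histogram(self):
        # Re-use the stored result if there is one; otherwise calculate and store it.
        if self._cache is None:
            self._cache = self._calculate()
        return self._cache

    # Anything that changes the inputs must empty the cache.
    def add_filter(self, drop):
        self.filters.append(drop)
        self._cache = None

    def clear_filters(self):
        self.filters = []
        self._cache = None

    def set_resolution(self, resolution):
        self.resolution = resolution
        self._cache = None


data = CachedHistograms([0.01, 0.02, 0.5, 1.2], resolution=0.5)
data.histogram()
data.histogram()                    # served from the cache
print(data.n_calculations)          # 1
data.add_filter(lambda t: t < 0.1)  # changing the filters clears the cache
data.histogram()                    # so this call recalculates
print(data.n_calculations)          # 2
```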

It is important to run the cells of this tutorial in order so that the cache is demonstrated correctly. These examples use the save_histograms command, but the same applies to the histogram command.

In this example the file is relatively small, so the performance improvement from the cache is hard to see. Running the notebook locally with a large file shows the difference more clearly.

[1]:
from MuonDataLib.data.loader.load_events import load_events
import time
import os

file_name = 'HIFI00195790.nxs'
input_file = os.path.join('..', '..', '..', '..', 'test', 'data_files', file_name)

output_file = os.path.join('..', 'Output_files', file_name)
data = load_events(input_file, 64)
WARNING: The metadata **RUN** is missing. Using fallback values
WARNING: The metadata **TITLE** is missing. Using fallback values
WARNING: The metadata **EXPERIMENT IDENTIFIER** is missing. Using fallback values

The time module is used to record how long it takes to save the histogram NeXus file. It should take a fraction of a second.

[2]:
start = time.time()
data.save_histograms(output_file)
print('time taken', time.time() - start)
time taken 0.012064456939697266
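
`time.time()` reads the wall clock; for measuring short durations like these, the Python documentation recommends `time.perf_counter()`. A small timing helper might look like the sketch below. The function `time_call` is hypothetical and not part of MuonDataLib, and the call it times here is a stand-in.

```python
import time


def time_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed seconds).

    Hypothetical helper for this tutorial; it simply wraps time.perf_counter().
    """
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; in the tutorial this would be
# time_call(data.save_histograms, output_file).
result, elapsed = time_call(sum, range(1_000_000))
print(result)                      # 499999500000
print(f'time taken {elapsed:.6f} s')
```

A single timing of a fast call is noisy, so repeating the measurement a few times gives a fairer comparison between cached and uncached calls.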

Next we repeat the same code. This time writing the histogram NeXus file should be quicker, because the cache supplies the stored histograms instead of repeating the calculation.

[3]:
start = time.time()
data.save_histograms(output_file)
print('time taken', time.time() - start)
time taken 0.010390758514404297

Adding a filter resets the cache, so the histograms are recalculated the next time they are requested. This is because the filter changes which events contribute to the histograms.

[4]:
data.remove_data_time_between('any', 0, 1)
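
To see why a filter makes stored histograms out of date, consider events tagged with the time in the run at which their frame started. Removing the events from frames between 0 s and 1 s changes the counts in the histogram of decay times, so a histogram stored before the filter was added no longer matches the data. The event layout and the functions `histogram` and `remove_between` below are hypothetical simplifications, not MuonDataLib's data model, and the boundary convention of the window is an assumption.

```python
from collections import Counter

# Hypothetical events: (frame start time in s, decay time in microseconds).
events = [
    (0.2, 0.1), (0.2, 0.4),   # frame starting at 0.2 s
    (1.5, 0.1), (1.5, 2.3),   # frame starting at 1.5 s
    (2.7, 0.4),               # frame starting at 2.7 s
]


def histogram(events, bin_width_us):
    # Count events per decay-time bin.
    return Counter(int(decay // bin_width_us) for _, decay in events)


def remove_between(events, start_s, end_s):
    # Drop every event whose frame started inside [start_s, end_s).
    # Treating the window as half-open is an assumption for illustration.
    return [e for e in events if not (start_s <= e[0] < end_s)]


before = histogram(events, bin_width_us=0.5)
after = histogram(remove_between(events, 0, 1), bin_width_us=0.5)
print(before)   # Counter({0: 4, 4: 1})
print(after)    # Counter({0: 2, 4: 1})
```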

Repeating the histogram calculation will therefore be slower than the previous, cached call.

[5]:
start = time.time()
data.save_histograms(output_file)
print('time taken', time.time() - start)
WARNING: The target 0.0 is before the first frame start time 0.99375522 seconds. Difference is 0.99375522 seconds
time taken 0.011232852935791016

Running the same code again uses the newly cached histograms, so it is quicker.

[6]:
start = time.time()
data.save_histograms(output_file)
print('time taken', time.time() - start)
time taken 0.010731697082519531

Clearing the filters also clears the cache, so the next call to save_histograms has to calculate the histograms again. The unfiltered histograms from the start of the tutorial are not reused, because the cache only stores the result of the most recent calculation, not the full history.

[7]:
data.clear_filters()
Resolution: 0.016 μs
[8]:
start = time.time()
data.save_histograms(output_file)
print('time taken', time.time() - start)
time taken 0.01096200942993164
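
The difference between keeping only the most recent calculation and keeping a full history can be sketched with two caches keyed on the current filter settings. The single-slot version, which has the same effect as the behaviour described above, must recalculate the unfiltered histograms after the filters are cleared, whereas a history cache would find them again at the cost of extra memory. Both classes and the `run` helper are hypothetical illustrations; the tutorial does not describe how MuonDataLib stores its cache internally.

```python
class SingleSlotCache:
    """Keeps only the most recent result (same effect as the tutorial's cache)."""

    def __init__(self):
        self._key = object()   # sentinel that never equals a real key
        self._value = None

    def get(self, key, calculate):
        if key != self._key:
            self._value = calculate()
            self._key = key
        return self._value


class HistoryCache:
    """Keeps every result it has seen, keyed by the filter settings."""

    def __init__(self):
        self._store = {}

    def get(self, key, calculate):
        if key not in self._store:
            self._store[key] = calculate()
        return self._store[key]


def run(cache):
    calls = []

    def calculate():
        # Stand-in for the expensive histogram calculation.
        calls.append(1)
        return 'histograms'

    # Unfiltered, then filtered (0 s to 1 s removed), then filters cleared again.
    for filters in [(), (('any', 0, 1),), ()]:
        cache.get(filters, calculate)
    return len(calls)


print(run(SingleSlotCache()))   # 3: the unfiltered result was overwritten
print(run(HistoryCache()))      # 2: the unfiltered result is found again
```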
[9]:
# Delete the output file created by this tutorial
os.remove(output_file)