{ "cells": [ { "cell_type": "markdown", "id": "b4080e41-77db-4708-98f2-b870fe823778", "metadata": {}, "source": [ "# MuonDataLib Tutorial 7: The Cache\n", "\n", "\n", "In this tutorial we will discuss the cache, this is used to minimize the amount of time spent doing calculations. When the code creates a histogram it will save a copy to the cache, so if you ask for it again later the calculation can be skipped. If the filters or resolution of the histograms change then the cache is cleared and the calculation will happen the next time the histograms are requested. \n", "\n", "It is important that this tutorial is done in order so that the cache can be correctly demonstrated. In these examples we use the `save_histograms` command, but the same applies when using the `histogram` command.\n", "\n", "In this example the file is relatively small, so the performance improvement from using the cache is not obvious. Running this locally with a large file will show the differences more clearly. " ] }, { "cell_type": "code", "execution_count": null, "id": "f1d22833-90fa-4ba0-820b-89dbf0184907", "metadata": {}, "outputs": [], "source": [ "from MuonDataLib.data.loader.load_events import load_events\n", "import time\n", "import os\n", "\n", "file_name = 'HIFI00195790.nxs'\n", "input_file = os.path.join('..', '..', '..', '..', 'test', 'data_files', file_name)\n", "\n", "output_file = os.path.join('..', 'Output_files', file_name)\n", "data = load_events(input_file, 64)" ] }, { "cell_type": "markdown", "id": "fbd38d4b-279f-4499-98be-9eed1efe1859", "metadata": {}, "source": [ "The `time` package is being used to record the time taken to save the histogram nexus file. It should take a fraction of a second." ] }, { "cell_type": "code", "execution_count": null, "id": "7d9ecd31-9ea4-4a42-97cb-5b579a56ce6a", "metadata": {}, "outputs": [], "source": [ "start = time.time()\n", "data.save_histograms(output_file)\n", "print('time taken', time.time() - start)\n", " " ] }, { "cell_type": "markdown", "id": "6b6b1f3f-93fb-4c11-8851-f447a22fd287", "metadata": {}, "source": [ "Next we will repeat the above code again and see that the time to write the histogram nexus file is less than a second. This is because the cache has used its stored values instead of repeating the calculation. " ] }, { "cell_type": "code", "execution_count": null, "id": "101e852a-b41e-4ce2-bff7-c631135ec604", "metadata": {}, "outputs": [], "source": [ "start = time.time()\n", "data.save_histograms(output_file)\n", "print('time taken', time.time() - start)" ] }, { "cell_type": "markdown", "id": "f6d590e0-789f-4199-9657-560f35eed425", "metadata": {}, "source": [ "Adding a filter will reset the cache, so the next time the histograms are requested it will recalculate them. This is because the filter will change the events in the histogram." ] }, { "cell_type": "code", "execution_count": null, "id": "9775419f-e39d-41ee-8918-a6fb850e58de", "metadata": {}, "outputs": [], "source": [ "data.remove_data_time_between('any', 0, 1)" ] }, { "cell_type": "markdown", "id": "224324b4-d32e-4306-950a-d9b2d32f3fe3", "metadata": {}, "source": [ "Repeating the histogram calculation will be slower than before. " ] }, { "cell_type": "code", "execution_count": null, "id": "28b9d005-d28c-42ac-93e4-e99398f8e36c", "metadata": {}, "outputs": [], "source": [ "start = time.time()\n", "data.save_histograms(output_file)\n", "print('time taken', time.time() - start)" ] }, { "cell_type": "markdown", "id": "851aa422-388b-400f-ad7c-26d52f2c8d4d", "metadata": {}, "source": [ "However, repeating the above code will still use the cache and as a result is much quicker." ] }, { "cell_type": "code", "execution_count": null, "id": "11023484-4a59-4036-bad9-d887d95784c2", "metadata": {}, "outputs": [], "source": [ "start = time.time()\n", "data.save_histograms(output_file)\n", "print('time taken', time.time() - start)" ] }, { "cell_type": "markdown", "id": "add3d101-21fd-4d52-b4d4-44fae714e652", "metadata": {}, "source": [ "Clearing the filters will also clear the cache. So the next `save_histograms` will need to calculate them again. This is because the cache only saves the data for the previous calculation and not the full history. " ] }, { "cell_type": "code", "execution_count": null, "id": "d78a52c2-2ab1-4a43-b267-995e4d83d567", "metadata": {}, "outputs": [], "source": [ "data.clear_filters()" ] }, { "cell_type": "code", "execution_count": null, "id": "03b36453-29cc-4767-b71c-0da984e1842c", "metadata": {}, "outputs": [], "source": [ "start = time.time()\n", "data.save_histograms(output_file)\n", "print('time taken', time.time() - start)" ] }, { "cell_type": "code", "execution_count": null, "id": "b44cdc9c-6438-47ce-85e3-b7a495d6750e", "metadata": {}, "outputs": [], "source": [ "os.remove(output_file)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.5" } }, "nbformat": 4, "nbformat_minor": 5 }