{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "1a7118fc-0e6c-4376-9888-f526f1838b42",
   "metadata": {},
   "source": [
    "# MuonDataLib Tutorial 3: Sample Log Filtering\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e4706a89-6c64-4125-948b-00dd82de4e14",
   "metadata": {},
   "source": [
    "One of the key advantages of event data is the option to create histograms from a subset of the information. Typically this is done based on the values of sample logs. \n",
    "\n",
    "## Sample logs\n",
    "\n",
    "The sample logs will automatically be loaded if they are present in the event nexus file. However, if they are missing it is possible to add them manually. \n",
    "In this example we will load some data, then create a pair of sample logs:\n",
    "\n",
    "- An oscillation in the temperature\n",
    "- A linear field\n",
    "\n",
    "The first step is to create the simulated log values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "816e1ddd-6261-425c-a574-25470748a29a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def linear(x, m, c):\n",
    "    return m*x + c\n",
    "\n",
    "def osc(x, amp, omega, phi):\n",
    "    return amp*np.sin(omega*x + phi) + amp*1.1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29ac0892-633d-4a8d-ac85-b8fcce76ad4e",
   "metadata": {},
   "source": [
    "Next we will need to load the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9279a747-96b0-4a72-a466-ef32480709a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "from MuonDataLib.data.loader.load_events import load_events\n",
    "from MuonDataLib.plot.basic import Figure\n",
    "import os\n",
    "\n",
    "file_name = 'HIFI00195790.nxs'\n",
    "input_file = os.path.join('..', '..', '..', '..', 'test', 'data_files', file_name)\n",
    "data = load_events(input_file, 64)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "48f08ba5-bf7b-464f-b4ee-e18dd5ffb84c",
   "metadata": {},
   "source": [
    "To create simulated data that matches the actual data, we need to know when each frame of data starts. The `get_frame_start_times` method provides a list of the start times in seconds."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8ef10f7c-64bc-414f-ab52-42181aa3d200",
   "metadata": {},
   "outputs": [],
   "source": [
    "frame_start_times = data.get_frame_start_times()\n",
    "print(frame_start_times)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7bc3fc31-1f12-4458-89a1-dd6e7e09b23e",
   "metadata": {},
   "source": [
    "From the frame start times we can see that the first frame is from about $0.994$ until $1.014$ seconds. We can also see that the last frame starts at about $3.174$ seconds. Now we will create the simulated sample logs (includes random noise), which includes $50$ measurments across the whole collection period."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f5d49448-c758-4a13-9c21-a82f214ee4fe",
   "metadata": {},
   "outputs": [],
   "source": [
    "from MuonDataLib.data.utils import create_data_from_function\n",
    "\n",
    "start = frame_start_times[0]\n",
    "end = frame_start_times[-1]+1\n",
    "step = (frame_start_times[-1]-frame_start_times[0])/40\n",
    "\n",
    "x, y = create_data_from_function(start, end, step, [3, 6.1, 0.91], osc, seed=1)\n",
    "data.add_sample_log(\"Sample Temp\", x, y)\n",
    "fig = Figure(y_label='Temperature (Kelvin)', x_label='Time (seconds)')\n",
    "fig.plot(x, y, 'Temp data')\n",
    "fig.show()\n",
    "\n",
    "x, y = create_data_from_function(start, end, step, [3.1, 0.1], linear, seed=1)\n",
    "data.add_sample_log(\"field\", x, y)\n",
    "fig2 = Figure(y_label='Field (MHz)', x_label='Time (seconds)')\n",
    "fig2.plot(x, y, 'Field data')\n",
    "fig2.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f3657b90-f2b6-49f1-ad3c-c29fe336340f",
   "metadata": {},
   "source": [
    "The sample logs were added by the `data.add_sample_log` command. The arguments are the name of the sample log, the x (time in seconds) values and the y values. Now we have the sample log data we can look at the different types of filters. The two plots above will be helpful for creating sensible filters. Now lets look at some unfiltered histogram data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2e9164ba-46ab-4277-8189-8e69b606dadc",
   "metadata": {},
   "outputs": [],
   "source": [
    "no_filter_hist, bins = data.histogram()\n",
    "fig = Figure(y_label='Counts')\n",
    "fig.plot_from_histogram(bins, no_filter_hist, [0])\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "34e7a894-447e-4839-81e8-65113dd5deed",
   "metadata": {},
   "source": [
    "## Sample log filters - Keeping data above a threshold value\n",
    "\n",
    "The first type of filter we will look at is one that is directly applied based on a sample log value. To start lets remove all of the data corresponding to field value of less than six. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c1ba64a6-bade-4df0-b329-9611059569ed",
   "metadata": {},
   "outputs": [],
   "source": [
    "data.keep_data_sample_log_above('field', 6.)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b61f30dd-140f-4f7f-9471-f86dcaca8af6",
   "metadata": {},
   "source": [
    "The `keep_data_sample_log_above` command is used to add a filter, the first argument is the sample log name we want to apply the filter to and the second argument is the minimum value we want to keep. Lets compare the resultant histogram with the unfiltered data. \n"
   ]
  },
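  {
   "cell_type": "markdown",
   "id": "b7d2c4e1-5f3a-4c8e-9a61-2e0f7d1c3b58",
   "metadata": {},
   "source": [
    "To make the idea concrete, the following is a minimal NumPy sketch of how a threshold on a sample log could be turned into a per-frame selection. It is an illustration only and is not MuonDataLib's implementation: the arrays `log_times`, `log_values` and `frame_starts` are hypothetical stand-ins, and the use of linear interpolation between log measurements is an assumption.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Hypothetical stand-in values (not read from the file)\n",
    "log_times = np.linspace(1.0, 4.0, 7)      # times of the log measurements in seconds\n",
    "log_values = 3.1*log_times + 0.1          # same linear form as the 'field' log, without noise\n",
    "frame_starts = np.arange(1.0, 3.2, 0.02)  # one frame every 20 ms\n",
    "\n",
    "# Estimate the log value at each frame start by linear interpolation (an assumption)\n",
    "value_at_frame = np.interp(frame_starts, log_times, log_values)\n",
    "\n",
    "# Keep only the frames whose estimated value is at or above the threshold\n",
    "threshold = 6.0\n",
    "keep = value_at_frame >= threshold\n",
    "print(f'{keep.sum()} of {keep.size} frames kept')\n",
    "```\n",
    "\n",
    "Because this log increases steadily with time, the frames that are kept form a single block at the end of the run."
   ]
  },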
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3e1cc928-47c1-4b36-848b-f6249b7f08de",
   "metadata": {},
   "outputs": [],
   "source": [
    "hist_above_6, bins = data.histogram()\n",
    "fig = Figure(y_label='Counts')\n",
    "fig.plot_from_histogram(bins, no_filter_hist, [0], 'unfiltered, ')\n",
    "fig.plot_from_histogram(bins, hist_above_6, [0], 'field >=6, ')\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bb52e27e-c50f-4342-86fd-2709a69a150b",
   "metadata": {},
   "source": [
    "We can see that the filter has removed some counts, as expected. However, we may want to check that the filter has behaved as expected. To plot the original and filtered sample log data;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ad24827c-78ca-4eca-9eae-eca7a04933cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = Figure(y_label='Field (MHz)', x_label='Time (seconds)')\n",
    "fig.plot_sample_log(data, 'field')\n",
    "fig.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ee7a554-b929-44ff-a753-fb7b2bd3eb92",
   "metadata": {},
   "source": [
    "We can also check which data has been removed from the other sample log (Sample Temp),"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a6ba6a85-a4ee-405a-a64b-d22a9a97d436",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = Figure(y_label='Temperature (Kelvin)', x_label='Time (seconds)')\n",
    "fig.plot_sample_log(data, 'Sample Temp')\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1e50ad68-44bd-4227-9da3-de90329a4706",
   "metadata": {},
   "source": [
    "If we add the wrong filter (e.g. typo), we can remove it with the command"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "53b163b1-2ea6-4085-a22c-26a24cca94b0",
   "metadata": {},
   "outputs": [],
   "source": [
    "data.delete_sample_log_filter('field')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a0d23400-ba60-4477-8f84-2ad1bc3d3ea6",
   "metadata": {},
   "source": [
    "where the argument is the name of the filter to be deleted. To check that it worked, lets plot the histogram (generating/saving a histogram will update all of the data). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d0eb4a8c-b721-48f1-b97b-12adcecf4870",
   "metadata": {},
   "outputs": [],
   "source": [
    "hist_check, bins = data.histogram()\n",
    "fig = Figure(y_label='Counts')\n",
    "fig.plot_from_histogram(bins, no_filter_hist, [0], 'unfiltered, ')\n",
    "fig.plot_from_histogram(bins, hist_check, [0], 'Check filter removed, ')\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "raw",
   "id": "68819ecc-6c02-4cbc-89bf-3a82559d1a80",
   "metadata": {},
   "source": [
    "As you can see from the plot, the two histograms match. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc61b639-09c5-435e-aab3-5c9beff539d4",
   "metadata": {},
   "source": [
    "## Sample log filters - Keeping data below a threshold value\n",
    "\n",
    "The second type of filter we will look at removes all of the data above a sepcific value. For this example lets remove all of the data with a field value of greater than 10."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c39ecc1",
   "metadata": {},
   "outputs": [],
   "source": [
    "data.keep_data_sample_log_below('field', 10.)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a60951df",
   "metadata": {},
   "source": [
    "The `keep_data_sample_log_below` command is used to add a filter, the first argument is the sample log name we want to apply the filter to and the second argument is the maximum value we want to keep. Lets compare the resultant histogram with the unfiltered data. \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "13a6ae55",
   "metadata": {},
   "outputs": [],
   "source": [
    "hist_below_10, bins = data.histogram()\n",
    "fig = Figure(y_label='Counts')\n",
    "fig.plot_from_histogram(bins, no_filter_hist, [0], 'unfiltered, ')\n",
    "fig.plot_from_histogram(bins, hist_above_6, [0], 'field <= 10, ')\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69374769",
   "metadata": {},
   "source": [
    "Next lets plot the filtered sample log data;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f5148d44",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = Figure(y_label='Field (MHz)', x_label='Time (seconds)')\n",
    "fig.plot_sample_log(data, 'field')\n",
    "fig.show()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3234011f",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = Figure(y_label='Temperature (Kelvin)', x_label='Time (seconds)')\n",
    "fig.plot_sample_log(data, 'Sample Temp')\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ddc0ce4",
   "metadata": {},
   "source": [
    "Lets remove the filter,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d64683f2",
   "metadata": {},
   "outputs": [],
   "source": [
    "data.delete_sample_log_filter('field')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a68c0e17",
   "metadata": {},
   "source": [
    "## Sample log filters - Keeping data within a range of values\n",
    "\n",
    "The last type of sample log filter we will look at keeps data between two values. At present it is only possible to add one of these band filters per sample log. Lets start by adding the filter to the `Temp` sample log"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bf55b074",
   "metadata": {},
   "outputs": [],
   "source": [
    "data.keep_data_sample_log_between('Sample Temp', 2, 5.5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3bc5bdb1",
   "metadata": {},
   "source": [
    "The `keep_data_sample_log_between` command is used to add a filter that keeps the data between the two values. The first argument is the sample log name we want to apply the filter to, the second argument is the minimum value we want to keep and the final argument is the maximum value we want to keep. Lets compare the resultant histogram with the unfiltered data. \n"
   ]
  },
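  {
   "cell_type": "markdown",
   "id": "4e9a1f73-2b6c-4d05-8c3e-71a5d0b9f264",
   "metadata": {},
   "source": [
    "As a rough picture of what a band filter selects, the following NumPy sketch builds a per-frame mask for a band of values. It is a hedged illustration, not MuonDataLib's implementation: `frame_starts` and `temp_at_frame` are hypothetical stand-ins, and treating both band edges as inclusive is an assumption.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Hypothetical stand-in values (not read from the file)\n",
    "frame_starts = np.arange(1.0, 3.2, 0.02)  # one frame every 20 ms\n",
    "# same functional form as the simulated temperature log, without noise\n",
    "temp_at_frame = 3.0*np.sin(6.1*frame_starts + 0.91) + 3.3\n",
    "\n",
    "# Keep a frame only when its value lies inside the band (edges assumed inclusive)\n",
    "low, high = 2.0, 5.5\n",
    "keep = (temp_at_frame >= low) & (temp_at_frame <= high)\n",
    "\n",
    "# Each change in the mask marks the edge of a kept or removed time window\n",
    "n_edges = np.count_nonzero(np.diff(keep.astype(int)))\n",
    "print(f'{keep.sum()} of {keep.size} frames kept, {n_edges} window edges')\n",
    "```\n",
    "\n",
    "An oscillating log crosses the band edges several times, so the kept frames fall into several separate time windows rather than one block."
   ]
  },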
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18b6a99d",
   "metadata": {},
   "outputs": [],
   "source": [
    "hist_band, bins = data.histogram()\n",
    "fig = Figure(y_label='Counts')\n",
    "fig.plot_from_histogram(bins, no_filter_hist, [0], 'unfiltered, ')\n",
    "fig.plot_from_histogram(bins, hist_band, [0], 'Temp band, ')\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "45c35b13",
   "metadata": {},
   "source": [
    "Next lets plot the filtered sample log data;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3eda75a8",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = Figure(y_label='Temperature (Kelvin)', x_label='Time (seconds)')\n",
    "fig.plot_sample_log(data, 'Sample Temp')\n",
    "fig.show()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "168a19e4",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = Figure(y_label='Field (MHz)', x_label='Time (seconds)')\n",
    "fig.plot_sample_log(data, 'field')\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ea8334b2",
   "metadata": {},
   "source": [
    "Lets remove the filter,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "36e6edc0",
   "metadata": {},
   "outputs": [],
   "source": [
    "data.delete_sample_log_filter('Sample Temp')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}