{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n'Difficult' example\n^^^^^^^^^^^^^^^^^^^\nThe <insertname> package was mainly designed keeping horseshoe bat calls \nin mind. These calls are high-frequency (>50kHz) and short (20-50ms) sounds\nwhich are quite unique in their structure. Many of the default parameter\nvalues reflect the original dataset. In fact, many of the default parameters\ndon't even work for some of the example datasets themselves!\nIt should be no surprise that unpredictable things happen when segmentation\nand tracking is run with default values. \n\nThis example will guide you through understanding the various parameters\nthat can be tweaked and what effect they actually have. It is not \nan exhaustive treatment of the implementation, but a 'lite' intro. For more\ndetails of course, the original documentation should hopefully be helpful. \n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from matplotlib.lines import Line2D\nimport matplotlib.pyplot as plt\nimport itsfm \nfrom itsfm.data import example_calls, all_wav_files\n\n# a chosen set of tricky calls to illustrate various points\n\ntricky_rec = list(map( lambda X: '2018-08-17_23_115' in X, all_wav_files))\nindex = tricky_rec.index(True)\naudio, fs = example_calls[index] # load the relevant example audio"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Step 1: the right `signal_level`\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\nIn the given audio segment, the first step is to identify what is \nbackground and what is signal. The signal of interest is identified as\nbeing above a particular dB rms, as calculated y a moving dB rms window\nof a user-defined `window_size`. \n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "If we want high temporal resolution to segment out the call, we need a short\n`window_size`. Let's try out  0.5 and 2ms for now. \n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "halfms_windowsize = int(fs*0.5*10**-3)\ntwoms_windowsize = halfms_windowsize*4\nplt.figure()\nax = plt.subplot(211)\nitsfm.plot_movingdbrms(audio, fs, window_size=halfms_windowsize)\nitsfm.plot_movingdbrms(audio, fs, window_size=twoms_windowsize)\n\nfirst_color = '#1f77b4'\nsecond_color = '#ff7f0e'\ncustom_lines = [Line2D([0],[0], color=first_color),\n                Line2D([1],[1],color=second_color),]\nax.legend(custom_lines, ['0.5ms', '2ms'])\nplt.ylabel('Moving dB rms')\nplt.subplot(212, sharex=ax)\n_ = itsfm.make_specgram(audio, fs);"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The fact that the 0.5ms moving rms profile is so 'rough' is already a bad\nsign. The signal of interest is any region/s which are above or equal to \nthe `signal_level`. When the moving rms fluctuates so wildly, the relevant\nsignal region may be hard to capture because it keeps going above and \nbelow the threshold - leading to many tiny '\u00edslands'. Let's choose the 2ms `window_size` \nbecause it doesn't fluctuate crazily and is also a relatively short time scale\nin comparison the the signal duration. -40 dB rms seems to be a sensible value \nwhen we compare the approximate start and end times of the signal with the dB rms profile. \n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "keywords = {'segment_method':'pwvd',\n            'signal_level':-40,\n            'window_size':twoms_windowsize}\n\noutputs = itsfm.segment_and_measure_call(audio, fs,**keywords)\noutput_inspector = itsfm.itsFMInspector(outputs, audio, fs)\n\noutput_inspector.visualise_geq_signallevel()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's check the output as it is right now\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "output_inspector.visualise_cffm_segmentation()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Inspect initial outputs\nThe CF-FM segmentation is *clearly* not correct. There's FM component recognised\nat all - how is this happening? The reason it's not happening is likely because\nthe :code:`fmrate` has been misspecified or the frequency profile wasn't\nestimated correctly. Let's view the frequency profile first. \n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "output_inspector.visualise_frequency_profiles()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The cleaned frequency profile seems to somehow 'ignore' the \ndownward FM sweep in the call. Why is this happening? The\n'flatness' in the cleaned frequency profile is likely \ncoming from the spike detection. Spikes in the \nfrequency profile are detected when the 'accelaration' of \n(the 2nd derivative) the frequency profile increases beyond\na threshold. Let's check out the accelaration profile \n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "output_inspector.visualise_accelaration()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The accelaration profile matches this suspicion. When a spikey \nregion is encountered in the frequency profile in the `pwvd`\nfrequency tracking - it backs up a bit and extrapolates the \nslope according to what's just behind the spikey region. \nThe 'length' of this backing up in seconds is decided by \nthe :code:`extrap_window`, which is short for extrapolation \nwindow. Let's reduce the :code:`extrap_window` and see if \nthe frequency is tracked better. \n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "keywords['extrap_window'] = 50*10**-6\noutputs_refined = itsfm.segment_and_measure_call(audio, fs,**keywords)\nout_refined_inspector = itsfm.itsFMInspector(outputs_refined, audio, fs)\nout_refined_inspector.visualise_frequency_profiles()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "So, we've managed to get a much better tracking by telling the \nalgorithm not to 'backup' too much to infer the trend\nthe frequency profile was heading in. It's not perfect, but\nit does recover the fact that there is an FM region. Remember this\nissue came up because of the weird reflection of the CF part\nthat is of comparable intensity as the actual FM part itself. \n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "How the CF-FM segmentation works\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\n\n![](../_static/fmrate_workflow.png)\n\n   :width: 66 %\n   :align: left\nCF-FM segmentation occurs through a multi step process. \nFirst the instantaneous frequency of the signal is estimated at a sample-level\nresolution, the raw frequency profile - `raw_fp`. Then the `raw_fp` is refined \nas it can be quite noisy because of well, noise, or abrupt changes in signal\nlevel across the sound. \n\nMinor jumps will be corrected to give rise to the \ncleaned frequency profile - `cleaned_fp`.  The `cleaned_fp` however, is \na very high-resolution look into the sound's frequency profile. Even though\nthe temporal resolution is high, the spectral resolution is limited by the \nsize of the `pwvd_window` (refer to the original docs here). This limited\nspectral resolution means each sample will not have a unique value. For instance\nif the frequency of sound is increasing linearly with time, the `cleaned_fp`\nmay actually look like steps going up. These 'steps' cause issues while calculating\nthe rate of frequency modulation - `fmrate`, and so , the `cleaned_fp` is \nactually downsampled and then upsampled by interpolation. This gives rise to \nthe fitted frequency profile - `fitted_fp`.\n\nThe `fitted_fp` captures the local trends and doesn't have the step like nature of\n`cleaned_fp`. If we were to actually measure frequency modulation from `cleaned_fp`\nthere'd be lots of 0 modulation regions and many very brief bursts of FM regions\nwherever a 'step' rose or dropped. Thanks to the sample-wise unique values in `fitted_fp` we can now \ncalculate the local variation in frequency modulation across the sound.\n\nLet's now check the frequency profiles once more\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "out_refined_inspector.visualise_frequency_profiles()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The raw and cleaned frequency profiles are very similar, though the 'cleanliness'\nin the `cleaned_fp` is visible especially because the frequency profile doesn'\nwildly jump around towards the end of the call. The `fitted_fp` also \nclosely matches the `cleaned_fp` though it seems to rise later and drop faster.\nThis is because of the downsampling that happens to estimate the `fmrate`. \nThe rise time is a direct indicator of the downsampling factor, which samples\nthe `cleaned_fp` at periodic intervals, and is thus called `sample_every`. The\n`sample_every` parameter defaults to 1% of the input signal duration. If the\nfrequency profiles broadly match the actual call as seen coarsely on a spectrogram.\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Step 2: Check the `fmrate` profile\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\nCF and FM parts of a call are segmented based on the rate of frequency modulation\nthey show. The `fmrate` is a np.array with the estimated frequency modulation\nrate in **kHz/ms**. Yes, pay attention to the units, *it's not kHz/s, but kHz/ms*!\nLet's take a look at the FM rate profile for this sound. \n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "out_refined_inspector.visualise_fmrate()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's compare this fmrate profile with the final CF-FM segmentation. \n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "out_refined_inspector.visualise_cffm_segmentation()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Something's odd -- even though the FM rate seems to be close to zero\nnear the actual FM parts, parts of it are still being classified as FM!! What's happening. \nLet's take a closer look at the FM rate profile, but zoom in so the y-axis is\nmore limited. Let's also overlay the CF-FM segmentation results \nover this. \n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "seg_out, call_parts, msmts = outputs_refined \ncf, fm, info = seg_out \n\nw,s,a = out_refined_inspector.visualise_fmrate()\nw.set_ylim(0,5)\nt_min, t_max = 0.01, 0.02\nw.set_xlim(t_min, t_max)\ns.set_xlim(t_min, t_max)\na.set_xlim(t_min, t_max)\nw.plot()\nitsfm.make_waveform(cf*4,fs)\nitsfm.make_waveform(fm*4,fs)\nplt.tight_layout()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "From this you can clearly see that the FM part correspond to tiny peaks in \nthe `fmrate` which reach around 1 kHz/ms. It may of course be no surprise\nonce you know the default `fmrate_threshold` is 1 kHz/ms. This rate doesn'\nmake sense for bat call FM portions as they have much high frequency modulation\nrates. The easy way to estimate the relevant `fmrate_threshold` is to eyeball \nthe start and end frequencies of a call part and calculate the fm rate!\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Step 3: Set a relevant `fmrate_threshold`\n>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\nFor this example call any FM rate above 0.5kHz/ms will allow a sensible segmentation of the CF and FM\nparts. Lets set it more conservatively at 2kHz/ms, this will reduce false \npositives. In general, for this particular call type, the FM sweep has an approximate\nrate of 5-6kHz/ms, and so we should definitely be able to pick up the FM region \nwith a threshold of 2kHz/ms. \n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# add an additional keyword argument\nkeywords['fmrate_threshold'] = 2.0 # kHz/ms\n\noutput_newfmr = itsfm.segment_and_measure_call(audio, fs,**keywords)\n\nout_newfmr_insp = itsfm.itsFMInspector(output_newfmr, audio, fs)\nout_newfmr_insp.visualise_cffm_segmentation()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Summary\n~~~~~~~\nThis tutorial exposed some of the messy details behind the \nPWVD frequency tracking. In most cases, I hope you won't need to \nthink so much about the parameter choices. However, some basic\nplaying around will definitely be necessary each time you're handling\na new type of sound or recording type. Hopefully, this has either allowed \nyou to get a glimpse into the system. Do let me know if \nthere's something (or everythin) is confusing, and not clear!\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}