API : Segmenting sounds into CF and FM

Module that segments the horseshoe bat call into FM and CF parts. The primary logic of this

itsfm.segment.segment_call_into_cf_fm(call, fs, **kwargs)[source]

Identifies CF and FM regions in a sound using the following process:

1. Candidate regions of CF and FM are first produced based on the segmentation method chosen.

2. These candidate regions are then refined based on the user’s requirements (minimum length of region, maximum number of CF/FM regions in the sound).

3. The finalised CF and FM regions are output as Boolean arrays.

Parameters
  • call (np.array) – Audio with horseshoe bat call

  • fs (float>0) – Frequency of sampling in Hz.

  • segment_method (str, optional) – One of [‘peak_percentage’, ‘pwvd’, ‘inst_freq’]. Check ‘See also’ for more information. Defaults to ‘peak_percentage’.

  • refinement_method (function, str, optional) –

    The method used to refine the initial CF and FM candidate regions according to the different constraints and rules set by the user.

    Defaults to ‘do_nothing’

Returns

  • cf_samples, fm_samples (np.array) – Boolean numpy arrays showing which samples belong to the CF and the FM respectively.

  • info (dictionary) – Post-processing information depending on the methods used.

Example

Create a chirp in the middle of a somewhat silent recording

>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> from itsfm.simulate_calls import make_fm_chirp, make_tone
>>> from itsfm.view_horseshoebat_call import plot_movingdbrms
>>> from itsfm.view_horseshoebat_call import visualise_call, make_x_time
>>> from itsfm.view_horseshoebat_call import plot_cffm_segmentation
>>> fs = 44100
>>> start_f, end_f = 1000, 10000
>>> chirp = make_fm_chirp(start_f, end_f, 0.01, fs)
>>> tone_freq = 11000
>>> tone = make_tone(tone_freq, 0.01, fs)
>>> tone_start = 30000; tone_end = tone_start+tone.size
>>> rec = np.random.normal(0,10**(-50/20), 44100)
>>> chirp_start, chirp_end = 10000, 10000 + chirp.size
>>> rec[chirp_start:chirp_end] += chirp
>>> rec[tone_start:tone_end] += tone
>>> rec /= np.max(abs(rec))
>>> actual_fp = np.zeros(rec.size)
>>> actual_fp[chirp_start:chirp_end] = np.linspace(start_f, end_f, chirp.size)
>>> actual_fp[tone_start:tone_end] = np.tile(tone_freq, tone.size)

Track the frequency of the recording and segment it according to frequency modulation

>>> from itsfm.segment import segment_call_into_cf_fm
>>> cf, fm, info = segment_call_into_cf_fm(rec, fs, signal_level=-10,
...                                        segment_method='pwvd')

View the output and plot the segmentation results over it:

>>> plot_cffm_segmentation(cf, fm, rec, fs)

See also

segment_by_peak_percentage(), segment_by_pwvd(), segment_by_inst_frequency(), itsfm.refine_cfm_regions(), refine_cf_fm_candidates()

Notes

The post-processing information in the object info depends on the method used.

peak_percentage : the two keys ‘fm_re_cf’ and ‘cf_re_fm’, which are the relative dBrms profiles of FM with relation to the CF portion and vice versa.

pwvd :

itsfm.segment.refine_cf_fm_candidates(refinement_method, cf_fm_candidates, fs, info, **kwargs)[source]

Parses the refinement method, checks whether it is a string or a function, and calls the relevant object.

Parameters
  • refinement_method (str/function) – A string from the list of inbuilt functions in the module refine_cfm_regions or a user-defined function. Defaults to do_nothing, an inbuilt function which returns the candidate CF-FM regions without alteration.

  • cf_fm_candidates (list with 2 np.arrays) – Both np.arrays need to be Boolean and of the same size as the original audio.

  • fs (float>0) –

  • info (dictionary) –

Returns

cf, fm – Boolean arrays where True indicates the sample is of the corresponding region.

Return type

np.array
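
The string-or-function handling described for refine_cf_fm_candidates might look roughly like the sketch below. The registry `INBUILT_REFINERS`, the helper `dispatch_refinement`, and the toy `do_nothing` and `swap_labels` functions are hypothetical stand-ins written for illustration; they are not the module's actual code.

```python
import numpy as np

def do_nothing(cf_fm_candidates, fs, info, **kwargs):
    """Toy stand-in: return the candidate regions without alteration."""
    cf, fm = cf_fm_candidates
    return cf, fm

# Hypothetical registry mapping method names to functions.
INBUILT_REFINERS = {'do_nothing': do_nothing}

def dispatch_refinement(refinement_method, cf_fm_candidates, fs, info, **kwargs):
    """Accept either a name from the registry or a user-defined callable."""
    if callable(refinement_method):
        refiner = refinement_method
    elif isinstance(refinement_method, str):
        if refinement_method not in INBUILT_REFINERS:
            raise ValueError(f"Unknown refinement method: {refinement_method!r}")
        refiner = INBUILT_REFINERS[refinement_method]
    else:
        raise TypeError('refinement_method must be a string or a function')
    return refiner(cf_fm_candidates, fs, info, **kwargs)

candidates = [np.array([True, True, False, False]),
              np.array([False, False, True, True])]

# 1) refinement chosen by name
cf, fm = dispatch_refinement('do_nothing', candidates, fs=44100, info={})

# 2) refinement supplied as a user-defined function (purely illustrative)
def swap_labels(cf_fm_candidates, fs, info, **kwargs):
    cf, fm = cf_fm_candidates
    return fm, cf

cf_swapped, fm_swapped = dispatch_refinement(swap_labels, candidates,
                                             fs=44100, info={})
```

Passing a callable keeps the calling code unchanged when a custom rule is needed, as long as the function accepts the same arguments and returns two Boolean arrays.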

itsfm.segment.segment_by_peak_percentage(call, fs, **kwargs)[source]

This is ideal for calls with one clear CF section with the CF portion being the highest frequency in the call: bat/bird CF-FM calls which have one CF and one/two sweep sections.

Calculates the peak frequency of the whole call and performs low+high pass filtering at a frequency slightly lower than the peak frequency.

Parameters
  • call (np.array) –

  • fs (float>0) –

  • peak_percentage (0<float<1, optional) – This is the fraction of the peak at which low and high-pass filtering happens. Defaults to 0.98.

Returns

  • cf_samples, fm_samples (np.array) – Boolean array with True indicating that sample has been categorised as being CF and/or FM.

  • info (dictionary) – With keys ‘fm_re_cf’ and ‘cf_re_fm’ indicating the relative dBrms profiles of the candidate FM regions relative to Cf and vice versa.

Notes

This method is unsuited for audio with non-uniform call envelopes. When there is high variation over the call envelope, the peak frequency is likely to be miscalculated, and thus lead to wrong segmentation.

This method is somewhat inspired by the protocol in Schoeppler et al. 2018. However, it differs in the important aspect of being done entirely in the time domain. Schoeppler et al. 2018 use a spectrogram based method to segment the CF and FM segments of H. armiger calls.

References

[1] Schoeppler, D., Schnitzler, H. U., & Denzinger, A. (2018). Precise Doppler shift compensation in the hipposiderid bat, Hipposideros armiger. Scientific Reports, 8(1), 1-11.
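
As a rough illustration of how the filtering frequency follows from the peak frequency, the sketch below finds the peak of a synthetic call and scales it by peak_percentage. It assumes a plain FFT magnitude spectrum for the peak estimate, which may differ from what the module does internally; `peak_frequency` and the synthetic call are made up for this example.

```python
import numpy as np

def peak_frequency(audio, fs):
    """Frequency (Hz) of the strongest component in the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(audio))
    freqs = np.fft.rfftfreq(audio.size, d=1.0/fs)
    return freqs[np.argmax(spectrum)]

fs = 44100
t = np.arange(0, 0.02, 1.0/fs)

# A loud 9 kHz 'CF' tone followed by a quieter 9 -> 5 kHz linear sweep.
cf_part = np.sin(2*np.pi*9000*t)
sweep_phase = 2*np.pi*(9000*t - 0.5*(4000/0.02)*t**2)
fm_part = 0.5*np.sin(sweep_phase)
call = np.concatenate((cf_part, fm_part))

peak_percentage = 0.98                       # the documented default
peak_f = peak_frequency(call, fs)            # close to the 9 kHz CF tone
threshold_frequency = peak_percentage*peak_f # where low/high-pass filtering happens
```

If the sweep were louder than the tone, the spectral peak could land inside the FM part instead, which is one way the envelope caveat in the Notes can show up.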

itsfm.segment.segment_by_pwvd(call, fs, **kwargs)[source]

This method is technically more accurate in segmenting CF and FM portions of a sound. The Pseudo-Wigner-Ville Distribution of the input signal is generated.

Parameters
  • call (np.array) –

  • fs (float>0) –

  • fmrate_threshold (float >=0) – The threshold rate of frequency modulation in kHz/ms. Beyond this value a segment of audio is considered a frequency modulated region. Defaults to 1.0 kHz/ms

Returns

  • cf_samples, fm_samples (np.array) – Boolean array of same size as call indicating candidate CF and FM regions.

  • info (dictionary) – See get_pwvd_frequency_profile for the keys it outputs in the info dictionary. In addition, another key ‘fmrate’ is also calculated, which holds an np.array with the rate of frequency modulation across the signal in kHz/ms.

Notes

This method may take some time to run, as it is computationally intensive. It may not work very well in the presence of multiple harmonics or noise. Some basic tweaking of the optional parameters may be required.

See also

get_pwvd_frequency_profile()

Example

Let’s create a two component call with a CF and an FM part in it

>>> import numpy as np
>>> from itsfm.segment import segment_by_pwvd
>>> from itsfm.simulate_calls import make_tone, make_fm_chirp, silence
>>> from itsfm.view_horseshoebat_call import plot_cffm_segmentation
>>> from itsfm.view_horseshoebat_call import make_x_time
>>> fs = 22100
>>> tone = make_tone(5000, 0.01, fs)
>>> sweep = make_fm_chirp(1000, 6000, 0.005, fs)
>>> gap = silence(0.005, fs)
>>> full_call = np.concatenate((tone, gap, sweep))
>>> # reduce rms calculation window size because of low sampling rate!
>>> cf, fm, info = segment_by_pwvd(full_call,
...                                fs,
...                                window_size=10,
...                                signal_level=-12,
...                                sample_every=1*10**-3,
...                                extrap_length=0.1*10**-3)
>>> w,s = plot_cffm_segmentation(cf, fm, full_call, fs)
>>> s.plot(make_x_time(cf,fs), info['fitted_fp'])
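
The role of fmrate_threshold can be illustrated without the PWVD itself. The sketch below starts from a made-up frequency profile (not one estimated from audio), converts its sample-wise slope to kHz/ms, and labels samples above the threshold as FM candidates. The sampling rate and the profile are assumptions chosen for illustration.

```python
import numpy as np

fs = 250000  # Hz, an assumed sampling rate

# Made-up frequency profile: 2 ms at a constant 80 kHz,
# then a 2 ms linear sweep down to 70 kHz (5 kHz/ms).
n_cf = int(0.002*fs)
n_fm = int(0.002*fs)
freq_profile = np.concatenate((np.tile(80000.0, n_cf),
                               np.linspace(80000.0, 70000.0, n_fm)))

# Hz per sample -> Hz/s -> kHz/ms (1 kHz/ms = 10**6 Hz/s)
fmrate = np.abs(np.gradient(freq_profile))*fs*1e-6

fmrate_threshold = 1.0  # kHz/ms, the documented default
fm_candidates = fmrate > fmrate_threshold
cf_candidates = ~fm_candidates
```

A real frequency profile is noisier than this one, which is why the module smooths the profile before the FM rate is calculated.
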
itsfm.segment.whole_audio_fmrate(whole_freq_profile, fs, **kwargs)[source]

When a recording has multiple components to it, there are silences in between. These silences/background noise portions are assigned a value of 0 Hz.

When a ‘whole audio’ fm rate is naively calculated by taking the diff of the whole frequency profile, there will be sudden jumps in the fm-rate at the transitions between the silent parts with 0 Hz and the sound segments with non-zero frequencies. Although these spikes are very short, their influence is spread by the median filtering applied downstream. This increases the number of false positive FM segments because of the apparently high fmrate.

To overcome the issues caused by the sudden zero to non-zero transitions in frequency values, this function handles each non-zero sound segment separately, and calculates the fmrate over each sound segment independently.

Parameters
  • whole_freq_profile (np.array) – Array with sample-level frequency values of the same size as the audio.

  • fs (float>0) –

Returns

  • fmrate (np.array) – The rate of frequency modulation in kHz/ms. Same size as whole_freq_profile. Regions in whole_freq_profile with 0 frequency are set to 0 kHz/ms.

  • fitted_frequency_profile (np.array) – The downsampled, smoothed version of whole_freq_profile, of the same size.

Attention

The fmrate must be processed further downstream! In the whole-audio fmrate array, all samples that were 0 frequency in the original whole_freq_profile are set to 0 kHz/ms!!!

Example

Let’s make a synthetic multi-component sound with 2 FMs and 1 CF component.

>>> import numpy as np
>>> from itsfm.segment import whole_audio_fmrate
>>> fs = 22100
>>> onems = int(0.001*fs)
>>> sweep1 = np.linspace(1000,2000,onems) # fmrate of 1kHz/ms
>>> tone = np.tile(3000, 2*onems) # CF part
>>> sweep2 = np.linspace(4000,10000,3*onems) # 2kHz/ms
>>> gap = np.zeros(10)
>>> freq_profile = np.concatenate((sweep1, gap, tone, gap, sweep2))
>>> fmrate, fit_freq_profile = whole_audio_fmrate(freq_profile, fs)
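
To see why the segments are handled separately, the sketch below compares a naive whole-array FM rate with one computed inside each non-zero stretch. It leaves out the downsampling, fitting and median filtering, and the helper names (`nonzero_segments`, `naive_fmrate`, `segmentwise_fmrate`) are hypothetical.

```python
import numpy as np

def nonzero_segments(freq_profile):
    """Return (start, stop) index pairs of contiguous non-zero stretches."""
    is_sound = np.concatenate(([False], freq_profile != 0, [False]))
    edges = np.flatnonzero(np.diff(is_sound.astype(int)))
    return list(zip(edges[::2], edges[1::2]))

def naive_fmrate(freq_profile, fs):
    """Sample-wise FM rate in kHz/ms over the whole array."""
    return np.abs(np.gradient(freq_profile))*fs*1e-6

def segmentwise_fmrate(freq_profile, fs):
    """FM rate in kHz/ms computed within each non-zero segment only."""
    fmrate = np.zeros(freq_profile.size)
    for start, stop in nonzero_segments(freq_profile):
        if stop - start > 1:  # np.gradient needs at least two samples
            fmrate[start:stop] = naive_fmrate(freq_profile[start:stop], fs)
    return fmrate

fs = 22100
onems = int(0.001*fs)
tone = np.tile(3000.0, 2*onems)   # a pure CF component
gap = np.zeros(10)                # silence, assigned 0 Hz
freq_profile = np.concatenate((tone, gap, tone))

spiky = naive_fmrate(freq_profile, fs)        # jumps at the 0 Hz transitions
clean = segmentwise_fmrate(freq_profile, fs)  # stays at 0 for pure tones
```

In this toy case the naive version reports a large FM rate at each silence boundary, although the recording contains only constant-frequency tones.
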
itsfm.segment.calculate_fm_rate(frequency_profile, fs, **kwargs)[source]

A frequency profile is generally oversampled. This means that there will be many repeated values and sometimes minor drops in frequency over time. This leads to a higher FM rate than is actually there when a sample-wise diff is performed.

This method downsamples the frequency profile, fits a polynomial to it and then gets the smoothened frequency profile with unique values.

The sample-level FM rate can now be calculated reliably.

Parameters
  • frequency_profile (np.array) – Array of same size as the original audio. Each sample has the estimated instantaneous frequency in Hz.

  • fs (float>0) – Sampling rate in Hz

  • medianfilter_length (float>0, optional) – The median filter kernel size which is used to filter out the noise in the frequency profile.

  • sample_every (float, optional) – For default see fit_polynomial_on_downsampled_version

Returns

fm_rate – Same size as frequency_profile. The rate of frequency modulation in kHz/ms

Return type

np.array

itsfm.segment.fit_polynomial_on_downsampled_version(frequency_profile, fs, **kwargs)[source]

Chooses a subset of all points in the input frequency_profile and fits a piecewise polynomial on it. The start and end of the frequency profile are not altered, and chosen as they are.

Parameters
  • frequency_profile (np.array) – The estimated instantaneous frequency in Hz at each sample.

  • fs (float>0) –

  • sample_every (float>0, optional) – The time gap between consecutive points. Defaults to a calculated value which corresponds to 1% of the frequency profile’s duration.

  • interpolation_kind (int, optional) – The polynomial order to use while fitting the points. Defaults to 1, which is a piecewise linear fit.

Returns

fitted – Same size as frequency_profile.

Return type

np.array
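
A minimal sketch of the downsample-then-fit idea for the piecewise linear case, assuming `np.interp` as a stand-in for whatever interpolation routine the module uses. The helper `piecewise_linear_fit` and the staircase test profile are made up for illustration.

```python
import numpy as np

def piecewise_linear_fit(frequency_profile, fs, sample_every):
    """Keep one point every `sample_every` seconds (plus both end points)
    and linearly interpolate between the kept points."""
    step = max(1, int(round(sample_every*fs)))
    kept = np.arange(0, frequency_profile.size, step)
    if kept[-1] != frequency_profile.size - 1:
        kept = np.append(kept, frequency_profile.size - 1)  # keep the end point
    all_samples = np.arange(frequency_profile.size)
    return np.interp(all_samples, kept, frequency_profile[kept])

fs = 250000
duration = 0.004
n = int(duration*fs)
true_profile = np.linspace(90000, 70000, n)

# Staircase version mimicking an oversampled profile with repeated values.
stepped_profile = np.round(true_profile/500.0)*500.0

sample_every = 0.01*duration  # 1% of the profile's duration
fitted = piecewise_linear_fit(stepped_profile, fs, sample_every)
```

The fitted profile has the same size as the input, keeps the first and last values unchanged, and replaces the runs of repeated values with a smooth slope.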

itsfm.segment.fraction_duration(input_array, fs, fraction)[source]

Calculates the duration that matches the required fraction of the input array’s duration.

The fraction must be 0 < fraction < 1

itsfm.segment.check_relevant_duration(duration, fs)[source]

Checks that the duration is more than the inter-sample duration.
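
The arithmetic behind these two helpers is small enough to sketch. The functions below are hypothetical re-implementations that take a sample count rather than an array and return a plain boolean; the real functions may signal a failed check differently.

```python
def toy_fraction_duration(num_samples, fs, fraction):
    """Duration (s) corresponding to `fraction` of a signal's duration."""
    if not 0 < fraction < 1:
        raise ValueError('fraction must satisfy 0 < fraction < 1')
    return fraction*num_samples/fs

def toy_is_relevant_duration(duration, fs):
    """True if the duration is longer than one inter-sample interval (1/fs)."""
    return duration > 1.0/fs

fs = 250000
num_samples = 5000  # a 20 ms signal at this sampling rate
one_percent = toy_fraction_duration(num_samples, fs, 0.01)  # about 0.2 ms

long_enough = toy_is_relevant_duration(one_percent, fs)  # 0.2 ms vs 4 microseconds
too_short = toy_is_relevant_duration(1e-6, fs)           # shorter than 1/fs
```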

itsfm.segment.refine_candidate_regions()[source]

Takes in candidate CF and FM regions and tries to satisfy the constraints set by the user.

itsfm.segment.check_segment_cf_and_fm(cf_samples, fm_samples, fs, **kwargs)[source]
itsfm.segment.get_cf_region(cf_samples, fs, **kwargs)[source]

TODO : generalise to multiple CF regions

Parameters
  • cf_samples (np.array) – Boolean with True indicating a Cf region.

  • fs (float) –

Returns

cf_region – The longest continuous stretch

Return type

np.array

itsfm.segment.get_fm_regions(fm_samples, fs, **kwargs)[source]

TODO : generalise to multiple FM regions

Parameters
  • fm_samples (np.array) – Boolean numpy array with candidate FM samples.

  • fs (float>0) –

  • min_fm_duration – Minimum FM duration expected in seconds. Any FM segment shorter than this duration is considered to be a bad read and discarded. Defaults to 0.5 milliseconds.

Returns

valid_fm – Boolean numpy array with the corrected fm samples.

Return type

np.array
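
Discarding candidate stretches that are shorter than a minimum duration could be sketched as below. `drop_short_regions` is a hypothetical helper, not the module's implementation, and the sampling rate is an assumption.

```python
import numpy as np

def drop_short_regions(samples, fs, min_duration):
    """Set to False every contiguous True stretch shorter than min_duration (s)."""
    padded = np.concatenate(([False], samples, [False])).astype(int)
    edges = np.flatnonzero(np.diff(padded))
    starts, stops = edges[::2], edges[1::2]
    valid = np.zeros(samples.size, dtype=bool)
    for start, stop in zip(starts, stops):
        if (stop - start)/fs >= min_duration:
            valid[start:stop] = True
    return valid

fs = 250000
fm_samples = np.zeros(1000, dtype=bool)
fm_samples[100:110] = True   # 0.04 ms: treated as a bad read
fm_samples[400:700] = True   # 1.2 ms: long enough to keep

valid_fm = drop_short_regions(fm_samples, fs, min_duration=0.5e-3)
```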

itsfm.segment.segment_call_from_background(audio, fs, **kwargs)[source]

Performs a wavelet transform to track the signal within the relevant portion of the bandwidth.

This method broadly works by summing up all the signal content above the `lowest_relevant_frequency` using a continuous wavelet transform.

If the call-background segmentation doesn’t work well it’s probably due to one of these things:

  1. Incorrect background_threshold : Play around with different background_threshold values.

  2. Incorrect lowest_relevant_frequency : If the lowest relevant frequency is set outside of the signal’s actual frequency range, then the segmentation will fail. Try lowering this parameter until you’re sure all of the signal’s spectral range is above it.

  3. Low signal spectral range : This method uses a continuous wavelet transform to localise the relevant signal. Wavelet transforms have high temporal resolution for high frequencies, but lower temporal resolution for lower frequencies. If your signal is dominantly low-frequency, try resampling it to a lower sampling rate and see if this works.

If the above tricks don’t work, then try bandpassing your signal - maybe it’s an issue with the in-band signal to noise ratio.

Parameters
  • audio (np.array) –

  • fs (float>0) – Frequency of sampling in Hertz.

  • lowest_relevant_freq (float>0, optional) – The lowest frequency band in Hz whose coefficients will be tracked. The coefficients of all frequencies in the signal >= the lowest relevant frequency are tracked. This is the lowest possible frequency the signal can take. It is best to give a few kHz of berth. Defaults to 35kHz.

  • background_threshold (float<0, optional) – The relative threshold which is used to define the background. The segmentation is performed by selecting the region that is above background_threshold dB relative to the max dB rms value in the audio. Defaults to -20 dB

  • wavelet_type (str, optional) – The type of wavelet which will be used for the continuous wavelet transform. Run pywt.wavelist(kind=’continuous’) for all possible types in case the default doesn’t seem to work. Defaults to mexican hat, ‘mexh’

  • scales (array-like, optional) – The scales to be used for the continuous wavelet transform. Defaults to np.arange(1,10).

Returns

  • potential_region (np.array) – A boolean numpy array where True corresponds to the regions which are call samples, and False are the background samples. The single longest continuous region is output.

  • dbrms_profile (np.array) – The dB rms profile of the summed up wavelet transform for all centre frequencies >= lowest_relevant_frequency.

Raises
  • ValueError – When lowest_relevant_frequency is too high or not included in the centre frequencies of the default/input scales for wavelet transforms.

  • IncorrectThreshold – When the dynamic range of the relevant part of the signal is smaller or equal to the background_threshold.
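
The thresholding step that background_threshold controls can be shown on its own. The sketch below skips the wavelet transform and applies the relative threshold directly to a moving dB rms profile of the raw audio; `moving_dbrms`, the window size and the synthetic recording are assumptions made for illustration.

```python
import numpy as np

def moving_dbrms(audio, window_size):
    """dB rms profile from a moving average of the squared signal."""
    kernel = np.ones(window_size)/window_size
    mean_square = np.convolve(audio**2, kernel, mode='same')
    return 10*np.log10(mean_square + 1e-20)  # small offset avoids log10(0)

fs = 250000
rng = np.random.default_rng(0)
audio = rng.normal(0, 10**(-60/20), 5000)    # quiet background noise
t = np.arange(1500)/fs
audio[2000:3500] += np.sin(2*np.pi*40000*t)  # the 'call'

dbrms_profile = moving_dbrms(audio, window_size=50)

background_threshold = -20  # dB relative to the maximum, the documented default
potential_region = dbrms_profile >= np.max(dbrms_profile) + background_threshold
```

If the threshold is set lower than the dynamic range of the recording, every sample passes it and nothing is segmented, which is the situation IncorrectThreshold is described as covering.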

itsfm.segment.identify_valid_regions(condition_satisfied, num_expected_regions=1)[source]
Parameters
  • condition_satisfied (np.array) – Boolean numpy array with samples either being True or False. The array may have multiple regions which satisfy a condition (True) separated by smaller regions which don’t (False).

  • num_expected_regions (int > 0) – The number of expected regions which satisfy a condition. If >2, then the first two longest continuous regions will be returned, and the smaller regions will be suppressed/eliminated. Defaults to 1.

Returns

valid_regions – Boolean array which identifies the regions with the longest contiguous lengths.

Return type

np.array

itsfm.segment.identify_maximum_contiguous_regions(condition_satisfied, number_regions_of_interest=1)[source]

Given a Boolean array - this function identifies regions of contiguous samples that are true and labels each with its own region_number.

Parameters
  • condition_satisfied (np.array) – Numpy array with Boolean (True/False) entries for each sample.

  • number_regions_of_interest (integer > 1) – Number of contiguous regions which are to be detected. The region ids are output in descending order (longest–>shortest). Defaults to 1.

Returns

  • region_numbers (list) – List with numeric IDs given to each contiguous region which is True.

  • region_id_and_samples (np.array) – Two columns numpy array. Column 0 has the region_number, and Column 1 has the individual samples that belong to each region_number.

Raises
  • ValueError – When the condition_satisfied array has no entries that are True.
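
One way to label contiguous True regions and keep the longest ones is sketched below using only NumPy. `label_true_runs` and `longest_runs` are hypothetical helpers, and their return values are simpler than the ones documented for this function.

```python
import numpy as np

def label_true_runs(condition_satisfied):
    """Give each contiguous run of True samples its own integer label (1, 2, ...).
    False samples get the label 0."""
    condition = np.asarray(condition_satisfied, dtype=bool)
    if not condition.any():
        raise ValueError('No entries are True')
    as_int = condition.astype(int)
    run_starts = np.diff(np.concatenate(([0], as_int))) == 1
    return np.cumsum(run_starts)*as_int

def longest_runs(condition_satisfied, num_regions=1):
    """Boolean array keeping only the `num_regions` longest runs of True."""
    labels = label_true_runs(condition_satisfied)
    run_lengths = np.bincount(labels)[1:]              # drop the count for label 0
    longest_first = np.argsort(run_lengths)[::-1] + 1  # labels, longest -> shortest
    return np.isin(labels, longest_first[:num_regions])

condition = np.array([0, 1, 1, 0, 1, 1, 1, 1, 0, 1], dtype=bool)
labels = label_true_runs(condition)            # three runs, labelled 1, 2 and 3
valid = longest_runs(condition, num_regions=1) # only the four-sample run survives
```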

itsfm.segment.pre_process_for_segmentation(call, fs, **kwargs)[source]

Performs a series of steps on a raw CF call before passing it for temporal segmentation into CF and FM.

Step 1: find peak frequency

Step 2: lowpass (fm_audio) and highpass (cf_audio) below a fixed percentage of the peak frequency

Step 3: calculate the moving dB of the fm and cf audio

Parameters
  • call (np.array) –

  • fs (int.) – Frequency of sampling in Hertz

  • peak_percentage (0<float<1, optional) – This is the fraction of the peak at which low and high-pass filtering happens. Defaults to 0.98.

  • lowpass (optional) – Custom lowpass filtering coefficients. See low_and_highpass_around_threshold

  • highpass (optional) – Custom highpass filtering coefficients. See low_and_highpass_around_threshold

  • window_size (integer, optional) – The window size in samples over which the moving rms of the low+high passed signals will be calculated. For default value see documentation of moving_rms

Returns

cf_dbrms, fm_dbrms – The dB rms profile of the high + low passed versions of the input audio.

Return type

np.arrays

itsfm.segment.low_and_highpass_around_threshold(audio, fs, threshold_frequency, **kwargs)[source]

Makes two versions of an audio clip: the low pass and high pass versions.

Parameters
  • audio (np.array) –

  • fs (float>0) – Frequency of sampling in Hz

  • threshold_frequency (float>0) – The frequency at which the lowpass and highpass operations are done.

  • lowpass,highpass (ndarrays, optional) – The b & a polynomials of an IIR filter which define the lowpass and highpass filters. Defaults to a second order elliptical filter with rp of 3dB and rs of 10 dB. See signal.ellip for more details of rp and rs.

  • pad_duration (float>0, optional) – Zero-padding duration in seconds before low+high pass filtering. Defaults to 0.1 seconds.

  • double_pass (bool, optional) – Low/high pass filter the audio twice. This has been noticed to help with segmentation accuracy, especially for calls with short CF/FM segments where edge effects are particularly noticeable. Defaults to False

Returns

lp_audio, hp_audio – The low and high pass filtered versions of the input audio.

Return type

np.arrays
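
A sketch of the low/high-pass split using the default filter described above (second order elliptic, rp of 3 dB, rs of 10 dB), built with `scipy.signal.ellip`. It assumes a single forward pass with `scipy.signal.lfilter` and no zero-padding, so it is only an approximation of what the function does; `toy_low_and_highpass` and `rms` are hypothetical names.

```python
import numpy as np
from scipy import signal

def toy_low_and_highpass(audio, fs, threshold_frequency):
    """Split audio into lowpass and highpass versions around one frequency."""
    nyquist = fs/2.0
    normalised_cutoff = threshold_frequency/nyquist
    # second order elliptic filters with rp = 3 dB and rs = 10 dB
    b_lp, a_lp = signal.ellip(2, 3, 10, normalised_cutoff, 'lowpass')
    b_hp, a_hp = signal.ellip(2, 3, 10, normalised_cutoff, 'highpass')
    lp_audio = signal.lfilter(b_lp, a_lp, audio)
    hp_audio = signal.lfilter(b_hp, a_hp, audio)
    return lp_audio, hp_audio

def rms(x):
    """Root mean square of a signal."""
    return np.sqrt(np.mean(x**2))

fs = 44100
t = np.arange(0, 0.05, 1.0/fs)
low_tone = np.sin(2*np.pi*1000*t)
high_tone = np.sin(2*np.pi*9000*t)

# A 1 kHz tone should survive the lowpass, a 9 kHz tone the highpass.
lp_of_low, hp_of_low = toy_low_and_highpass(low_tone, fs, 5000)
lp_of_high, hp_of_high = toy_low_and_highpass(high_tone, fs, 5000)
```

With only 10 dB of stopband attenuation the separation is modest, which fits the description of double_pass as something that can help segmentation accuracy.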

itsfm.segment.get_thresholds_re_max(cf_dbrms, fm_dbrms)[source]
itsfm.segment.calc_proper_kernel_size(durn, fs)[source]

scipy.signal.medfilt requires an odd number of samples as kernel_size. This function calculates the number of samples for a given duration which is odd and is close to the required duration.

Parameters
  • durn (float) – Duration in seconds.

  • fs (float) – Sampling rate in Hz

Returns

samples – An odd number of samples that is equal to or slightly less (by one sample) than the input duration.

Return type

int
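
The odd-kernel calculation can be sketched in a few lines. `odd_kernel_size` is a hypothetical re-implementation of the idea, assuming the duration is truncated to whole samples first.

```python
def odd_kernel_size(durn, fs):
    """Largest odd number of samples that fits within `durn` seconds."""
    samples = int(durn*fs)
    if samples % 2 == 0:
        samples -= 1  # step down by one sample to make it odd
    return max(samples, 1)  # a kernel needs at least one sample

size_from_even = odd_kernel_size(0.002, 44100)   # int(88.2) = 88, so 87
size_from_odd = odd_kernel_size(0.0003, 44100)   # int(13.23) = 13, already odd
```

The result can be passed as kernel_size to scipy.signal.medfilt, which requires an odd value.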

itsfm.segment.resize_by_adding_one_sample(input_signal, original_signal, **kwargs)[source]

Resizes the input_signal to the same size as the original signal by repeating one sample value. The sample value can be either the last or the first sample of the input_signal.

itsfm.segment.median_filter(input_signal, fs, **kwargs)[source]

Median filters a signal according to a user-settable window size.

Parameters
  • input_signal (np.array) –

  • fs (float) – Sampling rate in Hz.

  • medianfilter_size (float, optional) – The window size in seconds. Defaults to 0.001 seconds.

Returns

med_filtered – Median filtered version of the input_signal.

Return type

np.array

itsfm.segment.identify_cf_ish_regions(frequency_profile, fs, **kwargs)[source]

Identifies CF regions by comparing the rate of frequency modulation across the signal. If the frequency modulation within a region of the signal is less than the limit then it is considered a CF region.

Parameters
  • frequency_profile (np.array) – The instantaneous frequency of the signal over time in Hz.

  • fm_limit (float, optional) – The maximum rate of frequency modulation in Hz/s. Defaults to 1000 Hz/s

  • medianfilter_size (float, optional) –

Returns

  • cfish_regions (np.array) – Boolean array where True indicates a low FM rate region. The output may still need to be cleaned before final use.

  • clean_fmrate_resized

Notes

If you’re used to reading FM rates in kHz/ms, use this relation to get the required modulation rate in Hz/s:

(rate in kHz/ms) = (rate in Hz/s) * 10^-6

OR

(rate in Hz/s) = (rate in kHz/ms) * 10^6

See also

median_filter()
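
The unit relation in the Notes follows from 1 kHz/ms = 1000 Hz / 0.001 s = 10^6 Hz/s. The two converter functions below are hypothetical helpers that spell this out.

```python
def khz_per_ms_to_hz_per_s(rate_khz_per_ms):
    """1 kHz/ms = 1000 Hz / 0.001 s = 10**6 Hz/s."""
    return rate_khz_per_ms*1e6

def hz_per_s_to_khz_per_ms(rate_hz_per_s):
    """Inverse conversion: divide by 10**6."""
    return rate_hz_per_s*1e-6

default_fm_limit = 1000  # Hz/s, the documented default of fm_limit
default_in_khz_per_ms = hz_per_s_to_khz_per_ms(default_fm_limit)  # about 0.001 kHz/ms
one_khz_per_ms = khz_per_ms_to_hz_per_s(1.0)                      # 10**6 Hz/s
```

In these units the default fm_limit is a thousand times smaller than the 1.0 kHz/ms default of fmrate_threshold in segment_by_pwvd.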

itsfm.segment.segment_cf_regions(audio, fs, **kwargs)[source]
exception itsfm.segment.CFIdentificationError[source]
exception itsfm.segment.IncorrectThreshold[source]