{"id":2036,"date":"2019-11-18T22:17:15","date_gmt":"2019-11-19T06:17:15","guid":{"rendered":"https:\/\/live-cometml.pantheonsite.io\/blog\/how-to-apply-machine-learning-and-deep-learning-methods-to-audio-analysis\/"},"modified":"2019-11-18T22:17:15","modified_gmt":"2019-11-19T06:17:15","slug":"how-to-apply-machine-learning-and-deep-learning-methods-to-audio-analysis","status":"publish","type":"post","link":"https:\/\/www.comet.com\/site\/blog\/how-to-apply-machine-learning-and-deep-learning-methods-to-audio-analysis\/","title":{"rendered":"How to apply machine learning and deep learning methods to audio analysis"},"content":{"rendered":"\n<p>To view the code, training visualizations, and more information about the python example at the end of this post, visit the <a href=\"https:\/\/www.comet.com\/demo\/urbansound8k\/view\/\">Comet project page<\/a>.\u00a0<\/p>\n\n\n\n<p><strong>Introduction<\/strong><br \/><br \/><\/p>\n\n\n\n<p>While much of the writing and literature on deep learning concerns computer vision and natural language processing (NLP), audio analysis\u2014a field that includes automatic speech recognition (ASR), digital signal processing, and music classification, tagging, and generation\u2014is a growing subdomain of deep learning applications. 
Some of the most popular and widespread machine learning systems, virtual assistants Alexa, Siri and Google Home, are largely products built atop models that can extract information from audio signals.\u00a0<\/p>\n\n\n\n<p>Many of our users at <a href=\"http:\/\/comet.ml\">Comet<\/a> are working on audio related machine learning tasks such as audio classification, speech recognition and speech synthesis, so we built them tools to analyze, explore and understand audio data using Comet\u2019s meta machine-learning platform.\u00a0<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d380eb09e.jpg\" alt=\"\" \/><\/figure>\n\n\n\n<p><em>Audio modeling, training and debugging using <\/em><a href=\"http:\/\/comet.ml\"><em>Comet<\/em><\/a><\/p>\n\n\n\n<p>This post is focused on showing how data scientists and AI practitioners can use Comet to apply machine learning and deep learning methods in the domain of audio analysis. To understand how models can extract information from digital audio signals, we\u2019ll dive into some of the core feature engineering methods for audio analysis. 
We will then use <a href=\"https:\/\/librosa.github.io\/librosa\/\">Librosa<\/a>, a great python library for audio analysis, to code up a short python example training a neural architecture on the <a href=\"https:\/\/urbansounddataset.weebly.com\/urbansound8k.html\">UrbanSound8k<\/a> dataset.\u00a0<\/p>\n\n\n\n<p><strong>Machine Learning for Audio: Digital Signal Processing, Filter Banks, Mel-Frequency Cepstral Coefficients<\/strong><\/p>\n\n\n\n<p>Building machine learning models to classify, describe, or generate audio typically concerns modeling tasks where the input data are audio samples.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d381d4b44.jpg\" alt=\"\" \/><\/figure>\n\n\n\n<p><em>Example waveform of an audio dataset sample from UrbanSound8k.<\/em><\/p>\n\n\n\n<p>These audio samples are usually represented as time series, where the y-axis measurement is the amplitude of the waveform. The amplitude is usually measured as a function of the change in pressure around the microphone or receiver device that originally picked up the audio. Unless there is metadata associated with your audio samples, these time series signals will often be your only input data for fitting a model.\u00a0<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d38242a8d.png\" alt=\"\" \/><\/figure>\n\n\n\n<p>Looking at the samples below, taken from each of the ten classes in the Urbansound8k dataset, it is clear from an eye test that the waveform itself may not necessarily yield clear class identifying information. 
Consider the waveforms for the engine_idling, siren, and jackhammer classes \u2014 they look quite similar.<\/p>\n\n\n\n<p>It turns out one of the best features to extract from audio waveforms (and digital signals in general) has been around since the 1980s and is still state-of-the-art: Mel Frequency Cepstral Coefficients (MFCCs), introduced by <a href=\"https:\/\/users.cs.northwestern.edu\/~pardo\/courses\/eecs352\/papers\/Davis1980-MFCC.pdf\">Davis and Mermelstein in 1980<\/a>. Below we will go through a technical discussion of how MFCCs are generated and why they are useful in audio analysis. This section is somewhat technical, so before we dive in, let\u2019s define a few key terms pertaining to digital signal processing and audio analysis. We\u2019ll link to Wikipedia and additional resources if you\u2019d like to dig even deeper.<\/p>\n\n\n\n<p><strong>Disordered Yet Useful Terminology<\/strong><\/p>\n\n\n\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Sampling_(signal_processing)\">Sampling and Sampling Frequency<\/a><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d382e0bd1.png\" alt=\"\" \/><\/figure>\n\n\n\n<p><em>In signal processing, <\/em><strong><em>sampling<\/em><\/strong><em> is the reduction of a continuous signal into a series of discrete values. The <\/em><strong><em>sampling frequency<\/em><\/strong><em> or <\/em><strong><em>rate <\/em><\/strong><em>is the number of samples taken over some fixed amount of time. A high sampling frequency results in less information loss but higher computational expense, and low sampling frequencies have higher information loss but are fast and cheap to compute.<\/em><\/p>\n\n\n\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Amplitude\">Amplitude<\/a><\/p>\n\n\n\n<p><em>The <\/em><strong><em>amplitude <\/em><\/strong><em>of a sound wave is a measure of its change over a period (usually of time). 
Another common definition of amplitude is a function of the magnitude of the difference between a variable\u2019s extreme values.<\/em><\/p>\n\n\n\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Fourier_transform\">Fourier Transform<\/a><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d383232dd.png\" alt=\"\" \/><\/figure>\n\n\n\n<p><em>The <\/em><strong><em>Fourier Transform<\/em><\/strong><em> decomposes a function of time (signal) into constituent frequencies. In the same way a musical chord can be expressed by the volumes and frequencies of its constituent notes, a Fourier Transform of a function displays the amplitude (amount) of each frequency present in the underlying function (signal).<\/em><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d383646b2.png\" alt=\"\" \/><\/figure>\n\n\n\n<p><em>Top: a digital signal; Bottom: the Fourier Transform of the signal.<\/em><\/p>\n\n\n\n<p><em>There are variants of the Fourier Transform including the <\/em><a href=\"https:\/\/en.wikipedia.org\/wiki\/Short-time_Fourier_transform\"><em>Short-time fourier transform<\/em><\/a><em>, which is implemented in the Librosa library and involves splitting an audio signal into frames and then taking the Fourier Transform of each frame. 
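<\/em><\/p>

<p>To make the idea concrete, here is a minimal numpy sketch (an illustration only, not Librosa\u2019s actual implementation; the frame and hop lengths are arbitrary illustrative values) that slices a signal into overlapping windowed frames and takes the FFT of each:<\/p>

```python
import numpy as np

# A synthetic one-second 440 Hz sine wave sampled at 22050 Hz
sr = 22050
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t)

frame_len, hop_len = 2048, 512          # ~93 ms frames with ~23 ms hops
window = np.hanning(frame_len)

# Slice the signal into overlapping frames and FFT each one
num_frames = 1 + (len(signal) - frame_len) // hop_len
stft = np.stack([np.fft.rfft(signal[i * hop_len:i * hop_len + frame_len] * window)
                 for i in range(num_frames)], axis=1)

magnitudes = np.abs(stft)               # shape: (frequency bins, frames)
peak_bin = magnitudes[:, 0].argmax()
print(peak_bin * sr / frame_len)        # peak frequency, close to 440 Hz
```

<p>Each column of the resulting matrix is the spectrum of one short frame of audio; the columns taken together describe how the frequency content evolves over time.<\/p>

<p><em>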
In audio processing generally, the Fourier Transform is an elegant and useful way to decompose an audio signal into its constituent frequencies.<\/em><\/p>\n\n\n\n<p><em>*Resources: by far the best video I\u2019ve found on the Fourier Transform is from <\/em><a href=\"https:\/\/www.youtube.com\/watch?v=spUNpyF58BY&amp;t=1s\"><em>3Blue1Brown<\/em><\/a><em>*<\/em><\/p>\n\n\n\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Periodogram\">Periodogram<\/a><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d383b44de.jpg\" alt=\"\" \/><\/figure>\n\n\n\n<p><em>In signal processing, a <\/em><strong><em>periodogram<\/em><\/strong><em> is an estimate of the spectral density of a signal. The periodogram above shows the power spectrum of two sinusoidal basis functions of ~30Hz and ~50Hz. The output of a Fourier Transform can be thought of, roughly, as a periodogram.\u00a0<\/em><\/p>\n\n\n\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Spectral_density\">Spectral Density<\/a><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d3842dceb.png\" alt=\"\" \/><\/figure>\n\n\n\n<p>The <strong>power spectrum <\/strong>of a time series is a way to describe the distribution of power into discrete frequency components composing that signal. The statistical average of a signal, measured by its frequency content, is called its <strong>spectrum<\/strong>. 
The <strong>spectral density<\/strong> of a digital signal describes the frequency content of the signal.\u00a0<\/p>\n\n\n\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Mel_scale\">Mel-Scale<\/a><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d384cd46c.png\" alt=\"\" \/><\/figure>\n\n\n\n<p><em>The <\/em><strong><em>mel-scale<\/em><\/strong><em> is a scale of pitches judged by listeners to be equal in distance from one another. The reference point between the mel-scale and normal frequency measurement is arbitrarily defined by assigning the perceptual pitch of 1000 mels to 1000 Hz. Above about 500 Hz, increasingly large intervals are judged by listeners to produce equal pitch increments. The name <\/em><strong><em>mel<\/em><\/strong><em> comes from the word melody to indicate the scale is based on pitch comparisons.\u00a0<\/em><\/p>\n\n\n\n<p><em>The formula to convert f hertz into m mels is:<\/em><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d38519529.png\" alt=\"\" \/><\/figure>\n\n\n\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Cepstrum\">Cepstrum<\/a><\/p>\n\n\n\n<p><em>The <\/em><strong><em>cepstrum <\/em><\/strong><em>is the result of taking the Fourier Transform of the logarithm of the estimated power spectrum of a signal.\u00a0<\/em><\/p>\n\n\n\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Spectrogram\">Spectrogram<\/a><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d38563e3c.png\" alt=\"\" \/><\/figure>\n\n\n\n<p><em>Mel-frequency spectrogram of an audio sample in the Urbansound8k dataset<\/em><\/p>\n\n\n\n<p><em>A <\/em><strong><em>spectrogram <\/em><\/strong><em>is a visual representation of the spectrum of frequencies of a signal as it varies with time. 
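<\/em><\/p>

<p>The hertz-to-mel formula pictured in the Mel-Scale section above is easy to write directly in code. A small sketch using the common 2595\u00b7log10 form of the formula, together with its inverse:<\/p>

```python
import math

def hz_to_mel(f):
    # m = 2595 * log10(1 + f / 700)
    return 2595 * math.log10(1 + f / 700)

def mel_to_hz(m):
    # Inverse of the conversion above
    return 700 * (10 ** (m / 2595) - 1)

print(round(hz_to_mel(1000)))  # 1000 -- the 1000 Hz = 1000 mel reference point
print(round(hz_to_mel(4000)))  # intervals compress: 4000 Hz is only ~2146 mels
```

<p>Note how quadrupling the frequency from 1000 Hz far less than quadruples the mel value, mirroring how listeners perceive pitch intervals at higher frequencies.<\/p>

<p><em>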
A nice way to think about spectrograms is as a stacked view of periodograms across some time-interval digital signal.\u00a0<\/em><\/p>\n\n\n\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Cochlea\">Cochlea<\/a><\/p>\n\n\n\n<p><em>The spiral cavity of the inner ear containing the organ of Corti, which produces nerve impulses in response to sound vibrations.<\/em><\/p>\n\n\n\n<p><strong>Preprocessing Audio: Digital Signal Processing Techniques<\/strong><\/p>\n\n\n\n<p>Dataset preprocessing, feature extraction and feature engineering are steps we take to extract information from the underlying data, information that in a machine learning context should be useful for predicting the class of a sample or the value of some target variable. In audio analysis this process is largely based on finding components of an audio signal that can help us distinguish it from other signals.\u00a0<\/p>\n\n\n\n<p>MFCCs, as mentioned above, remain a state of the art tool for extracting information from audio samples. 
Despite libraries like Librosa giving us a python one-liner to compute MFCCs for an audio sample, the underlying math is a bit complicated, so we\u2019ll go through it step by step and include some useful links for further learning.<\/p>\n\n\n\n<p>Steps for calculating MFCCs for a given audio sample:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Slice the signal into short frames (of time)<\/li>\n<li>Compute the periodogram estimate of the power spectrum for each frame<\/li>\n<li>Apply the mel filterbank to the power spectra and sum the energy in each filter<\/li>\n<li>Take the discrete cosine transform (DCT) of the log filterbank energies<\/li>\n<\/ol>\n\n\n\n<p>Excellent additional reading on MFCC derivation and computation can be found at blog posts <a href=\"http:\/\/practicalcryptography.com\/miscellaneous\/machine-learning\/guide-mel-frequency-cepstral-coefficients-mfccs\/\">here<\/a> and <a href=\"https:\/\/haythamfayek.com\/2016\/04\/21\/speech-processing-for-machine-learning.html\">here<\/a>.\u00a0<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Slice the signal into short frames<\/strong><\/li>\n<\/ol>\n\n\n\n<p>Slicing the audio signal into short frames is useful in that it allows us to <em>sample<\/em> our audio into discrete time-steps. We assume that on short enough time scales the audio signal doesn\u2019t change. Typical values for the duration of the short frames are between 20-40ms. It is also conventional to overlap adjacent frames by 10-15ms. <em>*Note that the overlapping frames will make the features we eventually generate highly correlated. This is the basis for why we have to take the discrete cosine transform at the end of all of this.*<\/em><\/p>\n\n\n\n<ol class=\"wp-block-list\" start=\"2\">\n<li><strong>Compute the power spectrum for each frame<\/strong><\/li>\n<\/ol>\n\n\n\n<p>Once we have our frames we need to calculate the power spectrum of each frame. 
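<\/p>

<p>As a toy illustration (numpy only, with made-up signal contents), the periodogram estimate for a single frame is just the squared magnitude of its FFT, scaled by the frame length:<\/p>

```python
import numpy as np

# One 2048-sample frame containing two sinusoidal components
sr = 22050
t = np.arange(2048) / sr
frame = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# Periodogram estimate: |FFT|^2 normalized by the number of samples
spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
periodogram = np.abs(spectrum) ** 2 / len(frame)

freqs = np.fft.rfftfreq(len(frame), d=1 / sr)
print(freqs[periodogram.argmax()])  # the strongest peak sits near 50 Hz
```

<p>The stronger 50 Hz component dominates the estimate, just as the taller peak dominates the periodogram figure shown earlier.<\/p>

<p>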
The power spectrum of a time series describes the distribution of power into frequency components composing that signal. According to Fourier analysis, any physical signal can be decomposed into a number of discrete frequencies, or a spectrum of frequencies over a continuous range. The statistical average of a certain signal as analyzed in terms of its frequency content is called its spectrum.\u00a0<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d386132ea.gif\" alt=\"\" \/><\/figure>\n\n\n\n<p>Source: <a href=\"https:\/\/terpconnect.umd.edu\/~toh\/spectrum\/HarmonicAnalysis.html\">University of Maryland, Harmonic Analysis and the Fourier Transform<\/a><\/p>\n\n\n\n<p>We apply the <strong>Short-time Fourier transform<\/strong> to each frame to obtain a power spectrum for each.<\/p>\n\n\n\n<ol class=\"wp-block-list\" start=\"3\">\n<li><strong>Apply the mel filterbank to the power spectra and sum the energy in each filter<\/strong><\/li>\n<\/ol>\n\n\n\n<p>We still have some work to do once we have our power spectra. The human cochlea does not discern between nearby frequencies well, and this effect only becomes more pronounced as frequencies increase. The <strong>mel-scale<\/strong> is a tool that allows us to approximate the human auditory system\u2019s response more closely than linear frequency bands.\u00a0<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d3865ba51.gif\" alt=\"\" \/><\/figure>\n\n\n\n<p>Source: <a href=\"https:\/\/labrosa.ee.columbia.edu\/doc\/HTKBook21\/node54.html\">Columbia<\/a><\/p>\n\n\n\n<p>As can be seen in the visualization above, the mel filters get wider as the frequency increases \u2014 we care less about variations at higher frequencies. 
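<\/p>

<p>For intuition, here is a simplified numpy sketch of how such a triangular filterbank can be constructed; in practice Librosa\u2019s librosa.filters.mel builds this for you, and the parameter values below (10 filters, 2048-point FFT) are purely illustrative:<\/p>

```python
import numpy as np

def mel_filterbank(sr=22050, n_fft=2048, n_filters=10):
    hz_to_mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel_to_hz = lambda m: 700 * (10 ** (m / 2595) - 1)

    # Filter edges: evenly spaced on the mel scale, converted back to Hz,
    # then mapped onto FFT bin indices
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)

    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        # Rising and falling slopes of the triangular filter
        fbank[i - 1, left:center] = (np.arange(left, center) - left) / (center - left)
        fbank[i - 1, center:right] = (right - np.arange(center, right)) / (right - center)
    return fbank

fb = mel_filterbank()
print(fb.shape)  # (10, 1025): one row per triangular filter
```

<p>Multiplying a frame\u2019s power spectrum by this matrix and summing within each row yields one energy value per filter: the filterbank energies referred to in step 3.<\/p>

<p>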
At low frequencies, where differences are more discernible to the human ear and thus more important in our analysis, the filters are narrow.\u00a0<\/p>\n\n\n\n<p>The magnitudes from our power spectra, which were found by applying the Fourier transform to our input data, are <a href=\"https:\/\/en.wikipedia.org\/wiki\/Data_binning\">binned<\/a> by correlating them with each triangular Mel filter. This binning is usually applied such that each coefficient is multiplied by the corresponding filter gain, so each Mel filter comes to hold a weighted sum representing the spectral magnitude in that channel.\u00a0<\/p>\n\n\n\n<p>Once we have our filterbank energies, we take the logarithm of each. This is yet another step motivated by the constraints of human hearing: humans don\u2019t perceive changes in volume on a linear scale. To double the perceived volume of an audio wave, the wave\u2019s energy must increase by a factor of 8. If an audio wave is already high volume (high energy), large variations in that wave\u2019s energy may not sound very different.\u00a0<\/p>\n\n\n\n<ol class=\"wp-block-list\" start=\"4\">\n<li><strong>Take the discrete cosine transform (DCT) of the log filterbank energies<\/strong><\/li>\n<\/ol>\n\n\n\n<p>Because our filterbank energies are overlapping (see step 1), there is usually a strong correlation between them. Taking the discrete cosine transform can help decorrelate the energies.<\/p>\n\n\n\n<p>*****<\/p>\n\n\n\n<p>Thankfully for us, the creators of <a href=\"https:\/\/librosa.github.io\/librosa\/\">Librosa<\/a> have abstracted out a ton of this math and made it easy to generate MFCCs for your audio data. Let\u2019s go through a simple python example to show how this analysis looks in action.<\/p>\n\n\n\n<p><strong>EXAMPLE PROJECT: Urbansound8k + Librosa<\/strong><\/p>\n\n\n\n<p>We\u2019re going to be fitting a simple neural network (keras + tensorflow backend) to the UrbanSound8k dataset. 
First, let\u2019s load our dependencies, including numpy, pandas, keras, scikit-learn, and librosa.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#### Dependencies ####\n\n#### Import Comet for experiment tracking and visual tools\nfrom comet_ml import Experiment\n####\n\nimport os\n\nimport IPython.display as ipd\nimport numpy as np\nimport pandas as pd\nimport librosa\nimport librosa.display\nimport matplotlib.pyplot as plt\nfrom scipy.io import wavfile as wav\n\nfrom sklearn import metrics\nfrom sklearn.preprocessing import LabelEncoder\n\nfrom keras.models import Sequential\nfrom keras.layers import Dense, Dropout, Activation\nfrom keras.optimizers import Adam\nfrom keras.utils import to_categorical<\/code><\/pre>\n\n\n\n<p>To begin, let\u2019s create a Comet experiment as a wrapper for all of our work. We\u2019ll be able to capture any and all artifacts (audio files, visualizations, model, dataset, system information, training metrics, etc.) automatically.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>experiment = Experiment(api_key=\"API_KEY\",\n                        project_name=\"urbansound8k\")<\/code><\/pre>\n\n\n\n<p>Let\u2019s load the dataset and grab a sample for each class. 
We can inspect these samples visually and acoustically using Comet.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Load dataset\ndf = pd.read_csv('UrbanSound8K\/metadata\/UrbanSound8K.csv')\n\n# Create a list of the class labels\nlabels = list(df['class'].unique())\n\n# Let's grab a single audio file from each class\nfiles = dict()\nfor i in range(len(labels)):\n    tmp = df[df['class'] == labels[i]][:1].reset_index()\n    path = 'UrbanSound8K\/audio\/fold{}\/{}'.format(tmp['fold'][0], tmp['slice_file_name'][0])\n    files[labels[i]] = path<\/code><\/pre>\n\n\n\n<p>We can look at the waveforms for each sample using librosa\u2019s display.waveplot function.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>fig = plt.figure(figsize=(15, 15))\nfig.subplots_adjust(hspace=0.4, wspace=0.4)\nfor i, label in enumerate(labels):\n    fn = files[label]\n    fig.add_subplot(5, 2, i+1)\n    plt.title(label)\n    data, sample_rate = librosa.load(fn)\n    librosa.display.waveplot(data, sr=sample_rate)\nplt.savefig('class_examples.png')<\/code><\/pre>\n\n\n\n<p>We\u2019ll save this graphic to our Comet experiment.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Log graphic of waveforms to Comet\nexperiment.log_image('class_examples.png')<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d386ac451.png\" alt=\"class_examples.png\" \/><\/figure>\n\n\n\n<p>Next, we\u2019ll log the audio files themselves.\u00a0<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Log audio files to Comet for debugging\nfor label in labels:\n    fn = files[label]\n    experiment.log_audio(fn, metadata={'name': label})<\/code><\/pre>\n\n\n\n<p>Once we log the samples to Comet, we can listen to samples, inspect metadata, and much more right from the UI.\u00a0<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" 
src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d38789edb.jpg\" alt=\"\" \/><\/figure>\n\n\n\n<p><strong>Preprocessing<\/strong><\/p>\n\n\n\n<p>Now we can extract features from our data. We\u2019re going to be using librosa, but we\u2019ll also show another utility, scipy.io, for comparison and to observe some implicit preprocessing that\u2019s happening.\u00a0<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>fn = 'UrbanSound8K\/audio\/fold1\/191431-9-0-66.wav'\nlibrosa_audio, librosa_sample_rate = librosa.load(fn)\nscipy_sample_rate, scipy_audio = wav.read(fn)\n\nprint(\"Original sample rate: {}\".format(scipy_sample_rate))\nprint(\"Librosa sample rate: {}\".format(librosa_sample_rate))<\/code><\/pre>\n\n\n\n<p>Original sample rate: 48000<\/p>\n\n\n\n<p>Librosa sample rate: 22050<\/p>\n\n\n\n<p>Librosa\u2019s load function will convert the sampling rate to 22.05 kHz automatically. It will also normalize the sample values so that they fall between -1 and 1.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>print(\"Original audio file min~max range: {} to {}\".format(np.min(scipy_audio), np.max(scipy_audio)))\nprint(\"Librosa audio file min~max range: {} to {}\".format(np.min(librosa_audio), np.max(librosa_audio)))<\/code><\/pre>\n\n\n\n<p>&gt; Original audio file min~max range: -1869 to 1665<\/p>\n\n\n\n<p>&gt; Librosa audio file min~max range: -0.05 to 0.05<\/p>\n\n\n\n<p>Librosa also converts the audio signal to mono from stereo.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>plt.figure(figsize=(12, 4))\nplt.plot(scipy_audio)\nplt.savefig('original_audio.png')\nexperiment.log_image('original_audio.png')<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d3880c9e1.png\" alt=\"\" \/><\/figure>\n\n\n\n<p><em>Original Audio (note that it\u2019s in stereo \u2014 two audio 
sources)<\/em><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>plt.figure(figsize=(12, 4))\nplt.plot(librosa_audio)\nplt.savefig('librosa_audio.png')\nexperiment.log_image('librosa_audio.png')<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d38844566.png\" alt=\"\" \/><\/figure>\n\n\n\n<p><em>Librosa audio: converted to mono<\/em><\/p>\n\n\n\n<p><strong>Extracting MFCCs from audio using Librosa<\/strong><\/p>\n\n\n\n<p>Remember all the math we went through to understand mel-frequency cepstral coefficients earlier? Using Librosa, here\u2019s how you extract them from audio (using the librosa_audio we defined above):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>mfccs = librosa.feature.mfcc(y=librosa_audio, sr=librosa_sample_rate, n_mfcc=40)<\/code><\/pre>\n\n\n\n<p>That\u2019s it!<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>print(mfccs.shape)<\/code><\/pre>\n\n\n\n<p>&gt; (40, 173)<\/p>\n\n\n\n<p>Librosa calculated 40 MFCCs over a 173-frame audio sample.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>plt.figure(figsize=(8, 8))\nlibrosa.display.specshow(mfccs, sr=librosa_sample_rate, x_axis='time')\nplt.savefig('MFCCs.png')\nexperiment.log_image('MFCCs.png')<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d38887ad9.png\" alt=\"\" \/><\/figure>\n\n\n\n<p>MFCC Spectrogram<\/p>\n\n\n\n<p>We\u2019ll define a simple function to extract MFCCs for every file in our dataset.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>def extract_features(file_name):\n    audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')\n    mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)\n    mfccs_processed = np.mean(mfccs.T, axis=0)\n\n    return mfccs_processed<\/code><\/pre>\n\n\n\n<p>Now let\u2019s extract features.<\/p>\n\n\n\n<pre 
class=\"wp-block-code\"><code>features = []\n\n# Iterate through each sound file and extract the features\nfor index, row in df.iterrows():\n\n    file_name = os.path.join('UrbanSound8K\/audio', 'fold'+str(row[\"fold\"]), str(row[\"slice_file_name\"]))\n\n    class_label = row[\"class\"]\n    data = extract_features(file_name)\n\n    features.append([data, row[\"fold\"], class_label])\n\n# Convert into a Pandas dataframe\nfeaturesdf = pd.DataFrame(features, columns=['feature', 'fold', 'class_label'])<\/code><\/pre>\n\n\n\n<p>We now have a dataframe where each row has a fold number, a class label, and a single feature column comprising the 40 averaged MFCCs.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>featuresdf.head()<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d388e29c9.png\" alt=\"\" \/><\/figure>\n\n\n\n<pre class=\"wp-block-code\"><code>featuresdf.iloc[0]['feature']<\/code><\/pre>\n\n\n\n<p>array([-2.1579300e+02,\u00a0 7.1666122e+01, -1.3181377e+02, -5.2091331e+01,<\/p>\n\n\n\n<p>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0-2.2115969e+01, -2.1764181e+01, -1.1183747e+01,\u00a0 1.8912683e+01,<\/p>\n\n\n\n<p>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a06.7266388e+00,\u00a0 1.4556893e+01, -1.1782045e+01,\u00a0 2.3010368e+00,<\/p>\n\n\n\n<p>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0-1.7251305e+01,\u00a0 1.0052421e+01, -6.0095000e+00, -1.3153191e+00,<\/p>\n\n\n\n<p>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0-1.7693510e+01,\u00a0 1.1171228e+00, -4.3699470e+00,\u00a0 7.2629538e+00,<\/p>\n\n\n\n<p>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0-1.1815971e+01, -7.4952612e+00,\u00a0 5.4577131e+00, -2.9442446e+00,<\/p>\n\n\n\n<p>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0-5.8693886e+00, -9.8654032e-02, -3.2121708e+00,\u00a0 4.6092505e+00,<\/p>\n\n\n\n<p>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0-5.8293257e+00, -5.3475075e+00,\u00a0 1.3341187e+00, 
7.1307826e+00,<\/p>\n\n\n\n<p>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0-7.9450034e-02,\u00a0 1.7109241e+00, -5.6942000e+00, -2.9041715e+00,<\/p>\n\n\n\n<p>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a03.0366952e+00, -1.6827590e+00, -8.8585770e-01,\u00a0 3.5438776e-01],<\/p>\n\n\n\n<p>\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0dtype=float32)<\/p>\n\n\n\n<p>Now that we have successfully extracted our features from the underlying audio data, we can build and train a model.\u00a0<\/p>\n\n\n\n<p><strong>Model building and training<\/strong><\/p>\n\n\n\n<p>We\u2019ll start by converting our MFCCs to numpy arrays, and encoding our classification labels.<\/p>\n\n\n\n<p>Our dataset will be split into training and test sets. We&#8217;ll use the first 8 folds for training and the 9th and 10th folds for testing.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>train = featuresdf[featuresdf['fold'] &lt;= 8]\ntest = featuresdf[featuresdf['fold'] &gt; 8]\n\nx_train = np.array(train.feature.tolist())\ny_train = np.array(train.class_label.tolist())\n\nx_test = np.array(test.feature.tolist())\ny_test = np.array(test.class_label.tolist())\n\n# Encode the classification labels (fit the encoder on the training set only)\nle = LabelEncoder()\ny_train = to_categorical(le.fit_transform(y_train))\ny_test = to_categorical(le.transform(y_test))<\/code><\/pre>\n\n\n\n<p>Let\u2019s define and compile a simple feedforward neural network architecture.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>num_labels = y_train.shape[1]\n\ndef build_model_graph(input_shape=(40,)):\n    model = Sequential()\n    model.add(Dense(256, input_shape=input_shape))\n    model.add(Activation('relu'))\n    model.add(Dropout(0.5))\n    model.add(Dense(256))\n    model.add(Activation('relu'))\n    model.add(Dropout(0.5))\n    model.add(Dense(num_labels))\n    model.add(Activation('softmax'))\n    # Compile the model\n    model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')\n\n    return model\nmodel = 
build_model_graph()<\/code><\/pre>\n\n\n\n<p>Let\u2019s look at a model summary and compute pre-training accuracy.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Display model architecture summary\nmodel.summary()\n\n# Calculate pre-training accuracy\nscore = model.evaluate(x_test, y_test, verbose=0)\naccuracy = 100*score[1]<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d3894c6ce.png\" alt=\"\" \/><\/figure>\n\n\n\n<pre class=\"wp-block-code\"><code>print(\"Pre-training accuracy: %.4f%%\" % accuracy)<\/code><\/pre>\n\n\n\n<p>Pre-training accuracy: 12.2496%<\/p>\n\n\n\n<p>Now it\u2019s time to train our model, timing the run so we can see how long training takes.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from datetime import datetime\n\nnum_epochs = 100\nnum_batch_size = 32\n\nstart = datetime.now()\nmodel.fit(x_train, y_train, batch_size=num_batch_size, epochs=num_epochs, validation_data=(x_test, y_test), verbose=1)\n\nduration = datetime.now() - start\nprint(\"Training completed in time: \", duration)<\/code><\/pre>\n\n\n\n<p>Even while training is still running, Comet keeps track of the key information about our experiment. 
We can visualize our accuracy and loss curves in real time from the Comet UI (note the orange spin wheel indicates that training is in process).<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d38a1bafa.jpg\" alt=\"\" \/><\/figure>\n\n\n\n<p><em>Comet\u2019s Experiment visualization dashboard<\/em><\/p>\n\n\n\n<p>Once trained we can evaluate our model on the train and test data.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Evaluating the model on the training and testing set\nscore = model.evaluate(x_train, y_train, verbose=0)\nprint(\"Training Accuracy: {0:.2%}\".format(score[1]))\n\nscore = model.evaluate(x_test, y_test, verbose=0)\nprint(\"Testing Accuracy: {0:.2%}\".format(score[1]))<\/code><\/pre>\n\n\n\n<p>Training Accuracy: 93.00%<\/p>\n\n\n\n<p>Testing Accuracy: 87.35%<\/p>\n\n\n\n<p><strong>Conclusion\u00a0<\/strong><\/p>\n\n\n\n<p>Our model has trained rather well, but there is likely lots of room for improvement, perhaps using Comet\u2019s <a href=\"https:\/\/www.comet.com\/docs\/python-sdk\/introduction-optimizer\/\">Hyperparameter Optimization<\/a> tool. In a small amount of code we\u2019ve been able to extract mathematically complex MFCCs from audio data, build and train a neural network to classify audio based on those MFCCs, and evaluate our model on the test data.\u00a0<\/p>\n\n\n\n<p><strong>To get started with Comet, <\/strong><a href=\"https:\/\/live-cometml.pantheonsite.io\/pricing\/\"><strong>click here<\/strong><\/a><strong>. 
Comet is 100% free for public projects.<\/strong>\u00a0<\/p>\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n<h2 class=\"wp-block-heading\"><em>Want to stay in the loop?\u00a0<a href=\"https:\/\/info.comet.ml\/newsletter-signup\/?utm_campaign=tensorboard-integration&amp;utm_source=blog&amp;utm_medium=CTA\">Subscribe to the Comet Newsletter<\/a>\u00a0for weekly insights and perspective on the latest ML news, projects, and more.<\/em><\/h2>\n","protected":false},"excerpt":{"rendered":"<p>To view the code, training visualizations, and more information about the python example at the end of this post, visit the Comet project page.\u00a0 Introduction While much of the writing and literature on deep learning concerns computer vision and natural language processing (NLP), audio analysis\u2014a field that includes automatic speech recognition (ASR), digital signal processing, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":2058,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"customer_name":"","customer_description":"","customer_industry":"","customer_technologies":"","customer_logo":"","footnotes":""},"categories":[6,7],"tags":[],"coauthors":[106],"class_list":["post-2036","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","category-tutorials"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v25.9 (Yoast SEO v25.9) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>How to apply machine learning and deep learning methods to audio analysis - Comet<\/title>\n<meta name=\"description\" content=\"While much of the writing and literature on deep learning concerns computer vision and natural language processing (NLP), audio analysis\u2014a field that includes automatic speech recognition (ASR), digital signal processing, and music classification, tagging, and generation\u2014is a growing subdomain of deep 
learning applications. Some of the most popular and widespread machine learning systems, virtual assistants Alexa, Siri and Google Home, are largely products built atop models that can extract information from audio signals.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.comet.com\/site\/blog\/how-to-apply-machine-learning-and-deep-learning-methods-to-audio-analysis\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to apply machine learning and deep learning methods to audio analysis\" \/>\n<meta property=\"og:description\" content=\"While much of the writing and literature on deep learning concerns computer vision and natural language processing (NLP), audio analysis\u2014a field that includes automatic speech recognition (ASR), digital signal processing, and music classification, tagging, and generation\u2014is a growing subdomain of deep learning applications. 
Some of the most popular and widespread machine learning systems, virtual assistants Alexa, Siri and Google Home, are largely products built atop models that can extract information from audio signals.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.comet.com\/site\/blog\/how-to-apply-machine-learning-and-deep-learning-methods-to-audio-analysis\/\" \/>\n<meta property=\"og:site_name\" content=\"Comet\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/cometdotml\" \/>\n<meta property=\"article:published_time\" content=\"2019-11-19T06:17:15+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d38789edb-1.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1600\" \/>\n\t<meta property=\"og:image:height\" content=\"855\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Nikolas Laskaris\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Cometml\" \/>\n<meta name=\"twitter:site\" content=\"@Cometml\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Nikolas Laskaris\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"16 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"How to apply machine learning and deep learning methods to audio analysis - Comet","description":"While much of the writing and literature on deep learning concerns computer vision and natural language processing (NLP), audio analysis\u2014a field that includes automatic speech recognition (ASR), digital signal processing, and music classification, tagging, and generation\u2014is a growing subdomain of deep learning applications. 
Some of the most popular and widespread machine learning systems, virtual assistants Alexa, Siri and Google Home, are largely products built atop models that can extract information from audio signals.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.comet.com\/site\/blog\/how-to-apply-machine-learning-and-deep-learning-methods-to-audio-analysis\/","og_locale":"en_US","og_type":"article","og_title":"How to apply machine learning and deep learning methods to audio analysis","og_description":"While much of the writing and literature on deep learning concerns computer vision and natural language processing (NLP), audio analysis\u2014a field that includes automatic speech recognition (ASR), digital signal processing, and music classification, tagging, and generation\u2014is a growing subdomain of deep learning applications. Some of the most popular and widespread machine learning systems, virtual assistants Alexa, Siri and Google Home, are largely products built atop models that can extract information from audio signals.","og_url":"https:\/\/www.comet.com\/site\/blog\/how-to-apply-machine-learning-and-deep-learning-methods-to-audio-analysis\/","og_site_name":"Comet","article_publisher":"https:\/\/www.facebook.com\/cometdotml","article_published_time":"2019-11-19T06:17:15+00:00","og_image":[{"width":1600,"height":855,"url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d38789edb-1.jpg","type":"image\/jpeg"}],"author":"Nikolas Laskaris","twitter_card":"summary_large_image","twitter_creator":"@Cometml","twitter_site":"@Cometml","twitter_misc":{"Written by":"Nikolas Laskaris","Est. 
reading time":"16 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.comet.com\/site\/blog\/how-to-apply-machine-learning-and-deep-learning-methods-to-audio-analysis\/#article","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/blog\/how-to-apply-machine-learning-and-deep-learning-methods-to-audio-analysis\/"},"author":{"name":"engineering@atre.net","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/550ac35e8e821db8064c5bd1f0a04e6b"},"headline":"How to apply machine learning and deep learning methods to audio analysis","datePublished":"2019-11-19T06:17:15+00:00","mainEntityOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/how-to-apply-machine-learning-and-deep-learning-methods-to-audio-analysis\/"},"wordCount":2457,"publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/how-to-apply-machine-learning-and-deep-learning-methods-to-audio-analysis\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d38789edb-1.jpg","articleSection":["Machine Learning","Tutorials"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.comet.com\/site\/blog\/how-to-apply-machine-learning-and-deep-learning-methods-to-audio-analysis\/","url":"https:\/\/www.comet.com\/site\/blog\/how-to-apply-machine-learning-and-deep-learning-methods-to-audio-analysis\/","name":"How to apply machine learning and deep learning methods to audio analysis - 
Comet","isPartOf":{"@id":"https:\/\/www.comet.com\/site\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.comet.com\/site\/blog\/how-to-apply-machine-learning-and-deep-learning-methods-to-audio-analysis\/#primaryimage"},"image":{"@id":"https:\/\/www.comet.com\/site\/blog\/how-to-apply-machine-learning-and-deep-learning-methods-to-audio-analysis\/#primaryimage"},"thumbnailUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d38789edb-1.jpg","datePublished":"2019-11-19T06:17:15+00:00","description":"While much of the writing and literature on deep learning concerns computer vision and natural language processing (NLP), audio analysis\u2014a field that includes automatic speech recognition (ASR), digital signal processing, and music classification, tagging, and generation\u2014is a growing subdomain of deep learning applications. Some of the most popular and widespread machine learning systems, virtual assistants Alexa, Siri and Google Home, are largely products built atop models that can extract information from audio 
signals.","breadcrumb":{"@id":"https:\/\/www.comet.com\/site\/blog\/how-to-apply-machine-learning-and-deep-learning-methods-to-audio-analysis\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.comet.com\/site\/blog\/how-to-apply-machine-learning-and-deep-learning-methods-to-audio-analysis\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/blog\/how-to-apply-machine-learning-and-deep-learning-methods-to-audio-analysis\/#primaryimage","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d38789edb-1.jpg","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2022\/06\/img_5e28d38789edb-1.jpg","width":1600,"height":855,"caption":"Preprocessing"},{"@type":"BreadcrumbList","@id":"https:\/\/www.comet.com\/site\/blog\/how-to-apply-machine-learning-and-deep-learning-methods-to-audio-analysis\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.comet.com\/site\/"},{"@type":"ListItem","position":2,"name":"How to apply machine learning and deep learning methods to audio analysis"}]},{"@type":"WebSite","@id":"https:\/\/www.comet.com\/site\/#website","url":"https:\/\/www.comet.com\/site\/","name":"Comet","description":"Build Better Models Faster","publisher":{"@id":"https:\/\/www.comet.com\/site\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.comet.com\/site\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.comet.com\/site\/#organization","name":"Comet ML, 
Inc.","alternateName":"Comet","url":"https:\/\/www.comet.com\/site\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/","url":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","contentUrl":"https:\/\/www.comet.com\/site\/wp-content\/uploads\/2025\/01\/logo_comet_square.png","width":310,"height":310,"caption":"Comet ML, Inc."},"image":{"@id":"https:\/\/www.comet.com\/site\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/cometdotml","https:\/\/x.com\/Cometml","https:\/\/www.youtube.com\/channel\/UCmN63HKvfXSCS-UwVwmK8Hw"]},{"@type":"Person","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/550ac35e8e821db8064c5bd1f0a04e6b","name":"engineering@atre.net","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.comet.com\/site\/#\/schema\/person\/image\/027c18177377edf459980f0cfb83706c","url":"https:\/\/secure.gravatar.com\/avatar\/d002a459a297e0d1779329318029aee19868c312b3e1f3c9ec9b3e3add2740de?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/d002a459a297e0d1779329318029aee19868c312b3e1f3c9ec9b3e3add2740de?s=96&d=mm&r=g","caption":"engineering@atre.net"},"sameAs":["https:\/\/live-cometml.pantheonsite.io"],"url":"https:\/\/www.comet.com\/site\/blog\/author\/engineeringatre-net\/"}]}},"_links":{"self":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/2036","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/comments?post=2036"}],"version-history":[{"count":0,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/posts\/2036\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"h
ref":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media\/2058"}],"wp:attachment":[{"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/media?parent=2036"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/categories?post=2036"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/tags?post=2036"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.comet.com\/site\/wp-json\/wp\/v2\/coauthors?post=2036"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}