This is the first article in a four-part series about how Bronto employees collaborated to transform raw, anonymized data into an interactive audio and visual installation for Moogfest 2017.
In February, our General Manager Carolyn Sparano approached the engineering team to propose we create an exhibit for Moogfest 2017. She wanted to showcase our work with big data to power communication between retail brands and shoppers.
We decided to create an installation that would transform data into sight and sound for the annual festival, which is held on the same campus as our office – the American Tobacco Campus in Durham, NC. The event draws creative people from around the globe for four days of performances and exhibits that celebrate technology, music and art.
With just three months to conceive, design and execute our project, time was going to be tight. Fortunately, my background in music, design and engineering inspired me to jump right in, knowing generally what needed to be done. I had never actually done this kind of project before, but I understood the various aspects of work it would require. I have worked a lot with MIDI (Musical Instrument Digital Interface) but I had never used data as source material. With the help of other engineers at Bronto, I was able to quickly access and anonymize a set of time and event data from the busiest retail week of 2016: Black Friday and Cyber Monday.
Things were about to get interesting.
Once we realized the scale of the data available, several challenges quickly became apparent:
- What can we do with the volume of material available?
- How can we format the data in a way that can be converted to MIDI?
- Is there existing software that can convert time-series data into MIDI, and if not, can we create it?
- How can we transform source data to create variety in the resulting MIDI without fundamentally changing the data?
Black Friday and Cyber Monday were an easy target. That time period would show a wide range of activity and make for more dynamic music. Our source data is stored in binary format, so we needed a process to retrieve and decode the data into information that could then be transformed into MIDI.
In our case, we only needed timestamps and counts: specifically, the timestamp at which a given count of events occurred. That made this step easy, and we could produce a basic data set in CSV format (the accompanying image shows the raw data in CSV format).
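As a rough illustration of what such a data set could look like once loaded, here is a minimal sketch. The two-column layout (an epoch timestamp in milliseconds, then a count) and the function name `read_counts` are assumptions made for the example, not the actual export format.

```python
import csv
import io


def read_counts(fileobj):
    """Read (timestamp_ms, count) pairs from a two-column CSV.

    The column order and millisecond epoch timestamps are assumptions.
    """
    rows = []
    for record in csv.reader(fileobj):
        if not record:
            continue  # skip blank lines
        rows.append((int(record[0]), int(record[1])))
    return rows


# Hypothetical sample: two consecutive milliseconds with 3 and 7 events.
sample = "1480032000000,3\n1480032000001,7\n"
rows = read_counts(io.StringIO(sample))
```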
Next, we needed to convert this CSV data to MIDI. We needed some kind of tool that could receive raw data, convert it to MIDI, and then output it to a file that is properly formatted according to standard MIDI specifications.
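To give a sense of what "properly formatted" involves, the sketch below builds the bytes of a very small Standard MIDI File (format 0, one track) using only the Python standard library. It is a standalone illustration, not the tool we ended up using, and the helper names `vlq` and `build_midi` are made up for the example.

```python
import struct


def vlq(value):
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def build_midi(notes, ticks_per_beat=480):
    """Return the bytes of a format-0 MIDI file.

    notes: list of (start_tick, duration_ticks, pitch, velocity) tuples.
    """
    events = []
    for start, duration, pitch, velocity in notes:
        events.append((start, bytes([0x90, pitch, velocity])))   # note on, channel 0
        events.append((start + duration, bytes([0x80, pitch, 0])))  # note off
    events.sort(key=lambda event: event[0])

    track = bytearray()
    now = 0
    for tick, message in events:
        track += vlq(tick - now) + message  # delta time, then the event
        now = tick
    track += vlq(0) + b"\xff\x2f\x00"  # end-of-track meta event

    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_beat)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)


# One quarter note on middle C (pitch 60) at velocity 100.
data = build_midi([(0, 480, 60, 100)])
```

Writing `data` to a file opened in binary mode should give something a MIDI player can open. In practice a library handles these details, which is why finding one mattered.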
So, the hunt was on for a tool that would keep this step from becoming an obstacle to the project. Surely, someone had tried something similar and had written about it. I found MIDITime and, through it, midiutil. Both are Python libraries designed to do just what we needed.
With the help of a few Bronto engineers, I wrote a wrapper library in Python around MIDITime to automate the generation of MIDI files (more on that later) and to manipulate the source data in certain ways. This was a huge step forward in our project, because now we could generate a lot of different kinds of MIDI relatively quickly.
At this point, we had a lot of data to work with and a consistent way to generate MIDI from that data, but the data was very granular, down to the millisecond. As shown in Figure 2, the resulting MIDI was super-dense and noisy, which is not a bad thing, but to create music, we needed more variety in textures and shapes to make it interesting.
Figure 2: Raw, uncompressed data converted to MIDI. The solid horizontal bands are actually individual notes piled on top of one another.
This is where manipulating the source data came into play. We chose to use time compression, primarily because it is simple and effective. Because we record source data by the millisecond, we have an easy way to create groups or “bins” into which we can put relevant data. For example, let’s say we want to create a one-second bin. We know that we can put up to 1,000 milliseconds into that bin.
As long as we add up the counts for each of those milliseconds, we preserve the integrity of the original source data:
- 1,000 milliseconds at 1 count/millisecond = 1,000 events/second
- 1,000 milliseconds at 10 counts/millisecond = 10,000 events/second
- 60,000 milliseconds at 1 count/millisecond = 60,000 events/minute
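The binning described above can be sketched in a few lines of Python. This is a minimal illustration rather than our wrapper library; the function name `compress` and the `(timestamp_ms, count)` row shape are assumptions.

```python
from collections import defaultdict


def compress(rows, bin_ms):
    """Sum per-millisecond counts into bins that are bin_ms wide.

    rows: iterable of (timestamp_ms, count) pairs.
    Returns a sorted list of (bin_start_ms, total_count) pairs.
    """
    bins = defaultdict(int)
    for timestamp_ms, count in rows:
        bin_start = timestamp_ms - timestamp_ms % bin_ms
        bins[bin_start] += count
    return sorted(bins.items())


# Two seconds of data at 1 count per millisecond, grouped into one-second bins.
rows = [(t, 1) for t in range(2000)]
one_second = compress(rows, 1000)
```

Because every count lands in exactly one bin, the total number of events is the same before and after compression.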
The bigger the bin, the sparser the resulting MIDI. The examples below reflect 30 minutes of data and show how we can go from dense and noisy MIDI to a thinner, less noisy line by applying different levels of time compression. Figure 3 applies 1x time compression. Here, our bin size is one second.
Figure 3: 1x time compression applied to source data
Figure 4 shows how we can start to create a more melodic line or shape to lay over some of the noisier MIDI by increasing the compression rate. At 30x, our bin size here is 30 seconds.
Figure 4: 30x time compression applied to source data
Figure 5 uses a 10-minute bin (600 seconds) to create an even thinner line.
Figure 5: 600x time compression applied to source data
In conjunction with time compression, a variety of MIDI settings can shape the resulting MIDI. MIDITime works by setting how many seconds of music represent one year of data, then plotting the time-series data within that span. It also allows you to reverse the output and to use a logarithmic or linear algorithm to plot the notes.
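To make the linear, logarithmic, and reversed options more concrete, here is a standalone sketch of the idea: scale a count to a 0-to-1 position within the data's range, then map that position onto a range of MIDI note numbers. This is not MIDITime's code; `scale_pct`, `pct_to_pitch`, and the default note range are hypothetical choices for the example.

```python
import math


def scale_pct(lo, hi, value, reverse=False, log=False):
    """Return where value falls between lo and hi, as a fraction from 0 to 1.

    With log=True the position is computed on a logarithmic scale,
    which requires lo and value to be positive.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    if log:
        pct = (math.log(value) - math.log(lo)) / (math.log(hi) - math.log(lo))
    else:
        pct = (value - lo) / (hi - lo)
    return 1.0 - pct if reverse else pct


def pct_to_pitch(pct, low_note=36, high_note=84):
    """Map a 0-to-1 fraction onto a MIDI note number in [low_note, high_note]."""
    return low_note + round(pct * (high_note - low_note))


# A count of 50 in a 0-100 range lands in the middle of the note range.
middle = pct_to_pitch(scale_pct(0, 100, 50))
```

A logarithmic scale spreads out small counts and squeezes large ones together, which can keep quiet periods from collapsing onto a single note.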
In addition to those settings, you can set the beats per minute (BPM), the note length, and the note velocity (how hard the note would be pressed on a real piano or keyboard). Beyond time compression, you can apply moving averages or whatever means of grouping data you would like. It really depends how much time you have to play around with settings and how pure you want the resulting MIDI to be. In this case, we wanted the MIDI to represent the actual activity as much as possible.
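As one example of an alternative grouping, a trailing moving average could look like the sketch below. The function name and window handling are assumptions for illustration. Unlike the binning described earlier, averaging smooths the values rather than preserving the event totals, which is one reason to use it sparingly if the MIDI is meant to stay faithful to the data.

```python
from collections import deque


def moving_average(counts, window):
    """Return the trailing moving average of counts.

    Early values are averaged over however many points are available so far.
    """
    recent = deque(maxlen=window)
    total = 0
    averages = []
    for count in counts:
        if len(recent) == window:
            total -= recent[0]  # this value is about to fall out of the window
        recent.append(count)
        total += count
        averages.append(total / len(recent))
    return averages


smoothed = moving_average([1, 2, 3, 4], window=2)
```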
Because we’re working with time-series data, we can convert it to actual time when creating the MIDI. Ten minutes of source data can be played in 10 actual minutes of music. However, we need variety, so we don’t always want a one-to-one conversion. We can take those 10 minutes and crunch them into five minutes or three minutes, for example, to make the music play faster.
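The arithmetic behind that conversion is simple enough to sketch. In the example below, `speedup` and `bpm` are hypothetical parameter names: a speedup of 1 plays the data in real time, and a speedup of 2 plays 10 minutes of data in five minutes.

```python
def to_beats(timestamp_ms, start_ms, speedup, bpm):
    """Convert a data timestamp to a position in beats.

    speedup: how many seconds of data fit into one second of music.
    bpm: tempo of the resulting MIDI in beats per minute.
    """
    playback_seconds = (timestamp_ms - start_ms) / 1000.0 / speedup
    return playback_seconds * bpm / 60.0


ten_minutes_ms = 10 * 60 * 1000

# At 120 BPM, 10 minutes of real-time playback is 1,200 beats.
real_time = to_beats(ten_minutes_ms, 0, speedup=1, bpm=120)

# Doubling the speed fits the same data into 600 beats (five minutes).
double_speed = to_beats(ten_minutes_ms, 0, speedup=2, bpm=120)
```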
Now that we had data, a consistent way to convert it to MIDI and a coherent conversion strategy, we could begin to look at the bigger picture and pull it all together. The decisions about settings were nuanced, and trying each combination manually became too unwieldy, so we needed to automate the generation of the MIDI files. I wrote some Python code to run through the permutations of all of these settings and build a library of MIDI files from which to choose.
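Running through the permutations can be sketched with `itertools.product`. The setting names, values, and file-naming scheme below are assumptions made for the example; the actual code covered whichever settings we were experimenting with at the time.

```python
from itertools import product


def build_jobs(bin_seconds, bpms, scales, reverses):
    """Return one job description per combination of settings."""
    jobs = []
    for bin_s, bpm, scale, reverse in product(bin_seconds, bpms, scales, reverses):
        suffix = "_rev" if reverse else ""
        jobs.append({
            "bin_seconds": bin_s,
            "bpm": bpm,
            "scale": scale,
            "reverse": reverse,
            "outfile": f"bin{bin_s}s_bpm{bpm}_{scale}{suffix}.mid",
        })
    return jobs


# 3 bin sizes x 2 tempos x 2 scales x 2 directions = 24 candidate files.
jobs = build_jobs([1, 30, 600], [90, 120], ["linear", "log"], [False, True])
```

Each job would then be handed to whatever function renders a single MIDI file, so adding one more value to any list multiplies the size of the library.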
I was honored to be a part of this project. I learned a lot and had a lot of fun working on all the different challenges that came up and working with many of my colleagues to resolve them. Our Moogfest 2017 installation was a hit, and we are thankful to everyone on the Bronto team who helped to make it happen.
Check out the next article in this series, about how we made the MIDI data come alive into music that festival-goers could experience and manipulate.