In a post last year I talked about the difference in musical preferences between the incoming New York Philharmonic music director, Jaap van Zweden, and his predecessor Alan Gilbert. There I simply compared counts of their most frequently performed composers. That gives you some idea of how they differ, but comparing plots of performance frequencies is cumbersome, especially if you want to compare multiple conductors and more than just their top 10 composers. In this post I apply ideal point modeling, from the psychometrics and political science literature, to the problem of modeling conductor preferences. This approach allows for easy visualization of latent structure in conductor preferences.

Ideal Point Models

Ideal point models originate from item response theory (IRT) in psychometrics. IRT is concerned with estimating latent ability by looking at a test taker’s responses to test items. By looking at many test takers’ correct and incorrect responses to many items, we can determine both the ability of the test takers and the difficulty of the items. Ideal point models in political science seek to estimate the latent ideological location (hence ideal point) of legislators (or Supreme Court justices, or anyone who votes) by using their voting behavior on things like bills. By looking at which legislators vote for or against which bills, we can determine their ideological preferences and the ideological bent of the bills. This political science interpretation is essentially what we need for finding conductor preferences, where we have conductors instead of legislators and musical pieces instead of bills. Of course there are no votes in music, but here we treat a conductor’s decision to ever play a piece with the New York Philharmonic as a ‘Yes’ vote for that piece, and never having played it as a ‘No’.

For this post we use the ideal function from the R package pscl by Simon Jackman, which implements the Bayesian ideal point model of Clinton, Jackman, & Rivers (2004). The model is as follows. Let $x_i \in \mathbb{R}^d$ be the ideal point of conductor $i$, where $d$ is usually 1 or 2. We define

$$\mu_{ij} = x_i^\top \beta_j - \alpha_j,$$

where $\mu_{ij}$ is the latent propensity for conductor $i$ to play piece $j$, $\alpha_j$ is the difficulty parameter for piece $j$, and $\beta_j$ is the discrimination parameter for piece $j$. This is related to the observed performance history by setting the probability that conductor $i$ plays piece $j$ to

$$P(y_{ij} = 1) = \Phi(\mu_{ij}),$$

where $y_{ij} = 1$ indicates that conductor $i$ has played piece $j$ and $\Phi$ is the standard normal CDF. The ideal function, as in the paper, puts zero-mean normal priors on $x_i$, $\beta_j$, and $\alpha_j$, with a large variance on the piece parameters. The model is fit by MCMC with data augmentation.

We see that a lower $\alpha_j$ will increase $\mu_{ij}$, increasing the probability of piece $j$ being played; it acts as an (inverse) popularity parameter for the piece. $\beta_j$ is a slope parameter that determines how much the propensity to play piece $j$ will change with conductor preference.
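To make the roles of the two piece parameters concrete, here is a minimal sketch in base R for the one-dimensional case. The function name play_prob, the argument names, and the toy parameter values are mine, chosen for illustration only; none of the numbers come from a fitted model.

```r
# Probit ideal point model in one dimension (illustrative sketch).
# propensity = ideal_point * discrimination - difficulty
play_prob <- function(ideal_point, difficulty, discrimination) {
  propensity <- ideal_point * discrimination - difficulty
  pnorm(propensity)  # standard normal CDF
}

# Three hypothetical conductors placed along one latent dimension.
ideal_points <- c(-1, 0, 1)

# Baseline piece: probability rises with the ideal point.
baseline <- play_prob(ideal_points, difficulty = 0, discrimination = 1)

# Lower difficulty: every conductor becomes more likely to play the piece.
popular <- play_prob(ideal_points, difficulty = -1, discrimination = 1)

# Zero discrimination: conductor preference is irrelevant to this piece.
flat <- play_prob(ideal_points, difficulty = 0, discrimination = 0)

# Negative discrimination: the piece appeals to the other end of the scale.
reversed <- play_prob(ideal_points, difficulty = 0, discrimination = -1)

print(round(rbind(baseline, popular, flat, reversed), 3))
```

Pieces with a discrimination near zero tell us little about where a conductor sits, while pieces with a large positive or negative discrimination are the ones that separate conductors along the latent dimension.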

Data

I used the New York Philharmonic data again but was more careful with the data munging. This time I used the tidyjson package to extract the performance data, and I hope this snippet is useful to other R users interested in this data set:

library(tidyjson)
library(tidyverse)

# Read the complete program archive as a tidyjson document
programs <- tidyjson::read_json(path = "Programs/json/complete.json", format = "json")

# One row per concert (date and venue) within each program
concerts <- programs %>%
  gather_keys %>% gather_array %>%
  spread_values(id = jstring("id"),
                program_id = jstring("programID"),
                orchestra = jstring("orchestra"),
                season = jstring("season")) %>%
  enter_object("concerts") %>% gather_array %>%
  spread_values(event_type = jstring("eventType"),
                location = jstring("Location"),
                venue = jstring("Venue"),
                date = jstring("Date"),
                time = jstring("Time")) %>%
  select(-document.id, -key, -array.index)

# One row per work (or movement) on each program
works <- programs %>%
  gather_keys %>% gather_array %>%
  spread_values(id = jstring("id"),
                program_id = jstring("programID")) %>%
  enter_object("works") %>% gather_array %>%
  spread_values(work_id = jstring("ID"),
                composer = jstring("composerName"),
                title = jstring("workTitle"),
                movement = jstring("movement"),
                conductor = jstring("conductorName")) %>%
  select(-document.id, -key, -array.index)

# Pair every concert of a program with every work on that program
performance <- left_join(concerts, works, by = "id")

There were other fields, like soloist, that I could have extracted but did not.

There were two duplication issues with the data. Of course for the ideal point model this doesn’t really matter since we only care whether a conductor ever performed a piece, not how many times. But removing duplicates will be useful for models that use count data.

  1. For subscription season concerts, conductors will perform the same program multiple times in a week. I treated this as one performance.
  2. Sometimes conductors will perform multiple movements of a single piece, but not the whole piece. The data has each movement as a separate entry. I treated multiple movements being performed in the same concert as one performance of the whole piece.
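One way to sketch both de-duplication steps in base R is shown below. This is not necessarily how the linked repository does it. The toy data frame and the piece_id column are hypothetical, and the sketch assumes that a work ID has the form "work*movement", so that everything from the asterisk onward can be dropped to recover the piece.

```r
# Toy stand-in for the joined performance table (hypothetical values).
perf <- data.frame(
  id        = c("p1", "p1", "p1", "p1", "p2", "p2"),
  date      = c("2015-01-08", "2015-01-08", "2015-01-09", "2015-01-09",
                "2015-02-01", "2015-02-01"),
  conductor = c("A", "A", "A", "A", "B", "B"),
  work_id   = c("100*1", "100*2", "100*1", "100*2", "200*", "100*"),
  stringsAsFactors = FALSE
)

# Assumption: the part before "*" identifies the piece, the rest the movement.
perf$piece_id <- sub("\\*.*$", "", perf$work_id)

# Keeping one row per (program, conductor, piece) collapses both the
# repeated concerts of a program and the separate movements of a piece.
dedup <- unique(perf[, c("id", "conductor", "piece_id")])

print(dedup)
```

With dplyr loaded, distinct(perf, id, conductor, piece_id) selects the same rows.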

After de-duplication I created an indicator matrix with conductors on the rows and pieces on the columns and passed that to ideal.
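The indicator matrix step might look roughly like the sketch below, again on hypothetical toy data. The helper fit_ideal is my own name, and the argument values passed to pscl::rollcall and pscl::ideal (vote codes, dimensions, iteration counts) are assumptions for illustration rather than the settings used for this post; the repository linked below has the actual code.

```r
# Toy de-duplicated performances (hypothetical conductors and pieces).
perf <- data.frame(
  conductor = c("A", "A", "B", "B", "C", "A"),
  piece_id  = c("w1", "w2", "w2", "w3", "w1", "w1"),
  stringsAsFactors = FALSE
)

# Cross-tabulate, then reduce counts to "ever played" (1) or "never" (0).
counts <- table(perf$conductor, perf$piece_id)
votes <- matrix(as.integer(counts > 0),
                nrow = nrow(counts),
                dimnames = list(rownames(counts), colnames(counts)))
print(votes)

# Hypothetical wrapper: treat the matrix as roll call data and fit the model.
# Defined but not called here, since a toy matrix this small is not
# meaningful input for the sampler.
fit_ideal <- function(votes, d = 1) {
  if (!requireNamespace("pscl", quietly = TRUE)) {
    stop("The pscl package is required for this step.")
  }
  rc <- pscl::rollcall(votes, yea = 1, nay = 0, missing = NA, notInLegis = 9,
                       legis.names = rownames(votes),
                       vote.names = colnames(votes))
  # Iteration settings are placeholders, not tuned values.
  pscl::ideal(rc, d = d, maxiter = 10000, burnin = 5000, thin = 100,
              normalize = TRUE, store.item = TRUE)
}
```

On the real data, calling fit_ideal(votes) would return the fitted object, with conductors in the role of legislators and pieces in the role of bills.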

See all the data munging and analysis code at: https://github.com/delimited0/PerformanceHistory/tree/master/IdealPoint

Results

The results are in part 2 of this post.

References

  1. Clinton, J., Jackman, S., & Rivers, D. (2004). The Statistical Analysis of Roll Call Voting: A Unified Approach. American Political Science Review, 98(2), 355–370.