Music Information Retrieval, a subdomain of Information Retrieval, is a relatively young but rapidly growing field. Music Information Retrieval, or MIR, aims to develop “... efficient and intelligent methods to analyze, recognize, retrieve and organize music” (Lidy & Rauber, 2009, p. 448). MIR research is multinational and multidisciplinary, with researchers from disciplines including musicology, music theory, digital librarianship, computer science, human-computer interaction, digital humanities, and psychology. “Uniting this seemingly disparate aggregation is the common goal of providing the kind of robust access to the world's vast store of music—in all its varied forms (i.e., audio, symbolic, and metadata)—that we currently provide for textual materials” (Downie, 2004, p. 1033).
MIR takes two basic forms: data-based and content-based. Data-based MIR retrieves information external to the music document itself, such as its composer, performer, instrumentation, or genre, or relationships between music documents (for example, between two different performances of the same song). In contrast, content-based MIR attempts to retrieve meaningful musical information from the music itself. Content-based MIR comes in two types: audio and symbolic.
Symbolic music information is music information represented as music notation. Symbolic music can be represented in a score (of a composition for orchestra, for example), a piece of sheet music (of a popular song from a musical with piano accompaniment, for example), a lead sheet used by a jazz musician, and so on. A music score, or similar music document, is “a structured organization of symbols, which correspond to acoustic events and describe the gestures needed for their production” (Orio, 2006, p. 14). In this report, I will briefly explain symbolic MIR, present some issues in this domain, and introduce an exemplar symbolic MIR system, the Josquin Research Project.
The first thing to understand about MIR is that content-based music IR is far more complicated than text IR. Szeto (2018) helps to explain why:
Despite its complexity, content-driven symbolic MIR continues to grow as a field, and its systems and tools are gradually becoming more efficient, effective, and accessible.
MIR issues – Literature Review
Optical Music Recognition
For a large collection of images of music scores to be searchable at the level of music information (content), each image must be converted into a true symbolic representation of the music score, a process called Optical Music Recognition. Good quality Optical Music Recognition, or OMR, is necessary for the sustainability of large-scale symbolic MIR systems. Burgoyne et al. (2015, p. 213) emphasize that “[w]ithout optical music recognition technology to convert scanned images of printed music to a machine‐readable encoding, all musical data had to be entered manually, which was (and still is) cumbersome, expensive, and error‐prone.”
In the MIR field, improving OMR is an ongoing effort, and innovative developments and applications of OMR are advancing the field. In one innovative case, MIR researchers at McGill University (Hankinson et al., 2012) applied Optical Music Recognition and Optical Character Recognition to scan The Liber Usualis, a sizable compendium of the most common chants used by the Catholic Church➀, and make its contents searchable, as the basis for their first large-scale production: liber.simssa.ca. This service book uses square-note neume musical notation, the main notation type used in the Renaissance.
This project is a good example of how complex and creative symbolic MIR system development can be. The steps in making the Liber Usualis’s content searchable included a) detecting meaningful page elements (see figure 5), b) separating the music notation layers of each page from the text layers, c) processing them with OMR and OCR respectively, d) isolating neume shapes from staff lines, e) classifying neume shapes, and f) adding the staff lines back in. The steps continued with g) identifying the clef shape and position for each staff, h) correlating each clef with a staff line, i) identifying the initial pitch of each neume from its relative position, and j) assigning a pitch class name to each subsequent pitch in that neume. A particularly novel feature of this project was using class names to help recognize a neume group’s pitch content: for compound neumes, the class name for a shape was constructed from its pitch contour. “For complex compound neumes, direction information was encoded in the class name. The result was a set of class names where the specific pitch contour of a neume could be reconstructed from knowing only its starting pitch” (Hankinson et al., 2012, p. 906). See figure 6.

Despite the successes of applying OMR technologies and overcoming complex issues, such as encoding compound Renaissance neumes as in the above example, OMR still poses challenges to the successful development and implementation of MIR systems. In qualitative research interviewing academic music librarians about digitizing musical scores, Laplante & Fujinaga (2016) found that librarians consider using OMR technologies successfully quite a challenge. OMR technologies are not yet as advanced or as user-friendly as OCR. As a result, libraries must allocate considerable time and resources to extracting the music notation of scores.
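The contour-based class-name idea from the Hankinson et al. project can be sketched in a few lines; the class-name format, shape labels, and pitch spelling below are invented for illustration and are not the project's actual encoding.

```python
# A minimal sketch (not the SIMSSA code) of contour-encoded neume class names:
# given only a starting pitch and a class name encoding the direction and step
# size of each subsequent note, the neume's full pitch sequence can be rebuilt.
# The 'shape.u2.d1' token format here is hypothetical.

# Diatonic pitch names in staff order (octave handling kept deliberately simple).
STEPS = ["c", "d", "e", "f", "g", "a", "b"]

def reconstruct_pitches(start_pitch: str, class_name: str) -> list[str]:
    """Rebuild a neume's pitches from its starting pitch and a hypothetical
    class name such as 'torculus.u2.d1' (up 2 diatonic steps, then down 1)."""
    index = STEPS.index(start_pitch)
    pitches = [start_pitch]
    # Skip the shape label; each remaining token is a direction + step count.
    for token in class_name.split(".")[1:]:
        direction, steps = token[0], int(token[1:])
        index += steps if direction == "u" else -steps
        pitches.append(STEPS[index % len(STEPS)])
    return pitches
```

Under this toy scheme, `reconstruct_pitches("c", "torculus.u2.d1")` walks up a third and back down a step, which mirrors the paper's point that the contour, not the absolute pitches, lives in the class name.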
As well, because access to encoded digital music scores is still limited, music researchers often create their own corpora of encoded scores, frequently with their own tools (Laplante & Fujinaga, 2016). Libraries could help alleviate a number of these problems, but librarians will first need to become more comfortable and competent with OMR technologies.
OMR technologies also pose problems when the scores being digitized are handwritten, or when the music is highly rhythmically complex. In his computational musicological study of 20th-century composer Gunther Schuller, Bush (2018) found that OMR success rates for correctly recognizing musical material drop considerably in both cases. This issue could discourage researchers and librarians from digitizing complex contemporary compositions or lesser-known works available only as handwritten scores, limiting access to these types of music documents.
Musical expertise & labelling symbolic music information
Advances in OMR technology have increased the amount of symbolic music data available to students and researchers as collections of searchable digital scores have grown. However, much of this data is unlabelled, which limits the effectiveness of search results. This is partly because, as Devaney et al. (2015) explain, accurately identifying and labelling symbolic music data requires a high level of music domain knowledge. Because such knowledge is quite specialized, this may hamper the development of effective searchable digital score collections, since the limited number of music librarians and MIR researchers may become over-extended.
Users’ needs overlooked
MIR research has largely overlooked the needs of MIR system users, and as Goodchild (2017) points out, the studies that have taken users’ information needs and behaviours into account have had little influence on the MIR field. Not understanding music information users will impede the development of useful features for digital music libraries. The complexity of how and why people seek music information may compound this problem. Goodchild adds that, compared with text-centred library use, music information-seeking behaviour is more complex, with users seeking text materials in addition to music documents (recordings and conventional and digital scores). Locating musical materials is also often complicated by the multiple manifestations of a musical work, among other issues.
As well, Rousi et al. (2018) point out that previous MIR studies addressing relevance have been decidedly system-oriented rather than user-oriented. Importantly, Rousi et al. note that existing MIR research has given little attention to how professional musicians evaluate the relevance of music information. To better understand the relationship between the music information itself and a performer’s situation when seeking it to solve a problem, Rousi et al. studied situational relevance for music performers. Their results show that many classical musicians understand music notation as the foundation of gestural musical language ➁, and that the participants saw music notation as the most important type of music information for helping them improve as interpreters of music pieces. Designing symbolic MIR systems with performers’ situational relevance in mind could yield more effective systems that would better serve music students studying performance or conducting, as well as professional musicians.
As a rare exception to system-centred MIR work, Devaney & Léveillé Gauvin (2019) did take users’ needs into account when designing recent extensions to Humdrum and MEI. They based their extensions on qualitative interviews with scholarly music researchers, finding that the researchers needed information on timing, loudness, pitch, and timbre in musical performances.
Music traditions of the world & notation systems
The vast majority of symbolic music encoding systems, as Mammen et al. (2016) rightly point out, have been developed to encode music written in standard Western music notation. Such a focus “… is regrettably limiting, with cultural, theoretical, and practical consequences for MIR” (Lee et al., 2002, p. 1). Further, as Ünal et al. (2014) note, most MIR systems have been built to search for music information in music based on the Western 12-tone tuning system. Despite this, researchers are developing important MIR systems specifically for music traditions whose notation systems and musical features differ distinctly from Western music. Most success has come in systems built for Indian classical music, with work on Turkish music the second most common.
An impressive development is iSargam, which encodes Sargam, the notation system of Carnatic music (Southern Indian classical music). Mammen et al. (2016) developed this unique approach, grounded in Indian music theory, as a machine-readable music notation system for Carnatic music supporting playback, notation printing, searching, and retrieval within a composition. iSargam uses a Unicode-based music notation representation language in which Unicode characters represent musical features of Carnatic music. The encoding system also handles whether a music symbol is grouped or single, following Carnatic music features and music theory: “So, for encoding of a notated composition, we take every symbol and check if it can be further split into different characters…” (Mammen et al., 2016, p. 6). See figure 7.
The system encodes meaningful musical information for each note (called a swara), including a) whether the note is played with expressions (gamaka), b) the octave or range of notes it belongs to (anumandra), c) its duration, and d) additional features, such as up-bow or down-bow if played on a bowed instrument (see figure 7).
In another example from India, Chordia (2007) developed a representation system based on Humdrum encoding syntax to represent Hindustani music (Northern Indian classical music) vocal and instrumental compositions (bandishes and gats, respectively) written in a notation system called Bhatkhande. Srinivasamurthy & Chordia (2012) then extended the capabilities of this system to encode Carnatic music, making changes to incorporate Carnatic features such as gamakas and highly complex rhythms.
Researchers have also worked on symbolic music data and the representation of classical Turkish Makam music ➂. Building on previous n-gram analysis ➃ of Turkish Makam music by Alpkoçak and Gedik (2006), Ünal et al. (2014) developed a method for classifying makams hierarchically from symbolic data.
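The n-gram idea behind this line of work can be sketched as follows: a performance, reduced to a sequence of pitch or interval symbols, is profiled by its n-gram counts, and profiles are compared across makams. The symbols and the overlap score below are illustrative only, not the actual method of Ünal et al.

```python
# A sketch of n-gram profiling for symbolic makam classification. The pitch
# symbols and the similarity measure are toy choices for illustration.
from collections import Counter

def ngram_profile(symbols: list[str], n: int = 3) -> Counter:
    """Count all length-n subsequences (n-grams) in a symbolic pitch sequence."""
    return Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))

def similarity(profile_a: Counter, profile_b: Counter) -> int:
    """A crude overlap score: the shared n-gram mass between two profiles.
    A real classifier would assign a piece to the makam whose reference
    profile it overlaps most."""
    return sum(min(profile_a[g], profile_b[g]) for g in profile_a)
```

A piece would be labelled with the makam whose reference profile maximizes `similarity`; hierarchical variants, as in Ünal et al., first narrow the decision to a family of related makams before classifying within it.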
An example from East Asia is an early exploratory project using XML encoding to represent a genre of traditional Korean court music that uses a notation system called Chôngganbo (Lee et al., 2002) ➄. Lee et al. stress that Korean traditional music cannot be adequately represented by symbolic music encoding systems developed for music written in Western notation. These researchers aimed to develop a representational framework matching the expressive features represented in Chôngganbo, such as timbral (sound quality) variety, which is a highly meaningful feature of Korean traditional music.
Despite these successes, it is important to point out a major limitation of symbolic MIR system development: the majority of music traditions in the world, even most highly complex ones, do not use a music notation system at all. Panteli et al. (2018) remind readers of this fact in their review of manual and computational approaches in music corpus research and analysis of world music.
➃ N-grams are widely used in text mining and natural language processing tasks.
➄ Unfortunately, Lee’s system seems not to have led to other attempts to represent music of traditional Korean music genres in symbolic MIR systems.
Meaningful music features in different genres & traditions
Because meaningful musical features differ markedly across music traditions and genres, compiling a collection of encoded symbolic music representative of a wide variety of types of music is highly challenging, if not impossible. Panteli et al. (2018) also plainly state that the different notation languages and formats used in different traditions make comparisons difficult. Their findings reveal that successful computational studies of notated world music have focused on specific traditions or genres (e.g., ragtime (Volk & de Haas, 2013)) or on music from a particular geographical region (e.g., Cretan folk songs (Conklin & Anagnostopoulou, 2011)). Panteli et al. therefore conclude that audio recordings are more practical objects of analysis for comparative studies of world music. Audio music documents may be more practical, but given the differences in meaningful musical features across so many types of music, I am skeptical that an audio MIR system could be developed that is effective in terms of precision. Given the complexity and variety of the world's music, the number and variety of musical features that such a system would have to make searchable would be staggering. Further, I am uncertain that an AI could learn to recognize the variety of meaningful musical features across cultures well enough without bias toward some traditions’ features over others.
References

Abdallah, S., Benetos, E., Gold, N., Hargreaves, S., Weyde, T., & Wolff, D. (2017). The Digital Music Lab: A big data infrastructure for digital musicology. Journal on Computing and Cultural Heritage, 10(1), 1–21. https://doi.org/10.1145/2983918
Alpkoçak, A., & Gedik, A. C. (2006). Classification of Turkish songs according to makams by using n-grams. Proceedings of the 15th Turkish Symposium on Artificial Intelligence and Neural Networks. TAINN, Mugla, Turkey. http://people.cs.deu.edu.tr/alpkocak/Papers/TAINN2006-AdilAlpkocak.pdf
Behrendt, I., Bain, J., & Helsen, K. (2017). MEI Kodierung der frühesten Notation in linienlosen Neumen [MEI encoding of the earliest notation in staffless neumes]. In Kodikologie und Paläographie im Digitalen Zeitalter 4 (pp. 275–291). Books on Demand. https://kups.ub.uni-koeln.de/7774/1/kpdz4Online.pdf
Burgoyne, J. A., Downie, J. S., & Fujinaga, I. (2015). Music information retrieval. In A new companion to digital humanities (pp. 213–228). John Wiley & Sons. https://doi.org/10.1002/9781118680605.ch15
Bush, C. (2018). Schuller’s musical space: Analysis of registration in Gunther Schuller’s solo and chamber compositions for clarinet using pitch field, Music21, and statistical process analysis [PhD dissertation, New York University]. http://ezproxy.library.ubc.ca/login?url=https://search.proquest.com/docview/2018339406?accountid=14656
Byrd, D. (2001). Music-notation searching and digital libraries. Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries - JCDL ’01, 239–246. https://doi.org/10.1145/379437.379662
CCARH. (2018). Music 254 [Wiki]. Packard Humanities Institute’s Center for Computer Assisted Research in the Humanities at Stanford University. https://wiki.ccarh.org/wiki/Music_254#The_Humdrum_Toolkit
Chordia, P. (2007). A system for the analysis and representation of bandishes and gats using Humdrum syntax. Proceedings of the 2007 Frontiers of Research in Speech and Music Conference. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.486.3397&rep=rep1&type=pdf
Conklin, D., & Anagnostopoulou, C. (2011). Comparative pattern analysis of Cretan folk songs. Journal of New Music Research, 40(2), 119–125. https://doi.org/10.1080/09298215.2011.573562
Cook, N. (2004). Computational and comparative musicology. In Empirical musicology: Aims, methods, prospects. (pp. 103-126.). Oxford University Press. http://ezproxy.library.ubc.ca/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=rft&AN=A498907&site=ehost-live&scope=site
Devaney, J., & Léveillé Gauvin, H. (2019). Encoding music performance data in Humdrum and MEI. International Journal on Digital Libraries, 20(1), 81–91. https://doi.org/10.1007/s00799-017-0229-3
Devaney, J., Arthur, C., Condit-Schultz, N., & Nisula, K. (2015). Theme And Variation Encodings with Roman Numerals (TAVERN): A new data set for symbolic music analysis. Proceedings of the International Society of Music Information Retrieval (ISMIR) Conference, 728–734. http://ismir2015.uma.es/articles/261_Paper.pdf
Dougan, K. (2015). Finding the right notes: An observational study of score and recording seeking behaviors of music students. The Journal of Academic Librarianship, 41(1), 61–67. https://doi.org/10.1016/j.acalib.2014.09.013
Downie, J. S. (2004). A sample of music information retrieval approaches. Journal of the American Society for Information Science and Technology, 55(12), 1033–1036. https://doi.org/10.1002/asi.20054
Fraunhofer IDMT. (2020). Query by humming: Technology. Fraunhofer Institute for Digital Media Technology IDMT. https://www.idmt.fraunhofer.de/en/institute/projects-products/query-by-humming.html#tabpanel-2
Godøy, R. I., & Jensenius, A. R. (2009). Body movement in music information retrieval. Proceedings of the 10th International Society for Music Information Retrieval Conference, 45–50. http://urn.nb.no/URN:NBN:no-23872
Goodchild, M. (2017). Digital music libraries: Librarian perspectives and the challenges ahead. CAML Review / Revue de l’ACBM, 45(2–3). https://doi.org/10.25071/1708-6701.40305
Hankinson, A., Burgoyne, J. A., Vigliensoni, G., & Fujinaga, I. (2012). Creating a large-scale searchable digital collection from printed music materials. Proceedings of the 21st International Conference Companion on World Wide Web - WWW ’12 Companion, 903. https://doi.org/10.1145/2187980.2188221
Humdrum Toolkit. (2020). Humdrum. https://www.humdrum.org/
Huron, D. (2020). Representing music using **kern (I). Humdrum User Guide. https://www.humdrum.org/guide/ch02/
Josquin Research Project. (2020). The Josquin Research Project. https://josquin.stanford.edu/
Ju, Y., Pedro, G. P., MacKay, C., Hopkins, E. A., Cumming, J., & Fujinaga, I. (2019, May 30). Enabling music search and analysis: A database for symbolic music files. Music Encoding Conference, University of Vienna. https://music-encoding.org/conference/2019/abstracts_mec2019/MEC%20SIMSSA%20DB.pdf
Kim, S. (2015). Towards organizing and retrieving classical music based on functional requirements for bibliographic records (FRBR) [PhD dissertation, University of Pittsburgh]. ProQuest Dissertations & Theses Global (1749035397). http://ezproxy.library.ubc.ca/login?url=https://search-proquest-com.ezproxy.library.ubc.ca/docview/1749035397?accountid=14656
Kirkman, A. (2015). Review: The Josquin Research Project by Jesse Rodin and Craig Sapp. Journal of the American Musicological Society, 68(2), 455–465. https://doi.org/10.1525/jams.2015.68.2.455
Laplante, A., & Fujinaga, I. (2016). Digitizing musical scores: Challenges and opportunities for libraries. Proceedings of the 3rd International Workshop on Digital Libraries for Musicology - DLfM 2016, 45–48. https://doi.org/10.1145/2970044.2970055
Lee, J. H., Downie, J. S., & Renear, A. (2002). Representing Korean traditional musical notation in XML. Proceedings of the Third International Conference on Music Information Retrieval. ISMIR, IRCAM Centre Pompidou, Paris. https://ismir2002.ismir.net/proceedings/03-SP01-4.pdf
Lidy, T., & Rauber, A. (2009). Music information retrieval. In Handbook of research on digital libraries: Design, development, and impact (pp. 448–465). IGI Global.
Løvhaug, L. E. (2006). Digital archive for scores and music [Master’s Thesis, Norges teknisk-naturvitenskapelige universitet]. https://ntnuopen.ntnu.no/ntnu-xmlui/handle/11250/261886
Mammen, S., Krishnamurthi, I., Varma, A. J., & Sujatha, G. (2016). iSargam: Music notation representation for Indian Carnatic music. EURASIP Journal on Audio, Speech, and Music Processing, 2016(1), 5. https://doi.org/10.1186/s13636-016-0083-z
MEI. (2020a). An introduction to MEI. Music Encoding Initiative. https://music-encoding.org/about/
MEI. (2020b). MEI guidelines (4.0.1). Music Encoding Initiative. https://music-encoding.org/guidelines/v4/content/
Orio, N. (2006). Music retrieval: A tutorial and review. Foundations and Trends® in Information Retrieval, 1(1), 1–96. https://doi.org/10.1561/1500000002
Panteli, M., Benetos, E., & Dixon, S. (2018). A review of manual and computational approaches for the study of world music corpora. Journal of New Music Research, 47(2), 176–189. https://doi.org/10.1080/09298215.2017.1418896
Rizo, D., & Marsden, A. (2019). An MEI-based standard encoding for hierarchical music analyses. International Journal on Digital Libraries, 20(1), 93–105. https://doi.org/10.1007/s00799-018-0262-x
Rousi, A. M., Savolainen, R., Harviainen, M., & Vakkari, P. (2018). Situational relevance of music information modes. Journal of Documentation, 74(5), 1008–1024. https://doi.org/10.1108/JD-10-2017-0149
Simon, S. J. (2005). A multi-dimensional entropy model of jazz improvisation for music information retrieval [PhD dissertation]. University of North Texas.
Srinivasamurthy, A., & Chordia, P. (2012). A unified system for analysis and representation of Indian classical music using Humdrum syntax. Proceedings of the 2nd CompMusic Workshop, 38–42. http://mtg.upf.edu/system/files/publications/CompMusicWorkshop_2.pdf
Szeto, K. (2018). The roles of academic libraries in shaping music publishing in the digital age. Library Trends, 67(2), 303–318. https://doi.org/10.1353/lib.2018.0038
Teich Geertinger, A., & Pugin, L. (2011). MEI for bridging the gap between music cataloguing and digital critical editions. Die Tonkunst: Magazin Für Klassische Musik Und Musikwissenschaft, 5(3), 289–294. http://www.die-tonkunst.de/dtk_ausgaben/dtk_1103_sample.pdf
Ünal, E., Bozkurt, B., & Karaosmanoğlu, M. K. (2014). A hierarchical approach to makam classification of Turkish makam music, using symbolic data. Journal of New Music Research, 43(1), 132–146. https://doi.org/10.1080/09298215.2013.870211
Veltkamp, R. C., Wiering, F., & Typke, R. (2008). Content based music retrieval. In B. Furht (Ed.), Encyclopedia of multimedia (pp. 97–98). Springer US. https://doi.org/10.1007/978-0-387-78414-4_272
Volk, A., & de Haas, W.B. (2013). A corpus-based study on ragtime syncopation. Proceedings of the International Society for Music Information Retrieval Conference, 163–168. https://dspace.library.uu.nl/handle/1874/289635