Introduction to Symbolic Music Information Retrieval

A graphic showing kern symbolic MIR encoding and music bars

I prepared a research paper on the relatively new and dynamic field of symbolic music information retrieval (MIR), a sub-domain of information retrieval (IR). My report provides an overview of the technology, a review of MIR literature, and an introduction to an existing MIR website, The Josquin Research Project. In this post, you can read the report’s introduction and literature review that shows an in-depth understanding of symbolic MIR and provides some critical analysis of the literature.

 

Introduction

Music Information Retrieval, an Information Retrieval subdomain, is a relatively young but rapidly growing domain. Music Information Retrieval, or MIR, aims to develop “... efficient and intelligent methods to analyze, recognize, retrieve and organize music” (Lidy & Rauber, 2009, p. 448). MIR research is multinational and multidisciplinary, with researchers from disciplines including musicology, music theory, digital librarianship, computer science, human-computer interaction, digital humanities, and psychology. “Uniting this seemingly disparate aggregation is the common goal of providing the kind of robust access to the world's vast store of music—in all its varied forms (i.e., audio, symbolic, and metadata)—that we currently provide for textual materials” (Downie, 2004, p. 1033).

MIR has essentially two forms, data-based and content-based. Data-based forms retrieve external information about music documents, for example, information on the composer, performer, instrumentation, genre, or information on relationships between music documents, for example, between two different performances of the same song. In contrast, content-based MIR attempts to retrieve meaningful musical information from the music itself. There are two types of content-based MIR, audio and symbolic.

A flow chart of the different fields of music information retrieval

Symbolic music information is music information represented as music notation. Symbolic music can be represented in a score (of a composition for orchestra, for example), a piece of sheet music (of a popular song from a musical with piano accompaniment, for example), a lead sheet used by a jazz musician, and so on. A music score, or similar music document, is “a structured organization of symbols, which correspond to acoustic events and describe the gestures needed for their production” (Orio, 2006, p. 14). In this report, I will briefly explain symbolic MIR, present some issues in this domain, and introduce an exemplar symbolic MIR system, the Josquin Research Project.

The first thing to understand about MIR is that content-based music IR is far more complicated than text IR. Szeto (2018) helps to explain why:

Encoding text is relatively simple because text is a one-dimensional sequence of letters. Music notation, on the other hand, involves capturing multiple streams of symbols that vary in length and size and interact with symbols in other streams in different ways depending on context. Encoding music notation into structured machine-readable and machine-actionable data is even more complex, since the notation's context dependency cannot be translated to simple rules. (p. 310)
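To make Szeto's contrast concrete, here is a minimal Python sketch. It is an illustration only, not any established encoding format: the `Note` class, its field names, and the toy two-voice passage are all assumptions made for this example. Text is a single flat sequence, while even a tiny musical passage needs parallel streams whose symbols differ in length and must be compared against each other in time.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Note:
    """Hypothetical note record; the field names are illustrative assumptions."""
    onset: Fraction     # start time, in quarter-note beats
    duration: Fraction  # length, in quarter-note beats
    pitch: str          # pitch name such as "C3"


def sounding_together(stream_a, stream_b):
    """Return (pitch_a, pitch_b) pairs for notes whose time spans overlap."""
    pairs = []
    for a in stream_a:
        for b in stream_b:
            overlaps = (a.onset < b.onset + b.duration
                        and b.onset < a.onset + a.duration)
            if overlaps:
                pairs.append((a.pitch, b.pitch))
    return pairs


# Text: one flat, one-dimensional sequence of characters.
text = "kyrie"

# Music: two parallel streams whose symbols have different lengths.
soprano = [
    Note(Fraction(0), Fraction(2), "E5"),
    Note(Fraction(2), Fraction(2), "D5"),
]
bass = [
    Note(Fraction(0), Fraction(1), "C3"),
    Note(Fraction(1), Fraction(1), "G3"),
    Note(Fraction(2), Fraction(2), "B2"),
]

# Finding which notes sound together requires reasoning across both streams.
vertical_pairs = sounding_together(soprano, bass)
```

Even in this toy case, a question as basic as "which notes sound together?" cannot be answered by scanning one sequence from left to right, which is the kind of cross-stream dependency the quotation describes.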

Despite its complexity, content-driven symbolic MIR continues to grow as a field, and its systems and tools are gradually becoming more efficient, effective, and accessible.

Read about the Basics of Symbolic MIR (Music feature extraction, Symbolic music encoding formats, Digital Music score Collections/Corpora, and the Advantages of Symbolic MIR) in the full report 

MIR issues – Literature Review

Optical Music Recognition

For a large collection of images of music scores to be searchable at the level of music information (content), each image must be converted into a true symbolic representation of the music score, a process called Optical Music Recognition. Good quality Optical Music Recognition, or OMR, is necessary for the sustainability of large-scale symbolic MIR systems. Burgoyne et al. (2015, p. 213) emphasize that “[w]ithout optical music recognition technology to convert scanned images of printed music to a machine‐readable encoding, all musical data had to be entered manually, which was (and still is) cumbersome, expensive, and error‐prone.”

Improving OMR is an ongoing effort in the MIR field, and innovative applications of OMR continue to advance it. In one such case, MIR researchers at McGill University (Hankinson et al., 2012) applied Optical Music Recognition and optical character recognition (OCR) to scan The Liber Usualis, a sizable compendium of the most common chants used by the Catholic Church➀, and make its contents searchable, as the basis for their first large-scale production: liber.simssa.ca. This service book uses square-note neume musical notation, the main notation type used in the Renaissance.

 ➀ The Liber Usualis is a valuable resource for early European music scholars and is used to identify the origins of chants used in polyphonic compositions (comprising the majority of Renaissance vocal compositions), among other uses.

The Liber Usualis - Detecting meaningful page elements (Hankinson et al., 2012, p. 905)

This project is a good example of how complex and creative symbolic MIR system development can be. The steps in making the Liber Usualis’s content searchable included a) detecting meaningful page elements (see figure 5), b) separating music notation page layers from text layers, c) processing them with OMR and OCR respectively, d) isolating neume shapes from staff lines, e) classifying neume shapes, and f) adding the staff lines back in. The steps continued with g) identifying the clef shape and position for each staff, h) correlating the clef with a staff line, i) identifying the initial pitch of each neume based on its relative position, and j) using each neume’s class name to identify the subsequent pitches in that neume.

Beyond its creativity, a novel feature of this project was using class names to help recognize a neume group’s pitch content. For compound neumes, the class name for a shape was constructed from the neume’s pitch contour: “For complex compound neumes, direction information was encoded in the class name. The result was a set of class names where the specific pitch contour of a neume could be reconstructed from knowing only its starting pitch” (Hankinson et al., 2012, p. 906). See figure 6.

The Liber Usualis - Class names for neume shapes (Hankinson et al., 2012, p. 906)

Despite the successes of applying OMR technologies and overcoming complex issues, such as encoding compound Renaissance neumes as in the above example, OMR still poses challenges to the successful development and implementation of MIR systems. Through qualitative interviews with academic music librarians on the topic of digitizing musical scores, Laplante & Fujinaga (2016) found that librarians find it quite challenging to use OMR technologies successfully. OMR technologies are not yet as advanced or as user-friendly as OCR. As a result, libraries need to allocate considerable time and resources to extract the music notation of scores. As well, because the accessibility of encoded digital music scores is still limited, music researchers often create a corpus of encoded scores on their own, using their own tools to boot (Laplante & Fujinaga, 2016). Libraries could help alleviate a number of these problems, but librarians will first need to become more comfortable and competent using OMR technologies.
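The class-name idea from the Liber Usualis project, in which a neume's pitch contour can be rebuilt from its starting pitch alone, can be sketched in a few lines of Python. This is a hypothetical illustration rather than the project's actual naming scheme: the contour format used here (`"u2.d2"` for "up a second, then down a second") and the function names are assumptions made for this example.

```python
LETTERS = "CDEFGAB"


def to_step(pitch):
    """Convert a pitch name such as 'G3' to a count of diatonic steps."""
    letter, octave = pitch[0], int(pitch[1:])
    return octave * 7 + LETTERS.index(letter)


def to_pitch(step):
    """Convert a diatonic step count back to a pitch name."""
    octave, index = divmod(step, 7)
    return f"{LETTERS[index]}{octave}"


def reconstruct_pitches(start_pitch, contour):
    """Rebuild every pitch of a neume from its first pitch and its contour.

    `contour` uses a hypothetical format: moves separated by dots, each move
    a direction ('u' or 'd') followed by a diatonic interval size.
    """
    if not contour:
        return [start_pitch]
    steps = [to_step(start_pitch)]
    for move in contour.split("."):
        direction, size = move[0], int(move[1:])
        if direction not in ("u", "d"):
            raise ValueError(f"unknown direction in move {move!r}")
        offset = size - 1  # a second is one diatonic step, a third is two
        steps.append(steps[-1] + offset if direction == "u" else steps[-1] - offset)
    return [to_pitch(step) for step in steps]


# A three-note shape that rises a second and then falls a second.
three_note_neume = reconstruct_pitches("F3", "u2.d2")
```

The point of the design is that the classifier only has to recognize the shape; once the clef and the staff position give the first pitch, the remaining pitches follow mechanically from the name.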

OMR technologies also pose issues when the scores being digitized are handwritten, or when the music is highly rhythmically complex. During his computational musicological study of 20th-century composer Gunther Schuller, Bush (2018) found that OMR success rates for correctly recognizing musical material drop considerably in both these cases. This issue could discourage researchers and librarians from trying to digitize complex contemporary compositions or lesser-known compositions that are only available as handwritten scores, limiting access to these types of music documents.

Musical expertise & labelling symbolic music information

Advances in OMR technology have increased the amount of symbolic music data available to students and researchers as collections of searchable digital scores have grown. However, much of this data is unlabeled, limiting the effectiveness of search results. This is partly because, as Devaney et al. (2015) explain, accurately identifying and labelling symbolic music data requires a high level of music domain knowledge. Unfortunately, because music domain knowledge is quite specialized, this may pose problems for the development of effective searchable digital score collections, since the limited number of music librarians and MIR researchers may become over-extended.

Users’ needs overlooked

MIR research has largely overlooked the needs of MIR system users, and as Goodchild (2017) points out, studies that have taken into account users’ information needs and behaviours have not had much influence on the MIR field. Not understanding music information users will impede the development of useful features for digital music libraries. The complexity of how and why people seek music information may compound this problem. Goodchild adds that in comparison to text-centred library use, music information-seeking behaviour is more complex, with users seeking text materials on top of music documents: recordings and both conventional and digital scores. Also, locating musical materials is often complicated by multiple manifestations of a music work, among other issues.

As well, Rousi et al. (2018) point out that previous MIR studies that have approached the issue of relevance have been decidedly system-oriented rather than user-oriented. Importantly, Rousi et al. mention that existing research on MIR has not given much attention to how professional musicians evaluate the relevance of music information. To better understand the relationships between a performer’s situation when seeking music information to solve a problem and the music information itself, Rousi et al. studied situational relevance for music performers. Their results show that many classical musicians understand music notation as the foundation of gestural musical language ➁ and that the study participants saw music notation as the most important type of music information helping them improve as interpreters of music pieces. Designing symbolic MIR systems with the situational relevance of performers in mind could result in more effective systems that better serve music students studying performance or conducting, as well as professional musicians.

➁ Musical gestures can be the movements required to produce sound on an instrument or with the voice, or perceptual (imagined) movements linked to features of music’s sound. For performers of many music traditions including Western classical music, understanding gestural musical features is important in developing and performing their interpretation of a musical work, and is important to performers when seeking music information (Godøy & Jensenius, 2009).

 

As a rare exception to system-centred MIR work, Devaney & Léveillé Gauvin (2019) did take into account the needs of users when they designed recent extensions to Humdrum and MEI. They based their extensions on findings from qualitative interviews with scholarly music researchers and found the researchers needed information on timing, loudness, pitch, and timbre in musical performances.

Music traditions of the world & notation systems

The vast majority of symbolic music encoding systems, as Mammen et al. (2016) rightfully point out, have been developed to encode music written in standard Western music notation. Such a focus “… is regrettably limiting, with cultural, theoretical, and practical consequences for MIR” (Lee et al., 2002, p. 1). Further, most MIR systems have been built to search for music information in music based on the Western 12-tone tuning system, as Ünal et al. (2014) mention. Despite this, researchers are developing important MIR systems specifically for music traditions that use different notation systems and have distinctly different musical features from Western music. Most success has been in systems built for Indian classical music, with work on Turkish music the second most common.

An impressive development is iSargam, which encodes a music notation system called Sargam, the notation system of Carnatic music (Southern Indian classical music). This unique approach based on Indian music theory was developed by Mammen et al. (2016) to be a machine-readable music notation system for Carnatic music supporting playback, notation printing, searching, and retrieval within a composition. iSargam uses a Unicode-based music notation representation language where Unicode characters represent musical features of Carnatic music. The encoding system also handles whether a music symbol is grouped or single, following Carnatic music features and music theory: “So, for encoding of a notated composition, we take every symbol and check if it can be further split into different characters…” (Mammen et al., 2016, p. 6). See figure 7.

Figure 7. iSargam: representing musical features of Carnatic music by encoding swara (notes) (Mammen et al., 2016, p. 6)

The system encodes meaningful musical information for each note (called swara) including a) if the note is played with expressions (gamaka), b) the octave or range of notes it belongs to (anumandra), c) its duration, and d) additional features, such as up or downbow if played with a bowed instrument (see figure 7).
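As a rough illustration of the four kinds of information listed above, and of the idea of checking whether a symbol can be split into further characters, here is a minimal Python sketch. It is not iSargam's actual encoding: the `Swara` class, its field names, and the reading of a dot above or below a letter as an octave marker are assumptions made for this example. The Unicode decomposition itself uses Python's standard `unicodedata` module.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Swara:
    """Hypothetical note record; field names are illustrative assumptions."""
    name: str                      # note letter or syllable
    octave: int = 0                # 0 = middle register, +1 higher, -1 lower
    duration: int = 1              # length in basic time units
    gamaka: Optional[str] = None   # expression label, if any
    extras: List[str] = field(default_factory=list)  # e.g. bowing marks


def split_symbol(symbol):
    """Split one notated symbol into its base text and its combining marks."""
    decomposed = unicodedata.normalize("NFD", symbol)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    marks = [unicodedata.name(ch) for ch in decomposed if unicodedata.combining(ch)]
    return base, marks


def octave_from_marks(marks):
    """Assumed convention for this sketch: dot above = higher, dot below = lower."""
    if "COMBINING DOT ABOVE" in marks:
        return 1
    if "COMBINING DOT BELOW" in marks:
        return -1
    return 0


# U+1E61 is a single character (s with dot above) that splits into two.
base, marks = split_symbol("\u1e61")
note = Swara(name=base, octave=octave_from_marks(marks), duration=2, extras=["upbow"])
```

Keeping the base symbol separate from its modifiers is what makes features such as register or expression individually searchable, rather than leaving them buried inside one composite glyph.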

In another example from India, Chordia (2007) developed a representation system based on Humdrum encoding syntax to represent Hindustani music (Northern Indian classical music), specifically vocal and instrumental compositions (bandishes and gats respectively) written in a notation system called Bhatkhande. Srinivasamurthy & Chordia (2012) then extended the capabilities of this system to encode Carnatic music. Building on Chordia’s 2007 system, they included changes to incorporate Carnatic music features such as gamakas and highly complex rhythms.

Researchers have also done work on symbolic music data and the representation of classical Turkish Makam music ➂. Building on previous work by Alpkoçak & Gedik (2006) that applied n-gram analysis ➃ to Turkish Makam music, Ünal et al. (2014) developed a method for classifying makams hierarchically from symbolic data.
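For readers unfamiliar with n-grams applied to melodies, the following Python sketch shows the general idea. It is a generic illustration and not the method of either study cited above: the function names, the overlap measure, and the toy melodies are assumptions, and the integers simply stand for positions on whatever pitch grid an encoding uses.

```python
from collections import Counter


def interval_ngrams(pitches, n=2):
    """Count n-grams of successive pitch intervals in a melody."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    grams = [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]
    return Counter(grams)


def profile_overlap(profile_a, profile_b):
    """Share of n-gram occurrences two profiles have in common (0.0 to 1.0)."""
    shared = sum((profile_a & profile_b).values())
    total = sum((profile_a | profile_b).values())
    return shared / total if total else 0.0


# Toy melodies as integer pitch positions (illustrative values only).
melody_a = [60, 62, 64, 62, 60, 62, 64, 65]
melody_b = [60, 62, 64, 65, 64, 62, 60]

profile_a = interval_ngrams(melody_a, n=2)
profile_b = interval_ngrams(melody_b, n=2)
similarity = profile_overlap(profile_a, profile_b)
```

Working with intervals rather than absolute pitches means a melody and its transposition produce the same profile, which is one common reason n-gram approaches to symbolic music start from interval sequences.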

An example from East Asia is an early exploratory project using XML encoding to represent a genre of traditional Korean court music that uses a notation system called Chôngganbo (Lee et al., 2002) ➄. Lee et al. stress that Korean traditional music cannot be adequately represented by symbolic music encoding systems developed for music written in Western notation. These researchers aimed to develop a representational framework to match the expressive features represented in Chôngganbo, such as timbral (sound quality) variety, which is a highly meaningful feature of Korean traditional music.

Despite these successes, it is important to point out that a major limitation of symbolic MIR system development is that the majority of music traditions in the world, even most highly complex ones, do not use a music notation system. Panteli et al. (2018) remind readers of this fact in their review of manual and computational approaches in music corpus research and analysis of world music.

➂ Makam are the system of melody types and scales, or modes, used in Turkish classical music, related to Persian Dastgah and the Arab world’s Maqām system.
➃ N-grams are widely used in text mining and natural language processing tasks.
➄ Unfortunately, Lee et al.’s system seems not to have led to other attempts to represent traditional Korean music genres in symbolic MIR systems.

Meaningful music features in different genres & traditions

Because of the marked differences between meaningful music features across different music traditions and genres, compiling a collection of encoded symbolic music representative of a wide variety of types of music is highly challenging, if not impossible. Panteli et al. (2018) plainly state that the different musical notation languages and formats used in different traditions also make comparisons difficult. Their study’s findings reveal that successful computational studies of notated world music have focused on specific music traditions or genres (e.g., ragtime music (Volk & de Haas, 2013)) or music from a certain geographical region (e.g., Cretan folk songs (Conklin & Anagnostopoulou, 2011)). As a result, Panteli et al. conclude that for comparative studies of world music, audio recordings are more practical as objects of analysis. Audio music documents may be more practical, but given the differences in meaningful musical features across such a wide range of music, I am skeptical that an audio MIR system could be developed that is effective in terms of precision. Given the complexity and variety of music in the world, the number and variety of musical features that would have to be made searchable in such a system would be staggering. Further, I am uncertain that an AI could learn to recognize the variety of meaningful musical features across cultures well enough without favouring some traditions’ musical features over others.

Open and read the full report

References

Abdallah, S., Benetos, E., Gold, N., Hargreaves, S., Weyde, T., & Wolff, D. (2017). The Digital Music Lab: A big data infrastructure for digital musicology. Journal on Computing and Cultural Heritage, 10(1), 1–21. https://doi.org/10.1145/2983918

Alpkoçak, Adil, & Gedik, Ali Cenk. (2006). Classification of Turkish songs according to makams by using n grams. Proceedings of the 15th Turkish Symposium on Artificial Intelligence and Neural Networks. TAINN, Mugla, Turkey. http://people.cs.deu.edu.tr/alpkocak/Papers/TAINN2006-AdilAlpkocak.pdf

Behrendt, Inga, Bain, Jennifer, & Helsen, Kate. (2017). MEI Kodierung der frühesten Notation in linienlosen Neumen. In Kodikologie und Paläographie im Digitalen Zeitalter 4 (pp. 275–291). Books on Demand. https://kups.ub.uni-koeln.de/7774/1/kpdz4Online.pdf

Burgoyne, J. A., Downie, J. S., & Fujinaga, I. (2015). Music information retrieval. In A new companion to digital humanities (pp. 213–228). John Wiley & Sons. https://doi.org/10.1002/9781118680605.ch15

Bush, Christopher. (2018). Schuller’s musical space: Analysis of registration in Gunther Schuller’s solo and chamber compositions for clarinet using pitch field, Music21, and statistical process analysis [PhD Dissertation, New York University]. http://ezproxy.library.ubc.ca/login?url=https://search.proquest.com/docview/2018339406?accountid=14656

Byrd, D. (2001). Music-notation searching and digital libraries. Proceedings of the First ACM/IEEE-CS Joint Conference on Digital Libraries - JCDL ’01, 239–246. https://doi.org/10.1145/379437.379662

CCARH. (2018). Music 254 [Wiki]. Packard Humanities Institute’s Center for Computer Assisted Research in the Humanities at Stanford University. https://wiki.ccarh.org/wiki/Music_254#The_Humdrum_Toolkit

Chordia, P. (2007). A system for the analysis and representation of bandishes and gats using Humdrum syntax. Proceedings of the 2007 Frontiers of Research in Speech and Music Conference. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.486.3397&rep=rep1&type=pdf

Conklin, D., & Anagnostopoulou, C. (2011). Comparative pattern analysis of Cretan folk songs. Journal of New Music Research, 40(2), 119–125. https://doi.org/10.1080/09298215.2011.573562

Cook, N. (2004). Computational and comparative musicology. In Empirical musicology: Aims, methods, prospects (pp. 103–126). Oxford University Press. http://ezproxy.library.ubc.ca/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=rft&AN=A498907&site=ehost-live&scope=site

Devaney, J., & Léveillé Gauvin, H. (2019). Encoding music performance data in Humdrum and MEI. International Journal on Digital Libraries, 20(1), 81–91. https://doi.org/10.1007/s00799-017-0229-3

Devaney, Johanna, Arthur, Claire, Condit-Schultz, Nathaniel, & Nisula, Kristen. (2015). Theme And Variation Encodings with Roman Numerals (TAVERN): A new data set for symbolic music analysis. Proceedings of the International Society of Music Information Retrieval (ISMIR) Conference, 728–734. http://ismir2015.uma.es/articles/261_Paper.pdf

Dougan, K. (2015). Finding the right notes: An observational study of score and recording seeking behaviors of music students. The Journal of Academic Librarianship, 41(1), 61–67. https://doi.org/10.1016/j.acalib.2014.09.013

Downie, J. S. (2004). A sample of music information retrieval approaches. Journal of the American Society for Information Science and Technology, 55(12), 1033–1036. https://doi.org/10.1002/asi.20054

Fraunhofer IDMT. (2020). Query by humming: Technology. Fraunhofer Institute for Digital Media Technology IDMT. https://www.idmt.fraunhofer.de/en/institute/projects-products/query-by-humming.html#tabpanel-2

Godøy, Rolf Inge, & Jensenius, Alexander Refsum. (2009). Body movement in music information retrieval. Proceedings of the 10th International Society for Music Information Retrieval, 45–50. http://urn.nb.no/URN:NBN:no-23872

Goodchild, M. (2017). Digital music libraries: Librarian perspectives and the challenges ahead. CAML Review / Revue de l’ACBM, 45(2–3). https://doi.org/10.25071/1708-6701.40305

Hankinson, A., Burgoyne, J. A., Vigliensoni, G., & Fujinaga, I. (2012). Creating a large-scale searchable digital collection from printed music materials. Proceedings of the 21st International Conference Companion on World Wide Web - WWW ’12 Companion, 903. https://doi.org/10.1145/2187980.2188221

Humdrum Toolkit. (2020). Humdrum. https://www.humdrum.org/

Huron, David. (2020). Representing music using **kern (I). Humdrum User Guide. https://www.humdrum.org/guide/ch02/

Josquin Research Project. (2020). The Josquin Research Project. https://josquin.stanford.edu/

Ju, Y., Pedro, G. P., MacKay, C., Hopkins, E. A., Cumming, J., & Fujinaga, I. (2019, May 30). Enabling music search and analysis: A database for symbolic music files. Music Encoding Conference, University of Vienna. https://music-encoding.org/conference/2019/abstracts_mec2019/MEC%20SIMSSA%20DB.pdf

Kim, Sung-min. (2015). Towards organizing and retrieving classical music based on functional requirements for bibliographic records (FRBR) [Dissertation, University of Pittsburgh]. ProQuest Dissertations & Theses Global. (1749035397). Retrieved from http://ezproxy.library.ubc.ca/login?url=https://search-proquest-com.ezproxy.library.ubc.ca/docview/1749035397?accountid=14656

Kirkman, A. (2015). Review: The Josquin Research Project by Jesse Rodin and Craig Sapp. Journal of the American Musicological Society, 68(2), 455–465. https://doi.org/10.1525/jams.2015.68.2.455

Laplante, A., & Fujinaga, I. (2016). Digitizing musical scores: Challenges and opportunities for libraries. Proceedings of the 3rd International Workshop on Digital Libraries for Musicology - DLfM 2016, 45–48. https://doi.org/10.1145/2970044.2970055

Lee, Jin Ha, Downie, J. Stephen, & Renear, Allen. (2002). Representing Korean traditional musical notation in XML. Proceedings of the Third International Conference on Music Information Retrieval. ISMIR, IRCAM Centre Pompidou, Paris. https://ismir2002.ismir.net/proceedings/03-SP01-4.pdf

Lidy, T., & Rauber, A. (2009). Music information retrieval. In Handbook of research on digital libraries: Design, development, and impact (pp. 448–465). IGI Global.

Løvhaug, L. E. (2006). Digital archive for scores and music [Master’s Thesis, Norges teknisk-naturvitenskapelige universitet]. https://ntnuopen.ntnu.no/ntnu-xmlui/handle/11250/261886

Mammen, S., Krishnamurthi, I., Varma, A. J., & Sujatha, G. (2016). iSargam: Music notation representation for Indian Carnatic music. EURASIP Journal on Audio, Speech, and Music Processing, 2016(1), 5. https://doi.org/10.1186/s13636-016-0083-z

MEI. (2020a). An introduction to MEI. Music Encoding Initiative. https://music-encoding.org/about/

MEI. (2020b). MEI guidelines (4.0.1). Music Encoding Initiative. https://music-encoding.org/guidelines/v4/content/

Orio, N. (2006). Music retrieval: A tutorial and review. Foundations and Trends® in Information Retrieval, 1(1), 1–96. https://doi.org/10.1561/1500000002

Panteli, M., Benetos, E., & Dixon, S. (2018). A review of manual and computational approaches for the study of world music corpora. Journal of New Music Research, 47(2), 176–189. https://doi.org/10.1080/09298215.2017.1418896

Rizo, D., & Marsden, A. (2019). An MEI-based standard encoding for hierarchical music analyses. International Journal on Digital Libraries, 20(1), 93–105. https://doi.org/10.1007/s00799-018-0262-x

Rousi, A. M., Savolainen, R., Harviainen, M., & Vakkari, P. (2018). Situational relevance of music information modes. Journal of Documentation, 74(5), 1008–1024. https://doi.org/10.1108/JD-10-2017-0149

Simon, Scott J. (2005). A multi-dimensional entropy model of jazz improvisation for music information retrieval [PhD Dissertation]. University of North Texas.

Srinivasamurthy, Ajay, & Chordia, Parag. (2012). A unified system for analysis and representation of Indian classical music using humdrum syntax. Proceedings of the 2nd CompMusic Workshop, 38–42. http://mtg.upf.edu/system/files/publications/CompMusicWorkshop_2.pdf

Szeto, K. (2018). The roles of academic libraries in shaping music publishing in the digital age. Library Trends, 67(2), 303–318. https://doi.org/10.1353/lib.2018.0038

Teich Geertinger, A., & Pugin, L. (2011). MEI for bridging the gap between music cataloguing and digital critical editions. Die Tonkunst: Magazin Für Klassische Musik Und Musikwissenschaft, 5(3), 289–294. http://www.die-tonkunst.de/dtk_ausgaben/dtk_1103_sample.pdf

Ünal, E., Bozkurt, B., & Karaosmanoğlu, M. K. (2014). A hierarchical approach to makam classification of Turkish makam music, using symbolic data. Journal of New Music Research, 43(1), 132–146. https://doi.org/10.1080/09298215.2013.870211

Veltkamp, R. C., Wiering, F., & Typke, R. (2008). Content based music retrieval. In B. Furht (Ed.), Encyclopedia of multimedia (pp. 97–98). Springer US. https://doi.org/10.1007/978-0-387-78414-4_272

Volk, A., & de Haas, W.B. (2013). A corpus-based study on ragtime syncopation. Proceedings of the International Society for Music Information Retrieval Conference, 163–168. https://dspace.library.uu.nl/handle/1874/289635

 


Learning Significance