Musicasity is an AI-powered music composition platform built around cast collaboration: the “essence” of a melody track is captured and trained to generate new casting styles, which can be stored as new cast characters. It works by mimicking notes played on the keyboard through DDSP-VST (Neural Audio Synthesis for All) and MusicVAE. It also provides melody extensions that author music in an individual’s style on the user’s behalf, based on what the user has specifically input.
Academic Case Study
Solo Project in the context of AI / Machine Learning
2022 Q2–Q3 (10 weeks)
Tech-Centered Research, Data Testing, and UI Design
As Artificial Intelligence is integrated into our lives through interfaces, products, services, and cities, we must work to advance the field on behalf of the user, seeing beyond stereotypes to design for both humans and machines. Through this tech-centered research project, I looked under the hood of AI to understand how it is built conceptually, trained models to witness how the process works and what can go wrong when collecting and labeling data, and explored how AI can be used for creative visual expression.
How can we “design” Artificial Intelligence and influence its applications by understanding the capabilities of machine learning and leveraging the power of Detailed Control of Musical Performance via Hierarchical Modeling and Variational AutoEncoder like MusicVAE?
According to Oxford Scholarship, music is a powerful means of communication: it provides a way for people to share emotions, intentions, and meanings even when their spoken languages are mutually incomprehensible. Building on that assumption, agentive technology could help communicate those feelings and sensations.
Crystal is a vlogger on YouTube with over 120k fans. Her content covers her lifestyle, her DJ career, and her part-time job teaching music at a high school. She typically curates her own music style at clubs, creates deliberate content to support her coaching for kids, and composes the background soundtracks for her vlogs. Beyond that, she is also a devoted fan of K-pop culture.
However, it normally takes her a long time to sort out the exact melody whenever she creates something brand-new that connects her with her students and social media audiences. She is looking to make a change...
The style-based GAN architecture (StyleGAN) used in Runway ML yields state-of-the-art results in data-driven unconditional generative image modeling. The improved model redefines the state of the art in unconditional image modeling, both in terms of existing distribution quality metrics as well as perceived image quality. (V.2022Q2)
Select 3 different objects, people, or things and create an expansive index / mind map (multiple interpretations and visual references) of each, including multiple viewpoints, associations, and perspectives of each.
Narrow down the index and create 5 different datasets / interpretations. (Each dataset meets Runway ML's minimum of 500 images.) Train the model multiple times and compare results.
P1. Mindmaps (include labels and connections)
P2. Pose Estimation Detection Exploration
P3. Successfully Trained Models (objects and people)
A digital twin is a digital representation of a real-world entity or system. The implementation of a digital twin is an encapsulated software object or model that mirrors a unique physical object, process, organization, person, or other abstraction. According to the article TREVOR PAGLEN: TRAINING HUMANS, “Training Humans” explores two fundamental issues in particular: a) how humans are represented, interpreted, and codified through training datasets, and b) how technological systems harvest, label, and use this material. With the help of a digital twin, coherence between the digital and physical worlds can be ensured across the full lifecycle. Following my initial prompt, I decided to focus on machine learning potentials specific to the music industry.
Simulate AI interactions for experimentation, research, and concepting.
Design interfaces and affordances that collect user feedback for AI/Agents.
Collaborate with AI/Agents to explore new co-creation processes.
The Digital Twin can enable Musicasity to generate personalized AI music and add customized sound effects based on user inputs. It will dramatically diversify innovation on the application layer, powered by the following technologies.
DDSP is a new approach to realistic neural audio synthesis of musical instruments that combines the efficiency and interpretability of classical DSP elements (such as filters, oscillators, reverberation, etc.) with the expressivity of deep learning. The application layer of the technology is already common in commercial products, and because DDSP-VST trains models far faster, I skipped the step of self-testing a thousand samples. At its core, the technology enables users to:
1. Change the tone/effect of an existing soundtrack / voice
2. Add personal components to the generated content through model training
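The DDSP idea of pairing classical DSP elements with learned controls can be illustrated with a plain sinusoidal oscillator. This is a minimal NumPy sketch of my own, not DDSP's actual implementation: in a real DDSP model, the per-sample frequency and amplitude envelopes below would be predicted by a neural network rather than hand-made.

```python
import numpy as np

SAMPLE_RATE = 16000  # Hz; a common rate for DDSP-style models

def sinusoidal_oscillator(freqs_hz, amps):
    """Render audio from per-sample frequency and amplitude envelopes.

    The oscillator itself is classical DSP; in DDSP, the envelopes
    are the learned, differentiable part.
    """
    phase = 2 * np.pi * np.cumsum(freqs_hz) / SAMPLE_RATE
    return amps * np.sin(phase)

# One second sliding from A4 (440 Hz) up to E5 (~659 Hz) while fading out.
n = SAMPLE_RATE
freqs = np.linspace(440.0, 659.0, n)
amps = np.linspace(1.0, 0.0, n)
audio = sinusoidal_oscillator(freqs, amps)
```

Because every operation here is differentiable, gradients can flow from the rendered audio back to whatever network produces the envelopes, which is what lets DDSP train on so little data.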
One of the biggest barriers has always been allowing creatives to train their own models, as the training process usually requires a lot of training data and computational power. DDSP overcomes these challenges with the built-in structure of the model. This enables anyone to train their own model with as little as a few minutes of audio and a couple hours on a free Colab GPU.
When a painter creates a work of art, she first blends and explores color options on an artist’s palette before applying them to the canvas. This process is a creative act in its own right and has a profound effect on the final work. Musicians and composers have mostly lacked a similar device for exploring and mixing musical ideas, but MusicVAE is a machine learning model that creates palettes for blending and exploring musical scores. Therefore, for the purpose of establishing a composition platform with both an easy-to-use composing tool and a music-twin community, the technology can enable users to extend the length of an existing MIDI file in a way that adds variation to the original vibe.
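The "palette" metaphor maps onto interpolation in MusicVAE's latent space: two melodies are encoded into latent vectors, intermediate points between them are computed, and each point is decoded back into a score. The sketch below only shows the interpolation step, with random vectors standing in for real encodings; the 512-dimensional latent size matches common MusicVAE configurations but is an assumption here.

```python
import numpy as np

def interpolate_latents(z_a, z_b, num_steps):
    """Linearly interpolate between two latent codes.

    In MusicVAE, decoding each intermediate code yields a melody,
    producing a palette of in-between musical ideas.
    """
    alphas = np.linspace(0.0, 1.0, num_steps)
    return [(1 - a) * z_a + a * z_b for a in alphas]

rng = np.random.default_rng(0)
z_melody_a = rng.standard_normal(512)  # stand-in for an encoded melody
z_melody_b = rng.standard_normal(512)
palette = interpolate_latents(z_melody_a, z_melody_b, num_steps=5)
```

The midpoint of the palette is an even blend of both melodies' latent codes, which is exactly the kind of in-between aesthetic the platform aims to surface.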
Magenta Studio is a MIDI plugin for Ableton Live that also supports a standalone version for testing. It contains 5 tools: Continue, Groove, Generate, Drumify, and Interpolate, which let users apply Magenta models to their MIDI files. To form a concept that both follows the brief and achieves a better outcome, I mainly tested the models behind the Continue and Interpolate features.
Continue uses the predictive power of recurrent neural networks (RNNs) to generate notes that are likely to follow a drum beat or melody. Given an input file, it can extend it by up to 32 measures/bars. This is helpful for adding variation to a drum beat or creating new material for a melodic track; the model typically picks up on things like durations, key signatures, and timing. Increasing the temperature produces more random outputs. Click to select a file (or drag and drop) that you would like to extend, then click Generate; the output files will be added to the output folder you selected.
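The temperature control mentioned above governs how a model samples its next note. This small NumPy sketch (my own illustration, not Magenta's code) shows the standard mechanism: logits are divided by the temperature before the softmax, so low temperatures make the likeliest note nearly certain and high temperatures flatten the distribution toward randomness.

```python
import numpy as np

def sample_next_note(logits, temperature, rng):
    """Sample a note index from temperature-scaled logits.

    temperature == 1.0 leaves the model's distribution unchanged;
    lower values sharpen it (predictable), higher values flatten
    it (more random output).
    """
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # stable softmax
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

rng = np.random.default_rng(42)
logits = np.array([2.0, 1.0, 0.1])  # hypothetical scores for 3 notes
note_cold = sample_next_note(logits, 0.1, rng)  # near-greedy pick
note_hot = sample_next_note(logits, 5.0, rng)   # close to uniform
```

This is why I fixed the temperature at 1.0 during testing: it keeps the model's own learned distribution intact while still allowing variation between generations.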
The major form of melody data is notes.
I picked GarageBand and connected it to my keyboard via cable for direct data input.
GarageBand is a basic, accessible tool, even though Magenta Studio is designed to work as a plugin.
The Beta version offers two options, a Plugin Version and Standalone Applications; I chose to test the models with the Standalone Apps to avoid external distraction.
I picked a track type (keyboard) to start the composition.
To control variables, I assumed the concept is based on keyboard input / software instruments only, even though other options like drums and voice are also provided in GarageBand.
Moreover, to reduce distractions caused by genre, I picked both pop and classical songs, four melodies in total.
I used the keyboard to input the notes manually, playing each song 10 times to gather adequate datasets and purposefully train the machine.
To ensure all the data could be collected meaningfully, I chose 10 clips of 2 bars/measures, 20 clips of 4 bars/measures, and 10 clips of 8 bars/measures, randomly assigned across the 40 total clips to avoid genre bias.
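The clip plan above can be sketched as a shuffled assignment. The melody names below are hypothetical placeholders for my four source songs; the bar counts and totals come from the plan itself.

```python
import random

# 10 clips of 2 bars, 20 of 4 bars, 10 of 8 bars = 40 clips total.
clip_plan = [2] * 10 + [4] * 20 + [8] * 10

# Shuffle so bar lengths land randomly across the four melodies
# (two pop, two classical), 10 clips per melody, avoiding genre bias.
random.seed(7)
random.shuffle(clip_plan)
melodies = ["pop_1", "pop_2", "classical_1", "classical_2"]
assignment = {m: clip_plan[i * 10:(i + 1) * 10] for i, m in enumerate(melodies)}
```

Shuffling before slicing means no genre systematically receives the longer or shorter clips.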
I saved each melody I noted down into a separate band file with a single soundtrack and exported it as an MP3 file. I then used AnyConv, an MP3-to-MIDI converter, to convert each looped soundtrack into a MIDI file that meets Magenta Studio's requirements.
To control variables, I trained 320 models with the Continue App using the 40 datasets I had, generating 4 variations for each dataset. Whenever I generated new melodies, I maintained the same length of 2 bars and the same temperature of 1.0 to test the extension performance of MusicVAE.
The models I trained with the Continue feature produced satisfactory results, as you can tell by listening to the clips I randomly picked from each of the two genres.
The tone and pitch of the generated content are similar to the original input, while the notes show distinct differences and variety, extending my attributes through mixing, mashing, and collaborating to formulate new in-between aesthetics.
Default landing page to kick-off
Browse for trending Melodies
The playground for music composition
With DDSP-VST and MusicVAE, users are able to create their digital twin by following the steps below:
Import local files to the central editor.
Set up stats in the AI synthesizer.
Generate two variations at a time and drag them to the editor.
Adjust the tone by changing the sound properties and adding reverb and delay effects.
The link tree to reach out to Music Talents
Auto-generated Casts Profile
Developed a thorough understanding of the machine learning concept and its practical applications, including the importance of accurate data labeling and the potential limitations and risks associated with machine learning models.
Gained insights into the potential of AI for artistic and aesthetic purposes by training hundreds of datasets via Runway ML and Magenta Studio.
With a strong foundation in the skills and knowledge necessary to continue exploring and innovating in AI/ML, I am prepared for a career in tech-centered research and methods to facilitate technological innovation.