Musicasity is an AI-powered music composition platform built around cast collaboration: the “essence” of a melody track is captured and trained to generate new casting styles, which can be stored as new cast characters. It works by mimicking notes played on the keyboard through DDSP-VST (Neural Audio Synthesis for All) and MusicVAE. It also provides melody extensions that author music in an individual’s style on the user’s behalf, based on what the user has specifically input.
Academic Case Study
Solo Project in the context of AI / Machine Learning
2022 Q2–Q3 (10 weeks)
Tech-Centered Research, Data Testing, and UI Design
As Artificial Intelligence is integrated into our lives through interfaces, products, services, and cities, we must work to advance the field on behalf of the user, seeing beyond stereotypes to design for both humans and machines. Through this tech-centered research project, I looked under the hood of AI to understand how it is built conceptually, trained models to witness how the process works and what can go wrong when collecting and labeling data, and explored how AI can be used for creative visual expression.
How can we “design” Artificial Intelligence and influence its applications by understanding the capabilities of machine learning and leveraging the power of Detailed Control of Musical Performance via Hierarchical Modeling and Variational AutoEncoder like MusicVAE?
According to Oxford Scholarship, music is a powerful means of communication: it provides a way for people to share emotions, intentions, and meanings even when their spoken languages are mutually incomprehensible. Building on that assumption, agentive technology could help communicate those feelings and sensations.
Crystal is a vlogger on YouTube with over 120k fans. Her content covers her lifestyle, her DJ career, and her part-time job teaching music at a high school. She typically curates her own music style at clubs, creates deliberate content to support her coaching for kids, and composes the background soundtracks for her vlogs. Beyond that, she is also a devoted fan of K-pop culture.
However, it normally takes her a long time to sort out the exact melody whenever she creates something brand-new that connects her with her students and social media audiences. She is looking to make a change...
The style-based GAN architecture (StyleGAN) used in Runway ML yields state-of-the-art results in data-driven unconditional generative image modeling. The improved model redefines the state of the art in unconditional image modeling, both in terms of existing distribution quality metrics as well as perceived image quality. (V.2022Q2)
Select 3 different objects, people, or things and create an expansive index / mind map (multiple interpretations and visual references) of each, including multiple viewpoints, associations, and perspectives of each.
Narrow down the index and create 5 different datasets / interpretations. (Each dataset meets Runway ML's minimum of 500 images.) Train the model multiple times and compare results.
P1. Mindmaps (include labels and connections)
P2. Pose Estimation Detection Exploration
P3. Successfully Trained Models (objects and people)
A digital twin is a digital representation of a real-world entity or system. The implementation of a digital twin is an encapsulated software object or model that mirrors a unique physical object, process, organization, person, or other abstraction. According to the article TREVOR PAGLEN: TRAINING HUMANS, “Training Humans” explores two fundamental issues in particular: a) how humans are represented, interpreted, and codified through training datasets, and b) how technological systems harvest, label, and use this material. With the help of a digital twin, coherence between the digital and physical worlds can be ensured across the full lifecycle. Following my initial prompt, I decided to focus on machine learning potentials specific to the music industry.
Simulate AI interactions for experimentation, research, and concepting.
Design interfaces and affordances that collect user feedback for AI/Agents.
Collaborate with AI/Agents to explore new co-creation processes.
The Digital Twin can enable Musicasity to generate personalized AI music and add customized sound effects based on user inputs. It will dramatically diversify innovation on the application layer, powered by the following technologies.
DDSP is a new approach to realistic neural audio synthesis of musical instruments that combines the efficiency and interpretability of classical DSP elements (such as filters, oscillators, reverberation, etc.) with the expressivity of deep learning. The application layer of the technology is already common in commercial products, and because DDSP-VST trains models far faster, I skipped the step of self-testing a thousand samples. At its core, the technology enables users to:
1. Change the tone/effect of an existing soundtrack / voice
2. Add personal components to the generated content through model training
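The DDSP idea of pairing classical DSP elements with learned controls can be illustrated with a plain sinusoidal oscillator. This is a minimal NumPy sketch of my own, not DDSP's actual implementation: in a real DDSP model, the per-sample frequency and amplitude envelopes below would be predicted by a neural network rather than hand-made.

```python
import numpy as np

SAMPLE_RATE = 16000  # Hz; a common rate for DDSP-style models

def sinusoidal_oscillator(freqs_hz, amps):
    """Render audio from per-sample frequency and amplitude envelopes.

    The oscillator itself is classical DSP; in DDSP, the envelopes
    are the learned, differentiable part.
    """
    phase = 2 * np.pi * np.cumsum(freqs_hz) / SAMPLE_RATE
    return amps * np.sin(phase)

# One second sliding from A4 (440 Hz) up to E5 (~659 Hz) while fading out.
n = SAMPLE_RATE
freqs = np.linspace(440.0, 659.0, n)
amps = np.linspace(1.0, 0.0, n)
audio = sinusoidal_oscillator(freqs, amps)
```

Because every operation here is differentiable, gradients can flow from the rendered audio back to whatever network produces the envelopes, which is what lets DDSP train on so little data.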
One of the biggest barriers has always been allowing creatives to train their own models, as the training process usually requires a lot of training data and computational power. DDSP overcomes these challenges with the built-in structure of the model. This enables anyone to train their own model with as little as a few minutes of audio and a couple hours on a free Colab GPU.
When a painter creates a work of art, she first blends and explores color options on an artist’s palette before applying them to the canvas. This process is a creative act in its own right and has a profound effect on the final work. Musicians and composers have mostly lacked a similar device for exploring and mixing musical ideas, but MusicVAE is a machine learning model that creates palettes for blending and exploring musical scores. Therefore, for the purpose of establishing a composition platform with both an easy-to-use composing tool and a music-twin community, the technology can enable users to extend the length of an existing MIDI file in a way that adds variation to the original vibe.
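The "palette" metaphor maps onto interpolation in MusicVAE's latent space: two melodies are encoded into latent vectors, intermediate points between them are computed, and each point is decoded back into a score. The sketch below only shows the interpolation step, with random vectors standing in for real encodings; the 512-dimensional latent size matches common MusicVAE configurations but is an assumption here.

```python
import numpy as np

def interpolate_latents(z_a, z_b, num_steps):
    """Linearly interpolate between two latent codes.

    In MusicVAE, decoding each intermediate code yields a melody,
    producing a palette of in-between musical ideas.
    """
    alphas = np.linspace(0.0, 1.0, num_steps)
    return [(1 - a) * z_a + a * z_b for a in alphas]

rng = np.random.default_rng(0)
z_melody_a = rng.standard_normal(512)  # stand-in for an encoded melody
z_melody_b = rng.standard_normal(512)
palette = interpolate_latents(z_melody_a, z_melody_b, num_steps=5)
```

The midpoint of the palette is an even blend of both melodies' latent codes, which is exactly the kind of in-between aesthetic the platform aims to surface.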
Magenta Studio is a MIDI plugin for Ableton Live that also supports a standalone version for testing. It contains 5 tools: Continue, Groove, Generate, Drumify, and Interpolate, which let users apply Magenta models to their MIDI files. To form a concept that both follows the brief and achieves a better outcome, I mainly tested the models behind the Continue and Interpolate features.
Continue uses the predictive power of recurrent neural networks (RNNs) to generate notes that are likely to follow a drum beat or melody. Given an input file, it can extend it by up to 32 measures/bars. This is helpful for adding variation to a drum beat or creating new material for a melodic track; the model typically picks up on things like durations, key signatures, and timing. Increasing the temperature produces more random outputs. Click to select a file (or drag and drop) that you would like to extend, then click Generate; the output files will be added to the output folder you selected.
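The temperature control mentioned above governs how a model samples its next note. This small NumPy sketch (my own illustration, not Magenta's code) shows the standard mechanism: logits are divided by the temperature before the softmax, so low temperatures make the likeliest note nearly certain and high temperatures flatten the distribution toward randomness.

```python
import numpy as np

def sample_next_note(logits, temperature, rng):
    """Sample a note index from temperature-scaled logits.

    temperature == 1.0 leaves the model's distribution unchanged;
    lower values sharpen it (predictable), higher values flatten
    it (more random output).
    """
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # stable softmax
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

rng = np.random.default_rng(42)
logits = np.array([2.0, 1.0, 0.1])  # hypothetical scores for 3 notes
note_cold = sample_next_note(logits, 0.1, rng)  # near-greedy pick
note_hot = sample_next_note(logits, 5.0, rng)   # close to uniform
```

This is why I fixed the temperature at 1.0 during testing: it keeps the model's own learned distribution intact while still allowing variation between generations.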
The major form of melody data is notes.
I picked GarageBand and connected it to my keyboard via cable for direct data input.
GarageBand is a basic, accessible tool, even though Magenta Studio is designed to work as a plugin.
The Beta version offers two options, a Plugin Version and Standalone Applications; I chose to test the models with the Standalone Apps to avoid external distraction.
I picked a track type (keyboard) to start the composition.
To control variables, I assumed the concept is based on keyboard input / software instruments only, even though other options like drums and voice are also provided in GarageBand.
Moreover, to reduce distractions caused by genre, I picked both pop and classical songs, four melodies in total.
I used the keyboard to input the notes manually, playing each song 10 times to gather adequate datasets and purposefully train the machine.
To ensure all the data could be collected meaningfully, I chose 10 clips of 2 bars/measures, 20 clips of 4 bars/measures, and 10 clips of 8 bars/measures, randomly assigned across the 40 total clips to avoid genre bias.
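The clip plan above can be sketched as a shuffled assignment. The melody names below are hypothetical placeholders for my four source songs; the bar counts and totals come from the plan itself.

```python
import random

# 10 clips of 2 bars, 20 of 4 bars, 10 of 8 bars = 40 clips total.
clip_plan = [2] * 10 + [4] * 20 + [8] * 10

# Shuffle so bar lengths land randomly across the four melodies
# (two pop, two classical), 10 clips per melody, avoiding genre bias.
random.seed(7)
random.shuffle(clip_plan)
melodies = ["pop_1", "pop_2", "classical_1", "classical_2"]
assignment = {m: clip_plan[i * 10:(i + 1) * 10] for i, m in enumerate(melodies)}
```

Shuffling before slicing means no genre systematically receives the longer or shorter clips.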
I saved each melody I noted down into a separate band file with a single soundtrack and exported it as an MP3 file. I then used AnyConv, an MP3-to-MIDI converter, to convert each looped soundtrack into a MIDI file that meets Magenta Studio's requirements.
To control variables, I trained 320 models with the Continue App using the 40 datasets I had, generating 4 variations for each dataset. Whenever I generated new melodies, I maintained the same length of 2 bars and the same temperature of 1.0 to test the extension performance of MusicVAE.
The models I trained with the Continue feature produced satisfactory results, as you can tell by listening to the clips I randomly picked from each of the two genres.
The tone and pitch of the generated content are similar to the original input, while the notes show distinct differences and variety, extending my attributes through mixing, mashing, and collaborating to formulate new in-between aesthetics.
Default landing page to kick-off
Browse for trending Melodies
The playground for music composition
With DDSP-VST and MusicVAE, users are able to create their digital twin by following the steps below:
Import local files to the central editor.
Set up stats in the AI synthesizer.
Generate two variations at a time and drag them to the editor.
Adjust the tone by changing the sound properties and adding reverb and delay effects.
The link tree to reach out to Music Talents
Auto-generated Casts Profile
Developed a thorough understanding of the machine learning concept and its practical applications, including the importance of accurate data labeling and the potential limitations and risks associated with machine learning models.
Gained insights into the potential of AI for artistic and aesthetic purposes by training hundreds of datasets via Runway ML and Magenta Studio.
With a strong foundation in the skills and knowledge necessary to continue exploring and innovating in AI/ML, I am prepared for a career in tech-centered research and methods to facilitate technological innovation.