Automatic Music Generation Project using Deep Learning

Automatic Music Generation is a process where a system composes short pieces of music using parameters such as pitch intervals, notes, chords, and tempo. In this project we are going to work with the piano and the following terms:

  • Note: The sound produced by pressing a single key.
  • Chord: A combination of two or more notes played together.
  • Octave: The distance between one note and the next note with the same letter name, for example from one C to the next C on the keyboard.
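If you would like to see these terms in code, here is a minimal music21 sketch; the pitch names are only examples, not values used later in the project:

#minimal music21 sketch of a note, a chord and an octave (pitch names are examples)
from music21 import note, chord

n = note.Note('C4')                      #a single note: C in the 4th octave
c = chord.Chord(['C4', 'E4', 'G4'])      #a chord: three notes sounding together
octave_up = note.Note('C5')              #C5 is one octave above C4

print(n.pitch, c.pitches, octave_up.octave)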

About this Project:

In this project, we will create an Automatic Music Generation model using LSTM. We will extract the notes from all the music files and feed them to the model for training; finally, we will create a MIDI file from the predicted notes.

LSTM for Automatic Music Generation:

Long Short-Term Memory (LSTM) is a type of RNN (Recurrent Neural Network) that handles scenarios where a plain RNN fails. LSTM addresses the long-term dependency problem of RNNs: a standard RNN retains information about previous outputs only for a very short number of timesteps.

LSTM also mitigates the vanishing gradient problem. During training, the model minimizes the loss by computing the gradient of the loss with respect to the weights at every timestep. In a plain RNN, these gradients shrink as they are propagated back through many timesteps, eventually becoming so close to zero that they vanish and the model stops learning.

Drawback of LSTM:

LSTM requires a lot of resources and time to train for real-world applications. With large random weight initializations, LSTM networks can behave similarly to feed-forward neural networks, so they are usually initialized with small weights instead.

Our model will be a many-to-one sequence model: for each input sequence of notes (one note per timestep) it produces a single output note.

The input to the LSTM model is the amplitude (A) of these notes recorded at different intervals of time; at each timestep (t) the LSTM computes a hidden vector and passes it on to the next timestep.

lstm architecture
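To make the many-to-one idea concrete, here is a minimal Keras shape sketch; the sizes used here (50 timesteps, 64 units, 10 output classes) are placeholders, not the values of the project's final model:

#many-to-one shape sketch (placeholder sizes, not the project's final model)
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

inp = Input(shape=(50, 1))                 #50 timesteps, 1 feature per timestep
h = LSTM(64)(inp)                          #returns only the last hidden vector
out = Dense(10, activation='softmax')(h)   #one prediction per input sequence
model = Model(inp, out)

print(model.predict(np.zeros((2, 50, 1))).shape)   #(2, 10): one output per sequence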

Dataset for Automatic Music Generation Project:

Please download the dataset for the automatic music generation project from the following link: Classical Music MIDI

This dataset consists of classical piano MIDI files containing compositions of 19 famous composers, scraped from Official Classical Piano Midi.

automatic music generation dataset file

Download Automatic Music Generation Project Code

Please download the source code of the automatic music generation project with TensorFlow: Automatic Music Generation Project Code

Automatic Music Generation Project Prerequisites

Install the following libraries using pip:

pip install numpy music21 tensorflow scikit-learn

The versions of Python and the corresponding modules used in this automatic music generation project are as follows:

  1. Python: 3.8.5
  2. TensorFlow: 2.3.1 (Note: TensorFlow must be version 2.2 or higher to use the bundled Keras; otherwise install Keras separately)
  3. music21: 5.5.0
  4. Numpy: 1.19.5
  5. scikit-learn (sklearn): 0.24.2

Music21 is a Python library used to parse and read various musical files. In this project we will use the Musical Instrument Digital Interface (MIDI) format, which has a small file size, is easy to modify and manipulate, and offers a wide choice of electronic instruments. MIDI is a universally accepted file format, which means that music produced by one synthesiser can be read and modified by another.
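As a quick sketch of how music21 reads a MIDI file, the snippet below parses one file and lists its instrument parts; the file name is only a placeholder, so replace it with any file from the dataset folder:

#minimal music21 parsing sketch (the file name is a placeholder)
from music21 import converter, instrument

midi = converter.parse('All Midi Files/schubert/example.mid')   #placeholder path
parts = instrument.partitionByInstrument(midi)
for part in parts.parts:
  print(part.partName)   #prints the instrument name of each part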

Project Structure

  • All Midi Files/: This is the dataset folder containing various midi files of different composers.
  • auto_music_gen.py: In this file, we will build, train and test our model.
  • s2s/: This directory contains optimizer, metrics, and weights of our trained model.
  • pred_music.mid: This is a music file of predicted notes.

Steps for Automatic Music Generation Project:

1. Import Libraries

Firstly we will import all the required libraries which have been shared in the prerequisites section.

Code:

#DataFlair Automatic Music Generation Project
#load all the libraries
from music21 import *
import glob
from tqdm import tqdm
import numpy as np
import random
from tensorflow.keras.layers import LSTM,Dense,Input,Dropout
from tensorflow.keras.models import Sequential,Model,load_model
from sklearn.model_selection import train_test_split

2. Reading and Parsing the Midi File

We will now read the MIDI file dataset. We will use the files composed by “Schubert”; you can use more or fewer composers depending on your system.

For this project, we will only work with files that contain a sequential stream of piano data. We will partition every file by instrument and keep only the Piano part. The piano stream of a MIDI file contains many elements such as keys, time signatures, chords and notes; to generate music we only need the notes and chords. Finally, the function returns the list of notes and chords.

Code:

def read_files(file):
  notes=[]
  notes_to_parse=None

  #parse the midi file
  midi=converter.parse(file)

  #separate all instruments from the file
  instrmt=instrument.partitionByInstrument(midi)

  for part in instrmt.parts:
    #fetch data only of the Piano instrument
    if 'Piano' in str(part):
      notes_to_parse=part.recurse()

      #iterate over all the elements of the sub stream
      #check if the element's type is Note or Chord
      #if it is a chord, split it into notes
      for element in notes_to_parse:
        if type(element)==note.Note:
          notes.append(str(element.pitch))
        elif type(element)==chord.Chord:
          notes.append('.'.join(str(n) for n in element.normalOrder))

  #return the list of notes
  return notes

#retrieve paths recursively from inside the directories/files
file_path=["schubert"]
all_files=glob.glob('All Midi Files/'+file_path[0]+'/*.mid',recursive=True)

#read each midi file (dtype=object because each file gives a list of a different length)
notes_array = np.array([read_files(i) for i in tqdm(all_files,position=0,leave=True)],dtype=object)

3. Exploring the dataset

Let’s check how many unique notes we have and what is the distribution of these notes.

Code:

#unique notes
notess = sum(notes_array,[])
unique_notes = list(set(notess))
print("Unique Notes:",len(unique_notes))

#notes with their frequency
freq=dict(map(lambda x: (x,notess.count(x)),unique_notes))

#get the threshold frequency
for i in range(30,100,20):
  print(i,":",len(list(filter(lambda x:x[1]>=i,freq.items()))))

Output:

explore dataset

As you can see, we have 304 unique notes, and many of them occur at least 30 or 50 times. Training the model on all the notes would be hard, so for this project we will use 50 as the threshold frequency: we keep only the notes that occur at least 50 times. You can change these parameters at any time.

We will also build a filtered version of ‘notes_array’ that keeps only the notes whose frequency is at or above the threshold.

Code:

#filter notes greater than threshold i.e. 50
freq_notes=dict(filter(lambda x:x[1]>=50,freq.items()))

#create new notes using the frequent notes
new_notes=[[i for i in j if i in freq_notes] for j in notes_array]

We are going to create two dictionaries: one with the note index as key and the note as value, and the other the reverse of the first, with the note as key and its index as value. We will see their use later.

Code:

#dictionary having key as note index and value as note
ind2note=dict(enumerate(freq_notes))

#dictionary having key as note and value as note index
note2ind=dict(map(reversed,ind2note.items()))

4. Input and Output Sequence for model

Now we will create the input and output sequences for our model. We will use a timestep of 50, so if we traverse 50 notes of the input sequence, the 51st note will be the output for that sequence. Let’s take an example to see how this works.

We will use the sentence ‘DataFlair is best for machine learning projects’ and a timestep of 2, so we provide 2 words as input at every step to get the next word as output.

(x)                       (y)

DataFlair is          →   best
is best               →   for
best for              →   machine
for machine           →   learning
machine learning      →   projects
As you can see, after feeding an input (x) of 2 words (timesteps), the output (y) is the next word. Since our model requires numeric data, we will convert every note to its respective index value using the ‘note2ind’ (note to index) dictionary we created earlier.
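For clarity, here is the same sliding-window idea applied to the toy sentence above; this snippet is purely an illustration and is not part of the project code:

#sliding-window illustration on the toy sentence (timestep of 2 words)
words = "DataFlair is best for machine learning projects".split()
timesteps = 2

for j in range(len(words)-timesteps):
  x = words[j:j+timesteps]      #input: the current window of 2 words
  y = words[j+timesteps]        #output: the word right after the window
  print(x,'->',y)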

Code:

#timestep
timesteps=50

#store values of input and output
x=[] ; y=[]

for i in new_notes:
 for j in range(0,len(i)-timesteps):
  #input will be the current index + timestep
  #output will be the next index after timestep
  inp=i[j:j+timesteps] ; out=i[j+timesteps]

  #append the index value of respective notes
  x.append(list(map(lambda x:note2ind[x],inp)))
  y.append(note2ind[out])

x_new=np.array(x)
y_new=np.array(y)

5. Training and Testing sets

We will reshape the arrays for our model and split the data in an 80:20 ratio: 80% for the training set and 20% for the testing set.

Code:

#reshape input and output for the model
x_new = np.reshape(x_new,(len(x_new),timesteps,1))
y_new = np.reshape(y_new,(-1,1))

#split the input and value into training and testing sets
#80% for training and 20% for testing sets
x_train,x_test,y_train,y_test = train_test_split(x_new,y_new,test_size=0.2,random_state=42)

6. Building the model

As discussed earlier, we will use an LSTM architecture: two stacked LSTM layers, each followed by a dropout rate of 0.2. Dropout helps prevent overfitting during training and has no effect at inference time. Finally, we use fully connected Dense layers for the output.

The output dimension of the final Dense layer equals the number of unique notes, and it uses the ‘softmax’ activation function, which is used for multi-class classification problems.

Code:

#create the model
model = Sequential()
#create two stacked LSTM layer with the latent dimension of 256
model.add(LSTM(256,return_sequences=True,input_shape=(x_new.shape[1],x_new.shape[2])))
model.add(Dropout(0.2))
model.add(LSTM(256))
model.add(Dropout(0.2))
model.add(Dense(256,activation='relu'))
#fully connected layer for the output with softmax activation
model.add(Dense(len(note2ind),activation='softmax'))
model.summary()

Output:

model summary

7. Train the Model

After building the model, we will train it on the input and output data using the ‘Adam’ optimizer, a batch size of 128, and a total of 80 epochs.

Code:

#compile the model using Adam optimizer
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam',metrics=['accuracy'])

#train the model on training sets and validate on testing sets
model.fit(
 x_train,y_train,
 batch_size=128,epochs=80,
 validation_data=(x_test,y_test))

After training finishes, let’s save the model for prediction.

Code:

#save the model for predictions
model.save("s2s")

Output:

model s2s

8. Inference (Sampling) Phase

In this section we will finally compose our own music. Using the trained model we will predict the notes.

First, we generate a random integer (index) into the test input array; the sequence at that index becomes our starting music pattern. We reshape the pattern and predict the output. Using the ‘np.argmax()’ function, we take the index with the maximum probability value and convert this predicted index to a note using the ‘ind2note’ (index to note) dictionary. The next music pattern is then shifted one step ahead of the previous one. We repeat this process until 200 notes are generated. Again, you can change this parameter as per your requirements.

Code:

#load the model
model = load_model("s2s")
#generate random index
index = np.random.randint(0,len(x_test)-1)
#get the data of generated index from x_test
music_pattern = x_test[index]
out_pred=[] #it will store predicted notes

#iterate till 200 notes are generated
for i in range(200):

 #reshape the music pattern
 music_pattern = music_pattern.reshape(1,len(music_pattern),1)

 #get the maximum probability value from the predicted output
 pred_index = np.argmax(model.predict(music_pattern))
 #get the note using predicted index and
 #append to the output prediction list
 out_pred.append(ind2note[pred_index])
 music_pattern = np.append(music_pattern,pred_index)

 #update the music pattern with one timestep ahead
 music_pattern = music_pattern[1:]

9. Saving the File

Finally, we are ready with the predicted output notes. Now we will save them into a MIDI file.

Code:

output_notes = []
for offset,pattern in enumerate(out_pred):
  #if the pattern is a chord instance
  if ('.' in pattern) or pattern.isdigit():
    #split the notes from the chord
    notes_in_chord = pattern.split('.')
    notes = []
    for current_note in notes_in_chord:
      i_curr_note=int(current_note)
      #cast the current note to a Note object and
      #append the current note
      new_note = note.Note(i_curr_note)
      new_note.storedInstrument = instrument.Piano()
      notes.append(new_note)

    #cast the collected notes to a Chord object
    #the offset will be 1 step ahead of the previous note
    #as this prevents the notes from stacking up
    new_chord = chord.Chord(notes)
    new_chord.offset = offset
    output_notes.append(new_chord)

  else:
    #cast the pattern to a Note object, apply the offset and
    #append the note
    new_note = note.Note(pattern)
    new_note.offset = offset
    new_note.storedInstrument = instrument.Piano()
    output_notes.append(new_note)

#save the midi file
midi_stream = stream.Stream(output_notes)
midi_stream.write('midi', fp='pred_music.mid')

You can listen to the predicted music from the shared code of this project.
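If you want to preview the generated file yourself, one simple option is to load it back with music21; note that playback depends on a MIDI player being configured in your environment:

#optional: load the generated midi back and play or inspect it
from music21 import converter

generated = converter.parse('pred_music.mid')
generated.show('midi')    #opens the default midi player (environment dependent)
#generated.show('text')   #alternatively, print the stream as text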

Summary

We built a model for automatic music generation that predicts the next note for every input sequence of notes. The accuracy of the model is 80%, which is quite good given that we used MIDI files from only one composer. You can try training with more composers, which may improve the model’s accuracy.
