As we are nearing the end of the second week of the composition phase, I thought I should share some information about how Machine Learning (ML) is used in the project and what its purpose is.
The main purpose of ML in this project is to assist the compositional process. The generated fragments can be viewed as a pool of material and/or more abstract musical ideas, which you can borrow, transform and incorporate into the composition. The ML algorithm used, specifically a Neural Network, tries to "learn" the aesthetic preferences of the group and propose sound material that matches them. This means that the algorithm learns from your evaluations. This is an experiment to see whether aesthetic preferences can in fact be modelled by ML - a question we find both intriguing and challenging!
It's important to know that the ML algorithm focuses exclusively on form, learning from features such as the density and durations of sound events and the spectral similarity between successive sounds, while ignoring parameters such as panning.
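To give a rough idea of what this looks like in practice, here is a minimal, purely illustrative sketch in Python. The feature values, score scale and model settings are placeholders I made up for this example, not the actual implementation:

```python
# Illustrative sketch only -- the real feature set and model are assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical form features per fragment: event density (events/sec),
# mean event duration (sec), mean spectral similarity between successive
# sounds (0..1). Note that panning and other spatial parameters are absent.
X = np.array([
    [2.5, 0.40, 0.8],
    [6.0, 0.15, 0.3],
    [1.0, 1.20, 0.9],
])
# The group's evaluations of those fragments, e.g. scores on a 1-5 scale.
y = np.array([4.0, 2.0, 3.5])

# A small neural network regressor that learns to predict the group's
# score from the form features of a fragment.
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(X, y)

# New candidate fragments can then be ranked by their predicted score.
candidates = np.array([[3.0, 0.30, 0.7], [7.0, 0.10, 0.2]])
print(model.predict(candidates))
```

The real feature set is richer than this, but the principle is the same: your scores are the training targets, which is why careful evaluation matters so much.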
As we are still at the beginning of the training process, the algorithm has not yet "converged" on the preferences of the group, so for now the generated fragments are more or less random. For that reason, your evaluations are very important, so please take the time to listen to all the fragments carefully before you evaluate them.
Finally, if you have any questions or thoughts regarding the use of ML or the generated fragments, please feel free to post them here!
Thanks, this is very helpful to know.
One question I have is about the evaluations of our phrases.
Kosmas indicated that we should evaluate phrases according to their context in the emerging composition, i.e. how each phrase contributes to the possible form/structure/development of the piece we are composing. This is obvious, since we are implicitly selecting a favourite phrase to stick with, and considering the implications for the future of the work. But I'm curious about how it affects the algorithm.
When you mention form as density and durations of events, does this apply both within and across the chain of phrases? I.e. will it learn that the highest-scoring 3rd-generation phrase is the most favoured continuation of the highest-scoring 2nd-generation phrase, or will it just learn from whatever happens within each phrase discretely, assuming that a high score marks a 'great' phrase regardless of its context?
For the time being, the algorithm learns from individual fragments, with the aim of generating fragments that get high scores. I'm thinking of expanding the model (or adding a second model) to predict possible continuations of what has already been composed, but I don't have enough data for that yet. Perhaps as we progress it will make more sense to explore that option.
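To make that concrete, the current per-fragment setup is essentially a generate-then-rank loop. Here is a toy sketch; all the names and the placeholder scoring function are mine for illustration, not the project's actual code:

```python
# Toy sketch of the per-fragment loop -- every name here is an assumption.
import numpy as np

rng = np.random.default_rng(0)

def sample_fragment():
    """Stand-in generator: a fragment reduced to its form features
    (event density, mean duration, mean spectral similarity)."""
    return rng.uniform([0.5, 0.1, 0.0], [8.0, 1.5, 1.0])

def predicted_score(features):
    """Stand-in for the learned model: any function mapping form
    features to a predicted group score would slot in here."""
    density, duration, similarity = features
    return similarity - abs(density - 3.0) * 0.1  # arbitrary placeholder

def compose_round(n_candidates=20, n_proposed=5):
    """Generate candidates, rank them by predicted score, and propose
    the best few to the group; the group's evaluations of those
    proposals then become new training data for the score model."""
    candidates = [sample_fragment() for _ in range(n_candidates)]
    candidates.sort(key=predicted_score, reverse=True)
    return candidates[:n_proposed]

print(compose_round())
```

A second-level continuation model would, roughly speaking, replace the per-fragment scorer with one that also takes the already-composed sections as input, which is why it needs considerably more data.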
Thanks for clarifying. I can certainly hear that the algorithm is improving its output, which is very exciting. In some ways I hope it won't learn too much, because the interesting elements are the ones that seem free of aesthetic conventions! As a related research question, I wonder to what extent such an algorithm might develop its own aesthetic under the influence of the group aesthetic. I imagine that's probably a more complex question...
A second-level model would be interesting, as it would align the system more closely with the thinking behind the grading, assuming that composers grade fragments with respect to their place in the existing structure. However, I can completely understand that this would require a lot of work and much more data, and we've only just come a few sections into the piece!
In any case, I feel positive that the system is already making a valuable contribution to the composition process.