Introduction


The connection between mental health and music has been explored extensively, and a growing body of research sheds light on the impact music can have on mental well-being. The therapeutic potential of music spans diverse populations and psychological conditions, and numerous studies have shown how music can improve mood, reduce stress, and alleviate symptoms of mental disorders. Harvard Medical School noted that music therapy activities, such as learning or playing an instrument, listening, and singing, can support healing in both mental and physical health. Additionally, research published in the journal Brain, Behavior, & Immunity - Health reported that music-based interventions, such as music therapy, can reduce symptoms related to a variety of disorders and improve immune function. Music therapy has firmly established itself as an evidence-based practice that benefits holistic health.


Group Three was intrigued by this therapeutic intervention and chose to explore the Music & Mental Health Survey Results dataset, which was curated to investigate potential correlations between an individual’s music taste and their self-reported mental well-being. The aim of this project is twofold: first, to identify the most effective model for assessing cumulative self-reported mental health scores, and second, to find the optimal model for evaluating each individual self-reported mental health condition, namely Depression, Anxiety, OCD, and Insomnia. By exploring these two questions, we hope to shed light on the interplay between music preferences and mental health and to offer insights into the application of music therapy.


Group Three aims to find the optimal model for evaluating cumulative self-reported mental health scores, using an array of variables that combine music-related and non-music-related aspects. By selecting factors such as how often respondents listen to different genres, whether they play an instrument or compose, and the hours they spend listening to music, we hope to uncover patterns that tie an individual’s music profile to their overall mental well-being. Understanding the correlations between these variables could help music therapists craft interventions tailored to an individual’s music preferences and potentially improve therapeutic outcomes. Furthermore, our exploration extends to identifying the most effective model for evaluating individual mental health conditions. Comparing different models for each condition can inform professionals about which analytical tools suit each condition best. Lastly, we hope the project fosters a deeper appreciation for the healing properties of music and supports its validity in healthcare.


In the following sections, we present the results of our analyses, comparing how effectively various models predict self-reported mental health scores and shedding light on the interplay between music and the mind. We hope these results will inspire further investigation into the subject, unlocking the potential of music as a tool for enhancing mental well-being and enriching our understanding of the human mind.


Data


Group Three chose to analyze the dataset “Music & Mental Health Survey Results,” which was collected by Catherine Rasgaitis from the University of Washington via Google Forms. The survey form was posted on various platforms, such as Reddit forums, Discord servers, and social media sites, so that observations were not limited to students on campus but open to respondents of all ages and locations. The survey was also advertised with posters and “business cards” at public locations, including libraries, parks, and other venues. The original dataset contained 616 observations and 33 columns. Each observation records the respondent’s musical background and listening habits, how often they listen to 16 different music genres, and their self-rated Anxiety, Depression, Insomnia, and OCD on a scale of zero to ten (zero meaning no experience with the condition and ten meaning they experience it regularly). All respondents approved the publication and use of their data.

The variables we used that relate to musical background and listening habits are:


Variable | Description
Age | The age of the respondent.
Hours Per Day | The number of hours the respondent listens to music daily.
While Working | A binary variable indicating whether the respondent listens to music while studying or working.
Instrumentalist | A binary variable indicating whether the respondent plays an instrument regularly.
Composer | A binary variable indicating whether the respondent composes music.
Exploratory | A binary variable indicating whether the respondent actively explores new artists or genres.
Foreign Languages | A binary variable indicating whether the respondent regularly listens to music with lyrics in a language they are not fluent in.


Variables such as Favorite Genre, BPM, and Primary Streaming Service were excluded, either because they had a high number of unique values or because self-reported values were unreliable. However, Group Three did consider all 16 frequency variables, which record how often respondents listen to Classical, Country, EDM, Folk, Gospel, Hip-Hop, Jazz, K-Pop, Latin, Lofi, Metal, Pop, Rap, Rock, Video Game Music, and R&B. Group Three also considered all four variables representing respondents’ ratings of Anxiety, Depression, Insomnia, and OCD. Because each respondent interprets the scale subjectively, we can only predict an approximation of a person’s self-reported mental health score. Figures 1.1 to 1.4 preview the variables Group Three focused on analyzing:

Figure 1.1: Preview of Musical Background and Listening Habits Data
Age | Hours Per Day | While Working | Instrumentalist | Composer | Exploratory | Foreign Languages
18 | 4.0 | 0 | 0 | 0 | 0 | 1
61 | 2.5 | 1 | 0 | 1 | 1 | 1
18 | 4.0 | 1 | 0 | 0 | 1 | 0
Figure 1.2: Preview of Music Genre Listening Frequency Data (Classical to K-Pop)
Classical | Country | EDM | Folk | Gospel | Hip Hop | Jazz | K-Pop
0 | 0 | 1 | 0 | 0 | 0 | 0 | 1
1 | 0 | 0 | 0 | 1 | 0 | 1 | 1
0 | 0 | 0 | 0 | 0 | 1 | 0 | 1
Figure 1.3: Preview of Music Genre Listening Frequency Data (Latin to Video Game Music)
Latin | Lofi | Metal | Pop | Rap | Rock | R&B | Video Game Music
0 | 1 | 1 | 0 | 0 | 0 | 1 | 0
1 | 1 | 0 | 1 | 0 | 1 | 0 | 0
1 | 1 | 0 | 1 | 1 | 1 | 0 | 0
Figure 1.4: Preview of Mental Health Score Data
Anxiety | Depression | Insomnia | OCD | Combined MH(Std)
7 | 7 | 10 | 2 | 2.868
9 | 7 | 3 | 3 | 1.694
7 | 2 | 5 | 9 | 2.044
8 | 8 | 7 | 7 | 4.350
4 | 8 | 6 | 0 | 0.129


All variables in these tables are numeric, including the binary and frequency variables. Binary responses were coded as “1” for “yes” and “0” for “no.” The frequency responses, which came from a Likert-type question, were collapsed into two levels: “never” and “rarely” were classified as “low frequency,” while “sometimes” and “very frequently” were classified as “high frequency.” These levels were then coded as numeric binary values, with “1” representing high frequency and “0” representing low frequency. Converting the variables into a uniform numeric format allowed them to be used directly as predictors, which was essential for answering the second question about which model works best for each individual mental health condition. It also let Group Three compare which predictors hold greater significance for each condition. Keeping the original four-level categories would have required several dummy variables per genre in a linear regression; the binary conversion removed this extra complexity in our search for the best model and predictors.
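The recoding described above can be sketched as follows. This is a minimal Python sketch, not Group Three’s actual code; the column names (such as “Frequency [Lofi]”) and the exact answer labels are assumptions about the raw survey file and may need adjusting.

```python
# Assumed answer labels; adjust to match the raw survey export.
HIGH_FREQUENCY = {"sometimes", "very frequently"}
LOW_FREQUENCY = {"never", "rarely"}


def encode_yes_no(answer: str) -> int:
    """Map a Yes/No survey answer to 1 (yes) or 0 (no)."""
    normalized = answer.strip().lower()
    if normalized == "yes":
        return 1
    if normalized == "no":
        return 0
    raise ValueError(f"unexpected yes/no answer: {answer!r}")


def encode_frequency(answer: str) -> int:
    """Collapse a four-level frequency answer into 1 (high) or 0 (low)."""
    normalized = answer.strip().lower()
    if normalized in HIGH_FREQUENCY:
        return 1
    if normalized in LOW_FREQUENCY:
        return 0
    raise ValueError(f"unexpected frequency answer: {answer!r}")


def encode_row(raw: dict, binary_cols: list, frequency_cols: list) -> dict:
    """Return a copy of one survey response with the listed columns recoded."""
    encoded = dict(raw)
    for col in binary_cols:
        encoded[col] = encode_yes_no(raw[col])
    for col in frequency_cols:
        encoded[col] = encode_frequency(raw[col])
    return encoded


# Made-up answers for illustration (not a real respondent).
response = {"Age": 18, "Instrumentalist": "No",
            "Frequency [Lofi]": "Very frequently", "Frequency [Pop]": "Rarely"}
print(encode_row(response, ["Instrumentalist"], ["Frequency [Lofi]", "Frequency [Pop]"]))
```

Raising an error on unexpected labels, rather than silently defaulting to 0, makes typos or extra answer options in the raw data visible before modeling.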

Another key variable in the table is Combined MH(Std), the Standardized Combined Mental Health Score for each respondent. It was calculated by standardizing each of the four mental health variables (Depression, Anxiety, Insomnia, and OCD), that is, subtracting the variable’s mean and dividing by its standard deviation, and then summing the four standardized values. This score reflects each individual’s overall self-reported mental health status, with Depression, Anxiety, Insomnia, and OCD all contributing. The importance of standardization can be seen in the histogram below, which shows the distribution of scores for each of the four mental health conditions:


The distributions in the histograms differ: Depression is bimodal, Anxiety is left-skewed, and Insomnia and OCD are right-skewed. Comparing or summing the raw scores directly could therefore be misleading. Standardization puts every condition on a common scale with mean zero and standard deviation one (it does not change the shape of each distribution), so each condition contributes comparably to the combined score and a respondent’s relative standing can be compared across conditions. The resulting Standardized Self-Reported Mental Health Score is the outcome for question one, which asks which model best predicts cumulative self-reported mental health scores.
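The standardize-then-sum calculation can be sketched in Python as below. This is an illustrative sketch, assuming the population standard deviation (`pstdev`); the report does not say whether the population or sample standard deviation was used. The demo standardizes only the five preview rows from the mental health score table, so its values will not match the Combined MH(Std) column, which was computed against all respondents.

```python
from statistics import mean, pstdev

CONDITIONS = ("Anxiety", "Depression", "Insomnia", "OCD")


def zscores(values):
    """Standardize scores to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def combined_mh(rows):
    """Sum each respondent's standardized scores across the four conditions."""
    standardized = {c: zscores([row[c] for row in rows]) for c in CONDITIONS}
    return [sum(standardized[c][i] for c in CONDITIONS) for i in range(len(rows))]


# The five preview rows from the mental health score table.
preview = [
    {"Anxiety": 7, "Depression": 7, "Insomnia": 10, "OCD": 2},
    {"Anxiety": 9, "Depression": 7, "Insomnia": 3, "OCD": 3},
    {"Anxiety": 7, "Depression": 2, "Insomnia": 5, "OCD": 9},
    {"Anxiety": 8, "Depression": 8, "Insomnia": 7, "OCD": 7},
    {"Anxiety": 4, "Depression": 8, "Insomnia": 6, "OCD": 0},
]
scores = combined_mh(preview)
print([round(s, 2) for s in scores])
```

Because every standardized column sums to zero, the combined scores also sum to zero; the fourth respondent (8, 8, 7, 7) receives the highest combined score, which agrees with the ordering in the table.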


Results


Group Three investigates two questions: “What is the most effective model for assessing cumulative self-reported mental health scores?” and “What is the optimal model for evaluating each individual self-reported mental health condition?” Because both questions involve similar predictions, Group Three used the same modeling structure to predict all five outcome variables (the standardized combined score and the four individual conditions) and to identify the optimal model for each.


Group Three used ten-fold cross-validation with linear regression as the modeling method: the dataset is split into ten folds, and each fold serves once as the test set while the model is trained on the remaining nine. Group Three addressed the two questions by constructing and comparing five models, each evaluated on the full dataset through this procedure. To determine the best-performing model, Group Three summarized each model’s average Mean Absolute Error (MAE) across the folds in a table. MAE measures how far, on average, a model’s predictions are from the actual self-reported scores, so a lower MAE indicates better predictions.
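The evaluation procedure above can be sketched with the Python standard library. This is a minimal sketch of ten-fold cross-validated MAE, not the software Group Three actually ran; the helper names (`k_fold_indices`, `cross_validated_mae`, `fit_empty`) are hypothetical, and the demo uses made-up toy data. The `fit_empty` function plays the role of the empty model: an intercept-only linear regression, which always predicts the training mean.

```python
import random
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and predicted scores."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def k_fold_indices(n, k=10, seed=0):
    """Shuffle row indices once and deal them into k nearly equal folds (needs n >= k)."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_mae(X, y, fit, k=10, seed=0):
    """Average test-fold MAE; fit(X_train, y_train) must return a predict(row) function."""
    fold_maes = []
    for test_idx in k_fold_indices(len(y), k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        predictions = [predict(X[i]) for i in test_idx]
        fold_maes.append(mean_absolute_error([y[i] for i in test_idx], predictions))
    return mean(fold_maes)


def fit_empty(X_train, y_train):
    """Empty (intercept-only) model: ignore predictors and predict the training mean."""
    center = mean(y_train)
    return lambda row: center


# Made-up toy data: 30 respondents, one predictor, outcome on a 0 to 10 scale.
rng = random.Random(1)
X = [{"Hours Per Day": rng.uniform(0, 8)} for _ in range(30)]
y = [rng.randint(0, 10) for _ in range(30)]
print(round(cross_validated_mae(X, y, fit_empty), 3))
```

Passing the model as a `fit` function keeps the cross-validation loop identical for all five models, so the only thing that changes between rows of the MAE tables is the set of predictors.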


Starting with all variables, Group Three used each variable’s p-value to reduce the predictors to the most informative ones. Group Three then plotted the models together with an empty linear regression model, which serves as a baseline for assessing each model’s predictive ability. In the context of our data, the empty model predicts a person’s self-reported mental health score without considering any variables related to their background or listening habits; for a linear regression, this means predicting the mean score for every respondent. Each model’s MAE is reported in the tables below.


Standardized Mental Health Scores Prediction


The plot below displays the five models that aim to predict standardized self-reported mental health scores with various predictors. The x-axis on the plot is the variable Hours Per Day.



We now construct a table comparing the four models with the empty model based on MAE. Because the empty model ignores all predictors, we expect it to have the largest MAE, although on held-out folds this is not guaranteed, since a model with many predictors can overfit. We then determine which model best predicts the Standardized Mental Health Scores.


Figure 2.1: Model Evaluation (Standardized Mental Health Score)
Model | Predictors | MAE
Second | All Variables + Age * Country [Frequency] | 2.199
First | All Variables | 2.200
Fourth | Only Significant Variables and Interactions | 2.257
Third | Only Significant Variables | 2.277
Fifth | Empty | 2.301


The second model, with all variables and one significant interaction (Age * Country [Frequency]), has the lowest MAE (2.199), although it is only marginally better than the first model (2.200). As expected, the empty model, which uses no predictors, performs the worst (2.301). Thus, we conclude that the model combining all individual predictors with the best interaction effect predicts the Standardized Self-Reported Mental Health Scores best. However, visual inspection of the plot shows that none of the models fits the observations closely. We now use similar modeling structures to examine each individual self-reported mental health condition.
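What the interaction term adds to the second model can be sketched as an extra predictor column equal to the product of Age and the Country frequency indicator. This is an illustrative sketch with a hypothetical helper name (`add_interaction`) and made-up respondent values; it assumes that, because all main effects are already in the model, the interaction contributes only this product column.

```python
def add_interaction(row, first, second):
    """Return a copy of a predictor row with the product of two predictors added."""
    extended = dict(row)
    extended[f"{first} * {second}"] = row[first] * row[second]
    return extended


# Made-up respondent for illustration.
respondent = {"Age": 18, "Hours Per Day": 4.0, "Frequency [Country]": 1}
print(add_interaction(respondent, "Age", "Frequency [Country]"))
```

Since Country frequency is coded 0 or 1, the product equals Age for high-frequency Country listeners and 0 for everyone else, which lets the fitted slope on Age differ between the two groups.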


Individual Mental Health Scores Predictions


We will start by examining the self-reported mental health condition Depression.



Next, we compare all models’ MAE to choose the best model for predicting Depression.


Figure 2.2: Model Evaluation (Depression)
Model | Predictors | MAE
Second | All Variables + Age * Video Game Music [Frequency] | 2.416
First | All Variables | 2.425
Fourth | Only Significant Variables and Interactions | 2.547
Third | Only Significant Variables | 2.579
Fifth | Empty | 2.582


Model two again has the lowest MAE for Depression (2.416), narrowly ahead of model one (2.425). Next, we examine the self-reported mental health condition OCD.



The table below compares the models for OCD and identifies the best-performing one.


Figure 2.3: Model Evaluation (OCD)
Model | Predictors | MAE
Second | All Variables + Age * Pop [Frequency] | 2.277
First | All Variables | 2.279
Fourth | All Significant Variables and Interactions | 2.319
Third | All Significant Variables | 2.359
Fifth | Empty | 2.384


Based on the table above, model two again performs the best among the five models, although its MAE (2.277) is only slightly lower than model one’s (2.279).

We now apply the same approach to examine modeling performance for Insomnia.



The table below reports the corresponding MAEs and the best model for Insomnia.


Figure 2.4: Model Evaluation (Insomnia)
Model | Predictors | MAE
Second | All Variables + Age * Pop [Frequency] | 2.584
First | All Variables | 2.591
Fourth | All Significant Variables and Interactions | 2.645
Third | All Significant Variables | 2.650
Fifth | Empty | 2.707


Based on the table above, model two performs the best for predicting Insomnia.

Lastly, we examine the performance of each model for Anxiety.


Below is our last comparison table, which displays all the models for Anxiety.

Figure 2.5: Model Evaluation (Anxiety)
Model | Predictors | MAE
First | All Variables | 2.195
Second | All Variables + Age * Pop [Frequency] | 2.195
Fourth | All Significant Variables and Interactions | 2.236
Third | All Significant Variables | 2.291
Fifth | Empty | 2.299


Given the table above, model one is reported as the best model for Anxiety, although at three decimal places its MAE (2.195) is identical to model two’s. Anxiety is also the only self-reported mental health condition for which model one is the best-performing model.
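To put the five comparison tables on a common footing, the relative MAE reduction of the best model over the empty model can be computed directly from the reported values. The small Python sketch below uses only the MAE figures from Figures 2.1 to 2.5; expressing the gain as a fraction of the empty model’s MAE is our choice of summary, because the standardized score and the zero-to-ten condition scores are on different scales.

```python
# (best model MAE, empty model MAE), copied from Figures 2.1 to 2.5.
MAE_TABLE = {
    "Standardized combined": (2.199, 2.301),
    "Depression": (2.416, 2.582),
    "OCD": (2.277, 2.384),
    "Insomnia": (2.584, 2.707),
    "Anxiety": (2.195, 2.299),
}


def relative_improvement(best, empty):
    """Fraction by which the best model's MAE undercuts the empty model's MAE."""
    return (empty - best) / empty


for outcome, (best, empty) in MAE_TABLE.items():
    print(f"{outcome}: {relative_improvement(best, empty):.1%}")
```

Every outcome improves on the empty model by only about 4 to 6 percent, with Depression showing the largest gain (about 6.4 percent), which is consistent with the observation that none of the models fits the data closely.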

Conclusion


In conclusion, Group Three’s project explored the link between music and mental health using the Music & Mental Health Survey Results dataset. This exploration was guided by two questions: “What is the most effective model for assessing cumulative self-reported mental health scores?” and “What is the optimal model for evaluating each individual self-reported mental health condition?” The group pursued these questions hoping to contribute insights to music therapy and mental health research. For question one, Group Three found that model two had the lowest MAE for the combined self-reported mental health scores, while the empty model performed the worst. For question two, all model evaluations followed a similar pattern: the second model, which contains all variables and one specific interaction, had the lowest MAE for Depression, OCD, and Insomnia, while the first model, which contains all variables without interactions, was selected for Anxiety, where the two models tied at three decimal places. One possible explanation for Anxiety behaving differently, which we did not test, is that respondents may report Anxiety more readily because it does not require a medical diagnosis. Across all five outcomes, however, the best model improved on the empty model by only a small margin, which leads to the conclusion, unexpected for the group, that self-reported mental health conditions cannot be accurately predicted from music preferences and listening habits alone.

While aiming to identify effective models for cumulative self-reported mental health scores and individual mental health conditions, Group Three faced challenges due to the limited number of observations in the dataset. Given the small sample size, the models did not perform as well as hoped, and the limited observations affected the reliability and generalizability of our results, making it difficult to draw robust conclusions. Moreover, although linear regression models can be powerful tools for modeling relationships between variables, they are not immune to error. One factor that may have contributed to the errors in our models was our decision not to use forward or backward selection to find the combination of variables that minimizes the mean absolute error.

To reduce these errors, future work could take different approaches. For example, stepwise regression could iteratively select the combination of variables that produces the lowest mean absolute error. Additionally, a nonlinear model such as a neural network, trained with gradient descent to minimize its cost function, could capture relationships that our linear models miss.
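Forward stepwise selection, one of the approaches suggested above, can be sketched as a greedy loop. This is a minimal, hypothetical sketch: `score` stands in for any function that returns the cross-validated MAE of a linear regression on a given set of predictors, and the demo replaces it with a made-up toy score so the selection logic can be followed by hand.

```python
from typing import Callable, FrozenSet, List, Tuple


def forward_selection(
    candidates: List[str],
    score: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Greedy forward stepwise selection that minimizes score (e.g. CV MAE).

    Starts from the empty model and, at each step, adds the single candidate
    that lowers the score most; stops when no addition improves it.
    """
    selected: FrozenSet[str] = frozenset()
    best_score = score(selected)
    remaining = list(candidates)
    while remaining:
        trial_scores = {var: score(selected | {var}) for var in remaining}
        best_var = min(trial_scores, key=trial_scores.get)
        if trial_scores[best_var] >= best_score:
            break  # no remaining variable reduces the error any further
        selected = selected | {best_var}
        best_score = trial_scores[best_var]
        remaining.remove(best_var)
    return selected, best_score


# Made-up MAE reductions per variable, standing in for real cross-validation.
TOY_GAIN = {"Age": 0.05, "Hours Per Day": 0.02, "Frequency [Pop]": 0.01, "Composer": -0.01}


def toy_score(subset):
    return 2.30 - sum(TOY_GAIN[v] for v in subset)


chosen, mae = forward_selection(list(TOY_GAIN), toy_score)
print(sorted(chosen), round(mae, 2))
```

With the toy scores, the loop adds Age, then Hours Per Day, then Frequency [Pop], and stops before Composer because adding it would raise the error. Backward selection works the same way in reverse, starting from all variables and removing one at a time.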

Despite the limitations, the project serves as a stepping stone for future research in this field. The group recognizes the need for larger and more diverse datasets to provide more meaningful insights into the relationship between music preferences and mental well-being. In the real world, understanding the therapeutic potential of music remains crucial for improving mental health interventions. Moving forward, Group Three encourages researchers to try more diverse models to optimize accuracy. By doing so, the true potential of music therapy and its role in promoting holistic well-being can be unlocked for individuals worldwide.