Harmonizing Emotion and Sound: A Novel Framework for Procedural Sound Generation Based on Emotional Dynamics
DOI: http://dx.doi.org/10.62527/joiv.8.4.3101
References
F. Cuadrado, I. L. Cobo, T. M. Blanco, and A. Tajadura-Jiménez, “Arousing the Sound: A Field Study on the Emotional Impact on Children of Arousing Sound Design and 3D Audio Spatialization in an Audio Story,” Frontiers in Psychology, vol. 11, 2020, doi: 10.3389/fpsyg.2020.00737.
B. Kenwright, “There’s More to Sound Than Meets the Ear: Sound in Interactive Environments,” IEEE Computer Graphics and Applications, vol. 40, no. 4, pp. 62–70, 2020, doi: 10.1109/mcg.2020.2996371.
F. Abri, L. Gutiérrez, A. S. Namin, D. R. W. Sears, and K. S. Jones, “Predicting Emotions Perceived from Sounds,” arXiv preprint, 2020, doi: 10.48550/arXiv.2012.02643.
D. Jain et al., “A Taxonomy of Sounds in Virtual Reality,” 2021. doi: 10.1145/3461778.3462106.
Z. Jia, Y. Lin, X. Cai, H. Chen, H. Gou, and J. Wang, “SST-EmotionNet: Spatial-Spectral-Temporal based Attention 3D Dense Network for EEG Emotion Recognition,” 2020. doi: 10.1145/3394171.3413724.
P. Thiparpakul, S. Mokekhaow, and K. Supabanpot, “How Can Video Game Atmosphere Affect Audience Emotion with Sound,” 2021. doi: 10.1109/iciet51873.2021.9419652.
D. Williams, “Psychophysiological Approaches to Sound and Music in Games,” in The Cambridge Companion to Video Game Music, Cambridge University Press, 2021. doi: 10.1017/9781108670289.019.
A. Pinilla, J. García, W. L. Raffe, J. Voigt-Antons, R. Spang, and S. Möller, “Affective visualization in Virtual Reality: An integrative review,” arXiv preprint, 2020, doi: 10.48550/arXiv.2012.08849.
A. Schmitz, C. Holloway, and Y. Cho, “Hearing through Vibrations: Perception of Musical Emotions by Profoundly Deaf People,” arXiv preprint, 2020, doi: 10.48550/arXiv.2012.13265.
A. N. Nagele et al., “Interactive Audio Augmented Reality in Participatory Performance,” Frontiers in Virtual Reality, vol. 1, 2021, doi: 10.3389/frvir.2020.610320.
M. Geronazzo and S. Serafin, “Sonic Interactions in Virtual Environments: the Egocentric Audio Perspective of the Digital Twin,” arXiv preprint, 2022, doi: 10.48550/arXiv.2204.09919.
J. Atherton and G. Wang, “Doing vs. Being: A philosophy of design for artful VR,” Journal of New Music Research, vol. 49, no. 1, pp. 35–59, 2020, doi: 10.1080/09298215.2019.1705862.
E. Svikhnushina and P. Pu, “Social and Emotional Etiquette of Chatbots: A Qualitative Approach to Understanding User Needs and Expectations,” arXiv preprint, 2020, doi: 10.48550/arXiv.2006.13883.
P. Slovák, A. N. Antle, N. Theofanopoulou, C. D. Roquet, J. J. Gross, and K. Isbister, “Designing for emotion regulation interventions: an agenda for HCI theory and research,” arXiv preprint, 2022, doi: 10.48550/arXiv.2204.00118.
N. Marhamati and S. C. Creston, “Visual Response to Emotional State of User Interaction,” arXiv preprint, 2023, doi: 10.48550/arXiv.2303.17608.
X. Mao, W. Yu, K. D. Yamada, and M. R. Zielewski, “Procedural Content Generation via Generative Artificial Intelligence,” arXiv preprint, 2024, doi: 10.48550/arXiv.2407.09013.
D. Serrano and M. Cartwright, “A General Framework for Learning Procedural Audio Models of Environmental Sounds,” arXiv preprint, 2023, doi: 10.48550/arXiv.2303.02396.
T. Marrinan, P. Akram, O. Gurmessa, and A. Shishkin, “Leveraging AI to Generate Audio for User-generated Content in Video Games,” arXiv preprint, 2024, doi: 10.48550/arXiv.2404.
K. Fukaya, D. Daylamani-Zad, and H. Agius, “Intelligent Generation of Graphical Game Assets: A Conceptual Framework and Systematic Review of the State of the Art,” arXiv preprint, 2023, doi: 10.48550/arXiv.2311.
C. Bossalini, W. L. Raffe, and J. García, “Generative Audio and Real-Time Soundtrack Synthesis in Gaming Environments,” 2020. doi: 10.1145/3441000.3441075.
A. Dash and K. Agres, “AI-Based Affective Music Generation Systems: A Review of Methods, and Challenges,” arXiv preprint, 2023, doi: 10.48550/arXiv.2301.06890.
A. Kadhim Ali, A. Mohsin Abdullah, and S. Fawzi Raheem, “Impact the Classes’ Number on the Convolutional Neural Networks Performance for Image Classification,” 2023. doi: 10.62527/ijasce.5.2.132.
F. Miladiyenti, F. Rozi, W. Haslina, and D. Marzuki, “Incorporating Mobile-based Artificial Intelligence to English Pronunciation Learning in Tertiary-level Students: Developing Autonomous Learning,” 2022. doi: 10.62527/ijasce.4.3.92.
R. Bansal, “Read it to me: An emotionally aware Speech Narration Application,” arXiv preprint, 2022, doi: 10.48550/arXiv.2209.02785.
K. Agres, A. Dash, and P. Chua, “AffectMachine-Classical: A novel system for generating affective classical music,” arXiv preprint, 2023, doi: 10.48550/arXiv.2304.04915.
K. Zhou, B. Şişman, R. Rana, B. W. Schuller, and H. Li, “Speech Synthesis with Mixed Emotions,” arXiv preprint, 2022, doi: 10.48550/arXiv.2208.05890.
Y. Lei, S. Yang, X. Wang, and L. Xie, “MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis,” arXiv preprint, 2022, doi: 10.48550/arXiv.2201.06460.
Y. Lei, S. Yang, and L. Xie, “Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis,” arXiv preprint, 2020, doi: 10.48550/arXiv.2011.08477.
A. Scarlatos, “Sonispace: a simulated-space interface for sound design and experimentation,” arXiv preprint, 2020, doi: 10.48550/arXiv.2009.14268.
S. Torresin et al., “Acoustics for Supportive and Healthy Buildings: Emerging Themes on Indoor Soundscape Research,” Sustainability, vol. 12, no. 15, Art. no. 6054, 2020, doi: 10.3390/su12156054.
E. Easthope, “SnakeSynth: New Interactions for Generative Audio Synthesis,” arXiv preprint, 2023, doi: 10.48550/arXiv.2307.05830.
S. Afzal, H. Khan, I. Khan, and M. J. Piran, “A Comprehensive Survey on Affective Computing; Challenges, Trends, Applications, and Future Directions,” arXiv preprint, 2023, doi: 10.48550/arXiv.2305.07665.
K. Makantasis, A. Liapis, and G. N. Yannakakis, “The Pixels and Sounds of Emotion: General-Purpose Representations of Arousal in Games,” IEEE Transactions on Affective Computing, vol. 14, no. 1, pp. 680–693, 2023, doi: 10.1109/taffc.2021.3060877.
A. E. Ali, “Designing for Affective Augmentation: Assistive, Harmful, or Unfamiliar?,” arXiv preprint, 2023, doi: 10.48550/arXiv.2303.18038.
D. Harley, A. P. Tarun, B. J. Stinson, T. Tibu, and A. Mazalek, “Playing by Ear: Designing for the Physical in a Sound-Based Virtual Reality Narrative,” 2021. doi: 10.1145/3430524.3440635.
A. Kern, W. Ellermeier, and L. Jost, “The influence of mood induction by music or a soundscape on presence and emotions in a virtual reality park scenario,” 2020. doi: 10.1145/3411109.3411129.
T. Zhou, Y. Wu, Q. Meng, and J. Kang, “Influence of the Acoustic Environment in Hospital Wards on Patient Physiological and Psychological Indices,” Frontiers in Psychology, vol. 11, 2020, doi: 10.3389/fpsyg.2020.01600.
A. Kern and W. Ellermeier, “Audio in VR: Effects of a Soundscape and Movement-Triggered Step Sounds on Presence,” Frontiers in Robotics and AI, vol. 7, 2020, doi: 10.3389/frobt.2020.00020.
D. Eckhoff, R. Ng, and Á. Cassinelli, “Virtual Reality Therapy for the Psychological Well-being of Palliative Care Patients in Hong Kong,” arXiv preprint, 2022, doi: 10.48550/arXiv.2207.
G. Nie and Y. Zhan, “A Review of Affective Generation Models,” arXiv preprint, 2022, doi: 10.48550/arXiv.2202.10763.
Z. Yang, X. Jing, A. Triantafyllopoulos, M. Song, I. Aslan, and B. W. Schuller, “An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion,” arXiv preprint, 2022, doi: 10.48550/arXiv.2203.15873.
H. Hung, J. Ching, S. Doh, N. Kim, J. Nam, and Y. Yang, “EMOPIA: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation,” arXiv preprint, 2021, doi: 10.48550/arXiv.2108.01374.
S. Cunningham, H. Ridley, J. Weinel, and R. Picking, “Supervised machine learning for audio emotion recognition,” Personal and Ubiquitous Computing, vol. 25, no. 4, pp. 637–650, 2020, doi: 10.1007/s00779-020-01389-0.
K. Matsumoto, S. Hara, and M. Abe, “Speech-Like Emotional Sound Generation Using WaveNet,” IEICE Transactions on Information and Systems, vol. E105.D, no. 9, pp. 1581–1589, 2022, doi: 10.1587/transinf.2021edp7236.
X. Ji et al., “Audio-Driven Emotional Video Portraits,” arXiv preprint, 2021, doi: 10.48550/arXiv.2104.07452.
N. R. Prabhu, B. Lay, S. Welker, N. Lehmann-Willenbrock, and T. Gerkmann, “EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion for Non-parallel and In-the-wild Data,” arXiv preprint, 2023, doi: 10.48550/arXiv.2309.07828.
M. Liuni, L. Ardaillon, L. Bonal, L. Seropian, and J. Aucouturier, “ANGUS: Real-time manipulation of vocal roughness for emotional speech transformations,” arXiv preprint, 2020, doi: 10.48550/arXiv.2008.11241.
V. Isnard, T. Nguyen, and I. Viaud‐Delmon, “Exploiting Voice Transformation to Promote Interaction in Virtual Environments,” 2021. doi: 10.1109/vrw52623.2021.00021.
M. N. Dar, M. U. Akram, S. G. Khawaja, and A. N. Pujari, “CNN and LSTM-based emotion charting using physiological signals,” Sensors, vol. 20, no. 16, Art. no. 4551, 2020, doi: 10.3390/s20164551.
H. Cui, A. Liu, X. Zhang, X. Chen, K. Wang, and X. Chen, “EEG-based emotion recognition using an end-to-end regional-asymmetric convolutional neural network,” Knowledge-Based Systems, vol. 205, Art. no. 106243, 2020, doi: 10.1016/j.knosys.2020.106243.
M. Behnke, M. Buchwald, A. Bykowski, S. Kupiński, and L. D. Kaczmarek, “Psychophysiology of positive and negative emotions, dataset of 1157 cases and 8 biosignals,” Scientific Data, vol. 9, no. 1, Art. no. 10, 2022, doi: 10.1038/s41597-021-01117-0.
M. S. Khan, N. Salsabil, M. G. R. Alam, M. A. A. Dewan, and M. Z. Uddin, “CNN-XGBoost fusion-based affective state recognition using EEG spectrogram image analysis,” Scientific Reports, vol. 12, no. 1, Art. no. 14122, 2022, doi: 10.1038/s41598-022-18257-x.
P. L. Neves, J. Fornari, and J. Florindo, “Generating music with sentiment using Transformer-GANs,” arXiv preprint, 2022, doi: 10.48550/arXiv.2212.
S. Ji and X. Yang, “Emotion-Conditioned Melody Harmonization with Hierarchical Variational Autoencoder,” arXiv preprint, 2023, doi: 10.48550/arXiv.2306.03718.
D. Andreoletti, L. Luceri, T. Leidi, A. Peternier, and S. Giordano, “The Virtual Emotion Loop: Towards Emotion-Driven Services via Virtual Reality,” arXiv preprint, 2021, doi: 10.48550/arXiv.2102.13407.
S. H. Paplu, C. Mishra, and K. Berns, “Real-time Emotion Appraisal with Circumplex Model for Human-Robot Interaction,” arXiv preprint, 2022, doi: 10.48550/arXiv.2202.09813.
R. Nandy, K. Nandy, and S. T. Walters, “Relationship Between Valence and Arousal for Subjective Experience in a Real-life Setting for Supportive Housing Residents: Results From an Ecological Momentary Assessment Study,” 2023, doi: 10.2196/34989.
S. N. Chennoor, B. R. K. Madhur, M. Ali, and T. K. Kumar, “Human Emotion Detection from Audio and Video Signals,” arXiv preprint, 2020, doi: 10.48550/arXiv.2006.11871.
M. Singh and Y. Fang, “Emotion Recognition in Audio and Video Using Deep Neural Networks,” arXiv preprint, 2020, doi: 10.48550/arXiv.2006.08129.
K. Zhou, B. Şişman, R. Rana, B. W. Schuller, and H. Li, “Emotion Intensity and its Control for Emotional Voice Conversion,” IEEE Transactions on Affective Computing, 2023, doi: 10.1109/taffc.2022.3175578.
H. Tang, X. Zhang, J. Wang, N. Cheng, and J. Xiao, “EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis,” arXiv preprint, 2023, doi: 10.48550/arXiv.2306.00648.
W. Peng, Y. Hu, Y. Xie, L. Xing, and Y. Sun, “CogIntAc: Modeling the Relationships between Intention, Emotion and Action in Interactive Process from Cognitive Perspective,” arXiv preprint, 2022, doi: 10.48550/arXiv.2205.03540.
E. Osuna, L.-F. Rodríguez, and J. O. Gutiérrez-García, “Toward integrating cognitive components with computational models of emotion using software design patterns,” Cognitive Systems Research, 2021, doi: 10.1016/j.cogsys.2020.10.004.
K. Opong-Mensah, “Simulation of Human and Artificial Emotion (SHArE),” arXiv preprint, 2023. [Online]. Available: https://arxiv.org/pdf/2011.02151.pdf
G. Zhang et al., “iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre,” arXiv preprint, 2022, doi: 10.48550/arXiv.2206.14866.
A. Vinay and A. Lerch, “Evaluating generative audio systems and their metrics,” arXiv preprint, 2022, doi: 10.48550/arXiv.2209.00130.
H. Mo, S. Ding, and S. C. Hui, “A Multimodal Data-driven Framework for Anxiety Screening,” arXiv preprint, 2023, doi: 10.48550/arXiv.2303.09041.
F. Yan, N. Wu, A. M. Iliyasu, K. Kawamoto, and K. Hirota, “Framework for identifying and visualising emotional atmosphere in online learning environments in the COVID-19 Era,” Applied Intelligence, 2022, doi: 10.1007/s10489-021-02916-z.
R. Habibi, J. Pfau, J. Holmes, and M. S. El-Nasr, “Empathetic AI for Empowering Resilience in Games,” arXiv preprint, 2023, doi: 10.48550/arXiv.2302.09070.