We propose a method for generating music from a given image through three stages of translation: from image to caption, caption to lyrics, and lyrics to instrumental music, which forms the content to be combined with a given style. We train our proposed model, which we call BGT (BLIP-GPT2-TeleMelody), on two open-source datasets: one containing over 200,000 labeled images, and another containing more than 175,000 MIDI music files. In contrast with pixel-level translation, our system retains the semantics of the input image. We verify this claim through a user study in which participants were asked to match input images with generated music without access to the intermediate caption and lyrics.

Stuttering is a speech disorder affecting over 70 million people worldwide, including 13 million in China. It causes low self-esteem, among other detrimental effects, in people who stutter (PwS). Although prior work has explored approaches to assist PwS, it has primarily focused on Western contexts. In our formative study, we found unique practices and challenges among Chinese PwS. We then iteratively designed an online tool, CoPracTter, to support Chinese PwS in practicing speaking fluency with 1) targeted stress-inducing practice scenarios, 2) real-time speech indicators, and 3) personalized, timely feedback from the community. We further conducted a seven-day deployment study (N=11) to understand how participants utilized these key features. Results indicate that personalized practice with targeted scenarios, together with timely feedback from a supportive community (which was appreciated more than quantitative indicators), assisted PwS in speaking fluently, staying positive, and facing similar real-life circumstances. Finally, we present design implications for better supporting PwS.
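The BGT system's three-stage image-to-music translation can be sketched as a simple function composition. The stage functions below are hypothetical placeholders standing in for the paper's actual components (BLIP for captioning, GPT-2 for lyrics, TeleMelody for music); they are shown only to illustrate how the stages chain together, not as an implementation:

```python
# Hypothetical sketch of the BGT three-stage pipeline.
# Each stage function is a placeholder, not the paper's model.

def image_to_caption(image: str) -> str:
    # Stage 1 (BLIP in the paper): describe the input image in text.
    return f"caption({image})"

def caption_to_lyrics(caption: str) -> str:
    # Stage 2 (GPT-2 in the paper): expand the caption into song lyrics.
    return f"lyrics({caption})"

def lyrics_to_music(lyrics: str, style: str) -> str:
    # Stage 3 (TeleMelody in the paper): generate instrumental music
    # from the lyrics (the content), combined with a given style.
    return f"music({lyrics}, style={style})"

def generate_music(image: str, style: str = "pop") -> str:
    # Chain the three translations: image -> caption -> lyrics -> music.
    caption = image_to_caption(image)
    lyrics = caption_to_lyrics(caption)
    return lyrics_to_music(lyrics, style)
```

Because the intermediate caption and lyrics are explicit values, they can be inspected or withheld, as in the user study where participants matched images to music without seeing them.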