To address the limited naturalness of timbre in synthesized audio, this study proposes a multi-stage framework that combines feature analysis with intelligent optimization. First, a dataset is constructed by synthesizing three types of sound waves, and acoustic features are extracted from them; dimensionality reduction and clustering are then used to quantify the differences between synthetic and natural sounds. Second, Bayesian optimization adaptively assigns feature weights, identifying the key discriminative indicators. Finally, reinforcement learning dynamically adjusts synthesis parameters (e.g., frequency and decay factor), using the cluster-center distance as the reward signal, so that the synthetic timbre moves closer to the distribution of natural sounds. Experimental results show that the method significantly improves the naturalness of the synthesized timbre, offering an efficient, data-driven approach to audio optimization.
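The core loop of the framework (synthesize, extract features, score by weighted distance to a natural-sound cluster center, adjust parameters) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The waveform types (sine, square, sawtooth), the three-feature vector, the sample rate, and all function names are hypothetical. The target center is a placeholder computed from a reference tone instead of clustered natural recordings, and uniform weights stand in for the Bayesian-optimized weights. A simple stochastic hill climber also replaces the reinforcement-learning agent, so only the reward definition mirrors the abstract.

```python
import numpy as np

SR = 16000  # sample rate in Hz (assumption)

def synth(freq, decay, kind="sine", dur=0.5, sr=SR):
    """Hypothetical synthesizer: one of three waveform types with exponential decay."""
    t = np.arange(int(dur * sr)) / sr
    phase = 2 * np.pi * freq * t
    if kind == "sine":
        wave = np.sin(phase)
    elif kind == "square":
        wave = np.sign(np.sin(phase))
    elif kind == "sawtooth":
        wave = 2 * (freq * t - np.floor(0.5 + freq * t))
    else:
        raise ValueError(f"unknown waveform: {kind}")
    return wave * np.exp(-decay * t)  # decay factor shapes the envelope

def features(x, sr=SR):
    """Illustrative features: normalized spectral centroid, zero-crossing rate, RMS."""
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / sr)
    centroid = np.sum(freqs * spec) / (np.sum(spec) + 1e-12)
    zcr = np.mean(np.abs(np.diff(np.sign(x))) > 0)
    rms = np.sqrt(np.mean(x ** 2))
    return np.array([centroid / (sr / 2), zcr, rms])

def reward(params, target_center, weights, kind="sine"):
    """Negative weighted Euclidean distance to the target cluster center."""
    freq, decay = params
    f = features(synth(freq, decay, kind))
    return -np.sqrt(np.sum(weights * (f - target_center) ** 2))

def optimize(target_center, weights, kind="sine", steps=200, seed=0):
    """Stochastic hill climbing over (frequency, decay); a stand-in for an RL agent."""
    rng = np.random.default_rng(seed)
    params = np.array([440.0, 5.0])  # initial frequency (Hz) and decay factor
    best = reward(params, target_center, weights, kind)
    for _ in range(steps):
        cand = params + rng.normal(0.0, [20.0, 0.5])  # perturb both parameters
        cand = np.clip(cand, [50.0, 0.1], [4000.0, 50.0])  # keep parameters in range
        r = reward(cand, target_center, weights, kind)
        if r > best:  # keep only moves that reduce the distance to the center
            params, best = cand, r
    return params, best

# Placeholder target: features of a reference tone stand in for a natural-sound cluster center.
target = features(synth(660.0, 3.0, "sine"))
weights = np.ones(3)  # placeholder for Bayesian-optimized feature weights
params, best = optimize(target, weights)
print(params, best)
```

Because the reward is the negative distance, maximizing it pulls the synthetic features toward the cluster center. In a full implementation the reward would come from centers fitted on natural recordings, and the weights from the Bayesian optimization stage.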
ZHAO et al. studied this question.