LIU Xiangwu, LIU Jiufu, XIE Hui, XU Qingwen, FAN Shenglin
To reduce the high cost incurred when agents collide with obstacles in environments whose information is unknown, a robust reinforcement learning path planning algorithm based on Bayesian optimization and Bayesian quadrature is proposed. A grid map of the environment is established to define the environmental rewards. Bayesian optimization is used to build a Gaussian process surrogate model over the set of historical actions, and the Gaussian process estimates the mean and variance of the rewards. The upper confidence bound (UCB) method is used to select the action sequence, balancing exploration and exploitation so that neither over-exploration nor over-exploitation occurs. Bayesian quadrature is used to actively learn the environment: the uncertainty of the environmental information is reduced by minimizing the expected posterior variance of the Q value, which helps avoid collisions and improves robustness. The Q table is updated iteratively, and Q-learning is used to plan the path. Simulation experiments compare the proposed algorithm with the classical Q-learning path planning algorithm. The results show that, compared with the classical algorithm, the proposed algorithm learns the environment more efficiently, has a lower collision probability, converges faster to the optimal number of path steps, and plans paths more effectively.
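
As a rough illustration of how UCB action selection can drive a Q-table update on a grid map, the following minimal Python sketch may help. It is not the proposed algorithm: the Gaussian process posterior mean and variance are replaced by the Q value and a count-based bonus, kappa * sqrt(ln(t) / n), and the Bayesian quadrature step is omitted entirely. The grid layout, reward values (-10 for a collision, -1 per step, 100 at the goal), hyperparameters, and all function names are assumptions made for the example.

```python
import math
from collections import defaultdict

# Hypothetical 5x5 grid map: 0 = free cell, 1 = obstacle (layout is illustrative).
GRID = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
START, GOAL = (0, 0), (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, a):
    """Deterministic transition; the reward values are assumptions, not the paper's."""
    r, c = state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1]
    if not (0 <= r < 5 and 0 <= c < 5) or GRID[r][c] == 1:
        return state, -10.0, False  # collision: stay in place and pay a penalty
    if (r, c) == GOAL:
        return (r, c), 100.0, True
    return (r, c), -1.0, False


def ucb_action(q, counts, state, kappa=2.0):
    """Return the action maximizing Q(s, a) + kappa * sqrt(ln(t) / n(s, a))."""
    t = 1 + sum(counts[(state, a)] for a in range(len(ACTIONS)))
    best, best_score = 0, -math.inf
    for a in range(len(ACTIONS)):
        n = counts[(state, a)]
        # Untried actions get an infinite score, so each is tried once first.
        score = math.inf if n == 0 else q[(state, a)] + kappa * math.sqrt(math.log(t) / n)
        if score > best_score:
            best, best_score = a, score
    return best


def train(episodes=1000, max_steps=200, alpha=0.5, gamma=0.9):
    """Tabular Q-learning in which actions are chosen by the UCB rule above."""
    q, counts = defaultdict(float), defaultdict(int)
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            a = ucb_action(q, counts, state)
            counts[(state, a)] += 1
            nxt, reward, done = step(state, a)
            # No bootstrapping from the terminal (goal) state.
            best_next = 0.0 if done else max(q[(nxt, b)] for b in range(len(ACTIONS)))
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the highest-valued action from START until the goal or the step cap."""
    state, path = START, [START]
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda b: q[(state, b)])
        state, _, done = step(state, a)
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

In this sketch the bonus term shrinks as an action is tried more often, so early episodes explore and later episodes exploit the learned Q table. A Gaussian process surrogate would instead supply a posterior standard deviation in place of the count-based term.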