Coordinated Variable Speed Limit Control for Freeway Based on Multi-Agent Deep Reinforcement Learning
CSTR:
Authors:

YU Rongjie, XU Ling, ZHANG Ruici

Affiliations:

1. Key Laboratory of Road and Traffic Engineering of the Ministry of Education, Tongji University, Shanghai 201804, China; 2. Zhejiang Hangshaoyong Expressway Co., Ltd., Hangzhou 310000, China

Author biography:

YU Rongjie, professor, doctoral supervisor, Ph.D. in engineering; main research interests: road traffic crash risk identification and proactive traffic management. E-mail: yurongjie@tongji.edu.cn

Corresponding author:

ZHANG Ruici, master's student; main research interest: proactive traffic safety. E-mail: zhang_ruici@tongji.edu.cn

CLC number:

U491.5

Funding:

Science and Technology Program of the Zhejiang Provincial Department of Transport (2021047)

    Abstract:

    To meet the need for coordinated variable speed limit (VSL) control across multiple freeway segments, and to address the difficulty of efficient training and optimization in a high-dimensional parameter space, a coordinated freeway VSL control method based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm is proposed. Unlike existing studies that use the single-agent deep deterministic policy gradient (DDPG) algorithm, MADDPG abstracts each control unit as an agent with an Actor-Critic reinforcement learning architecture and shares the state and action information of all agents during training, so that each agent can infer the control strategies of the other agents, thereby realizing coordinated multi-segment control. The proposed method is validated in a typical freeway congestion scenario built in the open-source simulation software SUMO. Experimental results show that the proposed MADDPG algorithm reduces the congestion duration by 69.23 % and the standard deviation of segment speeds by 47.96 %, significantly improving traffic efficiency and safety. Compared with the single-agent DDPG algorithm, MADDPG saves 50 % of the training time and increases the cumulative reward by 7.44 %, indicating that the multi-agent algorithm improves the optimization efficiency of coordinated control strategies. Further, to verify the necessity of information sharing among agents, MADDPG is compared with the independent multi-agent DDPG (IDDPG) algorithm: relative to IDDPG, MADDPG improves the congestion duration and the mean speed standard deviation by a further 11.65 % and 19.00 %, respectively.
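The centralized-critic idea the abstract describes — each control unit's actor sees only its own segment, while each critic sees the states and actions of every agent, which is what lets agents account for one another's policies — can be sketched as follows. This is a minimal NumPy illustration of the MADDPG information flow, not the paper's implementation; the agent count, observation layout, and linear actor/critic functions are all assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 3   # hypothetical: one agent per VSL-controlled segment
OBS_DIM = 4    # e.g. speed, density, flow, current limit of one segment
ACT_DIM = 1    # continuous speed-limit adjustment

class Agent:
    """One VSL control unit with an Actor-Critic pair (linear, for illustration)."""
    def __init__(self):
        # Actor sees only its own segment's observation.
        self.actor_w = rng.normal(scale=0.1, size=(OBS_DIM, ACT_DIM))
        # Centralized critic sees ALL agents' observations and actions.
        joint_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.critic_w = rng.normal(scale=0.1, size=(joint_dim, 1))

    def act(self, obs):
        # Deterministic policy (the "deterministic policy gradient" part).
        return np.tanh(obs @ self.actor_w)

    def q_value(self, all_obs, all_acts):
        # Joint state-action input: the source of coordination in MADDPG.
        joint = np.concatenate([np.ravel(all_obs), np.ravel(all_acts)])
        return (joint @ self.critic_w).item()

agents = [Agent() for _ in range(N_AGENTS)]
obs = rng.normal(size=(N_AGENTS, OBS_DIM))              # one observation per segment
acts = np.stack([ag.act(obs[i]) for i, ag in enumerate(agents)])

# During training, each critic scores the joint state-action, so every agent
# can implicitly infer the others' control strategies; at execution time the
# actors remain decentralized, each needing only local observations.
qs = [ag.q_value(obs, acts) for ag in agents]
print(len(qs))  # 3
```

In an IDDPG-style baseline, by contrast, each critic would receive only its own agent's observation and action, which is exactly the information-sharing difference the abstract's MADDPG-vs-IDDPG comparison isolates.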

Cite this article:

YU Rongjie, XU Ling, ZHANG Ruici. Coordinated variable speed limit control for freeway based on multi-agent deep reinforcement learning[J]. Journal of Tongji University (Natural Science), 2024, 52(7): 1089-1098.

History
  • Received: 2022-10-18
  • Published online: 2024-07-30