Applications and Challenges of Reinforcement Learning in Autonomous Driving Technology
Author: HE Yixu, LIN Hongyi, LIU Yang, YANG Lan, QU Xiaobo

Affiliation:

1. School of Civil Engineering and Transportation, South China University of Technology, Guangzhou 510640, China; 2. School of Information Engineering, Chang’an University, Xi’an 710064, China; 3. School of Vehicle and Mobility, Tsinghua University, Beijing 100084, China

CLC Number: U461

    Abstract:

    This paper provides a comprehensive overview of the application of reinforcement learning in the field of autonomous driving. First, the principles and development of reinforcement learning are introduced. Next, the autonomous driving technology system and the fundamentals required for applying reinforcement learning in this field are systematically presented. Application cases of reinforcement learning in autonomous driving are then reviewed by direction of use. Finally, the current challenges of applying reinforcement learning to autonomous driving are analyzed in depth, and several prospects are proposed.
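
    To make the surveyed methodology concrete, the following minimal sketch (illustrative only, not code from the paper) trains a deep Q-network agent [16] on a lane-change task in the highway-env simulation environment [48-49]. The package names gymnasium, highway_env, and stable_baselines3, as well as all parameter values, are assumptions of this sketch.

        import gymnasium as gym
        import highway_env  # registers the "highway-v0" environment [48-49]
        from stable_baselines3 import DQN

        # Ego vehicle on a multi-lane highway; the default reward trades off
        # speed against collisions, so the agent learns when to change lanes.
        env = gym.make("highway-v0")

        # Deep Q-network [16] over the default kinematic observation.
        model = DQN("MlpPolicy", env, learning_rate=5e-4, verbose=1)
        model.learn(total_timesteps=20_000)  # illustrative training budget only

        # Roll out the learned policy for one episode.
        obs, info = env.reset()
        done = truncated = False
        while not (done or truncated):
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, done, truncated, info = env.step(action)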

    References
    [1] ZHANG Lei, SHEN Guochen, QIN Xiaojie, et al. Information physical mapping and system construction of intelligent network transportation[J]. Journal of Tongji University (Natural Science), 2022, 50(1): 79. (in Chinese)
    [2] LIN Hongyi, LIU Yang, LI Shen, et al. Research progress on key technologies in the cooperative vehicle infrastructure system[J]. Journal of South China University of Technology (Natural Science), 2023, 51(10): 46. (in Chinese)
    [3] LIU Y, WU F, LIU Z, et al. Can language models be used for real-world urban-delivery route optimization?[J]. The Innovation, 2023, 4(6):1.
    [4] LIU Y, LYU C, ZHANG Y, et al. DeepTSP: deep traffic state prediction model based on large-scale empirical data[J]. Communications in Transportation Research, 2021, 1: 100012.
    [5] LIU Bing, WANG Jinrui, XIE Jiming, et al. Microscopic trajectory data-driven probability distribution model for weaving area of channel change[J]. Journal of Automotive Safety and Energy, 2022, 13(2): 333. (in Chinese)
    [6] YANG Z, CHAI Y, ANGUELOV D, et al. SurfelGAN: synthesizing realistic sensor data for autonomous driving[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 11118-11127.
    [7] HU Y, YANG J, CHEN L, et al. Planning-oriented autonomous driving[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2023: 17853-17862.
    [8] MINSKY M L. Theory of neural-analog reinforcement systems and its application to the brain-model problem[D]. Princeton: Princeton University, 1954.
    [9] BROWN T, MANN B, RYDER N, et al. Language models are few-shot learners[J]. Advances in Neural Information Processing Systems, 2020, 33: 1877.
    [10] FENG S, SUN H, YAN X, et al. Dense reinforcement learning for safety validation of autonomous vehicles[J]. Nature, 2023, 615(7953): 620.
    [11] BELLMAN R. A Markovian decision process[J]. Journal of Mathematics and Mechanics, 1957, 6(5): 679.
    [12] HOWARD R A. Dynamic programming and Markov processes[M]. Cambridge: MIT Press, 1960.
    [13] SUTTON R S. Learning to predict by the methods of temporal differences[J]. Machine Learning, 1988, 3: 9.
    [14] WATKINS C J C H. Learning from delayed rewards[D]. Cambridge:University of Cambridge, 1989.
    [15] RUMMERY G A, NIRANJAN M. On-line Q-learning using connectionist systems[R]. Cambridge: University of Cambridge, 1994.
    [16] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529.
    [17] SCHULMAN J, LEVINE S, ABBEEL P, et al. Trust region policy optimization[C]//International Conference on Machine Learning. Cambridge: JMLR, 2015: 1889-1897.
    [18] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[J]. arXiv preprint arXiv:1707.06347, 2017.
    [19] HAARNOJA T, ZHOU A, ABBEEL P, et al. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]//International Conference on Machine Learning. Cambridge: JMLR, 2018: 1861-1870.
    [20] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[J]. arXiv preprint arXiv:1509.02971, 2015.
    [21] FUJIMOTO S, HOOF H, MEGER D. Addressing function approximation error in actor-critic methods[C]//International Conference on Machine Learning. Cambridge: JMLR, 2018: 1587-1596.
    [22] ALEXIADIS V, COLYAR J, HALKIAS J, et al. The next generation simulation program[J]. ITE Journal, 2004, 74(8): 22.
    [23] GEIGER A, LENZ P, STILLER C, et al. Vision meets robotics: the KITTI dataset[J]. The International Journal of Robotics Research, 2013, 32(11): 1231.
    [24] CORDTS M, OMRAN M, RAMOS S, et al. The Cityscapes dataset for semantic urban scene understanding[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 3213-3223.
    [25] KRAJEWSKI R, BOCK J, KLOEKER L, et al. The highD dataset: a drone dataset of naturalistic vehicle trajectories on German highways for validation of highly automated driving systems[C]//2018 21st International Conference on Intelligent Transportation Systems (ITSC). Piscataway: IEEE, 2018: 2118-2125.
    [26] HUANG X, CHENG X, GENG Q, et al. The ApolloScape dataset for autonomous driving[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE, 2018: 954-960.
    [27] YU F, CHEN H, WANG X, et al. BDD100K: a diverse driving dataset for heterogeneous multitask learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 2636-2645.
    [28] ZHAN W, SUN L, WANG D, et al. INTERACTION dataset: an international, adversarial and cooperative motion dataset in interactive driving scenarios with semantic maps[J]. arXiv preprint arXiv:1910.03088, 2019.
    [29] WANG J H, FU T, XUE J T, et al. Realtime wide-area vehicle trajectory tracking using millimeter-wave radar sensors and the open TJRD TS dataset[J]. International Journal of Transportation Science and Technology, 2023, 12(1): 273.
    [30] BOCK J, KRAJEWSKI R, MOERS T, et al. The inD dataset: a drone dataset of naturalistic road user trajectories at German intersections[C]//2020 IEEE Intelligent Vehicles Symposium (IV). Piscataway: IEEE, 2020: 1929-1934.
    [31] KRAJEWSKI R, MOERS T, BOCK J, et al. The rounD dataset: a drone dataset of road user trajectories at roundabouts in Germany[C]//2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC). Piscataway: IEEE, 2020: 1-6.
    [32] CAESAR H, BANKITI V, LANG A H, et al. nuScenes: a multimodal dataset for autonomous driving[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 11621-11631.
    [33] SUN P, KRETZSCHMAR H, DOTIWALLA X, et al. Scalability in perception for autonomous driving: Waymo open dataset[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 2446-2454.
    [34] HOUSTON J, ZUIDHOF G, BERGAMINI L, et al. One thousand and one hours: self-driving motion prediction dataset[C]//Conference on Robot Learning. Cambridge: JMLR, 2021: 409-418.
    [35] BARNES D, GADD M, MURCUTT P, et al. The Oxford Radar RobotCar dataset: a radar extension to the Oxford RobotCar dataset[C]//2020 IEEE International Conference on Robotics and Automation (ICRA). Piscataway: IEEE, 2020: 6433-6438.
    [36] YU H, LUO Y, SHU M, et al. DAIR-V2X: a large-scale dataset for vehicle-infrastructure cooperative 3D object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 21361-21370.
    [37] SHI X, ZHAO D, YAO H, et al. Video-based trajectory extraction with deep learning for High-Granularity Highway Simulation (HIGH-SIM)[J]. Communications in Transportation Research, 2021, 1: 100014.
    [38] XIAO P, SHAO Z, HAO S, et al. PandaSet: advanced sensor suite dataset for autonomous driving[C]//2021 IEEE International Intelligent Transportation Systems Conference (ITSC). Piscataway: IEEE, 2021: 3095-3101.
    [39] MOERS T, VATER L, KRAJEWSKI R, et al. The exiD dataset: a real-world trajectory dataset of highly interactive highway scenarios in Germany[C]//2022 IEEE Intelligent Vehicles Symposium (IV). Piscataway: IEEE, 2022: 958-964.
    [40] BURNETT K, YOON D J, WU Y, et al. Boreas: a multi-season autonomous driving dataset[J]. The International Journal of Robotics Research, 2023, 42(1/2): 33.
    [41] LOPEZ P A, BEHRISCH M, BIEKER-WALZ L, et al. Microscopic traffic simulation using SUMO[C]//2018 21st International Conference on Intelligent Transportation Systems (ITSC). Piscataway: IEEE, 2018: 2575-2582.
    [42] DOSOVITSKIY A, ROS G, CODEVILLA F, et al. CARLA: an open urban driving simulator[C]//Conference on Robot Learning. Cambridge: JMLR, 2017: 1-16.
    [43] FAN H Y, ZHU F, LIU C C, et al. Baidu Apollo EM motion planner[J]. arXiv preprint arXiv:1807.08048, 2018.
    [44] WU C, KREIDIEH A, PARVATE K, et al. Flow: architecture and benchmarking for reinforcement learning in traffic control[J]. arXiv preprint arXiv:1710.05465, 2017.
    [45] WYMANN B, ESPIÉ E, GUIONNEAU C, et al. TORCS, the open racing car simulator[EB/OL]. [2020-02-06]. http://torcs.sourceforge.net.
    [46] SUN J, TIAN Y. OnSite[EB/OL]. [2022-08-30]. https://onsite.run.
    [47] SHAH S, DEY D, LOVETT C, et al. AirSim: high-fidelity visual and physical simulation for autonomous vehicles[C]//Field and Service Robotics: Results of the 11th International Conference. Cham: Springer International Publishing, 2018: 621-635.
    [48] LEURENT E. An environment for autonomous driving decision-making[EB/OL]. [2022-08-30]. https://github.com/eleurent/highway-env.
    [49] LEURENT E. A collection of environments for autonomous driving and tactical decision-making tasks [EB/OL]. [2022-08-30]. https://github.com/eleurent/highway-env.
    [50] LI Q, PENG Z, FENG L, et al. MetaDrive: composing diverse driving scenarios for generalizable reinforcement learning[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(3): 3461.
    [51] FENG Yao, JING Shoucai, HUI Fei, et al. Deep reinforcement learning-based lane-changing trajectory planning method of intelligent and connected vehicles[J]. Journal of Automotive Safety and Energy, 2022, 13(4): 705. (in Chinese)
    [52] ALIZADEH A, MOGHADAM M, BICER Y, et al. Automated lane change decision making using deep reinforcement learning in dynamic and uncertain highway environment[C]//2019 IEEE Intelligent Transportation Systems Conference (ITSC). Piscataway: IEEE, 2019: 1399-1404.
    [53] LI Wenli, QIU Fanke, LIAO Daming, et al. Highway lane change decision control model based on deep reinforcement learning[J]. Journal of Automotive Safety and Energy, 2022, 13(4): 750. (in Chinese)
    [54] ZHU Bing, JIANG Yuande, ZHAO Jian, et al. A car-following control algorithm based on deep reinforcement learning[J]. China Journal of Highway and Transport, 2019, 32(6): 53. DOI: 10.19721/j.cnki.1001-7372.2019.06.005. (in Chinese)
    [55] HE Y, LIU Y, YANG L, et al. Deep adaptive control: deep reinforcement learning-based adaptive vehicle trajectory control algorithms for different risk levels[J]. IEEE Transactions on Intelligent Vehicles, 2023, 9(1):1654.
    [56] ZHAO D, XIA Z, ZHANG Q. Model-free optimal control based intelligent cruise control with hardware-in-the-loop demonstration [Research Frontier][J]. IEEE Computational Intelligence Magazine, 2017, 12(2): 56.
    [57] QIAO Liang, BAO Hong, XUAN Zuxing, et al. Autonomous driving ramp merging model based on reinforcement learning[J]. Computer Engineering, 2018, 44(7): 20. DOI: 10.19678/j.issn.1000-3428.0050990. (in Chinese)
    [58] LUBARS J, GUPTA H, CHINCHALI S, et al. Combining reinforcement learning with model predictive control for on-ramp merging[C]//2021 IEEE International Intelligent Transportation Systems Conference (ITSC). Piscataway: IEEE, 2021: 942-947.
    [59] KIM M, SEO J, LEE M, et al. Vision-based uncertainty-aware lane keeping strategy using deep reinforcement learning[J]. Journal of Dynamic Systems, Measurement, and Control, 2021, 143(8): 084503.
    [60] LU H, LU C, YU Y, et al. Autonomous overtaking for intelligent vehicles considering social preference based on hierarchical reinforcement learning[J]. Automotive Innovation, 2022, 5(2): 195.
    [61] DESHPANDE N, SPALANZANI A. Deep reinforcement learning based vehicle navigation amongst pedestrians using a grid-based state representation[C]//2019 IEEE Intelligent Transportation Systems Conference (ITSC). Piscataway: IEEE, 2019: 2081-2086.
    [62] OUYANG Zhuo, ZHOU Siyuan, LÜ Yong, et al. DRL-based vehicle control strategy for signal-free intersections[J]. Computer Science, 2022, 49(3): 46. (in Chinese)
    [63] QIAO Z, MUELLING K, DOLAN J, et al. POMDP and hierarchical options MDP with continuous actions for autonomous driving at intersections[C]//2018 21st International Conference on Intelligent Transportation Systems (ITSC). Piscataway: IEEE, 2018: 2377-2382.
    [64] LIANG X, WANG T, YANG L, et al. CIRL: controllable imitative reinforcement learning for vision-based self-driving[C]//Proceedings of the European Conference on Computer Vision (ECCV). Cham: Springer International Publishing, 2018: 584-599.
    [65] GUO Q, ANGAH O, LIU Z, et al. Hybrid deep reinforcement learning based eco-driving for low-level connected and automated vehicles along signalized corridors[J]. Transportation Research Part C: Emerging Technologies, 2021, 124: 102980.
    [66] LI Jiangkun, DENG Weiwen, REN Bingtao, et al. Automatic driving edge scene generation method based on scene dynamics and reinforcement learning[J]. Automotive Engineering, 2022, 44(7): 976. DOI: 10.19562/j.chinasae.qcgc.2022.07.004. (in Chinese)
    [67] FENG S, YAN X, SUN H, et al. Intelligent driving intelligence test for autonomous vehicles with naturalistic and adversarial environment[J]. Nature Communications, 2021, 12(1): 748.
    [68] FENG S, FENG Y, SUN H, et al. Testing scenario library generation for connected and automated vehicles, part II: case studies[J]. IEEE Transactions on Intelligent Transportation Systems, 2020, 22(9): 5635.
    [69] CHEN B, CHEN X, WU Q, et al. Adversarial evaluation of autonomous vehicles in lane-change scenarios[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 23(8): 10333.
    [70] BARZ B, RODNER E, GARCIA Y G, et al. Detecting regions of maximal divergence for spatio-temporal anomaly detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(5): 1088.
    [71] SUN H, FENG S, YAN X, et al. Corner case generation and analysis for safety assessment of autonomous vehicles[J]. Transportation Research Record, 2021, 2675(11): 587.
    [72] CHEN B, CHEN X, WU Q, et al. Adversarial evaluation of autonomous vehicles in lane-change scenarios[J]. IEEE Transactions on Intelligent Transportation Systems, 2021, 23(8): 10333.
    [73] FENG S, YAN X, SUN H, et al. Intelligent driving intelligence test for autonomous vehicles with naturalistic and adversarial environment[J]. Nature Communications, 2021, 12(1): 748.
    [74] CHEN Zeyu, FANG Zhiyuan, YANG Ruixin, et al. Energy management strategy for hybrid electric vehicle based on the deep reinforcement learning method[J]. Transactions of China Electrotechnical Society, 2022, 37(23): 6157. DOI: 10.19595/j.cnki.1000-6753.tces.211342. (in Chinese)
    [75] LIAN R, PENG J, WU Y, et al. Rule-interposing deep reinforcement learning based energy management strategy for power-split hybrid electric vehicle[J]. Energy, 2020, 197: 117297.
    [76] XIONG R, CAO J, YU Q. Reinforcement learning-based real-time power management for hybrid energy storage system in the plug-in hybrid electric vehicle[J]. Applied Energy, 2018, 211: 538.
    [77] YIN Guodong, ZHU Tong, REN Zuping, et al. Intelligent control system framework for multi-agent based electric vehicle chassises[J]. China Mechanical Engineering, 2018, 29(15): 1796. (in Chinese)
    [78] JIANG Hong, WANG Pengcheng, LI Zhongxing. Research on air suspension vehicle height intelligent control system based on agent theory[J]. Journal of Chongqing University of Technology (Natural Science), 2019, 33(4): 17. (in Chinese)
    [79] LI Z, CHU T, KALABIĆ U. Dynamics-enabled safe deep reinforcement learning: case study on active suspension control[C]//2019 IEEE Conference on Control Technology and Applications (CCTA). Piscataway: IEEE, 2019: 585-591.
    [80] DU Y, CHEN J, ZHAO C, et al. A hierarchical framework for improving ride comfort of autonomous vehicles via deep reinforcement learning with external knowledge[J]. Computer-Aided Civil and Infrastructure Engineering, 2023, 38(8): 1059.
    [81] DAI Shanshan, LIU Quan. Action constrained deep reinforcement learning based safe automatic driving method[J]. Computer Science, 2021, 48(9): 235. (in Chinese)
    [82] SHALEV-SHWARTZ S, SHAMMAH S, SHASHUA A. Safe, multi-agent, reinforcement learning for autonomous driving[J]. arXiv preprint arXiv:1610.03295, 2016.
    [83] BEWLEY A, RIGLEY J, LIU Y, et al. Learning to drive from simulation without real world labels[C]//2019 International Conference on Robotics and Automation (ICRA). Piscataway: IEEE, 2019: 4818-4824.
Get Citation

HE Yixu, LIN Hongyi, LIU Yang, YANG Lan, QU Xiaobo. Applications and Challenges of Reinforcement Learning in Autonomous Driving Technology[J]. Journal of Tongji University (Natural Science), 2024, 52(4): 520-531.
History
  • Received: August 07, 2023
  • Online: April 30, 2024