Intelligent Decision-Making for Yard Container Relocation During Ship Loading Based on Inverse Reinforcement Learning

doi:10.11908/j.issn.0253-374x.21021

2021, Vol. 49, No. 10, pp. 1417-1425. DOI:10.11908/j.issn.0253-374x.21021

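Authors:ZHANG Yanwei (张艳伟), CAI Mengdie (蔡梦蝶)
Affiliation:School of Transportation and Logistics Engineering, Wuhan University of Technology, Wuhan 430063, Hubei, China
Author biography:ZHANG Yanwei (b. 1977), female, associate professor, Ph.D. in engineering; research interests: smart ports and shipping, port logistics, intelligent decision-making and algorithms. E-mail: zywtg@whut.edu.cn
CLC number:U695.22
Funding:National Natural Science Foundation of China (60904067)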

An Inverse Reinforcement Learning Method for Container Relocation in Container Terminal Yard During Loading

Author:ZHANG Yanwei, CAI Mengdie

Affiliation:School of Transportation and Logistics Engineering, Wuhan University of Technology, Wuhan 430063, China


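Abstract (translated from the Chinese):

Container relocation in the yard during ship loading at a container terminal is sequential and dynamic, and is an NP (non-deterministic polynomial) hard problem. For the common yard layout parallel to the quay, a model of yard relocation during loading is built on a Markov decision process, with minimizing the total number of relocations as the optimization objective and with the effect of relocation on loading continuity and efficiency taken into account, and an inverse reinforcement learning algorithm is designed. To verify the algorithm's effectiveness, with random decision-making as the baseline, the designed inverse reinforcement learning algorithm is compared with the rule-based decision-making commonly used at terminals and with random decision-making. The results show that when the stacking state of the bay is poor, common rule-based decisions are not necessarily better than random decisions. The inverse reinforcement learning algorithm can effectively mine implicit expert experience, has a higher probability of converging to the minimum number of relocations, and better limits the number of relocations per single container retrieval under different stacking states, so it can support intelligent decision-making for yard relocation during loading.

Keywords:container terminal;yard relocation;intelligent decision-making;Markov decision process;inverse reinforcement learning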

Abstract:

Container relocation in the terminal yard during loading is sequential and dynamic, and is a non-deterministic polynomial (NP) hard problem. This paper studies the common container terminal yard laid out parallel to the shoreline. Considering the effect of relocation on the continuity and efficiency of loading, a model of yard container relocation during loading is built on a Markov decision process, with the objective of minimizing the total number of relocations, and an algorithm based on inverse reinforcement learning is designed. To verify the effectiveness of the algorithm, with random decision-making as the baseline, the inverse reinforcement learning algorithm is compared with common rule-based decision-making and random decision-making. The results show that when the initial state of the bay is unfavorable, common rule-based decision-making is not necessarily superior to random decision-making. The inverse reinforcement learning algorithm can effectively mine and apply expert experience, and its probability of converging to the minimum number of relocations is higher than that of the other methods. In addition, it better limits the number of relocations per single retrieval under different bay states, and thus realizes intelligent decision-making for container relocation during loading.

Key words:container terminal;yard relocation;intelligent decision-making;Markov decision processes;inverse reinforcement learning
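
To make the decision problem in the abstract concrete, the following is a minimal sketch and not the authors' model or algorithm. It assumes a single bay stored as a list of stacks (bottom to top), containers numbered by loading order, a cost of one per relocation, and two illustrative destination policies: random choice and a hypothetical "lowest stack" rule. All function names and the example bay are assumptions made for illustration; the inverse reinforcement learning step, which would learn the policy from expert decisions, is not shown.

```python
import random


def relocations_to_load(bay, max_height, choose_stack, rng):
    """Load containers in order 1..N from one bay and count relocations.

    bay: list of stacks, each listed bottom-to-top with loading-order numbers.
    choose_stack(bay, candidates, rng) -> index of the destination stack.
    """
    bay = [list(stack) for stack in bay]           # work on a copy (the state)
    total = sum(len(stack) for stack in bay)
    relocations = 0
    for target in range(1, total + 1):
        src = next(i for i, stack in enumerate(bay) if target in stack)
        while bay[src][-1] != target:              # a blocking container is on top
            candidates = [i for i, s in enumerate(bay)
                          if i != src and len(s) < max_height]
            if not candidates:
                raise ValueError("no free slot for relocation")
            dst = choose_stack(bay, candidates, rng)   # the action
            bay[dst].append(bay[src].pop())
            relocations += 1                       # unit cost per relocation
        bay[src].pop()                             # target leaves for the ship
    return relocations


def random_policy(bay, candidates, rng):
    """Baseline: pick any stack that still has room."""
    return rng.choice(candidates)


def lowest_stack_policy(bay, candidates, rng):
    """Illustrative rule: relocate to the currently lowest stack."""
    return min(candidates, key=lambda i: len(bay[i]))


# Hypothetical bay: containers 3, 4 and 6 each block an earlier container.
example_bay = [[1, 3], [2, 4], [5, 6], []]
rng = random.Random(0)
rule_cost = relocations_to_load(example_bay, 4, lowest_stack_policy, rng)
random_costs = [relocations_to_load(example_bay, 4, random_policy, rng)
                for _ in range(200)]
print("lowest-stack rule:", rule_cost)
print("random, best / mean:", min(random_costs),
      sum(random_costs) / len(random_costs))
```

In this example every blocking container must move at least once, so three relocations is a lower bound. A policy that stacks a relocated container on top of one that is needed sooner creates new blockers and exceeds that bound, which is the kind of outcome a learned policy aims to avoid.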

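Cite this article

ZHANG Yanwei, CAI Mengdie. An Inverse Reinforcement Learning Method for Container Relocation in Container Terminal Yard During Loading[J]. Journal of Tongji University (Natural Science), 2021, 49(10): 1417-1425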


History

Received:2021-01-14
Published online:2021-10-18
