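Patent Licensing
Patent title (Chinese): 於非對稱策略架構下以階層式強化學習訓練主策略的方法
Patent title (English): Master Policy Training Method of Hierarchical Reinforcement Learning with Asymmetrical Policy Architecture
Patent family: Republic of China (Taiwan): I835638
United States: 2023-0362196 (publication number)
Patentee: National Tsing Hua University, 100.00%
Inventor: 李濬屹
Technical field: Information engineering; electronics and electrical engineering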
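Abstract (translated from the Chinese)
The invention comprises the following steps: reading a master policy, a plurality of sub-policies, and an environment state, where each sub-policy has a different inference cost; using the master policy to select one of the sub-policies as the selected sub-policy; generating at least one action signal according to the selected sub-policy; applying the at least one action signal to an action execution unit; detecting at least one feedback signal from the environment, the feedback signal corresponding to at least one feedback response after the action execution unit executes the at least one action signal; computing a master feedback signal for the master policy from the at least one feedback signal and an inference cost of the selected sub-policy; and training, according to the master feedback signal, whether the master policy should select the selected sub-policy, so as to reduce the inference cost of a deep neural network while still producing satisfactory results.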
Abstract (English)
The present invention includes the following steps: loading a master policy, a plurality of sub-policies, and environment data, wherein each of the sub-policies has a different inference cost; selecting one of the sub-policies as a selected sub-policy by using the master policy; generating at least one action signal according to the selected sub-policy; applying the at least one action signal to an action executing unit; detecting at least one reward signal from an environment, wherein the at least one reward signal corresponds to at least one reaction of the action executing unit executing the at least one action signal; calculating a master reward signal of the master policy according to the at least one reward signal and an inference cost of the selected sub-policy; and training the master policy, by using the master reward signal, to decide whether to select the selected sub-policy, thereby decreasing the inference cost of a deep neural network model while still outputting a satisfactory result.
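The steps in the abstract can be illustrated with a small toy sketch. It is not the patented implementation: the two sub-policies, their costs, the two-state environment, the `COST_WEIGHT` trade-off, and the tabular update rule are all assumptions made for illustration, and the abstract does not specify how the reward and the inference cost are combined (a weighted subtraction is assumed here). A tabular value estimate stands in for the deep neural network master policy.

```python
import random

# Hypothetical sub-policies: (name, inference cost, action function).
SUB_POLICIES = [
    ("cheap", 1.0, lambda state: 0),          # small model: always outputs action 0
    ("expensive", 5.0, lambda state: state),  # large model: always outputs the right action
]
COST_WEIGHT = 0.1  # assumed trade-off between task reward and inference cost


def env_step(state, action):
    """Toy environment: reward 1.0 when the action matches the state, else 0.0."""
    return 1.0 if action == state else 0.0


def master_reward(env_rewards, inference_cost, cost_weight=COST_WEIGHT):
    """Assumed form: collected environment reward minus a weighted cost penalty."""
    return sum(env_rewards) - cost_weight * inference_cost


def train_master(steps=2000, epsilon=0.2, lr=0.1, seed=0):
    rng = random.Random(seed)
    # q[state][k]: estimated master reward of picking sub-policy k in that state.
    q = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    for _ in range(steps):
        state = rng.randrange(2)                      # read the environment state
        if rng.random() < epsilon:                    # explore a random sub-policy
            k = rng.randrange(len(SUB_POLICIES))
        else:                                         # exploit the master's estimate
            k = max(range(len(SUB_POLICIES)), key=lambda i: q[state][i])
        _, cost, act = SUB_POLICIES[k]
        action = act(state)                           # selected sub-policy emits an action
        rewards = [env_step(state, action)]           # feedback from the environment
        r_master = master_reward(rewards, cost)
        q[state][k] += lr * (r_master - q[state][k])  # only the master is updated
    return q


def greedy_choice(q, state):
    """Name of the sub-policy the trained master prefers in the given state."""
    k = max(range(len(SUB_POLICIES)), key=lambda i: q[state][i])
    return SUB_POLICIES[k][0]


if __name__ == "__main__":
    q = train_master()
    print({s: greedy_choice(q, s) for s in q})
```

Under these assumed numbers, the cheap sub-policy earns a master reward of 0.9 in state 0 against 0.5 for the expensive one, while in state 1 it earns -0.1 against 0.5, so the master should come to prefer the cheap sub-policy where it suffices and the expensive one only where it is needed.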
Contact information
Contact person: 李曉琪
Phone: 03-5715131 ext. 31061
Email: hsiaochi@mx.nthu.edu.tw