Safety-critical Policy Iteration Algorithm for Control under Model Uncertainty
Article ID: 4361
DOI: https://doi.org/10.30564/aia.v4i1.4361

References
[1] Beard, R.W., Saridis, G.N., Wen, J.T., 1997. Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation. Automatica. 33(12), 2159-2177. DOI: https://doi.org/10.1016/S0005-1098(97)00128-3
[2] Vamvoudakis, K.G., Lewis, F.L., 2010. Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem, Automatica. 46(5), 878-888. DOI: https://doi.org/10.1109/IJCNN.2009.5178586
[3] Lewis, F.L., Vamvoudakis, K.G., 2011. Reinforcement learning for partially observable dynamic processes: Adaptive dynamic programming using measured output data. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics). 41(1), 14-25. http://www.derongliu.org/adp/adpcdrom/Vamvoudakis2011.pdf
[4] Kiumarsi, B., Lewis, F.L., 2015. Actor-critic-based optimal tracking for partially unknown nonlinear discrete-time systems, IEEE Transactions on Neural Networks and Learning Systems. 26(1), 140-151. DOI: https://doi.org/10.1109/TNNLS.2014.2358227
[5] Modares, H., Lewis, F.L., Naghibi-Sistani, M.B., 2014. Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems, Automatica. 50(1), 193-202. DOI: https://doi.org/10.1016/j.automatica.2013.09.043
[6] Wang, D., Liu, D., Zhang, Y., et al., 2018. Neural network robust tracking control with adaptive critic framework for uncertain nonlinear systems, Neural Networks. 97(1), 11-18. DOI: https://doi.org/10.1016/j.neunet.2017.09.005
[7] Bhasin, S., Kamalapurkar, R., Johnson, M., et al., 2013. A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica. 49(1), 82-92. https://ncr.mae.ufl.edu/papers/auto13.pdf
[8] Gao, W., Jiang, Z., 2018. Learning-based adaptive optimal tracking control of strict-feedback nonlinear systems. IEEE Transactions on Neural Networks and Learning Systems. 29(1), 2614-2624. https://ieeexplore.ieee.org/ielaam/5962385/8360119/8100742-aam.pdf
[9] Abu-Khalaf, M., Lewis, F.L., 2004. Nearly optimal state feedback control of constrained nonlinear systems using a neural networks HJB approach. Annual Reviews in Control. 28(2), 239-251. DOI: https://doi.org/10.1016/j.arcontrol.2004.07.002
[10] Ames, A.D., Grizzle, J.W., Tabuada, P., 2014. Control barrier function based quadratic programs with application to adaptive cruise control. 53rd IEEE Conference on Decision and Control. pp. 6271-6278. DOI: https://doi.org/10.1109/CDC.2014.7040372
[11] Ames, A.D., Xu, X., Grizzle, J.W., et al., 2017. Control barrier function based quadratic programs for safety critical systems. IEEE Transactions on Automatic Control. 62(8), 3861-3876. DOI: https://doi.org/10.1109/TAC.2016.2638961
[12] Nguyen, Q., Sreenath, K., 2016. Exponential control barrier functions for enforcing high relative-degree safety-critical constraints. 2016 American Control Conference (ACC). pp. 322-328. DOI: https://doi.org/10.1109/ACC.2016.7524935
[13] Romdlony, M.Z., Jayawardhana, B., 2014. Uniting control Lyapunov and control barrier functions. 53rd IEEE Conference on Decision and Control. pp. 2293-2298. DOI: https://doi.org/10.1109/CDC.2014.7039737
[14] Xu, X., Tabuada, P., Grizzle, J.W., et al., 2015. Robustness of control barrier functions for safety critical control. Analysis and Design of Hybrid Systems (ADHS), IFAC-PapersOnLine. 48(27), 54-61. DOI: https://doi.org/10.1016/j.ifacol.2015.11.152
[15] Prajna, S., Rantzer, A., 2005. On the necessity of barrier certificates. 16th IFAC World Congress, IFAC Proceedings. 38(1), 526-531. DOI: https://doi.org/10.3182/20050703-6-CZ-1902.00743
[16] Ames, A.D., Powell, M., 2013. Towards the unification of locomotion and manipulation through control Lyapunov functions and quadratic programs. In Control of Cyber-Physical Systems. pp. 219-240. http://ames.caltech.edu/unify_ames_powell.pdf
[17] Galloway, K., Sreenath, K., Ames, A.D., et al., 2015. Torque saturation in bipedal robotic walking through control Lyapunov function-based quadratic programs. IEEE Access. 3, 323-332. DOI: https://doi.org/10.1109/ACCESS.2015.2419630
[18] Taylor, A.J., Dorobantu, V.D., Le, H.M., et al., 2019. Episodic learning with control Lyapunov functions for uncertain robotic systems. ArXiv preprint. https://arxiv.org/abs/1903
[19] Taylor, A.J., Singletary, A., Yue, Y., et al., 2019. Learning for safety-critical control with control barrier functions. ArXiv preprint. https://arxiv.org/abs/1912.10099
[20] Westenbroek, T., Fridovich-Keil, D., Mazumdar, E., et al., 2019. Feedback linearization for unknown systems via reinforcement learning. ArXiv preprint. https://arxiv.org/abs/1910.13272
[21] Hwangbo, J., Lee, J., Dosovitskiy, A., et al., 2019. Learning agile and dynamic motor skills for legged robots. Science Robotics. 4(26), 58-72. https://arxiv.org/abs/1901.08652
[22] Levine, S., Finn, C., Darrell, T., et al., 2016. End-to-end training of deep visuomotor policies. Journal of Machine Learning Research. 17(1), ISSN 1532-4435. https://arxiv.org/abs/1504.00702
[23] Bansal, S., Calandra, R., Xiao, T., et al., 2017. Goal-driven dynamics learning via Bayesian optimization. 56th Annual Conference on Decision and Control (CDC). pp. 5168-5173. DOI: https://doi.org/10.1109/CDC.2017.8264425
[24] Fisac, J.F., Akametalu, A.K., Zeilinger, M.N., et al., 2019. A general safety framework for learning-based control in uncertain robotic systems. IEEE Transactions on Automatic Control. 64(7), 2737-2752. DOI: https://doi.org/10.1109/TAC.2018.2876389
[25] Prajna, S., Jadbabaie, A., 2004. Safety verification of hybrid systems using barrier certificates. In International Workshop on Hybrid Systems: Computation and Control. Springer. 2993(1), 477-492. https://viterbi-web.usc.edu/~jdeshmuk/teaching/cs699-fm-forcps/Papers/A5.pdf
[26] Yazdani, N.M., Moghaddam, R.K., Kiumarsi, B., et al., 2020. A Safety-Certified Policy Iteration Algorithm for Control of Constrained Nonlinear Systems. IEEE Control Systems Letters. 4(3), 686-691. DOI: https://doi.org/10.1109/LCSYS.2020.2990632
[27] Lewis, F.L., Vrabie, D., Syrmos, V.L., 2012. Optimal Control, 3rd Edition. John Wiley & Sons.
[28] Wang, L., Ames, A., Egerstedt, M., 2016. Safety barrier certificates for heterogeneous multi-robot systems. 2016 American Control Conference (ACC). pp. 5213-5218. DOI: https://doi.org/10.1109/ACC.2016.7526486
[29] Ames, A.D., Coogan, S., Egerstedt, M., et al., 2019. Control barrier functions: Theory and applications. In Proc. 2019 European Control Conference. https://arxiv.org/abs/1903.11199
[30] Jiang, Y., Jiang, Z., 2015. Global adaptive dynamic programming for continuous-time nonlinear systems. IEEE Transactions on Automatic Control. 60(1), 2917-2929. DOI: https://doi.org/10.1109/TAC.2015.2414811
[31] Gaspar, P., Szaszi, I., Bokor, J., 2003. Active suspension design using linear parameter varying control. International Journal of Vehicle Autonomous Systems. 1(2), 206-221. DOI: https://doi.org/10.1016/S1474-6670(17)30403-2
[32] Silver, D., Lever, G., Heess, N., et al., 2014. Deterministic policy gradient algorithms. International Conference on Machine Learning. pp. 387-395. http://proceedings.mlr.press/v32/silver14.pdf
[33] Papachristodoulou, A., Anderson, J., Valmorbida, G., et al., 2013. SOSTOOLS: Sum of squares optimization toolbox for MATLAB. Control and Dynamical Systems, California Institute of Technology, Pasadena. http://arxiv.org/abs/1310.4716
[34] Xu, J., Xie, L., Wang, Y., 2009. Simultaneous stabilization and robust control of polynomial nonlinear systems using SOS techniques. IEEE Transactions on Automatic Control. 54(8), 1892-1897. DOI: https://doi.org/10.1109/TAC.2009.2022108

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.