Derivative-free Reinforcement Learning: A Review
Abstract
Reinforcement learning is about learning agent models that make the best sequential decisions in unknown environments. In an unknown environment, the agent needs to explore the environment while exploiting the collected information, which usually forms a sophisticated problem to solve. Derivative-free optimization, meanwhile, is capable of solving sophisticated problems. It commonly uses a sampling-and-updating framework to iteratively improve the solution, where exploration and exploitation also need to be well balanced. Therefore, derivative-free optimization deals with a similar core issue as reinforcement learning, and has been introduced into reinforcement learning approaches under the names of learning classifier systems and neuroevolution/evolutionary reinforcement learning. Although such methods have been developed for decades, derivative-free reinforcement learning has recently been attracting increasing attention. However, a recent survey on this topic is still lacking. In this article, we summarize methods of derivative-free reinforcement learning to date, and organize them along aspects including parameter updating, model selection, exploration, and parallel/distributed methods. Moreover, we discuss some current limitations and possible future directions, hoping that this article can bring more attention to the topic and serve as a catalyst for developing novel and efficient approaches.
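To make the sampling-and-updating framework mentioned in the abstract concrete, here is a minimal sketch (not an algorithm from this review) of an evolution-strategy-style direct policy search. The names `env_return` and `es_policy_search` are illustrative assumptions: `env_return` stands in for evaluating a policy's episode return and is replaced by a toy quadratic so the script runs as-is. The loop shows where exploration (sampling perturbations) and exploitation (updating toward better-scoring samples) occur.

```python
# Minimal sketch of a sampling-and-updating loop for derivative-free policy search.
# Assumptions: `env_return` is a hypothetical black box mapping policy parameters
# to an episode return; a stand-in quadratic is used here so the example is runnable.
import numpy as np

def env_return(theta: np.ndarray) -> float:
    # Stand-in for running one episode with a policy parameterized by theta.
    return float(-np.sum((theta - 1.0) ** 2))

def es_policy_search(dim=5, pop_size=20, sigma=0.1, lr=0.05, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)  # current policy parameters
    for _ in range(iters):
        # Sampling: perturb the current solution (exploration).
        eps = rng.standard_normal((pop_size, dim))
        returns = np.array([env_return(theta + sigma * e) for e in eps])
        # Updating: move toward perturbations with higher return (exploitation).
        advantages = (returns - returns.mean()) / (returns.std() + 1e-8)
        theta = theta + lr / (pop_size * sigma) * eps.T @ advantages
    return theta

if __name__ == "__main__":
    best = es_policy_search()
    print("final return:", env_return(best))
```

This only illustrates the general sample-then-update pattern; the specific parameter-update rules, model-selection schemes, and exploration strategies surveyed in the article differ from this sketch.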
Acknowledgements
This work was supported by the Program A for Outstanding PhD Candidate of Nanjing University, National Science Foundation of China (61876077), Jiangsu Science Foundation (BK20170013), and Collaborative Innovation Center of Novel Software Technology and Industrialization. Yang Yu is the corresponding author of this article. The authors would like to thank Xiong-Hui Chen and Zhao-Hua Li for improving the article.
About this article
Cite this article
Qian, H., Yu, Y. Derivative-free reinforcement learning: a review. Front. Comput. Sci. 15, 156336 (2021). https://doi.org/10.1007/s11704-020-0241-4
DOI: https://doi.org/10.1007/s11704-020-0241-4
Keywords
- reinforcement learning
- derivative-free optimization
- neuroevolution reinforcement learning
- neural architecture search
Source: https://link.springer.com/article/10.1007/s11704-020-0241-4