Derivative-free Reinforcement Learning: A Review

Abstract

Reinforcement learning is about learning agent models that make the best sequential decisions in unknown environments. In an unknown environment, the agent needs to explore the environment while exploiting the collected information, which usually forms a sophisticated problem to solve. Derivative-free optimization, meanwhile, is capable of solving sophisticated problems. It commonly uses a sampling-and-updating framework to iteratively improve the solution, where exploration and exploitation likewise need to be well balanced. Therefore, derivative-free optimization deals with a similar core issue to reinforcement learning, and has been introduced into reinforcement learning approaches under the names of learning classifier systems and neuroevolution/evolutionary reinforcement learning. Although such methods have been developed for decades, derivative-free reinforcement learning has recently been attracting increasing attention. However, a recent survey on this topic is still lacking. In this article, we summarize derivative-free reinforcement learning methods to date, and organize them along aspects including parameter updating, model selection, exploration, and parallel/distributed methods. Moreover, we discuss some current limitations and possible future directions, hoping that this article can bring more attention to the topic and serve as a catalyst for developing novel and efficient approaches.
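
The sampling-and-updating framework mentioned in the abstract can be made concrete with a short sketch. Below is a minimal, hypothetical evolution-strategy-style policy search loop in Python: candidate policy parameters are sampled around the current solution (exploration), and the solution is then shifted toward the better-performing samples (exploitation). The `rollout` function, its toy objective, and all hyperparameter values are illustrative assumptions for this sketch, not an implementation from the methods surveyed in the article.

```python
# Minimal sketch of a sampling-and-updating loop for derivative-free
# policy search (an evolution-strategy-style update on a toy objective).
import numpy as np

def rollout(params: np.ndarray) -> float:
    """Placeholder for running one episode with a policy parameterized by
    `params` and returning its total reward. Here a toy quadratic stands in
    for the (non-differentiated) episodic return."""
    return -float(np.sum((params - 1.0) ** 2))

def es_policy_search(dim=10, pop_size=20, sigma=0.1, lr=0.02, iterations=200):
    theta = np.zeros(dim)  # current policy parameters
    for _ in range(iterations):
        # Sampling step: perturb the current solution (exploration).
        noise = np.random.randn(pop_size, dim)
        returns = np.array([rollout(theta + sigma * eps) for eps in noise])
        # Updating step: move toward perturbations with higher return
        # (exploitation of the collected information).
        advantages = (returns - returns.mean()) / (returns.std() + 1e-8)
        theta = theta + lr / (pop_size * sigma) * noise.T @ advantages
    return theta

if __name__ == "__main__":
    best = es_policy_search()
    print("final return:", rollout(best))
```

The same loop structure underlies many of the approaches organized in the article; what varies is how samples are drawn (e.g., a search distribution versus a population), how the update is computed, and how exploration is injected.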

Acknowledgements

This work was supported by the Program A for Outstanding PhD Candidate of Nanjing University, the National Science Foundation of China (61876077), the Jiangsu Science Foundation (BK20170013), and the Collaborative Innovation Center of Novel Software Technology and Industrialization. Yang Yu is the corresponding author of this article. The authors would like to thank Xiong-Hui Chen and Zhao-Hua Li for improving the article.

Author information

Corresponding author

Correspondence to Yang Yu.

About this article

Cite this article

Qian, H., Yu, Y. Derivative-free reinforcement learning: a review. Front. Comput. Sci. 15, 156336 (2021). https://doi.org/10.1007/s11704-020-0241-4

  • DOI: https://doi.org/10.1007/s11704-020-0241-4

Keywords

  • reinforcement learning
  • derivative-free optimization
  • neuroevolution reinforcement learning
  • neural architecture search
