Derivative-free Reinforcement Learning: A Review

Abstract

Reinforcement learning is about learning agent models that make the best sequential decisions in unknown environments. In an unknown environment, the agent needs to explore the environment while exploiting the collected information, which usually forms a sophisticated problem to solve. Derivative-free optimization, meanwhile, is capable of solving sophisticated problems. It commonly uses a sampling-and-updating framework to iteratively improve the solution, where exploration and exploitation likewise need to be well balanced. Therefore, derivative-free optimization deals with a similar core issue to reinforcement learning, and has been introduced into reinforcement learning approaches under the names of learning classifier systems and neuroevolution/evolutionary reinforcement learning. Although such methods have been developed for decades, derivative-free reinforcement learning has recently been attracting increasing attention. However, a recent survey on this topic is still lacking. In this article, we summarize derivative-free reinforcement learning methods to date, and organize them along aspects including parameter updating, model selection, exploration, and parallel/distributed methods. Moreover, we discuss some current limitations and possible future directions, hoping that this article can bring more attention to the topic and serve as a catalyst for developing novel and efficient approaches.
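
The sampling-and-updating framework mentioned in the abstract can be made concrete with a short sketch. Below is a minimal, hypothetical evolution-strategy-style policy search loop in Python: candidate policy parameters are sampled around the current solution (exploration), and the solution is then shifted toward the better-performing samples (exploitation). The `rollout` function, its toy objective, and all hyperparameter values are illustrative assumptions for this sketch, not an implementation from the methods surveyed in the article.

```python
# Minimal sketch of a sampling-and-updating loop for derivative-free
# policy search (an evolution-strategy-style update on a toy objective).
import numpy as np

def rollout(params: np.ndarray) -> float:
    """Placeholder for running one episode with a policy parameterized by
    `params` and returning its total reward. Here a toy quadratic stands in
    for the (non-differentiated) episodic return."""
    return -float(np.sum((params - 1.0) ** 2))

def es_policy_search(dim=10, pop_size=20, sigma=0.1, lr=0.02, iterations=200):
    theta = np.zeros(dim)  # current policy parameters
    for _ in range(iterations):
        # Sampling step: perturb the current solution (exploration).
        noise = np.random.randn(pop_size, dim)
        returns = np.array([rollout(theta + sigma * eps) for eps in noise])
        # Updating step: move toward perturbations with higher return
        # (exploitation of the collected information).
        advantages = (returns - returns.mean()) / (returns.std() + 1e-8)
        theta = theta + lr / (pop_size * sigma) * noise.T @ advantages
    return theta

if __name__ == "__main__":
    best = es_policy_search()
    print("final return:", rollout(best))
```

The same loop structure underlies many of the approaches organized in the article; what varies is how samples are drawn (e.g., a search distribution versus a population), how the update is computed, and how exploration is injected.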

Acknowledgements

This work was supported by the Program A for Outstanding PhD Candidate of Nanjing University, the National Science Foundation of China (61876077), the Jiangsu Science Foundation (BK20170013), and the Collaborative Innovation Center of Novel Software Technology and Industrialization. Yang Yu is the corresponding author of this article. The authors would like to thank Xiong-Hui Chen and Zhao-Hua Li for improving the article.

Author information

Corresponding author

Correspondence to Yang Yu.

About this article

Cite this article

Qian, H., Yu, Y. Derivative-free reinforcement learning: a review. Front. Comput. Sci. 15, 156336 (2021). https://doi.org/10.1007/s11704-020-0241-4

  • DOI: https://doi.org/10.1007/s11704-020-0241-4

Keywords

  • reinforcement learning
  • derivative-free optimization
  • neuroevolution reinforcement learning
  • neural architecture search
