Machine learning techniques have recently been applied to hedging strategies for options. However, it is also recognized that these machines have some practical drawbacks. One major issue is interpretability: it is not easy to explain how a machine arrived at its final results or what the fitted parameters mean. Another issue is robustness to market changes. A trained machine can be expected to work only within the range of the data on which it was trained. If market conditions change significantly after training, the machine must be retrained on the new market data; otherwise, its outputs can be very hard to control or even predict. To overcome these challenges, we propose a new method that applies Gaussian process regression (GPR) to the policy function in reinforcement learning. In our method, the parameters of the policy function are directly interpreted as the optimal hedge trading amounts at each grid point. This high interpretability enables us to combine the hedging strategy provided by the machine with a model-based delta-hedging strategy. In our numerical experiments, the combined method shows a high level of robustness against market changes.
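As a rough illustration only, and not the authors' implementation, the idea of a GPR-parameterized policy can be sketched as follows. The sketch assumes a one-dimensional state (the spot price), a squared-exponential kernel, and a Black-Scholes call delta used as the GP prior mean; this last choice is just one possible way to combine the two strategies and is our assumption. The names `GRID`, `GRID_HEDGE`, `bs_call_delta`, and `combined_policy`, as well as all numerical values, are hypothetical placeholders rather than results from the experiments.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

# Hypothetical grid of spot prices and hedge amounts attached to each grid point.
# In the method described above these amounts would be the learned policy parameters;
# the numbers here are placeholders for illustration only.
GRID = np.array([80.0, 90.0, 100.0, 110.0, 120.0])
GRID_HEDGE = np.array([0.10, 0.30, 0.55, 0.80, 0.95])

def bs_call_delta(spot, strike=100.0, vol=0.2, maturity=1.0):
    """Black-Scholes call delta with zero interest rate (assumed model delta)."""
    spot = np.asarray(spot, dtype=float)
    d1 = (np.log(spot / strike) + 0.5 * vol**2 * maturity) / (vol * math.sqrt(maturity))
    return 0.5 * (1.0 + _erf(d1 / math.sqrt(2.0)))

def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel between two 1-D arrays of spot prices."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def combined_policy(spot, grid=GRID, grid_hedge=GRID_HEDGE,
                    length_scale=10.0, jitter=1e-8):
    """GP posterior mean of the hedge amount, with the model delta as prior mean.

    Near the grid the output follows the grid-point hedge amounts; far from
    the grid the kernel weights vanish and the output reverts to the delta.
    """
    spot = np.atleast_1d(np.asarray(spot, dtype=float))
    residual = grid_hedge - bs_call_delta(grid)            # machine minus model at grid points
    k_gg = rbf_kernel(grid, grid, length_scale) + jitter * np.eye(len(grid))
    k_sg = rbf_kernel(spot, grid, length_scale)
    return bs_call_delta(spot) + k_sg @ np.linalg.solve(k_gg, residual)

if __name__ == "__main__":
    print(combined_policy(GRID))       # approximately reproduces GRID_HEDGE
    print(combined_policy([300.0]))    # far outside the grid: close to the model delta
```

Because each parameter is a hedge amount at a known grid point, it can be inspected or overridden directly, and under the prior-mean assumption used here the policy falls back to the model delta outside the region covered by the grid.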
Yoshihiro Tawada is a Deputy Chief Manager of Quants Research & Advanced Solutions Development Dept. at Mitsubishi UFJ Morgan Stanley Securities Co., Ltd. He is a CFA Charterholder. He received a master's degree in engineering from the University of Tokyo and a bachelor's degree in science from Hokkaido University. Toru Sugimura is the Chief Manager of Quants Research & Advanced Solutions Development Dept. at Mitsubishi UFJ Morgan Stanley Securities Co., Ltd. He received a Ph.D. in finance from Hitotsubashi University and a bachelor's degree in science from Hokkaido University. The views expressed in the webinar and slides are the authors' own and do not reflect the views of the institutions to which the authors belong.