Research Issues on Bridging
Machine Learning and Simulation

This minisymposium will be held on Apr. 29th at the SDM2022 conference.


Traditionally, simulation and machine learning have pursued opposite research directions: deductive and inductive computation, respectively. Many types of simulations based on domain knowledge in their application fields have been developed and have provided interpretable analysis in data mining, while in the field of machine learning, surrogate models built with deep learning and Gaussian processes allow us to approximate real-world systems as black boxes with high accuracy.
In recent years, new research has been expanding in which each field adopts the methods and models of the other: simulations achieve high accuracy at reduced cost by using surrogate models and data assimilation techniques, and new machine learning research problems are discovered by embedding simulators and first-principles calculations as parts of statistical models.
However, because research issues tend to depend on the simulation application domain, the results remain confined to each field. In this minisymposium, we introduce simulation methods based on machine learning and machine learning methods based on simulations, and explore the possibility of an essential fusion of the fields. In particular, the intersection of simulation optimization and statistical modeling is considered through data assimilation. We also aim to build a community by providing a place to effectively share issues and achievements across the fields.

Target audience

This minisymposium aims to attract researchers and professionals in data mining and machine learning who are interested in simulations, as well as researchers and engineers in simulation who want to use techniques from machine learning and statistics.



opening & welcome

by Keisuke Yamazaki

10 mins



Application of kernel ABC to build a digital twin of

the manufacturing factory.

by Dr. Keiichi Kisamori

20 mins

Dr. Keiichi Kisamori

Principal of BIRD INITIATIVE Inc.

In the so-called VUCA era, simulation-based digital twins of factories, logistics, and supply chains are becoming increasingly important in the manufacturing industry. However, it is hard to optimize simulation parameters so that a data-driven simulation precisely represents the real situation. We address this problem with a data assimilation method that extends kernel mean embedding methods. We show an example of this method implemented in a real industrial setting.


Knowledge-Guided Machine Learning:

A New Framework for Accelerating Scientific Discovery

by Professor Vipin Kumar

25 mins

Professor Vipin Kumar

University of Minnesota

This talk makes a case that in real-world systems governed by physical processes, there is an opportunity to take advantage of fundamental scientific knowledge to inform the search for a physically meaningful and accurate ML model. While this talk will illustrate the potential of the knowledge-guided machine learning (KGML) paradigm in the context of environmental problems (e.g., freshwater science, hydrology, agroecology), the paradigm has the potential to greatly advance the pace of discovery in a diverse set of disciplines where mechanistic models are used, e.g., power engineering, climate science, weather forecasting, and pandemic management.


invited talk 3


20 mins



Towards Integration of Data Assimilation and

Deep Learning Beneficial to Seismology

by Professor Hiromichi Nagao

20 mins

Professor Hiromichi Nagao

Earthquake Research Institute, The University of Tokyo

Data assimilation, which integrates numerical simulations and observational data based on Bayesian statistics, has been widely applied in seismology. Deep learning is expected to make data assimilation much more effective even when seismological numerical simulations are massive. We introduce our work, including the development of a new data assimilation algorithm, the implementation of replica exchange Monte Carlo in data assimilation to simultaneously estimate seismic wave propagation and underground structures, the application of a convolutional neural network to extract seismic phenomena from images, and current trials towards the integration of data assimilation and deep learning within the national seismological projects in Japan.


Statistical Machine Learning for Materials Modeling and Simulation

by Professor Ryo Yoshida

20 mins

Professor Ryo Yoshida

The Institute of Statistical Mathematics, Research Organization of Information and Systems

This talk will describe the potential of integrating machine learning and simulation in materials science. The most significant barrier to implementing data-driven materials research stems from the lack of sufficient amounts of data. In addition, the ultimate goal of materials research is to discover innovative materials that exist in unexplored areas where no data exist. Therefore, interpolative predictions using fully data-driven approaches are generally not sufficient to achieve this goal, and the integration of computer experiments into a machine learning pipeline plays an important role in materials science. This talk will present a case study of materials exploration based on transfer learning and adaptive design of experiments on a high-dimensional design space.


closing remarks

by Keisuke Yamazaki

5 mins



Keisuke Yamazaki


Biography of organizer

Keisuke Yamazaki is the leader of the machine learning research team at the National Institute of Advanced Industrial Science and Technology, and also works at BIRD-INITIATIVE Inc. His research focuses on Bayesian statistics with algebraic geometry and its connection to simulation algorithms. His papers have been published in machine learning journals and international conferences such as Neural Networks, JMLR, Machine Learning, ICML, and AISTATS. He is now proposing an organized session at the annual conference of the Japanese Society of AI to provide a place to share issues in machine learning and simulation.