Speaker
Keisuke Fujii
Description
See the full Abstract at http://ocs.ciemat.es/EPS2018ABS/pdf/P2.1007.pdf
Automatic Robust Regression Analysis of Fusion Plasma Experiment
Data based on Generative Modelling
K. Fujii¹, C. Suzuki², and M. Hasuo¹
¹ Department of Mechanical Engineering and Science, Graduate School of Engineering, Kyoto
University, Kyoto 615-8540, Japan
² National Institute for Fusion Science, Gifu 509-5292, Japan
The first step toward automatic data analysis for fusion plasma experiments is the automatic
fitting of routinely measured noisy data. A textbook fitting procedure is to minimize the
squared difference between the measured data and a parameterized function such as a
polynomial. This model implicitly assumes that both the noise distribution and the form of the
latent function are already known; however, this is frequently not the case in real-world data
analysis. Using the conventional model in such situations easily results in over- or
under-fitting, and therefore some human supervision has usually been necessary. In this work,
we propose to optimize the model itself to stabilize the analysis.
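The over- and under-fitting problem of conventional least squares can be illustrated with a short sketch. The data, the latent function, the noise level, and the polynomial degrees below are all assumptions chosen for illustration; none of them come from the experiment discussed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a measured profile (hypothetical, not experimental data):
# a smooth latent function plus Gaussian noise.
x = np.linspace(-1.0, 1.0, 30)
y = np.exp(-4.0 * x**2) + 0.05 * rng.standard_normal(x.size)

# Dense grid used to compare each fit with the noise-free latent function.
x_dense = np.linspace(-1.0, 1.0, 400)
latent_dense = np.exp(-4.0 * x_dense**2)


def fit_errors(degree):
    """Least-squares polynomial fit of the given degree.

    Returns (RMS residual on the noisy data, RMS error against the latent function).
    """
    poly = np.polynomial.Polynomial.fit(x, y, degree)
    data_rms = np.sqrt(np.mean((poly(x) - y) ** 2))
    latent_rms = np.sqrt(np.mean((poly(x_dense) - latent_dense) ** 2))
    return data_rms, latent_rms


# A low, a moderate, and a high polynomial degree (arbitrary choices).
errors = {degree: fit_errors(degree) for degree in (1, 6, 15)}
for degree, (data_rms, latent_rms) in errors.items():
    print(f"degree {degree:2d}: data RMS = {data_rms:.4f}, latent RMS = {latent_rms:.4f}")
```

The residual on the data can only decrease as the degree grows, so it cannot by itself indicate over-fitting. The error against the latent function is the quantity of interest, but it is not available in a real experiment, which is why the degree is usually chosen under human supervision.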
Based on Bayesian statistics, the goodness of a model $M$ for a particular ($k$-th) data set
$y^{(k)}$ can be measured by the marginal likelihood,
$$p(y^{(k)} \mid M) = \int p(y^{(k)} \mid \theta, M)\, p(\theta \mid M)\, d\theta \qquad (1)$$
where $p(y^{(k)} \mid \theta, M)$ is the likelihood of the data $y^{(k)}$ given the fitting
parameters $\theta$. The form of the model (the noise distribution and the form of the latent
function) is implicitly included in the likelihood and the prior distribution $p(\theta \mid M)$.
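For a model that is linear in its parameters with a Gaussian prior and Gaussian noise, the integral in Eq. (1) has a closed form, which makes a compact illustration possible. The sketch below assumes such a model (a Legendre polynomial basis, prior and noise widths picked by hand, synthetic quadratic data); the function name `log_marginal_likelihood` and all parameter values are hypothetical and are not the model used in this work.

```python
import numpy as np


def log_marginal_likelihood(x, y, degree, noise_std=0.05, prior_std=1.0):
    """log p(y | M) for a polynomial model with Gaussian prior and Gaussian noise.

    Integrating out theta ~ N(0, prior_std^2 I) gives
    y ~ N(0, noise_std^2 I + prior_std^2 Phi Phi^T).
    """
    phi = np.polynomial.legendre.legvander(x, degree)  # design matrix, shape (n, degree + 1)
    cov = noise_std**2 * np.eye(x.size) + prior_std**2 * (phi @ phi.T)
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (quad + logdet + x.size * np.log(2.0 * np.pi))


rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 30)
y = 0.5 - x**2 + 0.05 * rng.standard_normal(x.size)  # synthetic quadratic data

# Compare a too-simple, a matching, and a too-flexible model.
evidence = {d: log_marginal_likelihood(x, y, d) for d in (0, 2, 10)}
for d, value in evidence.items():
    print(f"degree {d:2d}: log marginal likelihood = {value:.1f}")
```

In this toy setting the marginal likelihood penalizes both the model that cannot describe the data and the model with many unnecessary parameters, without any held-out data.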
The robustness of the model $M$ may be measured by the expectation of the logarithm of this
marginal likelihood, $\mathbb{E}_{p(y)}[\log p(y \mid M)]$, where $p(y)$ is the true distribution
of $y$ that will generate data in the future. We show that maximizing this expectation is
identical to minimizing the Kullback-Leibler divergence between the true data distribution
$p(y)$ and the modeled data distribution $p(y \mid M)$, and therefore unbiased generative
modeling is essential.
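The equivalence stated above follows from a standard identity, sketched here for reference. The symbols $H$ (differential entropy of the true distribution) and $\mathrm{KL}$ are notation introduced for this sketch; the derivation in the full paper may be presented differently.

```latex
\begin{align}
\mathbb{E}_{p(y)}\left[\log p(y \mid M)\right]
  &= \int p(y)\,\log p(y \mid M)\,dy \\
  &= \int p(y)\,\log p(y)\,dy - \int p(y)\,\log\frac{p(y)}{p(y \mid M)}\,dy \\
  &= -H\left[p(y)\right] - \mathrm{KL}\left[p(y)\,\|\,p(y \mid M)\right]
\end{align}
```

Since $H[p(y)]$ does not depend on $M$, maximizing the expected log marginal likelihood over models is the same as minimizing the Kullback-Leibler divergence.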
The strategy we propose here is to construct a flexible generative model, i.e. the latent
function form and the noise distribution, with neural networks and to optimize their weights to
fit the generative model to a large amount of data. We applied this strategy to Thomson
scattering data from the Large Helical Device and found that our model outperforms conventional
analysis methods that do not take the data distribution into account, especially in terms of
robustness.
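The idea of fitting a generative model to a large amount of data and scoring it by its expected log-likelihood can be shown on a much smaller scale. The sketch below is not the neural-network model of this work: as an assumption for illustration, it replaces the network with two one-parameter noise models (Gaussian and Laplace), fits each by maximum likelihood to synthetic heavy-tailed noise, and compares the mean log-likelihood on held-out samples, which is a Monte Carlo estimate of the expected log-likelihood under the true distribution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "true" noise distribution: Laplace (heavy-tailed). This is an assumption
# for illustration, not a statement about the Thomson scattering data.
train = rng.laplace(loc=0.0, scale=1.0, size=50_000)
held_out = rng.laplace(loc=0.0, scale=1.0, size=50_000)


def gaussian_mean_loglik(train, held_out):
    """Fit a zero-mean Gaussian by maximum likelihood; return mean held-out log-likelihood."""
    sigma2 = np.mean(train**2)
    return np.mean(-0.5 * np.log(2.0 * np.pi * sigma2) - 0.5 * held_out**2 / sigma2)


def laplace_mean_loglik(train, held_out):
    """Fit a zero-mean Laplace by maximum likelihood; return mean held-out log-likelihood."""
    b = np.mean(np.abs(train))
    return np.mean(-np.log(2.0 * b) - np.abs(held_out) / b)


score_gaussian = gaussian_mean_loglik(train, held_out)
score_laplace = laplace_mean_loglik(train, held_out)
print(f"Gaussian noise model: {score_gaussian:.4f}")
print(f"Laplace noise model:  {score_laplace:.4f}")
```

The noise model that matches the data distribution obtains the higher held-out score, which is the sense in which a model that ignores the data distribution is less robust. A neural network plays the same role with a far more flexible family of latent functions and noise distributions.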