and Unmeasured Confounders in a Bayesian
Framework
Yanxun Xu
Johns Hopkins University
Abstract: Effective decision-making demands a comprehensive understanding of causal relationships, especially when the environment contains unmeasured confounders. Traditional causal inference methods often rely on auxiliary data sources, such as instrumental variables or proxies, to identify true causal effects. Unfortunately, such data can be difficult or impractical to acquire in observational studies, which can lead to inaccurate or incomplete inference. To address this limitation, we propose a novel approach that integrates Bayesian joint modeling with causal inference for effective decision-making in the presence of unmeasured confounding. Through careful model design and assumptions, the proposed framework can identify true causal effects without relying on additional data sources, leading to more informed and effective decisions in complex real-world observational settings.
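The bias that motivates this work can be seen in a small simulation. The sketch below is not the proposed Bayesian joint model. It assumes a hypothetical linear model in which an unmeasured confounder U drives both the treatment T and the outcome Y, with a true causal effect of T on Y equal to 2. Regressing Y on T alone absorbs part of U's effect, so the estimate drifts toward 2 + 3·Cov(T, U)/Var(T) = 3.5, while an "oracle" regression that adjusts for U recovers roughly 2. That oracle adjustment is exactly what is unavailable when U is unmeasured.

```python
import numpy as np

def simulate_confounded(n=100_000, seed=0):
    """Hypothetical linear model: U -> T, U -> Y, and true effect of T on Y = 2."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                       # unmeasured confounder
    t = u + rng.normal(size=n)                   # treatment depends on U
    y = 2.0 * t + 3.0 * u + rng.normal(size=n)   # outcome depends on T and U
    return t, u, y

def ols(X, y):
    # Least-squares coefficients with an intercept column prepended
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

t, u, y = simulate_confounded()
naive = ols(t, y)[1]                          # ignores U: biased toward 3.5
oracle = ols(np.column_stack([t, u]), y)[1]   # adjusts for U (infeasible in practice)
print(f"naive: {naive:.2f}, oracle-adjusted: {oracle:.2f}")
```

The gap between the two estimates is the confounding bias that a method must remove without direct access to U.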
Invited Session IS011: Data Science and Engineering
Active Machine Learning for Surrogate Modeling
in Complex Engineering Systems
Xiaowei Yue
Tsinghua University
Abstract: Active learning is a subfield of statistics and machine learning that aims to improve data-collection efficiency for expensive-to-evaluate engineering systems. Surrogate models are indispensable in the analysis of complex engineering systems. The quality of a surrogate model is determined by the quality of the data and the choice of model class, but achieving a high standard for both is challenging in complex engineering systems. Heterogeneity, implicit constraints, and extreme events are typical factors that complicate such systems, yet they have often been underestimated or disregarded in machine learning. This presentation tackles these challenges in surrogate modeling of complex engineering systems through the following machine learning methodologies.
(i) Partitioned active learning partitions the design space according to heterogeneity in response features and uses localized models to measure the informativeness of unlabeled data.
(ii) For systems with implicit constraints, failure-averse active learning incorporates constraint outputs to estimate the safe region and avoid undesirable failures while learning the target function.
(iii) Multi-output extreme spatial learning enables modeling and simulation of extreme events in composite fuselage assembly.
The proposed methods were applied to real-world case studies and outperformed benchmark methods. The code for these algorithms is open-sourced on our GitHub.
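A minimal sketch of the partitioned idea in (i), using NumPy only. It is not the authors' open-sourced code, and several choices are assumptions made for illustration: the design space [0, 1] is split at a fixed boundary rather than partitioned from response features; each region gets its own zero-mean Gaussian process with a squared-exponential kernel whose length scale is picked from a small grid by marginal likelihood; and informativeness is measured as predictive variance. The test function `f` and all function names are hypothetical.

```python
import numpy as np

def rbf(a, b, ls):
    # Squared-exponential kernel (unit prior variance) between 1-D arrays
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def local_gp_variance(x, y, xq, ls_grid=(0.02, 0.05, 0.1, 0.2, 0.5), nugget=1e-4):
    """Fit a zero-mean GP to standardized data from one region, choose the
    length scale by log marginal likelihood over ls_grid, and return the
    predictive variance at xq rescaled to the original units of y."""
    s = np.std(y) + 1e-12
    yc = (y - np.mean(y)) / s
    best = None
    for ls in ls_grid:
        L = np.linalg.cholesky(rbf(x, x, ls) + nugget * np.eye(len(x)))
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, yc))
        lml = -0.5 * yc @ alpha - np.log(np.diag(L)).sum()  # up to a constant
        if best is None or lml > best[0]:
            best = (lml, ls, L)
    _, ls, L = best
    v = np.linalg.solve(L, rbf(x, xq, ls))      # shape (n_train, n_query)
    var = np.maximum(1.0 - np.sum(v ** 2, axis=0), 0.0)
    return var * s ** 2

def f(x):
    # Hypothetical heterogeneous response: smooth left half, rapidly varying right half
    return np.where(x < 0.5, np.sin(2 * np.pi * x), np.sin(20 * np.pi * x))

def partitioned_active_learning(n_iter=20, boundary=0.5, seed=0):
    rng = np.random.default_rng(seed)
    pool = np.linspace(0.0, 1.0, 401)            # unlabeled candidate points
    x = np.concatenate([rng.uniform(0.0, boundary, 3), rng.uniform(boundary, 1.0, 3)])
    y = f(x)
    for _ in range(n_iter):
        best_var, best_x = -1.0, None
        for r in (False, True):                  # one local model per region
            train = (x >= boundary) == r
            cand = pool[(pool >= boundary) == r]
            var = local_gp_variance(x[train], y[train], cand)
            i = int(np.argmax(var))
            if var[i] > best_var:
                best_var, best_x = var[i], cand[i]
        x = np.append(x, best_x)                 # label the most informative candidate
        y = np.append(y, f(best_x))
    return x, y

x, y = partitioned_active_learning()
print("smooth region:", int(np.sum(x < 0.5)), "wiggly region:", int(np.sum(x >= 0.5)))
```

Because each region selects its own length scale, a region with a rapidly varying response can keep a short length scale and therefore higher predictive variance, which tends to draw more queries there; a single global model would have to compromise on one length scale for the whole space.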
Nonparametric Statistical Inference via Metric
Distribution Function in Metric Spaces
Wenliang Pan
Chinese Academy of Sciences
Abstract: The distribution function is essential in statistical inference. Through the correspondence theorem in measure theory and the Glivenko-Cantelli and Donsker properties, it is connected with samples to form a directed closed loop, and this connection creates a paradigm for statistical inference. However, existing distribution functions are defined in Euclidean spaces and are no longer convenient to use for the rapidly evolving, complex data objects now encountered. It is therefore imperative to develop the concept of a distribution function in a more general space to meet these emerging needs. In a Euclidean space, linearity allows us to use hypercubes to define the distribution function; a metric space lacks this linearity, so we must work with the metric itself to investigate the probability measure. We introduce a class of metric distribution functions defined through the metric only. We overcome this challenging step by proving the correspondence theorem and the Glivenko-Cantelli theorem for metric distribution functions in metric spaces, laying the foundation for rational statistical inference on metric space-valued data. Then, we develop a homogeneity test and a mutual independence test for non-Euclidean random objects and present comprehensive empirical