Dates Time Speakers/Topic Location
October 25, 2019 10:00 AM Ying Hung: Integration of Models and Data for Inference about Humans and Machines (I); Konstantin Mischaikow: Analyzing Imprecise Dynamics; Jingjin Yu: Toward Scaleable and Optimal Autonomy; COR-433
November 1, 2019 10:00 AM Rong Chen: Dynamic Systems and Sequential Monte-Carlo; Jason Klusowski: Integration of Models and Data for Inference about Humans and Machines (II); Cun-Hui Zhang: Statistical Inference with High-Dimensional Data; COR-433
November 8, 2019 10:00 AM Kostas Bekris: Generating Motion for Adaptive Robots; Fred Roberts: Meaningless Statements in Performance Measurement for Intelligent Machines; COR-433
November 15, 2019 10:00 AM Fioralba Cakoni: Inside-Out, Seen and Unseen; Matthew Stone: Colors in Context Inference challenges in Bayesian cognitive science; Wujun Zhang: Numerical approximation of optimal transport problem; COR-433
February 7, 2020 10:00 AM Patrick Shafto (Mathematics and Computer Science; Rutgers University)

Title: Cooperation in Humans and Machines

Abstract: Cooperation, specifically cooperative information sharing, is a basic principle of human intelligence. Machine learning, in contrast, focuses on learning from randomly sampled data, which neither leverages others’ cooperation nor prioritizes the ability to communicate what has been learned. I will discuss ways in which our understanding of human learning may be leveraged to develop new machine learning, and form a foundation for improved integration of machine learning into human society.

May 8, 2020 10:00 AM Rene Vidal (Biomedical Engineering; Johns Hopkins University)

Title: From Optimization Algorithms to Dynamical Systems and Back

Abstract: Recent work has shown that tools from dynamical systems can be used to analyze accelerated optimization algorithms. For example, it has been shown that the continuous limit of Nesterov’s accelerated gradient (NAG) gives an ODE whose convergence rate matches that of NAG for convex, unconstrained, and smooth problems. Conversely, it has been shown that NAG can be obtained as the discretization of an ODE; however, since different discretizations lead to different algorithms, the choice of discretization becomes important. The first part of this talk will extend this type of analysis to convex, constrained, and non-smooth problems by using Lyapunov stability theory to analyze continuous limits of the Alternating Direction Method of Multipliers (ADMM). The second part of this talk will show that many existing and new optimization algorithms can be obtained by suitably discretizing a dissipative Hamiltonian. As an example, we will present a new method called Relativistic Gradient Descent (RGD), which empirically outperforms momentum, RMSprop, Adam, and AdaGrad on several non-convex problems. This is joint work with Guilherme Franca, Daniel Robinson, and Jeremias Sulam.
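Illustrative sketch (not taken from the talk): the NAG iteration mentioned in the abstract can be written in a few lines. The quadratic objective, the step size, and the names `objective`, `grad`, and `nag` below are assumptions chosen only for illustration. The momentum weight (k - 1)/(k + 2) is the standard choice whose continuous-time limit is usually written as the ODE $\ddot{X} + (3/t)\dot{X} + \nabla f(X) = 0$.

```python
def objective(x):
    # Hypothetical smooth convex test problem: f(x) = 0.5 * (x0^2 + 10 * x1^2).
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad(x):
    # Gradient of the test problem above.
    return [x[0], 10.0 * x[1]]


def nag(x0, step, iters):
    """Nesterov's accelerated gradient with momentum weight (k - 1) / (k + 2)."""
    x_prev = list(x0)
    x = list(x0)
    for k in range(1, iters + 1):
        beta = (k - 1) / (k + 2)
        # Extrapolate along the previous displacement (the momentum step).
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        # Take a plain gradient step from the extrapolated point.
        g = grad(y)
        x_prev, x = x, [yi - step * gi for yi, gi in zip(y, g)]
    return x


# step = 1 / L, where L = 10 is the largest curvature of the test problem.
x_final = nag([1.0, 1.0], step=0.1, iters=200)
print(objective(x_final))
```

Changing how the extrapolation and gradient steps are interleaved corresponds to a different discretization and, in general, a different algorithm, which is the point the abstract makes about the choice of discretization.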

June 5, 2020 12:00 PM Lydia Chilton (Computer Science; Columbia University)

Title: AI Tools for Creative Work

June 16, 2020 12:00 PM Mykhaylo Tyomkyn (Applied Mathematics; Charles University)

Title: Many Disjoint Triangles in Co-triangle-free Graphs

June 22, 2020 12:00 PM Lenka Zdeborova (Institute of Theoretical Physics; French National Centre for Scientific Research)

Title: Understanding Machine Learning with Statistical Physics

June 30, 2020 12:00 PM Rebecca Wright (Computational Science Center; Barnard College)

Title: Privacy in Today’s World

July 7, 2020 12:00 PM Vivek Singh (Behavioral Informatics Lab; Rutgers University)

Title: Algorithmic Fairness

Abstract: Today, Artificial Intelligence (AI) algorithms are used to make multiple decisions affecting human lives, and many such algorithms have been reported to be biased. This includes parole decisions, search results, and product recommendation, among others. Using multiple examples of recent efforts from my lab, I will discuss how such bias can be systematically measured and how the underlying algorithms can be made less biased.

July 17, 2020 10:00 AM Cynthia Rudin (Prediction Analysis Lab; Duke University)

Title: Interpretability vs. Explainability in Machine Learning

Abstract: With widespread use of machine learning, there have been serious societal consequences from using black box models for high-stakes decisions, including flawed bail and parole decisions in criminal justice. Explanations for black box models are not reliable, and can be misleading. If we use interpretable machine learning models, they come with their own explanations, which are faithful to what the model actually computes.

In this talk, I will discuss some of the reasons that black boxes with explanations can go wrong, whereas using inherently interpretable models would not have these same problems. I will give an example of where an explanation of a black box model went wrong, namely, I will discuss ProPublica’s analysis of the COMPAS model used in the criminal justice system: ProPublica’s explanation of the black box model COMPAS was flawed because it relied on wrong assumptions to identify the race variable as being important. Luckily in recidivism prediction applications, black box models are not needed because inherently interpretable models exist that are just as accurate as COMPAS.

I will also give examples of interpretable models in healthcare. One of these models, the 2HELPS2B score, is actually used in intensive care units in hospitals; most machine learning models cannot be used when the stakes are so high.

Finally, I will discuss two long-term projects my lab is working on, namely optimal sparse decision trees and interpretable neural networks.

July 21, 2020 12:00 PM Peter Winkler (Math and Computer Science; Dartmouth)

Title: Cooperative Puzzles

September 11, 2020 10:00 AM Mauro Maggioni (Data Intensive Computation; Johns Hopkins)

Title: Learning Interaction laws in particle- and agent-based systems

Abstract: Interacting agent-based systems are ubiquitous in science, from modeling of particles in Physics to prey-predator and colony models in Biology, to opinion dynamics in economics and social sciences. Oftentimes the laws of interactions between the agents are quite simple, for example they depend only on pairwise interactions, and only on pairwise distance in each interaction. We consider the following inference problem for a system of interacting particles or agents: given only observed trajectories of the agents in the system, can we learn what the laws of interactions are? We would like to do this without assuming any particular form for the interaction laws, i.e. they might be “any” function of pairwise distances. We consider this problem both in the mean-field limit (i.e. the number of particles going to infinity) and in the case of a finite number of agents, with an increasing number of observations, although in this talk we will mostly focus on the latter case. We cast this as an inverse problem, and study it in the case where the interaction is governed by an (unknown) function of pairwise distances. We discuss when this problem is well-posed, and we construct estimators for the interaction kernels with provably good statistical and computational properties. We measure their performance on various examples that include extensions to agent systems with different types of agents, second-order systems, and families of systems with parametric interaction kernels. We also conduct numerical experiments to test the large-time behavior of these systems, especially in the cases where they exhibit emergent behavior. This is joint work with F. Lu, J. Miller, S. Tang and M. Zhong.

October 23, 2020 10:00 AM Jason Hartline (Computer Science; Northwestern University)

Title: Mechanism Design and Data Science

Abstract: Computer systems have become the primary mediator of social and economic interactions. A defining aspect of such systems is that the participants have preferences over system outcomes and will manipulate their behavior to obtain outcomes they prefer. Such manipulation interferes with data-driven methods for designing and testing system improvements. A standard approach to resolve this interference is to infer preferences from behavioral data and employ the inferred preferences to evaluate novel system designs.

In this talk Prof. Hartline will describe a method for estimating and comparing the performance of novel systems directly from behavioral data from the original system. This approach skips the step of estimating preferences and is more accurate. Estimation accuracy can be further improved by augmenting the original system; its accuracy then compares favorably with ideal controlled experiments, a.k.a., A/B testing, which are often infeasible. A motivating example will be the paradigmatic problem of designing an auction for the sale of advertisements on an Internet search engine.

October 27, 2020 10:00 AM Woojin Jung (School of Social Science; Rutgers University)

Title: Using satellite imagery and deep learning to target aid in data-sparse contexts

Abstract: Aid policy has the potential to alleviate global poverty by targeting areas of concentrated need. A critical question remains, however, over whether aid is reaching the areas of most need. Often little ground-truth poverty data is available at a granular level (e.g., village) where aid interventions take place. This research explores remote sensing techniques to measure poverty and target aid in data-sparse contexts. Our study of Myanmar examines i) the performance of different methods of poverty estimation and ii) the extent to which poverty and other development characteristics explain community aid distribution. This study draws from the following sources of data: georeferenced community-driven development projects (n=12,504), daytime and nighttime satellite imagery, the Demographic and Health Survey (DHS), and conflict data. We first compare the accuracy of four poverty measures in predicting ground-truth survey data. Using the best poverty estimate from the first step, we investigate the association between village characteristics and aid per capita per village. Our results show that daytime features perform the best in predicting poverty as compared to the analysis of RGB color distribution, Kriging, and nighttime-based measures. We use a Convolutional Neural Network, pre-trained on ImageNet, to extract features from the satellite images in our best model. These features are then trained on the DHS wealth data to predict the DHS wealth index/poverty for villages receiving aid. The linear and non-linear estimators indicate that development assistance flows to low-asset villages, but only marginally. Aid is more likely to be disbursed to those villages that are less populous and farther away from fatal conflicts. Our study concludes that the nuances captured in satellite-based models can be used to target aid to impoverished communities.

November 13, 2020 10:00 AM Vivek Singh (Behavioral Informatics Lab; Rutgers University)

Title: Auditing and Controlling Algorithmic Bias

Abstract: Today, Artificial Intelligence algorithms are used to make multiple decisions affecting human lives, and many such algorithms, such as those used in parole decisions, have been reported to be biased. In this talk, I will share some recent work from our lab on auditing algorithms for bias, designing ways to reduce bias, and expanding the definition of bias. This includes applications such as image search, health information dissemination, and cyberbullying detection. The results will cover a range of data modalities (e.g., visual, textual, and social) as well as techniques such as fair adversarial networks, flexible fair regression, and fairness-aware fusion.

December 4, 2020 10:00 AM Magnus Egerstedt (Electrical and Computer Engineering; Georgia Institute of Technology)

Title: Long Duration Autonomy With Applications to Persistent Environmental Monitoring

Abstract: When robots are to be deployed over long time scales, optimality should take a backseat to “survivability”, i.e., it is more important that the robots do not break or completely deplete their energy sources than that they perform certain tasks as effectively as possible. For example, in the context of multi-agent robotics, we have a fairly good understanding of how to design coordinated control strategies for making teams of mobile robots achieve geometric objectives, such as assembling shapes or covering areas. But, what happens when these geometric objectives no longer matter all that much? In this talk, we consider this question of long duration autonomy for teams of robots that are deployed in an environment over a sustained period of time and that can be recruited to perform a number of different tasks in a distributed, safe, and provably correct manner. This development will involve the composition of multiple barrier certificates for encoding tasks and safety constraints through the development of non-smooth barrier functions, as well as a detour into ecology as a way of understanding how persistent environmental monitoring can be achieved by studying animals with low-energy life-styles, such as the three-toed sloth.

Bio: Magnus Egerstedt is a Professor and School Chair in the School of Electrical and Computer Engineering at the Georgia Institute of Technology, where he also holds secondary faculty appointments in Mechanical Engineering, Aerospace Engineering, and Interactive Computing. Prior to becoming School Chair, he served as the director for Georgia Tech’s multidisciplinary Institute for Robotics and Intelligent Machines. A native of Sweden, Dr. Egerstedt was born, raised, and educated in Stockholm. He received a B.A. degree in Philosophy from Stockholm University, and M.S. and Ph.D. degrees in Engineering Physics and Applied Mathematics, respectively, from the Royal Institute of Technology. He subsequently was a Postdoctoral Scholar at Harvard University. Dr. Egerstedt conducts research in the areas of control theory and robotics, with particular focus on control and coordination of complex networks, such as multi-robot systems, mobile sensor networks, and cyber-physical systems. He is a Fellow of both the IEEE and IFAC, and is a foreign member of the Royal Swedish Academy of Engineering Sciences. He has received a number of teaching and research awards for his work, including the John R. Ragazzini Award from the American Automatic Control Council, the O. Hugo Schuck Best Paper Award from the American Control Conference, and the Best Multi-Robot Paper Award from the IEEE International Conference on Robotics and Automation.

December 18, 2020 10:00 AM Tanya Berger-Wolf (Computer Science and Engineering; Ohio State University)

Title: Artificial Intelligence for Wildlife Conservation: AI and Humans Combating Extinction Together

Abstract: Photographs, taken by field scientists, tourists, automated cameras, and incidental photographers, are the most abundant source of data on wildlife today. I will show how fundamental data science and machine learning methods can be used to turn massive collections of images into a high-resolution information database, enabling scientific inquiry, conservation, and policy decisions. I will demonstrate how computational data science methods are used to collect images from online social media, detect various species of animals, and even identify individuals. I will present data science methods to infer and counter biases in the ad-hoc data to provide accurate estimates of population sizes from those image data. I will also point out the risks that AI poses to endangered species data.

I will show how it all comes together in a deployed system, Wildbook, a project of the tech-for-conservation non-profit Wild Me, with species including whales, sharks, giraffes, and many more. In January 2016, Wildbook enabled the first-ever full species census (of the endangered Grevy’s zebra) using photographs taken by ordinary citizens in Kenya. The resulting numbers are now the official species census used by the IUCN Red List, and we repeated the effort in 2018, becoming the first certified census from an outside organization accepted by the Kenyan government. The 2020 event has just concluded on January 25-26. Wildbook is becoming the data foundation for wildlife science, conservation, and policy.

Bio: Dr. Tanya Berger-Wolf is a Professor of Computer Science and Engineering, Electrical and Computer Engineering, and Evolution, Ecology, and Organismal Biology at the Ohio State University, where she is also the Director of the Translational Data Analytics Institute. As a computational ecologist, her research is at the unique intersection of computer science, wildlife biology, and social sciences. She creates computational solutions to address questions such as how environmental factors affect the behavior of social animals (humans included). Berger-Wolf is also a director and co-founder of the conservation software non-profit Wild Me, home of the Wildbook project, which enabled the first-ever full census of an entire species, the endangered Grevy’s zebra in Kenya, using photographs from ordinary citizens. Wildbook has been featured in media, including The New York Times, CNN, and National Geographic.

Prior to coming to OSU in January 2020, Berger-Wolf was at the University of Illinois at Chicago. Berger-Wolf holds a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign. She has received numerous awards for her research and mentoring, including University of Illinois Scholar, UIC Distinguished Researcher of the Year, US National Science Foundation CAREER, Association for Women in Science Chicago Innovator, and the UIC Mentor of the Year.

February 19, 2021 10:00 AM Dan Halperin (Computer Science; Tel Aviv University)

Title: Throwing a Sofa Through the Window

Abstract: Planning motion for robots and other artifacts toward desired goal positions while avoiding obstacles on the way becomes harder when the environment is tight or densely cluttered. Indeed, prevalent motion-planning techniques often fail in such settings. The talk centers on recently-developed efficient algorithms to cope with motion in tight quarters.

We study several variants of the problem of moving a convex polytope in three dimensions through a rectangular (and sometimes more general) window. Specifically, we study variants in which the motion is restricted to translations only, discuss situations in which such a motion can be reduced to sliding (translation in a fixed direction), and present efficient algorithms for those variants. We show cases where sliding is insufficient but purely translational motion works, or where purely translational motion is insufficient and rotation must be included. Finally, we explore the general setup, where we want to plan a general motion (with all six degrees of freedom) for the polytope through the window, and present an efficient algorithm for this problem, with running time close to O(n^4), where n is the number of edges of the polytope. (Joint work with Micha Sharir and Itay Yehuda.)

As time permits I will present additional recent results for motion in tight settings in assembly planning, fixture design, and casting and molding.

Bio: Dan Halperin is a professor of Computer Science at Tel Aviv University. His main field of research is Computational Geometry and Its Applications. A major focus of his work has been in research and development of robust geometric algorithms, principally as part of the CGAL project and library. The application areas he is interested in include robotics, automated manufacturing, algorithmic motion planning and 3D printing. Halperin is an IEEE Fellow and an ACM Fellow.

February 24, 2021 11:45 AM Yang Ning (Department of Statistics and Data Science; Cornell University)

Title: Adaptive Estimation in Multivariate Response Regression with Hidden Variables

Abstract: A prominent concern of scientific investigators is the presence of unobserved hidden variables in association analysis. Ignoring hidden variables often yields biased statistical results and misleading scientific conclusions. Motivated by this practical issue, this paper studies the multivariate response regression with hidden variables, $Y = (\Psi^*)^TX + (B^*)^TZ + E$, where $Y \in \mathbb{R}^m$ is the response vector, $X\in \mathbb{R}^p$ is the observable feature, $Z\in \mathbb{R}^K$ represents the vector of unobserved hidden variables, possibly correlated with $X$, and $E$ is an independent error. The number of hidden variables $K$ is unknown and both $m$ and $p$ are allowed, but not required, to grow with the sample size $n$.
Though $\Psi^*$ is shown to be non-identifiable due to the presence of hidden variables, we propose to identify the projection of $\Psi^*$ onto the orthogonal complement of the row space of $B^*$, denoted by $\Theta^*$. The quantity $(\Theta^*)^TX$ measures the effect of $X$ on $Y$ that cannot be explained through the hidden variables, and thus $\Theta^*$ is treated as the parameter of interest. Motivated by the identifiability proof, we propose a novel estimation algorithm for $\Theta^*$, called HIVE, under homoscedastic errors. The first step of the algorithm estimates the best linear prediction of $Y$ given $X$, in which the unknown coefficient matrix exhibits an additive decomposition of $\Psi^*$ and a dense matrix due to the correlation between $X$ and the hidden variable $Z$. Under the sparsity assumption on $\Psi^*$, we propose to minimize a penalized least squares loss by regularizing $\Psi^*$ and the dense matrix via group-lasso and multivariate ridge, respectively. Non-asymptotic deviation bounds of the in-sample prediction error are established. Our second step estimates the row space of $B^*$ by leveraging the covariance structure of the residual vector from the first step. In the last step, we estimate $\Theta^*$ via projecting $Y$ onto the orthogonal complement of the estimated row space of $B^*$ to remove the effect of hidden variables. Non-asymptotic error bounds of our final estimator of $\Theta^*$, which are valid for any $m$, $p$, $K$ and $n$, are established. We further show that, under mild assumptions, the rate of our estimator matches the best possible rate with known $B^*$ and is adaptive to the unknown sparsity of $\Theta^*$ induced by the sparsity of $\Psi^*$. The model identifiability, estimation algorithm and statistical guarantees are further extended to the setting with heteroscedastic errors.

Bio: Dr. Ning is an assistant professor in the Department of Statistics and Data Science at Cornell University. Prior to joining Cornell University, he was a postdoc at Princeton University. He received his Ph.D. in Biostatistics from the Johns Hopkins University. His research interests focus on high-dimensional statistics and causal inference with applications to biology, medicine, and public health.

February 26, 2021 10:00 AM Hossein Khiabanian (Cancer Institute of New Jersey)

Title: Integrated inference analyses to dissect tumor mutational profiles

Abstract: Recent advances in the use of clinical sequencing platforms in precision oncology settings have resulted in unprecedented access to the genomes of individual tumors. These assays aim to reliably identify and annotate somatic alterations specific to cancer cells for accurate diagnosis and treatment. However, due to the common lack of patient-matched controls, there is a need for a systematic effort to interpret detected variants in tumor-only sequencing data and to accurately describe the genomic landscape of a single tumor. In this talk, I will present a set of integrated, information-theoretic approaches that permit selecting the most consistent mutational model, distinguishing alterations in the tumor from those present in all cells (germline), while accounting for biases inherent to DNA sequencing and sample purity estimation. Using simulations and large, independent clinical datasets, we demonstrate the accuracy and precision of our methods. We will also discuss cases for which these analyses provide a model for tumor evolution, demonstrating that additional inference of mutational signatures and dissection of heterogeneity in tumor microenvironment can generate diagnostic hypotheses that may lead to improved prognostication and treatment design.

Bio: Hossein Khiabanian is an Associate Professor of Pathology in Medical Informatics at Rutgers Cancer Institute of New Jersey. He trained in physics and systems biology, and has developed statistical approaches for analyzing high-throughput data to study hematologic and solid tumors. At Rutgers, he has focused on problems in computational biology and cancer genomics, based on the idea that studying complexity, dynamics, and stochastic patterns in biological data is critical for understanding how disease states initiate and evolve.

March 5, 2021 10:00 AM Moshe Y. Vardi (Computer Science; Rice University)

Title: Ethics Washing in AI

Abstract: Over the past decade Artificial Intelligence, in general, and Machine Learning, in particular, have made impressive advancements, in image recognition, game playing, natural-language understanding and more. But there were also several instances where we saw the harm that these technologies can cause when they are deployed too hastily. A Tesla crashed on Autopilot, killing the driver; a self-driving Uber crashed, killing a pedestrian; and commercial face-recognition systems performed terribly in audits on dark-skinned people. In response to that, there has been much recent talk of AI ethics. Many organizations produced AI-ethics guidelines and companies publicize their newly established responsible-AI teams. But talk is cheap. “Ethics washing” — also called “ethics theater” — is the practice of fabricating or exaggerating a company’s interest in equitable AI systems that work for everyone. An example is when a company promotes “AI for good” initiatives with one hand, while selling surveillance tech to governments and corporate customers with the other. I will argue that the ethical lens is too narrow. The real issue is how to deal with technology’s impact on society. Technology is driving the future, but who is doing the steering?

Bio: Moshe Vardi is a Professor of Computer Science at Rice University, where he also holds the titles of University Professor, the Karen Ostrum George Professor in Computational Engineering and Distinguished Service Professor. He also directs the Ken Kennedy Institute for Information Technology. Prior to joining Rice in 1993, he was managing a research department at the IBM Almaden Research Center. Dr Vardi received his Ph.D. from the Hebrew University of Jerusalem in 1981. His interests focus on applications of logic to computer science and teaching logic across the curriculum. He is an expert in model checking, constraint satisfaction and database theory, common knowledge (logic), and theoretical computer science. Vardi is the recipient of multiple awards and distinctions, including 3 IBM Outstanding Innovation Awards, co-winner of the 2000 Gödel Prize, co-winner of the 2005 ACM Paris Kanellakis Theory and Practice Award, co-winner of the LICS 2006 Test-of-Time Award, the 2008 and 2017 ACM Presidential Award, the 2008 Blaise Pascal Medal in computational science by the European Academy of Sciences, and others. He holds honorary doctorates from eight universities. He is a Guggenheim Fellow, as well as a Fellow of ACM, AAAS and AAAI. He is a member of the US National Academy of Engineering, the National Academy of Sciences, the European Academy of Sciences, and the Academia Europaea. Professor Vardi is an editor of several international journals and the president of the International Federation of Computational Logicians. He is Senior Editor of Communications of the ACM, after serving as its Editor-in-Chief for a decade.

March 5, 2021 2:00 PM Uli Bauer (TU Munich)

Title: Persistent matchmaking

Workshop on Topology: Identifying Order in Complex Systems

March 12, 2021 10:00 AM YingLi Tian (Electrical Engineering; The City College of New York)

Title: Learning Sign Language with AI Driven Grammar Checking

Abstract: American Sign Language (ASL) is a primary means of communication for over 500,000 people in the US, and a distinct language from English, conveyed through hands, facial expressions, and body movements. Most prior work on ASL recognition has focused on identifying a small set of simple signs performed, but current technology is not sufficiently accurate on continuous signing of sentences with an unrestricted vocabulary. In this talk, I will share our research on AI-driven ASL learning tools to assist ASL students by enabling them to review and assess their signing skills through immediate, automatic, outside-of-classroom feedback. Our system can identify linguistic/performance attributes of ASL without necessarily identifying the entire sequence of signs and automatically determine if a performance contains grammatical errors through fusion of multimodality (facial expression, hand gesture, and body pose) and multisensory information (RGB and Depth videos). The system currently can recognize 8 types of grammatical mistakes and is able to generate feedback for ASL learners on average in less than 2 minutes for each 1 minute of ASL video. Our system has also been tested on videos recorded with cellphones and webcams.

Bio: Dr. YingLi Tian is a CUNY Distinguished Professor in the Electrical Engineering Department at the City College of New York (CCNY) and the Computer Science Department at the Graduate Center of the City University of New York (CUNY). She is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), as well as a Fellow of the International Association of Pattern Recognition (IAPR). She received her PhD from the Department of Electronic Engineering at the Chinese University of Hong Kong in 1996. Her research interests include computer vision, machine learning, artificial intelligence, assistive technology, medical imaging analysis, and remote sensing. She has published more than 200 peer-reviewed papers in journals and conferences in these areas with 21,500+ citations, and holds 29 issued patents. She is a pioneer in automatic facial expression analysis, human activity understanding, and assistive technology. Dr. Tian’s research on automatic facial expression analysis and database development while working at the Robotics Institute at Carnegie Mellon University has made significant impact in the research community and received the “Test of Time Award” at the IEEE International Conference on Automatic Face and Gesture Recognition in 2019. Before joining CCNY, Dr. Tian was a research staff member at IBM T. J. Watson Research Center and led the video analytics team. She received the IBM Outstanding Innovation Achievement Award in 2007 and the IBM Invention Achievement Awards every year from 2002 to 2007. Since Dr. Tian joined CCNY in Fall 2008, she has been focusing on assistive technology by applying computer vision and machine learning technologies to help people with special needs including the blind and visually impaired, deaf and hard-of-hearing, and the elderly. She serves as an associate editor for IEEE Trans. on Multimedia (TMM), Computer Vision and Image Understanding (CVIU), Journal of Visual Communication and Image Representation (JVCI), and Machine Vision and Applications (MVAP).

March 19, 2021 2:00 PM Henrik Ronellenfitsch (Williams College)

Title: Physics of Functional Networks

Abstract: We are surrounded by functional networks, from fluid transport in plants and animals to macroscopic elastic scaffoldings and microscopic crystals and materials, and engineered power grids. Often, such networks can be seen as optimized for their function, either through evolution and natural selection or by human design. In this presentation, we investigate a number of functional networks from biology and engineering and show how optimization shapes their weighted topology in similar ways despite their different functional goals.

Workshop on Topology: Identifying Order in Complex Systems
March 24, 2021 11:45 AM Colin Fogarty (Sloan School of Management, MIT)

Title: Prepivoting in Finite Population Causal Inference

Abstract: In finite population causal inference exact randomization tests can be constructed for sharp null hypotheses, hypotheses which fully impute the missing potential outcomes. Oftentimes inference is instead desired for the weak null that the sample average of the treatment effects takes on a particular value while leaving the subject-specific treatment effects unspecified. Without proper care, tests valid for sharp null hypotheses may be anti-conservative even asymptotically should only the weak null hold, creating the risk of misinterpretation when randomization tests are deployed in practice. We develop a general framework for unifying modes of inference for sharp and weak nulls, wherein a single procedure simultaneously delivers exact inference for sharp nulls and asymptotically valid inference for weak nulls. To do this, we employ randomization tests based upon prepivoted test statistics, wherein a test statistic is first transformed by a suitably constructed cumulative distribution function and its randomization distribution assuming the sharp null is then enumerated. For a large class of test statistics common in practice, we show that prepivoting may be accomplished by employing a sample-based Gaussian measure governed by a suitably constructed covariance estimator. In essence, the approach enumerates the randomization distribution (assuming the sharp null) of a p-value for a large-sample test known to be valid under the weak null, and uses the resulting randomization distribution to perform inference. The versatility of the method is demonstrated through various examples, including inference for rerandomized experiments.
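
The sharp-null baseline that the abstract starts from can be illustrated with a minimal sketch. It assumes a completely randomized design and a difference-in-means statistic, enumerates every assignment with the same number of treated units, and tests the sharp null of no effect for any unit. The function name and the toy data are hypothetical, and the sketch does not implement prepivoting itself.

```python
from itertools import combinations


def exact_randomization_pvalue(outcomes, treated_idx):
    """Two-sided exact p-value for the sharp null of no treatment effect.

    Assumes a completely randomized design: every subset of the same size
    as `treated_idx` was equally likely to be the treated group.
    """
    n = len(outcomes)
    m = len(treated_idx)

    def diff_in_means(idx):
        chosen = set(idx)
        treated = [outcomes[i] for i in range(n) if i in chosen]
        control = [outcomes[i] for i in range(n) if i not in chosen]
        return sum(treated) / len(treated) - sum(control) / len(control)

    observed = abs(diff_in_means(treated_idx))
    # Under the sharp null the outcomes are fixed, so only the labels vary.
    stats = [abs(diff_in_means(idx)) for idx in combinations(range(n), m)]
    # Small tolerance guards against floating-point ties.
    return sum(s >= observed - 1e-12 for s in stats) / len(stats)


# Hypothetical toy data: units 0-2 treated, units 3-5 control.
p_value = exact_randomization_pvalue([10, 11, 12, 0, 1, 2], (0, 1, 2))
print(p_value)
```

Full enumeration is only practical for small samples; for larger ones, a Monte Carlo draw of assignments would be the usual substitute.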

Bio: Colin Fogarty is the Sarofim Family Career Development Professor and an Assistant Professor of Operations Research and Statistics at the MIT Sloan School of Management. His research interests lie in the design and analysis both of randomized experiments, and of observational studies while assessing the robustness of a study’s findings to hidden biases. Much of his work explores the extent to which classical randomization-based approaches for inference in experiments and observational studies extend to circumstances where heterogeneous treatment effects are suspected. His work also illustrates tangible benefits for many quasi-experimental devices in terms of improved robustness to hidden bias in observational studies. Before joining MIT he completed his Ph.D. in Statistics at the Wharton School of the University of Pennsylvania, where he was advised by Professor Dylan Small.

March 31, 2021 11:45 AM Ya’acov Ritov (Department of Statistics, University of Michigan, Ann Arbor)

Title: The partial linear model (PLM) from semiparametric to modern ramifications.

Abstract: Engle, Granger, Rice, and Weiss (1986) suggested the partially linear model to deal with regression, which has both linear and nonparametric components. Its modern equivalent, inference in the ultra high dimensional regression, was analyzed by Zhang and Zhang (2014) and others. We consider different variations on this theme, including inference without assuming compatibility, design of experiments with unlabeled data, single-index models, and regression discontinuity designs.

Bio: PhD from the Hebrew University of Jerusalem, where he was a professor until 2015. Thereafter, a professor of statistics at the University of Michigan, Ann Arbor.

April 2, 2021 2:00 PM Vidit Nanda (Oxford University)

Title: The missing link

Abstract: Links of strata in singular spaces are fundamental invariants which govern the topology of small neighbourhoods around points in those strata. This talk will focus on inferring links of strata from incomplete information in three completely different contexts. In each case, there are exciting consequences of learning the structure of such links.

Workshop on Topology: Identifying Order in Complex Systems
April 7, 2021 11:45 AM Xiaofeng Shao (Department of Statistics, Univ. of Illinois at Urbana-Champaign)

Title: Change-point detection for COVID-19 time series via self-normalization

Abstract: This talk consists of two parts. In the first part, I will review some basic ideas of self-normalization (SN) for inference of time series in the context of confidence interval construction and change-point testing in mean. In the second part, I will present a piecewise linear quantile trend model to model infection trajectories of COVID-19 daily new cases. To estimate the change-points in the linear trend, we develop a new segmentation algorithm based on SN test statistics and local scanning. Data analysis for COVID-19 infection trends in many countries demonstrates the usefulness of our new model and segmentation method.

Bio: Xiaofeng Shao is currently a professor at the University of Illinois at Urbana-Champaign. He is a fellow of the Institute of Mathematical Statistics (IMS) and the American Statistical Association (ASA). His research interests include time series analysis, functional data analysis, high dimensional data analysis, and their applications in atmospheric science, business, economics, finance, and neuroscience.

April 16, 2021 2:00 PM Paul Bendich (Duke)

Title: From Geometry to Topology: Inverse Theorems for Distributed Persistence

Abstract: What is the “right” topological invariant of a large point cloud X? Prior research has focused on estimating the full persistence diagram of X, a quantity that is very expensive to compute, unstable to outliers, and far from a sufficient statistic. We therefore propose that the correct invariant is not the persistence diagram of X, but rather the collection of persistence diagrams of many small subsets. This invariant, which we call “distributed persistence,” is trivially parallelizable, more stable to outliers, and has a rich inverse theory. The map from the space of point clouds (with the quasi-isometry metric) to the space of distributed persistence invariants (with the Hausdorff-Bottleneck distance) is a global quasi-isometry. This is a much stronger property than simply being injective, as it implies that the inverse of a small neighborhood is a small neighborhood, and is to our knowledge the only result of its kind in the TDA literature. Moreover, the quasi-isometry bounds depend on the size of the subsets taken, so that as the size of these subsets goes from small to large, the invariant interpolates between a purely geometric one and a topological one. Finally, we note that our inverse results do not actually require considering all subsets of a fixed size (an enormous collection), but a relatively small collection satisfying certain covering properties that arise with high probability when randomly sampling subsets. These theoretical results are complemented by two synthetic experiments demonstrating the use of distributed persistence in practice. This is joint work with Elchanan Solomon and Alexander Wagner.

Workshop on Topology: Identifying Order in Complex Systems
April 30, 2021 2:00 PM Sabetta Matsumoto (Georgia Tech)

Title: Twisted topological tangles or: the knot theory of knitting

Abstract: Imagine a 1D curve, then use it to fill a 2D manifold that covers an arbitrary 3D object – this computationally intensive materials challenge has been realized in the ancient technology known as knitting. This process for making functional 2D materials from 1D portable cloth dates back to prehistory, with the oldest known examples dating from the 11th century CE. Knitted textiles are ubiquitous as they are easy and cheap to create, lightweight, portable, flexible and stretchy. As with many functional materials, the key to knitting’s extraordinary properties lies in its microstructure.

At the 1D level, knits are composed of an interlocking series of slip knots. At the most basic level there is only one manipulation that creates a knitted stitch – pulling a loop of yarn through another loop. However, there exist hundreds of books with thousands of patterns of stitches with seemingly unbounded complexity.

The topology of knitted stitches has a profound impact on the geometry and elasticity of the resulting fabric. This puts a new spin on additive manufacturing – not only can stitch pattern control the local and global geometry of a textile, but the creation process encodes mechanical properties within the material itself. Unlike standard additive manufacturing techniques, the innate properties of the yarn and the stitch microstructure have a direct effect on the global geometric and mechanical outcome of knitted fabrics.

Workshop on Topology: Identifying Order in Complex Systems
May 21, 2021 10:00 AM Haotian Wang (Computer Science; Rutgers University)

Title: Co-evolution of Opinion and Social Tie Dynamics Towards Structural Balance

Abstract: In natural networks, especially social networks, community structure is one of the most prominent properties. An extreme case is when the network is partitioned into two camps with opposing relationships. In this talk, I will introduce our co-evolution model for both the dynamics of opinions (people’s views on a variety of topics) and the dynamics of social appraisals (the approval or disapproval towards each other). It leads to the formation of communities in the networks. The opinion of an individual is updated by the weighted average of opinions from neighbors, and the tie appraisal of two nodes is updated with a margin proportional to the agreement of their opinions.
We show that with favorable conditions on the initial opinion and edge appraisal values, the system stabilizes at finite time, at which edge weights have stable signs (positive or negative), and structural balance is achieved (the product of the weights on any triangle is non-negative). Some real-world examples are demonstrated using this co-evolution model. The stable final state matches the partition into camps observed in the real world. This suggests that community structure naturally evolves as an outcome of the co-evolution model. The model sheds light on why community structure emerges and becomes a widely observed, sustainable property in complex networks.
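
The two update rules and the balance condition described above can be sketched as follows. This is an illustration under stated assumptions, not the model from the talk: the normalization by total absolute weight, the inner product as the measure of agreement, the synchronous update, and the names `coevolution_step`, `is_structurally_balanced`, and `rate` are all hypothetical choices.

```python
import itertools

import numpy as np


def coevolution_step(opinions, weights, rate=0.1):
    """One synchronous update of opinions (n x d) and tie appraisals (n x n)."""
    # Opinion update: weighted average of neighbors' opinions
    # (assumed normalization: total absolute tie weight of each node).
    norm = np.abs(weights).sum(axis=1, keepdims=True)
    norm[norm == 0] = 1.0
    new_opinions = (weights @ opinions) / norm
    # Appraisal update: margin proportional to the agreement of opinions
    # (assumed agreement measure: inner product of the current opinions).
    new_weights = weights + rate * (opinions @ opinions.T)
    np.fill_diagonal(new_weights, 0.0)
    return new_opinions, new_weights


def is_structurally_balanced(weights):
    """True if the product of the weights on every triangle is non-negative."""
    n = weights.shape[0]
    return all(
        weights[i, j] * weights[j, k] * weights[i, k] >= 0
        for i, j, k in itertools.combinations(range(n), 3)
    )


# Hypothetical example: two camps with opposing one-dimensional opinions.
opinions = np.array([[1.0], [1.0], [-1.0], [-1.0]])
weights = np.zeros((4, 4))
_, weights = coevolution_step(opinions, weights, rate=0.5)
print(is_structurally_balanced(weights))
```

Checking every triangle costs cubic time in the number of nodes, which is fine for small examples but would need a different approach on large networks.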

Bio: Haotian Wang is a Ph.D. candidate in the department of computer science at Rutgers University. His research interests include computational geometry, algorithm design, and networking applications.

May 28, 2021 10:00 AM Kai Gao (Computer Science; Rutgers University)
Title: On Minimizing the Number of Running Buffers for Tabletop Rearrangement

Abstract: For tabletop rearrangement problems with overhand grasps, storage space outside the tabletop workspace, or buffers, can temporarily hold objects, which greatly facilitates the resolution of a given rearrangement task. This brings forth the natural question of how many running buffers are required so that certain classes of tabletop rearrangement problems are feasible. In this work, we examine the problem for both the labeled (where each object has a specific goal pose) and the unlabeled (where goal poses of objects are interchangeable) settings. On the structural side, we observe that finding the minimum number of running buffers (MRB) can be carried out on a dependency graph abstracted from a problem instance, and show that computing MRB on dependency graphs is NP-hard. We then prove that under both labeled and unlabeled settings, even for uniform cylindrical objects, the number of required running buffers may grow unbounded as the number of objects to be rearranged increases; we further show that the bound for the unlabeled case is tight. On the algorithmic side, we develop highly effective algorithms for finding MRB for both labeled and unlabeled tabletop rearrangement problems, scalable to over a hundred objects under very high object density. Employing these algorithms, empirical evaluations show that random labeled and unlabeled instances, which more closely mimic real-world setups, have much smaller MRBs.
This is joint work with Si Wei Feng and Jingjin Yu.
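
To make the dependency-graph abstraction concrete, here is a minimal sketch for the labeled setting. It assumes uniform discs of a given radius, treats object i as depending on object j when i's goal overlaps j's start, and only answers whether at least one buffer is needed (the graph has a cycle). It does not compute MRB, and the function names and the overlap rule are illustrative assumptions rather than the authors' implementation.

```python
import math


def dependency_graph(starts, goals, radius):
    """Edge i -> j when object j must move before i can reach its goal."""
    n = len(starts)
    graph = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(n):
            # Two discs of equal radius overlap when centers are closer than 2r.
            if i != j and math.dist(goals[i], starts[j]) < 2 * radius:
                graph[i].append(j)
    return graph


def needs_buffer(graph):
    """True if the dependency graph has a cycle (depth-first search)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def visit(v):
        color[v] = GRAY
        for w in graph[v]:
            if color[w] == GRAY:
                return True  # back edge: cyclic dependency
            if color[w] == WHITE and visit(w):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in graph)


# Hypothetical instance: two discs that must swap places.
swap = dependency_graph([(0, 0), (5, 0)], [(5, 0), (0, 0)], radius=1.0)
print(needs_buffer(swap))
```

When the graph is acyclic, moving objects in reverse dependency order sends each one directly to its goal, so no buffer is used.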

Bio: Kai Gao is a second-year doctoral student in Robotics at Rutgers, the State University of New Jersey, working with Professor Jingjin Yu. Currently, his research focuses on resolving combinatorial challenges in robot tasks and motion planning. Before arriving at Rutgers, he received a Bachelor’s degree in Mathematics from the University of Science and Technology of China in 2019.

Rui Wang (Computer Science; Rutgers University)
Title: Planning with Perception in the Loop: Safe and Effective Picking Path in Clutter given Discrete Distributions of Object Poses

Abstract: Picking an item in the presence of other objects can be challenging as it involves occlusions and partial views. Given object models, one approach is to perform object pose estimation and use the most likely candidate pose per object to pick the target without collisions. This approach, however, ignores the uncertainty of the perception process both regarding the target’s and the surrounding objects’ poses. This work proposes first a perception process for 6D pose estimation, which returns a discrete distribution of object poses in a scene. Then, an open-loop planning pipeline is proposed to return safe and effective solutions for moving a robotic arm to pick, which (a) minimizes the probability of collision with the obstructing objects; and (b) maximizes the probability of reaching the target item. The planning framework models the challenge as a stochastic variant of the Minimum Constraint Removal (MCR) problem. The effectiveness of the methodology is verified given both simulated and real data in different scenarios. The experiments demonstrate the importance of considering the uncertainty of the perception process in terms of safe execution. The results also show that the methodology is more effective than conservative MCR approaches, which avoid all possible object poses regardless of the reported uncertainty.

Bio: Rui Wang is a Ph.D. candidate in the department of Computer Science at Rutgers University, supervised by Professor Kostas Bekris. His research lies in task and motion planning for robot manipulation, specifically with failure-explanation planning approaches which reason about the failure of finding a valid plan and use the explanation for further guidance. Prior to his Ph.D. at Rutgers, he received his Master’s degree in Mechanical Engineering from Columbia University and his Bachelor’s degree in Vehicle Engineering from Nanjing University of Aeronautics and Astronautics, China.

October 1, 2021 10:00 AM Cameron Thieme (DIMACS; Rutgers University)

Title: Attractors of Nonsmooth and Multivalued Dynamical Systems

Abstract: Over the past few decades, piecewise-continuous differential equations have become increasingly popular in scientific models. In particular, conceptual climate models often take this form. These nonsmooth systems are typically reframed as Filippov systems, a special type of multivalued dynamical system. Some qualitative properties of these inclusions have been studied over the last few decades, primarily in the context of control systems. Our interest in these systems is in understanding what behavior identified in the nonsmooth model may be continued to families of smooth differential equations which limit to the Filippov system; determining this information is particularly important in this context because the piecewise-continuous model is frequently considered to be a heuristically understandable approximation of a more realistic smooth system. In this talk we will examine how Conley index theory may be applied to the study of differential inclusions in order to address this goal. In particular, we will discuss how attractor-repeller pairs identified in a Filippov system continue to nearby smooth systems.

Bio: Cameron Thieme is a postdoctoral researcher at DIMACS associated with the DATA-INSPIRE Institute. His research focuses on the use of topological methods in dynamical systems. In particular, he is interested in how classical methods developed for single-valued dynamical systems (flows, maps) may be generalized to set-valued ones; these modern, multivalued dynamical systems have applications in conceptual modeling and data analysis. He received his PhD in Mathematics at the University of Minnesota under the supervision of Richard McGehee in 2021.

October 15, 2021 4:00 PM Ronitt Rubinfeld (MIT)

Title: Locality in Computation

Abstract: Consider a setting in which inputs to and outputs from a computational problem are so large that there is no time to read them in their entirety. However, if one is only interested in small parts of the output at any given time, is it really necessary to solve the entire computational problem? Is it even necessary to view the whole input? We survey recent work in the model of “local computation algorithms” which, for a given input, supports queries by a user to values of specified bits of a legal output. The goal is to design local computation algorithms in such a way that very little of the input needs to be seen in order to determine the value of any single bit of the output. Though this model describes sequential computations, techniques from local distributed algorithms have been extremely important in designing efficient local computation algorithms. In this talk, we describe results on a variety of problems for which sublinear time and space local computation algorithms have been developed, with special focus on finding maximal independent sets and generating random objects.
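
As a rough illustration of answering a single output bit without solving the whole problem, the sketch below simulates greedy maximal independent set construction over a fixed vertex ranking: a vertex is in the set exactly when none of its lower-ranked neighbors is. This is one well-known approach and is not necessarily the algorithm presented in the talk; `make_mis_oracle`, `adjacency`, and `rank` are hypothetical names, and in practice the ranks would be drawn at random.

```python
from functools import lru_cache


def make_mis_oracle(adjacency, rank):
    """Return a query function answering "is vertex v in the MIS?".

    Each query explores only the lower-ranked part of v's neighborhood,
    so it can finish without reading the whole graph.
    """

    @lru_cache(maxsize=None)
    def in_mis(v):
        # v joins the set iff no lower-ranked neighbor has joined it.
        return all(not in_mis(u) for u in adjacency[v] if rank[u] < rank[v])

    return in_mis


# Hypothetical example: the path 0 - 1 - 2 - 3, ranked by vertex label.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rank = {0: 0, 1: 1, 2: 2, 3: 3}
in_mis = make_mis_oracle(adjacency, rank)
print([v for v in adjacency if in_mis(v)])
```

Because the ranking is fixed before any query, answers to different queries are consistent with one single maximal independent set.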

Bio: Ronitt Rubinfeld is the Edwin Sibley Webster Professor in MIT’s Electrical Engineering and Computer Science department, where she has been on the faculty since 2004. She has held faculty positions at Cornell University and Tel Aviv University, and has been a member of the research staff at NEC Research Institute.
Ronitt’s research centers on property testing and sub-linear time algorithms, which provide the foundations for measuring the performance of algorithms that analyze data by looking at only a very small portion of it. Her work has developed the field of sublinear time property testers for functions, combinatorial objects and distributions.
Ronitt received her PhD from the University of California, Berkeley. Ronitt Rubinfeld was an ONR Young Investigator, a Sloan Fellow, and an invited speaker at the International Congress of Mathematicians in 2006. She is a fellow of the ACM and of the American Academy of Arts and Sciences.

November 12, 2021 10:00 AM Zhigang Zhu (Computer Science, Grove School of Engineering, The City College and Graduate Center / CUNY)

Title: SAT-Hub: Smart and Accessible Transportation Hub for Assistive Navigation and Facility Management

Abstract: SAT-Hub aims to provide better location-aware services to the traveling public, especially for underserved populations including those with visual impairment, Autism Spectrum Disorder (ASD), or simply navigation challenges, with minimal infrastructure changes. The SAT-Hub project has the following three main technical components: (1) A SAT multilayer live facility model, with a building feature layer, a space information layer, a crowd dynamic layer, and a service information layer. (2) SAT hybrid mobile localization algorithms, using beacons, 2D/3D cameras and onboard sensors, integrated with the information from the multilayer model. (3) SAT multimodal human-centered interfaces, with both the layered model and the localization algorithms as the drivers for users with disabilities and/or travel challenges to better perform their traveling tasks. This talk will provide an overview of the project, with a number of sample results on various aspects of the cyber-physical-human ecosystem in research, development and commercialization. The research is a collaboration among CUNY, Rutgers, Lighthouse Guild and Bentley Systems, Inc., and is supported by the DHS Summer Research Team (SRT) Program, the NSF Smart and Connected Community Program, the NSF Partnerships of Innovation Program, and the Bentley Research Collaboration Program.

Bio: Dr. Zhigang Zhu is currently Herbert G. Kayser Professor of Computer Science at The City College and The Graduate Center, The City University of New York. He is Director of the City College Visual Computing Laboratory (CCVCL), and Co-Director of the Master’s Program in Data Science and Engineering at CCNY. Previously he was Associate Professor at Tsinghua University, Beijing and a Senior Research Fellow at the University of Massachusetts, Amherst. Dr. Zhu obtained his BS, MS and PhD degrees, all in Computer Science, from Tsinghua University. His research interests include computer vision, multimodal sensing, human-computer interaction, and various applications in assistive technology, robotics, surveillance and transportation. Among other honors, he is a recipient of the President’s Award for Excellence at CCNY in 2013, and in 1999 his PhD thesis was selected into the Hundred National Excellent Doctoral Theses in China. He is an Associate Editor of Machine Vision and Applications, Springer.

November 19, 2021 10:00 AM Chinwe Ekenna (University at Albany, SUNY)

Title: Motion Planning Advancements and Applications in Computational Biology: An Algebraic Topology Perspective

Abstract: Techniques for motion planning have advanced to address high-dimensional and complex environments. Understanding the approximations utilized in generating various robot configurations, as well as how much sampling is required to ensure that a path is constructed if one exists, is still a challenge. My talk will highlight advances in the topological representation of planning spaces for robots, as well as topological tools I developed to help explore, measure, and provide an upper-bound on the amount of sampling required in a given environment. This method is used to study protein-protein interactions in computational biology. The identification of biomolecular structures, functions, and interactions is aided by geometric properties of protein surfaces. These characteristics have proven to be significant in predicting protein-ligand or protein-protein interactions. I’ll show how to extract significant geometric information from the protein surface using an algorithm that uses simplicial complexes and discrete Morse theory. We offer the probable intermediate conformations of the biomolecule around the protein surface as it travels to the binding site using the retrieved geometric information.

Bio: Chinwe Ekenna is an Assistant Professor in the Department of Computer Science at the University at Albany, State University of New York. She received her PhD from Texas A&M University with Dr. Nancy Amato as her advisor. Chinwe’s research centers on intelligent motion planning applied to robotics and proteins. She has explored intelligent adaptation of robotic motion planning to improve planning time and topological data analysis methods to capture important features of robot planning spaces. Her research interests include machine learning, computational geometry, and computational biology. Chinwe is a recipient of the NSF-CRII award on “Topology aware configuration spaces” and has gone on to publish several works in ICRA and IROS on this subject. She is currently an Associate Editor for IEEE-RAL and has served on several program committees for the ICRA, IROS and WAFR conferences. She is a committee member of the IEEE RAS Committee to Explore Synergies in Automation and Robotics (CESAR), which comprises top researchers in the field of automation and robotics.

March 2, 2022 11:45 AM Fei Xue (Department of Statistics; Purdue University)

Title: Statistical Inference for High-dimensional Block-wise Missing Data

Abstract: For multi-source data, blocks of variable information from certain sources are likely missing. Most existing methods for handling missing data do not take structures of block-wise missing data into consideration. In this talk, I will describe a Multiple Block-wise Imputation (MBI) approach, which incorporates imputations based on both complete and incomplete observations. Specifically, for a given missing pattern group, the imputations in MBI incorporate more samples from groups with fewer observed variables in addition to the group with complete observations. We propose to construct estimating equations based on all available information, and integrate all estimating functions to achieve efficient estimators. In addition, we propose a nearly unbiased estimator for each individual regression coefficient, which is asymptotically normally distributed under mild conditions. Based on these debiased estimators, asymptotically valid confidence intervals and statistical tests about each regression coefficient are constructed. Numerical studies and ADNI data application confirm that the proposed method outperforms existing methods under various missing mechanisms.

Bio: Fei is an Assistant Professor of Statistics at Purdue University. She got her PhD from UIUC and was a postdoc at University of Pennsylvania. Her research interests are data integration, missing data, mediation analysis, machine learning, and statistical genetics.

March 23, 2022 11:45 AM Russell Shinohara (University of Pennsylvania)

Title: Statistical Methods for Harmonizing Multi-scanner Neuroimaging

Abstract: While magnetic resonance imaging (MRI) studies are critical for the diagnosis, monitoring, and study of a wide variety of diseases, their use in quantitative analysis can be complex. An increasingly recognized issue involves the differences between MRI scanners that are used in large multi-center studies. To address this, the current state of the art is to “regress out” or “adjust for” scanner differences. Our group has found these methods to be insufficient, and has advocated for the adaptation of methods pioneered in genomics to help mitigate inter-scanner differences which can vary across the brain and result in both mean and variance shifts. We further study the implications of differences in correlation structures across and between images, and how this affects downstream inference.

Bio: Taki Shinohara is an Associate Professor of Biostatistics at the University of Pennsylvania. He directs the Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), a Center of Excellence focusing on imaging statistics at the Perelman School of Medicine. His laboratory focuses on statistical methods and applications for neuroimaging data, with particular emphasis on multiple sclerosis research and neurodevelopmental studies.

March 30, 2022 11:45 AM Peng Wang (University of Cincinnati)

Title: Repro Sampling Method for Statistical Inference of High Dimensional Linear Models

Abstract: This paper proposes a new and effective simulation-based approach, called the Repro Sampling method, to conduct statistical inference in high dimensional linear models. The Repro method creates and studies the performance of artificial samples (referred to as Repro samples) that are generated by mimicking the sampling mechanism that generated the true observed sample. By doing so, this method provides a new way to quantify model and parameter uncertainty and provide confidence sets with guaranteed coverage rates on a wide range of problems. A general theoretical framework and an effective Monte-Carlo algorithm, with supporting theories, are developed for high dimensional linear models. The method is used to create confidence sets for both the selected models and the model coefficients; both exact and asymptotic inferences are included. It also provides theoretical development to support computational efficiency. The development provides a simple and effective solution for the difficult post-selection inference problems.

Bio: Dr. Peng Wang is an Associate Professor of Business Analytics in Lindner College of Business at the University of Cincinnati. Prior to joining the College, Dr. Wang obtained his Ph.D. degree in statistics from the University of Illinois at Urbana-Champaign and worked as an Assistant Professor at Bowling Green State University. Dr. Wang’s research interests include longitudinal data analysis, high dimensional inference, basics of statistical inference, and applied statistical learning.

April 6, 2022 11:45 AM Andrew Nobel (University of North Carolina, Chapel Hill)

Title: Stationary Optimal Transport with Applications to Graph Alignment

Abstract: Optimal transport seeks to find couplings of two given distributions with minimum expected cost. This talk considers the setting in which the distributions of interest are stationary stochastic processes, and the cost function depends only on a finite number of coordinates. In this setting, I will argue that it is appropriate, and desirable, to restrict attention to stationary couplings, also known as joinings. The first part of the talk will address estimation of optimal joinings from observations of two ergodic processes. I will then consider optimal transport for Markov chains via transition couplings, beginning with fast computation based on techniques from reinforcement learning. As an illustration, I will show how optimal joinings of Markov chains can be used to effectively compare two weighted graphs with potentially different node sets. This approach yields interpretable alignments of nodes and edges, has a desirable edge-preserving property, and implicitly accounts for graph factors when these exist.

Bio: Andrew Nobel is the Robert Paul Ziff Distinguished Professor of Statistics and Operations Research at UNC Chapel Hill. His research interests include optimal transport, dynamical systems, and statistical genomics. His research encompasses mathematical foundations and methodological development, as well as real-world applications. His work has addressed an array of problems, including uniform ergodic theorems for VC-classes, matrix reconstruction in Gaussian noise, analysis and implementation of biclustering procedures for large average submatrices, community detection in weighted networks, and analysis of joint and individual variation in multi-view genomic data. Nobel is a fellow of the IMS, and is currently an Associate Editor at JRSS-B.

April 15, 2022 4:00 PM Bin Yu (UC Berkeley, Statistics, EECS, CCB)

Title: Predictability, stability, and causality with a case study to find genetic drivers of a heart disease

“A.I. is like nuclear energy — both promising and dangerous” — Bill Gates, 2019.
Data Science is a pillar of A.I. and has driven most of the recent cutting-edge discoveries in biomedical research and beyond. Human judgment calls are ubiquitous at every step of a data science life cycle, e.g., in choosing data cleaning methods, predictive algorithms and data perturbations. Such judgment calls are often responsible for the “dangers” of A.I. To maximally mitigate these dangers, we developed a framework based on three core principles: Predictability, Computability and Stability (PCS). The PCS framework unifies and expands on the best practices of machine learning and statistics. It consists of a workflow and documentation and is supported by our software package v-flow.
In this talk, we first illustrate the PCS framework through the development of iterative random forests (iRF) for predictable and stable non-linear interaction discovery (in collaboration with the Brown Lab at LBNL and Berkeley Statistics). In pursuit of genetic drivers of a heart disease called hypertrophic cardiomyopathy as a CZ Biohub project in collaboration with the Ashley Lab at Stanford Medical School and others, we use iRF and UK Biobank data to recommend gene-gene interaction targets for knock-off experiments. We then analyze the experimental data to show promising findings.

Bio: Bin Yu is Chancellor’s Distinguished Professor and Class of 1936 Second Chair in the departments of statistics and EECS at UC Berkeley. She leads the Yu Group which consists of students and postdocs from Statistics and EECS. She was formally trained as a statistician, but her research extends beyond the realm of statistics. Together with her group, her work has leveraged new computational developments to solve important scientific problems by combining novel statistical machine learning approaches with the domain expertise of her many collaborators in neuroscience, genomics and precision medicine. She and her team develop relevant theory to understand random forests and deep learning for insight into and guidance for practice.
She is a member of the U.S. National Academy of Sciences and of the American Academy of Arts and Sciences. She is Past President of the Institute of Mathematical Statistics (IMS), a Guggenheim Fellow, Tukey Memorial Lecturer of the Bernoulli Society, Rietz Lecturer of IMS, and a COPSS E. L. Scott prize winner. She holds an Honorary Doctorate from The University of Lausanne (UNIL), Faculty of Business and Economics, in Switzerland. She has recently served on the inaugural scientific advisory committee of the UK Turing Institute for Data Science and AI, and is serving on the editorial board of Proceedings of the National Academy of Sciences (PNAS).

Link to video:

April 22, 2022 4:00 PM Kevin Jamieson, University of Washington

Title: Instance Dependent Sample Complexity Bounds for Interactive Learning

Abstract: The sample complexity of an interactive learning problem, such as multi-armed bandits or reinforcement learning, is the number of interactions with nature required to output an answer (e.g., a recommended arm or policy) that is approximately close to optimal with high probability. While minimax guarantees can be useful rules of thumb to gauge the difficulty of a problem class, algorithms optimized for this worst-case metric often fail to adapt to “easy” instances where fewer samples suffice. In this talk, I will highlight some of my group’s work on algorithms that obtain optimal, finite time, instance dependent sample complexities that scale with the true difficulty of the particular instance, versus just the worst-case. In particular, I will describe a unifying experimental design based approach used to obtain such algorithms for best-arm identification for linear bandits, contextual bandits with arbitrary policy classes, and smooth losses for linear dynamical systems.

Bio: Kevin Jamieson is an Assistant Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington and is the Guestrin Endowed Professor in Artificial Intelligence and Machine Learning. He received his B.S. in 2009 from the University of Washington, his M.S. in 2010 from Columbia University, and his Ph.D. in 2015 from the University of Wisconsin – Madison under the advisement of Robert Nowak, all in electrical engineering. He returned to the University of Washington as faculty in 2017 after a postdoc with Benjamin Recht at the University of California, Berkeley. Jamieson’s work has been recognized by an NSF CAREER award and Amazon Faculty Research award. His research explores how to leverage already-collected data to inform what future measurements to make next, in a closed loop. The work ranges from theory to practical algorithms with guarantees to open-source machine learning systems and has been adopted in a range of applications, including measuring human perception in psychology studies, adaptive A/B/n testing in dynamic web-environments, numerical optimization, and efficient tuning of hyperparameters for deep neural networks.

Link to video:

April 29, 2022 4:00 PM Adnan Darwiche, (University of California, Los Angeles)

Title: Explaining the Decisions of AI Systems

Abstract: I will present a theory for reasoning about the decisions made by AI systems, particularly classifiers such as decision trees, random forests, Bayesian networks and some limited types of neural networks. The theory is based on “compiling” the input-output behavior of classifiers into discrete functions in the form of tractable circuits. At the heart of the theory is the notion of “complete reason” behind a decision which is extracted from a circuit-instance pair and can be used to answer many queries about the decision, including ones pertaining to explainability, robustness and bias. I will also overview developments on tractable circuits which provide the computational arm for employing this theory in practice and will briefly overview recent results on quantified Boolean logic which provide classifier-independent semantics of this theory that further broadens its applicability.

Bio: Adnan Darwiche is a professor and former chairman of the computer science department at UCLA. He directs the Automated Reasoning Group, which focuses on symbolic reasoning, probabilistic reasoning and their applications to machine learning. Professor Darwiche is a Fellow of AAAI and ACM and a recipient of the Lockheed Martin Excellence in Teaching Award. He is a former editor-in-chief of the Journal of Artificial Intelligence Research (JAIR) and author of “Modeling and Reasoning with Bayesian Networks,” published by Cambridge University Press.

Link to video:

June 7, 2022 12:00 PM Mikhail Khovanov, (Columbia University)

Title: Regular languages and cobordisms of decorated manifolds

Abstract: Regular languages constitute a simple class of languages that can be described via finite state automata. We explain a recently found enhancement of regular languages, extending them to an invariant of one-dimensional cobordisms (1-manifolds stretched between two 0-manifolds) with decorations. This approach requires using a circular language as a regularizer and leads to a categorical extension of these familiar concepts. Various necessary concepts, including those of a cobordism, the Boolean semiring and semimodules over it, will be explained in the talk, which is based on recent joint work with Mee Seong Im.

Link to video:

CoRE 431
June 15, 2022 12:00 PM Robert Bosch, (Oberlin College)

Title: Connecting the Dots: Using Combinatorial Optimization to Design Visual Artwork

Abstract: We will discuss how techniques for solving combinatorial optimization problems (including the traveling salesperson problem and the minimum cost spanning tree problem) can be used to design visual artwork. Examples include TSP Art, the Figurative Tour Problem, labyrinths, structured knight’s tours, and string art.

CoRE 431
June 28, 2022 12:00 PM Sarah Scheffler, (Princeton)

Title: A Systematization of Content Moderation in End-to-End Encryption

Abstract: End-to-end encryption is increasingly adopted in all kinds of communication, including secure messaging, video, audio, email, file sharing, and web browsing. As end-to-end encrypted systems expand and grow, so too do the needs and challenges for content moderation in these systems. This talk systematizes the study of content moderation under end-to-end encryption, including user reporting, metadata-based moderation, and automated content scanning with various client privacy guarantees. We identify a key distinction in the goals of various E2EE content moderation systems between protecting users from content they do not want, and detecting groups of colluding users sending content the platform does not wish to host. We also identify several areas of future research in E2EE content moderation, especially creating better tools for transparency, verification, and auditability of these systems.

CoRE 431
July 5, 2022 12:00 PM James Abello, (Rutgers)

Title: Visual Exploration of Billion Edge Graphs

Abstract: Recently, Graph Cities have been proposed as scalable 3D visual representations of partitions of billion graph edge sets into “special” connected subgraphs called fixed points of degree peeling. We present a collection of “intuitive” primitives whose composition is useful for exploring these novel “large” graph city representations. These primitives are implemented as interactive navigation tools that include an eight directional steering wheel, individual building walks, path navigations, city tours, and a collection of visual queries. An interactive city glyph map is used as the central coordinator of all the different city views. Each point on the glyph map is addressable by pairing a peel value and the size interval associated with the glyph summarizing a corresponding bucket. A bucket with a single building is associated with a circular glyph with colored spikes encoding its waves. A bucket with multiple buildings is represented by a colored spiral, whose detailed view becomes a local graph vicinity. These graph vicinities can be explored with the same functionality as a full Graph City. To explore the internal structure of a building, a user can zoom in to obtain a 3D force directed layout of a building’s meta DAG that encodes the building’s local topological structure. We demonstrate visual exploration of a Friendster social network (1.8 billion edges), a co-occurrence keywords network derived from the Internet Movie Database (115 million edges), and a patent citation network (16.5 million edges).

CoRE 431
July 18, 2022 12:00 PM Esther Ezra, (Bar-Ilan University, Israel)

Title: Arc-Intersection Queries Amid Triangles in Three Dimensions and Related Problems

Abstract: Let T be a set of n triangles in 3-space, and let G be a family of algebraic arcs of constant complexity in 3-space. We show how to preprocess T into a data structure that supports various “intersection queries” for query arcs g ∈ G, such as detecting whether g intersects any triangle of T, reporting all such triangles, counting the number of intersection points between g and the triangles of T, or returning the first triangle intersected by a directed arc g, if any (i.e., answering arc-shooting queries). Our technique is based on polynomial partitioning and other tools from real algebraic geometry, among which is the cylindrical algebraic decomposition.
Our approach can be extended to many other intersection-searching problems in three and higher dimensions. We exemplify this versatility by giving an efficient data structure for answering segment-intersection queries amid a set of spherical caps in 3-space, and we lay a roadmap for extending our approach to other intersection-searching problems.
Joint work with Pankaj Agarwal, Boris Aronov, Matya Katz, and Micha Sharir.

CoRE 431
September 14, 2022 11:45 AM Dennis Lin, Purdue University

Title: Order-of-addition Experiments: Design and Analysis

Abstract: In Fisher (1971), a lady was able to distinguish (by tasting) whether the tea or the milk was first added to the cup. This is probably the first popular Order of Addition (OofA) experiment. In general, there are m required components and we hope to determine the optimal sequence for adding these m components one after another. It is often unaffordable to test all the m! treatments (for example, m! = 10! is about 3.6 million), and the design problem arises. We consider the model in which the response of a treatment depends on the pairwise orders of the components. The optimal design theory under this model is established, and the optimal values of the D-, A-, E-, and M/S-criteria are derived. For the model-free approach, an efficient sequential methodology is proposed, building upon the basic concept of the quick-sort algorithm, to explore the optimal order without any model specification. The proposed method is capable of obtaining the optimal order for large m (≥ 20). This work can be regarded as an early work on OofA experiments with a large number of components. Some theoretical support is also discussed. One case study for job scheduling will be discussed in detail.
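The pairwise-order idea in the abstract can be illustrated with a small sketch. It is only an illustration under stated assumptions: the function name `pwo_row` and the +1/-1 coding (a common convention for pairwise-order factors) are ours and are not taken from the talk.

```python
from itertools import combinations, permutations
from math import factorial

def pwo_row(order):
    """Pairwise-order indicators for one addition sequence.

    For every pair j < k of component labels, the indicator is +1 when j is
    added before k in `order` and -1 otherwise (assumed coding).
    """
    position = {component: i for i, component in enumerate(order)}
    return [1 if position[j] < position[k] else -1
            for j, k in combinations(sorted(order), 2)]

# Full design for m = 4 components: all m! sequences, m(m-1)/2 factors each.
m = 4
full_design = [pwo_row(p) for p in permutations(range(1, m + 1))]
print(factorial(m), "treatments,", len(full_design[0]), "pairwise factors")

# The count grows quickly: 10 components already give 3,628,800 sequences.
print(factorial(10))
```

Each row of `full_design` is one candidate treatment; a design problem then amounts to choosing a small subset of these rows.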

Bio: Dr. Dennis K. J. Lin is a Distinguished Professor and Head of the Statistics Department at Purdue University. His research interests are quality assurance, industrial statistics, data science, and response surface. He has published nearly 300 SCI/SSCI papers in a wide variety of journals. He currently serves or has served as associate editor for more than 10 professional journals and was co-editor for Applied Stochastic Models for Business and Industry. Dr. Lin is an elected fellow of ASA, IMS and ASQ, an elected member of ISI and RSS, and a lifetime member of ICSA. He is an honorary chair professor for various universities, including a Chang-Jiang Scholar at Renmin University of China, Fudan University, National Taiwan Normal University, and National Chengchi University (Taiwan). His recent awards include the Youden Address (ASQ, 2010), the Shewell Award (ASQ, 2010), the Don Owen Award (ASA, 2011), the Loutit Address (SSC, 2011), the Hunter Award (ASQ, 2014), the Shewhart Medal (ASQ, 2015), the SPES Award (ASA-SPES, 2016), the Chow Yuan-Shin Award (2019), and the Deming Lecturer Award (JSM, 2020).

Hill Center, Room 552 and Online
September 21, 2022 11:45 AM Hyunseung Kang, University of Wisconsin, Madison

Title: A Robust, Differentially Private Randomized Experiment for Evaluating Online Educational Programs with Sensitive Student Data

Abstract: Randomized control trials (RCTs) have been the gold standard to evaluate the effectiveness of a program, policy, or treatment on an outcome of interest. However, many RCTs assume that study participants are willing to share their (potentially sensitive) data, specifically their response to treatment. This assumption, while trivial at first, is becoming difficult to satisfy in the modern era, especially in online settings where there are more regulations to protect individuals’ data. The paper presents a new, simple experimental design that is differentially private, one of the strongest notions of data privacy. Also, using works on noncompliance in experimental psychology, we show that our design is robust against “adversarial” participants who may distrust investigators with their personal data and provide contaminated responses to intentionally bias the results of the experiment. Under our new design, we propose unbiased and asymptotically Normal estimators for the average treatment effect. We also present a doubly robust, covariate-adjusted estimator that uses pre-treatment covariates (if available) to improve efficiency. We conclude by using the proposed experimental design to evaluate the effectiveness of online statistics courses at the University of Wisconsin-Madison during the Spring 2021 semester, where many classes were online due to COVID-19.
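To give a feel for how a differentially private design can still yield an unbiased treatment-effect estimate, here is a minimal sketch based on classical randomized response. This is an assumption made for illustration, not the design presented in the talk; all function names and the simulated response rates are hypothetical.

```python
import math
import random

def randomized_response(y, p, rng):
    """Report the true binary outcome y with probability p, otherwise its flip."""
    return y if rng.random() < p else 1 - y

def debiased_mean(reports, p):
    """Unbiased estimate of the true mean of y from the privatized reports.

    E[report] = (2p - 1) * mean(y) + (1 - p), so we invert that affine map.
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

def privacy_epsilon(p):
    """Local differential privacy level of the mechanism, for 1/2 < p < 1."""
    return math.log(p / (1 - p))

def simulate_ate(n, p, mean_treated, mean_control, seed=0):
    """Simulate two arms with privatized binary responses and estimate the ATE."""
    rng = random.Random(seed)
    treated = [randomized_response(int(rng.random() < mean_treated), p, rng)
               for _ in range(n)]
    control = [randomized_response(int(rng.random() < mean_control), p, rng)
               for _ in range(n)]
    return debiased_mean(treated, p) - debiased_mean(control, p)

# Hypothetical true response rates 0.7 (treated) and 0.5 (control).
ate_hat = simulate_ate(n=20000, p=0.75, mean_treated=0.7, mean_control=0.5)
print(round(privacy_epsilon(0.75), 3), round(ate_hat, 3))
```

The price of privacy shows up as variance: the closer p is to 1/2, the smaller epsilon becomes and the more the factor 1/(2p - 1) inflates the noise in the estimate.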

Bio: Hyunseung (pronounced Hun-Sung) is an Assistant Professor in the Department of Statistics at the University of Wisconsin-Madison. His research is focused on developing theory and methods to analyze causal relationships by using instrumental variables, econometrics, semi/nonparametric methods, network analysis, and machine learning. He is interested in applications to genetics, epidemiology, infectious diseases, health policy, education, and applied microeconomics.

Hill Center, Room 552 and Online
September 28, 2022 12:00 PM Aaditya Ramdas, Carnegie Mellon University

Title: Estimating Means of Bounded Random Variables by Betting

Abstract: This paper derives confidence intervals (CI) and time-uniform confidence sequences (CS) for the classical problem of estimating an unknown mean from bounded observations. We present a general approach for deriving concentration bounds that can be seen as a generalization and improvement of the celebrated Chernoff method. At its heart, it is based on a class of composite nonnegative martingales, with strong connections to testing by betting and the method of mixtures. We show how to extend these ideas to sampling without replacement, another heavily studied problem. In all cases, our bounds are adaptive to the unknown variance, and empirically vastly outperform existing approaches based on Hoeffding or empirical Bernstein inequalities and their recent supermartingale generalizations. In short, we establish a new state-of-the-art for four fundamental problems: CSs and CIs for bounded means, when sampling with and without replacement.
This is joint work with my student Ian Waudby-Smith, and has been accepted as a JRSSB discussion paper in 2023.
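The betting idea can be sketched in a few lines. The version below is a simplification under stated assumptions: observations lie in [0, 1], the bet size `lam` is fixed rather than chosen adaptively, and the two one-sided wealth processes are averaged. The function names are ours, and the paper's actual betting strategies are more refined.

```python
import random

def hedged_wealth(xs, m, lam=0.5):
    """Wealth of a gambler betting against the hypothesis "the mean equals m".

    Half the capital bets that the mean is above m, half that it is below.
    With observations in [0, 1] and 0 < lam < 1, every factor stays positive,
    and under the true mean the wealth is a nonnegative martingale started at 1.
    """
    up, down = 1.0, 1.0
    for x in xs:
        up *= 1.0 + lam * (x - m)
        down *= 1.0 - lam * (x - m)
    return 0.5 * (up + down)

def betting_interval(xs, alpha=0.05, lam=0.5, grid_size=1001):
    """Keep every candidate mean whose wealth stays below 1/alpha.

    By Ville's inequality the true mean is wrongly rejected with probability
    at most alpha. The wealth is convex in m, so the kept set is an interval.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    kept = [m for m in grid if hedged_wealth(xs, m, lam) < 1.0 / alpha]
    return min(kept), max(kept)

rng = random.Random(1)
xs = [rng.random() for _ in range(300)]  # simulated bounded data
lo, hi = betting_interval(xs)
print(lo, hi)
```

Candidate means far from the data make one of the two bets profitable, so their wealth grows past 1/alpha and they drop out of the interval.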

Bio: Aaditya Ramdas (PhD, 2015) is an assistant professor at Carnegie Mellon University, in the Departments of Statistics (75%) and Machine Learning (25%), and an Amazon Visiting Academic (20%). He was a postdoc at UC Berkeley (2015–2018) and obtained his PhD at CMU (2010–2015), receiving the Umesh K. Gavaskar Memorial Thesis Award. His undergraduate degree was in Computer Science from IIT Bombay (2005-09), and he did high-frequency algorithmic trading at a hedge fund (Tower Research) from 2009-10. Aaditya was an inaugural inductee of the COPSS Leadership Academy, and a recipient of the 2021 Bernoulli New Researcher Award.
His work is supported by an NSF CAREER Award, an Adobe Faculty Research Award (2019), a Google Research Scholar award (2022) for structured uncertainty quantification, amongst others. Aaditya’s main theoretical and methodological research interests include selective and simultaneous inference (interactive, structured, online, post-hoc control of false decision rates, etc), game-theoretic statistics and safe anytime-valid inference (confidence sequences, e-values/e-processes, test martingales, etc), and distribution-free black-box predictive inference (conformal prediction, calibration, etc). His areas of applied interest include privacy, neuroscience, genetics and auditing (elections, real-estate, financial), and his group’s work has received multiple best paper awards.

Hill Center, Room 552 and Online
October 4, 2022 11:00 AM Michael Posa, MEAM, UPenn

Title: Hybrid Robotics and Implicit Learning

Abstract: Machine learning has shown incredible promise in robotics, with some notable recent demonstrations in manipulation and sim2real transfer. These results, however, require either an accurate a priori model (for simulation) or a large amount of data. In contrast, my lab is focused on enabling robots to enter novel environments and then, with minimal time to gather information, accomplish complex tasks. In this talk, I will argue that the hybrid or contact-driven nature of real-world robotics, where a robot must safely and quickly interact with objects, drives this high data requirement. In particular, the inductive biases inherent in standard learning methods fundamentally clash with the non-differentiable physics of contact-rich robotics. Focusing on model learning, or system identification, I will show both empirical and theoretical results which demonstrate that contact stiffness leads to poor training and generalization, leading to some healthy skepticism of simulation experiments trained on artificially soft environments. Fortunately, implicit learning formulations, which embed convex optimization problems, can dramatically reshape the optimization landscape for these stiff problems. By carefully reasoning about the roles of stiffness and discontinuity, and integrating non-smooth structures, we demonstrate dramatically improved learning performance. Within this family of approaches, ContactNets accurately identifies the geometry and dynamics of a six-sided cube bouncing, sliding, and rolling across a surface from only a handful of sample trajectories. Similarly, a piecewise-affine hybrid system with thousands of modes can be identified purely from state transitions. I’ll also discuss how these learned models can be deployed for control via recent results in real-time, multi-contact MPC.

Bio: Michael Posa is an Assistant Professor in Mechanical Engineering and Applied Mechanics at the University of Pennsylvania. He leads the Dynamic Autonomy and Intelligent Robotics (DAIR) lab, a group within the Penn GRASP laboratory. His group focuses on developing computationally tractable algorithms to enable robots to operate both dynamically and safely as they quickly maneuver through and interact with their environments, with applications including legged locomotion and manipulation. Michael received his Ph.D. in Electrical Engineering and Computer Science from MIT in 2017, where, among his other research, he spent time on the MIT DARPA Robotics Challenge team. He received his B.S. in Mechanical Engineering from Stanford University in 2007. Before his doctoral studies, he worked as an engineer at Vecna Robotics in Cambridge, Massachusetts, designing control algorithms for the BEAR humanoid robot. He has received the Best Paper award at Hybrid Systems: Computation and Control (HSCC) and been a finalist for awards at ICRA and IEEE Humanoids. He has also received a Google Faculty Research Award in 2019 and the Young Faculty Researcher Award from the Toyota Research Institute in 2021.

For more information, contact: Kostas Bekris, Associate Professor, Computer Science, Rutgers University

1 Spring Street, New Brunswick, NJ and Online
October 4, 2022 11:00 AM Takeaki Kariya, Nagoya University of Commerce and Business School, Japan

Title: A Modelling Framework for Regression with Collinearity

Abstract: This study addresses a fundamental, yet overlooked, gap between the standard theory and empirical practices in the OLS regression y=Xβ + u. To fill it, introducing a new concept “accommodation”, this paper formulates a novel conceptual framework for developing our own model selection process in empirical modelling for given (y,X) with collinearity in X. With no use of y, the new process enables us to find a class of effective and collinearity-resilient models. In fact, it directly controls not only the sampling variance of each OLSE, which includes Variance Inflation Factor, but also the individual power property of each t-test on regression coefficient, which includes what we call “Power Deflation Factor” as a collinearity factor. This framework will give an ordering on the set of all the sub-models in terms of efficiency and collinearity. And to materialize our model selection process, two computational algorithms are proposed.
Consequently, it will provide an advance model-screening process and serve as an empirical platform for pre-selecting a class of effective models that well accommodate y with both collinearity and inefficiency controlled in advance. In such a class of models, we can freely use statistical measures and procedures that rely on y, such as OLS estimation, t-value, coefficient of determination, stepwise model selection, etc. It is shown that in terms of predictive sampling variance of the k-th OLSE, the lower bound is attained if and only if the mean of the explanatory vector x_k is 0 and x_j′x_k = 0 (j ≠ k). Also without using y, two algorithms for finding models with collinearity controlled are proposed, so that frequently used model selection procedures can be effectively used. However, in Kariya, Kurata and Hayashi (2022, JFSSA conference) since t-statistics are shown to be correlated, the stepwise model selection procedures are ineffective as they stand.
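The Variance Inflation Factor mentioned in the abstract depends only on X, not on y, which a short sketch makes concrete. It uses the standard identity that the VIF of the k-th regressor is the k-th diagonal entry of the inverse correlation matrix of the regressors. The helper names are hypothetical, and the "Power Deflation Factor" from the talk is not reproduced here.

```python
from math import sqrt

def correlation_matrix(columns):
    """Correlation matrix of the regressors (each given as a list of values)."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in scaled] for ci in scaled]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fine for small matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vifs(columns):
    """VIF of each regressor, computed from X alone (y is never used)."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[k][k] for k in range(len(columns))]

# Nearly collinear regressors inflate the variance; orthogonal ones do not.
print(vifs([[1, 2, 3, 4], [1, 2, 3, 5]]))
print(vifs([[1, -1, 1, -1], [1, 1, -1, -1]]))
```

The second call uses mean-zero, mutually orthogonal columns, the configuration the abstract identifies as attaining the lower bound, and every VIF equals 1 there.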

Bio: Professor of Nagoya University of Commerce and Business (2020-). Ph.D. in Statistics (U of Minnesota 75). In the past, Professor of Hitotsubashi U, Kyoto U, Meiji U, etc.; Visiting Professor of Rutgers U, LSE, University of Chicago, etc. Published books: Robustness of Tests (with B.K. Sinha, Academic Press 89), Generalized Least Squares (with H. Kurata, Wiley 04), Asset Pricing (with R. Liu, Springer 03), etc. Published articles: The general MANOVA problem (AS 78), Transformations preserving normality and Wishart-ness (JMA 86 with Nabeya), A nonlinear version of the Gauss-Markov theorem (JASA 85), Equivariant estimation with an ancillary statistic (AS 89), etc. Japan Statistical Society Award (99). President of Japanese Association of Financial Econometrics and Engineering (93-98).

October 5, 2022 11:50 AM Edsel Pena, University of South Carolina

Title: Searching for Truth through Data

Abstract: This talk concerns the role of statistical thinking in the Search for Truth using data. This will bring us to a discussion of P-values, a much-used tool in scientific research, but at the same time a controversial concept which has elicited many, sometimes heated, debates and discussions. In March 2016 the American Statistical Association (ASA) was compelled to release an official statement regarding P-values; a psychology journal has even gone to the extreme of banning the use of P-values in its articles; and in 2018 a special issue of The American Statistician was fully devoted to this issue. A main concern in the use of P-values is the introduction of a somewhat artificial threshold, usually the value of 0.05, when used in decision-making, with implications on reproducibility and replicability of reported scientific results. Some new perspectives on the use of P-values and in the search for truth through data will be discussed. In particular, this will touch on the representation of knowledge and its updating based on observations. Related to the issue of P-values, the following question arises: “When given the P-value, what does it provide in the context of the updated knowledge of the phenomenon under consideration, and what additional information should accompany it?” To be addressed also is the question of whether it is time to move away from hard thresholds such as 0.05, hence surmise whether we are on the verge of, to quote Wasserstein, Schirm and Lazar (2019), a “World Beyond P < 0.05.”

Bio: Edsel A. Pena is Professor of Statistics at the University of South Carolina (UofSC) in Columbia, South Carolina. He is a Fellow of the American Statistical Association (ASA) and an Elected Member of the International Statistical Institute (ISI). He is currently serving as Executive Secretary of the Institute of Mathematical Statistics (IMS). Since August 2020 he has been serving as a Rotator Program Director at the National Science Foundation in the Statistics Program of the Division of Mathematical Sciences. He obtained his PhD degree from Florida State University in 1986. Prior to joining UofSC in 2000, he was a Professor at Bowling Green State University in Ohio. His research interests are in mathematical statistics, stochastic processes, survival analysis, reliability theory, multiple decision-making, nonparametric statistics, and foundational issues of statistical inference.

Hill Center, Room 552 and Online
October 12, 2022 11:50 AM Judy Wang, George Washington University

Title: Copula-based Approaches for Analyzing non-Gaussian Spatial Data

Abstract: Many existing methods for analyzing spatial data rely on the Gaussian assumption, which is violated in many applications such as wind speed, precipitation and COVID mortality data. In this talk, I will discuss several recent developments of copula-based approaches for analyzing non-Gaussian spatial data. First, I will introduce a copula-based spatio-temporal model for analyzing spatio-temporal data and a semiparametric estimator. Second, I will present a copula-based multiple indicator kriging model for the analysis of non-Gaussian spatial data by thresholding the spatial observations at a given set of quantile values. The proposed algorithms are computationally simple, since they model the marginal distribution and the spatio-temporal dependence separately. Instead of assuming a parametric distribution, the approaches model the marginal distributions nonparametrically and thus offer more flexibility. The methods will also provide convenient ways to construct both point and interval predictions based on the estimated conditional quantiles. I will present some numerical results including the analyses of wind speed and precipitation data. If time allows, I will also discuss recent work on a copula-based approach for analyzing count spatial data.

Bio: Judy Huixia Wang received her PhD in Statistics from the University of Illinois in 2006. She was a faculty member in the Department of Statistics at North Carolina State University from 2006 to 2014. She is currently Professor and Chair in the Department of Statistics at the George Washington University. She received a CAREER award from the National Science Foundation and the Tweedie New Researcher Award from the Institute of Mathematical Statistics in 2012. In 2018, she was elected as a Fellow of the American Statistical Association and of the Institute of Mathematical Statistics. She was one of the 2022 IMS Medallion Lecturers. She served as a Program Director in the Division of Mathematical Sciences (DMS) of the National Science Foundation from 2018 to 2022, managing the statistics program in DMS as well as a number of interdisciplinary programs that are cross-directorate and cross-agency. Her research interests include quantile regression, semiparametric and nonparametric regression, high dimensional inference, extreme value analysis, spatial analysis, etc.

Hill Center, Room 552 and Online
October 19, 2022 11:50 AM Vivak Patel, University of Wisconsin, Madison

Title: Counter Examples for Stochastic Gradient Descent

Abstract: Stochastic Gradient Descent (SGD) is a widely deployed algorithm for solving estimation problems that arise in statistics and learning. Accordingly, SGD has been analyzed from many perspectives to understand its behavior and to ensure its reliability, especially from a global convergence/consistency perspective. Unfortunately, we will show through simple examples that existing global convergence analyses make unrealistic deterministic assumptions, which result in incorrect conclusions or the utilization of inappropriate techniques. To be specific, counter to existing results, we will construct a deterministic example under realistic assumptions for which Gradient Descent (GD) will diverge catastrophically. Then, counter to a popular technique, we will provide a deterministic example for which approximating GD with continuous GD leads to incorrect conclusions about GD. Turning to stochastic assumptions, we show that existing stochastic assumptions are unrealistic for simple machine learning and statistics problems. Thus, we highlight that GD and SGD do not have an appropriate theory for learning problems. Finally, we provide a result for the global convergence of GD and SGD that addresses this gap.
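As a simple illustration of how fixed-step gradient descent can fail when the gradient is not globally Lipschitz, consider f(x) = x^4 / 4. This is a standard textbook example chosen here for illustration; it is not the counterexample constructed in the talk, and the function name below is ours.

```python
def gradient_descent(grad, x0, step, iters):
    """Plain gradient descent with a fixed step size; returns all iterates."""
    x = x0
    path = [x]
    for _ in range(iters):
        x = x - step * grad(x)
        path.append(x)
    return path

def grad(x):
    """Gradient of f(x) = x**4 / 4, which is not globally Lipschitz."""
    return x ** 3

# Same step size, different starting points.
calm = gradient_descent(grad, x0=1.0, step=0.1, iters=50)
wild = gradient_descent(grad, x0=5.0, step=0.1, iters=5)

print(calm[-1])  # shrinks toward the minimizer at 0
print(wild)      # magnitudes grow at every step
```

The update is x - step * x^3, so whenever step * x^2 exceeds 2 the next iterate is larger in magnitude than the current one, and the growth then compounds. A step size that is safe near the minimizer can therefore be unsafe far from it.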

Bio: Vivak Patel is an assistant professor of statistics at the University of Wisconsin-Madison. Prior to joining the faculty at UW-Madison, Vivak completed his doctorate in statistics at the University of Chicago, his master’s in mathematics at the University of Cambridge, and his Bachelor of Science in Applied Physics and Biomathematics at Rutgers University in New Brunswick. Vivak’s research is at the intersection of uncertainty and computing. On the one hand, Vivak and his group analyze and improve computational tools and algorithms that are applied to problems with inherent uncertainty, such as learning and statistical estimation. On the other hand, Vivak and his group use statistical concepts and uncertainty to improve computational tools and algorithms for challenging problems. Vivak and his group apply their work to problems arising in statistics, machine learning, data assimilation, differential equations, and control.

Hill Center, Room 552 and Online
October 21, 2022 10:00 AM Anirudha Majumdar, Princeton University

Title: Learning-Based Robot Control from Vision: Formal Guarantees and Fundamental Limits

Abstract: The ability of machine learning techniques to process rich sensory inputs such as vision makes them highly appealing for use in robotic systems (e.g., micro aerial vehicles and robotic manipulators). However, the increasing adoption of learning-based components in the robotics perception and control pipeline poses an important challenge: how can we guarantee the safety and performance of such systems? As an example, consider a micro aerial vehicle that learns to navigate using a thousand different obstacle environments or a robotic manipulator that learns to grasp using a million objects in a dataset. How likely are these systems to remain safe and perform well on a novel (i.e., previously unseen) environment or object? How can we learn control policies for robotic systems that provably generalize to environments that our robot has not previously encountered? Unfortunately, existing approaches either do not provide such guarantees or do so only under very restrictive assumptions.

In this talk, I will present our group’s work on developing a framework for learning control policies for robotic systems with formal guarantees on generalization to novel environments. The key technical insight is to leverage and extend powerful techniques from generalization theory in theoretical machine learning. We apply our techniques on problems including vision-based navigation and manipulation in order to demonstrate the ability to provide strong generalization guarantees on robotic systems with complicated (e.g., nonlinear/hybrid) dynamics, rich sensory inputs (e.g., RGB-D), and neural network-based control policies. I will also present recent work aimed at understanding fundamental limits on safety and performance imposed by a robot’s (imperfect) sensors.

Bio: Anirudha Majumdar is an Assistant Professor at Princeton University in the Mechanical and Aerospace Engineering (MAE) department, and Associated Faculty in the Computer Science department. He also holds a part-time position as a Visiting Research Scientist at the Google AI Lab in Princeton. He received a Ph.D. in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 2016, and a B.S.E. in Mechanical Engineering and Mathematics from the University of Pennsylvania in 2011. Subsequently, he was a postdoctoral scholar at Stanford University from 2016 to 2017 at the Autonomous Systems Lab in the Aeronautics and Astronautics department. He is a recipient of the ONR YIP award, the NSF CAREER award, the Google Faculty Research Award (twice), the Amazon Research Award (twice), the Young Faculty Researcher Award from the Toyota Research Institute, the Best Conference Paper Award at the International Conference on Robotics and Automation (ICRA), the Paper of the Year Award from the International Journal of Robotics Research (IJRR), the Alfred Rheinstein Faculty Award (Princeton), and the Excellence in Teaching Award from Princeton’s School of Engineering and Applied Science.

1 Spring Street (Room 403), New Brunswick, NJ and Online
October 26, 2022 11:50 AM Qian Qin, University of Minnesota

Title: Spectral Telescope: Convergence Rate Bounds for Random-Scan Gibbs Samplers Based on a Hierarchical Structure

Abstract: In this talk, we describe a simple but intriguing hierarchical structure found in random-scan Gibbs samplers, or Glauber dynamics. This structure connects Gibbs samplers targeting higher dimensional distributions to Gibbs samplers targeting lower dimensional ones and leads to a quasi-telescoping property concerning their spectral gaps. Based on this property, we derive three new bounds on the spectral gaps and convergence rates of Gibbs samplers on general domains. The three bounds relate a chain’s spectral gap to, respectively, the correlation structure of the target distribution, a class of random walk chains, and a collection of influence matrices. Notably, one of our results generalizes the technique of spectral independence, which has received considerable attention for its success on finite domains, to general state spaces.

Bio: Qian Qin is an assistant professor at the School of Statistics, University of Minnesota. He obtained his PhD in statistics at the University of Florida under the supervision of Jim Hobert. His research interest lies in Markov chain theory, especially analysis of Markov chain Monte Carlo.

Hill Center, Room 552 and Online
November 2, 2022 11:50 AM Efstathia Bura, Vienna University of Technology

Title: Sufficient Reductions in Regression with Mixed Predictors

Abstract: Most data sets comprise measurements on both continuous and categorical variables. Yet, modeling high-dimensional mixed predictors has received limited attention in the regression and classification statistical literature. We study the general regression problem of inferring on a variable of interest based on high-dimensional mixed continuous and binary predictors. The aim is to find a lower-dimensional function of the mixed predictor vector that contains all the modeling information in the mixed predictors for the response, which can be either continuous or categorical. The approach we propose identifies sufficient reductions by reversing the regression and modeling the mixed predictors conditional on the response. We derive the maximum likelihood estimator of the sufficient reductions, asymptotic tests for dimension, and a regularized estimator, which simultaneously achieves variable (feature) selection and dimension reduction (feature extraction). We study the performance of the proposed method and compare it with other approaches through simulations and real data examples.

Bio: I am heading the Applied Statistics Research Unit (ASTAT) in the Institute of Statistics and Mathematical Methods in Economics with the Faculty of Mathematics and Geoinformation at the Vienna University of Technology (TU Wien). My work focuses on dimension reduction in regression and classification, high-dimensional statistics, multivariate analysis, and applications in biostatistics, econometrics and legal statistics.

Hill Center, Room 552 and Online
November 9, 2022 11:50 AM Ting Ye, University of Washington

Title: Robust Mendelian Randomization in the Presence of Many Weak Instruments and Widespread Horizontal Pleiotropy

Abstract: Mendelian randomization (MR) has become a popular approach to studying the effect of a modifiable exposure on an outcome by using genetic variants as instrumental variables (IVs). Two distinct challenges persist in MR: (i) each genetic variant explains a relatively small proportion of variance in the exposure and there are many such variants, a setting known as many weak IVs; and (ii) many genetic variants may have direct effects on the outcome that do not operate through the exposure, a phenomenon known in genetics as widespread horizontal pleiotropy. To address these two challenges simultaneously, we propose a novel estimator, the debiased inverse-variance weighted (dIVW) estimator for summary-data MR, and we establish its statistical properties. An extension to multivariable MR will also be discussed.

Bio: Ting Ye is the Genentech Endowed Assistant Professor in Biostatistics at the University of Washington. She received her Ph.D. in Statistics in 2019 from the University of Wisconsin-Madison and spent two years as a postdoctoral fellow in Statistics at the Wharton School, University of Pennsylvania. Her current research focuses on covariate adjustment in randomized controlled trials, Mendelian randomization, and other natural experiment methods for causal inference.

Hill Center, Room 552 and Online
November 16, 2022 11:50 AM Xianyang Zhang, Texas A&M University

Title: Change-point Detection: Computation and Statistical Inference

Abstract: Change-point analysis is concerned with detecting and locating structural breaks in the underlying model of a data sequence. It finds an abundance of applications in a wide variety of fields, for example, bioinformatics, finance, and engineering. This talk provides an overview of two different change-point detection frameworks in the literature. The first approach is based on minimizing a cost function over possible numbers and locations of change points. Such an approach requires finding the cost value repeatedly over different segments of the data set, which can be time-consuming. To tackle this issue, we introduce a new method based on sequential gradient descent to find the cost value accurately and efficiently. The core idea is to update the cost value using the information from previous steps without re-optimizing the objective function. Numerical studies show that the new approach can be orders of magnitude faster than the Pruned Exact Linear Time method without sacrificing estimation accuracy. The second approach combines two-sample hypothesis testing with segmentation techniques. A particular challenge within this framework is dealing with the high dimensionality of the data and the nonparametric nature of the structural breaks. We develop a new methodology to detect structural breaks in the distributions of a sequence of high-dimensional observations. We show that the new approach is more efficient than the existing methods.

Bio: Xianyang Zhang is an Associate Professor in the statistics department at Texas A&M University. He obtained his Ph.D. in statistics from the University of Illinois at Urbana-Champaign in 2013. His research interests include high-dimensional/large-scale statistical inference, kernel methods, genomics data analysis, functional data analysis, time series, and econometrics.

Hill Center, Room 552 and Online
November 16, 2022 2:00 PM Francesca Tombari, KTH Royal Institute of Technology

Title: Realisations of Posets and Tameness

Abstract: Persistent homology is commonly encoded by vector space-valued functors indexed by posets. These functors are called tame, or persistence modules, and capture the life-span of homological features in a dataset. Every poset can be used to index a persistence module; however, some posets are particularly well suited.
We introduce a new construction called realisation, which transforms posets into posets. Intuitively, it associates a continuous structure to a locally discrete poset by filling in empty spaces. Realisations share several properties with upper semi-lattices. They behave similarly with respect to certain notions of dimension for posets that we introduce. Moreover, as indexing posets of persistence modules, they allow for good discretisations and effective computation of homological invariants via Koszul complexes.

Hill Center, Room 705
November 30, 2022 11:50 AM Nianqiao Ju, Purdue University

Title: Data Augmentation MCMC for Bayesian Inference from Privatized Data

Abstract: Differentially private mechanisms protect privacy by introducing additional randomness into the data. When the data analyst has access only to the privatized data, it is a challenge to perform valid statistical inference on parameters underlying the confidential data. Specifically, the likelihood function of the privatized data requires integrating over the large space of confidential databases and is typically intractable. For Bayesian analysis, this results in a posterior distribution that is doubly intractable, rendering traditional MCMC techniques inapplicable. We propose an MCMC framework to perform Bayesian inference from the privatized data, which is applicable to a wide range of statistical models and privacy mechanisms. Our MCMC algorithm augments the model parameters with the unobserved confidential data, and alternately updates each one conditional on the other. For the potentially challenging step of updating the confidential data, we propose a generic approach that exploits the privacy guarantee of the mechanism to ensure efficiency. We give results on computational complexity, acceptance rate, and mixing properties of our MCMC. This talk is based on joint work with Jordan Awan, Robin Gong, and Vinayak Rao (NeurIPS 2022).

Bio: Nianqiao (Phyllis) Ju is an assistant professor of statistics at Purdue University. Her research focuses on Bayesian inference and computational methods, with applications in privacy-aware data analysis and infectious disease modeling. In her free time, she enjoys running, skiing, and hiking.

Hill Center, Room 552 and Online
March 3, 2023 3:00 PM Tom Silver, MIT

Title: Neuro-symbolic Learning for Bilevel Planning

Abstract: Decision-making in robotics domains is complicated by continuous state and action spaces, long horizons, and sparse feedback. One way to address these challenges is to perform bilevel planning with abstractions, where a high-level search for abstract plans is used to guide planning in the original transition space. In this talk, I will give an overview of our recent efforts [1, 2, 3, 4] to design a bilevel planning system with state and action abstractions that are learned from data. I will also make the case for learning abstractions that are compatible with highly optimized PDDL planners, while arguing that PDDL planning should be only one component of a larger integrated planning system.

[1] Learning symbolic operators for task and motion planning. Silver*, Chitnis*, Tenenbaum, Kaelbling, Lozano-Perez. IROS 2021.
[2] Learning neuro-symbolic relational transition models for bilevel planning. Chitnis*, Silver*, Tenenbaum, Lozano-Perez, Kaelbling. IROS 2022.
[3] Predicate invention for bilevel planning. Silver*, Chitnis*, Kumar, McClinton, Lozano-Perez, Kaelbling, Tenenbaum. AAAI 2023.
[4] Learning neuro-symbolic skills for bilevel planning. Silver, Athalye, Tenenbaum, Lozano-Perez, Kaelbling. CoRL 2022.

Bio: Tom Silver is a fifth-year PhD student at MIT EECS advised by Leslie Kaelbling and Josh Tenenbaum. His research is at the intersection of machine learning and planning with applications to robotics, and often uses techniques from task and motion planning, program synthesis, and reinforcement learning. Before graduate school, he was a researcher at Vicarious AI and received his B.A. from Harvard in computer science and mathematics in 2016. His work is supported by an NSF fellowship and an MIT presidential fellowship.

Host: Kostas Bekris

1 Spring Street, Room 403, New Brunswick, NJ and Online
April 19, 2023 11:50 AM Jerry Reiter, Duke University

Title: How Auxiliary Information Can Help Your Missing Data Problem

Abstract: Many surveys (and other types of databases) suffer from unit and item nonresponse. Typical practice accounts for unit nonresponse by inflating respondents’ survey weights, and accounts for item nonresponse using some form of imputation. Most methods implicitly treat both sources of nonresponse as missing at random. Sometimes, however, one knows information about the marginal distributions of some of the variables subject to missingness. In this talk, I discuss how such information can be leveraged to handle nonignorable missing data, including allowing different mechanisms for unit and item nonresponse (e.g., nonignorable unit nonresponse and ignorable item nonresponse).

Bio: Jerry Reiter is the Dean of the Natural Sciences and Professor of Statistical Science at Duke University. His primary areas of research include methods for ensuring data privacy, for handling missing and erroneous values, for combining information across sources, and for analyzing complex data in the social sciences and public policy. He is a Fellow of the American Statistical Association and a Fellow of the Institute of Mathematical Statistics. He is the recipient of several teaching and mentoring awards from Duke University, including the Alumni Distinguished Undergraduate Teaching Award, the Outstanding Postdoctoral Mentor Award, and the Master’s of Interdisciplinary Data Science Distinguished Faculty Award.

Hill Center, Room 552 and Online
June 6, 2023 1:00 PM Dimitris Metaxas, Rutgers University

Title: Scalable and Explainable AI Analytics for Computer Vision and Medical Applications

Abstract: Over the past 30 years, we have been developing a general, scalable, computational learning and AI framework that combines principles of computational learning, neural nets, sparse methods, mixed norms, dictionary learning, and deformable modeling methods. This framework has been used to solve complex, large-scale problems in computer vision and biomedical image analysis. In computer vision, we will present new machine learning methods for human behavior analytics and explainable scene understanding. In medical image analysis, we will present segmentation, registration, tracking, and disease classification methods based on clinical and preclinical data.

Short Bio: Dr. Dimitris Metaxas is a Distinguished Professor in the Computer Science Department at Rutgers University. He is director of the Center for Computational Biomedicine, Imaging and Modeling (CBIM) and the NSF IUCRC CARTA Center. Dr. Metaxas has been conducting research towards the development of formal machine learning, AI, and physics-based modeling methods to advance computer vision, biomedical data analytics, and computer graphics, and has pioneered several related methods. Dr. Metaxas has published over 700 research articles in these areas and has graduated 65 PhD students. He is a Fellow of the MICCAI Society, a Fellow of the American Institute of Medical and Biological Engineers, and a Fellow of IEEE. He has served as General Chair or Program Chair of the IEEE CVPR, ICCV, and MICCAI conferences. He will be a General Chair for IEEE CVPR 2026.

June 26, 2023 11:00 AM Martin Balko, Charles University

Title: Basics of Ramsey Theory: the Work of Erdos and Szekeres

Abstract: Ramsey theory is an important tool in combinatorics which has found numerous applications in various fields. We will review the basic features of Ramsey theory and discuss variants of classical extremal problems about planar points, with a main focus on Erdos-Szekeres-type problems.

June 27, 2023 12:00 PM Robert Dougherty Bliss, Rutgers University

Title: Proofs by Example

Abstract: You have been told that mathematical statements require “rigor” and “proof.” This is an overstated belief. Broad classes of combinatorial, algebraic, and even analytic statements can be established purely based on empirical evidence. These techniques are not as well-known as they should be. We will share in the joy of proof by example with a series of problems about sums, integrals, and recurrences.

July 11, 2023 1:00 PM Kristen Hendricks, Alex Kontorovich, Peter March, Shadi Tahvildar-Zadeh, Simon Thomas, Michael Beals, and Kim Weston, Rutgers University

Title: What does an [X Kind of Mathematician] do?

Abstract: It’s difficult to tell from the outside what the names of types of math research actually mean. In this session, six mathematics faculty will each give a five- to ten-minute discussion of what researchers in their area think about. Expect presentations from Kristen Hendricks, Alex Kontorovich, Peter March, Shadi Tahvildar-Zadeh, Simon Thomas, Michael Beals, and Kim Weston.

July 18, 2023 10:00 AM Peter Winkler, Dartmouth College

Title: Probability and Intuition

Abstract: Almost everything we do is a gamble: crossing the street, driving, flying, shopping, dating, ordering in a restaurant, enrolling in a poetry course, taking a medication… A thousand times a day we make decisions whose outcomes are uncertain (and so do our leaders, except that they gamble with EVERYONE’s lives).
For practically none of these decisions is mathematics used, nor is it usually appropriate. We rely on feelings and experience, and most of the time they do pretty well for us. But it can be useful to know when intuition can lead us astray.
We’ll examine holes in intuition through the medium of puzzles that are designed just for that purpose. In most cases I think you will find that your intuition is pretty good after all; you just need to be alert to some situations where things have to be looked at a different way.