Biostatistics Seminar: Transcription factor target identification via knock-down experiments

Please join us for the Biostatistics Seminar on October 22, 2018, at 12:00 Noon at 60 College Street, Room 216. Please also refer to the attached flyer.

Thank you.

Speaker: Leying Guan

Institution: Stanford University

Time & Place: 12:00 Noon in Room 216, 60 College Street

Title: “Transcription factor target identification via knock-down experiments”


The perturbation of a transcription factor should affect the expression levels of its direct targets. However, not all genes whose expression changes are direct targets. To increase the chance of detecting direct targets, we propose two new methods. The first is a modified two-group model that uses data from a single knock-down experiment; its null group corresponds to genes that are not direct targets but may still have small non-zero effects. The second method constructs a depth-constrained network using data from multiple knock-down experiments. I will describe both methods in detail and present examples on simulated and real data.
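The abstract does not spell out the modified two-group model, so the following is only a rough illustration of the general two-group idea. It assumes Gaussian z-scores, a null density N(0, null_sd^2) with null_sd > 1 so that non-targets can carry small non-zero effects, and a wider alternative N(0, alt_sd^2). It computes the local false discovery rate, the posterior probability that a gene belongs to the null group. The functions `normal_pdf` and `local_fdr` and all parameter values are hypothetical, not the speaker's method.

```python
import math

def normal_pdf(z, sd):
    """Density of N(0, sd^2) evaluated at z."""
    return math.exp(-0.5 * (z / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def local_fdr(z, pi0, null_sd, alt_sd):
    """Posterior probability that a gene with z-score z is a non-target.

    Two-group mixture: f(z) = pi0 * f0(z) + (1 - pi0) * f1(z).
    Assumption: the null f0 is N(0, null_sd^2); choosing null_sd > 1 lets
    non-targets have small non-zero effects. The alternative f1 is N(0, alt_sd^2).
    """
    f0 = normal_pdf(z, null_sd)
    f1 = normal_pdf(z, alt_sd)
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)

if __name__ == "__main__":
    # Same z-score, strict null (sd 1) versus a null allowing small effects (sd 1.5).
    for null_sd in (1.0, 1.5):
        print(null_sd, round(local_fdr(2.0, pi0=0.9, null_sd=null_sd, alt_sd=3.0), 3))
```

With these illustrative settings a gene at z = 2 has a local fdr of roughly 0.82 under the strict null and roughly 0.90 under the widened null: allowing small non-zero effects among non-targets makes moderate changes weaker evidence of a direct target.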

Next week: Applied Math Program: Seminar & Refreshments Wednesday, Oct 10, 2018



Speaker: David van Dijk, Yale

Date: Wednesday, October 10, 2018

Time: 3:45 p.m. Refreshments (AKW, 1st Floor Break Area)
4:00 p.m. Seminar (LOM 214)

Title: Understanding neural networks inside and out: designing constraints to enhance interpretability

Deep neural networks can learn meaningful representations of data, but these representations are hard to interpret. In this talk I will present three ongoing projects in which I use specially designed constraints on the latent representations of neural networks to make them more interpretable. First, I will present SAUCIE (Sparse Autoencoder for Clustering, Imputation, and Embedding), a framework for performing several data analysis tasks on a unified data representation. In SAUCIE we constrain the latent dimensions to be amenable to clustering, batch correction, imputation, and visualization. Next, I will present a novel class of regularizations, termed Graph Spectral Regularizations, that impose graph structure on the otherwise unstructured activations of latent layers. By considering the activations as signals on this graph, we can use graph signal processing, and specifically graph filtering, to constrain the activations. I will show that, among other uses, this allows us to extract topological structure, such as clusters and progressions, from data. Further, I will show that when the imposed graph is a 2D grid with a smoothing penalty, the latent encodings become image-like. Such an imposed grid structure also allows convolutional layers to be added even when the input data are naturally unstructured. Finally, in the third project, I will present DyMoN (Dynamics Modeling Network), a neural network framework capable of learning any stochastic dynamic process. I will show that DyMoN can learn the harmonic and chaotic behavior of single and double pendula, respectively, and can give insight into the dynamics of biological systems.
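One common way to express a smoothing penalty on a graph is the Laplacian quadratic form z^T L z, which equals the sum of squared differences between activations on adjacent nodes. The sketch below assumes that form and a 4-neighbour 2D grid; the talk's Graph Spectral Regularizations may use other graph filters, and the helper names `grid_edges` and `laplacian_penalty` are hypothetical.

```python
def grid_edges(rows, cols):
    """Edges of a 4-neighbour 2D grid graph; node (r, c) has index r * cols + c."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                edges.append((i, i + 1))      # right neighbour
            if r + 1 < rows:
                edges.append((i, i + cols))   # neighbour below
    return edges

def laplacian_penalty(activations, edges):
    """Laplacian quadratic form z^T L z, i.e. sum over edges of (z_i - z_j)^2."""
    return sum((activations[i] - activations[j]) ** 2 for i, j in edges)

if __name__ == "__main__":
    edges = grid_edges(3, 3)  # 9 latent units arranged on a 3x3 grid
    constant = [1.0] * 9
    ramp = [c for r in range(3) for c in range(3)]            # varies smoothly left to right
    checkerboard = [(r + c) % 2 for r in range(3) for c in range(3)]
    for name, z in [("constant", constant), ("ramp", ramp), ("checkerboard", checkerboard)]:
        print(name, laplacian_penalty(z, edges))
```

The penalty is zero for a constant pattern, moderate for a smooth ramp, and largest for a checkerboard, so adding a multiple of it to a training loss pushes neighbouring latent units toward similar values, which is what makes grid-structured encodings look image-like.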

YINS 9/17 Applied DS Seminar, Andrew Barron: Deep Learning

“Accuracy of High-Dimensional Deep Learning Networks”

Speaker: Andrew Barron
Professor of Statistics and Data Science at Yale University

Monday, September 17, 4:15-5:30pm

Location: Yale Institute for Network Science, 17 Hillhouse Ave, Room 328

Talk summary: It has been observed experimentally in recent years that multi-layer artificial neural networks have a surprising ability to generalize, even when trained with far more parameters than observations. Is there a theoretical basis for this? The best available bounds on their metric entropy and associated complexity measures are essentially linear in the number of parameters, which is inadequate to explain this phenomenon. Here we examine the statistical risk (mean squared predictive error) of multi-layer networks with L1 controls on their parameters and with ramp activation functions (also called lower-rectified linear units). In this setting, the risk is shown to be upper-bounded by [(L^3 log d)/n]^{1/2}, where d is the input dimension to each layer, L is the number of layers, and n is the sample size. Thus the input dimension can be much larger than the sample size and the estimator can still be accurate, provided that the target function satisfies such L1 controls and the sample size is at least moderately large compared with L^3 log d. The heart of the analysis is the development of a sampling strategy that demonstrates the accuracy of a sparse covering of deep ramp networks. Lower bounds show that the identified risk is minimax optimal, already within the subclass of functions with L = 2. This is joint work with Jason Klusowski.
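To make the scaling of the stated bound concrete, here is a small helper that evaluates [(L^3 log d)/n]^{1/2}. It drops any constants the full result may carry, so the numbers illustrate how the bound scales rather than actual risk values; the name `risk_rate` and the example sizes are hypothetical.

```python
import math

def risk_rate(num_layers, input_dim, n):
    """Rate sqrt(L^3 * log(d) / n) from the abstract, with constants omitted."""
    return math.sqrt(num_layers ** 3 * math.log(input_dim) / n)

if __name__ == "__main__":
    n = 10_000
    for d in (10**6, 10**9):  # input dimension far larger than the sample size
        print(f"L=3, d={d:.0e}, n={n}: rate ~ {risk_rate(3, d, n):.3f}")
```

Because d enters only through log d, increasing d a thousandfold (from 10^6 to 10^9) raises the rate by a factor of only sqrt(1.5), about 1.22, while doubling the depth L multiplies it by 2^{3/2}, about 2.83.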

YINS 9/12 YINS Seminar, Constantinos Daskalakis: Adversarial Networks

“Improving Generative Adversarial Networks using Game Theory and Statistics”

Speaker: Constantinos Daskalakis
Professor of Computer Science and Electrical Engineering at MIT

Wednesday, September 12, 12:00-1:00pm

Location: Yale Institute for Network Science, 17 Hillhouse Ave, Room 328

Talk summary: Generative Adversarial Networks (GANs) are a recently proposed approach for learning samplers of high-dimensional distributions with intricate structure, such as distributions over natural images, given samples from these distributions. They are obtained by setting up a two-player zero-sum game between two neural networks, which learn statistics of a target distribution by adapting their strategies in the game using gradient descent. Despite their intriguing performance in practice, GANs pose great challenges to both optimization and statistics: their training suffers from oscillations, and they are difficult to scale to high-dimensional settings. We study how game-theoretic and statistical techniques can be brought to bear on these challenges, using game theory to improve GAN training and statistics to scale up the dimensionality of the generated distributions.
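The oscillation problem already shows up in a standard toy example that is not taken from the talk: the bilinear zero-sum game f(x, y) = x * y, where x minimizes, y maximizes, and the unique equilibrium is (0, 0). Simultaneous gradient descent-ascent spirals away from the equilibrium, because each step multiplies the distance to (0, 0) by sqrt(1 + lr^2). For comparison the sketch also runs optimistic gradient descent-ascent, one game-theoretic fix from the literature; whether the talk uses this particular method is an assumption, and all function names are hypothetical.

```python
import math

def grad_f(x, y):
    """Partial derivatives of the toy zero-sum payoff f(x, y) = x * y."""
    return y, x  # (df/dx, df/dy)

def simultaneous_gda(x, y, lr, steps):
    """Simultaneous gradient descent for player x and ascent for player y."""
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - lr * gx, y + lr * gy
    return x, y

def optimistic_gda(x, y, lr, steps):
    """Optimistic GDA: step along 2 * current gradient minus previous gradient."""
    prev_gx, prev_gy = grad_f(x, y)  # no history yet, so the first step is plain GDA
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - lr * (2 * gx - prev_gx), y + lr * (2 * gy - prev_gy)
        prev_gx, prev_gy = gx, gy
    return x, y

if __name__ == "__main__":
    for name, method in [("GDA", simultaneous_gda), ("OGDA", optimistic_gda)]:
        x, y = method(1.0, 1.0, lr=0.1, steps=1000)
        print(f"{name}: distance from equilibrium = {math.hypot(x, y):.4f}")
```

Starting from (1, 1) with lr = 0.1, plain GDA ends up far from the equilibrium after 1000 steps, while the optimistic variant, which uses the previous gradient to anticipate the opponent's move, contracts toward (0, 0).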