Semi-Latin Rectangles: What they are, and some Constructions for good Designs; Improving the performance of the Dirichlet process mixture model (DPMM) and future challenges; Parameter Inference for State-Space models using Sequential Monte Carlo methods.

Mary Woodcock Kroble
Saturday 10 November 2018
Date: 15 May 2019
Time: 2:00 pm - 3:00 pm

Speakers: Wei Jing, Fanny Empacher and Nseobong Peter Uto (CREEM)

Abstract

Nseobong Peter Uto

Semi-Latin rectangles are row-column designs with nice combinatorial properties. They generalize Latin squares and semi-Latin squares, and have a wide range of applications in experimental settings ranging from agriculture to industry. We introduce some concepts associated with these designs, give some applications, and present some constructions for good designs of this class.

Fanny Empacher

State-space models are a useful class of models for ecologists, as they allow us to model errors in our observations explicitly. However, they are often difficult to fit: evaluating the likelihood is usually intractable, and traditional algorithms like MCMC struggle to make useful proposals. Sequential Monte Carlo methods provide an alternative for these models. I will give an overview of current methods and discuss their advantages and drawbacks, using a case study of the UK Grey Seal population as an example.
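As a rough illustration of how a Sequential Monte Carlo method can approximate an otherwise intractable likelihood, here is a minimal sketch of a bootstrap particle filter. It is not the method or model from the talk: the linear-Gaussian toy model, the function names `normal_logpdf` and `bootstrap_filter_loglik`, and all parameter names are assumptions made purely for this example.

```python
import math
import random


def normal_logpdf(y, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    z = (y - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def bootstrap_filter_loglik(ys, a, proc_sd, obs_sd, n_particles=500, seed=0):
    """Estimate log p(y_1..y_T) for a toy linear-Gaussian state-space model.

    Hypothetical model (an assumption, not taken from the talk):
        x_t = a * x_{t-1} + N(0, proc_sd^2)
        y_t = x_t + N(0, obs_sd^2)
    """
    rng = random.Random(seed)
    # Initial particles drawn from an assumed N(0, 1) prior on the state.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    loglik = 0.0
    for y in ys:
        # Propagate each particle through the state process.
        particles = [a * x + rng.gauss(0.0, proc_sd) for x in particles]
        # Weight by the observation density, working on the log scale.
        logw = [normal_logpdf(y, x, obs_sd) for x in particles]
        max_logw = max(logw)
        weights = [math.exp(lw - max_logw) for lw in logw]
        # The mean weight estimates p(y_t | y_1..y_{t-1}).
        loglik += max_logw + math.log(sum(weights) / n_particles)
        # Multinomial resampling in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return loglik
```

The returned value is a noisy estimate of the log-likelihood, which is why such filters are typically embedded in a larger scheme when the aim is parameter inference. Multinomial resampling is used here only because it is the simplest choice.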

Wei Jing

We consider the Dirichlet process mixture model (DPMM) in the context of clustering for continuous data, when the conditional likelihood is set to be the multivariate normal distribution. Our simulation studies show that the DPMM may struggle to uncover the true clusters when the data contain even just a handful of variables, even when the normality assumption is correct. In the talk, we first introduce the basic DPMM with Gaussian kernels. We then show the problem discovered and give possible reasons why it can be difficult for the DPMM to identify the true clusters. Specifically, this may be due to the inflexibility of the Inverse Wishart distribution, which is currently used as the prior distribution for the within-cluster covariance matrix, and to the difference between the overall covariance matrix of the variables (calculated by pooling the data from all the clusters) and the within-cluster covariance matrices. To investigate the effect of the prior specification, we implemented different prior distributions for the within-cluster covariance matrix and compared their performance on datasets of different sizes and structures. We also propose a way to initialize the MCMC sampler that considerably improves the clustering results. Finally, we discuss limitations of our current proposals and future work.
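For readers unfamiliar with the DPMM, the following minimal sketch shows how data can be generated from one, using the Chinese restaurant process representation of the Dirichlet process. It is deliberately simplified and is not the speaker's implementation: the data are univariate, the within-cluster standard deviation is fixed rather than given an Inverse Wishart prior, and the names `crp_assignments`, `sample_dpmm`, `base_sd` and `within_sd` are hypothetical.

```python
import random


def crp_assignments(n, alpha, seed=0):
    """Draw cluster labels for n items from a Chinese restaurant process.

    alpha is the concentration parameter; larger values favour more clusters.
    """
    rng = random.Random(seed)
    labels = []
    sizes = []  # sizes[k] = number of items currently in cluster k
    for _ in range(n):
        # Existing cluster k has weight sizes[k]; a new cluster has weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if k == len(sizes):
            sizes.append(1)  # open a new cluster
        else:
            sizes[k] += 1
        labels.append(k)
    return labels


def sample_dpmm(n, alpha, base_sd=5.0, within_sd=1.0, seed=0):
    """Simulate n univariate points from a toy DPMM with Gaussian kernels.

    Assumed for illustration: cluster means come from N(0, base_sd^2) and
    every cluster shares the same fixed standard deviation within_sd.
    """
    labels = crp_assignments(n, alpha, seed)
    rng = random.Random(seed + 1)
    means = [rng.gauss(0.0, base_sd) for _ in range(max(labels) + 1)]
    data = [rng.gauss(means[k], within_sd) for k in labels]
    return labels, data
```

Calling `sample_dpmm(200, 1.0)` returns the cluster labels and the simulated observations. The number of clusters is not fixed in advance but emerges from the process, which is the property that makes the DPMM attractive for clustering.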