Semi-Latin Rectangles: What they are, and some Constructions for good Designs; Improving the performance of the Dirichlet process mixture model (DPMM) and future challenges; Parameter Inference for State-Space models using Sequential Monte Carlo methods.
Speakers: Wei Jing, Fanny Empacher and Nseobong Peter Uto (CREEM)
Abstract
Nseobong Peter Uto
Semi-Latin rectangles are row-column designs with nice combinatorial properties. They generalize Latin squares and semi-Latin squares, and have a wide range of applications in experimental settings ranging from agriculture to industry. We introduce some concepts associated with these designs, give some applications, and present some constructions for good designs of this class.
Fanny Empacher
State-space models are a useful class of models for ecologists, as they allow us to explicitly model errors in our observations. However, they are often difficult to fit, as evaluating the likelihood is usually intractable and traditional algorithms like MCMC struggle to make useful proposals. Sequential Monte Carlo methods provide an alternative for these models. I will give an overview of current methods and discuss their advantages and drawbacks, using a case study of the UK Grey Seal population as an example.
Wei Jing
We consider the Dirichlet process mixture model (DPMM) in the context of clustering for continuous data, when the conditional likelihood is set to be the multivariate normal distribution. Our simulation studies show that the DPMM may struggle to uncover the true clusters when the data contain even just a handful of variables, even when the normality assumption is correct. In the talk, we first introduce the basic DPMM with Gaussian kernels. We then present the problem we discovered and give potential reasons why it can be difficult for the DPMM to identify the true clusters. Specifically, this may be because of the inflexibility of the inverse Wishart distribution, which is currently used as the prior distribution for the within-cluster covariance matrix, and because of the difference between the overall covariance matrix for the variables (calculated by pooling the data from all the clusters) and the within-cluster covariance matrices. To investigate the effect of the prior specification, we implemented different prior distributions for the within-cluster covariance matrix and compared their performance on datasets of different sizes and structures. We also propose a way of initializing the MCMC sampler that considerably improves the clustering results. Finally, we discuss the limitations of our current proposals and future work.