Project IV: Multivariate Multiple Regression (Ric Crossman)


In Statistical Modelling II (a prerequisite for this project), you were introduced to linear regression, a fundamental approach to statistical modelling. There you saw how it can be used to explore the relationship between a single continuous response variable and one or more predictor variables, which can be numerical or categorical.

This ability to consider how the behaviour of predictor variables relates to the behaviour of a response variable can be very useful in real-world contexts. In many of those contexts, though, we are not interested in just one response variable: often there are multiple outcomes we might want to explore.

·       In medicine, we might be interested in how a patient's circumstances might affect both their systolic and diastolic blood pressure.

·       In agriculture, we might be interested in how growing conditions affect the speed of growth of a crop, and also how much grain the crop produces.

·       In finance, we might be interested in how the financial climate affects both the profits and expenditures of companies.

Technically, we can already consider this situation: we can create a separate linear regression model for each of the response variables. As each of the above examples hopefully suggests, though, this is a very limited approach, because it does not take into account the fact that the outcome variables may themselves be related. This has multiple implications for how we interpret our models.

This project will consider some or all of these issues, and how they can be addressed via multivariate multiple regression. This will be done through a combination of theoretical exploration and application to relevant data sets.
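To make the contrast with separate regressions concrete, here is a minimal sketch in Python with NumPy. It is an illustration only, not part of the project specification: the function name `fit_multivariate_regression`, the simulated data, and all numerical values are hypothetical. It fits two responses jointly by least squares and then estimates the covariance matrix of the error terms, which is the piece of information that two separate univariate fits never look at.

```python
import numpy as np

def fit_multivariate_regression(X, Y):
    """Least-squares fit of Y = [1, X] B + E, where Y has one column per response.

    Returns the coefficient matrix B (one column per response) and an
    estimate of the error covariance matrix based on the residuals.
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])        # add an intercept column
    B, *_ = np.linalg.lstsq(design, Y, rcond=None)   # solves for all responses at once
    residuals = Y - design @ B
    dof = n - design.shape[1]                        # n minus number of fitted coefficients
    sigma_hat = residuals.T @ residuals / dof        # estimated error covariance matrix
    return B, sigma_hat

# Simulated data (hypothetical values): two predictors, two correlated responses.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
true_B = np.array([[120.0, 80.0],    # intercepts
                   [5.0, 3.0],       # effect of predictor 1 on each response
                   [-2.0, 1.0]])     # effect of predictor 2 on each response
error_cov = np.array([[4.0, 3.0],
                      [3.0, 4.0]])   # errors in the two responses are positively related
E = rng.multivariate_normal(np.zeros(2), error_cov, size=n)
Y = np.column_stack([np.ones(n), X]) @ true_B + E

B_joint, sigma_hat = fit_multivariate_regression(X, Y)
B_first, _ = fit_multivariate_regression(X, Y[:, [0]])  # univariate fit, first response only

print(B_joint)
print(sigma_hat)
```

In this sketch the coefficient estimates from the joint fit coincide with those from the separate univariate fits (`B_joint[:, [0]]` matches `B_first`). What the joint view adds is the off-diagonal entry of `sigma_hat`, which matters as soon as we test hypotheses or make predictions involving both responses together.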

Group Project


The group project will be centred on how to account for a possible relationship between outcome variables when considering how those outcome variables relate to predictor variable values. We will focus on the case of two outcome variables for ease of graphical representation.

By the end of the group project, students will have learned:

·       The use of the multivariate normal distribution in considering bivariate outcomes,

·       How to compare differences in vector means (including hypothesis tests),

·       When a bivariate test is superior to two univariate tests,

·       How to draw and interpret confidence ellipses.
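As a rough illustration of the last three items, the sketch below computes a one-sample Hotelling T² test for a bivariate mean and the axes of the matching confidence ellipse. It assumes bivariate normal data and uses simulated, hypothetical values; the function names are illustrative and not taken from any course material. Because there are exactly two outcomes, the relevant F distribution has 2 numerator degrees of freedom, for which the tail probability has the closed form P(F > x) = (1 + 2x/d)^(-d/2) with d denominator degrees of freedom, so only NumPy is needed. For more than two outcomes a general F distribution routine would be required instead.

```python
import numpy as np

def hotelling_one_sample(data, mu0):
    """One-sample Hotelling T^2 test of H0: mean = mu0, for bivariate data only."""
    n, p = data.shape
    if p != 2:
        raise ValueError("this sketch relies on a closed form valid for p = 2 only")
    xbar = data.mean(axis=0)
    S = np.cov(data, rowvar=False)                 # sample covariance matrix
    diff = xbar - np.asarray(mu0, dtype=float)
    t2 = n * diff @ np.linalg.solve(S, diff)       # T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0)
    f_stat = (n - 2) / (2 * (n - 1)) * t2          # follows F(2, n - 2) under H0
    p_value = (1 + 2 * f_stat / (n - 2)) ** (-(n - 2) / 2)
    return t2, p_value

def mean_confidence_ellipse(data, alpha=0.05):
    """Centre, half-axis lengths and axis directions of the 1 - alpha ellipse for the mean."""
    n, p = data.shape
    if p != 2:
        raise ValueError("this sketch relies on a closed form valid for p = 2 only")
    xbar = data.mean(axis=0)
    S = np.cov(data, rowvar=False)
    f_crit = (n - 2) / 2 * (alpha ** (-2 / (n - 2)) - 1)  # upper-alpha point of F(2, n - 2)
    c2 = 2 * (n - 1) / (n - 2) * f_crit                   # critical value for T^2
    eigvals, eigvecs = np.linalg.eigh(S)                  # columns of eigvecs are the axis directions
    half_lengths = np.sqrt(eigvals * c2 / n)
    return xbar, half_lengths, eigvecs

# Simulated bivariate outcomes (hypothetical values).
rng = np.random.default_rng(1)
data = rng.multivariate_normal([120.0, 80.0], [[100.0, 60.0], [60.0, 64.0]], size=50)

t2, p_value = hotelling_one_sample(data, mu0=[118.0, 83.0])
centre, half_lengths, axes = mean_confidence_ellipse(data, alpha=0.05)
print(t2, p_value)
print(centre, half_lengths)
```

The two functions are consistent by construction: a hypothesised mean lying exactly on the boundary of the 95% ellipse gives a p-value of 0.05. Because the ellipse is tilted whenever the outcomes are correlated, a hypothesised mean can fall outside it while each coordinate still lies inside its own univariate interval, which is one way to see when a bivariate test can outperform two univariate tests.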

Mode of Operation and Evidence of Learning for the Group Project


The project will involve reading up on the underlying theory of considering multiple outcomes, and how to rigorously apply statistical concepts and tests in such a situation. Students will demonstrate their understanding by exploring examples and theoretical applications of this material (potentially including forms of trial design), and clearly communicating this exploration in both written and oral formats.

Individual Project


The individual project will build on the introductory material covered in the group project, and use it to explore multivariate multiple regression models. This will include some or all of the following:

·       How to interpret the estimated covariance matrix for the error terms,

·       How to perform hypothesis tests on model coefficients,

·       How to express uncertainty about future predictions,

·       How to perform principal component analysis.

Some combination of theoretical exploration and application to appropriate datasets will be expected.

Mode of Operation and Evidence of Learning for the Individual Project


The project will involve combining the underlying theory of considering multiple outcomes with the mathematical theory of multivariate multiple regression. Students will demonstrate their understanding by exploring examples and theoretical applications of this material, and clearly communicating this exploration in both written and oral formats.

Prerequisites

Statistical Inference 2, Statistical Modelling 2

Web Resources


Getting Started with Multivariate Multiple Regression, UVA Library.
Applied Multivariate Statistical Analysis: Pearson New International Edition, Richard Johnson and Dean Wichern, 2013 Pearson Education Ltd.

Further Information


If you would like more information about this project, or to discuss its scope and/or prerequisites, or if you just have a related question, please let me know at richard.j.crossman@durham.ac.uk.