{"id":1529,"date":"2018-12-31T12:41:05","date_gmt":"2018-12-31T12:41:05","guid":{"rendered":"https:\/\/marriott-stats.com\/nigels-blog\/?p=1529"},"modified":"2023-02-06T00:50:40","modified_gmt":"2023-02-06T00:50:40","slug":"stats-training-materials-multivariate-analysis","status":"publish","type":"post","link":"https:\/\/marriott-stats.com\/nigels-blog\/stats-training-materials-multivariate-analysis\/","title":{"rendered":"Stats Training Materials &#8211; Multivariate Analysis"},"content":{"rendered":"<p>If I were to remark to you that &#8220;the weather is very nice today&#8221; or &#8220;I didn&#8217;t like that person&#8221;, it is unlikely that I would have made such statements based on a single variable.\u00a0 It is more likely that a combination of variables were evaluated to arrive at these statements.\u00a0 When we analyse datasets with multiple variables, we are undertaking <strong>Multivariate Statistical Analysis<\/strong>.<\/p>\n<p>Multivariate Analysis comes in two flavours :-<\/p>\n<ol>\n<li><strong>Analysis of Correlations between Multiple Variables<\/strong> &#8211; Known as R-Analysis &#8211; Informally known as <strong>reducing the dimensionality<\/strong> of your dataset.<\/li>\n<li><strong>Analysis of Distance between Many Objects<\/strong> &#8211; Known as Q-Analysis &#8211; Informally known as <strong>mapping, clustering or segmentation<\/strong> of your dataset.<\/li>\n<\/ol>\n<p><!--more--><\/p>\n<p>At bottom, all multivariate analysis is exploratory analysis of high-dimensional data.\u00a0 There is very little in the way of formal modelling or hypothesis testing.\u00a0 What we are trying to do is find a suitable way to visualise and make sense of our data that exists in multi-dimensional space.\u00a0 Often, we seek to reduce (or project) our n-dimensional data into a 2-dimensional chart or table that can be displayed on a screen and a variety of methods exist to help us do this.\u00a0 This is why I like to say that multivariate analysis is 
more art than science but if you have strong statistical thinking skills, you will be able to be more scientific in your interpretation of the data.<\/p>\n<p>I have written the following blog posts using multivariate analysis methods and I hope you find these useful resources for learning more.<\/p>\n<hr \/>\n<h4><span style=\"color: #008000;\"><strong>A. Principal Components Analysis (PCA)<\/strong><\/span><\/h4>\n<p>PCA is a mainstay of R-Analysis, the task of reducing your n-dimensional dataset to fewer dimensions (ideally 2).\u00a0 As I said before, our weather is an inherently multivariate dataset and over the course of these 4 blogs, I use UK seasonal weather data from the Met Office to explain how PCA works.\u00a0 Please note, the sections on PCA are preceded by some seasonal trend analysis so please skip past these to get to the PCA part which starts in the section headed &#8220;<em>How many dimensions &#8230;<\/em>&#8221;.<\/p>\n<ol>\n<li><a href=\"https:\/\/marriott-stats.com\/nigels-blog\/uk-weather-trends-4-winter-2018\/\" target=\"_blank\" rel=\"noopener noreferrer\">What is a component?<\/a><\/li>\n<li><a href=\"https:\/\/marriott-stats.com\/nigels-blog\/uk-weather-trends-5-spring-2018\/\" target=\"_blank\" rel=\"noopener noreferrer\">What are the advantages of principal components?<\/a><\/li>\n<li><a href=\"https:\/\/marriott-stats.com\/nigels-blog\/uk-weather-trends-6-summer-2018\/\" target=\"_blank\" rel=\"noopener noreferrer\">Can PCA predict our summer weather?<\/a><\/li>\n<li><a href=\"https:\/\/marriott-stats.com\/nigels-blog\/uk-weather-trends-7-autumn-2018\/\" target=\"_blank\" rel=\"noopener noreferrer\">Sometimes PCA tells us nothing<\/a><\/li>\n<\/ol>\n<p>Note that all of these posts also explain the concept of Standardisation (aka Z-Scores).\u00a0 This is an important concept to know about in multivariate analysis so it is worth reading the start of these posts to find out more.<\/p>\n<p>Finally, there is a related method to PCA known 
as Factor Analysis.\u00a0 I am not a big fan of Factor Analysis but it has a lot of similarities to PCA.\u00a0 Please note that software packages will sometimes use Factor Analysis as a generic heading that includes PCA.<\/p>\n<p>&nbsp;<\/p>\n<hr \/>\n<h4><span style=\"color: #008000;\"><strong>B. Multiple Correspondence Analysis (MCA)<\/strong><\/span><\/h4>\n<p>PCA can only be used if your variables are all numerical.\u00a0 When we have categorical variables and we want to reduce the dimensionality of a categorical dataset, we need to use MCA instead.\u00a0 This method is not widely available in software packages but it produces similar outputs to PCA.<\/p>\n<p>As yet, I have not written any posts using MCA.<\/p>\n<p>&nbsp;<\/p>\n<hr \/>\n<h4><span style=\"color: #008000;\"><strong>C. Multi-Dimensional Scaling (MDS)<\/strong><\/span><\/h4>\n<p>The most basic method of mapping data so as to assess how far apart objects are from each other is MDS.\u00a0 The idea is relatively simple to explain and the following post makes use of MDS.<\/p>\n<ol>\n<li><a href=\"https:\/\/marriott-stats.com\/nigels-blog\/segmentation-1-who-has-more-in-common-leave-trump-voters-or-remain-clinton-voters-analysis-of-sentiments\/\" target=\"_blank\" rel=\"noopener noreferrer\">Who has more in common?\u00a0 Leave &amp; Trump voters or Remain &amp; Clinton voters?<\/a><\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<hr \/>\n<h4><span style=\"color: #008000;\"><strong>D. 
Cluster Analysis (K-Means and AHC)<\/strong><\/span><\/h4>\n<p>Cluster analysis is the mainstay of segmentation, the task of splitting a sample of objects into distinct groups.\u00a0 The two main methods are <a href=\"https:\/\/en.wikipedia.org\/wiki\/Hierarchical_clustering\" target=\"_blank\" rel=\"noopener noreferrer\">Agglomerative Hierarchical Clustering (AHC)<\/a> and <a href=\"https:\/\/en.wikipedia.org\/wiki\/K-means_clustering\" target=\"_blank\" rel=\"noopener noreferrer\">K-Means Clustering<\/a>.\u00a0 Both methods can only be used with numerical datasets.<\/p>\n<p>I have published the following on Cluster Analysis.<\/p>\n<ol>\n<li>A presentation given to <a href=\"http:\/\/emps.exeter.ac.uk\/mathematics\/research\/eisa\/\" target=\"_blank\" rel=\"noopener noreferrer\">EXISTA<\/a> about <a href=\"http:\/\/emps.exeter.ac.uk\/media\/universityofexeter\/emps\/eisa\/NigelMarriott.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">how to segment customer databases.<\/a><\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<hr \/>\n<h4><span style=\"color: #008000;\"><strong>E. 
Manual Clustering<\/strong><\/span><\/h4>\n<p>It might seem that when you have complex data, you need complex statistical methods to make sense of its multivariate nature.\u00a0 Sometimes though, clusters can be found with a bit of statistical thinking and common sense.\u00a0 It can be a good idea to start a cluster analysis using manual methods before proceeding to the more complex methods.\u00a0 This is especially the case when your data consists of a lot of binary variables.<\/p>\n<p>I have written the following posts which use manual clustering.<\/p>\n<ol>\n<li><a href=\"https:\/\/marriott-stats.com\/nigels-blog\/eu-referendum-6-find-your-way-out-of-the-brexit-maze-in-9-days\/\" target=\"_blank\" rel=\"noopener noreferrer\">Find your way out of the Brexit Maze in 9 days!<\/a><\/li>\n<li><a href=\"https:\/\/marriott-stats.com\/nigels-blog\/by-election-forecasting-model-1-how-to-predict-outcomes-in-the-brexit-era\/\" target=\"_blank\" rel=\"noopener noreferrer\">How to predict by-elections in the Brexit era<\/a>\u00a0&#8211; repeated in <a href=\"https:\/\/www.youtube.com\/watch?v=OL0w7xGSdJU&amp;feature=youtu.be\" target=\"_blank\" rel=\"noopener noreferrer\">this YouTube clip<\/a><\/li>\n<\/ol>\n<p>The first post was referred to in <a href=\"https:\/\/twitter.com\/itvpeston\/status\/1108509657688088577\" target=\"_blank\" rel=\"noopener noreferrer\">the Robert Peston show<\/a> on ITV on 20th March 2019 (<a href=\"https:\/\/www.itv.com\/hub\/peston\/2a4458a0095\" target=\"_blank\" rel=\"noopener noreferrer\">clip starts 34:40 in<\/a>).\u00a0 Apparently it made me &#8220;geek of the week!&#8221;<\/p>\n<hr \/>\n<h4><span style=\"color: #008000;\"><strong>F. Classification Modelling<\/strong><\/span><\/h4>\n<p>Classification often makes use of clustering methods but the goal of classification is usually prediction i.e. 
given what we know about a certain object, can we predict which category it will end up in?\u00a0 A method not mentioned so far that plays a large part in classification modelling is <a href=\"https:\/\/en.wikipedia.org\/wiki\/Linear_discriminant_analysis\" target=\"_blank\" rel=\"noopener noreferrer\">Discriminant Analysis<\/a>, which includes an important method of multivariate analysis known as Canonical Variate Analysis (CVA).<\/p>\n<p>As yet, I have not written any posts on Classification Models.<\/p>\n<p>&nbsp;<\/p>\n<hr \/>\n<p>If you would like to book a training course in Multivariate Analysis, then please <a href=\"https:\/\/marriott-stats.com\/contact-us\/\" target=\"_blank\" rel=\"noopener noreferrer\">contact me<\/a>.<\/p>\n<p>I can recommend the following book <a href=\"https:\/\/www.amazon.com\/Multivariate-Statistical-Analysis-Conceptual-Introduction\/dp\/0942154916\/ref=sr_1_1?crid=17M4EJN0VINRR&amp;keywords=multivariate+statistical+analysis&amp;qid=1550412007&amp;s=gateway&amp;sprefix=multivariate+%2Caps%2C217&amp;sr=8-1\" target=\"_blank\" rel=\"noopener noreferrer\">&#8220;Multivariate Statistical Analysis &#8211; A Conceptual Introduction&#8221;<\/a> by Sam Kachigan.\u00a0 This does a very good job of keeping the maths to a minimum and focusing on the key concepts instead.<\/p>\n<p>For more information about my other training courses in statistics, please visit my <a href=\"https:\/\/marriott-stats.com\/training\/\" target=\"_blank\" rel=\"noopener noreferrer\">Statistical Training homepage<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If I were to remark to you that &#8220;the weather is very nice today&#8221; or &#8220;I didn&#8217;t like that person&#8221;, it is unlikely that I would have made such statements based on a single variable.\u00a0 It is more likely that a combination of variables were evaluated to arrive at these statements.\u00a0 When we analyse datasets 
[&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":""},"categories":[7],"tags":[48,72,31,51,93,94,52],"class_list":{"0":"post-1529","1":"post","2":"type-post","3":"status-publish","4":"format-standard","6":"category-stats-training","7":"tag-multivariate-data","8":"tag-principal-components-analysis","9":"tag-segmentation","10":"tag-standardisation","11":"tag-statistical-training","12":"tag-teaching-materials","13":"tag-z-scores","14":"entry","15":"override"},"_links":{"self":[{"href":"https:\/\/marriott-stats.com\/nigels-blog\/wp-json\/wp\/v2\/posts\/1529","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/marriott-stats.com\/nigels-blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/marriott-stats.com\/nigels-blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/marriott-stats.com\/nigels-blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/marriott-stats.com\/nigels-blog\/wp-json\/wp\/v2\/comments?post=1529"}],"version-history":[{"count":8,"href":"https:\/\/marriott-stats.com\/nigels-blog\/wp-json\/wp\/v2\/posts\/1529\/revisions"}],"predecessor-version":[{"id":4890,"href":"https:\/\/marriott-stats.com\/nigels-blog\/wp-json\/wp\/v2\/posts\/1529\/revisions\/4890"}],"wp:attachment":[{"href":"https:\/\/marriott-stats.com\/nigels-blog\/wp-json\/wp\/v2\/media?parent=1529"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/marriott-stats.com\/nigels-blog\/wp-json\/wp\/v2\/categories?post=1529"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/marriott-stats.com\/nigels-blog\/wp-json\/wp\/v2\/tags?post=1529"}]
,"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}