Multivariate Analysis
We are going to generate a random sample with only 100 observations.
[,1] [,2]
[1,] -1.0741271 -6.2308487
[2,] -2.4661300 0.8106840
[3,] 0.2183284 6.6744679
[4,] -5.5522441 -2.0566592
[5,] -0.4462097 3.9457286
[6,] 1.7441782 -2.3031246
[7,] 1.2351988 9.9574407
[8,] -2.7554396 2.0700468
[9,] -2.1873608 6.4698140
[10,] -2.7969106 1.7510170
[11,] 6.3789467 16.1355967
[12,] 0.7323506 3.0172191
[13,] -7.8417874 0.9103693
Next we explore the first distances.
[1] 3.39129380 0.37876413 0.83267824 2.84059203 0.13332892 2.76882922
[7] 2.46643474 0.62195238 1.77851072 0.60374427 9.67284803 0.24288097
[13] 7.01739717 0.74761170 1.22781669 6.42601375 0.71188709 2.57175868
[19] 1.98476521 7.03676032 3.09725662 0.27669797 0.78213571 3.67510427
[25] 1.28436604 2.07657456 0.06721147 0.22965271 3.13841683 0.23892611
[31] 2.09274202 3.31698715 0.72283374 0.20967035 4.21639792 0.08257476
[37] 1.91980070 3.08709476 0.85390245 0.28108069 3.03656235 2.88413520
[43] 1.67783764 1.32440914 0.45278643 1.42526294 1.64212940 3.28646247
[49] 0.93468125 5.50202719
This package performs multivariate normality tests and graphical approaches.
$multivariateNormality
Test Statistic p value Result
1 Mardia Skewness 2.08922939632991 0.719351377140726 YES
2 Mardia Kurtosis -0.271717735704995 0.785839063078596 YES
3 MVN <NA> <NA> YES
$univariateNormality
Test Variable Statistic p value Normality
1 Anderson-Darling Column1 0.1693 0.9324 YES
2 Anderson-Darling Column2 0.2976 0.5828 YES
$Descriptives
n Mean Std.Dev Median Min Max 25th 75th
1 100 -0.6843622 2.895154 -0.6736072 -7.841787 6.378947 -2.7658074 1.201993
2 100 2.1899036 4.948071 1.9433470 -10.704631 16.135597 -0.8697526 5.832291
Skew Kurtosis
1 0.11563077 -0.36866463
2 0.09604895 -0.07602018
$multivariateNormality
Test HZ p value MVN
1 Henze-Zirkler 0.3561086 0.8891539 YES
$univariateNormality
Test Variable Statistic p value Normality
1 Anderson-Darling Column1 0.1693 0.9324 YES
2 Anderson-Darling Column2 0.2976 0.5828 YES
$Descriptives
n Mean Std.Dev Median Min Max 25th 75th
1 100 -0.6843622 2.895154 -0.6736072 -7.841787 6.378947 -2.7658074 1.201993
2 100 2.1899036 4.948071 1.9433470 -10.704631 16.135597 -0.8697526 5.832291
Skew Kurtosis
1 0.11563077 -0.36866463
2 0.09604895 -0.07602018
$multivariateNormality
Test H p value MVN
1 Royston 0.01088171 0.9947412 YES
$univariateNormality
Test Variable Statistic p value Normality
1 Anderson-Darling Column1 0.1693 0.9324 YES
2 Anderson-Darling Column2 0.2976 0.5828 YES
$Descriptives
n Mean Std.Dev Median Min Max 25th 75th
1 100 -0.6843622 2.895154 -0.6736072 -7.841787 6.378947 -2.7658074 1.201993
2 100 2.1899036 4.948071 1.9433470 -10.704631 16.135597 -0.8697526 5.832291
Skew Kurtosis
1 0.11563077 -0.36866463
2 0.09604895 -0.07602018
$multivariateNormality
Test E df p value MVN
1 Doornik-Hansen 0.3881136 4 0.9834385 YES
$univariateNormality
Test Variable Statistic p value Normality
1 Anderson-Darling Column1 0.1693 0.9324 YES
2 Anderson-Darling Column2 0.2976 0.5828 YES
$Descriptives
n Mean Std.Dev Median Min Max 25th 75th
1 100 -0.6843622 2.895154 -0.6736072 -7.841787 6.378947 -2.7658074 1.201993
2 100 2.1899036 4.948071 1.9433470 -10.704631 16.135597 -0.8697526 5.832291
Skew Kurtosis
1 0.11563077 -0.36866463
2 0.09604895 -0.07602018
$multivariateNormality
Test Statistic p value MVN
1 E-statistic 0.4748378 0.912 YES
$univariateNormality
Test Variable Statistic p value Normality
1 Anderson-Darling Column1 0.1693 0.9324 YES
2 Anderson-Darling Column2 0.2976 0.5828 YES
$Descriptives
n Mean Std.Dev Median Min Max 25th 75th
1 100 -0.6843622 2.895154 -0.6736072 -7.841787 6.378947 -2.7658074 1.201993
2 100 2.1899036 4.948071 1.9433470 -10.704631 16.135597 -0.8697526 5.832291
Skew Kurtosis
1 0.11563077 -0.36866463
2 0.09604895 -0.07602018
What can we conclude?
\[ \mathbf{X} \sim p \, N_2(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1) + (1-p) \, N_2(\boldsymbol{\mu}_2, \boldsymbol{\Sigma}_2) \]
n <- 1000
p <- 0.6
x <- matrix(NA, ncol=2, nrow=n)
Media1 <- c(0, 0)
Sigma1 <- matrix(c(2, 2, 2, 2), byrow=TRUE, ncol=2)
Media2 <- c(0, 0)
Sigma2 <- matrix(c(2, -2, -2, 2), byrow=TRUE, ncol=2)
for (i in 1:n) {
u <- runif(n=1)
if (u <= p)
x[i, ] <- mvrnorm(n=1, mu=Media1, Sigma=Sigma1)
else
x[i, ] <- mvrnorm(n=1, mu=Media2, Sigma=Sigma2)
}
[,1] [,2]
[1,] -0.04786903 0.04786903
[2,] -0.95473008 -0.95473008
[3,] -0.15476335 0.15476335
[4,] 0.40931732 0.40931732
[5,] 1.06269829 1.06269829
[6,] -0.27170708 -0.27170708
[7,] 0.04105247 0.04105247
[8,] 1.38018770 -1.38018770
[9,] 1.00784519 1.00784519
[10,] -1.92402870 -1.92402870
[11,] 0.64420849 -0.64420849
[12,] -0.94884616 -0.94884616
[13,] 0.14179830 0.14179830
[1] 0.002791700 0.811411475 0.030652276 0.145283948 0.990438826
[6] 0.067019837 0.001233372 2.501040194 0.890488322 3.282377722
[11] 0.546507654 0.801479896 0.016836787 0.493156761 2.998014105
[16] 0.163655750 0.080242837 0.274717446 4.746104832 2.637748291
[21] 0.051762792 0.128223199 0.170947242 0.553148894 2.376350060
[26] 0.191273424 3.841552031 0.001844047 1.440441092 1.622249550
[31] 0.930085738 16.988913719 2.415464228 0.043853709 0.063663213
[36] 0.005958051 0.013315536 2.913530122 0.582271576 0.114249901
[41] 0.135181469 2.173391253 1.067986473 9.598103789 0.822119524
[46] 1.463277002 0.433409414 0.381100495 3.880129768 0.271759184
$multivariateNormality
Test Statistic p value Result
1 Mardia Skewness 5.01324758394094 0.285940909391574 YES
2 Mardia Kurtosis 23.4587751604463 0 NO
3 MVN <NA> <NA> NO
$univariateNormality
Test Variable Statistic p value Normality
1 Anderson-Darling Column1 0.3425 0.4910 YES
2 Anderson-Darling Column2 0.2940 0.5996 YES
$Descriptives
n Mean Std.Dev Median Min Max 25th
1 1000 0.001948238 1.376903 -0.004748540 -4.293856 5.007414 -0.9027623
2 1000 0.005547561 1.376893 0.007787424 -4.922250 5.007414 -0.9069348
75th Skew Kurtosis
1 0.9411775 0.0791648370 0.2450577
2 0.9351713 -0.0002342378 0.2455167
$multivariateNormality
Test HZ p value MVN
1 Henze-Zirkler 46.09091 0 NO
$univariateNormality
Test Variable Statistic p value Normality
1 Anderson-Darling Column1 0.3425 0.4910 YES
2 Anderson-Darling Column2 0.2940 0.5996 YES
$Descriptives
n Mean Std.Dev Median Min Max 25th
1 1000 0.001948238 1.376903 -0.004748540 -4.293856 5.007414 -0.9027623
2 1000 0.005547561 1.376893 0.007787424 -4.922250 5.007414 -0.9069348
75th Skew Kurtosis
1 0.9411775 0.0791648370 0.2450577
2 0.9351713 -0.0002342378 0.2455167
What can we conclude?
Consider the next random variable \(\mathbf{X}\):
\[ \mathbf{X} \sim p \, N_2(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1) + (1-p) \, N_2(\boldsymbol{\mu}_2, \boldsymbol{\Sigma}_2) \] assume that \(\boldsymbol{\mu}_1=(0, 0)^\top\) and \(\boldsymbol{\mu}_2=(d, d)^\top\).
The covariance matrices are
\[ \boldsymbol{\Sigma}_1= \begin{pmatrix} \sigma_{11} & \sqrt{\sigma_{11} \sigma_{22}} \rho_1 \\ & \sigma_{22} \end{pmatrix} \]
\[ \boldsymbol{\Sigma}_2= \begin{pmatrix} \sigma_{11} & \sqrt{\sigma_{11} \sigma_{22}} \rho_2 \\ & \sigma_{22} \end{pmatrix} \]
Some parameters are fixed \(\sigma_{11}=1\), \(\sigma_{22}=1.5\) and \(\rho_1=0.3\).
The goal is to explore the effects of \(n\), \(d\), \(\rho_2\) and \(p\) on the rejection rate of \(H_0\) with the Mardia Skewness test.
You need to simulate 50 random samples for each combination of \(n\), \(d\), \(\rho_2\) and \(p\) given above.
In total, we have \(2\times2\times2\times2=16\) treatments or combinations. In only two treatments the null hypothesis true.
Could you identify those treatments?
Complete this table.
\(n\) | \(d\) | \(\rho_2\) | \(p\) | Rejection rate of \(H_0\) |
---|---|---|---|---|
10 | 0 | 0.3 | 0.5 | ? |
10 | 0 | 0.3 | 0.7 | ? |
10 | 0 | 0.5 | 0.5 | ? |
10 | 0 | 0.5 | 0.7 | ? |
10 | 1 | 0.3 | 0.5 | |
10 | 1 | 0.3 | 0.7 | |
10 | 1 | 0.5 | 0.5 | |
10 | 1 | 0.5 | 0.7 | |
20 | 0 | 0.3 | 0.5 | |
20 | 0 | 0.3 | 0.7 | |
20 | 0 | 0.5 | 0.5 | |
20 | 0 | 0.5 | 0.7 | |
20 | 1 | 0.3 | 0.5 | |
20 | 1 | 0.3 | 0.7 | |
20 | 1 | 0.5 | 0.5 | |
20 | 1 | 0.5 | 0.7 |