---
title: "Dissertation - Data Analysis Report"
author: "Partha Vuppalapaty"
date: "3/22/2021"
output:
  word_document: default
  pdf_document: default
  html_document: default
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(cache = TRUE)
library(lavaan)
library(semPlot)
library(semTools)
library(QuantPsyc)
library(MVN)
library(gclus)
library(olsrr)
library(corrplot)
```
## Dataset Preparation
The survey response dataset (collected from Qualtrics) is imported as the R data frame 'df'.
```{r dataPrep}
# Extract survey responses from Qualtrics as CSV and load them into a data frame
df <- read.csv(file = '/Users/partha/Downloads/Temple DBA/Dissertation Proposal/Data/SurveyResponseData_Dissertation_Defense_20201121-20210326.csv')
# Keep only finished surveys using the boolean flag field 'Finished': a value of 1 indicates that Qualtrics marked the survey as finished.
# Rows are additionally restricted to Status == 0.
df_finish <- df[which(df$Finished == 1 & df$Status == 0), ]
# Create interaction-effect (product) indicators for the independent latent variables using the orthogonalizing technique (Little, Bovaird, & Widaman, 2006)
df_finish.orth <- orthogonalize(df_finish, c('Q7','Q8','Q9'), c('Q10','Q11','Q12'), match = FALSE)
# Subset the data frame to just survey questions Q7 through Q18
df_SurveyQs <- df_finish[c("Q7","Q8","Q9","Q10","Q11","Q12","Q13","Q14","Q15","Q16","Q17","Q18")]
```
The dataset 'df_finish' contains only the completed survey responses from 'df'. To test the moderation effect hypothesized in the conceptual model, product indicators for the latent variables 'demand unpredictability' and 'intelligent automation' are created with the orthogonalizing (residual centering) approach. These indicators are later used to test whether demand unpredictability moderates the effect of intelligent automation on desired agility in capital budgeting.
Little, T. D., Bovaird, J. A., & Widaman, K. F. (2006). On the merits of orthogonalizing powered and product terms: Implications for modeling interactions among latent variables. Structural Equation Modeling, 13(4), 497-519.
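
The orthogonalizing step can be illustrated on simulated data. The following is a minimal base R sketch of residual centering for a single product indicator. It is not the `semTools` implementation. The data frame `toy`, its simulated values, and the object names `rc_fit` and `max_abs_cor` are assumptions made purely for illustration.

```r
# Illustrative sketch of residual centering (Little et al., 2006) on simulated data.
# 'toy' and its values are hypothetical; they are not the survey responses.
set.seed(2021)
n <- 200
toy <- data.frame(
  Q7  = rnorm(n), Q8  = rnorm(n),   # stand-ins for two 'du' indicators
  Q10 = rnorm(n), Q11 = rnorm(n)    # stand-ins for two 'ia' indicators
)

# Step 1: form the raw product of one indicator from each construct
toy$Q7xQ10 <- toy$Q7 * toy$Q10

# Step 2: regress the product on all first-order indicators
rc_fit <- lm(Q7xQ10 ~ Q7 + Q8 + Q10 + Q11, data = toy)

# Step 3: keep the residuals as the orthogonalized product indicator
toy$Q7.Q10 <- resid(rc_fit)

# The residuals are uncorrelated with every first-order indicator (up to rounding)
max_abs_cor <- max(abs(cor(toy$Q7.Q10, toy[c("Q7", "Q8", "Q10", "Q11")])))
print(max_abs_cor)
```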
## Descriptive Statistics
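The survey questions are tested for multivariate normality (Mardia's test) and inspected for univariate normality using Q-Q plots, histograms, and box plots.
It is observed that the survey questions do not follow a normal distribution.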
```{r descriptiveStats, echo=FALSE}
# Multivariate normality tests for the survey questions, with univariate Q-Q plots, histograms, and box plots
mult.norm(df_SurveyQs)$mult.test
mvn(data = df_SurveyQs, mvnTest = "mardia", univariatePlot = "qqplot")
mvn(data = df_SurveyQs, mvnTest = "mardia", univariatePlot = "histogram")
mvn(data = df_SurveyQs, mvnTest = "mardia", univariatePlot = "box")
boxplot(df_SurveyQs, horizontal = TRUE)$out
```
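For readers who want to see what Mardia's test measures, the sketch below computes the multivariate skewness and kurtosis statistics from their textbook definitions in base R. It is an illustration only and not the code used by the `MVN` or `QuantPsyc` packages. The function name `mardia_sketch` and the simulated matrix `x` are hypothetical.

```r
# Illustrative sketch: Mardia's multivariate skewness and kurtosis statistics.
# 'x' is simulated here; it is not the survey data.
mardia_sketch <- function(x) {
  x  <- as.matrix(x)
  n  <- nrow(x)
  p  <- ncol(x)
  xc <- scale(x, center = TRUE, scale = FALSE)   # column-centered data
  s  <- crossprod(xc) / n                        # covariance matrix with divisor n
  d  <- xc %*% solve(s) %*% t(xc)                # n x n matrix of Mahalanobis-type products
  b1 <- sum(d^3) / n^2                           # multivariate skewness
  b2 <- sum(diag(d)^2) / n                       # multivariate kurtosis
  skew_stat <- n * b1 / 6                        # approx. chi-square under normality
  skew_df   <- p * (p + 1) * (p + 2) / 6
  kurt_z    <- (b2 - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)  # approx. standard normal
  list(
    skewness = b1,
    skew_p   = pchisq(skew_stat, df = skew_df, lower.tail = FALSE),
    kurtosis = b2,
    kurt_p   = 2 * pnorm(abs(kurt_z), lower.tail = FALSE)
  )
}

set.seed(1)
x <- matrix(rnorm(300 * 3), ncol = 3)   # hypothetical multivariate normal sample
res <- mardia_sketch(x)
str(res)
```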
## Conceptual Model
The conceptual model is set up in R using:

* Measurement model
    + Latent variable - demand unpredictability (du) is measured from observations in Q7, Q8 and Q9
    + Latent variable - intelligent automation (ia) is measured from observations in Q10, Q11 and Q12
    + Latent variable - agility in capital budgeting (acb) is measured from observations in Q13 through Q18
    + Latent variable - moderation effect (mod) is measured from the orthogonalized product indicators of Q7 through Q9 with Q10 through Q12
* Regressions
    + `acb ~ b1*ia + b2*du + b3*mod`: effect of 'ia' on 'acb' (b1), direct effect of 'du' on 'acb' (b2), and moderation effect (b3)
    + `ia ~ a*du`: effect of 'du' on 'ia', through which 'ia' mediates the effect of 'du' on 'acb'
```{r conceptualModel, echo=FALSE}
model <- '
# Measurement Model - Latent Variables
du =~ Q7 + Q8 + Q9
ia =~ Q10 + Q11 + Q12
acb =~ Q13 + Q14 + Q15 + Q16 + Q17 + Q18
mod =~ Q7.Q10 + Q7.Q11 + Q8.Q10 + Q8.Q11 + Q9.Q10 + Q9.Q11 + Q9.Q12 + Q7.Q12 + Q8.Q12
# Regressions
ia ~ a*du
acb ~ b1*ia + b2*du + b3*mod
# Direct Effect of du on acb
direct := b2
# Indirect Effect (a*b1)
indirect := a*b1
# Total Effect
total := b2+(a*b1)
'
```
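Written as equations, the structural part of the model specified above takes the form below. The coefficient labels match those in the model syntax. The disturbance terms $\zeta_{ia}$ and $\zeta_{acb}$ are notation introduced here for illustration. Substituting the first equation into the second shows why the total effect of 'du' on 'acb' is the sum of the direct effect and the indirect effect.

$$
\begin{aligned}
ia &= a \cdot du + \zeta_{ia} \\
acb &= b_1 \cdot ia + b_2 \cdot du + b_3 \cdot mod + \zeta_{acb} \\
    &= (b_2 + a \cdot b_1) \cdot du + b_3 \cdot mod + b_1 \cdot \zeta_{ia} + \zeta_{acb} \\
\text{direct} &= b_2 \\
\text{indirect} &= a \cdot b_1 \\
\text{total} &= b_2 + a \cdot b_1
\end{aligned}
$$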
## Model Fitting
In confirmatory factor analysis (CFA), the use of maximum likelihood (ML) assumes that the observed indicators follow a continuous and multivariate normal distribution, which is not appropriate for ordinal observed variables. Robust ML (MLR) has been introduced into CFA models when this normality assumption is slightly or moderately violated. Diagonally weighted least squares (WLSMV), on the other hand, is specifically designed for ordinal data.
Because the data are collected through survey responses, which are ordinal and continuous data (mostly on a 7-point Likert scale), the default 'ML' estimator does not give meaningful results. One robust alternative is MLMVS (maximum likelihood estimation with robust standard errors and a mean- and variance-adjusted test statistic, also known as the Satterthwaite approach); the model in this report is fitted with diagonally weighted least squares ('DWLS').
Li, C.-H. (2016). Confirmatory factor analysis with ordinal data: Comparing robust maximum likelihood and diagonally weighted least squares. Behavior Research Methods, 48, 936–949. https://doi.org/10.3758/s13428-015-0619-7
A common CFA/SEM rule of thumb for minimum sample size is the ratio of cases to free parameters, N:q, with 10:1 to 20:1 being a commonly suggested range (Schumacker & Lomax, 2015; Kline, 2016; Jackson, 2003). The N:q ratio for this study is 166/12 = 13.83.
(Sample size rule-of-thumb reference and analysis: DOI 10.4236/psych.2018.98126)
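
The arithmetic behind the ratio can be checked with a few lines of base R. This sketch simply reuses the two figures quoted above. The object names `n_cases`, `q_value`, `nq_ratio`, and `within_range` are hypothetical. If q is taken to be the number of free parameters of a fitted lavaan model, that count can be read from `fitMeasures(fit, "npar")`, which is noted here as an option rather than as the method used in this report.

```r
# Illustrative check of the N:q rule of thumb using the figures quoted in the text.
n_cases <- 166   # N, as reported above
q_value <- 12    # q, as used in the ratio above

nq_ratio <- n_cases / q_value

# Is the ratio inside the commonly suggested 10:1 to 20:1 range?
within_range <- nq_ratio >= 10 && nq_ratio <= 20

cat(sprintf("N:q = %.2f, within 10:1 to 20:1: %s\n", nq_ratio, within_range))
```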
DiStefano and Morgan (2014) compared diagonal weighted least squares robust estimation techniques available in two popular statistical programs: diagonal weighted least squares (DWLS; LISREL version 8.80), and weighted least squares–mean (WLSM) and weighted least squares–mean and variance adjusted (WLSMV; Mplus version 6.11). Both DWLS and WLSMV produced accurate parameter estimates.
DiStefano, C., & Morgan, G. B. (2014). A comparison of diagonal weighted least squares robust estimation techniques for ordinal data. Structural Equation Modeling: A Multidisciplinary Journal, 21(3), 425-438. https://doi.org/10.1080/10705511.2014.915373
```{r fitSEMModel, echo=FALSE}
fit <- sem(model, data=df_finish.orth, estimator = "DWLS")
```
## Data Analysis
The factor scores predicted for the latent variables by the fitted model are added to the dataset, and their multivariate normality is assessed using Mardia's test.
```{r dataAnalysis_mvn, echo=FALSE}
# Add the predicted values (factor scores) for the latent variables from the fitted model
idx <- lavInspect(fit, "case.idx")   # row indices of the cases used in the fit
fscores <- lavPredict(fit)
for (fs in colnames(fscores)) {
  df_finish.orth[idx, fs] <- fscores[, fs]
}
mvn(data = df_finish.orth[c("du","ia","acb","mod")], mvnTest = "mardia", univariatePlot = "histogram")
mvn(data = df_finish.orth[c("du","ia","acb","mod")], mvnTest = "mardia", univariatePlot = "qqplot")
mvn(data = df_finish.orth[c("du","ia","acb","mod")], mvnTest = "mardia", univariatePlot = "box")
```
Correlation plots of the latent variable factor scores, followed by scatterplots with fitted regression lines for each relationship in the structural model.
```{r xyplots, echo=FALSE}
x1 <- df_finish.orth$du
x2 <- df_finish.orth$ia
x1x2 <- df_finish.orth$mod
y <- df_finish.orth$acb
corr_val <- cor(fscores)
corrplot(corr_val,type = "lower",order="hclust")
corrplot.mixed(corr_val)
# Scatterplots with main and axis titles, each with a fitted regression line
plot(x1, y, main = "du -> acb",
     xlab = "Demand Unpredictability", ylab = "Agility in Capital Budgeting",
     pch = 19, frame = FALSE)
abline(lm(y ~ x1, data = df_finish.orth), col = "blue")
plot(x1, x2, main = "du -> ia",
     xlab = "Demand Unpredictability", ylab = "Intelligent Automations",
     pch = 19, frame = FALSE)
abline(lm(x2 ~ x1, data = df_finish.orth), col = "blue")
plot(x2, y, main = "ia -> acb",
     xlab = "Intelligent Automations", ylab = "Agility in Capital Budgeting",
     pch = 19, frame = FALSE)
abline(lm(y ~ x2, data = df_finish.orth), col = "blue")
plot(x1x2, y, main = "du*ia -> acb",
     xlab = "Interaction Effect of du and ia", ylab = "Agility in Capital Budgeting",
     pch = 19, frame = FALSE)
abline(lm(y ~ x1x2, data = df_finish.orth), col = "blue")
```
Scatterplot matrix of the latent variable factor scores, with panels colored by the strength of the correlation.
```{r dataAnalysis_corrMatrix, echo=FALSE}
# Correlation Matrix for the latent variables
dta <- df_finish.orth[c("du","ia","acb","mod")] # get data
dta.r <- abs(cor(dta)) # get correlations
dta.col <- dmat.color(dta.r) # get colors
cpairs(dta, panel.colors=dta.col, gap=.5, main="Correlation Matrix of Latent Variables in SEM ")
```
Basic scatterplot matrix of the latent variable factor scores.
```{r dataAnalysis_scatterPlotMatrix, echo=FALSE}
# Basic Scatterplot Matrix
pairs(~du+ia+acb+mod,data=dta, main="Simple Scatterplot Matrix")
```
Multivariate and univariate (Shapiro-Wilk) normality tests for the latent variable factor scores.
```{r dataAnalysis_normalityTests, echo=FALSE}
# Testing multivariate and univariate normality
mult.norm(dta)$mult.test
shapiro.test(dta$du)
shapiro.test(dta$ia)
shapiro.test(dta$mod)
shapiro.test(dta$acb)
```
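Collinearity diagnostics (VIF, tolerance, and condition indices) and residual and influence plots for a linear regression of 'acb' on 'du', 'ia', and 'mod', using the factor scores.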
```{r dataAnalysis_, echo=FALSE}
model_l <- lm(acb ~ du + ia + mod, data = df_finish.orth)
ols_vif_tol(model_l)
ols_eigen_cindex(model_l)
ols_plot_resid_fit_spread(model_l)
ols_plot_diagnostics(model_l)
ols_plot_cooksd_chart(model_l)
ols_plot_dfbetas(model_l)
#df_finish.orth[111,]
```
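The variance inflation factor reported by the collinearity diagnostics can be reproduced by hand: each predictor is regressed on the remaining predictors, and VIF = 1 / (1 - R²) of that auxiliary regression, with tolerance as its reciprocal. The following base R sketch shows the idea on simulated data. It is not the `olsrr` implementation, and the simulated `du`, `ia`, and `mod` vectors and the names `preds`, `vif_sketch`, and `tolerance_sketch` are hypothetical stand-ins.

```r
# Illustrative sketch: VIF and tolerance computed from auxiliary regressions.
# The data are simulated; 'du', 'ia', and 'mod' here are hypothetical stand-ins.
set.seed(7)
n   <- 150
du  <- rnorm(n)
ia  <- 0.6 * du + rnorm(n)     # deliberately correlated with 'du'
mod <- rnorm(n)
preds <- data.frame(du = du, ia = ia, mod = mod)

vif_sketch <- sapply(names(preds), function(v) {
  # Regress predictor v on the remaining predictors
  aux <- lm(reformulate(setdiff(names(preds), v), response = v), data = preds)
  1 / (1 - summary(aux)$r.squared)
})
tolerance_sketch <- 1 / vif_sketch

print(round(rbind(VIF = vif_sketch, Tolerance = tolerance_sketch), 3))
```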
### SEM Fit Summary - DWLS
```{r semfitDWLS, echo=FALSE}
print(summary(fit, fit.measures=TRUE, standardized=TRUE))
```
### All Fit Measures
```{r allFitMeasures, echo=FALSE}
fitMeasures(fit)
```
Analysis of fit measures: Kline (2005) suggests that, at a minimum, the following indices should be reported: (1) the model chi-square, (2) RMSEA, (3) CFI, and (4) SRMR.
References: Kline, R. B. (2005). Principles and Practice of Structural Equation Modeling. Hooper, D., et al. (2008). Structural Equation Modelling: Guidelines for Determining Model Fit.
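
To make two of these indices concrete, the sketch below computes RMSEA and CFI from their textbook definitions, given the chi-square and degrees of freedom of the fitted model and of the baseline model. The function names `rmsea_sketch` and `cfi_sketch` and all input values are hypothetical and are not results from this study. Software implementations, including lavaan, can differ in details (for example, using N instead of N - 1, or scaled test statistics), so these functions are not expected to match `fitMeasures()` output exactly.

```r
# Illustrative sketch: RMSEA and CFI from chi-square statistics (textbook definitions).
rmsea_sketch <- function(chisq, df, n) {
  # Excess of chi-square over df, floored at zero, per df and per (n - 1) cases
  sqrt(max(chisq - df, 0) / (df * (n - 1)))
}

cfi_sketch <- function(chisq_m, df_m, chisq_b, df_b) {
  num <- max(chisq_m - df_m, 0)                    # misfit of the fitted model
  den <- max(chisq_m - df_m, chisq_b - df_b, 0)    # misfit of the baseline model
  if (den == 0) 1 else 1 - num / den
}

# Hypothetical inputs, chosen only to exercise the functions
rmsea_val <- rmsea_sketch(chisq = 85.3, df = 24, n = 301)
cfi_val   <- cfi_sketch(chisq_m = 85.3, df_m = 24, chisq_b = 918.9, df_b = 36)

print(round(c(RMSEA = rmsea_val, CFI = cfi_val), 3))
```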
## Inspect fit
```{r inspectFit, echo=FALSE}
inspect(fit)
```
```{r inspectFit_keyIndices, echo=FALSE}
fitMeasures(fit, c("chisq","df","pvalue","gfi","agfi","nfi", "nnfi","cfi","rmsea","srmr"))
```
Visual representation of the path analysis with standardized estimates.
```{r modelView#1, echo=FALSE}
semPaths(fit,"std",layout = 'tree', edge.label.cex=.9, curvePivot = TRUE)
```
Visual representation of the path analysis without estimate labels.
```{r modelView#2, echo=FALSE}
semPaths(fit, "std", "hide")
```
Visual representation of the path analysis in LISREL style.
```{r modelView#3, echo=FALSE}
semPaths(fit, what="paths",whatLabels="par",style="lisrel",layout="tree", rotation=2)
```
Visual representation of the path analysis in LISREL style with color coding.
```{r modelView#4, echo=FALSE}
semPaths(fit, 'eq', 'std', style = "lisrel", layout = 'tree2', shapeMan = "rectangle",
         edge.color = "black", intercepts = FALSE, rotation = 2, curvature = TRUE, sizeLat = 10,
         sizeMan1 = 9, sizeMan2 = 5, edge.label.cex = 0.9, fixedStyle = 1, mar = c(1, 8, 1, 8),
         groups = "latents", pastel = TRUE)
```