R: One Way Anova and pairwise post hoc tests (Turkey, Scheffe or other) for numerical columns -


i have 3 columns in dataframe dune (below - bottom of page) describing % cover of marram grass recorded 3 different sand dune ecosystems:

(1) restored; (2) degraded; , (3) natural;

i have performed 2 different 1 way anova tests (below) - test 1 , test 2 - establish significant differences between ecosystems. test 1 shows significant differences between ecosystems; however, test 2 shows no significant differences. box plot's (below) show stark differences in variance between ecosystems.

afterwards, melted dataframe produce factorial column (i.e, headed ecosystem.type) response variable. idea apply glm model (test 3 - below)to test 1 way anova; however, method unsuccessful (please find error message below).

problem

i confused whether code perform each 1 way anova test correct , correct procedure perform post hoc tests (turkey hsd, scheffe or others) distinguish pairs of ecosystems different. if can help, appreciative advice. many thanks....

data(dune) 

test 1

dune.type.1<-aov(natural~restored+degraded, data=dune) summary.aov(dune.type.1, intercept=t)                 df sum sq mean sq f value   pr(>f)          (intercept)  1  34694   34694 138.679 1.34e-09 ***      restored     1     94      94   0.375    0.548          degraded     1    486     486   1.942    0.181          residuals   17   4253     250                                 ---   signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

post-hoc test's

    posthoc<-tukeyhsd(dune.type.1, conf.level=0.95)      error in tukeyhsd.aov(dune.type.1, conf.level = 0.95) :       no factors in fitted model      in addition: warning messages:     1: in replications(paste("~", xx), data = mf) :        non-factors ignored: restored     2: in replications(paste("~", xx), data = mf) :        non-factors ignored: degraded 

test 2

     dune1<-aov(restored~natural, data=dune)      dune2<-aov(restored~degraded, data=dune)      dune3<-aov(degraded~natural, data=dune)       summary(dune1)                   df sum sq mean sq f value pr(>f)      natural      1     86   85.58   0.356  0.558      residuals   18   4325  240.26                     summary(dune2)                   df sum sq mean sq f value pr(>f)      degraded     1    160   159.7   0.676  0.422      residuals   18   4250   236.1                      summary(dune3)                   df sum sq mean sq f value pr(>f)      natural      1  168.5  168.49   2.318  0.145      residuals   18 1308.5   72.69    

test 3

melt.dune<-melt(dune, measure.vars=c("degraded", "restored", "natural"))   colnames(melt.dune)=c("ecosystem.type", "percentage.cover") melt.dune$percentage.cover<-as.numeric(melt.dune$percentage.cover)  glm.dune<-glm(ecosystem.type~percentage.cover, data=melt.dune) summary(glm.dune)  error  glm.dune<-glm(ecosystem.type~percentage.cover, data=melt.dune) error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  :  na/nan/inf in 'y' in addition: warning messages: 1: in ops.factor(y, mu) : ‘-’ not meaningful factors 2: in ops.factor(eta, offset) : ‘-’ not meaningful factors 3: in ops.factor(y, mu) : ‘-’ not meaningful factors 

melted dataframe

structure(list(ecosystem.type = structure(c(1l, 1l, 1l, 1l, 1l,  1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 1l, 2l,  2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l, 2l,  2l, 2l, 2l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l, 3l,  3l, 3l, 3l, 3l, 3l, 3l, 3l), .label = c("degraded", "restored",  "natural"), class = "factor"), percentage.cover = c(12, 17, 21,  11, 22, 16, 7, 9, 14, 2, 3, 15, 23, 4, 19, 36, 26, 4, 15, 23,  38, 46, 65, 35, 54, 29, 48, 13, 19, 33, 37, 55, 11, 53, 13, 24,  28, 44, 42, 39, 18, 61, 31, 46, 51, 51, 41, 44, 55, 47, 73, 43,  25, 42, 21, 13, 65, 30, 47, 29)), row.names = c(na, -60l), .names =         c("ecosystem.type",   "percentage.cover"), class = "data.frame") 

enter image description here

data

 structure(list(degraded = c(12l, 17l, 21l, 11l, 22l, 16l, 7l,   9l, 14l, 2l, 3l, 15l, 23l, 4l, 19l, 36l, 26l, 4l, 15l, 23l),   restored = c(38l, 46l, 65l, 35l, 54l, 29l, 48l, 13l, 19l,   33l, 37l, 55l, 11l, 53l, 13l, 24l, 28l, 44l, 42l, 39l), natural = c(18l,   61l, 31l, 46l, 51l, 51l, 41l, 44l, 55l, 47l, 73l, 43l, 25l,   42l, 21l, 13l, 65l, 30l, 47l, 29l)), .names = c("degraded",   "restored", "natural"), class = "data.frame", row.names = c(na,   -20l)) 

there several things point you.

first, test 1 , test 2 produce similar results. difference selected intercept on test 1 , outcome tells if fit linear model (i come in few minutes) intercept required. hence significance see whether line force fit needs intercept or not. try using "intercept=t" other outcomes , pretty sure similar results.

the second thing should careful linear model try fit. dune.type.1 model model see how correlated different sand dune ecosystems are. in other words, assume there linear association between natural , restored , every unit increase of restored have increase on natural. if understood correctly want examine mean differences , not correlation. can 2 things:

  1. the data prepared perform t.tests (a test compares mean between several categories). easy in r , valid since variables reasonably normal. have multiple testing issues (you perform 3 t-tests mean differences), , need use bonferroni correction.

  2. but think want following:

first reform data

       data <- data.frame(v = c(dune$degraded, dune$restored, dune$natural),                     labels = c(rep("degraded", times = 20), rep("restored", times = 20),                                rep("natural", times = 20))) 

then fit linear model

    mod.1 <- lm(v ~ labels, data = data)     summary(mod.1)     lm(formula = v ~ labels, data = data)  residuals: min      1q  median      3q     max  -28.650 -10.725   0.875   8.050  31.350   coefficients:                estimate std. error t value pr(>|t|)     (intercept)      14.950      3.066   4.875 9.07e-06 *** labelsnatural    26.700      4.337   6.157 7.95e-08 *** labelsrestored   21.350      4.337   4.923 7.64e-06 *** 

where can see mean of baseline category (i.e degraded) smaller mean of natural category , etc. can check model assumptions, see if model fit

    qqnorm(residuals(mod.1))     qqline(residuals(mod.1)) 

enter image description here residuals reasonably normal model fine. can follow anova approach , have:

    anova.model <- aov(v ~ labels, data = data))     summary(anova.model)               df sum sq mean sq f value   pr(>f)      labels       2   7982    3991   21.22 1.29e-07 ***  residuals   57  10720     188   

which indicates there @ least 1 significant difference between means of sand dune ecosystems, , follow tukey pointwise intervals:

    post <- tukeyhsd(aov(v ~ labels, data = data))     plot(post, ylim = c(0, 4)) 

enter image description here

already adjusted multiple testing :)


Comments

Popular posts from this blog

php - How to add and update images or image url in Volusion using Volusion API -

Laravel mail error `Swift_TransportException in StreamBuffer.php line 269: Connection could not be established with host smtp.gmail.com [ #0]` -

c# SetCompatibleTextRenderingDefault must be called before the first -