Friday, March 16, 2007

Confidence Intervals and all that other confusing stuff!!

Thank goodness we are reviewing standard deviation, standard error, confidence intervals and the like. For some reason, it doesn't seem to matter how many times I have gone over this information in the past; it always disappears into some black hole in my brain, which is so annoying. Confidence intervals, Type I and II errors, and the interpretation of the p-value are perfect examples of this, since the right and wrong definitions differ in really subtle ways.

I found with confidence intervals the easiest way for me to understand them was this: if the experiment were repeated 100 times and a 95% confidence interval calculated each time, about 95 of those intervals would contain the population mean and about 5 would not (with alpha = 0.05).
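
To convince myself, I put together a quick sketch in Python (using NumPy and SciPy; the population mean, standard deviation and sample size are just numbers I made up) that repeats an "experiment" 100 times and counts how many of the 95% confidence intervals actually contain the true mean:

```python
# Confidence interval coverage: simulate many experiments and count how many
# 95% CIs actually contain the true population mean (all numbers are made up).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_mean, true_sd = 50.0, 10.0     # hypothetical population parameters
n, n_experiments = 30, 100
covered = 0

for _ in range(n_experiments):
    sample = rng.normal(true_mean, true_sd, size=n)
    se = sample.std(ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(0.975, df=n - 1)    # two-sided, alpha = 0.05
    lower = sample.mean() - t_crit * se
    upper = sample.mean() + t_crit * se
    covered += lower <= true_mean <= upper   # does this interval cover the truth?

print(f"{covered} of {n_experiments} intervals contained the true mean")
# Typically around 95 of the 100 intervals cover the true mean.
```
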
As for the p-value, definitions such as this one: "If the populations really have the same mean overall, what is the probability that random sampling would lead to a difference between sample means as large as (or larger than) the one you observed?" just do not make it crystal clear. Why should we be concerned with the probability of events that have not occurred? So confusing! My interpretation: a small p-value, e.g. p < 0.001, means that your observations are highly unlikely under the null hypothesis of no difference. For example, finding that the mean number of umbrellas left on New Orleans buses is significantly different from the mean number left on Texas buses = small p-value. If more umbrellas are left on buses in New Orleans then I guess we could infer that people from New Orleans are more forgetful, ha ha!! In a nutshell, a small p-value makes us reject the null hypothesis because an event has occurred that would be unlikely if the null were true.

As for Type errors, how can I make this stick?? My interpretation: you said something was different when it wasn't (Type I); you said something was the same when it wasn't (Type II). No doubt all of this will be forgotten by this evening!!
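
Here's a little simulation of the Type I and Type II error idea (again, all the means and sample sizes are invented purely for illustration): when the two sets of "buses" really have the same mean, the t-test should reject about 5% of the time (the Type I error rate), and when they really differ, the fraction of times it fails to reject is the Type II error rate.

```python
# Type I vs Type II errors by simulation (hypothetical numbers):
#   Type I  = rejecting the null when the means really are equal
#   Type II = failing to reject when the means really differ
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, alpha, n_sims = 30, 0.05, 2000

def reject_rate(mean_a, mean_b):
    """Fraction of simulated experiments where a t-test rejects H0."""
    rejections = 0
    for _ in range(n_sims):
        a = rng.normal(mean_a, 5.0, size=n)   # e.g. umbrellas left on New Orleans buses
        b = rng.normal(mean_b, 5.0, size=n)   # e.g. umbrellas left on Texas buses
        p = stats.ttest_ind(a, b).pvalue
        rejections += p < alpha
    return rejections / n_sims

power = reject_rate(10, 13)                    # a real difference of 3 umbrellas
print("Type I error rate (no real difference):", reject_rate(10, 10))  # ~0.05 = alpha
print("Type II error rate (real difference):  ", 1 - power)
```
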

P.S. Please correct my definitions if they are wrong :)

Sunday, March 11, 2007

Designing Successful Field Studies.

G&E begin this chapter by stating that many studies are initiated without a clear answer to the question of what the actual point of the study is. I know this from my own experience, where I have gone out in the field to collect data without having clearly thought of a specific hypothesis, only that I wanted to find out parasitism levels of a particular caterpillar on two plants out of curiosity. This can be problematic because, when it comes down to carrying out analyses or even writing a paper on your results, it can be difficult to figure out what your main thesis or argument actually is, which highlights the importance of always having a focused question. I guess it always seems to come down to having some kind of argument and being able to defend it with the appropriate theory. When I sat down to write my paper I really struggled with the introduction because I couldn't figure out what it was that I was actually arguing. Sometimes, though, just going out and observing what's going on can lead you to formulate a hypothesis so you can eventually conduct experimental tests, for example, trying to find out whether there are any spatial or temporal differences in the variable of interest. For me, the question was simply: does parasitism vary between the two plant species and over the season? But as G&E mention, it is difficult to discuss specific mechanisms without some sense of the spatial or temporal pattern in your data. It seems to me that it is difficult to discuss mechanisms, period, without conducting some kind of manipulative experiment.

It was interesting reading about snapshot experiments. I get the feeling that scientists, particularly some ecologists, can have a negative view of experiments conducted over the short term, but I guess it depends on what you are measuring. Surely if you are measuring short-term responses, you would only need short-term experiments. For example, I am really interested in induced responses in plants following attack by herbivores. My advisor has criticized many experiments conducted in this area as being quickly put together, and seems to place no value on them because they were conducted in a short amount of time and because the same authors keep churning out papers thanks to the short-term nature of their study system. But I wonder what is wrong with this if the response you are measuring is a rapid induced response!! Surely you wouldn't need to spend 10 years gathering field data to establish this, as long as you have enough replicates.

Monday, March 5, 2007

Monte Carlo, Parametric and Bayesian Analyses

This chapter covered Monte Carlo, parametric and Bayesian approaches to statistical analysis. Having really only worked with parametric statistics, it was interesting to read about Monte Carlo and Bayesian analysis, and I'm looking forward to hearing Nicole lecture on the latter. I'd never even heard of Bayesian statistics until last semester, when a student gave a presentation on it. As she studied systematics she was very enthusiastic about using this approach; however, our professor was a lot more sceptical about its usefulness and had all the reservations that many ecologists have, as described in one of the footnotes, namely that specifying a prior reflects subjectivity and is considered to be unscientific. However, Bayesians argue that specifying a prior makes explicit all the hidden assumptions of an investigation, and so it is a more honest and objective approach to doing science. This makes sense to me, and I can't really see how it differs that much from a meta-analysis, where you conduct your analyses on the data from many other studies; couldn't you use this information to construct priors? If your priors come from published, peer-reviewed articles and you use these as your starting point, that seems a lot more appropriate than starting with a null that states there are no differences, so I can see why Bayesians think we would make more progress using this approach. It is interesting that this argument has lasted for centuries; perhaps I am not fully understanding the frequentists' point of view.

I was also surprised to read G&E's impression of non-parametric statistics, which reflected the views of my last stats teacher, i.e. avoid them at all costs. Now that I know about Monte Carlo, I wonder why people use non-parametric analyses at all, especially as G&E state that by ranking data you may lose a lot of information in your dataset, and perhaps some of the subtleties that could be biologically meaningful to your system. Perhaps this is simply due to the difficulty of finding appropriate software?

Summary:)

Monte Carlo analysis:
1. Makes minimal assumptions about the underlying distribution of the data
2. Uses randomizations of the observed data as a basis for inference.

Assumptions:
1. The data collected represent random independent samples (common to all statistical tests)
2. The test statistic describes the pattern of interest.
3. The randomization creates an appropriate null distribution for the question.

Advantages:
1. Does not require that data are sampled from a specified probability distribution
2. You can tailor your statistical test to particular questions and datasets, so you are not forced to use conventional tests that may not be the most powerful for your analysis

Disadvantages:
1. It is computer intensive and is not included in most traditional statistical packages.
2. Different analyses of the same dataset can yield slightly different results, which does not occur with parametric analyses.
3. Whereas a parametric analysis assumes a specified distribution and allows inferences about the underlying parent population from which the data were sampled, with Monte Carlo the inferences are limited to the specific data that have been collected. If the sample is representative of the parent population, then the results can be generalized with caution.
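
Just to make sure I follow the logic, here is a bare-bones randomization test in Python on some made-up parasitism counts for two plant species (the numbers are invented, not my real data); the p-value is simply the fraction of randomizations that give a difference at least as large as the observed one.

```python
# Monte Carlo randomization test: is the difference in mean parasitism between
# two plant species bigger than expected if plant identity did not matter?
import numpy as np

rng = np.random.default_rng(7)
# hypothetical parasitism counts per plant, just for illustration
plant_a = np.array([4, 7, 5, 9, 6, 8, 5, 7])
plant_b = np.array([3, 2, 5, 4, 3, 4, 2, 5])

observed = plant_a.mean() - plant_b.mean()
pooled = np.concatenate([plant_a, plant_b])
n_a = len(plant_a)

n_randomizations = 10000
count_as_extreme = 0
for _ in range(n_randomizations):
    shuffled = rng.permutation(pooled)          # break any association with plant species
    diff = shuffled[:n_a].mean() - shuffled[n_a:].mean()
    count_as_extreme += abs(diff) >= abs(observed)

p_value = count_as_extreme / n_randomizations
print(f"observed difference = {observed:.2f}, randomization p-value = {p_value:.4f}")
```
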

Parametric analysis:
1. Assumes that the data were sampled from a distribution of known form
2. Estimates the parameters of the distribution from the data
3. Estimates probabilities from observed frequencies of events
4. Uses these probabilities as a basis for inference (frequentist inference).

Advantages:
Uses a powerful framework based on known probability distributions.
Greater power in making general inferences to the parent population.

Disadvantages:
May not be as powerful as sophisticated Monte Carlo models that are tailored to particular questions or data. In contrast to Bayesian analysis, parametric analysis rarely incorporates a priori information or results from other experiments. Also, parametric analyses are often robust to violations of the distributional assumption thanks to the Central Limit Theorem??? But why?? I'm not really sure I understand this, although it's probably obvious!
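
Here is a small simulation I tried in order to convince myself about the Central Limit Theorem point: even though the raw data come from a strongly skewed (exponential) distribution, the means of samples of 30 are far less skewed, which I assume is why tests based on means can hold up reasonably well even when the raw data are not normal.

```python
# Central Limit Theorem by simulation: the parent distribution is strongly
# skewed (exponential), but the distribution of SAMPLE MEANS is close to normal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, n_samples = 30, 5000

# 5000 samples of size 30, then take the mean of each sample
sample_means = rng.exponential(scale=2.0, size=(n_samples, n)).mean(axis=1)

print("skewness of raw draws:       ", stats.skew(rng.exponential(2.0, size=100_000)))  # ~2
print("skewness of the sample means:", stats.skew(sample_means))                        # much closer to 0
print("sd of sample means:", sample_means.std(ddof=1), "vs sigma/sqrt(n):", 2.0 / np.sqrt(n))
```
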

Bayesian analysis:

1. Assumes that the data were sampled from a distribution of known form
2. Estimates parameters not only from the data but also from prior knowledge
3. Assigns probabilities to these parameters
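
And to see how a prior might actually get used, here is a toy example (every number is invented) of estimating a parasitism rate with a Beta prior based loosely on "previous studies", updated with some new field data via the standard Beta-binomial conjugate update.

```python
# A tiny Bayesian sketch (hypothetical numbers): estimating the parasitism rate
# of caterpillars on one plant species, with a prior built from earlier studies.
from scipy import stats

# Prior: suppose previous studies suggest roughly 30% parasitism -> Beta(3, 7)
prior_a, prior_b = 3, 7

# New field data (made up): 12 parasitized out of 40 caterpillars collected
parasitized, total = 12, 40

# Beta prior + binomial data -> Beta posterior (conjugacy)
post_a = prior_a + parasitized
post_b = prior_b + (total - parasitized)
posterior = stats.beta(post_a, post_b)

print("posterior mean parasitism rate:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))
```
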