Tuesday, April 10, 2007
ANOVA Presentation
Here is the link to the ANOVA presentation that I gave last semester. Several other talks on various statistical procedures are also available on this webpage. ANOVA is under my surname, Connahs. http://www.tulane.edu/~ldyer/classes/406/634.htm
Friday, March 16, 2007
Confidence Intervals and all that other confusing stuff!!
Thank goodness we are reviewing standard deviation, standard error, confidence intervals and the like. For some reason, no matter how many times I have gone over this information in the past, it always seems to disappear into some black hole in my brain; it's so annoying. Confidence intervals, Type I and II errors and interpretation of the p-value are perfect examples of this; all of them seem to have really subtle differences between how you define what is right and what is wrong.
I found with confidence intervals the easiest way for me to understand them was this: if the experiment were repeated 100 times and a 95% confidence interval calculated each time (alpha = 0.05), about 95 of those intervals would contain the population mean and about 5 wouldn't.
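To convince myself of this, I tried a little simulation (a sketch of my own, not from any book; the population mean, SD and sample size are all invented):

```python
# Simulate repeated "experiments" from a made-up population and count
# how often the 95% confidence interval traps the true mean.
import math
import random
import statistics

random.seed(1)

POP_MEAN, POP_SD = 50.0, 10.0  # invented population
N, TRIALS = 30, 1000           # sample size, number of repeated experiments
T_CRIT = 2.045                 # two-sided 5% t critical value for 29 df

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)  # standard error of the mean
    # the 95% confidence interval for this one "experiment"
    if mean - T_CRIT * se <= POP_MEAN <= mean + T_CRIT * se:
        covered += 1

print(covered / TRIALS)  # should come out close to 0.95
```

Out of 1000 repeated "experiments", roughly 950 of the intervals contain the true mean, which is exactly the long-run interpretation above.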
As for the p-value, definitions such as this one: "If the populations really have the same mean overall, what is the probability that random sampling would lead to a difference between sample means as large as (or larger than) you observed?" just do not make it crystal clear. Why should we be concerned with the probability of events that have not occurred? So confusing! My interpretation: a small p-value, e.g. p < 0.001, means that your observations are highly unlikely under the null hypothesis of no difference. For example, the mean number of umbrellas left on New Orleans buses is significantly different from the mean number left on Texas buses = small p-value. If more umbrellas are left on buses in New Orleans then I guess we could infer that people from New Orleans are more forgetful, ha ha!! In a nutshell, a small p-value makes us reject the null hypothesis because an event has occurred that would be unlikely were the null true. As for Type errors, how can I make this stick?? My interpretation: you said something was different when it wasn't = Type I; you said something was the same when it wasn't = Type II. No doubt all this will be forgotten by this evening!!
P.S. Please correct my definitions if they are wrong :)
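While I'm at it, here is a toy version of my umbrella example (all numbers invented) showing the Type I error rate in action: when the null hypothesis really IS true, "significant" results still turn up about 5% of the time.

```python
# Both "cities" are sampled from the SAME population, so the null
# hypothesis is true; any rejection is a Type I error ("you said
# something was different when it wasn't").
import math
import random
import statistics

random.seed(2)

N, TRIALS = 25, 2000
T_CRIT = 2.01  # approx two-sided 5% t critical value for 48 df

def t_stat(a, b):
    """Two-sample t statistic for equal-sized groups."""
    se = math.sqrt(statistics.variance(a) / len(a) +
                   statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

false_positives = 0
for _ in range(TRIALS):
    new_orleans = [random.gauss(3.0, 1.0) for _ in range(N)]
    texas = [random.gauss(3.0, 1.0) for _ in range(N)]
    if abs(t_stat(new_orleans, texas)) > T_CRIT:
        false_positives += 1  # a Type I error

print(false_positives / TRIALS)  # should come out near alpha = 0.05
```

So with alpha = 0.05 we knowingly accept that about 1 in 20 "discoveries" under a true null will be false alarms.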
Sunday, March 11, 2007
Designing Successful Field Studies.
G&E begin this chapter by stating that many studies are initiated without a clear answer to the question of what the actual point of the study is. I know this from my own experience, where I have gone out in the field to collect data without having clearly thought of a specific hypothesis, only that I wanted to find out parasitism levels of a particular caterpillar on two plants, just out of curiosity. This can be problematic in the sense that when it comes down to carrying out analyses, or even writing a paper on your results, it can be difficult to figure out what your main thesis or argument actually is, which highlights the importance of always having a focused question. I guess it always seems to come down to having some kind of argument and being able to defend it with the appropriate theory. When I sat down to write my paper I really struggled with the introduction because I couldn't figure out what it was that I was actually arguing. Sometimes, though, just going out and observing what's going on can lead you to formulate a hypothesis so you can eventually conduct experimental tests, for example, trying to find out whether there are any spatial or temporal differences in the variable of interest. For me, my question was simply: does parasitism vary between two plant species and over the season? But as G&E mention, it is difficult to discuss specific mechanisms without some sense of the spatial or temporal pattern in your data. It seems to me that it is difficult to discuss mechanisms, period, without conducting some kind of manipulative experiment.
It was interesting reading about snapshot experiments. I get the feeling that scientists, particularly some ecologists, can have a negative view of experiments conducted over the short term, but I guess it depends on what you are measuring. Surely if you are measuring short-term responses, you would only need short-term experiments. For example, I am really interested in induced responses in plants following attack by herbivores. My advisor has criticized many experiments conducted in this area as being quickly put together, and seems to place no value on them because they were conducted in a short amount of time and because the same authors keep churning out papers thanks to the short-term nature of their study system. But I wonder what is wrong with this if the response you are measuring is a rapid induced response!! Surely you wouldn't need to spend 10 years gathering field data to establish this, as long as you have enough replicates.
Monday, March 5, 2007
Monte Carlo, Parametric and Bayesian Analyses
This chapter covered Monte Carlo, parametric and Bayesian approaches to statistical analysis. Having really only worked with parametric statistics, it was interesting to read about Monte Carlo and Bayesian analysis, and I'm looking forward to hearing Nicole lecture on the latter. I'd never even heard of Bayesian statistics until last semester when one student gave a presentation. As she studied systematics she was very enthusiastic about using this approach; however, our professor was a lot more sceptical about its usefulness and had all the reservations that many ecologists have, which were written up in one of the footnotes, namely that specifying a prior reflects subjectivity and is considered to be unscientific. However, Bayesians argue that specifying a prior makes explicit all the hidden assumptions of an investigation, and so it is a more honest and objective approach to doing science. This makes sense to me, and I can't really see how it differs that much from a meta-analysis, where you conduct your analyses on the data from many other studies; couldn't you use this information to construct priors? If your priors come from published peer-reviewed articles, and you use them as your null, this seems a lot more appropriate than starting with a null that states there are no differences, and I can see why Bayesians think that we would make more progress using this approach. It is interesting that this argument has lasted for centuries; perhaps I am not fully understanding the frequentists' point of view here. I was also surprised to read G&E's impression of non-parametric statistics, which reflected the views of my last stats teacher, i.e. avoid them at all costs.
Now that I know about Monte Carlo, I wonder why people use non-parametric analyses, especially as G&E state that by ranking data you may lose a lot of information in your dataset, and perhaps some of the subtleties that could be biologically meaningful to your system. Perhaps this is simply due to the difficulty of finding appropriate software?
Summary:)
Monte Carlo analysis:
1. Makes minimal assumptions about the underlying distribution of the data
2. Uses randomizations of the observed data as a basis for inference.
Assumptions:
1. The data collected represent random independent samples (common to all statistical tests)
2. The test statistic describes the pattern of interest.
3. The randomization creates an appropriate null distribution for the question.
Advantages:
1. Does not require that data are sampled from a specified probability distribution
2. You can tailor your statistical test to particular questions and datasets, so you are not forced to use conventional tests that may not be the most powerful for your analysis
Disadvantages:
1. It is computer intensive and is not included in most traditional statistical packages.
2. Different analyses of the same dataset can yield slightly different results, which does not occur with parametric analyses. A parametric analysis assumes a specified distribution and allows for inferences about the underlying parent population from which the data were sampled; with Monte Carlo, inferences are limited to the specific data that have been collected. If the sample is representative of the parent population, then the results can be generalized with caution.
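The Monte Carlo idea above can be sketched as a simple permutation test (all data invented for illustration): shuffle the group labels many times to build a null distribution for the difference in means, then ask how often a shuffled difference is as extreme as the observed one.

```python
# Permutation (randomization) test: under the null, group labels are
# arbitrary, so shuffling them generates the null distribution of the
# difference in means. No parametric distribution is assumed.
import random
import statistics

random.seed(3)

group_a = [4.1, 5.3, 6.0, 5.5, 4.8, 6.2, 5.9, 5.1]  # invented data
group_b = [3.2, 4.0, 3.8, 4.5, 3.5, 4.2, 3.9, 4.1]

observed = statistics.mean(group_a) - statistics.mean(group_b)
pooled = group_a + group_b

REPS = 5000
extreme = 0
for _ in range(REPS):
    random.shuffle(pooled)  # randomize the labels
    diff = statistics.mean(pooled[:8]) - statistics.mean(pooled[8:])
    if abs(diff) >= abs(observed):
        extreme += 1

p_value = extreme / REPS
print(p_value)
```

The p-value is just the fraction of randomizations at least as extreme as what we saw, which is exactly assumption 3 above: the randomization has to create an appropriate null distribution for the question.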
Parametric analysis:
1. Assumes that the data were sampled from a distribution of known form
2. Estimates the parameters of the distribution from the data
3. Estimates probabilities from observed frequencies of events
4. Uses these probabilities as a basis for inference (frequentist inference).
Advantages:
Uses a powerful framework based on known probability distributions.
Greater power in making general inferences to the parent population.
Disadvantages:
May not be as powerful as sophisticated Monte Carlo models that are tailored to particular questions or data. In contrast to Bayesian analysis, parametric analysis rarely incorporates a priori information or results from other experiments. Also, parametric analyses are often robust to violations of the distributional assumption thanks to the Central Limit Theorem??? But why?? I'm not really sure I understand this, although it's probably obvious! (I think the idea is that the means of large samples tend towards a normal distribution whatever the parent distribution looks like, so tests based on means still behave well.)
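To try to answer my own Central Limit Theorem question, here is a little sketch (nothing from G&E, data simulated): sample from a strongly skewed distribution, Exponential(1), and look at where the sample means end up.

```python
# Draw 2000 samples of size 50 from a very skewed distribution and
# compute the mean of each sample. Despite the skew of the parent
# distribution, the sample means cluster symmetrically around the
# true mean, which is why normal-theory tests on means are so robust.
import random
import statistics

random.seed(4)

TRUE_MEAN = 1.0  # the mean of an Exponential(1) distribution
means = [
    statistics.mean(random.expovariate(1.0) for _ in range(50))
    for _ in range(2000)
]

print(round(statistics.mean(means), 2))  # close to the true mean, 1.0
below = sum(m < TRUE_MEAN for m in means) / len(means)
print(round(below, 2))  # roughly half the sample means fall below it
```

The parent distribution is heavily right-skewed, but the distribution of the means is nearly symmetric; the bigger the samples, the closer to normal it gets.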
Bayesian analysis:
1. Assumes that the data were sampled from a distribution of known form
2. Estimates parameters not only from the data but also from prior knowledge
3. Assigns probabilities to these parameters
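Point 2 of the Bayesian summary (combining prior knowledge with data) can be sketched with a toy beta-binomial example. All the numbers are made up: suppose published studies suggested parasitism rates around 30%, and a new field sample found 12 parasitized caterpillars out of 20. With a Beta prior and binomial data, the posterior is simply Beta(alpha + successes, beta + failures).

```python
# Conjugate beta-binomial updating: the posterior blends the prior
# (centred on 0.30) with the observed data (12/20 = 0.60).
PRIOR_ALPHA, PRIOR_BETA = 3.0, 7.0   # prior mean = 3 / (3 + 7) = 0.30
successes, failures = 12, 8          # the (invented) new field data

post_alpha = PRIOR_ALPHA + successes
post_beta = PRIOR_BETA + failures
posterior_mean = post_alpha / (post_alpha + post_beta)

print(posterior_mean)  # 0.5, pulled between the prior and the data
```

This is what I meant about meta-analysis: the prior could come straight from published peer-reviewed estimates, and the posterior is a compromise between those and the new data, weighted by how much information each carries.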
Friday, February 9, 2007
Marginal distribution of two-way tables
Well, I have been working through this section in the hope that it will enlighten me on performing chi-square analyses when we get to them later in the book. However, I have been getting a lot of error messages from code typed directly from the book, which has been pretty frustrating. Has anyone worked through the example in section 3.1.3? When I type in x, for some reason it will not display the entire array, so when I try to calculate the sums for the rows I get error messages. I just wondered if anyone else has had this problem, or whether it is just me, ARGH!!!!
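Since I can't get the book's code to run, here is the same marginal-sum idea sketched in Python instead (the counts are just illustrative, not the book's dataset): the row and column marginals of a two-way table are simply the sums across each dimension.

```python
# A two-way table of counts as a list of rows; the marginal
# distributions are the row sums and column sums.
table = [
    [652, 1537, 598, 242],
    [36, 46, 38, 21],
    [218, 327, 106, 67],
]

row_totals = [sum(row) for row in table]         # sum across each row
col_totals = [sum(col) for col in zip(*table)]   # sum down each column
grand_total = sum(row_totals)

print(row_totals)
print(col_totals)
print(grand_total)
```

These marginals are exactly what a chi-square test uses later on: the expected count in each cell is (row total * column total) / grand total.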
Wednesday, January 31, 2007
Oops, I originally posted this as a comment to myself!
I guess I will get the hang of this; I'm a bit of a technophobe in some ways. Thanks for the comments! Well, encapsulation is an immune response from the caterpillar that is triggered by the presence of a foreign object in the body. The way some caterpillars respond is by surrounding the object with layers of cells that eventually harden and melanize, thereby preventing the parasitoid from developing. So, the prediction is that caterpillars fed toxic diets are less able to encapsulate the eggs, which benefits the parasitoid by allowing it to develop. This is PhD dissertation research by someone in the lab whom I helped over the summer, so we are still waiting to find out the results. For the feeding efficiency part, the data will be analyzed using ANOVA and ANCOVA to examine whether the diets had any effect on pupal weight. So the dataset is basically the replicates (caterpillars), the treatments (different toxins), and the initial caterpillar and pupal weights. We will use ANOVA to test for differences between the treatments, and ANCOVA to remove any nuisance effects of differences in the initial starting weights on the final pupal weights. For the encapsulation data, hopefully some of which will be done by the end of the semester, we will dissect the caterpillars and measure the degree of melanization using imaging software. This will provide a measurement based on the color, called an r value. The stats test would then be an ANOVA to test for differences in the r values between treatments... well, these are my thoughts so far, who knows what other stats may creep in there!!
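Here is a rough sketch of what the planned one-way ANOVA boils down to, with invented pupal weights (in mg) for three hypothetical diet treatments, since the real data aren't in yet:

```python
# One-way ANOVA by hand: partition the variation into between-group
# and within-group sums of squares and form the F ratio.
import statistics

groups = {  # invented pupal weights (mg) per diet treatment
    "control": [312.0, 298.5, 305.1, 320.4, 310.2],
    "toxin_a": [280.3, 275.9, 290.1, 268.7, 284.2],
    "toxin_b": [260.4, 255.2, 270.8, 248.9, 262.3],
}

all_values = [x for g in groups.values() for x in g]
grand_mean = statistics.mean(all_values)

# variation of the group means around the grand mean
ss_between = sum(
    len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups.values()
)
# variation of the observations around their own group means
ss_within = sum(
    (x - statistics.mean(g)) ** 2 for g in groups.values() for x in g
)

df_between = len(groups) - 1                # 2
df_within = len(all_values) - len(groups)   # 12
F = (ss_between / df_between) / (ss_within / df_within)
print(round(F, 1))
```

A large F (compared to the critical F value for 2 and 12 degrees of freedom, about 3.9 at alpha = 0.05) would mean the treatment means differ more than expected from within-group noise alone; the ANCOVA step would additionally adjust for initial weight as a covariate.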
Thursday, January 25, 2007
Independent Research
For my independent project I am going to analyze some data that we are gathering in our lab for a project looking at the effects of different plant toxins on caterpillar development. One component of these experiments has also been to examine the immune response of caterpillars fed on varying toxic diets to encapsulation of parasitoid eggs. To do this, we have injected caterpillars fed on various diets with minute glass beads (a proxy for eggs), and we are going to see if the encapsulation response becomes compromised for caterpillars fed on toxic diets. Hopefully, we will have some really interesting results; I'm just hoping the analysis isn't going to be too intimidating!!