Basic Statistics in Sage
There is no need to spend big bucks on expensive statistical software packages such as SPSS or SAS: the R programming language will do it all for you, and of course Sage has a neat way to interact with it. Let me show you its capabilities with an example taken from one of the many textbooks used to teach basic statistics to researchers in the social sciences (sorry, no names, unless you want to pay for the publicity!).
Estimating Mean Weight Change for Anorexic Girls
The example comes from an experimental study that compared various treatments for young girls suffering from anorexia, an eating disorder. For each girl, weight was measured before and after a fixed period of treatment. The variable of interest was the change in weight; that is, weight at the end of the study minus weight at the beginning of the study. The change in weight was positive if the girl gained weight, and negative if she lost weight. The treatments were designed to aid weight gain. The weight changes for the 29 girls undergoing the cognitive behavioral treatment were
Let us examine the data with the statistical tools that R offers us. We start by invoking this programming language in Sage, loading the data, and issuing basic commands to explore their properties:
Note the command quartz(): it tells R on Mac OS X that we want to plot to a window. There are similar commands to plot in Windows [windows()], in X11 [X11()], or even to save to a file [png(), jpeg(), bmp(), tiff(), postscript(), …]. Make sure to read the help on their usage.
Let us now try a more complex task: a confidence interval for the population mean. Although the histogram does not suggest a normal population distribution, we are confident that this method will give us a trustworthy result, because computing confidence intervals with the t distribution is robust to departures from normality, especially when the sample size is larger than fifteen:
Note the information offered by the output: with 95% confidence, we infer that the reported interval contains the population mean weight change. How was this interval calculated? Since 29 girls received the treatment, there are 28 degrees of freedom. The interval is the sample mean weight change plus or minus the corresponding t critical value times the estimated standard error. Can you infer at this point the values of the standard deviation and estimated standard error?
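To make the arithmetic behind such an interval concrete, here is a minimal sketch in plain Python rather than R. The sample values are invented for illustration (they are not the study's data), the function names are hypothetical, and the critical value is typed in by hand because Python's standard library has no t distribution. The sketch builds the interval as mean ± t × se and then works backwards from its endpoints to the standard error and the standard deviation.

```python
import math
import statistics


def t_confidence_interval(sample, t_crit):
    """Return (lower, upper) for the mean, given a t critical value."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)        # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(n)               # estimated standard error of the mean
    margin = t_crit * se
    return mean - margin, mean + margin


def recover_se_and_sd(lower, upper, n, t_crit):
    """Work backwards from an interval to the standard error and deviation."""
    se = (upper - lower) / (2 * t_crit)  # half-width divided by the critical value
    sd = se * math.sqrt(n)
    return se, sd


# Invented sample, NOT the anorexia data: 5 values, hence 4 degrees of freedom.
toy_sample = [1.0, 2.0, 3.0, 4.0, 5.0]
T_CRIT_95_DF4 = 2.776                    # two-sided 95% critical value for 4 df

lower, upper = t_confidence_interval(toy_sample, T_CRIT_95_DF4)
se, sd = recover_se_and_sd(lower, upper, len(toy_sample), T_CRIT_95_DF4)
print(lower, upper, se, sd)
```

For the study itself one would use n = 29 and the critical value for 28 degrees of freedom, roughly 2.048 at the 95% level.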
To be safer in estimating the population mean weight change, we could use instead a 99% confidence interval.
Note that at this confidence level, the interval contains zero. This tells us that it is plausible (at this level) that the therapy may not result in any change in the mean weight. Interesting, right?
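The effect of the confidence level can be sketched in plain Python. The mean and standard error below are placeholders chosen for illustration, not the values from the study; only the two critical values for 28 degrees of freedom (about 2.048 at 95% and 2.763 at 99%) are standard table entries. The helper names are hypothetical.

```python
def interval(mean, se, t_crit):
    """Return (lower, upper) = mean -/+ t_crit * se."""
    margin = t_crit * se
    return mean - margin, mean + margin


def contains_zero(bounds):
    """True when zero lies inside the closed interval."""
    lower, upper = bounds
    return lower <= 0.0 <= upper


# Two-sided t critical values for 28 degrees of freedom (table values).
T_CRIT_95_DF28 = 2.048
T_CRIT_99_DF28 = 2.763

# Placeholder summary statistics, NOT taken from the study.
placeholder_mean = 3.0
placeholder_se = 1.4

ci95 = interval(placeholder_mean, placeholder_se, T_CRIT_95_DF28)
ci99 = interval(placeholder_mean, placeholder_se, T_CRIT_99_DF28)
print("95%:", ci95, "contains zero:", contains_zero(ci95))
print("99%:", ci99, "contains zero:", contains_zero(ci99))
```

Whatever the data, the 99% interval is wider than the 95% one by the ratio of the critical values, about 1.35 for 28 degrees of freedom, which is why it can reach zero when the 95% interval does not.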
Let us go a little further in this direction. Let $\mu$ denote the population mean change in weight for this cognitive behavioral treatment. If the treatment has beneficial effects, as expected, then $\mu$ must be positive. To test for no treatment effect vs. positive mean change, we test $H_0: \mu = 0$ against $H_a: \mu > 0$. Note that this is a one-sided alternative hypothesis, so let us modify our call of the t.test in R accordingly:
It seems that the treatment has an effect!
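The statistic behind such a one-sided test can be sketched in plain Python. This is only an illustration under stated assumptions: the sample is invented (it is not the study's data), the function name is hypothetical, and the decision is made by comparing against a tabulated critical value rather than by computing a p-value, since the standard library has no t distribution.

```python
import math
import statistics


def one_sample_t_statistic(sample, mu0=0.0):
    """t = (sample mean - mu0) / estimated standard error."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / se


# Invented sample, NOT the anorexia data: 5 values, hence 4 degrees of freedom.
toy_sample = [1.0, 2.0, 3.0, 4.0, 5.0]
T_CRIT_ONE_SIDED_05_DF4 = 2.132          # upper 5% point of t with 4 df

t_stat = one_sample_t_statistic(toy_sample)
# One-sided alternative (mean > 0): reject only for large positive t.
reject_h0 = t_stat > T_CRIT_ONE_SIDED_05_DF4
print(t_stat, reject_h0)
```

With 29 observations the comparison would instead use the upper 5% point for 28 degrees of freedom, about 1.701.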