STATISTICS: A BAYESIAN PERSPECTIVE
Donald A. Berry
Duke University
July 1995


                        PREFACE

This is an introduction to statistics for general students. It differs from
standard texts in that it takes a Bayesian perspective. It views statistics as a
critical tool of science and so it has a strong scientific overtone. While my
outlook is conventional in many ways, its foundation is Bayesian. There are
several advantages of the Bayesian perspective:

#   It allows for direct probability statements, such as the probability that an
experimental procedure is more effective than a standard procedure.

#   It allows for calculating probabilities of future observations.

#   It allows for incorporating evidence from previous experience and previous
experiments into overall conclusions.

#   It is subjective. This is a standard objection to the Bayesian approach;
different people reach different conclusions from the same experimental results.
There would be comfort in giving an answer that others would also give. But
differences of opinion are the norm in science and an approach that explicitly
recognizes such differences is realistic.

Despite differences in focus between the standard and Bayesian approaches, there
are more similarities than differences. Many of the principles illustrated in
the examples and exercises of this text are not peculiar to either approach.


Distinguishing Aspects

In addition to those mentioned above, this text has several other distinguishing
features:

#   It is about statistical ideas, not solely about methods. Ideas are developed
using examples, many of which are case studies in which data from experiments
are brought to bear on scientific questions. However, statistics is not merely a
set of methods for analyzing data. It is also a way of integrating data into
the scientific process.

#   My emphasis is on sound principles of experimentation and on learning
through observation. The process of learning requires calculation. This in turn
leads to a secondary emphasis on methodology. Calculations involved in
developing and motivating general settings can be tedious. I have endeavored to
keep difficulties with this aspect of the learning process to a minimum.

#   Statistics involves learning from data. However, the development in this
text is not one of "data analysis" per se. Not all data sets are fodder for
statistical methods. There must be a well-defined question being addressed.

#   Most of the examples and exercises deal with substantive scientific issues
as well as with the scientific method. In my view, applying statistical ideas
requires familiarity with the substantive scientific issues and should not deal
merely with numbers. So it is inevitable that readers will learn some science as
well as some statistics.

#   In keeping with the subjective nature of the Bayesian approach, I write in
the first person and draw my own conclusions in the various examples.

#   I write with the student in mind. For example, I give detailed discussions
and redundant examples when the concepts are difficult for students. And I
sacrifice numerical accuracy for understanding.


Examples, Exercises and Real Data

Almost all examples and exercises in this text involve real settings and real
data. The examples come from the sciences and from sports, with many involving
medicine. A reason for the number of medical examples is that health issues are
usually interesting for students. Also, medical studies tend to be well designed
and easy to report. Finally, data in medicine are readily available. Nearly all
the examples and exercises that involve artificial settings occur in the two
chapters on probability.

The examples and exercises are chosen to be interesting and relevant for
students. They are also chosen to display the effective and fruitful use of
statistical thinking and methods.

Many examples and exercises appear more than once in the text. The second time
usually addresses an improvement on the first -- sometimes the analysis is more
appropriate and sometimes the calculations are easier. Occasionally, an analysis
is wrong; my intention is to show you what some people would do, saying why it
is wrong. When the same problem appears a second time I repeat the information
given earlier so as to minimize your having to leaf back through the book. But I
give the earlier reference so you will know where you worked on the problem
before.


Using the Computer

This text does not require the use of computers. However, computer programs that
make many of the calculations described in this text are included on a 3.5" disk
that comes with this text. These were written by James Albert, who has also
written appendices for many of the chapters; these demonstrate the use of the
computer programs for the chapter in question. For instructors who wish to use
computers more intensively in their course, a companion Bayesian software
package and a guide to its use -- "Bayesian Analysis Using Computers", by James
Albert, 1995 -- have been published separately and are available from Duxbury
Press. This software refers specifically to examples and exercises in this text.
It uses some methodology that is more advanced than the level of this text.

The enclosed 3.5" disk also contains all the datasets used in the text in
electronic form.


Mathematical Prerequisites

The mathematical level of this book is minimal: only an exposure to high school
algebra is expected. Exercises in several chapters require formulas. The
latter are usually motivated by calculations and arguments that require some
familiarity with numerical operations.


Coverage

The first five chapters present basic concepts. Chapter 1 is an introduction to
scientific inference and sampling. Chapter 2 describes data displays and data
summaries; it serves to introduce statistical ideas and to provide insights into
statistical problems. Chapter 3 discusses experimental design and the
limitations that poor designs place on inference. Chapters 4 and 5 introduce
probability and conditional probability, which are used extensively in the
remainder of the text. Chapters 6 through 9 consider statistical problems
dealing with proportions, with Chapters 8 and 9 addressing the comparison of two
proportions. Chapters 10 through 13 deal with general populations. Chapter 12
considers the comparison of means of two or more general populations, making
assumptions about the population shape, and Chapter 13 relaxes those
assumptions. Chapter 14 describes regression analysis.

Chapters 1 through 14 contain more than enough material for a one-semester
course. A shorter course may skip Chapters 2, 3 and 13 (except for means and
standard deviations in Section 2.6). Chapters 4 and 5 are essential (except for
betting odds in Section 4.3), and exposure to some of the material in Chapters 6
through 9 is required for Chapters 10 through 13.

Chapters 6, 8 and 10 proceed from first principles to develop the ideas and
methods that motivate the simpler yet more sophisticated methods of Chapters 7,
9, 11 and 12. The smoother handling of calculations in these latter chapters
comes at the cost of some relatively tedious work in Chapters 6, 8 and 10. To
keep the course moving smoothly, instructors should assign a minimal amount of
homework from these even-numbered chapters.


Alternative Suggested Uses for This Book

In addition to its main aim as a principal text for beginners, this book can
serve other roles as well. It can be used in tandem with a standard text by
instructors who wish to expose their students to an alternative viewpoint. It
can also serve this function as a supplementary text in advanced statistics
courses. In addition, research workers who have used standard statistics, but
who want to exploit the benefits of the Bayesian approach, will find the
development easy to follow and easy to relate to their current statistical
knowledge.


References

There are several advanced Bayesian texts that are excellent future references
for students in this course. All require more mathematical background than is
assumed in this book, and in particular, all require the calculus. They are
listed below approximately in increasing order of mathematical level.

P.M. Lee. Bayesian Statistics: An Introduction. London: Charles Griffin, 1989.

M.H. DeGroot. Probability and Statistics, 2d Ed. Reading, Massachusetts:
Addison-Wesley, 1986.

S.J. Press. Bayesian Statistics. New York: John Wiley & Sons, 1989.

G.E.P. Box, G.C. Tiao. Bayesian Inference in Statistical Analysis. New York:
John Wiley & Sons, 1973.

J.M. Bernardo, A.F.M. Smith. Bayesian Theory. Chichester: John Wiley & Sons,
1994.

J.O. Berger. Statistical Decision Theory and Bayesian Analysis. New York:
Springer-Verlag, 1985.