STATISTICS: A BAYESIAN PERSPECTIVE
Donald A. Berry
Duke University
July 1995

PREFACE

This is an introduction to statistics for general students. It differs from standard texts in that it takes a Bayesian perspective. It views statistics as a critical tool of science and so it has a strong scientific overtone. While my outlook is conventional in many ways, its foundation is Bayesian. There are several advantages of the Bayesian perspective:

# It allows for direct probability statements, such as the probability that an experimental procedure is more effective than a standard procedure.
# It allows for calculating probabilities of future observations.
# It allows for incorporating evidence from previous experience and previous experiments into overall conclusions.
# It is subjective. This is a standard objection to the Bayesian approach: different people reach different conclusions from the same experimental results. There would be comfort in giving an answer that others would also give. But differences of opinion are the norm in science and an approach that explicitly recognizes such differences is realistic.

Despite differences in focus between the standard and Bayesian approaches, there are more similarities than differences. Many of the principles illustrated in the examples and exercises of this text are not peculiar to either approach.

Distinguishing Aspects

In addition to those mentioned above, this text has several other distinguishing features:

# It is about statistical ideas, not solely about methods. Ideas are developed using examples, many of which are case studies in which data from experiments are brought to bear on scientific questions. However, statistics is not merely a set of methods for analyzing data. It is also a way of integrating data into the scientific process.
# My emphasis is on sound principles of experimentation and on learning through observation. The process of learning requires calculation.
This in turn leads to a secondary emphasis on methodology. Calculations involved in developing and motivating general settings can be tedious. I have endeavored to keep difficulties with this aspect of the learning process to a minimum.
# Statistics involves learning from data. However, the development in this text is not one of "data analysis" per se. Not all data sets are fodder for statistical methods. There must be a well-defined question being addressed.
# Most of the examples and exercises deal with substantive scientific issues as well as with the scientific method. In my view, applying statistical ideas requires familiarity with the substantive scientific issues and should not deal merely with numbers. So it is inevitable that readers will learn some science as well as some statistics.
# In keeping with the subjective nature of the Bayesian approach, I write in the first person and draw my own conclusions in the various examples.
# I write with the student in mind, for example, giving detailed discussions and redundant examples when the concepts are difficult for students. And I sacrifice numerical accuracy for understanding.

Examples, Exercises and Real Data

Almost all examples and exercises in this text involve real settings and real data. The examples come from the sciences and from sports, with many involving medicine. One reason for the number of medical examples is that health issues are usually interesting for students. Also, medical studies tend to be well designed and easy to report. Finally, data in medicine are readily available. Nearly all the examples and exercises that involve artificial settings occur in the two chapters on probability.

The examples and exercises are chosen to be interesting and relevant for students. They are also chosen to display the effective and fruitful use of statistical thinking and methods.

Many examples and exercises appear more than once in the text.
The second time usually addresses an improvement on the first: sometimes the analysis is more appropriate and sometimes the calculations are easier. Occasionally, an analysis is wrong; my intention is to show you what some people would do and to say why it is wrong. When the same problem appears a second time I repeat the information given earlier so as to minimize your having to leaf back through the book. But I give the earlier reference so you will know where you worked on the problem before.

Using the Computer

This text does not require the use of computers. However, computer programs that make many of the calculations described in this text are included on a 3.5" disk that comes with this text. These were written by James Albert, who has also written appendices for many of the chapters; these demonstrate the use of the computer programs for the chapter in question.

For instructors who wish to use computers more intensively in their course, a companion Bayesian software package and a guide to its use ("Bayesian Analysis Using Computers", by James Albert, 1995) have been published separately and are available from Duxbury Press. This software refers specifically to examples and exercises in this text. It uses some methodology that is more advanced than the level of this text.

The enclosed 3.5" disk also contains all the datasets used in the text in electronic form.

Mathematical Prerequisites

The mathematical level of this book is minimal, with only an exposure to high school algebra expected. Exercises in several chapters require formulas. The latter are usually motivated by calculations and arguments that require some familiarity with numerical operations.

Coverage

The first five chapters present basic concepts. Chapter 1 is an introduction to scientific inference and sampling. Chapter 2 describes data displays and data summaries; it serves to introduce statistical ideas and to provide insights into statistical problems.
Chapter 3 discusses experimental design and the limitations that poor designs place on inference. Chapters 4 and 5 introduce probability and conditional probability, which are used extensively in the remainder of the text. Chapters 6 through 9 consider statistical problems dealing with proportions, with Chapters 8 and 9 addressing the comparison of two proportions. Chapters 10 through 13 deal with general populations. Chapter 12 considers the comparison of means of two or more general populations, making assumptions about the population shape, and Chapter 13 relaxes those assumptions. Chapter 14 describes regression analysis.

Chapters 1 through 14 contain more than enough material for a one-semester course. A shorter course may skip Chapters 2, 3 and 13 (except for means and standard deviations in Section 2.6). Chapters 4 and 5 are essential (except for betting odds in Section 4.3), and exposure to some of the material in Chapters 6 through 9 is required for Chapters 10 through 13.

Chapters 6, 8 and 10 proceed from first principles to develop the ideas and methods that motivate the simpler yet more sophisticated methods of Chapters 7, 9, 11 and 12. To proceed to the smoother handling of calculational matters of these latter chapters entails some relatively tedious calculations in Chapters 6, 8 and 10. To keep the course moving smoothly, instructors should assign a minimal amount of homework from these even-numbered chapters.

Alternative Suggested Uses for This Book

In addition to its main aim as a principal text for beginners, this book can serve other roles as well. It can be used in tandem with a standard text by instructors who wish to expose their students to an alternative viewpoint. It can also serve this function as a supplementary text in advanced statistics courses.
In addition, research workers who have used standard statistics, but who want to exploit the benefits of the Bayesian approach, will find the development easy to follow and easy to relate to their current statistical knowledge.

References

There are several advanced Bayesian texts that are excellent future references for students in this course. All require more mathematical background than is assumed in this book, and in particular, all require the calculus. They are listed below approximately in increasing order of mathematical level.

P.M. Lee. Bayesian Statistics: An Introduction. London: Charles Griffin, 1989.
M.H. DeGroot. Probability and Statistics, 2d Ed. Reading, Massachusetts: Addison-Wesley, 1986.
S.J. Press. Bayesian Statistics. New York: John Wiley & Sons, 1989.
G.E.P. Box, G.C. Tiao. Bayesian Inference in Statistical Analysis. New York: John Wiley & Sons, 1973.
J.M. Bernardo, A.F.M. Smith. Bayesian Theory. Chichester: John Wiley & Sons, 1994.
J.O. Berger. Statistical Decision Theory and Bayesian Analysis. New York: Springer-Verlag, 1985.