******

Random Happenings or Real Events ???

****

In digging through a tray of nuts and bolts the other day, I was struck by the occasional occurrence of clusters of "like-kind" bolts that just seem to appear in different places within the tray. Now it occurred to me that here was a nice demonstration of the statistics of data-dredging that can lead to erroneous conclusions (or at best, confusion as to the real statistics of the population).

In the real world of happenstance we are now reminded that there are "clusters" of sick people that may or may not be the result of some encounter with toxins. We have "sick" buildings, homeowners affected by power lines, and researchers who must painfully admit that their data cannot be repeated.

Many years ago, while working for Harry Gelboin at the National Cancer Institute, I studied methylcholanthrene and its effect on rat liver RNA polymerase. We did a classic experiment in which we measured polymerase activity over a period of 16 hours and found that the activity first increased many fold, then dropped to well below normal levels, before returning to normal at 12 to 16 hours. This was exciting, and exactly what we had predicted. Alas, try as we might, we were never able to repeat the sine-wave response, so the data stayed where it belonged: in the notebooks and not in the press.

Another "chance occurrence" happened last spring. In my class of 47 students in Chemistry 108, two of my students died of cancer within the first month. Perhaps some would be quick to point out the dangers of taking this course, especially as taught by this particular instructor. But reason prevails, and most would accept this as another example of randomness.

Another chance happening in reading the never-ending products of wordsmiths was an article on estimating a population without actually destroying the evidence. The author used a box of chocolates as the target population. By removing a chocolate, noting its presence on a graph, replacing the chocolate, shaking the box to provide a random distribution for the next selection, and taking another, one can arrive at an estimate of the actual population's distribution. Repeat the process a goodly number of times and you will arrive at an estimate of the numbers of each chocolate.

What the author forgot to report are three things that often cause an observer to misinterpret the data. First, one must repeat the sampling until the curve plotted for each kind of chocolate flattens out and approaches a straight line. When each individual subset of the population has arrived at this value, one can be fairly sure (confidence limits expressed) that the observations are representative of the population in toto. The second is more crucial: even with the best of sampling techniques, because of the random nature of observations, one can be misled into concluding that a group is either over- or under-represented. Third, one must of course be blind to the appearance of the chocolates to avoid reporter bias, and must also assume that nothing is going on in the population that increases the presence of one subset over another. Alas, the only sure way to resolve these problems is what is usually the case during the Christmas season: one simply eats each chocolate as it is removed, and when the box is empty one can be sure of the number of each subspecies. (This is termed destructive testing, certainly as it pertains to a diet.)
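For the curious, the draw-note-replace-shake routine can be tried without a single calorie. What follows is only a rough sketch in Python, not the article author's method: the box contents (12 caramels, 8 nougats, 4 cherries) and the name estimate_box are invented here for illustration. It shows how far off a short run of draws can be, and how the estimates settle down only after many repeats.

```python
import random
from collections import Counter

def estimate_box(box, draws, seed=None):
    """Draw one chocolate at a time, with replacement, and return the
    fraction of draws in which each kind turned up."""
    rng = random.Random(seed)
    # rng.choice stands in for "shake the box and take one";
    # the chocolate is never removed from the list, so it is "replaced".
    tally = Counter(rng.choice(box) for _ in range(draws))
    return {kind: count / draws for kind, count in tally.items()}

# Hypothetical box (an assumption, not from the article): 24 pieces.
box = ["caramel"] * 12 + ["nougat"] * 8 + ["cherry"] * 4

# True fractions are 0.50, 0.33 and 0.17; watch the estimates wander
# for small numbers of draws and level off for large ones.
for draws in (10, 100, 10_000):
    print(draws, estimate_box(box, draws, seed=1))
```

Note that the tally gives only the proportions of each kind; turning those into numbers of chocolates requires knowing how many pieces the box holds. A kind that never happens to be drawn in a short run does not appear in the result at all, which is the "under-represented" trap in its purest form.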

Statistical Quotes:

"Smoking is one of the leading causes of statistics," according to columnist Fletcher Knebel.
"There are three kinds of lies: lies, damned lies, and statistics," attributed to Benjamin Disraeli.
"Figures don't lie, but liars figure," unknown.
Some use statistics "as a drunken man uses a lamppost -- for support rather than illumination," according to Andrew Lang (historian).
"The Government is very keen on amassing statistics. They collect them, add them, raise them to the nth power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the village watchman, who just puts down what he damn well pleases." A statement by economist Sir Josiah Stamp.

In living testimony to this, Jack's father, a good Haywood County farmer, was so annoyed at having been selected for the "comprehensive" census that he added liberally to the number of plants, animals and other farm-related activities that he was "by law" required to report. No telling what the USDA statistics on farming activities in West Tennessee must have shown if others in the area were so liberal with the truth.

I'm reminded of the story of a traveler who, in passing through the country, was impressed by the marksmanship of some local. It seems that on almost every sign, tree or other permanent object was a circle with a bullet hole precisely in the center. Knowing the value of such skill, he envisioned representing this person and making a vast fortune. He asked the locals, and found that this was the work of the village idiot. He immediately signed the man up for public appearances and transported him to the big city. There, on public display before a paying crowd, he called upon the marksman to demonstrate his prowess. Not only was he unable to hit the center of the target, but in most cases he missed the target entirely. Our city-slicker withdrew in shame, but before going asked how this could happen. The response from our country bumpkin was, "I just shoot at most anything, and then draw the circle around the hole."

And from Dave Barry, Miami Herald writer, "the panel gave the MREs a rating of 8.1 on the taste scale. This is clearly a scientific result, because it contains a decimal point," or so it would appear in considering the tastiness of the Army's packaged food, Meals Ready to Eat.

Back to the beginning: much is made of "clusters" in health-related statistical observations. Thus, it was reported that in a town in Massachusetts there is a cluster of health problems. And the answer quickly goes in search of a question. The public has been drilled on the "cause-effect" argument of scientific reason. Now the question is whether this is simply the result of bad luck (for the unhealthy ones who happened to reside in the community), a pollution problem, or a statistical freak. It will be almost impossible to decide which is actually the cause of this observation.

Or take the party game where, in a group of as few as 23, the odds are better than even that at least two people will have the same birthday. One can see that so-called "clusters" can be random events that lack any underlying cause. So before you fall for the tears and cries of the injustice of it all, caused by some "to-be-identified" company (always one with deep pockets), think!
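The arithmetic behind the birthday game is short enough to check for yourself. Here is a small sketch in Python, under the usual simplifying assumptions (365 equally likely birthdays, no leap years, no twins at the party); the function name is my own. It multiplies out the chance that everyone's birthday is different and subtracts that from one.

```python
def shared_birthday_probability(n, days=365):
    """Chance that at least two of n people share a birthday,
    assuming every day of the year is equally likely."""
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

for n in (10, 20, 23, 30, 50):
    print(n, round(shared_birthday_probability(n), 3))
```

With 20 guests the chance of a match comes out a little over 40 percent, and it crosses the even-money mark at 23. No toxin, power line, or deep-pocketed company is required.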

And this from the Wall Street Journal: Medical Journals Give New Meaning to 'Political Science';
'Scientific Study Shows Blown Smoke Causes Hearing Loss' -- but not smoking itself(?), or
'male births declining' -- and predictably blamed on man-made chemicals, or
'Jefferson Fathered Slave's Last Child' -- based on DNA evidence, although some two dozen other Jeffersons with the same DNA possibility were ignored, or
'tiny amounts of chemicals when combined increase their potency by 1000 fold' -- never mind that the study on which the scientific article was based cannot be repeated.

Publish or Perish

I'm reminded of the four publications that one can reap from any set of data.
First - release to one of the quickie journals (or better yet perhaps to the New York Times, Washington Post, Wall Street Journal or television news media).
Second - a paper in a reviewed journal (of course presentation to a National meeting is mandatory).
Third - perhaps a chapter in a book or maybe an entire book on the subject (numerous speaking engagements and presentations follow, including a symposium on the subject)
Fourth - the retraction (usually only a brief mea culpa in another related paper).

That's science!

NB:
There's controversy brewing on whether there should be publication on the Internet of scientific studies, bypassing the Journals. Apparently, both sides conveniently overlooked Number One, above.
The book Elementary Statistics, by Mario F. Triola, has some great quotations, and The Cartoon Guide to Statistics, by Larry Gonick and Woollcott Smith, sheds light on this dreary subject.

January 19, 1999.

Joe Wortham's Home Page , About Joe Wortham
