Data Dredging

As Baptist ministers' salaries increase, the number of prostitutes increases.

Vitamin D prevents falls.

How are these two statements related? Both are the result of flawed statistics. In the case of the preachers, the rise in salaries is mostly tied to inflation, and inflation is mostly tied to the balance of supply and demand; the number of professionals in any market is also related to supply and demand. The two figures rise together because they share a cause, not because one drives the other. This is not to say that all ministers are receiving a higher income, nor that free-lance operators are there because this is a good way to make money.
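The shared-cause argument can be sketched in a few lines of Python. Everything here is invented for illustration: the driver called price_level, the two series, and their noise levels are assumptions, not real salary or census figures. The sketch only shows that two series which never influence each other can correlate strongly when both follow a common driver, and that the correlation fades once the driver is removed.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing its straight-line dependence on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical common driver: a general price level that climbs steadily.
price_level = [1.0 + 0.03 * period for period in range(200)]

# Both series follow the driver plus independent noise; neither affects the other.
salaries = [1000 * p + rng.gauss(0, 100) for p in price_level]
headcount = [50 * p + rng.gauss(0, 5) for p in price_level]

raw_r = pearson(salaries, headcount)
adjusted_r = pearson(residuals(salaries, price_level),
                     residuals(headcount, price_level))

print(f"raw correlation: {raw_r:.3f}")
print(f"after removing the common driver: {adjusted_r:.3f}")
```

Regressing each series on the driver and correlating what is left over is one simple way to adjust for a known common cause; it does nothing for causes that were never measured.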

In the case of vitamin D, the problem is one of semantics as well as of retrieving data through meta analysis. The word 'falls' is used to mean falls that break bones or cause other injuries, as opposed to falls from which no physical damage results. The statistical basis of the study is flawed because only those "falls" that resulted in physical damage were reported. A large number of individuals fall every day and no harm (except perhaps to the ego) occurs; these individuals and their data are unavailable, so the researchers did the best they could. So does vitamin D prevent falls, that is, have some effect on an individual's balance or ability to catch themselves before tumbling? Maybe, maybe not; the meta analysis gives no clue.
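A small arithmetic sketch shows why counting only injurious falls cannot answer the question. All numbers below are hypothetical and are not taken from the study; in particular, the assumption that the supplemented group is injured less often per fall is made up purely to show the effect of the reporting rule.

```python
def reported_falls(people, fall_rate, injury_rate_given_fall):
    """Expected number of falls that reach the records (injurious falls only)."""
    return people * fall_rate * injury_rate_given_fall

people = 10_000   # hypothetical size of each group
fall_rate = 0.30  # assumed identical in both groups: no effect on balance

# Assumed chance that a fall causes a recorded injury (illustrative values only).
injury_rate_control = 0.10
injury_rate_vitamin_d = 0.07

all_falls_control = people * fall_rate
all_falls_vitamin_d = people * fall_rate

recorded_control = reported_falls(people, fall_rate, injury_rate_control)
recorded_vitamin_d = reported_falls(people, fall_rate, injury_rate_vitamin_d)

all_falls_ratio = all_falls_vitamin_d / all_falls_control
recorded_ratio = recorded_vitamin_d / recorded_control

print(f"ratio of all falls (never observed): {all_falls_ratio:.2f}")
print(f"ratio of recorded falls:             {recorded_ratio:.2f}")
```

In this made-up case the records show fewer "falls" in the supplemented group even though both groups fall equally often. The recorded ratio would look the same if the true cause were fewer tumbles, so the records alone cannot tell the two explanations apart.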

Statistics are with us to stay, so visit the site of Dr. Hossein Arsham at the University of Baltimore to get an understanding of the difference between data acquisition and knowledge (and the transition points in between).

There is one particular figure presented by Dr. Arsham that is most interesting. It shows a relationship between "level of exactness of a statistical model" and "level of improvement in decision making". Without coordinates, it shows a linear movement from an initial point, "data", to "information", to "facts", to "knowledge". What is missing is "concept", which must come before "data" collection, and "understanding", which must come after "knowledge".

At the beginning there has to be the concept of what is to be understood. Call it experimental design if you like, but prior to data acquisition a clear idea of what is sought is necessary. The data acquired may or may not fit the concept. This, to a biologist, is the method of successive approximation. With the disposal of data that do not seem to fit the concept comes the first of many errors to be made by the researcher. Selection of data then leads to the accumulation of information that supports the theory (or concepts). As independent observations from different sources and different approaches are brought together, the information slowly begins to be regarded as "fact", since nothing seems to refute the concept. As these "facts" become known, they become part of a greater information base which can be regarded as knowledge.

Here it is necessary to stand back and reassess the situation. Has the study (or studies) become so focused that the knowledge gained is so specialized and so narrowly applicable that, rather than expanding knowledge, it limits the application? The role of understanding is in making knowledge useful.

There is no need to pursue any avenue of research unless the goal is understanding. Knowledge for knowledge's sake is mental masturbation.

The worst problem with the development of data for meta analysis and its interpretation by a researcher is that secondary handling of the information falls to those who either are unqualified to assess its significance or, worse yet, have a particular axe to grind and grasp the study's results as substantiation for their beliefs. None is worse than the Government at taking good information and twisting it to support some prejudiced goal. Unfortunately, meta analysis falls too often into such hands, and when used in this way it earns a deservedly bad reputation.

A recently published study in the Journal of the American Medical Association was the basis for this essay. It brings to light all that is wrong with this attempt to find significance in the relationship between vitamin D and "falls". Consider Dr. Arsham's graph: did the researchers, using data dredging, really increase our understanding of the function or role of vitamin D in protecting one against falling? Worse yet, by making the flawed study widely available, the ability to understand is diminished!
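Data dredging in general means searching a body of data for any relationship that passes a significance test. The essay does not say how many comparisons were examined in the study, so the sketch below is a generic illustration, not a description of that study. It assumes independent tests, each run at the conventional 0.05 threshold, and the function name is invented for this example.

```python
def chance_of_false_positive(comparisons, alpha=0.05):
    """Probability that at least one of several independent tests of true
    null hypotheses comes out 'significant' purely by chance."""
    return 1.0 - (1.0 - alpha) ** comparisons

for comparisons in (1, 5, 20, 100):
    risk = chance_of_false_positive(comparisons)
    print(f"{comparisons:>3} comparisons: {risk:.1%} chance of a spurious finding")
```

Under these assumptions, searching through twenty unrelated comparisons is more likely than not to turn up at least one "significant" result where nothing real exists.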

For an introduction to meta analysis, the classroom series from Arindam Basu at the University of Pittsburgh provides an easy walk-through of the concepts and applications with Bayesian methodology.

And for a thorough coverage of concepts, one can't do better than: A History of Philosophy (The Formation and Development of Its Problems and Conceptions), W. Windelband, translated by James H. Tufts, The Macmillan Company, New York, 1921 (2nd edition).

****

Joe Wortham's Home Page, About Joe Wortham, Directory of Web Pages


Hosted by www.Geocities.ws
