Thursday, February 01, 2007

An approach to randomness testing

One of the classical problems of statistical theory is to test a sequence for randomness. A core part of this problem is deciding what the alternative hypothesis should be.

If this sequence is not random, then what kind of pattern might it have?

The choice of this alternative hypothesis will determine the important properties of the test. Some of the common alternatives are:

  • Serial correlation
  • Trends
  • Seasonality

My addiction to Yahoo's Video Poker has led me to consider a different alternative: that the sequence is strategic, that is, some person is managing the sequence purposefully, i.e., in order to achieve some goal (of which I am unaware). This seems a lot more vague than the alternatives cited above. I have convinced myself that, for both empirical and theoretical reasons, it is unlikely that the Video Poker sequence is generated by a (pseudo)random process. Empirically, one wins too often. Theoretically, a game with low payoffs would be less interesting than one "seeded with winning hands".

I am trying to analyze just how to approach this problem.

  1. What would be evidence that the sequence is strategic?
  2. What would be evidence that I have detected the manager's purpose?
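One way to make question 1 concrete, offered only as a sketch and not as the approach this post settles on, is the Wald-Wolfowitz runs test: record each hand as a win (W) or a loss (L) and count the runs, the maximal blocks of identical consecutive outcomes. Too few runs suggests clustering (serial correlation); too many suggests a forced alternation of the kind a purposeful manager might produce. The W/L encoding, the function name `runs_test`, and the sample record are illustrative assumptions, and the normal approximation used here is only reasonable when wins and losses each number more than about ten.

```python
import math

def runs_test(outcomes):
    """Wald-Wolfowitz runs test for a sequence with exactly two distinct values.

    Returns (runs, expected_runs, z, two_sided_p) using the normal approximation.
    """
    values = sorted(set(outcomes))
    if len(values) != 2:
        raise ValueError("runs test needs exactly two distinct values")
    n1 = sum(1 for x in outcomes if x == values[0])
    n2 = len(outcomes) - n1
    n = n1 + n2
    # A new run starts every time the outcome changes.
    runs = 1 + sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)
    # Mean and variance of the run count if the order were purely random.
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z.
    p = math.erfc(abs(z) / math.sqrt(2))
    return runs, expected, z, p

# Hypothetical record of 20 hands (W = win, L = loss).
print(runs_test("WWLWLLLWWLWLWWLLWLWL"))
```

A large positive z means the outcomes switch more often than chance would suggest; a large negative z means wins and losses come in streaks. Neither result identifies the manager's purpose, which is question 2.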

Tuesday, January 02, 2007

Balancing Nature

How many whales are too many?

National Geographic has documented the comeback of the humpback whale. Humans were its main predator until it was protected. Once a threatened species, the humpback has increased its numbers significantly since whale hunting was curtailed. This is a welcome development. I enjoy the sight and sounds of the humpbacks as they play in their nursery waters around Puerto Rico.

There is a price to be paid for protecting whales. How will we make the case, soon enough, that we have enough whales, so that we can control their numbers before they dominate the oceans and crowd out other marine life?

It was easy to assert that we had killed too many before protection became the law. We had a distaste for the crude butchery of commercial whaling. It will not be so easy to raise their death rate when that becomes necessary. And we will find it necessary to do just that! Someday, we, or some other species we want to protect, will be competing for living space with the humpbacks and other whales.

How will that change affect us?

Saturday, December 16, 2006

Two sides to the organic food - rain forest debate

"Buy organic, destroy the rainforest," The Economist said last week, eliciting howls of protest from food-activist bloggers.

Every issue has two sides. Some people think they ought to always be on the side of the angels. Ecology is a complex subject. You may think you are saving the environment when you are causing problems half a world away. Think about it.

Unfortunately, many of the bloggers are only cheerleaders (like President Bush), not thinkers or doers. Choosing between a theoretically attractive movement like organic gardening and preserving the rainforest demands far more data and far more analysis than the average blogger has available.

Thursday, December 14, 2006

False Feelings of Certainty

Today's San Juan Star carried a reprint of an article by Nicholas Kristof. His main point was that our opinions of Muslims have been shaped by mainstream media coverage of Muslims from the Middle East, and that the Muslims of the Far East are different.

Despite his good intentions, I must disagree. We have no logical basis for forming opinions about whole populations and neither does the media. Our experience is limited to those people we have met personally. Our judgments should be similarly limited.

Muslims have complained about the hypocrisy of Westerners who don't practice what they preach. Yes, but which ones? Lumping every person into some ethnic group and then attributing the behaviors of different people to a single stereotyped representative is the way we make prejudice a standard operating procedure.

If you don't observe the behavior of a random sample from a population, you have no logical basis for generalization, none! It is hard to overemphasize this point. Generalization from inadequate samples is how we reach False Feelings of Certainty.

There is a line from an Abbott and Costello movie that we should all understand:

"All Indians walk in single file. At least the one I saw did."

My advice is to ignore the polls published in the media. They are part of a commercial enterprise that puts making money above all other values. Even what passes for a serious sociological survey may be nothing more than the extrapolation of someone's biases to an entire population. A few research organizations have the resources and the commitment to design proper samples for their surveys; most do not. Consider the source.
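The point about random samples can be illustrated with a small simulation. Everything in it is hypothetical: a population of 10,000 people, 30% of whom share some trait, in which the 1,000 people "we have met" are atypical, with 90% sharing it. A sample drawn only from acquaintances misses the true proportion badly, while a random sample of the whole population lands close to it. The numbers, the seed, and the helper `proportion` are made up for illustration.

```python
import random

def proportion(sample):
    """Fraction of True values in a list of booleans."""
    return sum(sample) / len(sample)

rng = random.Random(2006)  # fixed seed so the run is repeatable

# Hypothetical population: True means "has the trait".
met = [True] * 900 + [False] * 100        # the 1,000 people we happen to meet
others = [True] * 2100 + [False] * 6900   # everyone else
population = met + others                  # true proportion: 3,000 / 10,000

convenience = rng.sample(met, 200)            # only people we have met
random_sample = rng.sample(population, 1000)  # every person equally likely

print(f"true proportion:        {proportion(population):.2f}")
print(f"convenience estimate:   {proportion(convenience):.2f}")
print(f"random-sample estimate: {proportion(random_sample):.2f}")
```

Making the convenience sample larger would not help: it would only estimate the acquaintances' 90% more precisely.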

Sunday, August 06, 2006

But what about "averaging opinions"?

Opinions can be data.

"Averaging opinions" can be interpreted in several ways.

First, we can create a numeric scale and use it to describe opinions. Remember the Bo Derek film "10"? Ten was the top of the scale of attractiveness for women. Bo, of course, was the epitome of beauty in that movie. Obviously, there are women who don't match that level of perfection, and so their scores would be lower. So, an "average woman" would score something like a 4, 5, or 6.

The second way we can "average opinions" is by counting how many people hold each of several qualitative opinions. For example, pollsters will often ask a sample of people to choose one of several (usually five) typical opinions as the best representative of their own ideas. The possible answers might be:

[Very favorable] [Favorable] [Neutral] [Unfavorable] [Very unfavorable]

The numbers of people who choose each of the alternatives are called the "frequencies" of the alternatives. We often find that more people choose something close to the "neutral" position and fewer people choose the extremes of "Very ...". This can be interpreted as a natural tendency to moderate our opinions.

We like to agree with each other, so when we discover someone's ideas, we modify the words we use to describe our own opinions so the differences are not so striking.
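Counting frequencies is mechanical. Here is a sketch using Python's `collections.Counter` on made-up answers from a hypothetical ten-person poll; in this invented data, "Neutral" is the most common answer and the two "Very ..." extremes are the least common, the pattern described above.

```python
from collections import Counter

SCALE = ["Very favorable", "Favorable", "Neutral", "Unfavorable", "Very unfavorable"]

# Hypothetical answers from a ten-person poll.
answers = ["Neutral", "Favorable", "Neutral", "Unfavorable", "Neutral",
           "Favorable", "Very favorable", "Neutral", "Unfavorable", "Very unfavorable"]

freq = Counter(answers)  # maps each answer to the number of people who chose it
for choice in SCALE:
    share = freq[choice] / len(answers)
    print(f"{choice:<17} {freq[choice]:>2}  ({share:.0%})")
```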

Friday, August 04, 2006

The mathematics of averaging

Central Limit Theorem

The averaging process is so strong that, when we repeatedly average many values, the collection of averages is not only clumped near the center of the original data but also takes a shape close to a bell-shaped curve called the Normal density function, whatever the shape of the original data (provided their spread is finite). That function can be calculated by the formula

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$

In this formula, the Greek letter μ represents the overall average of all the data from the process we are studying, and the Greek letter σ represents how diverse the data are, what statisticians call "the Standard Deviation". The Greek letter π is the same π we met in geometry when we calculated the area and circumference of a circle. For averages of n values each, the same curve applies with σ replaced by σ/√n, so the bell gets narrower as n grows.
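A short simulation, using only Python's standard library, shows the theorem at work. It starts from data that are far from bell-shaped (an exponential distribution, which is skewed, with mean 1 and standard deviation 1), takes 5,000 averages of n = 30 values each, and compares their center and spread with μ and σ/√n. The function `normal_pdf` is a direct transcription of the Normal density; the sample sizes and the seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

rng = random.Random(42)
mu, sigma = 1.0, 1.0  # an exponential distribution with rate 1 has mean 1 and SD 1
n = 30                # number of values in each average

averages = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(5000)]

print(f"mean of averages: {statistics.fmean(averages):.3f}  (theory: {mu})")
print(f"SD of averages:   {statistics.stdev(averages):.3f}  (theory: {sigma / math.sqrt(n):.3f})")
# Height at the center of the Normal curve that should approximate the averages.
print(f"peak density:     {normal_pdf(mu, mu, sigma / math.sqrt(n)):.3f}")
```

A histogram of `averages` would look close to a bell even though the individual exponential values are strongly lopsided.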

For more details, see the Statistics Homepage.

Averaging

So how does averaging work?

Imagine that you have two numbers which could be any kind of measurement, length of two planks, outside temperature at 6 am and 7 am, or the weight of some book measured twice.

So, you will have two numbers in hand. It is more likely that they are different than that they are the same. If you add them and divide by two, you will get a number in between the two. This is averaging on a small scale. On a bigger scale, say with 100 numbers, you add them all up and divide the sum by the count, 100.

Naturally, the average will be somewhere in between the extremes. That is all there is to it!
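The recipe in this post, add the numbers and divide by the count, fits in a one-line function. A minimal sketch; the two temperature readings are made-up values:

```python
def average(values):
    """Add the values and divide by how many there are."""
    if not values:
        raise ValueError("cannot average an empty list")
    return sum(values) / len(values)

temps = [61.0, 64.0]            # hypothetical readings at 6 am and 7 am
print(average(temps))           # 62.5

hundred = list(range(1, 101))   # the numbers 1 through 100
print(average(hundred))         # 50.5

# The average always lies between the smallest and the largest value.
assert min(hundred) <= average(hundred) <= max(hundred)
```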

Thursday, August 03, 2006

Modern America

On average, we're comfortable

Many Americans are like the statistician I mentioned before. We hear cries of dismay from the right and cries of dismay from the left. They sort of balance each other out. So, on average, we're comfortable. So comfortable that it doesn't seem important to get out and vote, to make our voices heard, to say what we think. We don't want to be mistaken for one of those noisy fanatics on the left or the right. We want to be comfortable!
