Original document: http://zebu.uoregon.edu/1999/es202/l6.html (last modified Thu Jan 21 22:18:36 1999)
Sampling is the problem of accurately acquiring the data needed
to form a representative view of the population being studied.
This is much more difficult to do than is generally realized.
Overall Methodology:
The Concept of Random
When you form a sample, you represent it by a plotted distribution known as a histogram. A histogram shows the frequency of occurrence of a variable within each of a set of specified ranges (bins). An example histogram is shown below.
This one has narrow bins
Here is a "live" example of data sampling and histogram construction
How to construct a histogram:
Here is an example of how, exactly, you do this.
Step 1: Define the target population and the problem: In this case we are interested in the distribution of tree diameters in a pine forest.
Step 2: Define the measurement: We define the standard tree diameter as the diameter measured at a height of 1.0 meters above the ground.
Step 3: Define the sample: There are thousands of trees in
the forest. We decide (for reasons that will become clear later) to
randomly sample 50 trees. How do we do this? We go to the middle of
the forest with a deck of cards. The 4 suits represent North, South,
East, and West, and the rank of the card represents a distance in
meters. We shuffle the deck and get the 10 of hearts. This means that
we go 10 meters west and measure the first tree we encounter at that
position. Then we shuffle the deck again and go on. This procedure,
of course, assumes that the middle of the forest is just like any
other place in the forest.
When we tell our statistics boss about this, the boss says it is a
bad assumption and makes us go to 24 separate locations in the forest
to measure 50 tree diameters at each location. So we have a total of
1200 measurements.
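The card procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text implies that hearts means west, but the other three suit-to-direction assignments, the rule that an ace counts as 1 meter and a king as 13, and the function names are all hypothetical. The sketch produces walking instructions only; it does not model where the trees actually are.

```python
import random

# Hypothetical mapping: the text only implies that hearts means west.
SUIT_TO_DIRECTION = {
    "hearts": "west",
    "spades": "north",
    "diamonds": "east",
    "clubs": "south",
}
RANKS = range(1, 14)  # assumption: ace = 1 m, ..., king = 13 m


def draw_instruction(rng):
    """Shuffle a full deck and turn the top card into (direction, meters)."""
    deck = [(suit, rank) for suit in SUIT_TO_DIRECTION for rank in RANKS]
    rng.shuffle(deck)
    suit, rank = deck[0]
    return SUIT_TO_DIRECTION[suit], rank


def sampling_plan(n_trees, seed=None):
    """Return one walking instruction per tree to be measured."""
    rng = random.Random(seed)
    return [draw_instruction(rng) for _ in range(n_trees)]


plan = sampling_plan(50, seed=1)
for direction, meters in plan[:5]:
    print(f"walk {meters} m {direction}, then measure the first tree")
```

Repeating `sampling_plan(50)` at each of the 24 locations would give the 1200 instructions mentioned in the text.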
As you can see, just looking at the numbers this way doesn't tell
you a lot.
62.653 63.375 63.241 63.574 62.061
61.010 49.314 56.207 61.152 56.125
57.055 56.162 63.174 59.219 60.983
56.327 61.399 64.470 56.693 56.905
66.167 67.443 66.595 55.845 65.250
62.309 64.621 56.444 53.981 57.540
49.154 58.910 59.146 68.144 59.853
58.584 61.382 60.999 51.388 58.044
58.041 65.309 56.949 62.992 54.460
59.850 56.871 56.909 60.206 58.425
Bin Limits      | Frequency | Proportion
------------------------------------------------
30.00 to 34.99  |        0  |   0.000
35.00 to 39.99  |        0  |   0.000
40.00 to 44.99  |        0  |   0.000
45.00 to 49.99  |       22  |   0.018
50.00 to 54.99  |      147  |   0.123
55.00 to 59.99  |      402  |   0.335
60.00 to 64.99  |      428  |   0.357
65.00 to 69.99  |      185  |   0.154
70.00 to 74.99  |       15  |   0.012
75.00 to 79.99  |        0  |   0.000
80.00 to 84.99  |        1  |   0.001
85.00 +         |        0  |   0.000
------------------------------------------------
                      1200      1.000
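A table like the one above comes from counting how many measurements fall in each bin and dividing by the total. The following is a minimal sketch that applies the same bin limits to the 50 diameters listed earlier (not to the full set of 1200, which is not reproduced in these notes). The names `frequency_table`, `BIN_START`, `BIN_WIDTH`, and `N_BINS` are illustrative choices, and rejecting values below 30.00 is an assumption, since the table has no bin for them.

```python
# The 50 diameters listed in the text.
diameters = [
    62.653, 63.375, 63.241, 63.574, 62.061,
    61.010, 49.314, 56.207, 61.152, 56.125,
    57.055, 56.162, 63.174, 59.219, 60.983,
    56.327, 61.399, 64.470, 56.693, 56.905,
    66.167, 67.443, 66.595, 55.845, 65.250,
    62.309, 64.621, 56.444, 53.981, 57.540,
    49.154, 58.910, 59.146, 68.144, 59.853,
    58.584, 61.382, 60.999, 51.388, 58.044,
    58.041, 65.309, 56.949, 62.992, 54.460,
    59.850, 56.871, 56.909, 60.206, 58.425,
]

BIN_START = 30.0  # lower limit of the first bin
BIN_WIDTH = 5.0   # each bin spans 5 units
N_BINS = 11       # 30.00-34.99 through 80.00-84.99, plus one "85.00 +" bin


def frequency_table(values):
    """Count the values per bin; the last entry is the open '85.00 +' bin."""
    counts = [0] * (N_BINS + 1)
    for v in values:
        if v < BIN_START:
            raise ValueError(f"{v} is below the first bin")  # assumption
        index = min(int((v - BIN_START) // BIN_WIDTH), N_BINS)
        counts[index] += 1
    return counts


counts = frequency_table(diameters)
total = sum(counts)
for i, count in enumerate(counts):
    if i < N_BINS:
        low = BIN_START + i * BIN_WIDTH
        label = f"{low:.2f} to {low + BIN_WIDTH - 0.01:.2f}"
    else:
        label = "85.00 +"
    print(f"{label:<15} | {count:>9} | {count / total:>10.3f}")
```

For these 50 values, most measurements land in the 55.00 to 59.99 bin (22) and the 60.00 to 64.99 bin (17), which is the same general shape as the full table.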
Step 6: We construct the histogram by plotting the frequency against the bin limits. This is also known as a bar graph, and it is shown here:
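A minimal sketch of such a bar graph, drawn as text so that it needs no plotting library. The frequencies are copied from the table for all 1200 measurements; the function name `text_histogram` and the scaling to a maximum bar width of 50 characters are illustrative choices.

```python
# (bin label, frequency) pairs copied from the table of 1200 measurements.
rows = [
    ("30.00 to 34.99", 0),
    ("35.00 to 39.99", 0),
    ("40.00 to 44.99", 0),
    ("45.00 to 49.99", 22),
    ("50.00 to 54.99", 147),
    ("55.00 to 59.99", 402),
    ("60.00 to 64.99", 428),
    ("65.00 to 69.99", 185),
    ("70.00 to 74.99", 15),
    ("75.00 to 79.99", 0),
    ("80.00 to 84.99", 1),
    ("85.00 +", 0),
]


def text_histogram(rows, max_width=50):
    """Return one line per bin, with a bar scaled to the tallest bin."""
    peak = max(count for _, count in rows)
    lines = []
    for label, count in rows:
        bar = "#" * round(count / peak * max_width)
        lines.append(f"{label:<15}| {bar} {count}")
    return lines


lines = text_histogram(rows)
print("\n".join(lines))
```

Very small counts, such as the single tree in the 80.00 to 84.99 bin, round to a bar of zero length at this scale, which is why the count is printed next to each bar.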
A critical question to ask now is "What is the minimum sample size required to accurately represent a distribution?" That depends on the intrinsic shape of the distribution! As we will learn later, for distributions that are intrinsically bell-shaped (these are called Normal or Gaussian distributions), 25-30 random measurements are usually good enough.
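One way to get a feel for the effect of sample size is to simulate it. The sketch below draws many random samples of size n from a Gaussian population and reports how much the sample mean scatters from one sample to the next. The population mean of 60 and standard deviation of 4 are hypothetical values chosen only to resemble the tree data; they are not taken from the measurements. For a Gaussian population the scatter of the sample mean is expected to be close to the population standard deviation divided by the square root of n.

```python
import random
import statistics

POP_MEAN = 60.0  # hypothetical population mean
POP_SD = 4.0     # hypothetical population standard deviation


def spread_of_sample_means(n, trials=2000, seed=0):
    """Standard deviation of the sample mean over many samples of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.stdev(means)


for n in (5, 30, 100):
    expected = POP_SD / n ** 0.5
    print(f"n = {n:>3}: simulated {spread_of_sample_means(n):.2f}, "
          f"expected about {expected:.2f}")
```

Going from 5 to 30 measurements shrinks the scatter of the mean considerably, while going from 30 to 100 helps less per added measurement. This illustrates, but does not prove, the rule of thumb quoted above.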
Note: student exam scores usually follow a bell curve, and your grade is determined by your position on that curve relative to the average. We will put this into practice in this class.