What Is a Sampling Distribution?
A sampling distribution is a probability distribution of a statistic obtained from a large number of samples drawn from a specific population. It describes the range of values that the statistic could take across those samples and how frequently each value occurs. This allows entities like governments and businesses to make better-informed decisions based on the information they gather. Researchers work with several kinds of sampling distributions, including the sampling distribution of the mean.
Key Takeaways
- A sampling distribution is a probability distribution of a statistic that is obtained through repeated sampling of a specific population.
- It describes a range of possible outcomes for a statistic, such as the mean or mode of some variable, of a population.
- The majority of data analyzed by researchers are actually samples, not populations.
How Sampling Distributions Work
Data allows statisticians, researchers, marketers, analysts, and academics to draw important conclusions about specific topics. It can help businesses make decisions about their future and boost their performance, or it can help governments plan for services needed by a group of people. Much of the data drawn and used comes from samples rather than whole populations. A sample is a subset of a population. Put simply, a sample is a smaller part of a larger group that is meant to be representative of the population as a whole.
A sampling distribution describes how a statistic computed from a sample varies from one sample to the next, and therefore how likely each possible value of that statistic is. This distribution depends on a few different factors, including the sample size, the sampling process involved, and the population as a whole. There are a few steps involved in building a sampling distribution. These include:
- Choosing a random sample from the overall population
- Determining a certain statistic from that sample, which could be the standard deviation, median, or mean
- Repeating the process for many samples and establishing a frequency distribution of the statistic across them
- Mapping out the distribution on a graph
Once the information is gathered, plotted, and analyzed, researchers can make inferences and conclusions. This can help them make decisions about what to expect in the future. For instance, governments may be able to invest in infrastructure projects based on the needs of a certain community or a company may decide to proceed with a new business venture if the sampling distribution suggests a positive outcome.
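The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is simulated (100,000 hypothetical newborn weights in pounds), and the function name `sampling_distribution` and all parameter values are made up for the example rather than taken from any real study.

```python
import random
import statistics
from collections import Counter

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Assumption: a simulated population of 100,000 weights (pounds), for illustration only.
population = [rng.gauss(7.0, 1.2) for _ in range(100_000)]

def sampling_distribution(population, sample_size, num_samples, statistic=statistics.mean):
    """Draw repeated random samples and return the statistic computed on each one."""
    return [
        statistic(rng.sample(population, sample_size))  # steps 1 and 2
        for _ in range(num_samples)
    ]

sample_means = sampling_distribution(population, sample_size=100, num_samples=1_000)

# Step 3: frequency distribution of the statistic, binned to the nearest 0.1 lb.
frequencies = Counter(round(m, 1) for m in sample_means)

# Step 4: a crude text "graph", one '#' per 10 samples in the bin.
for value in sorted(frequencies):
    print(f"{value:4.1f} | {'#' * (frequencies[value] // 10)}")
```

Passing `statistic=statistics.median` or `statistic=statistics.stdev` instead would give a sketch of the sampling distribution of the median or of the standard deviation.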
Special Considerations
The number of observations in a population, the number of observations in a sample, and the procedure used to draw the sample sets determine the variability of a sampling distribution. The standard deviation of a sampling distribution is called the standard error.
While the mean of the sampling distribution of the sample mean is equal to the mean of the population, the standard error depends on the standard deviation of the population, the size of the population, and the size of the sample.
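As a rough sketch of that dependence, the standard error of the sample mean is commonly computed as the population standard deviation divided by the square root of the sample size, with a finite population correction applied when sampling without replacement from a population of known size. The function name `standard_error` and the numbers below are hypothetical.

```python
import math

def standard_error(pop_sd, sample_size, pop_size=None):
    """Standard error of the sample mean.

    pop_sd / sqrt(n), multiplied by the finite population correction
    sqrt((N - n) / (N - 1)) when a population size N is supplied
    (i.e., sampling without replacement from a finite population).
    """
    se = pop_sd / math.sqrt(sample_size)
    if pop_size is not None:
        se *= math.sqrt((pop_size - sample_size) / (pop_size - 1))
    return se

# Assumed values for illustration: population SD of 1.2 lb.
for n in (25, 100, 400):
    print(n, round(standard_error(1.2, n), 4), round(standard_error(1.2, n, pop_size=1_000), 4))
```

Larger samples give a smaller standard error, and the correction matters only when the sample is a sizable fraction of the population.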
Knowing how far apart the means of the sample sets are from each other and from the population mean gives an indication of how close a sample mean is likely to be to the population mean. The standard error of the sampling distribution decreases as the sample size increases.
Determining a Sampling Distribution
Let's say a medical researcher wants to compare the average weight of all babies born in North America from 1995 to 2005 to that of babies born in South America within the same time period. Since they cannot collect the data for the entire population within a reasonable amount of time, they use a sample of only 100 babies from each continent to draw a conclusion. The data used is the sample, and the average weight calculated is the sample mean.
Now suppose they take repeated random samples from the general population and compute the sample mean for each sample group instead. So, for North America, they pull newborn weight records from the U.S., Canada, and Mexico as follows:
- Four samples of 100 records each from select hospitals in the U.S.
- Five samples of 70 records each from Canada
- Three samples of 150 records each from Mexico
The average weight computed for each sample set is a sample mean, and the distribution of those sample means is the sampling distribution of the mean. The mean is not the only statistic that can be calculated from a sample. Other statistics, such as the standard deviation, variance, proportion, and range, can be calculated from sample data as well, and each has its own sampling distribution. The standard deviation and variance of a sampling distribution measure its variability.
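The sampling plan in this example can be mimicked with simulated data. The sketch below is only illustrative: it assumes a single made-up population of weights for all three countries (real data would differ by source), and the names `sampling_plan` and `summaries` are hypothetical.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed for repeatability

# Assumption: one simulated population stands in for all North American newborns.
population = [rng.gauss(7.0, 1.2) for _ in range(50_000)]

# (source, number of samples, records per sample), following the example in the text.
sampling_plan = [("U.S.", 4, 100), ("Canada", 5, 70), ("Mexico", 3, 150)]

summaries = []
for source, num_samples, size in sampling_plan:
    for _ in range(num_samples):
        sample = rng.sample(population, size)
        summaries.append({
            "source": source,
            "n": size,
            "mean": statistics.mean(sample),
            "stdev": statistics.stdev(sample),
            "range": max(sample) - min(sample),
        })

# The 12 sample means form a (very small) sampling distribution of the mean.
sample_means = [s["mean"] for s in summaries]
print(len(sample_means), "sample means from", sum(s["n"] for s in summaries), "records")
print("mean of sample means:", round(statistics.mean(sample_means), 2))
print("spread of sample means:", round(statistics.stdev(sample_means), 3))
```

The same loop also records the standard deviation and range of each sample, so the sampling distributions of those statistics could be examined in the same way.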
Types of Sampling Distributions
Here is a brief description of the types of sampling distributions:
- Sampling Distribution of the Mean: With enough samples, this distribution is approximately normal, and its center is the mean of the sampling distribution, which in turn represents the mean of the overall population. To get to this point, the researcher must figure out the mean of each sample group and plot those means.
- Sampling Distribution of Proportion: This method involves choosing a sample set from the overall population and computing the proportion of the sample that has a given characteristic. The mean of the sample proportions approximates the proportion in the larger group.
- T-Distribution: This type of sampling distribution is common in cases of small sample sizes. It may also be used when there is very little information about the entire population, such as when the population standard deviation is unknown. T-distributions are used to make estimates about the mean and other statistical points.
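To make the sampling distribution of a proportion more concrete, here is a minimal simulation. It assumes a hypothetical trait held by 30% of a very large population, so each sampled member can be treated as an independent draw; all constants and the helper name `sample_proportion` are invented for the illustration.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed for repeatability

TRUE_PROPORTION = 0.30  # assumed share of the population with the trait
SAMPLE_SIZE = 200       # members per sample
NUM_SAMPLES = 2_000     # number of repeated samples

def sample_proportion(p, n):
    """Proportion of members with the trait in one random sample of size n."""
    return sum(rng.random() < p for _ in range(n)) / n

proportions = [sample_proportion(TRUE_PROPORTION, SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

mean_of_proportions = statistics.mean(proportions)
observed_se = statistics.stdev(proportions)
# Standard error of a proportion for independent draws: sqrt(p * (1 - p) / n)
theoretical_se = (TRUE_PROPORTION * (1 - TRUE_PROPORTION) / SAMPLE_SIZE) ** 0.5

print("mean of sample proportions:", round(mean_of_proportions, 3))
print("observed vs. theoretical standard error:",
      round(observed_se, 4), round(theoretical_se, 4))
```

In runs like this, the mean of the sample proportions should land close to the assumed population proportion, and the spread of the proportions should be close to the theoretical standard error.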
In statistics, a population is the entire pool from which a statistical sample is drawn. A population may refer to an entire group of people, objects, events, hospital visits, or measurements. A population can thus be said to be an aggregate observation of subjects grouped together by a common feature.
Plotting Sampling Distributions
A population or a single sample set of numbers is often assumed to follow a normal distribution, although this is not guaranteed. A sampling distribution built from only a small number of samples will not necessarily show a bell-curved shape when plotted.
Following our example, the weights of babies in North America and in South America are assumed to be approximately normally distributed, because some babies will be underweight (below the mean) or overweight (above the mean), with most babies falling in between (around the mean). If the average weight of newborns in North America is seven pounds, the sample mean weight in each of the 12 sets of sample observations recorded for North America will be close to seven pounds as well. But if you graph the averages calculated for just those 12 sample groups, the resulting shape may look irregular or even roughly uniform, and it is difficult to predict with certainty what the actual shape will turn out to be. The more samples the researcher draws from the population of over a million weight figures, the more the graph will start forming a normal distribution.
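A short simulation can illustrate how the plotted shape settles down as more samples are drawn. The sketch below is hedged in the usual ways: the population is simulated, the helper names are hypothetical, and the check used (the share of sample means within one standard error of the population mean, which is roughly 68% for a normal distribution) is just one convenient way to judge the shape.

```python
import random
import statistics

rng = random.Random(11)  # fixed seed for repeatability

# Assumption: a simulated population of newborn weights in pounds.
population = [rng.gauss(7.0, 1.2) for _ in range(100_000)]
pop_mean = statistics.fmean(population)
standard_error = statistics.pstdev(population) / 100 ** 0.5  # samples of 100 records

def sample_means(num_samples, sample_size=100):
    """Mean weight of each of num_samples random samples."""
    return [statistics.fmean(rng.sample(population, sample_size)) for _ in range(num_samples)]

def text_histogram(means, bin_width=0.05):
    """Print a rough histogram, scaled so the tallest bar is 40 characters."""
    counts = {}
    for m in means:
        bin_start = round(m // bin_width * bin_width, 2)
        counts[bin_start] = counts.get(bin_start, 0) + 1
    tallest = max(counts.values())
    for bin_start in sorted(counts):
        print(f"{bin_start:5.2f} | {'#' * max(1, counts[bin_start] * 40 // tallest)}")

def share_within_one_se(means):
    """Fraction of sample means within one standard error of the population mean."""
    return sum(abs(m - pop_mean) <= standard_error for m in means) / len(means)

few = sample_means(12)
many = sample_means(5_000)

for label, means in (("12 samples", few), ("5,000 samples", many)):
    print(f"\n{label}: share within one standard error = {share_within_one_se(means):.2f}")
    text_histogram(means)
```

With only 12 sample means the histogram tends to look ragged, while the larger run should trace out something much closer to a bell curve.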