The Swag Team: Probabilistic sampling techniques

The first statistical sampling method is simple random sampling. In this method, each item in the population has the same probability of being selected for the sample as any other item. For example, a tester could randomly select 5 inputs to a test case from the population of all valid inputs in the range 1-100 to use during test execution. To do this, the tester could use a random number generator, or simply write each number from 1 to 100 on a slip of paper, mix the slips in a hat, and draw out five. Random sampling is possible with or without replacement. With replacement, a selected item is returned to the population and can be selected again. Without replacement, an item is discarded after it is selected and thus can occur only once in the sample (Black, 1999).
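The tester's draw of 5 inputs from the range 1-100 can be sketched in a few lines of Python. This is only a minimal illustration using the standard library's random module; the function name simple_random_sample and its parameters are hypothetical and do not come from the cited sources.

```python
import random

def simple_random_sample(population, k, replace=False, seed=None):
    """Draw k items so that every item has the same chance of selection."""
    rng = random.Random(seed)          # seed is optional, for repeatable runs
    items = list(population)
    if replace:
        # With replacement: a drawn item goes back and may be drawn again.
        return rng.choices(items, k=k)
    # Without replacement: each item can appear at most once.
    return rng.sample(items, k)

valid_inputs = range(1, 101)           # all valid inputs, 1 through 100
without_replacement = simple_random_sample(valid_inputs, 5, seed=42)
with_replacement = simple_random_sample(valid_inputs, 5, replace=True, seed=42)
print(without_replacement)
print(with_replacement)
```

Passing a seed is a convenience for reproducing a test run; leaving it out gives a fresh draw each time.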

Babbie (2001) describes systematic sampling as another statistical sampling method in which every kth element of the list is selected for the sample, starting with an element chosen at random from the first k elements. The interval k is the population size divided by the required sample size. For example, if the population has 1000 elements and a sample size of 100 is needed, then k would be 1000/100 = 10. If number 7 is randomly selected from the first ten elements on the list, the sample would continue down the list, selecting the 7th element from each group of ten elements (the 7th, 17th, 27th and so on). Care must be taken when using systematic sampling to ensure that the original population list has not been ordered in a way that introduces any non-random factors into the sampling. An example of systematic sampling would be an auditor of the acceptance test process who, working from a randomly ordered list of all acceptance test cases, randomly selects the 14th of the first 20 test cases to retest during the audit. The auditor would then keep adding twenty and select the 34th test case, 54th test case, 74th test case and so on to retest until the end of the list is reached.
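The two worked examples above (start at 7 with an interval of 10, and start at 14 with an interval of 20) can be reproduced with list slicing. The sketch below is illustrative only: the function name systematic_sample, the TC- identifiers, and the list of 200 test cases are assumptions made for the example.

```python
import random

def systematic_sample(items, interval, start=None, seed=None):
    """Select every `interval`-th item, beginning at 1-based position `start`.

    If `start` is None, it is drawn at random from 1..interval.
    """
    if start is None:
        start = random.Random(seed).randint(1, interval)
    if not 1 <= start <= interval:
        raise ValueError("start must lie within the first interval")
    return list(items)[start - 1::interval]

# Population of 1000, sample of 100, so the interval is 1000 / 100 = 10.
population = list(range(1, 1001))
interval = len(population) // 100
print(systematic_sample(population, interval, start=7)[:4])   # [7, 17, 27, 37]

# Hypothetical audit list of 200 acceptance test cases, interval of 20.
test_cases = [f"TC-{i}" for i in range(1, 201)]
print(systematic_sample(test_cases, 20, start=14)[:4])
# ['TC-14', 'TC-34', 'TC-54', 'TC-74']
```

Because the slice simply walks the list at a fixed stride, any periodic ordering in the list carries straight into the sample, which is why the list order matters.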

The statistical sampling method called stratified sampling is used when each sub-group within a population needs to be represented in the sample (Babbie, 2001). The first step in stratified sampling is to divide the population into sub-groups (strata) based on mutually exclusive criteria. Random or systematic samples are then taken from each sub-group. The sampling fraction for each sub-group may be taken in the same proportion as the sub-group has in the population. For example, a person conducting a customer satisfaction survey could select random customers from each customer type in proportion to the number of customers of that type in the population. If 40 samples are to be selected, and 10% of the customers are managers, 60% are users, 25% are operators and 5% are database administrators, then 4 managers, 24 users, 10 operators and 2 database administrators would be randomly selected. Stratified sampling can also sample an equal number of items from each sub-group. For example, a development lead could randomly select three modules from each programming language in use to examine against the coding standard (Castillo, 2009).
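The proportional allocation in the customer survey example can be checked with a short sketch. The customer identifiers, the population of 1000 customers, and the helper names proportional_allocation and stratified_sample are all assumptions made for illustration; only the percentages and the sample size of 40 come from the example above.

```python
import random

def proportional_allocation(strata_sizes, sample_size):
    """Give each stratum a quota proportional to its share of the population.

    Plain rounding is used here; for awkward proportions the quotas may not
    add up to sample_size exactly and would need adjusting.
    """
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}

def stratified_sample(strata, sample_size, seed=None):
    """Randomly sample each stratum according to its proportional quota."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    quotas = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(strata[name], quotas[name]) for name in strata}

# Hypothetical population of 1000 customers: 10% / 60% / 25% / 5%.
counts = {"manager": 100, "user": 600, "operator": 250, "dba": 50}
customers = {kind: [f"{kind}-{i}" for i in range(n)] for kind, n in counts.items()}

quotas = proportional_allocation(counts, 40)
print(quotas)   # {'manager': 4, 'user': 24, 'operator': 10, 'dba': 2}

survey = stratified_sample(customers, 40, seed=0)
print({kind: len(chosen) for kind, chosen in survey.items()})
```

For the equal-allocation variant, the quota step would simply be replaced by a fixed number per stratum, such as three modules per programming language.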

The fourth and final probabilistic sampling technique is cluster sampling, also called block sampling. In cluster sampling, the population that is being sampled is divided into groups called clusters (Castillo, 2009). Instead of these sub-groups being homogeneous based on selected criteria, as in stratified sampling, each cluster should be as heterogeneous as possible so that it mirrors the population. A random sample is then taken from within one or more selected clusters. For example, if an organization has 30 small projects currently under development, an auditor looking for compliance with the coding standard might use cluster sampling to randomly select 4 of those projects as representatives for the audit and then randomly sample code modules for auditing from just those 4 projects. Cluster sampling can tell a lot about a particular cluster, but unless the clusters are selected randomly and a large number of clusters are sampled, generalizations cannot always be made about the entire population. For example, sampling only from the source code modules written during the previous week, or only from the modules in a particular sub-system, or only from the modules written in a particular language may introduce biases into the sample that would not allow statistically valid generalization (Black, 2004).
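The auditor's two-stage selection (first pick 4 of the 30 projects at random, then sample modules inside each chosen project) might look like the following sketch. The project and module names, the 20 modules per project, and the choice of 5 modules per selected project are hypothetical values chosen only to make the example runnable.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly pick clusters, then randomly sample items within each one."""
    rng = random.Random(seed)
    # Stage 1: choose which clusters (projects) are audited at all.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: sample items (modules) only from the chosen clusters.
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical organization: 30 small projects with 20 modules each.
projects = {
    f"project-{p:02d}": [f"project-{p:02d}/module_{m}.py" for m in range(20)]
    for p in range(1, 31)
}

audit = cluster_sample(projects, n_clusters=4, n_per_cluster=5, seed=7)
for project, modules in audit.items():
    print(project, modules)
```

Modules in the 26 projects that were not chosen have no chance of being audited in this run, which is the reason conclusions about the whole organization depend on the clusters being chosen at random and in sufficient number.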

February 2, 2016
