## 17 Sep During these chapters, you will analyze three main concepts of testing data

During these chapters, you will analyze three main concepts of testing data.

1. Suppose a random sample of size 345 is drawn from a population with a mean of 150 and a standard deviation of 180. What is the standard deviation of the sample mean? Finally, what does the standard deviation mean in this question? Explain.
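As a starting point for question 1, the standard deviation of the sample mean (the standard error) is the population standard deviation divided by the square root of the sample size. The sketch below assumes simple random sampling from a population that is large relative to the sample, so no finite population correction is applied; the variable names are illustrative only.

```python
import math

# Values taken from question 1.
population_sd = 180
sample_size = 345

# Standard error of the sample mean: sigma / sqrt(n).
standard_error = population_sd / math.sqrt(sample_size)

print(round(standard_error, 2))  # 9.69
```

Under these assumptions, sample means from repeated samples of size 345 would typically fall within about 9.69 units of the population mean of 150.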

2. What assumptions do you make when constructing a 95% confidence interval? Explain.

3. To study discrepancies in working hours, we sample 32 workers. The sample mean is 42.1 hours per week, with a sample standard deviation of 10.4 hours. Test the claim that the population standard deviation is at least 13 hours. The hypotheses are:

H0:σ=13

Ha:σ<13

We shall choose α=0.05

Will you reject or fail to reject the null hypothesis?
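One way to approach question 3 is the chi-square test for a single variance, with test statistic (n − 1)s²/σ₀² and n − 1 = 31 degrees of freedom. The sketch below assumes the weekly hours are approximately normally distributed, which this test requires. The helper `chi2_cdf` is a hypothetical name; it evaluates the chi-square distribution function with a standard series expansion so that only the Python standard library is needed.

```python
import math

def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

n = 32           # workers sampled
s = 10.4         # sample standard deviation (hours)
sigma0 = 13.0    # hypothesized standard deviation (hours)
alpha = 0.05

test_statistic = (n - 1) * s**2 / sigma0**2   # 31 * 108.16 / 169
p_value = chi2_cdf(test_statistic, n - 1)     # lower-tail test, since Ha: sigma < 13

print(round(test_statistic, 2))  # 19.84
print("reject H0" if p_value < alpha else "fail to reject H0")  # fail to reject H0
```

The statistic of 19.84 lies just above the lower 5% critical value of about 19.28 for 31 degrees of freedom, so under these assumptions the sample does not provide enough evidence at α = 0.05 to conclude that σ is less than 13.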

Need 4-5 pages with explanations and working. Please provide peer reviewed citations.

CHAPTER 7 Sampling and Sampling Distributions

CHAPTER 8 Confidence Interval Estimation

CHAPTER 9 Hypothesis Testing

PART 3 STATISTICAL INFERENCE

Copyright 2020 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.

CHAPTER 7 Sampling and Sampling Distributions

SAMPLE SIZE SELECTION IN A LEGAL CASE

This chapter introduces the important problem of estimating an unknown population quantity by randomly sampling from the population. Sampling is often expensive and/or time-consuming, so a key step in any sampling plan is to determine the sample size that produces a prescribed level of accuracy. Some of the issues in finding an appropriate sample size are discussed in Afshartous (2008). The author was involved as an expert statistical witness for the plaintiff in a court case. Over a period of several years, a service company had collected a flat “special service handling fee” from its client during any month in which a special service request was made. The plaintiff claimed that many of these fees had been charged erroneously and sought to recover all the money collected from such erroneous fees. The statistical question concerns either the proportion of all monthly billing records that were erroneous or the total number of all erroneous billing records. Both sides had to agree on a sampling method for sampling through the very large population of billing records. They eventually agreed to simple random sampling, as discussed in this chapter. However, there was some contention (and confusion) regarding the appropriate sample size.

Their initial approach was to find a sample size n sufficiently large to accurately estimate p, the unknown proportion of all monthly billing records in error. Specifically, if they wanted to be 95% confident that the error in their estimate of p would be no more than 5%, then a standard sample size formula (provided in Chapter 8) requires n to be 385. (This number is surprisingly independent of the total number of billing records.) Then, for example, if the sample discovered 77 errors, or 20% of the sampled items, they would be 95% confident that between 15% and 25% (20% plus or minus 5%) of all billing records were in error.
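The Chapter 8 formula is not reproduced in this excerpt. The sketch below assumes it is the usual conservative formula for a proportion, n = z²p(1 − p)/E² with p = 0.5 and z = 1.96 for 95% confidence; the function name is illustrative.

```python
import math

def sample_size_for_proportion(margin, z=1.96, p=0.5):
    """Smallest n whose margin of error for a proportion is at most `margin`.

    Assumes the conservative planning value p = 0.5 unless another is supplied.
    """
    return math.ceil(z**2 * p * (1 - p) / margin**2)

n = sample_size_for_proportion(0.05)
print(n)        # 385

# The example in the text: 77 errors found in the sample of 385.
p_hat = 77 / n
print(p_hat)    # 0.2
```

Note that the population size does not appear anywhere in this formula, which is why the text calls the result independent of the total number of billing records.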

The author argued that this “plus or minus 5%” does not necessarily provide the desired level of accuracy for the quantity of most interest, the total number of erroneously charged fees. A couple of numerical examples illustrate his point. Let’s suppose that there were 100,000 billing records total and that 20%, or 20,000, were billed erroneously. Then the plus or minus 5% interval translates to an interval from 15,000 to 25,000 bad billings. That is, we are 95% confident that the estimate is not off by more than 5000 billing records on either side. The author defines the relative error in this case to be 0.25: the potential error, 5000, divided by the number to be estimated, 20,000. Now change the example slightly so that 60%, or 60,000, were billed erroneously. Then plus or minus 5% translates to the interval from 55,000 to 65,000, and the relative error is 5000/60,000, or 0.083. The point is that the same plus or minus 5% absolute error for p results in a much smaller relative error in the second example.
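The two relative-error calculations above can be reproduced with a few lines of arithmetic. The function name is illustrative; the inputs are the numbers used in the text.

```python
def relative_error(total_records, proportion_bad, margin):
    """Potential error in the count of bad records divided by the true count."""
    potential_error = margin * total_records   # e.g. 5% of 100,000 = 5000
    bad_records = proportion_bad * total_records
    return potential_error / bad_records

first = relative_error(100_000, 0.20, 0.05)
second = relative_error(100_000, 0.60, 0.05)

print(round(first, 3))   # 0.25
print(round(second, 3))  # 0.083
```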

Using this reasoning, the author suggested that they should choose the sample size to achieve a prescribed relative error in the number of bad billings. This can change the magnitude of the sample size considerably. For example, the author demonstrated by means of a rather complicated sample size formula that if a relative error of 0.10 is desired and the value of p is somewhere around 0.10, a sample size of about 3600 is required. On the other hand, if a relative error of 0.10 is still desired but the value of p is somewhere around 0.5, then the required sample size is only about 400.

Sample size formulas, and statistical arguments that lead to them, are far from intuitive. In this legal case, by keeping the math to a minimum and using simple terminology like relative error, the author eventually convinced the others to use his approach, even though it led to a considerably larger sample size than the 385 originally proposed.

7-1 Introduction

This chapter sets the stage for statistical inference, a topic that is explored in the following two chapters. In a typical statistical inference problem, you want to discover one or more characteristics of a given population. For example, you might want to know the proportion of toothpaste customers who have tried, or intend to try, a particular brand. Or you might want to know the average amount owed on credit card accounts for a population of customers at a shopping mall. Generally, the population is large and/or spread out, and it is difficult, maybe even impossible, to contact each member. Therefore, you identify a sample of the population and then obtain information from the members of the sample.

There are two main objectives of this chapter. The first is to discuss the sampling schemes that are generally used in real sampling applications. We focus on several types of random samples and see why these are preferable to nonrandom samples. The second objective is to see how the information from a sample of the population (for example, 1% of the population) can be used to infer the properties of the entire population. The key here is the concept of a sampling distribution. In this chapter we focus on the sampling distribution of the sample mean, and we discuss the role of a famous mathematical result called the central limit theorem. Specifically, we discuss how the central limit theorem is the reason for the importance of the normal distribution in statistical inference.

7-2 Sampling Terminology

We begin by introducing some terminology that is used in sampling. In any sampling problem there is a relevant population. The population is the set of all members about which a study intends to make inferences, where an inference is a statement about a numerical characteristic of the population, such as an average income or the proportion of incomes below $50,000. It is important to realize that a population is defined in relationship to any particular study. Any analyst planning a survey should first decide which population the conclusions of the study will concern, so that a sample can be chosen from this population.

For example, if a marketing researcher plans to use a questionnaire to infer consumers’ reactions to a new product, she must first decide which population of consumers is of interest: all consumers, consumers over 21 years old, consumers who do most of their shopping online, or others. Once the relevant consumer population has been designated, a sample from this population can then be surveyed. However, it is important to remember that inferences made from the study pertain only to this particular population.

The relevant population contains all members about which a study intends to make inferences.

Before you can choose a sample from a given population, you typically need a list of all members of the population. In sampling terminology, this list is called a frame, and the potential sample members are called sampling units. Depending on the context, sampling units could be individual people, households, companies, cities, or others.

A frame is a list of all members, called sampling units, in the population.

It is customary in virtually all statistical literature to let uppercase N be the population size and lowercase n be the sample size. We follow this convention as well.

In this chapter we assume that the population is finite and consists of N sampling units. We also assume that a frame of these N sampling units is available. Unfortunately, there are many situations where a complete frame is practically impossible to obtain. For example, if the purpose of a study is to survey the attitudes of all unemployed teenagers in Chicago, it is practically impossible to obtain a complete frame of them. In this situation the best alternative is to obtain a partial frame from which the sample can be selected. If the partial frame omits any significant segments of the population that a complete frame would include, then the resulting sample could be biased. For instance, if you use a restaurant guide to choose a sample of restaurants, you automatically omit all restaurants that do not advertise in the guide. Depending on the purposes of the study, this could be a serious omission.

There are two basic types of samples: probability samples and judgmental samples. A probability sample is a sample in which the sampling units are chosen from the population according to a random mechanism. In contrast, no formal random mechanism is used to select a judgmental sample. In this case the sampling units are chosen according to the sampler’s judgment.

The members of a probability sample are chosen according to a random mechanism, whereas the members of a judgmental sample are chosen according to the sampler’s judgment.

We will not discuss judgmental samples. The reason is very simple: there is no way to measure the accuracy of judgmental samples because the rules of probability do not apply to them. In other words, if a population characteristic is estimated from the observations in a judgmental sample, there is no way to measure the accuracy of this estimate. In addition, it is very difficult to choose a representative sample from a population without using some random mechanism. Because our judgment is usually not as good as we think, judgmental samples are likely to contain our own built-in biases. Therefore, we focus exclusively on probability samples from here on.

Fundamental Insight: Why Random Sampling?

One reason for sampling randomly from a population is to avoid biases (such as choosing mainly stay-at-home mothers because they are easier to contact). An equally important reason is that random sampling allows you to use probability to make inferences about unknown population parameters. If sampling were not random, there would be no basis for using probability to make such inferences.

7-3 Methods for Selecting Random Samples

This section discusses the types of random samples that are used in real sampling applications. Different types of sampling schemes have different properties. There is typically a trade-off between cost and accuracy. Some sampling schemes are cheaper and easier to administer, whereas others are more costly but provide more accurate information. Some of these issues are discussed here. However, anyone who intends to make a living in survey sampling needs to learn much more about the topic than we can cover here.

7-3a Simple Random Sampling

The simplest type of sampling scheme is appropriately called simple random sampling. Suppose you want to sample n units from a population of size N. Then a simple random sample of size n has the property that every possible sample of size n has the same probability of being chosen. Simple random samples are the easiest to understand, and their statistical properties are the most straightforward. Therefore, we will focus primarily on simple random samples in the rest of this book. However, as we discuss shortly, more complex random samples are often used in real applications.

A simple random sample of size n is one where each possible sample of size n has the same chance of being chosen.

Let’s illustrate the concept with a simple random sample for a small population. Suppose the population size is N = 5, and the five members of the population are labeled a, b, c, d, and e. Also, suppose the sample size is n = 2. Then the possible samples are (a, b), (a, c), (a, d), (a, e), (b, c), (b, d), (b, e), (c, d), (c, e), and (d, e). That is, there are 10 possible samples, the number of ways two members can be chosen from five members. Then a simple random sample of size n = 2 has the property that each of these 10 possible samples has the same probability, 1/10, of being chosen.

One other property of simple random samples can be seen from this example. If you focus on any member of the population, such as member b, you will see that b is a member of 4 of the 10 samples. Therefore, the probability that b is chosen in a simple random sample is 4/10, or 2/5. In general, any member has the same probability n/N of being chosen in a simple random sample. If you are one of 100,000 members of a population, then the probability that you will be selected in a simple random sample of size 100 is 100/100,000, or 1 out of 1000.
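The counting argument for N = 5 and n = 2 can be checked by enumerating every possible sample. This is only a small sketch using Python's standard library; the variable names are illustrative.

```python
from itertools import combinations

population = ["a", "b", "c", "d", "e"]   # N = 5
n = 2

# Every possible sample of size n, in the same order as listed in the text.
samples = list(combinations(population, n))
print(len(samples))  # 10

# Fraction of equally likely samples that contain member b: should equal n/N.
containing_b = [s for s in samples if "b" in s]
print(len(containing_b) / len(samples))  # 0.4
```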

There are several ways simple random samples can be chosen, all of which involve random numbers. One approach that works well for the small example with N = 5 and n = 2 is to generate a single random number with the RAND function in Excel®. You divide the interval from 0 to 1 into 10 equal subintervals of length 1/10 each and see which of these subintervals the random number falls into. You then choose the corresponding sample. For example, suppose the random number is 0.465. This is in the fifth subinterval, that is, the interval from 0.4 to 0.5, so you choose the fifth sample, (b, c).
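The subinterval method can be sketched as follows. The text uses Excel's RAND function; here a plain number between 0 and 1 stands in for it, and `pick_sample` is a hypothetical helper name.

```python
from itertools import combinations

def pick_sample(u, samples):
    """Map a uniform number u in [0, 1) to one of the equally likely samples."""
    index = int(u * len(samples))          # which subinterval u falls into
    index = min(index, len(samples) - 1)   # guard in case u is exactly 1.0
    return samples[index]

samples = list(combinations("abcde", 2))   # the 10 possible samples

# 0.465 lies in the fifth subinterval, [0.4, 0.5), so the fifth sample is chosen.
print(pick_sample(0.465, samples))  # ('b', 'c')
```

In practice `u` would come from a random number generator such as `random.random()`.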

This method is clearly consistent with simple random sampling—each of the samples has the same chance of being chosen—but it is prohibitive when n and N are large. In this case there are too many possible samples to list. Fortunately, there is another method that can be used. The idea is simple. You sort the N members of the population randomly, using Excel’s RAND function to generate random numbers for the sort. Then you include the first n members from the sorted sequence in the random sample. This procedure is illustrated in Example 7.1.
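The random-sort method can also be sketched outside Excel. The version below attaches a random number to each member of the frame, sorts on those numbers, and keeps the first n members. The frame of family IDs 1 to 40 and the seed are assumptions made purely for illustration.

```python
import random

def simple_random_sample(frame, n, rng=None):
    """Select n units by sorting the frame on freshly generated random numbers."""
    rng = rng or random.Random()
    keyed = [(rng.random(), unit) for unit in frame]   # one random number per unit
    keyed.sort(key=lambda pair: pair[0])               # sort on the random numbers
    return [unit for _, unit in keyed[:n]]             # keep the first n units

frame = list(range(1, 41))                             # hypothetical family IDs
sample = simple_random_sample(frame, 10, random.Random(42))
print(sample)
```

Python's built-in `random.sample(frame, 10)` gives an equivalent simple random sample directly; the explicit sort is shown only because it mirrors the spreadsheet procedure.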

The RAND function in Excel generates numbers that are distributed randomly and uniformly between 0 and 1.

EXAMPLE 7.1 SAMPLING FAMILIES TO ANALYZE ANNUAL INCOMES

Consider the frame of 40 families with annual incomes shown in column B of Figure 7.1, with several rows hidden. (See the file Random Sampling.xlsm. The extension is .xlsm because this file contains a macro. When you open it, you will need to enable the macro.) We want to choose a simple random sample of size 10 from this frame. How can this be done? And how do summary statistics of the chosen families compare to the corresponding summary statistics of the population?

Objective: To illustrate how Excel’s random number function, RAND, can be used to generate simple random samples.

Solution: The idea is very simple. You first generate a column of random numbers next to the data. Then you sort the rows according to the random numbers and choose the first 10 families in the sorted rows. The following procedure produces the results in Figure 7.2. (See the first sheet in the finished version of the file.)

1. Random numbers next to a copy. Copy the original data to columns D and E. Then enter the formula

   =RAND()

   in cell F10 and copy it down column F.

2. Replace with values. To enable sorting, you must first “freeze” the random numbers; that is, replace their formulas with values. To do this, copy the range F10:F49 and select Paste Values from the Paste dropdown menu on the Home ribbon.

3. Sort. Sort on column F in ascending order. Then the 10 families with the 10 smallest random numbers are the ones in the sample. These are shaded in the figure. (Note that you could instead have chosen the 10 families with the 10 largest random numbers. This would be equally valid.)

4. Means. Use the AVERAGE, MEDIAN, and STDEV.S functions in row 6 to calculate summary statistics of the first 10 incomes in column E. Similar summary statistics for the population have already been calculated in row 5. (Cell D5 uses the STDEV.P function because this is the population standard deviation.)

Figure 7.1 Population Income Data

Summary statistics (population): Mean $39,985; Median $38,500; Stdev $7,377

| Family | Income |
|---|---|
| 1 | $43,300 |
| 2 | $44,300 |
| 3 | $34,600 |
| 4 | $38,000 |
| … | … |
| 38 | $46,900 |
| 39 | $37,300 |
| 40 | $41,000 |

To obtain more random samples of size 10 (for comparison), you would need to go through this process repeatedly. To save you the trouble of doing so, we wrote a macro to automate the process. (See the Automated sheet in the finished version of the file.) This sheet looks essentially the same as the sheet in Figure 7.2, except that there is a button to run the macro, and only the required data remain on the spreadsheet. Try clicking this button. Each time you do so, you will get a different random sample, and different summary measures in row 6. By doing this many times and keeping track of the sample summary data, you can see how the summary measures vary from sample to sample. We will have much more to say about this variation later in the chapter.

Figure 7.2 Selecting a Simple Random Sample

| Summary statistics | Mean | Median | Stdev |
|---|---|---|---|
| Population | $39,985 | $38,500 | $7,377 |
| Sample | $41,490 | $42,850 | $5,323 |

| Population: Family | Population: Income | Random sample: Family | Random sample: Income | Random # |
|---|---|---|---|---|
| 1 | $43,300 | 1 | $43,300 | 0.04545 |
| 2 | $44,300 | 2 | $44,300 | 0.1496768 |
| 3 | $34,600 | 12 | $51,500 | 0.23527 |
| 4 | $38,000 | 7 | $42,700 | 0.2746325 |
| 5 | $44,700 | 13 | $35,900 | 0.3003506 |
| 6 | $45,600 | 15 | $43,000 | 0.3197393 |
| 7 | $42,700 | 6 | $45,600 | 0.3610983 |
| 8 | $36,900 | 3 | $34,600 | 0.3852641 |
| 9 | $38,400 | 9 | $38,400 | 0.4427564 |
| 10 | $33,700 | 14 | $35,600 | 0.4447877 |
| 11 | $44,100 | 5 | $44,700 | 0.4505899 |
| 12 | $51,500 | 40 | $41,000 | 0.4597361 |
| … | … | … | … | … |
| 38 | $46,900 | 39 | $37,300 | 0.8644119 |
| 39 | $37,300 | 8 | $36,900 | 0.9059098 |
| 40 | $41,000 | 10 | $33,700 | 0.9637509 |

The first 10 rows of the random sample columns (the 10 smallest random numbers) form the sample.

The procedure described in Example 7.1 can be used in Excel to select a simple random sample of any size from any population. All you need is a frame that lists the population values. Then it is just a matter of inserting random numbers, freezing them, and sorting on the random numbers.
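The sample summary statistics reported in Figure 7.2 can be checked with Python's `statistics` module, whose `mean`, `median`, and `stdev` functions correspond to Excel's AVERAGE, MEDIAN, and STDEV.S. The incomes below are the 10 sampled families' incomes as read from Figure 7.2.

```python
import statistics

# Incomes of the 10 sampled families (families 1, 2, 12, 7, 13, 15, 6, 3, 9, 14).
sample_incomes = [43300, 44300, 51500, 42700, 35900,
                  43000, 45600, 34600, 38400, 35600]

print(statistics.mean(sample_incomes))          # 41490
print(statistics.median(sample_incomes))        # 42850.0
print(round(statistics.stdev(sample_incomes)))  # 5323
```

The `stdev` function uses the n − 1 divisor, as STDEV.S does; `statistics.pstdev` would be the counterpart of STDEV.P for a full population.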

Perhaps surprisingly, simple random samples are used infrequently in real applications. There are several reasons for this.

• Because each sampling unit has the same chance of being sampled, simple random sampling can result in samples that are spread over a large geographical region. This can make sampling extremely expensive, especially if personal interviews are used.

• Simple random sampling requires that all sampling units be identified prior to sampling. Sometimes this is infeasible.

• Simple random sampling can result in underrepresentation or overrepresentation of certain segments of the population. For example, if the primary (but not sole) interest is in the graduate student subpopulation of university students, a simple random sample of all university students might not provide enough information about the graduate students.

Despite this, most of the statistical analysis in this book assumes simple random samples. The analysis is considerably more complex for other types of random samples and is best left to more advanced books on sampling.

EXAMPLE 7.2 SAMPLING ACCOUNTS RECEIVABLE AT SPRING MILLS

The file Accounts Receivable Finished.xlsx contains 280 accounts receivable for Spring Mills Company. There is a single variable, the amount of the bill owed. The file contains the bill amounts for 25 random samples of size 15 each. (They were generated by the method in Example 7.1.) Calculate the average amount owed in each random sample, and create a histogram of these 25 averages.

Objective: To demonstrate how sample means are distributed.

Solution: In most real-world applications, you would generate only a single random sample from a population, so why have we generated 25 random samples in this example? The reason is that we want to introduce the concept of a sampling distribution, in this case the sampling distribution of the sample mean. This is the distribution of all possible sample means you could generate from all possible samples (of a given size) from a population. By generating a fairly large number of random samples from the population of accounts receivable, you can see what the approximate sampling distribution of the sample mean looks like.
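The idea of approximating a sampling distribution by repeated sampling can be sketched in a few lines. The population below is made up for illustration (280 bill amounts drawn from an arbitrary range); it is not the Spring Mills data, and the seed is arbitrary.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of 280 bill amounts (NOT the actual file contents).
population = [rng.choice(range(180, 2230, 10)) for _ in range(280)]

# Draw 25 simple random samples of size 15 and record each sample mean.
sample_means = [statistics.mean(rng.sample(population, 15)) for _ in range(25)]

print(round(statistics.mean(population), 2))    # population mean
print(round(min(sample_means), 2), round(max(sample_means), 2))
```

Increasing the number of samples from 25 to several thousand would give a smoother picture of how the sample mean varies around the population mean.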

The 25 random samples, one per row, are listed in Figure 7.3. We then used the AVERAGE function in column Q to calculate their sample means, and we created a histogram of these 25 sample means. As you can see, the 25 sample means vary quite a lot, from a low of $332.00 to a high of $799.33. You can check that the population mean, the average of all 280 bill amounts, is $464.29. So the 25 sample means vary around this population mean and are spread out roughly as a bell-shaped curve, as shown in the histogram.
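The histogram can be rebuilt from the 25 sample means reported in Figure 7.3. The sketch below assumes the bins are the four intervals shown on the histogram's axis, with each bin closed on the right and the first bin also closed on the left; that convention is an assumption, although no sample mean other than the minimum falls exactly on a bin edge.

```python
sample_means = [
    799.33, 590.00, 532.67, 480.00, 540.00, 475.33, 519.33, 415.33, 471.33,
    502.00, 581.33, 459.33, 421.33, 500.67, 403.33, 458.67, 395.33, 332.00,
    468.00, 458.00, 659.33, 486.00, 527.33, 569.33, 360.00,
]
edges = [332.00, 452.00, 572.00, 692.00, 812.00]

counts = [0] * (len(edges) - 1)
for mean in sample_means:
    for i in range(len(counts)):
        above_lower = mean >= edges[i] if i == 0 else mean > edges[i]
        if above_lower and mean <= edges[i + 1]:
            counts[i] += 1
            break

print(counts)                                # [6, 15, 3, 1]
print(min(sample_means), max(sample_means))  # 332.0 799.33
```

Most of the sample means land in the bin containing the population mean of $464.29, with a thin tail to the right.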

Figure 7.3 Random Samples and Sample Means

[Spreadsheet of 25 random samples of size 15, one sample per row, with bill amounts in columns B through P and the sample mean in column Q, followed by a histogram of the sample means.]

Sample means (samples 1 to 25): $799.33, $590.00, $532.67, $480.00, $540.00, $475.33, $519.33, $415.33, $471.33, $502.00, $581.33, $459.33, $421.33, $500.67, $403.33, $458.67, $395.33, $332.00, $468.00, $458.00, $659.33, $486.00, $527.33, $569.33, $360.00

Histogram of Sample Means. Bins: [$332.00, $452.00], [$452.00, $572.00], [$572.00, $692.00], [$692.00, $812.00]; vertical axis: frequency, 0 to 16.

This histogram approximates the sampling distribution of the sample mean. It is approximate because it is based on only 25
