Wednesday, December 11, 2019
Bus105 Business Statistics
Question:

Section 1: Goals and format of the assignment

Paraphrase the following. There will be similar discussions in the lectures and tutorials; you can paraphrase those instead.

This assignment is supposed to help students understand the following concepts.

* You can ask a sample of people simple questions and give a numerical summary of the results.
* The summary is not reliable: different samples would give different answers. This lack of reliability can be well explained by theoretical distributions such as the z distribution.
* You can use hypothesis testing to use a sample to answer questions about the population.
* The first sections of the assignment involve explaining and summarizing a large data set of simple data and interpreting the results. The later sections discuss hypothesis testing, the issues of collecting and working with large sets of data, and the fact that it is very hard to fully understand the applications of the z distribution.

Section 2: Description of the data set

a) Expand upon and paraphrase the following. Definitely describe each of the variables. For each variable, answer the question: is it categorical or numerical?

The data set is the survey responses of 100,000 people; every student has their own sample of 100 people. Each person is shown one of 3 possible endings for a TV show (ending 1, ending 2 or ending 3) and is asked two questions. Question 1: Do they like the movie? Question 2: How much would they pay for the DVD?

b) Also do one or both of the following:

* Give some more questions that could be asked
* Criticise the existing questions

Section 3: Summary of the data set

You must do this section first because the descriptive statistics will help you understand the assignment. When you write up the assignment, the descriptive statistics section will go in the middle of the assignment; however, you really need to do it first.
You need to open up the assignment data set (the file computer assignment with filters). You must only use your sample of 100 numbers; use the filter given in the Excel sheet to find your own 100 numbers (this should become clear if you actually open the Excel file). Read the instructions given in the Excel file and answer the following questions (refer to the Excel file for more instructions). You must get Excel to give you your data set; your numbers will be different to the other students'.

Question 1

For the variable "Do they like the TV show", find:

phat1 = proportion of people that liked the TV show with ending 1
phat2 = proportion of people that liked the TV show with ending 2
phat3 = proportion of people that liked the TV show with ending 3

Find phat1-phat2 and phat2-phat3.

Give a chart that compares the proportions for the 3 different TV show endings.

Question 2

Find:

xbar1 = the average amount people would pay for the DVD of the TV show with ending 1
xbar2 = the average amount people would pay for the DVD of the TV show with ending 2
xbar3 = the average amount people would pay for the DVD of the TV show with ending 3

Find xbar1-xbar2 and xbar2-xbar3.

Answer:

SECTION 1

The write-up uses the following concepts:

* Population and sampling techniques
* Calculating descriptive values: mean, mode, median, variance, maximum, range and coefficient of variation
* Drawing of graphs: histograms and bar chart
* Sampling distributions
* Estimation theory: point estimates
* Hypothesis tests

SECTION 2

This report is based on a sample drawn from a set of 10,000 survey responses. The sample size is 100, selected on the basis of student id. Each respondent was shown one of 3 possible endings for a TV show (ending 1, ending 2 or ending 3) and asked two questions.

Q1: Do they like the show? The answer is recorded as YES/NO. It is therefore a categorical variable with 2 categories.

Q2: How much would they pay for the DVD?
This answer is a quantitative (numerical) variable that takes non-negative values.

SECTION 3

In this section we present some findings as required.

First we deal with the point estimates of the proportion of viewers in each group, defined by the ending shown:

phat1 = 0.29
phat2 = 0.44
phat3 = 0.27
phat1-phat2 = 0.29 - 0.44 = -0.15
phat2-phat3 = 0.44 - 0.27 = 0.17

The table below shows the distribution of viewers across groups along with whether they liked the show. The majority of viewers in groups 2 and 3 like the show (23 out of 44 and 21 out of 27), unlike in group 1, where the majority do not like it (16 out of 29).

           YES    NO   Total
endg1       13    16      29
endg2       23    21      44
endg3       21     6      27
Total       57    43     100

We present point estimates of the average amount that each group is willing to pay:

xbar1 = 4.73
xbar2 = 5.77
xbar3 = 8.21
sample average = 6.127
xbar1-xbar2 = -1.033
xbar2-xbar3 = -2.439

Next we consider each group separately and visualise the distribution of willingness to pay using histograms, which show the frequency distribution as well as its cumulative percentage. The table below gives the same frequencies numerically.

                       Frequency
class                  ending 1   ending 2   ending 3
0 to less than 2          16         21          6
2 to less than 4           0          0          0
4 to less than 6           0          0          0
6 to less than 8           4          3          3
8 to less than 10          4         10          9
10 to less than 12         3          6          5
12 and above               2          4          4

We now provide descriptive statistics for the 3 categories outlined above.

                      ending 1    ending 2    ending 3
Mean                  4.734483    5.768182    8.207407
Standard Error        0.926529    0.81876     0.884996
Median                0.9         8           9
Mode                  8           9           9
Standard Deviation    4.989       5.431038    4.598572
Skewness              0.406286    0.290192    -0.87268
Range                 13          18          14
Maximum               13          18          14

SECTION 4

Summary of data description:

The average amount that a viewer is willing to pay for a DVD is highest for the show with ending 3, at 8.207. It is lowest for the show with ending 1, at 4.73. The average for ending 2 is 5.768, closer to the group 1 average. The sample average stands at 6.127.
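The assignment asks for the proportion of people who liked the show under each ending, which can be read off the YES/NO table. The following is a minimal sketch, assuming the counts and group means quoted above are correct; the names `counts`, `means` and `liked_proportion` are illustrative and not part of the original spreadsheet. It also checks that the overall sample average is the size-weighted average of the three group means.

```python
# Counts from the YES/NO table: ending -> (liked, did not like).
counts = {1: (13, 16), 2: (23, 21), 3: (21, 6)}

# Group means of willingness to pay, from the descriptive statistics table.
means = {1: 4.734483, 2: 5.768182, 3: 8.207407}


def liked_proportion(yes, no):
    """Share of a group that answered YES."""
    return yes / (yes + no)


# Proportion who liked the show within each ending group.
p_liked = {ending: liked_proportion(*c) for ending, c in counts.items()}

# Group sizes and the size-weighted overall mean.
sizes = {ending: sum(c) for ending, c in counts.items()}
n_total = sum(sizes.values())
overall_mean = sum(sizes[e] * means[e] for e in sizes) / n_total

for ending in sorted(p_liked):
    print(f"ending {ending}: n = {sizes[ending]}, liked = {p_liked[ending]:.3f}")
print(f"difference 1-2: {p_liked[1] - p_liked[2]:.3f}")
print(f"difference 2-3: {p_liked[2] - p_liked[3]:.3f}")
print(f"overall mean: {overall_mean:.3f}")
```

Computed this way, the within-group proportions are conditional on the ending shown, which is a different quantity from each group's share of the 100 respondents.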
The maximum amount that a viewer is willing to pay for a DVD varies across the endings. It is highest at 18 for group 2 and lowest at 13 for group 1.

The three groups differ in skewness. The distributions for endings 1 and 2 are positively skewed, in contrast with the negative skewness for ending 3. This is seen in the histograms as well.

No respondent in any group reported a willingness to pay between $2 and $6.

The range is roughly similar across categories, and equals the maximum amount because the minimum amount is zero in all groups. Range = maximum - minimum, so when the minimum is zero the range equals the maximum.

Variation is captured in an absolute sense by the standard deviation. The standard deviation is highest for group 2 and lowest for group 3.

SECTION 5

The use of surveys in drawing conclusions about populations is based on the theory of estimation and hypothesis testing. The usefulness of surveys is well established, especially when time and money are constrained. They become unavoidable when the population itself is infinite or uncountable.

The quality of the results (in terms of relevance and usefulness) and of the conclusions drawn from a sample depends on many considerations: the size of the sample compared to the population, the sampling technique used, the confidence level chosen and the questions put to the sample participants.

Even after controlling for these aspects, some problems remain in the form of non-sampling errors, such as biased responses or participants giving wrong information. Some participants may provide frivolous answers because they are not required to back their answers with actions. When we ask how much they are willing to pay for the DVD, they can sound magnanimous and state a large number, but may actually refuse to pay that amount.

The questionnaire may not be exhaustive. For example, for Q1, along with YES/NO we could also offer CAN'T SAY.
SECTION 6

We use a chi-square test of association to test for association between liking the show and its ending for two pairs of endings: 1 and 2, and 2 and 3. The null hypothesis is independence between ending and liking. The alternative hypothesis is that there is an association between liking a show and its ending.

Let us consider shows ending with 1 and 2 first.

            YES    NO
Ending 1     13    16
Ending 2     23    21

Calculating expected values, we note that the chi-square test value is 7.820997 and the critical value at the 95% confidence level with 1 degree of freedom is 3.84. We reject the null hypothesis.

observed   expected   (O-E)^2/E
13         10.44      0.627739
16         10.73      2.588341
23         15.84      3.236465
21         16.28      1.368452
Total                 7.820997

Let us consider shows ending with 2 and 3 now.

            YES    NO
Ending 2     23    21
Ending 3     21     6

Again using a chi-square test, we note that the test value is 14.91508 and the critical value at the 95% confidence level is 3.84. We again reject independence between liking the show and its ending.

observed   expected   (O-E)^2/E
23         19.36      0.68438
21         11.88      7.001212
21         11.88      7.001212
6          7.29       0.228272
Total                 14.91508

We now test for differences in the mean amounts that people are willing to pay. We use a normal distribution for the estimated difference.

For shows ending with 1 and 2:

H0: μ1 = μ2
H1: μ1 ≠ μ2

Test value = -1.033 / SE
SE = (5^2/29 + 4.9^2/44)^.5 = 1.186
Test value = -0.871

Using a 99% level of confidence, the critical z value is 2.576. As |test value| < critical value, we can say that there is NO significant difference in the amounts people are willing to pay for DVDs of shows ending with 1 and 2.

Now for shows ending with 2 and 3:

H0: μ2 = μ3
H1: μ2 ≠ μ3

Test value = -2.439 / SE
SE = (4.9^2/44 + 4.6^2/27)^.5 = 1.153
Test value = -2.1156

Using a 99% level of confidence, the critical z value is 2.576. As |test value| < critical value, we can say that there is NO significant difference in the amounts people are willing to pay for DVDs of shows ending with 2 and 3.
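The two kinds of test used in this section can be sketched in a few lines of Python. This is an illustration rather than a reproduction of the working above. The function names `chi_square_2x2` and `two_sample_z` are hypothetical. Expected counts are taken from the row and column totals of each 2x2 table (the textbook formula, with no continuity correction), so the chi-square statistics it returns need not match the reported ones. The z statistics use the rounded standard deviations quoted in the text.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic for a table of observed counts.

    Expected count for each cell = row total * column total / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def two_sample_z(diff, sd_a, n_a, sd_b, n_b):
    """Return (z statistic, standard error) for a difference of two means."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return diff / se, se


# Observed YES/NO counts for each pair of endings.
chi_12 = chi_square_2x2([[13, 16], [23, 21]])
chi_23 = chi_square_2x2([[23, 21], [21, 6]])

# Differences in means and rounded standard deviations as quoted above.
z_12, se_12 = two_sample_z(-1.033, 5.0, 29, 4.9, 44)
z_23, se_23 = two_sample_z(-2.439, 4.9, 44, 4.6, 27)

print(f"chi-square, endings 1 vs 2: {chi_12:.3f}")
print(f"chi-square, endings 2 vs 3: {chi_23:.3f}")
print(f"z, endings 1 vs 2: {z_12:.3f} (SE {se_12:.3f})")
print(f"z, endings 2 vs 3: {z_23:.3f} (SE {se_23:.3f})")
```

Each chi-square statistic would be compared with the 1-degree-of-freedom critical value, and each z statistic with the two-sided normal critical value for the chosen confidence level.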
SECTION 7

Sampling distributions are an integral part of estimation and hypothesis testing. This theory forms the basis of any sample analysis used to draw inferences about a population. It rests on the mathematical theory of probability distributions and on mathematical proofs.

Sampling theory derives statistic values from samples, which are ideally infinite in number. These statistic values are then compared against probability distributions such as the normal (z), t, F and chi-square distributions to test hypotheses. We have used two such distributions in our report: the chi-square distribution to check independence of liking across categories, and the normal distribution to check differences in the amounts that people are willing to pay across shows.

In real life we draw only one sample, which is why a theoretical concept that relies on an infinite number of samples is difficult to grasp and understand.

SECTION 8

We work with a sample that constitutes 1% of the population data (100 out of 10,000 data points). The conclusions are conditional on the student id that is used as the sampling method. The sample points are randomly spread over three groups based on show ending. Our sample has a disproportionately large number of data points in group 2, for shows with ending 2 (44 out of 100).

The data show variation across the groups in descriptive statistics such as mean, variance, maximum value, skewness and standard deviation. There is a similarity in terms of maximum willingness to pay and range.

There is also a statistically significant association between liking the show and its ending for both pairs of shows considered (endings 1 and 2, and endings 2 and 3). The differences in the average amounts that viewers are willing to pay are not statistically significant at a 99% confidence level. These results are conditional on the level of confidence (or Type 1 error) chosen; a lower confidence level may change the conclusions.
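The closing remark about the confidence level can be illustrated with the test values reported in Section 6. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`. The helper names `two_sided_critical_z` and `is_significant` are hypothetical, and the test is assumed to be two-sided, as the "not equal" alternative hypotheses suggest.

```python
from statistics import NormalDist


def two_sided_critical_z(confidence):
    """Critical z value for a two-sided test at the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def is_significant(z, confidence):
    """True when |z| exceeds the two-sided critical value."""
    return abs(z) > two_sided_critical_z(confidence)


# Test values as reported in Section 6.
z_values = {"endings 1 vs 2": -0.871, "endings 2 vs 3": -2.1156}

for level in (0.99, 0.95, 0.90):
    critical = two_sided_critical_z(level)
    for label, z in z_values.items():
        verdict = "significant" if is_significant(z, level) else "not significant"
        print(f"{level:.0%} (critical z = {critical:.3f}): {label} is {verdict}")
```

Under these assumptions the comparison of endings 2 and 3 crosses the threshold once the confidence level drops to 95%, while the comparison of endings 1 and 2 stays below it at all three levels.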