
[section:st_eg Student's t Distribution Examples]

[section:tut_mean_intervals Calculating confidence intervals on the mean with the Students-t distribution]

Suppose you have a sample mean; you may wish to know what confidence intervals
you can place on that mean.  Colloquially: "I want an interval that I can be
P% sure contains the true mean".  (On a technical point, note that
the interval either contains the true mean or it does not: the
meaning of the confidence level is subtly
different from this colloquialism.  More background information can be found on the
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm NIST site]).

The formula for the interval can be expressed as:

[equation dist_tutorial4]

Where, ['Y[sub s]] is the sample mean, /s/ is the sample standard deviation,
/N/ is the sample size, /[alpha]/ is the desired significance level and
['t[sub ([alpha]/2,N-1)]] is the upper critical value of the Students-t
distribution with /N-1/ degrees of freedom.

[note
The quantity [alpha][space] is the maximum acceptable risk of falsely rejecting
the null-hypothesis.  The smaller the value of [alpha] the greater the
strength of the test.

The confidence level of the test is defined as 1 - [alpha], and often expressed
as a percentage.  So for example a significance level of 0.05 is equivalent
to a 95% confidence level.  Refer to
[@http://www.itl.nist.gov/div898/handbook/prc/section1/prc14.htm
"What are confidence intervals?"] in __handbook for more information.
] [/ Note]

[note
The usual assumptions of
[@http://en.wikipedia.org/wiki/Independent_and_identically-distributed_random_variables independent and identically distributed (i.i.d.)]
variables and [@http://en.wikipedia.org/wiki/Normal_distribution normal distribution]
of course apply here, as they do in other examples.
]

From the formula, it should be clear that:

* The width of the confidence interval decreases as the sample size increases.
* The width increases as the standard deviation increases.
* The width increases as the ['confidence level increases] (0.5 towards 0.99999 - stronger).
* The width increases as the ['significance level decreases] (0.5 towards 0.00000...01 - stronger).

The following example code is taken from the example program
[@../../example/students_t_single_sample.cpp students_t_single_sample.cpp].

We'll begin by defining a procedure to calculate intervals for
various confidence levels; the procedure will print these out
as a table:

   // Needed includes:
   #include <boost/math/distributions/students_t.hpp>
   #include <iostream>
   #include <iomanip>
   // Bring everything into global namespace for ease of use:
   using namespace boost::math;
   using namespace std;

   void confidence_limits_on_mean(
      double Sm,           // Sm = Sample Mean.
      double Sd,           // Sd = Sample Standard Deviation.
      unsigned Sn)         // Sn = Sample Size.
   {
      using namespace std;
      using namespace boost::math;

      // Print out general info:
      cout <<
         "__________________________________\n"
         "2-Sided Confidence Limits For Mean\n"
         "__________________________________\n\n";
      cout << setprecision(7);
      cout << setw(40) << left << "Number of Observations" << "=  " << Sn << "\n";
      cout << setw(40) << left << "Mean" << "=  " << Sm << "\n";
      cout << setw(40) << left << "Standard Deviation" << "=  " << Sd << "\n";

We'll define a table of significance/risk levels for which we'll compute intervals:

      double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };

Note that these are the complements of the confidence/probability levels
(0.5, 0.75, 0.9 ... 0.99999).

Next we'll declare the distribution object we'll need; note that
the /degrees of freedom/ parameter is the sample size less one:

      students_t dist(Sn - 1);

Most of what follows in the program is pretty printing, so let's focus
on the calculation of the interval.  First we need the t-statistic,
computed using the /quantile/ function and our significance level.  Note
that since the significance levels are the complement of the probability,
we have to wrap the arguments in a call to /complement(...)/:

   double T = quantile(complement(dist, alpha[i] / 2));

Note that alpha was divided by two, since we'll be calculating
both the upper and lower bounds: had we been interested in a single
sided interval then we would have omitted this step.

Now to complete the picture, we'll get the (one-sided) width of the
interval from the t-statistic
by multiplying by the standard deviation, and dividing by the square
root of the sample size:

   double w = T * Sd / sqrt(double(Sn));

The two-sided interval is then the sample mean plus and minus this width.

Apart from some more pretty-printing, that completes the procedure.

Let's take a look at some sample output, first using the
[@http://www.itl.nist.gov/div898/handbook/eda/section4/eda428.htm
Heat flow data] from the NIST site.  The data set was collected
by Bob Zarr of NIST in January 1990 from a heat flow meter
calibration and stability analysis.
The corresponding dataplot
output for this test can be found in
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm
section 3.5.2] of the __handbook.

[pre'''
   __________________________________
   2-Sided Confidence Limits For Mean
   __________________________________

   Number of Observations                  =  195
   Mean                                    =  9.26146
   Standard Deviation                      =  0.02278881


   ___________________________________________________________________
   Confidence       T           Interval          Lower          Upper
    Value (%)     Value          Width            Limit          Limit
   ___________________________________________________________________
       50.000     0.676       1.103e-003        9.26036        9.26256
       75.000     1.154       1.883e-003        9.25958        9.26334
       90.000     1.653       2.697e-003        9.25876        9.26416
       95.000     1.972       3.219e-003        9.25824        9.26468
       99.000     2.601       4.245e-003        9.25721        9.26571
       99.900     3.341       5.453e-003        9.25601        9.26691
       99.990     3.973       6.484e-003        9.25498        9.26794
       99.999     4.537       7.404e-003        9.25406        9.26886
''']

As you can see, the large sample size (195) and small standard deviation (0.023)
have combined to give very narrow intervals: indeed, we can be
very confident that the true mean is close to 9.26.

For comparison, the next example data output is taken from
['P. K. Hou, O. W. Lau & M. C. Wong, Analyst (1983) vol. 108, p 64.
and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55
J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.]
The values result from the determination of mercury by cold-vapour
atomic absorption.

[pre'''
   __________________________________
   2-Sided Confidence Limits For Mean
   __________________________________

   Number of Observations                  =  3
   Mean                                    =  37.8000000
   Standard Deviation                      =  0.9643650


   ___________________________________________________________________
   Confidence       T           Interval          Lower          Upper
    Value (%)     Value          Width            Limit          Limit
   ___________________________________________________________________
       50.000     0.816            0.455       37.34539       38.25461
       75.000     1.604            0.893       36.90717       38.69283
       90.000     2.920            1.626       36.17422       39.42578
       95.000     4.303            2.396       35.40438       40.19562
       99.000     9.925            5.526       32.27408       43.32592
       99.900    31.599           17.594       20.20639       55.39361
       99.990    99.992           55.673      -17.87346       93.47346
       99.999   316.225          176.067     -138.26683      213.86683
''']

This time the fact that there are only three measurements leads to
much wider intervals: indeed, the intervals are so wide that it's hard
to be very confident in the location of the mean.

[endsect]

[section:tut_mean_test Testing a sample mean for difference from a "true" mean]

When calibrating or comparing a scientific instrument or measurement method of some kind,
we want to answer the question "Does an observed sample mean differ from the
'true' mean in any significant way?".  If it does, then we have evidence of
a systematic difference.  This question can be answered with a Students-t test:
more information can be found
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm
on the NIST site].

Of course, the assignment of "true" to one mean may be quite arbitrary:
often it is simply a "traditional" method of measurement.

The following example code is taken from the example program
[@../../example/students_t_single_sample.cpp students_t_single_sample.cpp].

We'll begin by defining a procedure to determine which of the
possible hypotheses are rejected or not-rejected
at a given significance level:

[note
Non-statisticians might say 'not-rejected' means 'accepted'
(often of the null-hypothesis), implying, wrongly, that there really *IS* no difference;
statisticians eschew this usage to avoid implying that there is positive evidence of 'no difference'.
'Not-rejected' here means there is *no evidence* of difference, but there still might well be a difference.
For example, see [@http://en.wikipedia.org/wiki/Argument_from_ignorance argument from ignorance] and
[@http://www.bmj.com/cgi/content/full/311/7003/485 Absence of evidence does not constitute evidence of absence.]
] [/ note]

   // Needed includes:
   #include <boost/math/distributions/students_t.hpp>
   #include <iostream>
   #include <iomanip>
   // Bring everything into global namespace for ease of use:
   using namespace boost::math;
   using namespace std;

   void single_sample_t_test(double M, double Sm, double Sd, unsigned Sn, double alpha)
   {
      //
      // M = true mean.
      // Sm = Sample Mean.
      // Sd = Sample Standard Deviation.
      // Sn = Sample Size.
      // alpha = Significance Level.

Most of the procedure is pretty-printing, so let's just focus on the
calculation.  We begin by calculating the t-statistic:

   // Difference in means:
   double diff = Sm - M;
   // Degrees of freedom:
   unsigned v = Sn - 1;
   // t-statistic:
   double t_stat = diff * sqrt(double(Sn)) / Sd;

Finally calculate the probability from the t-statistic.  If we're interested
simply in whether there is a difference (either less or greater) or not,
we don't care about the sign of the t-statistic,
and we take the complement of the probability for comparison
to the significance level:

   students_t dist(v);
   double q = cdf(complement(dist, fabs(t_stat)));

The procedure then prints out the results of the various tests
that can be done; these
can be summarised in the following table:

[table
[[Hypothesis][Test]]
[[The Null-hypothesis: there is
*no difference* in means]
[Reject if complement of CDF for |t| < significance level / 2:

`cdf(complement(dist, fabs(t))) < alpha / 2`]]

[[The Alternative-hypothesis: there
*is a difference* in means]
[Reject if complement of CDF for |t| > significance level / 2:

`cdf(complement(dist, fabs(t))) > alpha / 2`]]

[[The Alternative-hypothesis: the sample mean *is less* than
the true mean.]
[Reject if CDF of t > 1 - significance level:

`cdf(complement(dist, t)) < alpha`]]

[[The Alternative-hypothesis: the sample mean *is greater* than
the true mean.]
[Reject if CDF of t < significance level:

`cdf(dist, t) < alpha`]]
]

[note
Notice that the comparisons are against `alpha / 2` for a two-sided test
and against `alpha` for a one-sided test.]

Now that we have all the parts in place, let's take a look at some
sample output, first using the
[@http://www.itl.nist.gov/div898/handbook/eda/section4/eda428.htm
Heat flow data] from the NIST site.  The data set was collected
by Bob Zarr of NIST in January 1990 from a heat flow meter
calibration and stability analysis.  The corresponding dataplot
output for this test can be found in
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm
section 3.5.2] of the __handbook.

[pre
__________________________________
Student t test for a single sample
__________________________________

Number of Observations                                 =  195
Sample Mean                                            =  9.26146
Sample Standard Deviation                              =  0.02279
Expected True Mean                                     =  5.00000

Sample Mean - Expected Test Mean                       =  4.26146
Degrees of Freedom                                     =  194
T Statistic                                            =  2611.28380
Probability that difference is due to chance           =  0.000e+000

Results for Alternative Hypothesis and alpha           =  0.0500

Alternative Hypothesis     Conclusion
Mean != 5.000            NOT REJECTED
Mean  < 5.000            REJECTED
Mean  > 5.000            NOT REJECTED
]

You will note the line that says the probability that the difference is
due to chance is zero.  From a philosophical point of view, of course,
the probability can never reach zero.  However, in this case the calculated
probability is smaller than the smallest representable double precision number,
hence the appearance of a zero here.  Whatever its "true" value is, we know it
must be extraordinarily small, so the alternative hypothesis - that there is
a difference in means - is not rejected.

For comparison, the next example data output is taken from
['P. K. Hou, O. W. Lau & M. C. Wong, Analyst (1983) vol. 108, p 64.
and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55
J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.]
The values result from the determination of mercury by cold-vapour
atomic absorption.

[pre
__________________________________
Student t test for a single sample
__________________________________

Number of Observations                                 =  3
Sample Mean                                            =  37.80000
Sample Standard Deviation                              =  0.96437
Expected True Mean                                     =  38.90000

Sample Mean - Expected Test Mean                       =  -1.10000
Degrees of Freedom                                     =  2
T Statistic                                            =  -1.97566
Probability that difference is due to chance           =  1.869e-001

Results for Alternative Hypothesis and alpha           =  0.0500

Alternative Hypothesis     Conclusion
Mean != 38.900            REJECTED
Mean  < 38.900            NOT REJECTED
Mean  > 38.900            NOT REJECTED
]

As you can see, the small number of measurements (3) has led to a large uncertainty
in the location of the true mean.  So even though there appears to be a difference
between the sample mean and the expected true mean, we conclude that there
is no significant difference, and are unable to reject the null hypothesis.
However, if we lower the bar for acceptance to alpha = 0.1
(a 90% confidence level), we see a different output:

[pre
__________________________________
Student t test for a single sample
__________________________________

Number of Observations                                 =  3
Sample Mean                                            =  37.80000
Sample Standard Deviation                              =  0.96437
Expected True Mean                                     =  38.90000

Sample Mean - Expected Test Mean                       =  -1.10000
Degrees of Freedom                                     =  2
T Statistic                                            =  -1.97566
Probability that difference is due to chance           =  1.869e-001

Results for Alternative Hypothesis and alpha           =  0.1000

Alternative Hypothesis     Conclusion
Mean != 38.900            REJECTED
Mean  < 38.900            NOT REJECTED
Mean  > 38.900            REJECTED
]

In this case we really have a borderline result,
and more data (and/or more accurate data)
is needed for a more convincing conclusion.

[endsect]

[section:tut_mean_size Estimating how large a sample size would have to become
in order to give a significant Students-t test result with a single sample test]

Imagine you have conducted a Students-t test on a single sample in order
to check for systematic errors in your measurements.  Imagine that the
result is borderline.  At this point one might go off and collect more data,
but it might be prudent to first ask the question "How much more?".
The parameter estimators of the students_t_distribution class
can provide this information.

This section is based on the example code in
[@../../example/students_t_single_sample.cpp students_t_single_sample.cpp]
and we begin by defining a procedure that will print out a table of
estimated sample sizes for various confidence levels:

   // Needed includes:
   #include <boost/math/distributions/students_t.hpp>
   #include <iostream>
   #include <iomanip>
   // Bring everything into global namespace for ease of use:
   using namespace boost::math;
   using namespace std;

   void single_sample_find_df(
      double M,          // M = true mean.
      double Sm,         // Sm = Sample Mean.
      double Sd)         // Sd = Sample Standard Deviation.
   {

Next we define a table of significance levels:

      double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };

Printing out the table of sample sizes required for various confidence levels
begins with the table header:

      cout << "\n\n"
              "_______________________________________________________________\n"
              "Confidence       Estimated          Estimated\n"
              " Value (%)      Sample Size        Sample Size\n"
              "              (one sided test)    (two sided test)\n"
              "_______________________________________________________________\n";

And now the important part: the sample sizes required.  Class
`students_t_distribution` has a static member function
`find_degrees_of_freedom` that will calculate how large
a sample size needs to be in order to give a definitive result.

The first argument is the difference between the means that you
wish to be able to detect: here it's the absolute value of the
difference between the sample mean and the true mean.

Then come two probability values: alpha and beta.  Alpha is the
maximum acceptable risk of rejecting the null-hypothesis when it is
in fact true.  Beta is the maximum acceptable risk of failing to reject
the null-hypothesis when in fact it is false.
Also note that for a two-sided test, alpha must be divided by 2.

The final parameter of the function is the standard deviation of the sample.

In this example, we assume that alpha and beta are the same, and call
`find_degrees_of_freedom` twice: once with alpha for a one-sided test,
and once with alpha/2 for a two-sided test.

      for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
      {
         // Confidence value:
         cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]);
         // calculate df for single sided test:
         double df = students_t::find_degrees_of_freedom(
            fabs(M - Sm), alpha[i], alpha[i], Sd);
         // convert to sample size:
         double size = ceil(df) + 1;
         // Print size:
         cout << fixed << setprecision(0) << setw(16) << right << size;
         // calculate df for two sided test:
         df = students_t::find_degrees_of_freedom(
            fabs(M - Sm), alpha[i]/2, alpha[i], Sd);
         // convert to sample size:
         size = ceil(df) + 1;
         // Print size:
         cout << fixed << setprecision(0) << setw(16) << right << size << endl;
      }
      cout << endl;
   }

Let's now look at some sample output using data taken from
['P. K. Hou, O. W. Lau & M. C. Wong, Analyst (1983) vol. 108, p 64.
and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55
J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.]
The values result from the determination of mercury by cold-vapour
atomic absorption.

Only three measurements were made, and the Students-t test above
gave a borderline result, so this example
will show us how many samples would need to be collected:

[pre'''
_____________________________________________________________
Estimated sample sizes required for various confidence levels
_____________________________________________________________

True Mean                               =  38.90000
Sample Mean                             =  37.80000
Sample Standard Deviation               =  0.96437


_______________________________________________________________
Confidence       Estimated          Estimated
 Value (%)      Sample Size        Sample Size
              (one sided test)    (two sided test)
_______________________________________________________________
    75.000               3               4
    90.000               7               9
    95.000              11              13
    99.000              20              22
    99.900              35              37
    99.990              50              53
    99.999              66              68
''']

So in this case many more measurements would have had to be made:
for example at the 95% level, 13 measurements in total for a two-sided test.

[endsect]
[section:two_sample_students_t Comparing the means of two samples with the Students-t test]

Imagine that we have two samples, and we wish to determine whether
their means are different or not.  This situation often arises when
determining whether a new process or treatment is better than an old one.

In this example, we'll be using the
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda3531.htm
Car Mileage sample data] from the
[@http://www.itl.nist.gov NIST website].  The data compares
miles per gallon of US cars with miles per gallon of Japanese cars.

The sample code is in
[@../../example/students_t_two_samples.cpp students_t_two_samples.cpp].

There are two ways in which this test can be conducted: we can either assume
that the true standard deviations of the two samples are equal, or not.
If the standard deviations are assumed to be equal, then the calculation
of the t-statistic is greatly simplified, so we'll examine that case first.
In real life we should verify whether this assumption is valid, for
example with an F-test for the equality of the two variances.

We begin by defining a procedure that will conduct our test assuming equal
variances:

   // Needed headers:
   #include <boost/math/distributions/students_t.hpp>
   #include <iostream>
   #include <iomanip>
   // Simplify usage:
   using namespace boost::math;
   using namespace std;

   void two_samples_t_test_equal_sd(
           double Sm1,       // Sm1 = Sample 1 Mean.
           double Sd1,       // Sd1 = Sample 1 Standard Deviation.
           unsigned Sn1,     // Sn1 = Sample 1 Size.
           double Sm2,       // Sm2 = Sample 2 Mean.
           double Sd2,       // Sd2 = Sample 2 Standard Deviation.
           unsigned Sn2,     // Sn2 = Sample 2 Size.
           double alpha)     // alpha = Significance Level.
   {

Our procedure will begin by calculating the t-statistic; assuming
equal variances, the required formulae are:

[equation dist_tutorial1]

where Sp is the "pooled" standard deviation of the two samples,
and /v/ is the number of degrees of freedom of the two combined
samples.  We can now write the code to calculate the t-statistic:

   // Degrees of freedom:
   double v = Sn1 + Sn2 - 2;
   cout << setw(55) << left << "Degrees of Freedom" << "=  " << v << "\n";
   // Pooled standard deviation:
   double sp = sqrt(((Sn1-1) * Sd1 * Sd1 + (Sn2-1) * Sd2 * Sd2) / v);
   cout << setw(55) << left << "Pooled Standard Deviation" << "=  " << sp << "\n";
   // t-statistic:
   double t_stat = (Sm1 - Sm2) / (sp * sqrt(1.0 / Sn1 + 1.0 / Sn2));
   cout << setw(55) << left << "T Statistic" << "=  " << t_stat << "\n";

The next step is to define our distribution object, and calculate the
complement of the probability:

   students_t dist(v);
   double q = cdf(complement(dist, fabs(t_stat)));
   cout << setw(55) << left << "Probability that difference is due to chance" << "=  "
      << setprecision(3) << scientific << 2 * q << "\n\n";

Here we've used the absolute value of the t-statistic, because we initially
want to know simply whether there is a difference or not (a two-sided test).
However, we can also test whether the mean of the second sample is greater
or is less (one-sided test) than that of the first:
all the possible tests are summed up in the following table:

[table
[[Hypothesis][Test]]
[[The Null-hypothesis: there is
*no difference* in means]
[Reject if complement of CDF for |t| < significance level / 2:

`cdf(complement(dist, fabs(t))) < alpha / 2`]]

[[The Alternative-hypothesis: there is a
*difference* in means]
[Reject if complement of CDF for |t| > significance level / 2:

`cdf(complement(dist, fabs(t))) > alpha / 2`]]

[[The Alternative-hypothesis: Sample 1 Mean is *less* than
Sample 2 Mean.]
[Reject if CDF of t > significance level:

`cdf(dist, t) > alpha`]]

[[The Alternative-hypothesis: Sample 1 Mean is *greater* than
Sample 2 Mean.]

[Reject if complement of CDF of t > significance level:

`cdf(complement(dist, t)) > alpha`]]
]

[note
For a two-sided test we must compare against alpha / 2 and not alpha.]

Most of the rest of the sample program is pretty-printing, so we'll
skip over that, and take a look at the sample output for alpha=0.05
(a 95% confidence level).  For comparison the dataplot output
for the same data is in
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm
section 1.3.5.3] of the __handbook.

[pre'''
   ________________________________________________
   Student t test for two samples (equal variances)
   ________________________________________________

   Number of Observations (Sample 1)                      =  249
   Sample 1 Mean                                          =  20.145
   Sample 1 Standard Deviation                            =  6.4147
   Number of Observations (Sample 2)                      =  79
   Sample 2 Mean                                          =  30.481
   Sample 2 Standard Deviation                            =  6.1077
   Degrees of Freedom                                     =  326
   Pooled Standard Deviation                              =  6.3426
   T Statistic                                            =  -12.621
   Probability that difference is due to chance           =  5.273e-030

   Results for Alternative Hypothesis and alpha           =  0.0500'''

   Alternative Hypothesis              Conclusion
   Sample 1 Mean != Sample 2 Mean       NOT REJECTED
   Sample 1 Mean <  Sample 2 Mean       NOT REJECTED
   Sample 1 Mean >  Sample 2 Mean       REJECTED
]

So with a probability that the difference is due to chance of just
5.273e-030, we can safely conclude that there is indeed a difference.

The tests on the alternative hypothesis show that we must
also reject the hypothesis that Sample 1 Mean is
greater than that for Sample 2: in this case Sample 1 represents the
miles per gallon for Japanese cars, and Sample 2 the miles per gallon for
US cars, so we conclude that Japanese cars are on average more
fuel efficient.

Now that we have the simple case out of the way, let's look for a moment
at the more complex one: that the standard deviations of the two samples
are not equal.  In this case the formula for the t-statistic becomes:

[equation dist_tutorial2]

And for the combined degrees of freedom we use the
[@http://en.wikipedia.org/wiki/Welch-Satterthwaite_equation Welch-Satterthwaite]
approximation:

[equation dist_tutorial3]

Note that this is one of the rare situations where the degrees-of-freedom
parameter to the Student's t distribution is a real number, and not an
integer value.

[note
Some statistical packages truncate the effective degrees of freedom to
an integer value: this may be necessary if you are relying on lookup tables,
but since our code fully supports non-integer degrees of freedom there is no
need to truncate in this case.  Also note that when the degrees of freedom
is small then the Welch-Satterthwaite approximation may be a significant
source of error.]

Putting these formulae into code we get:

   // Degrees of freedom:
   double v = Sd1 * Sd1 / Sn1 + Sd2 * Sd2 / Sn2;
   v *= v;
   double t1 = Sd1 * Sd1 / Sn1;
   t1 *= t1;
   t1 /= (Sn1 - 1);
   double t2 = Sd2 * Sd2 / Sn2;
   t2 *= t2;
   t2 /= (Sn2 - 1);
   v /= (t1 + t2);
   cout << setw(55) << left << "Degrees of Freedom" << "=  " << v << "\n";
   // t-statistic:
   double t_stat = (Sm1 - Sm2) / sqrt(Sd1 * Sd1 / Sn1 + Sd2 * Sd2 / Sn2);
   cout << setw(55) << left << "T Statistic" << "=  " << t_stat << "\n";

Thereafter the code and the tests proceed the same as before.  Using
our car mileage data again, here's what the output looks like:

[pre'''
   __________________________________________________
   Student t test for two samples (unequal variances)
   __________________________________________________

   Number of Observations (Sample 1)                      =  249
   Sample 1 Mean                                          =  20.145
   Sample 1 Standard Deviation                            =  6.4147
   Number of Observations (Sample 2)                      =  79
   Sample 2 Mean                                          =  30.481
   Sample 2 Standard Deviation                            =  6.1077
   Degrees of Freedom                                     =  136.87
   T Statistic                                            =  -12.946
   Probability that difference is due to chance           =  1.571e-025

   Results for Alternative Hypothesis and alpha           =  0.0500'''

   Alternative Hypothesis              Conclusion
   Sample 1 Mean != Sample 2 Mean       NOT REJECTED
   Sample 1 Mean <  Sample 2 Mean       NOT REJECTED
   Sample 1 Mean >  Sample 2 Mean       REJECTED
]

This time allowing the variances in the two samples to differ has yielded
a higher likelihood that the observed difference is down to chance alone
(1.571e-025 compared to 5.273e-030 when equal variances were assumed).
However, the conclusion remains the same: US cars are less fuel efficient
than Japanese models.

[endsect]
[section:paired_st Comparing two paired samples with the Student's t distribution]

Imagine that we have a before and after reading for each item in the sample:
for example we might have measured blood pressure before and after administration
of a new drug.  We can't pool the results and compare the means before and after
the change, because each patient will have a different baseline reading.
Instead we calculate the difference between before and after measurements
in each patient, and calculate the mean and standard deviation of the differences.
To test whether a significant change has taken place, we can then test
the null-hypothesis that the true mean is zero using the same procedure
we used in the single sample cases previously discussed.

That means we can:

* [link math_toolkit.stat_tut.weg.st_eg.tut_mean_intervals Calculate confidence intervals of the mean].
If the endpoints of the interval differ in sign then we are unable to reject
the null-hypothesis that there is no change.
* [link math_toolkit.stat_tut.weg.st_eg.tut_mean_test Test whether the true mean is zero]. If the
result is consistent with a true mean of zero, then we are unable to reject the
null-hypothesis that there is no change.
* [link math_toolkit.stat_tut.weg.st_eg.tut_mean_size Calculate how many pairs of readings we would need
in order to obtain a significant result].

[endsect]

[endsect][/section:st_eg Student's t]

[/
  Copyright 2006, 2012 John Maddock and Paul A. Bristow.
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]