mirror of
				https://github.com/saitohirga/WSJT-X.git
				synced 2025-11-04 05:50:31 -05:00 
			
		
		
		
	
		
			
	
	
		
			782 lines
		
	
	
		
			32 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
		
		
			
		
	
	
			782 lines
		
	
	
		
			32 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[section:st_eg Student's t Distribution Examples]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[section:tut_mean_intervals Calculating confidence intervals on the mean with the Students-t distribution]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Let's say you have a sample mean, you may wish to know what confidence intervals
							 | 
						||
| 
								 | 
							
								you can place on that mean.  Colloquially: "I want an interval that I can be
							 | 
						||
| 
								 | 
							
								P% sure contains the true mean".  (On a technical point, note that
							 | 
						||
| 
								 | 
							
								the interval either contains the true mean or it does not: the
							 | 
						||
| 
								 | 
							
								meaning of the confidence level is subtly
							 | 
						||
| 
								 | 
							
								different from this colloquialism.  More background information can be found on the
							 | 
						||
| 
								 | 
							
								[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm NIST site]).
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								The formula for the interval can be expressed as:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[equation dist_tutorial4]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Where, ['Y[sub s]] is the sample mean, /s/ is the sample standard deviation,
							 | 
						||
| 
								 | 
							
								/N/ is the sample size, /[alpha]/ is the desired significance level and
							 | 
						||
| 
								 | 
							
								['t[sub ([alpha]/2,N-1)]] is the upper critical value of the Students-t
							 | 
						||
| 
								 | 
							
								distribution with /N-1/ degrees of freedom.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[note
							 | 
						||
| 
								 | 
							
								The quantity [alpha][space] is the maximum acceptable risk of falsely rejecting
							 | 
						||
| 
								 | 
							
								the null-hypothesis.  The smaller the value of [alpha] the greater the
							 | 
						||
| 
								 | 
							
								strength of the test.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								The confidence level of the test is defined as 1 - [alpha], and often expressed
							 | 
						||
| 
								 | 
							
								as a percentage.  So for example a significance level of 0.05, is equivalent
							 | 
						||
| 
								 | 
							
								to a 95% confidence level.  Refer to
							 | 
						||
| 
								 | 
							
								[@http://www.itl.nist.gov/div898/handbook/prc/section1/prc14.htm
							 | 
						||
| 
								 | 
							
								"What are confidence intervals?"] in __handbook for more information.
							 | 
						||
| 
								 | 
							
								] [/ Note]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[note
							 | 
						||
| 
								 | 
							
								The usual assumptions of
							 | 
						||
| 
								 | 
							
								[@http://en.wikipedia.org/wiki/Independent_and_identically-distributed_random_variables independent and identically distributed (i.i.d.)]
							 | 
						||
| 
								 | 
							
								variables and [@http://en.wikipedia.org/wiki/Normal_distribution normal distribution]
							 | 
						||
| 
								 | 
							
								of course apply here, as they do in other examples.
							 | 
						||
| 
								 | 
							
								]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								From the formula, it should be clear that:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								* The width of the confidence interval decreases as the sample size increases.
							 | 
						||
| 
								 | 
							
								* The width increases as the standard deviation increases.
							 | 
						||
| 
								 | 
							
								* The width increases as the ['confidence level increases] (0.5 towards 0.99999 - stronger).
							 | 
						||
| 
								 | 
							
								* The width increases as the ['significance level decreases] (0.5 towards 0.00000...01 - stronger).
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								The following example code is taken from the example program
							 | 
						||
| 
								 | 
							
								[@../../example/students_t_single_sample.cpp students_t_single_sample.cpp].
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								We'll begin by defining a procedure to calculate intervals for
							 | 
						||
| 
								 | 
							
								various confidence levels; the procedure will print these out
							 | 
						||
| 
								 | 
							
								as a table:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   // Needed includes:
							 | 
						||
| 
								 | 
							
								   #include <boost/math/distributions/students_t.hpp>
							 | 
						||
| 
								 | 
							
								   #include <iostream>
							 | 
						||
| 
								 | 
							
								   #include <iomanip>
							 | 
						||
| 
								 | 
							
								   // Bring everything into global namespace for ease of use:
							 | 
						||
| 
								 | 
							
								   using namespace boost::math;
							 | 
						||
| 
								 | 
							
								   using namespace std;
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   void confidence_limits_on_mean(
							 | 
						||
| 
								 | 
							
								      double Sm,           // Sm = Sample Mean.
							 | 
						||
| 
								 | 
							
								      double Sd,           // Sd = Sample Standard Deviation.
							 | 
						||
| 
								 | 
							
								      unsigned Sn)         // Sn = Sample Size.
							 | 
						||
| 
								 | 
							
								   {
							 | 
						||
| 
								 | 
							
								      using namespace std;
							 | 
						||
| 
								 | 
							
								      using namespace boost::math;
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								      // Print out general info:
							 | 
						||
| 
								 | 
							
								      cout <<
							 | 
						||
| 
								 | 
							
								         "__________________________________\n"
							 | 
						||
| 
								 | 
							
								         "2-Sided Confidence Limits For Mean\n"
							 | 
						||
| 
								 | 
							
								         "__________________________________\n\n";
							 | 
						||
| 
								 | 
							
								      cout << setprecision(7);
							 | 
						||
| 
								 | 
							
								      cout << setw(40) << left << "Number of Observations" << "=  " << Sn << "\n";
							 | 
						||
| 
								 | 
							
								      cout << setw(40) << left << "Mean" << "=  " << Sm << "\n";
							 | 
						||
| 
								 | 
							
								      cout << setw(40) << left << "Standard Deviation" << "=  " << Sd << "\n";
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								We'll define a table of significance/risk levels for which we'll compute intervals:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								      double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Note that these are the complements of the confidence/probability levels: 0.5, 0.75, 0.9 .. 0.99999).
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Next we'll declare the distribution object we'll need, note that
							 | 
						||
| 
								 | 
							
								the /degrees of freedom/ parameter is the sample size less one:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								      students_t dist(Sn - 1);
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Most of what follows in the program is pretty printing, so let's focus
							 | 
						||
| 
								 | 
							
								on the calculation of the interval. First we need the t-statistic,
							 | 
						||
| 
								 | 
							
								computed using the /quantile/ function and our significance level.  Note
							 | 
						||
| 
								 | 
							
								that since the significance levels are the complement of the probability,
							 | 
						||
| 
								 | 
							
								we have to wrap the arguments in a call to /complement(...)/:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   double T = quantile(complement(dist, alpha[i] / 2));
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Note that alpha was divided by two, since we'll be calculating
							 | 
						||
| 
								 | 
							
								both the upper and lower bounds: had we been interested in a single
							 | 
						||
| 
								 | 
							
								sided interval then we would have omitted this step.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Now to complete the picture, we'll get the (one-sided) width of the
							 | 
						||
| 
								 | 
							
								interval from the t-statistic
							 | 
						||
| 
								 | 
							
								by multiplying by the standard deviation, and dividing by the square
							 | 
						||
| 
								 | 
							
								root of the sample size:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   double w = T * Sd / sqrt(double(Sn));
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								The two-sided interval is then the sample mean plus and minus this width.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								And apart from some more pretty-printing that completes the procedure.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Let's take a look at some sample output, first using the
							 | 
						||
| 
								 | 
							
								[@http://www.itl.nist.gov/div898/handbook/eda/section4/eda428.htm
							 | 
						||
| 
								 | 
							
								Heat flow data] from the NIST site.  The data set was collected
							 | 
						||
| 
								 | 
							
								by Bob Zarr of NIST in January, 1990 from a heat flow meter
							 | 
						||
| 
								 | 
							
								calibration and stability analysis.
							 | 
						||
| 
								 | 
							
								The corresponding dataplot
							 | 
						||
| 
								 | 
							
								output for this test can be found in
							 | 
						||
| 
								 | 
							
								[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm
							 | 
						||
| 
								 | 
							
								section 3.5.2] of the __handbook.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[pre'''
							 | 
						||
| 
								 | 
							
								   __________________________________
							 | 
						||
| 
								 | 
							
								   2-Sided Confidence Limits For Mean
							 | 
						||
| 
								 | 
							
								   __________________________________
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   Number of Observations                  =  195
							 | 
						||
| 
								 | 
							
								   Mean                                    =  9.26146
							 | 
						||
| 
								 | 
							
								   Standard Deviation                      =  0.02278881
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   ___________________________________________________________________
							 | 
						||
| 
								 | 
							
								   Confidence       T           Interval          Lower          Upper
							 | 
						||
| 
								 | 
							
								    Value (%)     Value          Width            Limit          Limit
							 | 
						||
| 
								 | 
							
								   ___________________________________________________________________
							 | 
						||
| 
								 | 
							
								       50.000     0.676       1.103e-003        9.26036        9.26256
							 | 
						||
| 
								 | 
							
								       75.000     1.154       1.883e-003        9.25958        9.26334
							 | 
						||
| 
								 | 
							
								       90.000     1.653       2.697e-003        9.25876        9.26416
							 | 
						||
| 
								 | 
							
								       95.000     1.972       3.219e-003        9.25824        9.26468
							 | 
						||
| 
								 | 
							
								       99.000     2.601       4.245e-003        9.25721        9.26571
							 | 
						||
| 
								 | 
							
								       99.900     3.341       5.453e-003        9.25601        9.26691
							 | 
						||
| 
								 | 
							
								       99.990     3.973       6.484e-003        9.25498        9.26794
							 | 
						||
| 
								 | 
							
								       99.999     4.537       7.404e-003        9.25406        9.26886
							 | 
						||
| 
								 | 
							
								''']
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								As you can see the large sample size (195) and small standard deviation (0.023)
							 | 
						||
| 
								 | 
							
								have combined to give very small intervals, indeed we can be
							 | 
						||
| 
								 | 
							
								very confident that the true mean is 9.2.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								For comparison the next example data output is taken from
							 | 
						||
| 
								 | 
							
								['P.K.Hou, O. W. Lau & M.C. Wong, Analyst (1983) vol. 108, p 64.
							 | 
						||
| 
								 | 
							
								and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55
							 | 
						||
| 
								 | 
							
								J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.]
							 | 
						||
| 
								 | 
							
								The values result from the determination of mercury by cold-vapour
							 | 
						||
| 
								 | 
							
								atomic absorption.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[pre'''
							 | 
						||
| 
								 | 
							
								   __________________________________
							 | 
						||
| 
								 | 
							
								   2-Sided Confidence Limits For Mean
							 | 
						||
| 
								 | 
							
								   __________________________________
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   Number of Observations                  =  3
							 | 
						||
| 
								 | 
							
								   Mean                                    =  37.8000000
							 | 
						||
| 
								 | 
							
								   Standard Deviation                      =  0.9643650
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   ___________________________________________________________________
							 | 
						||
| 
								 | 
							
								   Confidence       T           Interval          Lower          Upper
							 | 
						||
| 
								 | 
							
								    Value (%)     Value          Width            Limit          Limit
							 | 
						||
| 
								 | 
							
								   ___________________________________________________________________
							 | 
						||
| 
								 | 
							
								       50.000     0.816            0.455       37.34539       38.25461
							 | 
						||
| 
								 | 
							
								       75.000     1.604            0.893       36.90717       38.69283
							 | 
						||
| 
								 | 
							
								       90.000     2.920            1.626       36.17422       39.42578
							 | 
						||
| 
								 | 
							
								       95.000     4.303            2.396       35.40438       40.19562
							 | 
						||
| 
								 | 
							
								       99.000     9.925            5.526       32.27408       43.32592
							 | 
						||
| 
								 | 
							
								       99.900    31.599           17.594       20.20639       55.39361
							 | 
						||
| 
								 | 
							
								       99.990    99.992           55.673      -17.87346       93.47346
							 | 
						||
| 
								 | 
							
								       99.999   316.225          176.067     -138.26683      213.86683
							 | 
						||
| 
								 | 
							
								''']
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								This time the fact that there are only three measurements leads to
							 | 
						||
| 
								 | 
							
								much wider intervals, indeed such large intervals that it's hard
							 | 
						||
| 
								 | 
							
								to be very confident in the location of the mean.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[endsect]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[section:tut_mean_test Testing a sample mean for difference from a "true" mean]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								When calibrating or comparing a scientific instrument or measurement method of some kind,
							 | 
						||
| 
								 | 
							
								we want to be answer the question "Does an observed sample mean differ from the
							 | 
						||
| 
								 | 
							
								"true" mean in any significant way?".  If it does, then we have evidence of
							 | 
						||
| 
								 | 
							
								a systematic difference.  This question can be answered with a Students-t test:
							 | 
						||
| 
								 | 
							
								more information can be found
							 | 
						||
| 
								 | 
							
								[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm
							 | 
						||
| 
								 | 
							
								on the NIST site].
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Of course, the assignment of "true" to one mean may be quite arbitrary,
							 | 
						||
| 
								 | 
							
								often this is simply a "traditional" method of measurement.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								The following example code is taken from the example program
							 | 
						||
| 
								 | 
							
								[@../../example/students_t_single_sample.cpp students_t_single_sample.cpp].
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								We'll begin by defining a procedure to determine which of the
							 | 
						||
| 
								 | 
							
								possible hypothesis are rejected or not-rejected
							 | 
						||
| 
								 | 
							
								at a given significance level:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[note
							 | 
						||
| 
								 | 
							
								Non-statisticians might say 'not-rejected' means 'accepted',
							 | 
						||
| 
								 | 
							
								(often of the null-hypothesis) implying, wrongly, that there really *IS* no difference,
							 | 
						||
| 
								 | 
							
								but statisticans eschew this to avoid implying that there is positive evidence of 'no difference'.
							 | 
						||
| 
								 | 
							
								'Not-rejected' here means there is *no evidence* of difference, but there still might well be a difference.
							 | 
						||
| 
								 | 
							
								For example, see [@http://en.wikipedia.org/wiki/Argument_from_ignorance argument from ignorance] and
							 | 
						||
| 
								 | 
							
								[@http://www.bmj.com/cgi/content/full/311/7003/485 Absence of evidence does not constitute evidence of absence.]
							 | 
						||
| 
								 | 
							
								] [/ note]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   // Needed includes:
							 | 
						||
| 
								 | 
							
								   #include <boost/math/distributions/students_t.hpp>
							 | 
						||
| 
								 | 
							
								   #include <iostream>
							 | 
						||
| 
								 | 
							
								   #include <iomanip>
							 | 
						||
| 
								 | 
							
								   // Bring everything into global namespace for ease of use:
							 | 
						||
| 
								 | 
							
								   using namespace boost::math;
							 | 
						||
| 
								 | 
							
								   using namespace std;
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   void single_sample_t_test(double M, double Sm, double Sd, unsigned Sn, double alpha)
							 | 
						||
| 
								 | 
							
								   {
							 | 
						||
| 
								 | 
							
								      //
							 | 
						||
| 
								 | 
							
								      // M = true mean.
							 | 
						||
| 
								 | 
							
								      // Sm = Sample Mean.
							 | 
						||
| 
								 | 
							
								      // Sd = Sample Standard Deviation.
							 | 
						||
| 
								 | 
							
								      // Sn = Sample Size.
							 | 
						||
| 
								 | 
							
								      // alpha = Significance Level.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Most of the procedure is pretty-printing, so let's just focus on the
							 | 
						||
| 
								 | 
							
								calculation, we begin by calculating the t-statistic:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   // Difference in means:
							 | 
						||
| 
								 | 
							
								   double diff = Sm - M;
							 | 
						||
| 
								 | 
							
								   // Degrees of freedom:
							 | 
						||
| 
								 | 
							
								   unsigned v = Sn - 1;
							 | 
						||
| 
								 | 
							
								   // t-statistic:
							 | 
						||
| 
								 | 
							
								   double t_stat = diff * sqrt(double(Sn)) / Sd;
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Finally calculate the probability from the t-statistic. If we're interested
							 | 
						||
| 
								 | 
							
								in simply whether there is a difference (either less or greater) or not,
							 | 
						||
| 
								 | 
							
								we don't care about the sign of the t-statistic,
							 | 
						||
| 
								 | 
							
								and we take the complement of the probability for comparison
							 | 
						||
| 
								 | 
							
								to the significance level:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   students_t dist(v);
							 | 
						||
| 
								 | 
							
								   double q = cdf(complement(dist, fabs(t_stat)));
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								The procedure then prints out the results of the various tests
							 | 
						||
| 
								 | 
							
								that can be done, these
							 | 
						||
| 
								 | 
							
								can be summarised in the following table:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[table
							 | 
						||
| 
								 | 
							
								[[Hypothesis][Test]]
							 | 
						||
| 
								 | 
							
								[[The Null-hypothesis: there is
							 | 
						||
| 
								 | 
							
								*no difference* in means]
							 | 
						||
| 
								 | 
							
								[Reject if complement of CDF for |t| < significance level / 2:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								`cdf(complement(dist, fabs(t))) < alpha / 2`]]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[[The Alternative-hypothesis: there
							 | 
						||
| 
								 | 
							
								*is difference* in means]
							 | 
						||
| 
								 | 
							
								[Reject if complement of CDF for |t| > significance level / 2:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								`cdf(complement(dist, fabs(t))) > alpha / 2`]]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[[The Alternative-hypothesis: the sample mean *is less* than
							 | 
						||
| 
								 | 
							
								the true mean.]
							 | 
						||
| 
								 | 
							
								[Reject if CDF of t > 1 - significance level:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								`cdf(complement(dist, t)) < alpha`]]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[[The Alternative-hypothesis: the sample mean *is greater* than
							 | 
						||
| 
								 | 
							
								the true mean.]
							 | 
						||
| 
								 | 
							
								[Reject if complement of CDF of t < significance level:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								`cdf(dist, t) < alpha`]]
							 | 
						||
| 
								 | 
							
								]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[note
							 | 
						||
| 
								 | 
							
								Notice that the comparisons are against `alpha / 2` for a two-sided test
							 | 
						||
| 
								 | 
							
								and against `alpha` for a one-sided test]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Now that we have all the parts in place, let's take a look at some
							 | 
						||
| 
								 | 
							
								sample output, first using the
							 | 
						||
| 
								 | 
							
								[@http://www.itl.nist.gov/div898/handbook/eda/section4/eda428.htm
							 | 
						||
| 
								 | 
							
								Heat flow data] from the NIST site.  The data set was collected
							 | 
						||
| 
								 | 
							
								by Bob Zarr of NIST in January, 1990 from a heat flow meter
							 | 
						||
| 
								 | 
							
								calibration and stability analysis.  The corresponding dataplot
							 | 
						||
| 
								 | 
							
								output for this test can be found in
							 | 
						||
| 
								 | 
							
								[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm
							 | 
						||
| 
								 | 
							
								section 3.5.2] of the __handbook.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[pre
							 | 
						||
| 
								 | 
							
								__________________________________
							 | 
						||
| 
								 | 
							
								Student t test for a single sample
							 | 
						||
| 
								 | 
							
								__________________________________
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Number of Observations                                 =  195
							 | 
						||
| 
								 | 
							
								Sample Mean                                            =  9.26146
							 | 
						||
| 
								 | 
							
								Sample Standard Deviation                              =  0.02279
							 | 
						||
| 
								 | 
							
								Expected True Mean                                     =  5.00000
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Sample Mean - Expected Test Mean                       =  4.26146
							 | 
						||
| 
								 | 
							
								Degrees of Freedom                                     =  194
							 | 
						||
| 
								 | 
							
								T Statistic                                            =  2611.28380
							 | 
						||
| 
								 | 
							
								Probability that difference is due to chance           =  0.000e+000
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Results for Alternative Hypothesis and alpha           =  0.0500
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Alternative Hypothesis     Conclusion
							 | 
						||
| 
								 | 
							
								Mean != 5.000            NOT REJECTED
							 | 
						||
| 
								 | 
							
								Mean  < 5.000            REJECTED
							 | 
						||
| 
								 | 
							
								Mean  > 5.000            NOT REJECTED
							 | 
						||
| 
								 | 
							
								]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								You will note the line that says the probability that the difference is
							 | 
						||
| 
								 | 
							
								due to chance is zero.  From a philosophical point of view, of course,
							 | 
						||
| 
								 | 
							
								the probability can never reach zero.  However, in this case the calculated
							 | 
						||
| 
								 | 
							
								probability is smaller than the smallest representable double precision number,
							 | 
						||
| 
								 | 
							
								hence the appearance of a zero here.  Whatever its "true" value is, we know it
							 | 
						||
| 
								 | 
							
								must be extraordinarily small, so the alternative hypothesis - that there is
							 | 
						||
| 
								 | 
							
								a difference in means - is not rejected.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								For comparison the next example data output is taken from
							 | 
						||
| 
								 | 
							
								['P.K.Hou, O. W. Lau & M.C. Wong, Analyst (1983) vol. 108, p 64.
							 | 
						||
| 
								 | 
							
								and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55
							 | 
						||
| 
								 | 
							
								J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.]
							 | 
						||
| 
								 | 
							
								The values result from the determination of mercury by cold-vapour
							 | 
						||
| 
								 | 
							
								atomic absorption.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[pre
							 | 
						||
| 
								 | 
							
								__________________________________
							 | 
						||
| 
								 | 
							
								Student t test for a single sample
							 | 
						||
| 
								 | 
							
								__________________________________
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Number of Observations                                 =  3
							 | 
						||
| 
								 | 
							
								Sample Mean                                            =  37.80000
							 | 
						||
| 
								 | 
							
								Sample Standard Deviation                              =  0.96437
							 | 
						||
| 
								 | 
							
								Expected True Mean                                     =  38.90000
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Sample Mean - Expected Test Mean                       =  -1.10000
							 | 
						||
| 
								 | 
							
								Degrees of Freedom                                     =  2
							 | 
						||
| 
								 | 
							
								T Statistic                                            =  -1.97566
							 | 
						||
| 
								 | 
							
								Probability that difference is due to chance           =  1.869e-001
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Results for Alternative Hypothesis and alpha           =  0.0500
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Alternative Hypothesis     Conclusion
							 | 
						||
| 
								 | 
							
								Mean != 38.900            REJECTED
							 | 
						||
| 
								 | 
							
								Mean  < 38.900            NOT REJECTED
							 | 
						||
| 
								 | 
							
								Mean  > 38.900            NOT REJECTED
							 | 
						||
| 
								 | 
							
								]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								As you can see the small number of measurements (3) has led to a large uncertainty
							 | 
						||
| 
								 | 
							
								in the location of the true mean.  So even though there appears to be a difference
							 | 
						||
| 
								 | 
							
								between the sample mean and the expected true mean, we conclude that there
							 | 
						||
| 
								 | 
							
								is no significant difference, and are unable to reject the null hypothesis.
							 | 
						||
| 
								 | 
							
								However, if we were to lower the bar for acceptance down to alpha = 0.1
							 | 
						||
| 
								 | 
							
								(a 90% confidence level) we see a different output:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[pre
							 | 
						||
| 
								 | 
							
								__________________________________
							 | 
						||
| 
								 | 
							
								Student t test for a single sample
							 | 
						||
| 
								 | 
							
								__________________________________
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Number of Observations                                 =  3
							 | 
						||
| 
								 | 
							
								Sample Mean                                            =  37.80000
							 | 
						||
| 
								 | 
							
								Sample Standard Deviation                              =  0.96437
							 | 
						||
| 
								 | 
							
								Expected True Mean                                     =  38.90000
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Sample Mean - Expected Test Mean                       =  -1.10000
							 | 
						||
| 
								 | 
							
								Degrees of Freedom                                     =  2
							 | 
						||
| 
								 | 
							
								T Statistic                                            =  -1.97566
							 | 
						||
| 
								 | 
							
								Probability that difference is due to chance           =  1.869e-001
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Results for Alternative Hypothesis and alpha           =  0.1000
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Alternative Hypothesis     Conclusion
							 | 
						||
| 
								 | 
							
								Mean != 38.900            REJECTED
							 | 
						||
| 
								 | 
							
								Mean  < 38.900            NOT REJECTED
							 | 
						||
| 
								 | 
							
								Mean  > 38.900            REJECTED
							 | 
						||
| 
								 | 
							
								]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								In this case, we really have a borderline result,
							 | 
						||
| 
								 | 
							
								and more data (and/or more accurate data),
							 | 
						||
| 
								 | 
							
								is needed for a more convincing conclusion.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[endsect]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[section:tut_mean_size Estimating how large a sample size would have to become
							 | 
						||
| 
								 | 
							
								in order to give a significant Students-t test result with a single sample test]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Imagine you have conducted a Students-t test on a single sample in order
							 | 
						||
| 
								 | 
							
								to check for systematic errors in your measurements.  Imagine that the
							 | 
						||
| 
								 | 
							
								result is borderline.  At this point one might go off and collect more data,
							 | 
						||
| 
								 | 
							
								but it might be prudent to first ask the question "How much more?".
							 | 
						||
| 
								 | 
							
								The parameter estimators of the students_t_distribution class
							 | 
						||
| 
								 | 
							
								can provide this information.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								This section is based on the example code in
							 | 
						||
| 
								 | 
							
								[@../../example/students_t_single_sample.cpp students_t_single_sample.cpp]
							 | 
						||
| 
								 | 
							
								and we begin by defining a procedure that will print out a table of
							 | 
						||
| 
								 | 
							
								estimated sample sizes for various confidence levels:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   // Needed includes:
							 | 
						||
| 
								 | 
							
								   #include <boost/math/distributions/students_t.hpp>
							 | 
						||
| 
								 | 
							
								   #include <iostream>
							 | 
						||
| 
								 | 
							
								   #include <iomanip>
							 | 
						||
| 
								 | 
							
								   // Bring everything into global namespace for ease of use:
							 | 
						||
| 
								 | 
							
								   using namespace boost::math;
							 | 
						||
| 
								 | 
							
								   using namespace std;
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   void single_sample_find_df(
							 | 
						||
| 
								 | 
							
								      double M,          // M = true mean.
							 | 
						||
| 
								 | 
							
								      double Sm,         // Sm = Sample Mean.
							 | 
						||
| 
								 | 
							
								      double Sd)         // Sd = Sample Standard Deviation.
							 | 
						||
| 
								 | 
							
								   {
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Next we define a table of significance levels:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								      double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Printing out the table of sample sizes required for various confidence levels
							 | 
						||
| 
								 | 
							
								begins with the table header:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								      cout << "\n\n"
							 | 
						||
| 
								 | 
							
								              "_______________________________________________________________\n"
							 | 
						||
| 
								 | 
							
								              "Confidence       Estimated          Estimated\n"
							 | 
						||
| 
								 | 
							
								              " Value (%)      Sample Size        Sample Size\n"
							 | 
						||
| 
								 | 
							
								              "              (one sided test)    (two sided test)\n"
							 | 
						||
| 
								 | 
							
								              "_______________________________________________________________\n";
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								And now the important part: the sample sizes required.  Class
							 | 
						||
| 
								 | 
							
								`students_t_distribution` has a static member function
							 | 
						||
| 
								 | 
							
								`find_degrees_of_freedom` that will calculate how large
							 | 
						||
| 
								 | 
							
								a sample size needs to be in order to give a definitive result.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								The first argument is the difference between the means that you
							 | 
						||
| 
								 | 
							
								wish to be able to detect, here it's the absolute value of the
							 | 
						||
| 
								 | 
							
								difference between the sample mean, and the true mean.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Then come two probability values: alpha and beta.  Alpha is the
							 | 
						||
| 
								 | 
							
								maximum acceptable risk of rejecting the null-hypothesis when it is
							 | 
						||
| 
								 | 
							
								in fact true.  Beta is the maximum acceptable risk of failing to reject
							 | 
						||
| 
								 | 
							
								the null-hypothesis when in fact it is false.
							 | 
						||
| 
								 | 
							
								Also note that for a two-sided test, alpha must be divided by 2.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								The final parameter of the function is the standard deviation of the sample.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								In this example, we assume that alpha and beta are the same, and call
							 | 
						||
| 
								 | 
							
								`find_degrees_of_freedom` twice: once with alpha for a one-sided test,
							 | 
						||
| 
								 | 
							
								and once with alpha/2 for a two-sided test.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								      for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
							 | 
						||
| 
								 | 
							
								      {
							 | 
						||
| 
								 | 
							
								         // Confidence value:
							 | 
						||
| 
								 | 
							
								         cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]);
							 | 
						||
| 
								 | 
							
								         // calculate df for single sided test:
							 | 
						||
| 
								 | 
							
								         double df = students_t::find_degrees_of_freedom(
							 | 
						||
| 
								 | 
							
								            fabs(M - Sm), alpha[i], alpha[i], Sd);
							 | 
						||
| 
								 | 
							
								         // convert to sample size:
							 | 
						||
| 
								 | 
							
								         double size = ceil(df) + 1;
							 | 
						||
| 
								 | 
							
								         // Print size:
							 | 
						||
| 
								 | 
							
								         cout << fixed << setprecision(0) << setw(16) << right << size;
							 | 
						||
| 
								 | 
							
								         // calculate df for two sided test:
							 | 
						||
| 
								 | 
							
								         df = students_t::find_degrees_of_freedom(
							 | 
						||
| 
								 | 
							
								            fabs(M - Sm), alpha[i]/2, alpha[i], Sd);
							 | 
						||
| 
								 | 
							
								         // convert to sample size:
							 | 
						||
| 
								 | 
							
								         size = ceil(df) + 1;
							 | 
						||
| 
								 | 
							
								         // Print size:
							 | 
						||
| 
								 | 
							
								         cout << fixed << setprecision(0) << setw(16) << right << size << endl;
							 | 
						||
| 
								 | 
							
								      }
							 | 
						||
| 
								 | 
							
								      cout << endl;
							 | 
						||
| 
								 | 
							
								   }
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Let's now look at some sample output using data taken from
							 | 
						||
| 
								 | 
							
								['P.K.Hou, O. W. Lau & M.C. Wong, Analyst (1983) vol. 108, p 64.
							 | 
						||
| 
								 | 
							
								and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55
							 | 
						||
| 
								 | 
							
								J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.]
							 | 
						||
| 
								 | 
							
								The values result from the determination of mercury by cold-vapour
							 | 
						||
| 
								 | 
							
								atomic absorption.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Only three measurements were made, and the Students-t test above
							 | 
						||
| 
								 | 
							
								gave a borderline result, so this example
							 | 
						||
| 
								 | 
							
								will show us how many samples would need to be collected:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[pre'''
							 | 
						||
| 
								 | 
							
								_____________________________________________________________
							 | 
						||
| 
								 | 
							
								Estimated sample sizes required for various confidence levels
							 | 
						||
| 
								 | 
							
								_____________________________________________________________
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								True Mean                               =  38.90000
							 | 
						||
| 
								 | 
							
								Sample Mean                             =  37.80000
							 | 
						||
| 
								 | 
							
								Sample Standard Deviation               =  0.96437
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								_______________________________________________________________
							 | 
						||
| 
								 | 
							
								Confidence       Estimated          Estimated
							 | 
						||
| 
								 | 
							
								 Value (%)      Sample Size        Sample Size
							 | 
						||
| 
								 | 
							
								              (one sided test)    (two sided test)
							 | 
						||
| 
								 | 
							
								_______________________________________________________________
							 | 
						||
| 
								 | 
							
								    75.000               3               4
							 | 
						||
| 
								 | 
							
								    90.000               7               9
							 | 
						||
| 
								 | 
							
								    95.000              11              13
							 | 
						||
| 
								 | 
							
								    99.000              20              22
							 | 
						||
| 
								 | 
							
								    99.900              35              37
							 | 
						||
| 
								 | 
							
								    99.990              50              53
							 | 
						||
| 
								 | 
							
								    99.999              66              68
							 | 
						||
| 
								 | 
							
								''']
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								So in this case, many more measurements would have had to be made,
							 | 
						||
| 
								 | 
							
								for example at the 95% level, 14 measurements in total for a two-sided test.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[endsect]
							 | 
						||
| 
								 | 
							
								[section:two_sample_students_t Comparing the means of two samples with the Students-t test]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Imagine that we have two samples, and we wish to determine whether
							 | 
						||
| 
								 | 
							
								their means are different or not.  This situation often arises when
							 | 
						||
| 
								 | 
							
								determining whether a new process or treatment is better than an old one.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								In this example, we'll be using the
							 | 
						||
| 
								 | 
							
								[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda3531.htm
							 | 
						||
| 
								 | 
							
								Car Mileage sample data] from the
							 | 
						||
| 
								 | 
							
								[@http://www.itl.nist.gov NIST website].  The data compares
							 | 
						||
| 
								 | 
							
								miles per gallon of US cars with miles per gallon of Japanese cars.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								The sample code is in
							 | 
						||
| 
								 | 
							
								[@../../example/students_t_two_samples.cpp students_t_two_samples.cpp].
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								There are two ways in which this test can be conducted: we can assume
							 | 
						||
| 
								 | 
							
								that the true standard deviations of the two samples are equal or not.
							 | 
						||
| 
								 | 
							
								If the standard deviations are assumed to be equal, then the calculation
							 | 
						||
| 
								 | 
							
								of the t-statistic is greatly simplified, so we'll examine that case first.
							 | 
						||
| 
								 | 
							
								In real life we should verify whether this assumption is valid with a
							 | 
						||
| 
								 | 
							
								Chi-Squared test for equal variances.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								We begin by defining a procedure that will conduct our test assuming equal
							 | 
						||
| 
								 | 
							
								variances:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   // Needed headers:
							 | 
						||
| 
								 | 
							
								   #include <boost/math/distributions/students_t.hpp>
							 | 
						||
| 
								 | 
							
								   #include <iostream>
							 | 
						||
| 
								 | 
							
								   #include <iomanip>
							 | 
						||
| 
								 | 
							
								   // Simplify usage:
							 | 
						||
| 
								 | 
							
								   using namespace boost::math;
							 | 
						||
| 
								 | 
							
								   using namespace std;
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   void two_samples_t_test_equal_sd(
							 | 
						||
| 
								 | 
							
								           double Sm1,       // Sm1 = Sample 1 Mean.
							 | 
						||
| 
								 | 
							
								           double Sd1,       // Sd1 = Sample 1 Standard Deviation.
							 | 
						||
| 
								 | 
							
								           unsigned Sn1,     // Sn1 = Sample 1 Size.
							 | 
						||
| 
								 | 
							
								           double Sm2,       // Sm2 = Sample 2 Mean.
							 | 
						||
| 
								 | 
							
								           double Sd2,       // Sd2 = Sample 2 Standard Deviation.
							 | 
						||
| 
								 | 
							
								           unsigned Sn2,     // Sn2 = Sample 2 Size.
							 | 
						||
| 
								 | 
							
								           double alpha)     // alpha = Significance Level.
							 | 
						||
| 
								 | 
							
								   {
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Our procedure will begin by calculating the t-statistic, assuming
							 | 
						||
| 
								 | 
							
								equal variances the needed formulae are:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[equation dist_tutorial1]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								where Sp is the "pooled" standard deviation of the two samples,
							 | 
						||
| 
								 | 
							
								and /v/ is the number of degrees of freedom of the two combined
							 | 
						||
| 
								 | 
							
								samples.  We can now write the code to calculate the t-statistic:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   // Degrees of freedom:
							 | 
						||
| 
								 | 
							
								   double v = Sn1 + Sn2 - 2;
							 | 
						||
| 
								 | 
							
								   cout << setw(55) << left << "Degrees of Freedom" << "=  " << v << "\n";
							 | 
						||
| 
								 | 
							
								   // Pooled variance:
							 | 
						||
| 
								 | 
							
								   double sp = sqrt(((Sn1-1) * Sd1 * Sd1 + (Sn2-1) * Sd2 * Sd2) / v);
							 | 
						||
| 
								 | 
							
								   cout << setw(55) << left << "Pooled Standard Deviation" << "=  " << sp << "\n";
							 | 
						||
| 
								 | 
							
								   // t-statistic:
							 | 
						||
| 
								 | 
							
								   double t_stat = (Sm1 - Sm2) / (sp * sqrt(1.0 / Sn1 + 1.0 / Sn2));
							 | 
						||
| 
								 | 
							
								   cout << setw(55) << left << "T Statistic" << "=  " << t_stat << "\n";
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								The next step is to define our distribution object, and calculate the
							 | 
						||
| 
								 | 
							
								complement of the probability:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   students_t dist(v);
							 | 
						||
| 
								 | 
							
								   double q = cdf(complement(dist, fabs(t_stat)));
							 | 
						||
| 
								 | 
							
								   cout << setw(55) << left << "Probability that difference is due to chance" << "=  "
							 | 
						||
| 
								 | 
							
								      << setprecision(3) << scientific << 2 * q << "\n\n";
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Here we've used the absolute value of the t-statistic, because we initially
							 | 
						||
| 
								 | 
							
								want to know simply whether there is a difference or not (a two-sided test).
							 | 
						||
| 
								 | 
							
								However, we can also test whether the mean of the second sample is greater
							 | 
						||
| 
								 | 
							
								or is less (one-sided test) than that of the first:
							 | 
						||
| 
								 | 
							
								all the possible tests are summed up in the following table:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[table
							 | 
						||
| 
								 | 
							
								[[Hypothesis][Test]]
							 | 
						||
| 
								 | 
							
								[[The Null-hypothesis: there is
							 | 
						||
| 
								 | 
							
								*no difference* in means]
							 | 
						||
| 
								 | 
							
								[Reject if complement of CDF for |t| < significance level / 2:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								`cdf(complement(dist, fabs(t))) < alpha / 2`]]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[[The Alternative-hypothesis: there is a
							 | 
						||
| 
								 | 
							
								*difference* in means]
							 | 
						||
| 
								 | 
							
								[Reject if complement of CDF for |t| > significance level / 2:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								`cdf(complement(dist, fabs(t))) < alpha / 2`]]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[[The Alternative-hypothesis: Sample 1 Mean is *less* than
							 | 
						||
| 
								 | 
							
								Sample 2 Mean.]
							 | 
						||
| 
								 | 
							
								[Reject if CDF of t > significance level:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								`cdf(dist, t) > alpha`]]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[[The Alternative-hypothesis: Sample 1 Mean is *greater* than
							 | 
						||
| 
								 | 
							
								Sample 2 Mean.]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[Reject if complement of CDF of t > significance level:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								`cdf(complement(dist, t)) > alpha`]]
							 | 
						||
| 
								 | 
							
								]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[note
							 | 
						||
| 
								 | 
							
								For a two-sided test we must compare against alpha / 2 and not alpha.]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Most of the rest of the sample program is pretty-printing, so we'll
							 | 
						||
| 
								 | 
							
								skip over that, and take a look at the sample output for alpha=0.05
							 | 
						||
| 
								 | 
							
								(a 95% probability level).  For comparison the dataplot output
							 | 
						||
| 
								 | 
							
								for the same data is in
							 | 
						||
| 
								 | 
							
								[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm
							 | 
						||
| 
								 | 
							
								section 1.3.5.3] of the __handbook.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[pre'''
							 | 
						||
| 
								 | 
							
								   ________________________________________________
							 | 
						||
| 
								 | 
							
								   Student t test for two samples (equal variances)
							 | 
						||
| 
								 | 
							
								   ________________________________________________
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   Number of Observations (Sample 1)                      =  249
							 | 
						||
| 
								 | 
							
								   Sample 1 Mean                                          =  20.145
							 | 
						||
| 
								 | 
							
								   Sample 1 Standard Deviation                            =  6.4147
							 | 
						||
| 
								 | 
							
								   Number of Observations (Sample 2)                      =  79
							 | 
						||
| 
								 | 
							
								   Sample 2 Mean                                          =  30.481
							 | 
						||
| 
								 | 
							
								   Sample 2 Standard Deviation                            =  6.1077
							 | 
						||
| 
								 | 
							
								   Degrees of Freedom                                     =  326
							 | 
						||
| 
								 | 
							
								   Pooled Standard Deviation                              =  6.3426
							 | 
						||
| 
								 | 
							
								   T Statistic                                            =  -12.621
							 | 
						||
| 
								 | 
							
								   Probability that difference is due to chance           =  5.273e-030
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   Results for Alternative Hypothesis and alpha           =  0.0500'''
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   Alternative Hypothesis              Conclusion
							 | 
						||
| 
								 | 
							
								   Sample 1 Mean != Sample 2 Mean       NOT REJECTED
							 | 
						||
| 
								 | 
							
								   Sample 1 Mean <  Sample 2 Mean       NOT REJECTED
							 | 
						||
| 
								 | 
							
								   Sample 1 Mean >  Sample 2 Mean       REJECTED
							 | 
						||
| 
								 | 
							
								]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								So with a probability that the difference is due to chance of just
							 | 
						||
| 
								 | 
							
								5.273e-030, we can safely conclude that there is indeed a difference.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								The tests on the alternative hypothesis show that we must
							 | 
						||
| 
								 | 
							
								also reject the hypothesis that Sample 1 Mean is
							 | 
						||
| 
								 | 
							
								greater than that for Sample 2: in this case Sample 1 represents the
							 | 
						||
| 
								 | 
							
								miles per gallon for Japanese cars, and Sample 2 the miles per gallon for
							 | 
						||
| 
								 | 
							
								US cars, so we conclude that Japanese cars are on average more
							 | 
						||
| 
								 | 
							
								fuel efficient.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Now that we have the simple case out of the way, let's look for a moment
							 | 
						||
| 
								 | 
							
								at the more complex one: that the standard deviations of the two samples
							 | 
						||
| 
								 | 
							
								are not equal.  In this case the formula for the t-statistic becomes:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[equation dist_tutorial2]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								And for the combined degrees of freedom we use the
							 | 
						||
| 
								 | 
							
								[@http://en.wikipedia.org/wiki/Welch-Satterthwaite_equation Welch-Satterthwaite]
							 | 
						||
| 
								 | 
							
								approximation:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[equation dist_tutorial3]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Note that this is one of the rare situations where the degrees-of-freedom
							 | 
						||
| 
								 | 
							
								parameter to the Student's t distribution is a real number, and not an
							 | 
						||
| 
								 | 
							
								integer value.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[note
							 | 
						||
| 
								 | 
							
								Some statistical packages truncate the effective degrees of freedom to
							 | 
						||
| 
								 | 
							
								an integer value: this may be necessary if you are relying on lookup tables,
							 | 
						||
| 
								 | 
							
								but since our code fully supports non-integer degrees of freedom there is no
							 | 
						||
| 
								 | 
							
								need to truncate in this case.  Also note that when the degrees of freedom
							 | 
						||
| 
								 | 
							
								is small then the Welch-Satterthwaite approximation may be a significant
							 | 
						||
| 
								 | 
							
								source of error.]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Putting these formulae into code we get:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   // Degrees of freedom:
							 | 
						||
| 
								 | 
							
								   double v = Sd1 * Sd1 / Sn1 + Sd2 * Sd2 / Sn2;
							 | 
						||
| 
								 | 
							
								   v *= v;
							 | 
						||
| 
								 | 
							
								   double t1 = Sd1 * Sd1 / Sn1;
							 | 
						||
| 
								 | 
							
								   t1 *= t1;
							 | 
						||
| 
								 | 
							
								   t1 /=  (Sn1 - 1);
							 | 
						||
| 
								 | 
							
								   double t2 = Sd2 * Sd2 / Sn2;
							 | 
						||
| 
								 | 
							
								   t2 *= t2;
							 | 
						||
| 
								 | 
							
								   t2 /= (Sn2 - 1);
							 | 
						||
| 
								 | 
							
								   v /= (t1 + t2);
							 | 
						||
| 
								 | 
							
								   cout << setw(55) << left << "Degrees of Freedom" << "=  " << v << "\n";
							 | 
						||
| 
								 | 
							
								   // t-statistic:
							 | 
						||
| 
								 | 
							
								   double t_stat = (Sm1 - Sm2) / sqrt(Sd1 * Sd1 / Sn1 + Sd2 * Sd2 / Sn2);
							 | 
						||
| 
								 | 
							
								   cout << setw(55) << left << "T Statistic" << "=  " << t_stat << "\n";
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Thereafter the code and the tests are performed the same as before.  Using
							 | 
						||
| 
								 | 
							
								are car mileage data again, here's what the output looks like:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[pre'''
							 | 
						||
| 
								 | 
							
								   __________________________________________________
							 | 
						||
| 
								 | 
							
								   Student t test for two samples (unequal variances)
							 | 
						||
| 
								 | 
							
								   __________________________________________________
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   Number of Observations (Sample 1)                      =  249
							 | 
						||
| 
								 | 
							
								   Sample 1 Mean                                          =  20.145
							 | 
						||
| 
								 | 
							
								   Sample 1 Standard Deviation                            =  6.4147
							 | 
						||
| 
								 | 
							
								   Number of Observations (Sample 2)                      =  79
							 | 
						||
| 
								 | 
							
								   Sample 2 Mean                                          =  30.481
							 | 
						||
| 
								 | 
							
								   Sample 2 Standard Deviation                            =  6.1077
							 | 
						||
| 
								 | 
							
								   Degrees of Freedom                                     =  136.87
							 | 
						||
| 
								 | 
							
								   T Statistic                                            =  -12.946
							 | 
						||
| 
								 | 
							
								   Probability that difference is due to chance           =  1.571e-025
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   Results for Alternative Hypothesis and alpha           =  0.0500'''
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								   Alternative Hypothesis              Conclusion
							 | 
						||
| 
								 | 
							
								   Sample 1 Mean != Sample 2 Mean       NOT REJECTED
							 | 
						||
| 
								 | 
							
								   Sample 1 Mean <  Sample 2 Mean       NOT REJECTED
							 | 
						||
| 
								 | 
							
								   Sample 1 Mean >  Sample 2 Mean       REJECTED
							 | 
						||
| 
								 | 
							
								]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								This time allowing the variances in the two samples to differ has yielded
							 | 
						||
| 
								 | 
							
								a higher likelihood that the observed difference is down to chance alone
							 | 
						||
| 
								 | 
							
								(1.571e-025 compared to 5.273e-030 when equal variances were assumed).
							 | 
						||
| 
								 | 
							
								However, the conclusion remains the same: US cars are less fuel efficient
							 | 
						||
| 
								 | 
							
								than Japanese models.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[endsect]
							 | 
						||
| 
								 | 
							
								[section:paired_st Comparing two paired samples with the Student's t distribution]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								Imagine that we have a before and after reading for each item in the sample:
							 | 
						||
| 
								 | 
							
								for example we might have measured blood pressure before and after administration
							 | 
						||
| 
								 | 
							
								of a new drug.  We can't pool the results and compare the means before and after
							 | 
						||
| 
								 | 
							
								the change, because each patient will have a different baseline reading.
							 | 
						||
| 
								 | 
							
								Instead we calculate the difference between before and after measurements
							 | 
						||
| 
								 | 
							
								in each patient, and calculate the mean and standard deviation of the differences.
							 | 
						||
| 
								 | 
							
								To test whether a significant change has taken place, we can then test
							 | 
						||
| 
								 | 
							
								the null-hypothesis that the true mean is zero using the same procedure
							 | 
						||
| 
								 | 
							
								we used in the single sample cases previously discussed.
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								That means we can:
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								* [link math_toolkit.stat_tut.weg.st_eg.tut_mean_intervals Calculate confidence intervals of the mean].
							 | 
						||
| 
								 | 
							
								If the endpoints of the interval differ in sign then we are unable to reject
							 | 
						||
| 
								 | 
							
								the null-hypothesis that there is no change.
							 | 
						||
| 
								 | 
							
								* [link math_toolkit.stat_tut.weg.st_eg.tut_mean_test Test whether the true mean is zero]. If the
							 | 
						||
| 
								 | 
							
								result is consistent with a true mean of zero, then we are unable to reject the
							 | 
						||
| 
								 | 
							
								null-hypothesis that there is no change.
							 | 
						||
| 
								 | 
							
								* [link math_toolkit.stat_tut.weg.st_eg.tut_mean_size Calculate how many pairs of readings we would need
							 | 
						||
| 
								 | 
							
								in order to obtain a significant result].
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[endsect]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[endsect][/section:st_eg Student's t]
							 | 
						||
| 
								 | 
							
								
							 | 
						||
| 
								 | 
							
								[/
							 | 
						||
| 
								 | 
							
								  Copyright 2006, 2012 John Maddock and Paul A. Bristow.
							 | 
						||
| 
								 | 
							
								  Distributed under the Boost Software License, Version 1.0.
							 | 
						||
| 
								 | 
							
								  (See accompanying file LICENSE_1_0.txt or copy at
							 | 
						||
| 
								 | 
							
								  http://www.boost.org/LICENSE_1_0.txt).
							 | 
						||
| 
								 | 
							
								]
							 | 
						||
| 
								 | 
							
								
							 |