JEE Main 2018
Statistics & Probability
Statistics
Hard

Question

For two data sets, each of size 5, the variances are given to be 4 and 5 and the corresponding means are given to be 2 and 4, respectively. The variance of the combined data set is

Options

Solution

This problem requires us to calculate the variance of a combined data set using the given statistics of two individual data sets.

1. Key Concepts and Formulas

To solve this problem, we'll utilize the fundamental definitions and computational formulas for mean and variance:

  • Mean ($\overline{X}$): The average of a data set. For a set of $n$ observations $X_i$, the mean is $\overline{X} = \frac{\sum X_i}{n}$.
  • Variance ($\sigma^2$): A measure of the spread of data points around the mean. The most convenient computational formula for variance is $\sigma^2 = \frac{\sum X_i^2}{n} - (\overline{X})^2$. This formula can be rearranged to find the sum of squares: $\sum X_i^2 = n(\sigma^2 + \overline{X}^2)$.
  • Combined Mean of two data sets: For two data sets, $X$ (size $n_1$, mean $\overline{x}_1$) and $Y$ (size $n_2$, mean $\overline{x}_2$), the mean of the combined data set ($\overline{X}_{combined}$) is $\overline{X}_{combined} = \frac{n_1 \overline{x}_1 + n_2 \overline{x}_2}{n_1 + n_2}$.
  • Combined Variance of two data sets: The variance of the combined data set can be found using the general variance formula: $\sigma^2_{combined} = \frac{\sum (\text{all data points})^2}{\text{total number of data points}} - (\overline{X}_{combined})^2$. This requires the sum of squares of all individual data points.
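The four formulas above can be chained into a single routine. The following is a minimal Python sketch, not part of the original solution; the function name `combined_variance` and its parameter names are hypothetical, and it assumes population variances (division by $n$), as the formulas above do.

```python
def combined_variance(n1, mean1, var1, n2, mean2, var2):
    """Population variance of two pooled data sets, from summary statistics."""
    # Recover each set's sum and sum of squares from its mean and variance.
    sum1, sum2 = n1 * mean1, n2 * mean2
    sumsq1 = n1 * (var1 + mean1 ** 2)
    sumsq2 = n2 * (var2 + mean2 ** 2)

    # Pool the totals, then apply sigma^2 = (sum of squares)/N - mean^2.
    n = n1 + n2
    mean = (sum1 + sum2) / n
    return (sumsq1 + sumsq2) / n - mean ** 2


# Values from this question: sizes 5 and 5, means 2 and 4, variances 4 and 5.
print(combined_variance(5, 2, 4, 5, 4, 5))  # 5.5
```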

2. Step-by-Step Solution

Step 1: Understand the Given Information

We are given two data sets. Let's label them Data Set 1 and Data Set 2.

  • For Data Set 1:

    • Number of observations ($n_1$) = 5
    • Variance ($\sigma_1^2$) = 4
    • Mean ($\overline{x}_1$) = 2
  • For Data Set 2:

    • Number of observations ($n_2$) = 5
    • Variance ($\sigma_2^2$) = 5
    • Mean ($\overline{x}_2$) = 4

Our objective is to calculate the variance of the combined data set.

Step 2: Calculate the Sum of Observations for Each Data Set

The mean formula $\overline{X} = \frac{\sum X_i}{n}$ allows us to find the sum of observations ($\sum X_i$) for each set. These sums are needed for the combined mean.

  • For Data Set 1: $\overline{x}_1 = \frac{\sum x_i}{n_1}$. Substituting the given values: $2 = \frac{\sum x_i}{5}$. Multiplying both sides by 5: $\sum x_i = 2 \times 5 = 10$.

  • For Data Set 2: $\overline{x}_2 = \frac{\sum y_i}{n_2}$. Substituting the given values: $4 = \frac{\sum y_i}{5}$. Multiplying both sides by 5: $\sum y_i = 4 \times 5 = 20$.

Step 3: Calculate the Sum of Squares for Each Data Set

We use the computational formula for variance, $\sigma^2 = \frac{\sum X_i^2}{n} - (\overline{X})^2$, rearranged to give $\sum X_i^2 = n(\sigma^2 + \overline{X}^2)$. These sums of squares are needed for the combined variance.

  • For Data Set 1: $\sigma_1^2 = \frac{\sum x_i^2}{n_1} - (\overline{x}_1)^2$. Substituting the given values: $4 = \frac{\sum x_i^2}{5} - (2)^2$, that is, $4 = \frac{\sum x_i^2}{5} - 4$. Adding 4 to both sides: $8 = \frac{\sum x_i^2}{5}$. Multiplying both sides by 5: $\sum x_i^2 = 8 \times 5 = 40$.

  • For Data Set 2: $\sigma_2^2 = \frac{\sum y_i^2}{n_2} - (\overline{x}_2)^2$. Substituting the given values: $5 = \frac{\sum y_i^2}{5} - (4)^2$, that is, $5 = \frac{\sum y_i^2}{5} - 16$. Adding 16 to both sides: $21 = \frac{\sum y_i^2}{5}$. Multiplying both sides by 5: $\sum y_i^2 = 21 \times 5 = 105$.
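As a quick arithmetic check on Steps 2 and 3, the sketch below recovers the sum and the sum of squares of each data set from its size, mean, and variance. The helper name `sum_and_sum_of_squares` is hypothetical and introduced only for illustration.

```python
def sum_and_sum_of_squares(n, mean, variance):
    """Return (sum of observations, sum of squared observations)."""
    total = n * mean                        # from mean = total / n
    total_sq = n * (variance + mean ** 2)   # from variance = total_sq / n - mean^2
    return total, total_sq


print(sum_and_sum_of_squares(5, 2, 4))  # (10, 40)   Data Set 1
print(sum_and_sum_of_squares(5, 4, 5))  # (20, 105)  Data Set 2
```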

Step 4: Calculate the Mean of the Combined Data Set

First, find the total number of observations and the total sum of observations for the combined data set.

  • Total number of observations ($N$) = $n_1 + n_2 = 5 + 5 = 10$.
  • Total sum of observations ($\sum X_{combined}$) = $\sum x_i + \sum y_i = 10 + 20 = 30$.

Now, calculate the combined mean ($\overline{X}_{combined}$): $\overline{X}_{combined} = \frac{\sum X_{combined}}{N} = \frac{30}{10} = 3$. This combined mean will be used in the final variance calculation.

Step 5: Calculate the Variance of the Combined Data Set

First, find the sum of squares of all observations in the combined data set.

  • Total sum of squares ($\sum X_{combined}^2$) = $\sum x_i^2 + \sum y_i^2 = 40 + 105 = 145$.

Now, apply the variance formula for the combined data set: $\sigma^2_{combined} = \frac{\sum X_{combined}^2}{N} - (\overline{X}_{combined})^2$. Substituting the calculated values: $\sigma^2_{combined} = \frac{145}{10} - (3)^2 = 14.5 - 9 = 5.5$. As a fraction, this is $\sigma^2_{combined} = \frac{11}{2}$.
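The final two steps can be reproduced in exact arithmetic. This sketch uses Python's standard `fractions` module so that the result appears as $\frac{11}{2}$ rather than a rounded decimal; the variable names are illustrative, and the three totals are the ones computed in the steps above.

```python
from fractions import Fraction

total_n = 10        # N = n1 + n2
total_sum = 30      # sum of all observations
total_sum_sq = 145  # sum of all squared observations

mean = Fraction(total_sum, total_n)
variance = Fraction(total_sum_sq, total_n) - mean ** 2

print(mean, variance)  # 3 11/2
```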

3. Common Mistakes & Tips

  • Incorrect Combined Mean: A common error is to simply average the individual means, e.g., $(\overline{x}_1 + \overline{x}_2)/2$. This is only correct if $n_1 = n_2$. In general, use the weighted average formula: $\overline{X}_{combined} = \frac{n_1 \overline{x}_1 + n_2 \overline{x}_2}{n_1 + n_2}$. In this specific problem, since $n_1 = n_2 = 5$, $(\overline{x}_1 + \overline{x}_2)/2 = (2+4)/2 = 3$, which happens to be correct. However, always use the general formula for robustness.
  • Confusing Variance Formulas: Always remember the computational formula $\sigma^2 = \frac{\sum X_i^2}{n} - (\overline{X})^2$, as it is usually more efficient than $\sigma^2 = \frac{\sum (X_i - \overline{X})^2}{n}$ for problems involving sums of squares.
  • Alternative Combined Variance Formula: There is a direct formula for combined variance: $\sigma^2_{combined} = \frac{n_1 \sigma_1^2 + n_2 \sigma_2^2}{n_1+n_2} + \frac{n_1 n_2}{(n_1+n_2)^2}(\overline{x}_1 - \overline{x}_2)^2$. Let's verify our answer using this formula: $\sigma^2_{combined} = \frac{5 \times 4 + 5 \times 5}{5+5} + \frac{5 \times 5}{(5+5)^2}(2 - 4)^2 = \frac{20 + 25}{10} + \frac{25}{100}(-2)^2 = \frac{45}{10} + \frac{25}{100}(4) = 4.5 + \frac{100}{100} = 4.5 + 1 = 5.5$. Both methods yield the same result, confirming our calculation. This formula can be a shortcut if remembered, but the approach used in the main solution relies directly on the definitions and so does not depend on recalling it correctly.
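For readers who prefer not to memorise the direct formula, here is a sketch of how it follows from the sum-of-squares approach used in the main solution. It only substitutes $\sum X_i^2 = n(\sigma^2 + \overline{X}^2)$ and the weighted combined mean into the general variance formula, then simplifies.

```latex
\begin{align*}
\sigma^2_{combined}
  &= \frac{n_1(\sigma_1^2 + \overline{x}_1^2) + n_2(\sigma_2^2 + \overline{x}_2^2)}{n_1 + n_2}
     - \left(\frac{n_1 \overline{x}_1 + n_2 \overline{x}_2}{n_1 + n_2}\right)^2 \\
  &= \frac{n_1 \sigma_1^2 + n_2 \sigma_2^2}{n_1 + n_2}
     + \frac{(n_1 + n_2)(n_1 \overline{x}_1^2 + n_2 \overline{x}_2^2)
             - (n_1 \overline{x}_1 + n_2 \overline{x}_2)^2}{(n_1 + n_2)^2} \\
  &= \frac{n_1 \sigma_1^2 + n_2 \sigma_2^2}{n_1 + n_2}
     + \frac{n_1 n_2 \left(\overline{x}_1^2 - 2\,\overline{x}_1 \overline{x}_2 + \overline{x}_2^2\right)}{(n_1 + n_2)^2} \\
  &= \frac{n_1 \sigma_1^2 + n_2 \sigma_2^2}{n_1 + n_2}
     + \frac{n_1 n_2}{(n_1 + n_2)^2}\,(\overline{x}_1 - \overline{x}_2)^2
\end{align*}
```

In the third line the $n_1^2 \overline{x}_1^2$ and $n_2^2 \overline{x}_2^2$ terms cancel, leaving only the cross terms with the common factor $n_1 n_2$.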

4. Summary

To find the variance of the combined data set, we first calculated the sum of observations and the sum of squares for each individual data set using their respective means and variances. Then, we combined these sums to find the total sum of observations and total sum of squares for the entire data set. Finally, we used the combined mean and total sum of squares in the standard variance formula to determine the variance of the combined data set. The result of these calculations is $5.5$, or $\frac{11}{2}$.
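The result can also be sanity-checked on concrete numbers. The question does not give the actual observations, so the two lists below are hypothetical data chosen only because they have the stated sizes, means, and variances; any other pair with the same statistics would give the same combined variance. The sketch uses `statistics.pvariance`, the population variance from Python's standard library.

```python
import statistics

# Hypothetical data (not from the question) matching the given statistics.
set1 = [-1.0, 1.0, 2.0, 3.0, 5.0]   # size 5, mean 2, variance 4
set2 = [0.5, 3.5, 4.0, 4.5, 7.5]    # size 5, mean 4, variance 5

print(statistics.mean(set1), statistics.pvariance(set1))  # 2.0 4.0
print(statistics.mean(set2), statistics.pvariance(set2))  # 4.0 5.0

# Pool the observations and compute the variance directly.
combined = set1 + set2
print(statistics.mean(combined), statistics.pvariance(combined))  # 3.0 5.5
```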

5. Final Answer

The variance of the combined data set is $\frac{11}{2}$. This corresponds to option (B).

The final answer is $\boxed{11/2}$.
