This calculator returns the chi-square contribution for a single observed and expected value using Karl Pearson's original formula: χ² = (observed − expected)² / expected. Enter two numbers, get the one piece of the chi-square statistic that corresponds to that category.
Most chi-square tools on the internet ask you to dump in a whole grid of numbers and hand back a single summary. That's fine when you already know what you're doing. But when you're learning the test, debugging a homework answer, or trying to figure out which category is driving the overall result, what you actually want is to see the formula work on one row at a time. That's what this calculator does.
Below you'll find the concept in plain English, worked examples from real statistics problems, a critical-values table for interpreting the summed total, and answers to the questions students and researchers ask most often.
What Chi-Square Actually Measures
A chi-square term answers one question: how surprising is this observation, given what we expected?
Three pieces do the work:
- Observed (O) — the count you actually recorded
- Expected (E) — the count predicted by your null hypothesis, theory, or reference distribution
- (O − E)² / E — squares the gap (so overestimates and underestimates both count), then scales it by what was expected (so being off by 10 matters more when you expected 5 than when you expected 500)
The bigger the result, the further that single observation sits from what the model predicted, in standardized terms. Close to zero means the category matched expectation. A large value means that category is doing heavy lifting in your overall test statistic.
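The scaling effect of dividing by E is easy to see with a short sketch. The function name `chi_square_term` is an illustrative choice, not part of this calculator; the example keeps the gap fixed at 10 and changes only the expected count.

```python
def chi_square_term(observed: float, expected: float) -> float:
    """Return (O - E)^2 / E for a single category."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (observed - expected) ** 2 / expected


# Same gap of 10 in both cases; only the expected count differs.
small_e = chi_square_term(15, 5)     # 10^2 / 5
large_e = chi_square_term(510, 500)  # 10^2 / 500

# Squaring makes the direction of the miss irrelevant.
under = chi_square_term(5, 15)
over = chi_square_term(25, 15)

print(small_e, large_e)
print(under == over)
```

Being off by 10 contributes 20 when 5 was expected and only 0.2 when 500 was expected, which is the standardization the formula is designed to provide.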
This single-term view is more than a teaching device. When a full chi-square test comes back significant, researchers routinely look at the per-category contributions to find where the deviation lives. That's the second job this calculator is good for.
How to Use This Calculator
- Enter your observed value. The count you actually recorded — rolls that landed on six, survey respondents who picked option A, plants with purple flowers.
- Enter your expected value. The count predicted by your null hypothesis or model. For 60 rolls of a fair die, you'd expect 10 per face.
- Read the χ² value. The calculator returns (O − E)² / E — the chi-square contribution for that single category.
For a full test across multiple categories, run the calculator once per category and add the results. The total is your chi-square test statistic, which you compare against the critical value for your degrees of freedom (see the table further down).
Worked Examples
Example 1: Is the Die Fair?
You roll a die 60 times and land 15 sixes. Under a fair-die hypothesis, you'd expect 10.
- Observed = 15, Expected = 10
- (15 − 10)² / 10 = 25 / 10 = 2.5
That's the chi-square contribution for "sixes." Repeat for the other five faces and sum the six values — if the total exceeds 11.07 (the critical value at df = 5, α = 0.05), you have evidence the die isn't fair.
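To see the "repeat and sum" step in action, here is a minimal sketch. Only the 15 sixes come from the example; the counts for faces 1 to 5 are hypothetical values chosen so the rolls total 60.

```python
def chi_square_term(observed, expected):
    """Return (O - E)^2 / E for a single category."""
    return (observed - expected) ** 2 / expected


# Hypothetical counts for faces 1-6; only the last value (15 sixes)
# is taken from the example above.
observed_counts = [8, 9, 11, 7, 10, 15]
expected_per_face = sum(observed_counts) / 6  # 60 rolls / 6 faces = 10

terms = [chi_square_term(o, expected_per_face) for o in observed_counts]
total = sum(terms)

critical_value = 11.07  # df = 5, alpha = 0.05
print([round(t, 2) for t in terms])
print(round(total, 2), total > critical_value)
```

With these made-up counts the six terms sum to about 4.0, well under 11.07, so a contribution of 2.5 from one face is not enough by itself to call the die unfair.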
Example 2: Survey Shift
You expected 50 respondents to pick "very satisfied" based on last year's benchmark. This year, 32 did.
- Observed = 32, Expected = 50
- (32 − 50)² / 50 = 324 / 50 = 6.48
A contribution of 6.48 is substantial. On its own it's not a test result — you'd still need the other satisfaction categories and a degrees-of-freedom lookup — but it tells you immediately that "very satisfied" is where the pattern shifted hardest.
Example 3: Mendelian Genetics
In a monohybrid cross, you expect 75 purple-flowered plants out of 100. You count 78.
- Observed = 78, Expected = 75
- (78 − 75)² / 75 = 9 / 75 = 0.12
Tiny contribution: the observation sits right on top of the Mendelian prediction. When the total across all categories is this small, your data are consistent with the theory and give no evidence against it.
Example 4: Quality-Control Inspection
A factory expects 2% defective units in a batch of 500 — that's 10 expected defectives. The inspector finds 22.
- Observed = 22, Expected = 10
- (22 − 10)² / 10 = 144 / 10 = 14.4
A single-term value of 14.4 is a loud signal. Even before combining with the "non-defective" category, this tells you the batch isn't behaving like the historical baseline.
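The sketch below completes the calculation by adding the "non-defective" category. Its counts are derived from the batch size (500 − 22 observed, 500 − 10 expected); the variable names are illustrative.

```python
def chi_square_term(observed, expected):
    """Return (O - E)^2 / E for a single category."""
    return (observed - expected) ** 2 / expected


batch_size = 500
expected_defective = 10  # 2% of 500
observed_defective = 22

observed = {
    "defective": observed_defective,
    "non-defective": batch_size - observed_defective,  # 478
}
expected = {
    "defective": expected_defective,
    "non-defective": batch_size - expected_defective,  # 490
}

terms = {name: chi_square_term(observed[name], expected[name]) for name in observed}
total = sum(terms.values())

for name, value in terms.items():
    print(f"{name}: {value:.2f}")
print(f"total: {total:.2f}")
```

The non-defective term is only about 0.29, so the total of roughly 14.69 comes almost entirely from the defective category. With two categories there is 1 degree of freedom, and the total is far above the 3.84 critical value at α = 0.05.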
Combining Terms Into a Full Chi-Square Statistic
A full chi-square test sums one term per category:
χ² = Σ [(Oᵢ − Eᵢ)² / Eᵢ]
The Σ just means "add them all up." For four categories you'd:
- Calculate (O − E)² / E for category 1
- Do the same for categories 2, 3, and 4
- Add the four results
That sum is your test statistic. Compare it to the critical value for your degrees of freedom (number of categories − 1 for goodness-of-fit; (rows − 1) × (columns − 1) for a contingency table).
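The two degrees-of-freedom rules can be written as small helpers. This is a sketch with hypothetical function names; it assumes the expected counts were fixed in advance and not fitted from the same data.

```python
def goodness_of_fit_df(n_categories: int) -> int:
    """Degrees of freedom for a goodness-of-fit test: categories - 1."""
    if n_categories < 2:
        raise ValueError("need at least two categories")
    return n_categories - 1


def contingency_df(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom for a contingency table: (rows - 1) * (columns - 1)."""
    if n_rows < 2 or n_cols < 2:
        raise ValueError("need at least two rows and two columns")
    return (n_rows - 1) * (n_cols - 1)


die_df = goodness_of_fit_df(6)   # six die faces
table_df = contingency_df(2, 3)  # a 2 x 3 table
print(die_df, table_df)
```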
Critical Values at α = 0.05
| Degrees of Freedom | Critical Value |
|---|---|
| 1 | 3.84 |
| 2 | 5.99 |
| 3 | 7.81 |
| 4 | 9.49 |
| 5 | 11.07 |
| 6 | 12.59 |
| 8 | 15.51 |
| 10 | 18.31 |
| 15 | 25.00 |
| 20 | 31.41 |
If your summed chi-square exceeds the critical value for your df, you reject the null hypothesis at the 5% level. For stricter tests use α = 0.01, which raises the bar (for df = 1, the 0.01 critical value is 6.63).
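The decision rule can be sketched as a simple lookup. The dictionary copies the α = 0.05 values from the table above; the names `CRITICAL_VALUES_05` and `rejects_null` are hypothetical, and the sketch deliberately refuses any df that the table does not list.

```python
# Critical values at alpha = 0.05, copied from the table above.
CRITICAL_VALUES_05 = {
    1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07,
    6: 12.59, 8: 15.51, 10: 18.31, 15: 25.00, 20: 31.41,
}


def rejects_null(chi_square_total: float, df: int) -> bool:
    """Return True if the summed statistic exceeds the 5% critical value."""
    if df not in CRITICAL_VALUES_05:
        raise ValueError(f"no tabulated critical value for df = {df}")
    return chi_square_total > CRITICAL_VALUES_05[df]


print(rejects_null(14.69, 1))  # a total of 14.69 with 1 degree of freedom
print(rejects_null(4.0, 5))    # a total of 4.0 with 5 degrees of freedom
```

For degrees of freedom not in the table, a statistics library or a fuller printed table would be needed; the sketch raises an error instead of guessing.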
Interpreting a Single Term
On its own, a per-category value isn't a significance test; that job belongs to the sum. But as a diagnostic it's genuinely useful. As rough rules of thumb rather than formal cutoffs:
- Near zero (below ~1): observation sits close to expectation
- Moderate (roughly 1–4): meaningful deviation, worth noting
- Large (above ~4): this category is pulling hard on your overall statistic — investigate it
A practical rule: after a significant full test, scan the per-term values and focus attention on the largest two or three. Those are the categories where your observed data disagrees most with your model — and usually the ones worth writing about.
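The practical rule can be sketched as a sort over the per-term values. The category labels and counts below are hypothetical and exist only to illustrate the ranking step.

```python
def chi_square_term(observed, expected):
    """Return (O - E)^2 / E for a single category."""
    return (observed - expected) ** 2 / expected


# Hypothetical observed and expected counts for four categories.
observed = {"A": 30, "B": 22, "C": 18, "D": 50}
expected = {"A": 30, "B": 30, "C": 30, "D": 30}

terms = {name: chi_square_term(observed[name], expected[name]) for name in observed}

# Largest contributions first.
ranked = sorted(terms.items(), key=lambda item: item[1], reverse=True)
top_two = [name for name, _ in ranked[:2]]

for name, value in ranked:
    print(f"{name}: {value:.2f}")
print("focus on:", top_two)
```

Here categories D and C carry most of the total, so those are the ones to examine first.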
When Per-Term Values Matter Most
There are three situations where calculating one chi-square term at a time isn't just convenient — it's the right tool:
- Learning the formula. Running a grid-based calculator hides the mechanics. Doing it term by term builds real intuition for why squaring matters, why division by E matters, and what makes a contribution "big."
- Post-hoc analysis. When a full chi-square test rejects the null, per-term values tell you which categories drove the rejection. Reporting "χ² = 24.3, p < .001, with the bulk of the deviation concentrated in categories A and D" is far more informative than reporting just the summary.
- Checking hand calculations. Textbook problems and exam answers are graded term by term. Verifying each contribution individually catches arithmetic mistakes that a summary-only tool would bury.
Technical Notes
Formula: χ² = (observed − expected)² / expected
This is one component of the full test statistic χ² = Σ [(Oᵢ − Eᵢ)² / Eᵢ], summed across all categories in your data.
Assumptions:
- Observations are independent
- Data are counts (frequencies), not percentages or proportions
- Expected counts are generally 5 or larger per category
- Categories are mutually exclusive
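Two of these assumptions can be screened from the numbers alone. The helper below is a hypothetical sketch: it flags observed values that are not whole, non-negative counts and expected counts below 5. Independence and mutual exclusivity depend on how the data were collected and cannot be checked this way.

```python
def check_assumptions(observed, expected, min_expected=5):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for i, (o, e) in enumerate(zip(observed, expected), start=1):
        # Observed values should be whole, non-negative counts.
        if o < 0 or o != int(o):
            problems.append(f"category {i}: observed value {o} is not a count")
        # Small expected counts make the chi-square approximation unreliable.
        if e < min_expected:
            problems.append(f"category {i}: expected count {e} is below {min_expected}")
    return problems


print(check_assumptions([22, 478], [10, 490]))      # counts from the inspection example
print(check_assumptions([0.25, 0.75], [0.5, 0.5]))  # proportions entered by mistake
```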
When to use a different test: If expected counts are very small, use Fisher's exact test. If your data are continuous, use a t-test or ANOVA. If you're comparing two proportions directly, a z-test for proportions is often simpler.
Historical note: The chi-square test was introduced by Karl Pearson in 1900 and is one of the oldest tools in inferential statistics. Despite its age, it remains the default test for categorical count data in fields ranging from genetics to market research — because the logic is transparent and the math is easy to verify by hand, one term at a time.