Chi-Square Calculator

Calculate the chi-square contribution for any observed and expected value using χ² = (O − E)² / E. Includes worked examples, critical-value table, and guidance on combining terms into a full test statistic.

This calculator returns the chi-square contribution for a single observed and expected value using Karl Pearson's original formula: χ² = (observed − expected)² / expected. Enter two numbers, get the one piece of the chi-square statistic that corresponds to that category.

Most chi-square tools on the internet ask you to dump in a whole grid of numbers and hand back a single summary. That's fine when you already know what you're doing. But when you're learning the test, debugging a homework answer, or trying to figure out which category is pulling the overall result — what you actually want is to see the formula work on one row at a time. That's what this calculator does.

Below you'll find the concept in plain English, worked examples from real statistics problems, a critical-values table for interpreting the summed total, and answers to the questions students and researchers ask most often.

What Chi-Square Actually Measures

A chi-square term answers one question: how surprising is this observation, given what we expected?

Three pieces do the work:

  • Observed (O) — the count you actually recorded
  • Expected (E) — the count predicted by your null hypothesis, theory, or reference distribution
  • (O − E)² / E — squares the gap (so overestimates and underestimates both count), then scales it by what was expected (so being off by 10 matters more when you expected 5 than when you expected 500)

The bigger the result, the further that single observation sits from what the model predicted, in standardized terms. Close to zero means the category matched expectation. A large value means that category is doing heavy lifting in your overall test statistic.
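If it helps to see the formula as code, here is a minimal Python sketch of the single-term calculation. The function name `chi_square_term` is just an illustrative choice, not something this calculator exposes.

```python
def chi_square_term(observed: float, expected: float) -> float:
    """Return the chi-square contribution (O - E)^2 / E for one category."""
    if expected <= 0:
        # The formula is undefined for E = 0 and meaningless for E < 0.
        raise ValueError("expected must be positive")
    return (observed - expected) ** 2 / expected


# The same gap of 10 gives very different contributions:
print(chi_square_term(15, 5))     # expected 5   -> 20.0
print(chi_square_term(510, 500))  # expected 500 -> 0.2
```

Being off by 10 produces a contribution 100 times larger when you expected 5 than when you expected 500, which is the scaling described above.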

This single-term view is more than a teaching device. When a full chi-square test comes back significant, researchers routinely look at the per-category contributions to find where the deviation lives. That's the second job this calculator is good for.

How to Use This Calculator

  1. Enter your observed value. The count you actually recorded — rolls that landed on six, survey respondents who picked option A, plants with purple flowers.
  2. Enter your expected value. The count predicted by your null hypothesis or model. For 60 rolls of a fair die, you'd expect 10 per face.
  3. Read the χ² value. The calculator returns (O − E)² / E — the chi-square contribution for that single category.

For a full test across multiple categories, run the calculator once per category and add the results. The total is your chi-square test statistic, which you compare against the critical value for your degrees of freedom (see the table further down).

Worked Examples

Example 1: Is the Die Fair?

You roll a die 60 times and land 15 sixes. Under a fair-die hypothesis, you'd expect 10.

  • Observed = 15, Expected = 10
  • (15 − 10)² / 10 = 25 / 10 = 2.5

That's the chi-square contribution for "sixes." Repeat for the other five faces and sum the six values. If the total exceeds 11.07 (the critical value at df = 5, α = 0.05), you reject the fair-die hypothesis at the 5% level.

Example 2: Survey Shift

You expected 50 respondents to pick "very satisfied" based on last year's benchmark. This year, 32 did.

  • Observed = 32, Expected = 50
  • (32 − 50)² / 50 = 324 / 50 = 6.48

A contribution of 6.48 is substantial. On its own it's not a test result — you'd still need the other satisfaction categories and a degrees-of-freedom lookup — but it tells you immediately that "very satisfied" is where the pattern shifted hardest.

Example 3: Mendelian Genetics

In a monohybrid cross, you expect 75 purple-flowered plants out of 100. You count 78.

  • Observed = 78, Expected = 75
  • (78 − 75)² / 75 = 9 / 75 = 0.12

Tiny contribution: the observation sits right on top of the Mendelian prediction. When the total across all categories is this small, your data are consistent with the theory and give you no grounds to reject it.
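As a rough sketch, the expected counts in a problem like this can be derived from the predicted ratio rather than typed in by hand. The code below assumes a 3:1 purple-to-white ratio and that the remaining 22 of the 100 plants are white; the helper name `expected_from_ratio` is hypothetical.

```python
def expected_from_ratio(total, ratio):
    """Split a total count into expected counts according to a ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


observed = [78, 22]                          # purple, white (assumed: 100 - 78)
expected = expected_from_ratio(100, [3, 1])  # 3:1 ratio -> [75.0, 25.0]

# One chi-square term per phenotype, then the sum.
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
total = sum(terms)

print([round(t, 2) for t in terms])  # [0.12, 0.36]
print(round(total, 2))               # 0.48
```

With two categories there is 1 degree of freedom, and 0.48 is far below the 3.84 critical value at α = 0.05.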

Example 4: Quality-Control Inspection

A factory expects 2% defective units in a batch of 500 — that's 10 expected defectives. The inspector finds 22.

  • Observed = 22, Expected = 10
  • (22 − 10)² / 10 = 144 / 10 = 14.4

A single-term value of 14.4 is a loud signal. Even before combining with the "non-defective" category, this tells you the batch isn't behaving like the historical baseline.
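To see how little the "non-defective" category adds, here is a small Python sketch of both terms. It assumes the other 478 units passed inspection (500 − 22) against an expected 490 (500 − 10).

```python
observed = {"defective": 22, "non-defective": 478}
expected = {"defective": 10, "non-defective": 490}

# Chi-square contribution for each category.
terms = {
    name: (observed[name] - expected[name]) ** 2 / expected[name]
    for name in observed
}
total = sum(terms.values())

for name, value in terms.items():
    print(f"{name}: {value:.2f} ({value / total:.0%} of total)")
print(f"total: {total:.2f}")
# defective: 14.40 (98% of total)
# non-defective: 0.29 (2% of total)
# total: 14.69
```

The gap is 12 units in both categories, but dividing by 490 instead of 10 shrinks the second term to almost nothing.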

Combining Terms Into a Full Chi-Square Statistic

A full chi-square test sums one term per category:

χ² = Σ [(Oᵢ − Eᵢ)² / Eᵢ]

The Σ just means "add them all up." For four categories you'd:

  1. Calculate (O − E)² / E for category 1
  2. Do the same for categories 2, 3, and 4
  3. Add the four results

That sum is your test statistic. Compare it to the critical value for your degrees of freedom (number of categories − 1 for goodness-of-fit; (rows − 1) × (columns − 1) for a contingency table).
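The steps above can be sketched in a few lines of Python. The function names and the four-category counts are made up for illustration; only the formulas come from the text.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def df_goodness_of_fit(n_categories):
    """Degrees of freedom for a goodness-of-fit test."""
    return n_categories - 1


def df_contingency(rows, columns):
    """Degrees of freedom for a contingency table."""
    return (rows - 1) * (columns - 1)


# Hypothetical counts for four categories, 25 expected in each.
observed = [18, 22, 30, 30]
expected = [25, 25, 25, 25]

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 2))                # 4.32
print(df_goodness_of_fit(len(observed)))  # 3
print(df_contingency(2, 3))               # 2
```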

Critical Values at α = 0.05

Degrees of Freedom    Critical Value
1                     3.84
2                     5.99
3                     7.81
4                     9.49
5                     11.07
6                     12.59
8                     15.51
10                    18.31
15                    25.00
20                    31.41

If your summed chi-square exceeds the critical value for your df, you reject the null hypothesis at the 5% level. For stricter tests use α = 0.01, which raises the bar (for df = 1, the 0.01 critical value is 6.63).
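If you want to automate the lookup, one possible sketch is a plain dictionary holding the α = 0.05 values from the table. The names `CRITICAL_05` and `reject_null` are illustrative, and the statistics passed in below are made-up numbers.

```python
# Critical values at alpha = 0.05, copied from the table above.
CRITICAL_05 = {
    1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07,
    6: 12.59, 8: 15.51, 10: 18.31, 15: 25.00, 20: 31.41,
}


def reject_null(statistic, df):
    """Return True if the summed chi-square exceeds the 5% critical value."""
    if df not in CRITICAL_05:
        raise KeyError(f"no tabulated critical value for df = {df}")
    return statistic > CRITICAL_05[df]


print(reject_null(12.0, 5))  # True:  12.0 > 11.07
print(reject_null(4.32, 3))  # False: 4.32 < 7.81
```

The sketch deliberately raises an error for degrees of freedom that are not in the table rather than guessing a value.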

Interpreting a Single Term

On its own, a per-category value isn't a significance test — that job belongs to the sum. But as a diagnostic it's genuinely useful:

  • Near zero (below ~1): observation sits close to expectation
  • Moderate (roughly 1–4): meaningful deviation, worth noting
  • Large (above ~4): this category is pulling hard on your overall statistic — investigate it

A practical rule: after a significant full test, scan the per-term values and focus attention on the largest two or three. Those are the categories where your observed data disagrees most with your model — and usually the ones worth writing about.
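That scan can be sketched as a short sort. The category labels and counts below are invented for illustration, and `top_contributors` is a hypothetical helper name.

```python
def top_contributors(labels, observed, expected, n=2):
    """Return the n categories with the largest chi-square terms."""
    terms = [
        (label, (o - e) ** 2 / e)
        for label, o, e in zip(labels, observed, expected)
    ]
    # Largest contribution first.
    return sorted(terms, key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical data: 100 observations, 25 expected per category.
labels = ["A", "B", "C", "D"]
observed = [42, 23, 26, 9]
expected = [25, 25, 25, 25]

for label, term in top_contributors(labels, observed, expected):
    print(f"{label}: {term:.2f}")
# A: 11.56
# D: 10.24
```

Here the four terms sum to 22.0, well above the 7.81 critical value for df = 3, and almost all of that total comes from categories A and D.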

When Per-Term Values Matter Most

There are three situations where calculating one chi-square term at a time isn't just convenient — it's the right tool:

  1. Learning the formula. Running a grid-based calculator hides the mechanics. Doing it term by term builds real intuition for why squaring matters, why division by E matters, and what makes a contribution "big."
  2. Post-hoc analysis. When a full chi-square test rejects the null, per-term values tell you which categories drove the rejection. Reporting "χ² = 24.3, p < .001, with the bulk of the deviation concentrated in categories A and D" is far more informative than reporting just the summary.
  3. Checking hand calculations. Textbook problems and exam answers are often graded term by term. Verifying each contribution individually catches arithmetic mistakes that a summary-only tool would bury.

Technical Notes

Formula: χ² = (observed − expected)² / expected

This is one component of the full test statistic χ² = Σ [(Oᵢ − Eᵢ)² / Eᵢ], summed across all categories in your data.

Assumptions:

  • Observations are independent
  • Data are counts (frequencies), not percentages or proportions
  • Expected counts are generally 5 or larger per category
  • Categories are mutually exclusive

When to use a different test: If expected counts are very small, use Fisher's exact test. If your data are continuous, use a t-test or ANOVA. If you're comparing two proportions directly, a z-test for proportions is often simpler.

Historical note: The chi-square test was introduced by Karl Pearson in 1900 and is one of the oldest tools in inferential statistics. Despite its age, it remains the default test for categorical count data in fields ranging from genetics to market research — because the logic is transparent and the math is easy to verify by hand, one term at a time.

Frequently Asked Questions

Does this calculator give me a p-value?

No — it returns a single chi-square term. A p-value requires the full test statistic (summed across all categories) and your degrees of freedom, then a chi-square distribution lookup.
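For readers who want to do that lookup in code, here is a sketch that uses only Python's standard library. It relies on the closed-form upper-tail expressions that exist for whole-number degrees of freedom; the function name `chi_square_p_value` is an assumption of this example, and a statistics library would normally be the more practical choice.

```python
import math


def chi_square_p_value(statistic: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution, integer df only."""
    if statistic < 0 or df < 1:
        raise ValueError("statistic must be >= 0 and df must be >= 1")
    half = statistic / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    term, total = 1.0, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= statistic / (2 * k - 1)
        total += term
    tail = math.sqrt(2 * statistic / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail


# The 5% critical values should give p-values very close to 0.05.
for df, critical in [(1, 3.84), (2, 5.99), (5, 11.07)]:
    print(df, round(chi_square_p_value(critical, df), 3))
```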

What are degrees of freedom?

For a goodness-of-fit test, it's the number of categories minus 1. For a contingency table (test of independence), it's (rows − 1) × (columns − 1). Degrees of freedom tell the chi-square distribution how spread out the statistic should be by chance alone.

Why divide by the expected value?

Dividing by E keeps contributions comparable across categories of different sizes. A gap of 10 means one thing when you expected 10 and something very different when you expected 1,000. Scaling by E reflects that.

Why is the difference squared?

Squaring makes every term zero or positive, so overestimates and underestimates both count as deviations. It also gives larger gaps disproportionately more weight: for the same expected value, one gap of 10 contributes as much as four gaps of 5.

Can chi-square be negative?

No. The numerator is squared and the denominator is a positive expected value, so every term is zero or positive.

What happens when the expected value is zero?

The formula breaks — you can't divide by zero. Chi-square tests require reasonably large expected counts: a common rule of thumb is that at least 80% of expected counts should be 5 or more, and none should be below 1. If yours are too small, combine adjacent categories or switch to Fisher's exact test.
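The rule of thumb above is easy to check before running a test. This is a sketch only: the function name and the default thresholds (5, 80%, and 1) simply restate the rule as parameters.

```python
def expected_counts_ok(expected, min_count=5, min_share=0.8, floor=1):
    """Check the rule of thumb: at least 80% of expected counts >= 5, none < 1."""
    if not expected:
        raise ValueError("expected must not be empty")
    if any(e < floor for e in expected):
        return False
    share = sum(1 for e in expected if e >= min_count) / len(expected)
    return share >= min_share


print(expected_counts_ok([10, 10, 10, 10, 10, 10]))  # True
print(expected_counts_ok([10, 10, 10, 10, 4]))       # True: 4 of 5 are >= 5
print(expected_counts_ok([10, 10, 4, 4]))            # False: only half are >= 5
print(expected_counts_ok([10, 10, 10, 10, 0.5]))     # False: one count is below 1
```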

Do observed and expected both have to be whole numbers?

Observed counts are whole numbers because they're actual tallies. Expected values usually aren't — 65 die rolls means 10.833... expected per face. That's fine; the formula handles non-integer expected values without any adjustment.

What's the difference between a goodness-of-fit test and a test of independence?

Goodness-of-fit compares one observed distribution to an expected distribution ("is this die fair?"). Test of independence compares observed counts in a two-way table to what you'd expect if two variables were unrelated ("does treatment type depend on age group?"). Both use the same per-cell formula. What differs is where the expected counts come from (a stated model for goodness-of-fit, the table's own row and column totals for independence) and how the degrees of freedom are calculated.
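For a test of independence, each expected count is conventionally computed as row total × column total / grand total. The sketch below applies that to a made-up 2 × 2 table; the counts and the helper name `expected_under_independence` are illustrative only.

```python
def expected_under_independence(table):
    """Expected counts for a two-way table if rows and columns are unrelated."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table of observed counts.
observed = [[30, 10],
            [20, 40]]
expected = expected_under_independence(observed)
print(expected)  # [[20.0, 20.0], [30.0, 30.0]]

# Same per-cell formula as in goodness-of-fit, summed over every cell.
statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(statistic, 2), df)  # 16.67 1
```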

How big does chi-square need to be for significance?

Depends on your degrees of freedom and significance level. With df = 1 at α = 0.05, the critical value is 3.84. With df = 4 it's 9.49; with df = 10 it's 18.31. See the critical-values table above for common values.

Why does my textbook use χ² for the statistic and the distribution?

Because the two are closely linked, though not identical. The test statistic is called chi-square because, under the null hypothesis and with large enough expected counts, its sampling distribution is approximately the chi-square distribution with the appropriate degrees of freedom. That's what makes the critical-value lookup work.

What if my observed count is much larger than expected — is there a "too significant"?

There isn't a ceiling, but extremely large single-term values (say, above 20–30 when others are tiny) often point to a data-entry error, a miscalculated expected value, or a category that shouldn't have been included. Worth double-checking before writing up results.