# SamplingShala: Methodology and Assumptions

This document is the technical companion to SamplingShala. It states, for every design
type, exactly which formula the deterministic engine uses, what each input means, the valid
range the interface allows, and the assumptions and limits that come with the method. It is
written so a statistician, reviewer, or funder can vet the tool, and so an evaluator can make
an informed decision about any plan it produces.

SamplingShala is a free planning and learning aid. It implements standard textbook sampling
methods faithfully. It is not a substitute for a qualified statistician or evaluator. Confirm
any design with a domain expert before using it for a real evaluation.

All arithmetic is done by a deterministic engine. No estimate is ever produced by a language
model. The verified functions are the single source of truth.

## How the five steps feed the output

1. **Question** picks the design type, which selects the formula family.
2. **Scale** sets the cascade (states, districts, blocks, schools, people per site). This
   defines the population frame N used in the finite population correction and the number of
   schools used in the binding-constraint calculation.
3. **Who** chooses stakeholder groups and a sampling mode for each. Statistical groups are
   sized independently; quota, census, purposive, and qualitative groups are not power-sized.
4. **Precision** sets confidence, power, the target effect or margin, clustering, cluster
   size, attrition, and any design-specific control.
5. **Strategy** reconciles every group into one field plan and shows the binding constraint,
   the per-group table, a copy-ready justification, structural flags, and the methods panel.

## Shared building blocks

These apply to every design that produces a statistical sample size (all except pure
qualitative inquiry and LQAS, which have their own logic).

### Critical values

- Confidence critical value, two-sided: `z = invNorm(1 - (1 - C)/2)`, where C is the
  confidence level. At 95%, z = 1.95996.
- Power critical value, one-sided: `z_b = invNorm(power)`. At 80% power, z_b = 0.84162.
- `invNorm` is Acklam's inverse normal CDF; `normCdf` uses an `erf` approximation. These
  match standard tables to better than four decimal places.

Allowed ranges: confidence 50 to 99.9%, power 50 to 99%.
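The two critical values can be reproduced with the short JavaScript sketch below.

- `invNorm` follows Acklam's published rational approximation.
- For `normCdf`, the text does not say which `erf` approximation the engine uses. JavaScript has no built-in `Math.erf`, so this sketch assumes Abramowitz and Stegun formula 7.1.26.
- The wrapper names `zConfidence` and `zPower` are hypothetical.

```javascript
// Critical-value sketch. ASSUMPTIONS: Acklam's coefficients for invNorm;
// Abramowitz & Stegun 7.1.26 for erf (the engine's erf may differ).

// Acklam's rational approximation to the inverse standard normal CDF.
function invNorm(p) {
  if (!(p > 0 && p < 1)) throw new RangeError("p must be in (0, 1)");
  const a = [-3.969683028665376e+01, 2.209460984245205e+02, -2.759285104469687e+02,
             1.383577518672690e+02, -3.066479806614716e+01, 2.506628277459239e+00];
  const b = [-5.447609879822406e+01, 1.615858368580409e+02, -1.556989798598866e+02,
             6.680131188771972e+01, -1.328068155288572e+01];
  const c = [-7.784894002430293e-03, -3.223964580411365e-01, -2.400758277161838e+00,
             -2.549732539343734e+00, 4.374664141464968e+00, 2.938163982698783e+00];
  const d = [7.784695709041462e-03, 3.224671290700398e-01, 2.445134137142996e+00,
             3.754408661907416e+00];
  const pLow = 0.02425;
  if (p < pLow) {
    // Lower tail
    const q = Math.sqrt(-2 * Math.log(p));
    return (((((c[0] * q + c[1]) * q + c[2]) * q + c[3]) * q + c[4]) * q + c[5]) /
           ((((d[0] * q + d[1]) * q + d[2]) * q + d[3]) * q + 1);
  }
  if (p > 1 - pLow) {
    // Upper tail, by symmetry with the lower tail
    const q = Math.sqrt(-2 * Math.log(1 - p));
    return -(((((c[0] * q + c[1]) * q + c[2]) * q + c[3]) * q + c[4]) * q + c[5]) /
            ((((d[0] * q + d[1]) * q + d[2]) * q + d[3]) * q + 1);
  }
  // Central region
  const q = p - 0.5;
  const r = q * q;
  return (((((a[0] * r + a[1]) * r + a[2]) * r + a[3]) * r + a[4]) * r + a[5]) * q /
         (((((b[0] * r + b[1]) * r + b[2]) * r + b[3]) * r + b[4]) * r + 1);
}

// erf via Abramowitz & Stegun 7.1.26 (absolute error below 1.5e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard normal CDF expressed through erf.
function normCdf(x) {
  return 0.5 * (1 + erf(x / Math.SQRT2));
}

// Two-sided confidence critical value and one-sided power critical value.
function zConfidence(C) { return invNorm(1 - (1 - C) / 2); }
function zPower(power) { return invNorm(power); }

console.log(zConfidence(0.95).toFixed(5), zPower(0.80).toFixed(5));
```

Both printed values should land within about 1e-5 of the 1.95996 and 0.84162 quoted above.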

### Design effect (clustering)

When people are sampled inside shared sites (for example students within schools), their
responses are correlated, so each additional person inside a site carries less independent
information. The variance inflates by the design effect:

```
DEFF = 1 + (m - 1) x ICC
```

where `m` is the number of people sampled per site and `ICC` is the intra-cluster
correlation. The size needed under simple random sampling is multiplied by DEFF to give the
total, which is then split into sites of m people each (`clusters = ceil(totalN / m)`).

Allowed ranges: ICC 0 to 0.30 (typical education values fall around 0.10 to 0.25); m 3 to 50.
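A minimal sketch of the clustering step. It assumes the inflated total is rounded up before it is split into sites. The function names and the small floating-point guard in `ceilSafe` are illustrative choices, not the engine's documented behaviour.

```javascript
// Clustering sketch; all names are hypothetical.

// Round up, but ignore floating-point noise such as 1160.0000000000002.
function ceilSafe(x) {
  return Math.ceil(x - 1e-9);
}

// DEFF = 1 + (m - 1) x ICC
function designEffect(m, icc) {
  return 1 + (m - 1) * icc;
}

// nSrs: size needed under simple random sampling; m: people per site.
function clusterPlan(nSrs, m, icc) {
  const deff = designEffect(m, icc);
  const totalN = ceilSafe(nSrs * deff);  // inflated total
  const clusters = ceilSafe(totalN / m); // sites of m people each
  return { deff, totalN, clusters };
}

const plan = clusterPlan(400, 20, 0.15);
console.log(plan.deff.toFixed(2), plan.totalN, plan.clusters); // 3.85 1540 77
```

With 20 students per school and an ICC of 0.15, each school contributes far less than 20 independent observations. The 400 that would suffice under simple random sampling therefore become 1540 people across 77 schools. The guard exists because products like these can land a hair above an integer in floating point. A plain `Math.ceil` would then add a phantom respondent.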

### Finite population correction

For a known frame of size N, the requirement shrinks:

```
n = n0 / (1 + (n0 - 1) / N)
```

A large frame changes the number very little; a small frame can reduce it sharply. When the
resulting sample exceeds half the population, the tool suggests a census may be simpler.
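A short sketch of the correction and the census hint. The names are hypothetical. The 50% threshold is the one stated above.

```javascript
// Finite population correction sketch; names are hypothetical.
function applyFpc(n0, N) {
  if (!Number.isFinite(N)) return n0; // no known frame: no correction
  return n0 / (1 + (n0 - 1) / N);
}

// Suggest a census when the corrected sample exceeds half the frame.
function suggestCensus(n, N) {
  return n > N / 2;
}

for (const N of [1000000, 1000, 300]) {
  const n = applyFpc(384.1, N);
  console.log(N, n.toFixed(1), suggestCensus(n, N));
}
```

A frame of a million barely moves a requirement of about 384. A frame of 1000 cuts it to roughly 278. A frame of 300 cuts it below 170, which is more than half the frame and so triggers the census hint.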

### Attrition buffer

The achieved-sample target is divided by the share expected to remain, so dropout does not
erode power:

```
nFinal = ceil(n0 x DEFF / (1 - attrition))
```

Allowed range: attrition 0 to 30%. For longitudinal panels it compounds across waves (see
below).
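The buffer is a one-line calculation. In this sketch `withAttrition` is a hypothetical name, and `attrition` is a proportion (10% is 0.10).

```javascript
// Attrition buffer sketch: divide by the share expected to remain.
function withAttrition(n0, deff, attrition) {
  if (attrition < 0 || attrition >= 1) throw new RangeError("attrition must be in [0, 1)");
  return Math.ceil((n0 * deff) / (1 - attrition));
}

console.log(withAttrition(384.1, 1.0, 0.10)); // 427
```

Dividing, rather than multiplying by 1 + attrition, is what keeps the achieved sample on target. With 10% loss, 427 x 0.9 is about 384. By contrast, 384.1 x 1.1 rounds up to 423, which would leave only about 381.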

## Design types

### Descriptive survey and the quantitative strand of Mixed methods

Estimates a single proportion to within a stated margin.

```
n0 = z^2 x p(1 - p) / e^2
```

- `p` is the expected proportion of the outcome. The default is 50%, which maximises
  `p(1 - p)` and so gives the most conservative (largest) sample. Range 2 to 90%.
- `e` is the margin of error, in percentage points. Range 0.5 to 10 pp.

Then DEFF, finite population correction, and the attrition buffer are applied.
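The full descriptive pipeline can be sketched as follows. Two assumptions apply:

- The engine applies DEFF, then the finite population correction, then attrition, in the order the sentence above lists them.
- `z` is passed in rather than computed.

The function and parameter names are hypothetical.

```javascript
// Descriptive-survey sketch. p and e are proportions: 50% -> 0.5, 5 pp -> 0.05.
function descriptiveSize({ z, p, e, m = 1, icc = 0, N = Infinity, attrition = 0 }) {
  const n0 = (z * z * p * (1 - p)) / (e * e); // single-proportion requirement
  const deff = 1 + (m - 1) * icc;             // clustering
  let n = n0 * deff;
  if (Number.isFinite(N)) n = n / (1 + (n - 1) / N); // finite population correction
  const nFinal = Math.ceil(n / (1 - attrition));     // attrition buffer
  return { n0, deff, nFinal };
}

const r = descriptiveSize({ z: 1.959964, p: 0.5, e: 0.05 });
console.log(r.n0.toFixed(1), r.nFinal); // 384.1 385
```

With no clustering, frame, or attrition, this returns the familiar n0 of about 384 (385 after rounding up). Adding 20 people per site at an ICC of 0.15 and 10% attrition raises the requirement to 1644.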

**Rare-outcome guard.** The normal approximation behind this formula is unreliable when the
margin is large relative to a small rate. For example, a 5 pp margin on a 4% outcome gives an
interval that would reach below zero. The tool warns in two tiers:

- When the margin meets or exceeds the rate, the estimate cannot be interpreted and the margin
  must be tightened.
- For any rare outcome below 10%, it advises keeping the margin under about half the rate.

This lets teams plan for genuinely rare but important outcomes (dropout, child marriage,
severe malnutrition) without being handed a misleadingly small number.
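The two tiers reduce to two comparisons. The sketch below assumes proportions as inputs. The return labels are hypothetical; the thresholds come from the text above.

```javascript
// Two-tier rare-outcome guard sketch. p: expected rate, e: margin (both proportions).
function rareOutcomeCheck(p, e) {
  if (e >= p) {
    return "blocking"; // interval p - e reaches zero or below: tighten the margin
  }
  if (p < 0.10 && e >= p / 2) {
    return "advisory"; // rare outcome: keep the margin under about half the rate
  }
  return "ok";
}

console.log(rareOutcomeCheck(0.04, 0.05), rareOutcomeCheck(0.04, 0.03), rareOutcomeCheck(0.04, 0.015));
```

At a 4% rate, the unreadable 5 pp margin would need an n0 of only about 59. A 1.5 pp margin passes both tiers and needs about 656 before clustering and attrition. That gap is the misleadingly small number the guard exists to prevent.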

### Effect designs: Baseline-endline, Longitudinal cohort, RCT, Quasi-experimental

Sizes a study to detect a minimum effect with the chosen power.

```
n0 (per arm) = 2 x (z + z_b)^2 / MDES^2
```

- `MDES` is the minimum detectable effect in standard-deviation units (Cohen's d). Range
  0.10 to 0.60 SD.
- Randomised (RCT) and quasi-experimental designs use two arms; baseline-endline and
  longitudinal designs use one.

Worked check: at 95% confidence and 80% power, a 0.30 SD effect needs about 175 per arm; a
0.20 SD effect about 393; a 0.50 SD effect about 63. The engine returns 174.4, 392.4, and
62.8 respectively before rounding.
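The worked check can be reproduced directly from the formula. The sketch hard-codes the 95% and 80% critical values as defaults; `nPerArm` is a hypothetical name.

```javascript
// Per-arm size to detect a standardized effect (Cohen's d) at the given power.
function nPerArm(mdes, z = 1.959964, zb = 0.841621) {
  return (2 * (z + zb) ** 2) / (mdes * mdes);
}

for (const d of [0.30, 0.20, 0.50]) {
  const n = nPerArm(d);
  console.log(d, n.toFixed(1), Math.ceil(n));
}
// 0.3 174.4 175
// 0.2 392.4 393
// 0.5 62.8 63
```

Because the effect enters squared, halving the MDES roughly quadruples the sample. Going from 0.30 SD to 0.20 SD more than doubles it.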

Design-specific adjustments:

- **Baseline-endline panel.** If the same individuals are followed (panel design), the change
  is measured more precisely and the size is multiplied by `(1 - r)`, where r is the
  between-wave correlation (default 0.5). A fresh cross-section each round uses the full size.
- **Longitudinal cohort.** 2 to 5 waves. Attrition compounds:
  `effective loss = 1 - (1 - a)^(waves - 1)`. Panel correlation also applies.
- **Quasi-experimental.** A matched comparison group is less efficient than randomisation,
  so the size is multiplied by an inflation factor (1.00 to 1.50). This is a planning proxy;
  the exact value should be confirmed against the R-squared of the matching or the achieved
  covariate balance.
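The adjustments above can be combined in one function. The design labels, option names and the order of operations in this sketch are hypothetical, not taken from the engine. The `(1 - r)` factor reflects the fact that the variance of a within-person change is 2σ²(1 - r).

```javascript
// Effect-design sketch; labels, options and ordering are hypothetical.
function effectDesignSize({
  mdes,
  design,            // "baseline-endline" | "cohort" | "rct" | "quasi"
  panel = false,     // baseline-endline only: same individuals at both rounds?
  r = 0.5,           // between-wave correlation
  waves = 2,         // cohort only: 2 to 5
  attrition = 0,     // proportion lost per wave
  qeInflation = 1.0, // quasi-experimental only: 1.00 to 1.50
  deff = 1,
  z = 1.959964,      // 95% two-sided
  zb = 0.841621,     // 80% power
}) {
  let n = (2 * (z + zb) ** 2) / (mdes * mdes); // per arm, before adjustments
  const twoArm = design === "rct" || design === "quasi";
  // Change scores on the same people have variance 2 sigma^2 (1 - r).
  if (design === "cohort" || (design === "baseline-endline" && panel)) n *= 1 - r;
  if (design === "quasi") n *= qeInflation;
  n *= deff;
  // Loss compounds over each wave after the first; with 2 waves it equals attrition.
  const nWaves = design === "cohort" ? waves : 2;
  const effectiveLoss = 1 - (1 - attrition) ** (nWaves - 1);
  const perArm = Math.ceil(n / (1 - effectiveLoss));
  return { perArm, total: twoArm ? 2 * perArm : perArm, effectiveLoss };
}

console.log(effectDesignSize({ mdes: 0.30, design: "cohort", waves: 3, attrition: 0.1 }));
```

For a three-wave cohort at 0.30 SD with 10% loss per wave, the panel correlation halves the base size. The compounded 19% loss then raises it to 108. A matched comparison with a 1.2 inflation factor and 10% attrition needs 233 per arm, or 466 in total.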

### Rapid quality check (LQAS)

Lot Quality Assurance Sampling classifies many areas as pass or fail, quickly and cheaply,
rather than estimating any single one precisely.

For each lot, sample `n` respondents and pass the lot if at least `d` are positive. The
decision rule `d` is chosen to minimise the larger of two binomial error probabilities:

- `alpha`: wrongly failing a lot whose true coverage is at the benchmark `pU`.
- `beta`: wrongly passing a lot whose true coverage is at or below a lower threshold
  `pL = pU - 0.25` (floored at 0.05).

```
d* = argmin_d  max( BinomCDF(d-1; n, pU),  1 - BinomCDF(d-1; n, pL) )
```

Allowed ranges: benchmark 50 to 95%, respondents per lot 12 to 30, number of lots 1 to 600.
Worked check: n = 19 at an 80% benchmark yields a pass rule of d = 14, matching standard
WHO-style LQAS tables.
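The decision-rule search is small enough to do with exact binomial sums. In this sketch a lot passes when at least `d` of `n` respondents are positive. Two details are hypothetical:

- the function names;
- the tie-break, which keeps the smaller `d` when two rules have the same worst-case error.

```javascript
// LQAS decision-rule sketch using exact binomial probabilities (n is 12 to 30).
function binomPmf(k, n, p) {
  let coef = 1;
  for (let i = 1; i <= k; i++) coef = (coef * (n - k + i)) / i; // builds C(n, k)
  return coef * p ** k * (1 - p) ** (n - k);
}

function binomCdf(k, n, p) {
  let s = 0;
  for (let i = 0; i <= k; i++) s += binomPmf(i, n, p);
  return Math.min(s, 1);
}

function lqasRule(n, pU, gap = 0.25) {
  const pL = Math.max(pU - gap, 0.05); // lower threshold, floored at 0.05
  let best = null;
  for (let d = 1; d <= n; d++) {
    const alpha = binomCdf(d - 1, n, pU);    // fail a lot that is at the benchmark
    const beta = 1 - binomCdf(d - 1, n, pL); // pass a lot at the lower threshold
    const worst = Math.max(alpha, beta);
    if (best === null || worst < best.worst) best = { d, alpha, beta, worst };
  }
  return best;
}

const rule = lqasRule(19, 0.80);
console.log(rule.d, rule.alpha.toFixed(3), rule.beta.toFixed(3));
```

For n = 19 at an 80% benchmark the search returns d = 14. The two risks are roughly 0.16 (alpha) and 0.08 (beta). The next rule down, d = 13, lowers alpha but pushes beta to about 0.17, which is why the minimax choice lands on 14.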

LQAS gives a classification, not a precise per-area estimate. For a single programme-level
coverage estimate, pool the samples from all lots, weighting each lot by its share of the
population.
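A common way to pool is to weight each lot's observed proportion by its share of the total population. The sketch below assumes that approach. The field names and the three example lots are made up purely for illustration.

```javascript
// Weighted programme-level coverage from LQAS lots (illustrative field names).
function pooledCoverage(lots) {
  const totalPop = lots.reduce((s, lot) => s + lot.population, 0);
  return lots.reduce((s, lot) => s + (lot.population / totalPop) * (lot.positives / lot.n), 0);
}

// Made-up lots: positives out of n respondents, plus each lot's population.
const lots = [
  { positives: 15, n: 19, population: 20000 },
  { positives: 11, n: 19, population: 5000 },
  { positives: 17, n: 19, population: 25000 },
];
console.log(pooledCoverage(lots).toFixed(3));
```

When all lots have the same population, this reduces to the simple mean of the lot proportions.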

### Qualitative inquiry

Qualitative work is purposive and is not sized by a power calculation. The plan applies
maximum-variation sampling across stakeholder strata and suggests, per stratum, about 8
in-depth interviews and 3 focus group discussions, with 2 to 3 key informant interviews,
across up to 4 deliberately contrasting districts. Saturation is assessed after each
stratum's first interviews, and data collection in that stratum stops when two consecutive
interviews yield no new themes. These numbers are starting points to be adjusted as themes
emerge, not fixed quotas.
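The stopping rule can be made concrete as a small check run on one stratum's coded interviews. The text does not say how many "first interviews" come before saturation is assessed. `minInterviews = 4` is therefore a hypothetical placeholder, as is the function name.

```javascript
// Saturation sketch: stop once `run` consecutive interviews add no new theme,
// but never before `minInterviews` (a hypothetical stand-in for "the first interviews").
function saturationPoint(interviews, minInterviews = 4, run = 2) {
  const seen = new Set();
  let quiet = 0; // consecutive interviews with no new theme
  for (let i = 0; i < interviews.length; i++) {
    let isNew = false;
    for (const theme of interviews[i]) {
      if (!seen.has(theme)) {
        seen.add(theme);
        isNew = true;
      }
    }
    quiet = isNew ? 0 : quiet + 1;
    if (i + 1 >= minInterviews && quiet >= run) return i + 1; // interviews completed
  }
  return null; // not yet saturated: keep collecting
}

console.log(saturationPoint([["a", "b"], ["b", "c"], ["d"], ["a"], ["c"], ["e"]]));
```

Here the fourth and fifth interviews repeat known themes, so collection in this stratum would stop after five. This holds even though a sixth interview would have surfaced a new theme, which is the known limit of any fixed saturation rule.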

## Standing assumptions and limits

The numbers are defensible planning figures, not exact final-analysis values. Across the
quantitative designs the engine assumes:

- a simple-random or stratified sampling base, with probability proportional to size at the
  school stage;
- the normal approximation to the relevant sampling distribution;
- equal allocation across arms in two-arm designs;
- a two-sided alpha paired with one-sided power, the standard convention;
- outcomes treated as continuous (in SD units) for the effect designs;
- a fixed 25-point grey zone for LQAS, and a single inflation proxy for quasi-experimental
  matching.

These are conventional modelling choices, not errors. Where your situation departs from them
(a binary primary outcome, unequal arms, a known design effect from prior rounds, a different
LQAS grey zone), treat the tool's output as a starting point and adjust with a statistician.

## Verification

Every release is checked before shipping:

1. The script parses cleanly through `new Function(...)` with Node.
2. The file contains zero en or em dashes.
3. Exactly one opening and one closing `<script>` tag.
4. The known statistics above are spot-checked:
   - 175 per arm at 0.30 SD, 95% confidence and 80% power;
   - an n0 of about 384 (384.1 before rounding up) at 95% confidence, p = 50% and a 5 pp
     margin;
   - a two-sided p-value of about 0.011 for a 0.40 SD difference at n = 80 per group;
   - LQAS n = 19 at 80% giving a pass rule of 14.
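Checks 1 to 3 and the p-value spot check can be sketched in a few dozen lines of Node-compatible JavaScript.

- `checkRelease` is hypothetical, and its regexes assume the app is one HTML file with a single inline script.
- The p-value uses the normal approximation listed under the standing assumptions. A t-based value would be slightly larger.
- `erf` again assumes Abramowitz and Stegun 7.1.26.

```javascript
// Release-check sketch; the single-file, single-<script> layout is an assumption.
function checkRelease(html) {
  const problems = [];
  const opens = (html.match(/<script\b/gi) || []).length;
  const closes = (html.match(/<\/script>/gi) || []).length;
  if (opens !== 1 || closes !== 1) problems.push("expected exactly one <script> pair");
  // U+2013 is the en dash, U+2014 the em dash.
  if (/[\u2013\u2014]/.test(html)) problems.push("contains an en or em dash");
  const body = html.match(/<script\b[^>]*>([\s\S]*?)<\/script>/i);
  if (body) {
    try {
      new Function(body[1]); // compiles the code without running it
    } catch (err) {
      problems.push("script does not parse: " + err.message);
    }
  }
  return problems;
}

// erf via Abramowitz & Stegun 7.1.26 (assumed; absolute error below 1.5e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

function normCdf(x) {
  return 0.5 * (1 + erf(x / Math.SQRT2));
}

// Two-sided p-value for a standardized difference d between two groups of n each,
// under the normal approximation: z = d / sqrt(2 / n).
function twoSidedP(d, n) {
  const z = Math.abs(d) / Math.sqrt(2 / n);
  return 2 * (1 - normCdf(z));
}

console.log(twoSidedP(0.40, 80).toFixed(3)); // 0.011
```

Running `checkRelease` on the file's text should return an empty list for a shippable build. Any string it returns names the failed check.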