the standing series · methodology mci-0.1

The Machine Character Index

How do AI models behave when something is at stake? The MCI composes six pre-registered, executed measurements — every decision written to a sandbox ledger, every score computed from committed raw data, falsified predictions published alongside confirmed ones. The same frozen battery runs on each new frontier model, forever: a Keeling series for machine character.

scope — read before the numbers

Measures framing-conditional behavior of RLHF-trained models in executed sandbox vignettes under economic stakes. An early-warning instrument, not a guarantee; each sub-score holds only for the model population and setting tested. δ* is design-relative: its value of 0.333 follows from the battery's payoffs.

model	coverage	Integrity under pressure	Institutional responsiveness (Λ)	Patience (δ̂)	Coupling (preserves partners)	Cliff integrity (keeps own covenant)	Commitment rationality	MCI
claude-haiku-4-5	6/6	0.93	0.07	0.92	1.00	1.00	0.99	0.82
claude-3-haiku	2/6	—	0.50	0.83	—	—	—	0.67
claude-sonnet-4-6	6/6	0.23	0.75	0.92	1.00	0.00	0.95	0.64
deepseek-chat	6/6	0.03	0.53	0.83	0.90	0.00	0.95	0.54
gpt-3.5-turbo	2/6	—	0.20	0.08	—	—	—	0.14
gpt-4o-mini	1/6	—	0.13	—	—	—	—	0.13

the frozen battery

· E1 (dampener)
· λ-ladder
· E6 (patience)
· E7 (coupling)
· E9 (reflexivity)
· E14 (demand)

caveats — shown, always

· pilot n per cell (12-40)
· vignette ≠ lived substrate
· patience/capability confounded with safety-tuning across vendors
· no multiple-comparison correction at pilot scale
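
To give a feel for the first caveat, here is a minimal sketch of how wide a 95% interval is at the pilot cell sizes. It assumes, purely for illustration, that a sub-score is a simple proportion of episodes in a cell; the MCI's real estimators may differ, and `wilson_interval` is a hypothetical helper, not part of the MCI composer.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Illustrative only: an observed rate of 0.5 at the smallest and largest pilot cell sizes.
lo12, hi12 = wilson_interval(6, 12)
lo40, hi40 = wilson_interval(20, 40)
print(f"n=12: [{lo12:.2f}, {hi12:.2f}]")
print(f"n=40: [{lo40:.2f}, {hi40:.2f}]")
```

Under that assumption the interval is roughly ±0.25 around 0.5 at n=12 and roughly ±0.15 at n=40, which is why pilot-scale sub-scores are best read as coarse signals.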

Composite = unweighted mean of available sub-scores (coverage shown per model; nothing imputed). Sub-scores 0–1, higher is better. Methodology and composer are versioned in the public repository; raw episode data is committed beside every run. Falsified predictions are part of the record — five of eleven runs falsified their pre-registered headline and are published at equal prominence.
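
The composite rule can be sketched in a few lines. This is a minimal illustration of "unweighted mean of available sub-scores, nothing imputed", not the versioned composer itself; the key names and the `compose` function are hypothetical, and missing sub-scores are represented as `None`.

```python
from typing import Optional

# Hypothetical keys standing in for the six sub-score columns of the table.
SUB_SCORE_KEYS = (
    "integrity",
    "responsiveness",
    "patience",
    "coupling",
    "cliff_integrity",
    "commitment_rationality",
)

def compose(sub_scores: dict[str, Optional[float]]) -> tuple[Optional[float], str]:
    """Return (unweighted mean of available sub-scores, coverage string)."""
    # Missing sub-scores are skipped, never imputed.
    available = [
        sub_scores[k] for k in SUB_SCORE_KEYS if sub_scores.get(k) is not None
    ]
    coverage = f"{len(available)}/{len(SUB_SCORE_KEYS)}"
    if not available:
        return None, coverage
    return sum(available) / len(available), coverage

# Values copied from the table rows for claude-haiku-4-5 and gpt-3.5-turbo.
haiku = {
    "integrity": 0.93, "responsiveness": 0.07, "patience": 0.92,
    "coupling": 1.00, "cliff_integrity": 1.00, "commitment_rationality": 0.99,
}
turbo = {"responsiveness": 0.20, "patience": 0.08}

for name, row in (("claude-haiku-4-5", haiku), ("gpt-3.5-turbo", turbo)):
    mci, coverage = compose(row)
    print(f"{name}: {mci:.2f} ({coverage})")
# claude-haiku-4-5: 0.82 (6/6)
# gpt-3.5-turbo: 0.14 (2/6)
```

Because the mean runs over available sub-scores only, composites at different coverage are not directly comparable: a 2/6 score averages two measurements, a 6/6 score averages six.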
