CRUSH Failure-Domain & Placement Helper

Host or rack? Enter your topology and protection scheme to check whether CRUSH can actually place your pool's data safely, plus the right min_size and the CLI to create the rule.


CRUSH Rules

Replicated needs: ≥ size distinct domains

EC needs: ≥ k+m distinct domains

Recommend: +1 spare domain for recovery

min_size (rep): size − 1

min_size (EC): k + 1

Below min_size: I/O halts (by design)

CRUSH mode: firstn (rep), indep (EC)

Documentation

CRUSH Maps ↗
Failure domains, buckets, rule types
Managing Pools ↗
Setting crush_rule, size, min_size
EC Profile Planner →
Pick k+m before checking domain count


The CRUSH Failure-Domain Hierarchy

Ceph's CRUSH map is a tree: osd → host → chassis → rack → row → room → datacenter (root at the top). When you set a pool's failure domain to "host," CRUSH guarantees no two copies (or EC chunks) of the same PG land on the same host, but it says nothing about whether they land in the same rack. Choosing a higher level in the hierarchy protects against a bigger blast radius (a whole rack losing power). The cost is that you need just as many distinct domains as before, but now at that higher level: size=3 at the rack level needs three racks, not three hosts.

Most single-rack or small clusters use host as the failure domain, since that's the most granular level above the OSD itself and matches the most common real failure mode (a server dying). Multi-rack deployments with redundant power/networking per rack can justify rack as the failure domain — but only if there are enough racks to satisfy the protection scheme.

Minimum domains by protection scheme

Scheme	Minimum Domains	Recommended	min_size
Replicated size=2	2	3	1
Replicated size=3	3	4	2
Replicated size=4	4	5	3
EC 4+2	6	7	5
EC 8+3	11	12	9
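The rows above follow directly from the rules of thumb on this page (size or k+m distinct domains, plus one spare, with min_size of size − 1 or k + 1). A minimal sketch that reproduces them, where the function name `domain_requirements` is hypothetical and chosen only for illustration:

```python
def domain_requirements(k=None, m=None, size=None):
    """Return (minimum_domains, recommended_domains, min_size).

    Pass size= for a replicated pool, or k= and m= for an erasure-coded pool.
    The formulas are the rules of thumb from this page, not values read from Ceph.
    """
    if size is not None:
        minimum = size        # one distinct domain per replica
        min_size = size - 1
    elif k is not None and m is not None:
        minimum = k + m       # one distinct domain per chunk
        min_size = k + 1
    else:
        raise ValueError("pass size=, or both k= and m=")
    return minimum, minimum + 1, min_size  # recommended = minimum + 1 spare


for label, kwargs in [
    ("Replicated size=2", {"size": 2}),
    ("Replicated size=3", {"size": 3}),
    ("Replicated size=4", {"size": 4}),
    ("EC 4+2", {"k": 4, "m": 2}),
    ("EC 8+3", {"k": 8, "m": 3}),
]:
    print(label, domain_requirements(**kwargs))
```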

"Minimum" is the bare floor CRUSH needs to place data at all. "Recommended" adds one spare domain so the cluster can recover after losing a single domain without going degraded indefinitely — see the Usable Capacity calculator for how this same +1 reserve logic affects usable space.

min_size — Why I/O Halts Instead of Risking Data

min_size is the minimum number of copies (replicated) or chunks (EC) that must be available for a PG to serve I/O at all. The values used on this page are size − 1 for replicated pools and k + 1 for erasure-coded pools. If the available copies/chunks drop below min_size (say, two simultaneous host failures on a size=3 pool with min_size=2, which leaves one copy), Ceph stops serving I/O on the affected PGs entirely rather than accept writes that can't be reliably protected. This looks alarming, because the cluster appears to "freeze" for affected pools, but it is a safer failure mode than silently continuing without redundancy.
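The rule can be stated as a simple comparison. The sketch below is a toy model, not Ceph's actual peering logic; the function name `pg_io_status` and the three status labels are made up for illustration.

```python
def pg_io_status(total, min_size, unavailable):
    """Classify a PG by how many of its copies/chunks are still available.

    total       -- size for a replicated pool, k+m for an EC pool
    min_size    -- the pool's min_size
    unavailable -- copies/chunks currently lost to failed domains
    """
    available = total - unavailable
    if available < min_size:
        return "I/O halted"
    if available < total:
        return "degraded, serving I/O"
    return "healthy"


# size=3, min_size=2: one host down is survivable, two hosts down halts I/O.
print(pg_io_status(3, 2, 1))
print(pg_io_status(3, 2, 2))
# EC 4+2 with min_size=5: two chunks down leaves k=4 chunks, below min_size.
print(pg_io_status(6, 5, 2))
```

Note the EC case: with two chunks missing, the remaining k chunks are still enough to reconstruct the data, but a min_size of k + 1 pauses I/O anyway because any further loss would be unrecoverable.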

Frequently Asked Questions

Why not just always use rack as the failure domain for safety?

Because it requires more physical domains to satisfy the same protection scheme. A single-rack cluster literally cannot use rack as a meaningful failure domain — there's only one rack, so CRUSH has nowhere else to place additional copies/chunks. Match the failure domain level to how many of that unit you actually have, not aspirationally to the safest-sounding option.
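One way to make "match the level to what you actually have" concrete is to count the units at each level and pick the highest level that still has enough of them. This is a minimal sketch under that assumption; `HIERARCHY`, `check_failure_domain`, and `widest_usable_domain` are hypothetical names, and the counts are invented sample data.

```python
# Levels from narrowest to widest, matching the options discussed on this page.
HIERARCHY = ["osd", "host", "chassis", "rack", "row", "datacenter"]


def check_failure_domain(counts, domain, needed):
    """Return 'fail', 'minimum', or 'ok' for one failure-domain level."""
    have = counts.get(domain, 0)
    if have < needed:
        return "fail"      # CRUSH cannot place every copy/chunk separately
    if have == needed:
        return "minimum"   # placement works, but no spare domain for recovery
    return "ok"


def widest_usable_domain(counts, needed):
    """Return the highest level with at least `needed` units, or None."""
    usable = [level for level in HIERARCHY if counts.get(level, 0) >= needed]
    return usable[-1] if usable else None


# A single-rack cluster: 4 hosts with 12 OSDs each, protected by size=3.
counts = {"osd": 48, "host": 4, "rack": 1}
print(check_failure_domain(counts, "rack", needed=3))
print(check_failure_domain(counts, "host", needed=3))
print(widest_usable_domain(counts, needed=3))
```

For this sample cluster the rack level fails and the host level passes with one spare, so host is the widest level that can honor size=3.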

What's the difference between the firstn and indep CRUSH modes?

firstn is used for replicated pools. If a chosen OSD becomes unavailable, CRUSH moves on to the next candidate in a deterministic sequence, so the replicas after the failed one can shift position. That is harmless because all replicas are interchangeable. indep is used for erasure-coded pools, where each chunk position (1st data chunk, 2nd parity chunk, and so on) is meaningful. indep replaces a failed position independently and leaves the other positions where they are. firstn would instead shift the later positions, changing which OSD is expected to hold which chunk even for chunks that were never lost.
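The difference is easiest to see on a toy example. The code below is not CRUSH: real CRUSH picks candidates by hashing and retries per position. It only imitates the two replacement behaviours over a fixed, made-up candidate list.

```python
def firstn(candidates, n, down=frozenset()):
    """Walk one candidate sequence, skip down OSDs, keep the first n survivors."""
    return [osd for osd in candidates if osd not in down][:n]


def indep(candidates, n, down=frozenset()):
    """Keep each position's original pick; refill only positions whose OSD is down."""
    result = list(candidates[:n])  # position i initially gets candidates[i]
    spares = [osd for osd in candidates[n:] if osd not in down]
    for pos, osd in enumerate(result):
        if osd in down:
            # Replace this position only; None marks a hole if no spare is left.
            result[pos] = spares.pop(0) if spares else None
    return result


candidates = ["osd.1", "osd.2", "osd.3", "osd.4", "osd.5", "osd.6"]
down = {"osd.2"}

print(firstn(candidates, 4, down))  # ['osd.1', 'osd.3', 'osd.4', 'osd.5']
print(indep(candidates, 4, down))   # ['osd.1', 'osd.5', 'osd.3', 'osd.4']
```

With firstn, positions 2 through 4 all change after osd.2 fails. With indep, only position 2 changes, so the chunks at positions 3 and 4 stay on the OSDs that already hold them.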

How do I verify my CRUSH map actually has the domains I think it does?

Run ceph osd crush tree --show-shadow to see the full hierarchy including the per-device-class shadow trees, or ceph osd tree for a simpler host/OSD view. If you're mixing device classes (hdd/ssd/nvme) on the same hosts, make sure your crush rule specifies the device class — otherwise CRUSH may place PGs on the wrong tier.
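If you would rather count domains than read the tree by eye, the JSON form of the tree can be tallied by bucket type. The sketch below assumes the output of ceph osd tree -f json is an object with a "nodes" list whose entries carry a "type" field, and a "device_class" field on OSDs. Treat that shape as an assumption and check it against your own cluster. The sample data is invented.

```python
from collections import Counter

# Invented sample in the assumed shape of `ceph osd tree -f json` output.
sample_tree = {
    "nodes": [
        {"id": -1, "name": "default", "type": "root", "children": [-2]},
        {"id": -2, "name": "rack-1", "type": "rack", "children": [-3, -4]},
        {"id": -3, "name": "host-a", "type": "host", "children": [0, 1]},
        {"id": -4, "name": "host-b", "type": "host", "children": [2, 3]},
        {"id": 0, "name": "osd.0", "type": "osd", "device_class": "hdd"},
        {"id": 1, "name": "osd.1", "type": "osd", "device_class": "ssd"},
        {"id": 2, "name": "osd.2", "type": "osd", "device_class": "hdd"},
        {"id": 3, "name": "osd.3", "type": "osd", "device_class": "hdd"},
    ]
}


def summarize(tree):
    """Count buckets per CRUSH type and OSDs per device class."""
    nodes = tree["nodes"]
    by_type = Counter(node["type"] for node in nodes)
    by_class = Counter(
        node.get("device_class", "unknown")
        for node in nodes
        if node["type"] == "osd"
    )
    return by_type, by_class


by_type, by_class = summarize(sample_tree)
print(dict(by_type))
print(dict(by_class))
```

On the sample data this reports one rack and two hosts, which would rule out rack as a failure domain for any scheme and host for anything beyond size=2.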

Can I change a pool's failure domain after creation?

Yes — failure domain lives in the CRUSH rule, not the pool itself. Create a new crush rule at the desired domain level and apply it with ceph osd pool set <pool> crush_rule <new-rule>. This triggers a full data rebalance as PGs move to satisfy the new placement rule, so treat it like any other major topology change — stage it and monitor recovery.
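For a replicated pool, the two steps can be written out as a small command builder. This sketch only assembles the command strings and does not run them. The helper name `change_failure_domain_cmds` and the pool and rule names are hypothetical, and "default" is assumed as the CRUSH root. Erasure-coded pools take their failure domain from the erasure-code profile instead, so this helper does not cover them.

```python
def change_failure_domain_cmds(pool, rule, domain, root="default", device_class=None):
    """Build the commands to create a replicated rule and apply it to a pool."""
    create = ["ceph", "osd", "crush", "rule", "create-replicated", rule, root, domain]
    if device_class:
        create.append(device_class)  # optional: restrict the rule to one device class
    apply_rule = ["ceph", "osd", "pool", "set", pool, "crush_rule", rule]
    return [" ".join(create), " ".join(apply_rule)]


# Hypothetical names: move pool "mypool" to a rack-level rule on HDDs.
for cmd in change_failure_domain_cmds("mypool", "rep-rack-hdd", "rack", device_class="hdd"):
    print(cmd)
```

Reviewing the generated commands before running them fits the staging advice above, since the second command is the one that starts the data movement.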