"Too Many PGs" / "Too Few PGs" Fixer

Seeing HEALTH_WARN too_many_pgs, or suspecting your pools are under-sized? Enter your current pg_num and topology to get the diagnosis, the corrected pg_num, and the safe change procedure.

Current Pool State

Current pg_num // this pool

OSD Count // total

Pools Sharing OSDs // total count

Protection

Target PGs/OSD // Ceph recommends 100–200

mon_max_pg_per_osd // varies by release, default ~250

Ceph Release // affects merge support

Pre-Nautilus (<14.x)

Nautilus+ (14.x+)

Warning text you're seeing // optional

Fix Rules

Target: ~100 PGs/OSD; healthy 50–200

Warn below: 50 PGs/OSD

Warn above: 250 PGs/OSD (mon_max default)

RAM cost: ~10 MB / 100 PGs / OSD

pgp_num: auto-tracks pg_num on Nautilus+

Merging: decrease pg_num (Nautilus+ only)

Autoscaler: may override manual pg_num values

Documentation

Health Checks — too_many_pgs ↗
Official explanation and remediation
Placement Groups ↗
pg_num/pgp_num semantics, autoscaler
PG Calculator →
Full multi-pool sizing from scratch

Why Ceph Warns About PG Count

Every placement group costs memory on every OSD that hosts a copy of it: roughly 10 MB of RAM per 100 PGs per OSD in most releases. A cluster with 50 OSDs holding 10,000 total PG copies averages 200 PGs per OSD, which by that rule is about 20 MB per OSD and roughly 1 GB cluster-wide just for tracking PG metadata, before any actual data caching. Too many PGs also slow peering and recovery, since each PG is its own unit of recovery work with its own bookkeeping overhead.
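The rule of thumb above can be turned into a quick estimate. This is a minimal sketch that assumes the ~10 MB per 100 PGs per OSD figure holds for your release; the function name and constant are illustrative, not part of any Ceph API.

```python
# Rule of thumb from the text: ~10 MB of RAM per 100 PGs per OSD.
# This is an approximation, not a measured value for any specific release.
MB_PER_100_PGS = 10.0


def pg_ram_overhead_mb(total_pg_copies: int, osd_count: int) -> tuple[float, float]:
    """Return (MB per OSD, MB cluster-wide) spent tracking PG metadata."""
    if osd_count <= 0:
        raise ValueError("osd_count must be positive")
    pgs_per_osd = total_pg_copies / osd_count
    per_osd_mb = pgs_per_osd / 100 * MB_PER_100_PGS
    return per_osd_mb, per_osd_mb * osd_count


# The example from the text: 50 OSDs holding 10,000 PG copies in total.
per_osd, cluster = pg_ram_overhead_mb(total_pg_copies=10_000, osd_count=50)
print(f"{per_osd:.0f} MB per OSD, {cluster:.0f} MB cluster-wide")
```

Treat the result as an order-of-magnitude figure; actual memory use depends on the release and workload.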

Too few PGs cause the opposite problem: data and load distribute unevenly across OSDs, some OSDs become I/O hotspots, and recovery after a failure concentrates on fewer OSDs than it should, which makes recovery slower and leaves the cluster less resilient during that window.

The diagnosis bands

Below ~50 PGs/OSD

too_few_pgs territory. Increase pg_num on the affected pool(s); Ceph will split existing PGs into new ones and begin migrating data to use them. This is the same operation you would perform when a growing cluster outgrows its original pool sizing.

50–200 PGs/OSD

The healthy band. 100 is the documented sweet spot for most clusters; up to 200 is reasonable for higher-performance or larger-pool-count setups.

Above 250 PGs/OSD

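too_many_pgs territory. Once the ratio exceeds mon_max_pg_per_osd (default ~250, varies by release), Ceph raises HEALTH_WARN outright. Between 200 and 250 you are above the recommended band but still under the warning threshold. Reduce pg_num (Nautilus+ only), consolidate pools, or add OSDs so the same PGs spread across more of them.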
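The bands can be expressed as a small classifier. This is a sketch only: the function names are hypothetical, and it assumes every pool shares the same OSDs and the same replica size, which is a simplification of a real cluster.

```python
def pgs_per_osd(pool_pg_nums: list[int], replica_size: int, osd_count: int) -> float:
    """Average PG copies per OSD, assuming all pools share all OSDs
    and use the same replica size (an assumption made for this sketch)."""
    if osd_count <= 0:
        raise ValueError("osd_count must be positive")
    return sum(pool_pg_nums) * replica_size / osd_count


def diagnose(ratio: float, warn_below: float = 50, healthy_max: float = 200,
             mon_max_pg_per_osd: float = 250) -> str:
    """Map a PGs-per-OSD ratio onto the bands described in the text."""
    if ratio < warn_below:
        return "too few"
    if ratio <= healthy_max:
        return "healthy"
    if ratio <= mon_max_pg_per_osd:
        return "above target"  # over the healthy band, under the warning threshold
    return "too many"


# Hypothetical cluster: three pools, 3x replication, 20 OSDs.
ratio = pgs_per_osd([1024, 512, 256], replica_size=3, osd_count=20)
print(f"{ratio:.1f} PGs/OSD: {diagnose(ratio)}")
```

Pass your release's actual mon_max_pg_per_osd value instead of relying on the 250 default used here.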

The 25% rounding rule

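The official Ceph pgcalc tool rounds pg_num to the nearest power of two, but if that nearest power of two undershoots the raw target by more than 25%, it rounds up to the next power of two instead, to avoid landing meaningfully under the target. For example, a raw target of 700 is nearest to 512, which is about 27% under, so the result is 1024; a raw target of 600 is nearest to 512, about 15% under, so 512 stands.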
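The rounding rule can be sketched in a few lines. This follows the rule as described in the text, not the pgcalc source; the function name is hypothetical.

```python
def round_pg_num(raw_target: float) -> int:
    """Round a raw PG target to a power of two using the 25% rule:
    take the nearest power of two, but if it falls more than 25% below
    the raw target, use the next power of two up instead."""
    if raw_target <= 1:
        return 1
    # Largest power of two that does not exceed the raw target.
    lower = 1
    while lower * 2 <= raw_target:
        lower *= 2
    upper = lower * 2
    # Nearest power of two (ties go to the lower value).
    nearest = lower if (raw_target - lower) <= (upper - raw_target) else upper
    # More than 25% under the target: round up.
    if nearest < raw_target * 0.75:
        return upper
    return nearest


for raw in (100, 600, 700, 1000):
    print(raw, "->", round_pg_num(raw))
```

An exact power of two is returned unchanged, since it is its own nearest power of two.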

The Safe Change Procedure

On Nautilus and later, increasing pg_num automatically raises pgp_num to match, so data migration starts right away and scales with the size of the change. On a production cluster, always change pg_num in stages (for example, roughly doubling each time) rather than jumping straight to the final value, and wait for ceph -s to return to HEALTH_OK between steps.
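A staged increase can be planned ahead of time. The sketch below generates the doubling steps and prints the matching command for each; the pool name "mypool" and the function name are placeholders, and the script only prints commands rather than running them.

```python
def staged_pg_steps(current: int, target: int) -> list[int]:
    """Return the intermediate pg_num values for a staged increase,
    doubling at each step and stopping at the target.
    Returns an empty list if no increase is needed."""
    if current <= 0 or target <= 0:
        raise ValueError("pg_num values must be positive")
    steps = []
    value = current
    while value < target:
        value = min(value * 2, target)
        steps.append(value)
    return steps


pool = "mypool"  # placeholder pool name
for step in staged_pg_steps(current=128, target=1024):
    # Apply one step, then wait for the cluster to report HEALTH_OK
    # (check with `ceph -s`) before moving on to the next.
    print(f"ceph osd pool set {pool} pg_num {step}")
```

Doubling is only one reasonable step size; smaller steps trade a longer procedure for less data movement at any one time.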

Decreasing pg_num — "PG merging" — requires Nautilus (14.x) or later. On older releases, pg_num cannot be reduced at all; the only way to fix an over-provisioned pre-Nautilus pool is to create a new pool with the correct pg_num and migrate data into it.

The pg_autoscaler manager module (available since Nautilus, and enabled by default for new pools in more recent releases) can override your manual pg_num changes if it disagrees with your target. Check each pool's pg_autoscale_mode with ceph osd pool autoscale-status before manually tuning pg_num: if the mode is "on", your manual ceph osd pool set <pool> pg_num <n> may get reverted.

Frequently Asked Questions

What exactly does HEALTH_WARN too_many_pgs mean?

It means the average PGs per OSD across the cluster has crossed mon_max_pg_per_osd (default ~250, but check your release — it has changed over time). Find the offending pools with ceph osd pool ls detail and reduce pg_num on the largest contributors, or raise mon_max_pg_per_osd temporarily while you migrate to a cleaner pool layout (not a long-term fix, just breathing room).
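To find the largest contributors, multiply each pool's pg_num by its size (the replica count, or k+m for an erasure-coded pool) and sort. The sketch below assumes you have copied those two values by hand from ceph osd pool ls detail; the pool names and numbers are made up for illustration.

```python
def rank_pools_by_pg_copies(pools: dict[str, tuple[int, int]]) -> list[tuple[str, int]]:
    """Rank pools by the number of PG copies they place on OSDs.

    `pools` maps pool name -> (pg_num, size), where size is the replica
    count (or k+m for an erasure-coded pool).
    """
    contributions = {name: pg_num * size for name, (pg_num, size) in pools.items()}
    return sorted(contributions.items(), key=lambda item: item[1], reverse=True)


# Hypothetical pools, with values read manually from `ceph osd pool ls detail`.
pools = {
    "rbd": (2048, 3),
    "cephfs_data": (1024, 3),
    "cephfs_metadata": (64, 3),
}
for name, copies in rank_pools_by_pg_copies(pools):
    print(f"{name}: {copies} PG copies")
```

The pools at the top of the list are the ones where reducing pg_num lowers the PGs-per-OSD ratio the most.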

Can I just raise mon_max_pg_per_osd instead of fixing pg_num?

You can, and it silences the warning, but it doesn't address the underlying RAM/recovery-time cost of running with too many PGs per OSD — it just stops Ceph from telling you about it. Treat raising the limit as a temporary pressure release while you correct pg_num, not a permanent fix.

How long does a pg_num change take to complete?

It depends on cluster size, data volume, and how aggressive your backfill/recovery throttle settings are. Watch ceph -s for the "active+remapped+backfilling" PG states to clear, and ceph pg stat for a quick summary. Large changes on busy production clusters can take hours, so schedule them in a maintenance window where possible.

Should I just turn on the autoscaler and stop thinking about this?

For most clusters, yes — ceph osd pool set <pool> pg_autoscale_mode on lets Ceph manage pg_num automatically based on actual stored data and OSD count, and it has gotten reliable since its introduction in Nautilus. Large or unusually shaped clusters (heavy multi-tenant, very uneven pool sizes) sometimes still benefit from manual tuning — use this tool to sanity-check what the autoscaler should be landing on.