Automating Legend Scaling Based on Layer Complexity

This page demonstrates how to compute a per-layer complexity score and use it to drive legend font sizes, symbol dimensions, and column counts at render time. It is a specific implementation technique within the Dynamic Legend Injection for Variable Datasets workflow. When your Dynamic Map & Data Embedding Workflows pipeline accepts unpredictable input schemas, hardcoded legend geometry breaks; this technique replaces it with a repeatable, bounded algorithm that degrades gracefully across sparse and dense datasets alike.

Why does a fixed legend break in automated pipelines?

Hardcoded font sizes and column counts only work for controlled, single-purpose maps. When pipelines ingest datasets with variable classification depth, symbology count, or spatial density, fixed legends truncate categories, overlap map frames, or leave excessive white space.

How should I adjust scale factors for 300 DPI print exports?

Multiply base_fontsize by 1.2 before passing it to the scaling function. The default formula is calibrated for 96–150 DPI screen and PDF exports; high-DPI print renders require proportionally larger base sizes to maintain legibility at the physical output size.

What is the safe range for the legend scale factor?

Clamp the multiplier between 0.6 (the floor, which prevents thin symbology from disappearing) and 1.4 (the ceiling, which prevents legends from dominating page layouts). Outside this range, typography either becomes unreadable or consumes too much layout space.

Prerequisites

Python 3.10+, geopandas>=0.14, matplotlib>=3.8, numpy>=1.26; pdfplumber is also needed for the PDF check in Step 5
A rendered matplotlib Axes object and a GeoDataFrame with at least one categorical or graduated column
Familiarity with automated static map generation from GeoJSON, the upstream step that produces the Axes object passed here
pip install geopandas matplotlib numpy pdfplumber

The Scaling Pipeline at a Glance

The diagram below shows the data flow from a raw GeoDataFrame to a sized legend block. Each stage maps to one of the numbered implementation steps.

Step 1 — Design the Composite Complexity Index

Layer complexity cannot be captured by raw feature count alone. A composite metric that accounts for four dimensions gives the scaling function enough signal to make sensible decisions across different data types.

The four inputs are:

Symbology class count: unique categorical values or graduated classification breaks
Geometric density: features per square map unit, normalized with log10 to compress outliers
Label ratio: proportion of features with visible labels, capped at 1.0
Multi-layer stacking: additional weight added when thematic layers overlap. It is applied as a flat multiplier in the integration step rather than as a term in the formula, and the code on this page does not implement it

The weighted combination keeps the score on a predictable scale:

Text

complexity_score = (class_count × 0.45) + (normalized_density × 0.30) + (label_ratio × 0.25)

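When working with graduated symbology, use the number of classification breaks as class_count so the score reflects visual weight accurately. The label_ratio caps at 1.0 to avoid skewing the index when nearly every feature carries a label. Because class_count is the only unbounded term, it dominates the score once a layer has more than a handful of classes; the density and label terms together contribute at most 0.55.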

Python

import math
import numpy as np
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")


def calculate_complexity(
    gdf: gpd.GeoDataFrame,
    class_col: str,
    map_extent_sq_km: float,
) -> float:
    """Compute a composite complexity score for legend scaling.

    Returns 0.0 for empty layers or zero-area extents to trigger
    the safe minimum scale floor downstream.
    """
    if gdf.empty or map_extent_sq_km <= 0:
        return 0.0

    class_count = gdf[class_col].nunique()
    feature_density = len(gdf) / max(map_extent_sq_km, 0.001)
    norm_density = min(1.0, math.log10(feature_density + 1) / 3.0)

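    # Stand-in for the label ratio: this saturating function of feature count
    # approaches 1.0 as the layer grows. Replace it with the real share of
    # labeled features if your pipeline tracks label visibility.
    labeled_ratio = min(1.0, len(gdf) / (len(gdf) + 500))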

    score = (class_count * 0.45) + (norm_density * 0.30) + (labeled_ratio * 0.25)
    return max(0.0, score)
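To see how the three weights interact, the same arithmetic can be traced without geopandas. The sketch below mirrors the formula using plain numbers; the helper name complexity_from_counts and the sample layers (40 features in 3 classes, 200 features in 20 classes, both over 50 sq km) are illustrative assumptions, not part of the pipeline.

```python
import math


def complexity_from_counts(
    feature_count: int,
    class_count: int,
    map_extent_sq_km: float,
) -> float:
    """Hypothetical helper: same arithmetic as calculate_complexity, no GeoDataFrame."""
    if feature_count == 0 or map_extent_sq_km <= 0:
        return 0.0
    density = feature_count / max(map_extent_sq_km, 0.001)
    norm_density = min(1.0, math.log10(density + 1) / 3.0)
    labeled_ratio = min(1.0, feature_count / (feature_count + 500))
    return class_count * 0.45 + norm_density * 0.30 + labeled_ratio * 0.25


# Illustrative layers, not real data
sparse = complexity_from_counts(feature_count=40, class_count=3, map_extent_sq_km=50.0)
dense = complexity_from_counts(feature_count=200, class_count=20, map_extent_sq_km=50.0)

print(f"sparse layer: {sparse:.2f}")  # 1.39
print(f"dense layer:  {dense:.2f}")   # 9.14
```

In the dense case the class term alone contributes 9.0 of the 9.14 total, so class count effectively drives the score while density and label ratio act as small corrections.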

Step 2 — Apply Bounded Logarithmic Scaling

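Raw complexity scores grow roughly linearly with class count, which maps poorly to typography constraints. A log10 transform compresses high values while preserving resolution at the low end. The diagram below illustrates how the clamped curve behaves. With the coefficients used here, every positive score yields a multiplier of at least 1.0: a score of 15 gives roughly 1.18, and the 1.4 ceiling is reached only near a score of 460. The 0.6 floor is reached only through the explicit guard for zero or negative scores, which covers empty layers and zero-area extents.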

Hard bounds ensure the multiplier stays within a safe typographic range:

Text

scale_factor = clamp(0.6, 1.4,  1.0 + 0.15 × log10(complexity_score + 1))

The +1 offset keeps the logarithm defined as the score approaches zero. In the implementation, a score of zero or less short-circuits to the 0.6 floor, which keeps thin symbology visible on high-DPI exports; the 1.4 ceiling prevents legends from dominating page layouts. When you reuse the factor for chart-to-PDF synchronization with matplotlib, pass the identical clamped multiplier to chart annotation font sizes so visual hierarchy remains consistent across a multi-element report page.

Python

def derive_scale_factor(complexity_score: float) -> float:
    """Map a complexity score to a bounded multiplier via log10 compression."""
    if complexity_score <= 0:
        return 0.6
    raw = 1.0 + 0.15 * math.log10(complexity_score + 1)
    return min(1.4, max(0.6, raw))
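A quick way to get a feel for the curve is to tabulate it. The sketch below re-implements the same mapping under the hypothetical name scale_for so it runs on its own, prints the multiplier for a handful of sample scores, and solves for the score at which the 1.4 ceiling takes effect.

```python
import math


def scale_for(score: float) -> float:
    """Standalone copy of the mapping above (hypothetical name, same constants)."""
    if score <= 0:
        return 0.6
    return min(1.4, max(0.6, 1.0 + 0.15 * math.log10(score + 1)))


for score in (0, 1, 5, 15, 100, 500):
    print(f"score {score:>3} -> scale {scale_for(score):.3f}")

# Invert 1.0 + 0.15 * log10(s + 1) = 1.4 to find where the ceiling starts to bind
ceiling_score = 10 ** ((1.4 - 1.0) / 0.15) - 1
print(f"ceiling reached near score {ceiling_score:.0f}")  # 463
```

Because the class term adds 0.45 per class, a single layer would need roughly a thousand classes to reach the ceiling through this formula alone, so in practice the upper clamp acts as a safety net rather than a routinely hit limit.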

Step 3 — Scale Legend Typography and Layout

Apply the derived factor to font size and marker scale, and choose the column count from the number of classes. Column count uses discrete thresholds rather than a continuous formula: fractional columns do not exist, so a stepwise rule produces cleaner output.

Python

from matplotlib.legend import Legend  # pyplot may not expose Legend at runtime


def apply_dynamic_legend(
    ax: plt.Axes,
    gdf: gpd.GeoDataFrame,
    class_col: str,
    base_fontsize: int = 9,
    base_markerscale: float = 1.0,
    map_extent_sq_km: float = 100.0,
) -> Legend:
    """Generate and scale a legend based on layer complexity."""
    complexity = calculate_complexity(gdf, class_col, map_extent_sq_km)
    scale = derive_scale_factor(complexity)

    unique_classes = gdf[class_col].dropna().unique()
    handles = [
        Patch(facecolor=plt.cm.tab20(i % 20), label=str(cls))
        for i, cls in enumerate(unique_classes)
    ]

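    scaled_font = max(6, int(base_fontsize * scale))
    # markerscale resizes marker-based handles (lines, scatter points);
    # Patch swatches are sized by handlelength/handleheight instead.
    scaled_marker = max(0.5, base_markerscale * scale)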

    # Discrete column thresholds — fractional columns are not meaningful
    n_cols = 1
    if len(unique_classes) > 8:
        n_cols = 2
    if len(unique_classes) > 15:
        n_cols = 3

    legend = ax.legend(
        handles=handles,
        loc="upper right",
        fontsize=scaled_font,
        markerscale=scaled_marker,
        ncol=n_cols,
        framealpha=0.9,
        title="Legend",
        title_fontsize=scaled_font + 1,
    )

    logging.info(
        "Complexity: %.2f | Scale: %.2f | Font: %dpt | Cols: %d",
        complexity, scale, scaled_font, n_cols,
    )
    return legend
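The two discrete decisions in the function, column count and whole-point font size, are easy to check in isolation. The sketch below pulls them out into hypothetical helpers (pick_column_count and scaled_font_size are illustrative names) so the boundary cases are visible.

```python
def pick_column_count(n_classes: int) -> int:
    """Stepwise rule from apply_dynamic_legend: 1, 2, or 3 columns."""
    if n_classes > 15:
        return 3
    if n_classes > 8:
        return 2
    return 1


def scaled_font_size(base_fontsize: float, scale: float, minimum: int = 6) -> int:
    """Truncate to whole points and enforce the 6pt minimum used above."""
    return max(minimum, int(base_fontsize * scale))


for n in (4, 8, 9, 15, 16, 40):
    print(f"{n:>2} classes -> {pick_column_count(n)} column(s)")

print(scaled_font_size(9, 1.18))  # 10, because int() truncates 10.62
print(scaled_font_size(9, 0.6))   # 6, because the minimum overrides 5.4
```

Note that int() truncates rather than rounds, so with a 9pt base the font steps up to 10pt only once the scale reaches about 1.12. If that step feels too coarse, round() is one alternative, at the cost of occasionally rounding up into a tighter layout.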

Step 4 — Integrate into the Rendering Loop

Call apply_dynamic_legend after the map axis is populated and before plt.savefig() or your PDF export routine. For batch pipelines that process many layers with identical schemas, cache complexity_score and scale_factor after the first computation rather than recalculating per page.

Python

fig, ax = plt.subplots(figsize=(10, 8))
gdf.plot(column="land_use", ax=ax, legend=False)

apply_dynamic_legend(
    ax=ax,
    gdf=gdf,
    class_col="land_use",
    base_fontsize=9,
    map_extent_sq_km=bbox_area_sq_km,
)

plt.savefig("output_report.pdf", dpi=150, bbox_inches="tight")
plt.close(fig)
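For the caching suggestion above, one possible shape is a small dictionary keyed by whatever identifies a layer family in your pipeline. This is a sketch under assumptions: the key (schema_id, class_col), the cached_scale helper, and the stand-in score function are all hypothetical, and the cache is only valid when the cached layers really share class count and density, not just column names.

```python
import math
from typing import Callable

# Hypothetical cache keyed by (schema id, class column); names are illustrative
_scale_cache: dict[tuple[str, str], float] = {}


def cached_scale(
    schema_id: str,
    class_col: str,
    compute_score: Callable[[], float],
) -> float:
    """Return a cached scale factor, computing the score only on first request."""
    key = (schema_id, class_col)
    if key not in _scale_cache:
        score = compute_score()
        if score <= 0:
            _scale_cache[key] = 0.6
        else:
            raw = 1.0 + 0.15 * math.log10(score + 1)
            _scale_cache[key] = min(1.4, max(0.6, raw))
    return _scale_cache[key]


calls = []


def expensive_score() -> float:
    calls.append(1)  # stands in for calculate_complexity(gdf, ...)
    return 9.14


first = cached_scale("parcels_v1", "land_use", expensive_score)
second = cached_scale("parcels_v1", "land_use", expensive_score)
print(first == second, len(calls))  # True 1
```

As written, apply_dynamic_legend recomputes complexity internally, so using a cache like this would mean adding an optional precomputed scale argument to that function.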

DPI adjustment for print exports: The formula is calibrated for 96–150 DPI. For print-ready 300+ DPI exports, multiply base_fontsize by 1.2 before passing it in — physical point sizes at print resolution require larger base values to maintain legibility at the final output dimensions.
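The DPI rule can be wrapped in a small helper so the export DPI and the font pre-scale cannot drift apart. This is a minimal sketch: base_fontsize_for_dpi is a hypothetical name, and treating 300 DPI as a hard threshold is an assumption, since the guidance above does not say what to do between 150 and 300 DPI.

```python
def base_fontsize_for_dpi(
    base_fontsize: float,
    dpi: int,
    print_threshold: int = 300,
) -> float:
    """Apply the 1.2x pre-scale for print-resolution exports (assumed threshold)."""
    if dpi >= print_threshold:
        return base_fontsize * 1.2
    return base_fontsize


for dpi in (96, 150, 300, 600):
    print(dpi, round(base_fontsize_for_dpi(9, dpi), 1))
```

The helper returns a float (10.8 for a 9pt base), which the int() truncation inside apply_dynamic_legend accepts even though the parameter is annotated as int.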

Step 5 — Validate Against Page Geometry

After rendering, measure the legend bounding box relative to the page and assert it stays within the target budget. Use pdfplumber for PDF output, or call get_window_extent() on the Legend object returned in Step 3 for on-screen validation.
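For the on-screen route, a minimal sketch might look like the following. It assumes a headless Agg backend and uses a hypothetical helper, legend_height_fraction; the six-class legend is built inline only so the example runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the check also runs in CI
import matplotlib.pyplot as plt
from matplotlib.patches import Patch


def legend_height_fraction(fig, legend) -> float:
    """Return legend height as a fraction of figure height, in display units."""
    fig.canvas.draw()  # layout must be computed before extents are meaningful
    renderer = fig.canvas.get_renderer()
    legend_bbox = legend.get_window_extent(renderer)
    return legend_bbox.height / fig.bbox.height


fig, ax = plt.subplots(figsize=(10, 8))
handles = [Patch(facecolor=f"C{i % 10}", label=f"class_{i}") for i in range(6)]
legend = ax.legend(handles=handles, loc="upper right", fontsize=9, title="Legend")

fraction = legend_height_fraction(fig, legend)
print(f"legend uses {fraction:.1%} of figure height")
plt.close(fig)
```

This measures against the figure, not the saved page. When exporting with bbox_inches="tight" the page is cropped, so the fraction on the final PDF can differ from the value reported here.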

Python

import pdfplumber

def validate_legend_height(pdf_path: str, max_fraction: float = 0.15) -> bool:
    """Return True if the legend occupies less than max_fraction of every page height."""
    with pdfplumber.open(pdf_path) as pdf:
        for i, page in enumerate(pdf.pages):
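            # Approximate: check all rects taller than 10pt and narrower than half the page.
            # Assumption to verify: a rounded legend frame (matplotlib's default fancybox)
            # may be extracted as a curve rather than a rect and would be missed here.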
            rects = [r for r in page.rects if r["height"] > 10 and r["width"] < page.width / 2]
            for r in rects:
                if r["height"] / page.height > max_fraction:
                    print(f"Page {i+1}: legend rect {r['height']:.1f}pt exceeds {max_fraction*100:.0f}% of {page.height:.1f}pt page")
                    return False
    return True

assert validate_legend_height("output_report.pdf"), "Legend overflows page budget — reduce log coefficient"

If the assertion fails, lower the logarithmic coefficient from 0.15 to 0.12, or reduce the upper clamp from 1.4 to 1.25. Rerun the pipeline and re-validate.
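The two remedies behave differently, which a side-by-side comparison makes visible. The sketch below uses a hypothetical parameterized helper, scale_with, and three illustrative scores; it is meant for choosing between the options, not as a replacement for derive_scale_factor.

```python
import math


def scale_with(score: float, coeff: float, ceiling: float, floor: float = 0.6) -> float:
    """Same mapping as derive_scale_factor with coefficient and ceiling exposed."""
    if score <= 0:
        return floor
    return min(ceiling, max(floor, 1.0 + coeff * math.log10(score + 1)))


settings = {
    "default (0.15, 1.40)": (0.15, 1.40),
    "lower coefficient (0.12, 1.40)": (0.12, 1.40),
    "lower ceiling (0.15, 1.25)": (0.15, 1.25),
}

for score in (9.14, 50.0, 500.0):  # illustrative scores
    print(f"score {score}:")
    for label, (coeff, ceiling) in settings.items():
        print(f"  {label:<32} {scale_with(score, coeff, ceiling):.3f}")
```

Lowering the coefficient shrinks every legend slightly, while lowering the ceiling to 1.25 only changes layers whose score exceeds roughly 45. If only the densest layers overflow the page budget, the ceiling is the narrower fix.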

Key Parameters / Configuration Reference

Parameter	Type	Default	Effect
`base_fontsize`	`int`	`9`	Baseline legend font size in points; multiplied by `scale_factor`
`base_markerscale`	`float`	`1.0`	Baseline marker size multiplier for `ax.legend(markerscale=…)`
`map_extent_sq_km`	`float`	`100.0`	Map area used for density normalization; inaccurate values skew density
Log coefficient	`float`	`0.15`	Controls how steeply complexity raises the scale factor; reduce to `0.12` for conservative layouts
Scale floor	`float`	`0.6`	Minimum multiplier; prevents symbology from becoming invisible
Scale ceiling	`float`	`1.4`	Maximum multiplier; prevents legend from crowding map content
DPI multiplier	`float`	`1.0` (× `1.2` at 300 DPI)	Pre-scales `base_fontsize` for high-resolution print exports

Common Pitfalls

Inaccurate map_extent_sq_km distorting density. Passing the full CRS bounding box of a world-scale dataset while the rendered map shows only a city block produces a near-zero density figure and under-scales the legend. Compute extent from the clipped, projected rendering envelope, not the raw data bounds.
Ignoring NaN classes in nunique(). GeoDataFrame[col].nunique() drops NaN by default, which is correct for class count. However, if your rendering code treats NaN as a visible “Unknown” class, add 1 to class_count manually to include it in the complexity score.
Applying scale factor to both base_fontsize and title_fontsize independently. The implementation sets title_fontsize = scaled_font + 1 (a fixed offset). If you override title_fontsize with base_fontsize * scale * 1.1, you may create a disproportionately large title at high complexity values. Keep the +1 offset pattern.
Forgetting plt.close(fig) in batch loops. Each unclosed figure leaks memory. In pipelines that generate hundreds of pages — common when working with table pagination strategies for large attribute tables — unclosed figures exhaust system memory within minutes. Always call plt.close(fig) after savefig.
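The extent pitfall is easy to quantify. The sketch below isolates the density term under the hypothetical name norm_density and compares a city-scale rendering envelope with a world-scale bounding box; the 500 features and the 2 sq km view are made-up figures, and 510 million sq km is roughly the surface area of the Earth.

```python
import math


def norm_density(feature_count: int, extent_sq_km: float) -> float:
    """Density term as computed in calculate_complexity."""
    density = feature_count / max(extent_sq_km, 0.001)
    return min(1.0, math.log10(density + 1) / 3.0)


features = 500                     # illustrative feature count
rendered_extent = 2.0              # hypothetical city-block view, sq km
raw_bounds_extent = 510_000_000.0  # roughly the surface area of the Earth, sq km

correct = norm_density(features, rendered_extent)
skewed = norm_density(features, raw_bounds_extent)

print(f"rendered envelope: {correct:.3f}")
print(f"raw data bounds:   {skewed:.6f}")
print(f"score difference:  {(correct - skewed) * 0.30:.3f}")
```

With the wrong extent the density term collapses to effectively zero, removing about 0.24 from the complexity score for this example.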

Verification

Run the following inline assertions during development and as a CI smoke test before deploying the pipeline:

Python

import geopandas as gpd
import numpy as np
from shapely.geometry import Point

# Build a synthetic GeoDataFrame with 20 classes
rng = np.random.default_rng(42)
gdf_test = gpd.GeoDataFrame(
    {"category": [f"class_{i % 20}" for i in range(200)]},
    geometry=[Point(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(200)],
    crs="EPSG:4326",
)

score = calculate_complexity(gdf_test, "category", map_extent_sq_km=50.0)
factor = derive_scale_factor(score)

assert score > 0, "Non-empty layer must produce a positive complexity score"
assert 0.6 <= factor <= 1.4, f"Scale factor {factor} outside safe range"

# Empty layer must return floor
empty_gdf = gpd.GeoDataFrame({"category": []}, geometry=[], crs="EPSG:4326")
assert calculate_complexity(empty_gdf, "category", 50.0) == 0.0
assert derive_scale_factor(0.0) == 0.6

print("All legend scaling assertions passed.")

A passing run confirms the complexity function returns positive values for real data, the log transform clamps correctly, and the empty-layer guard triggers the minimum scale floor.

Dynamic Legend Injection for Variable Datasets — parent guide covering the full legend injection pipeline
Automated Static Map Generation from GeoJSON — upstream step that produces the rendered Axes object consumed here
Chart-to-PDF Sync with Matplotlib — apply the same scale factor to chart annotations for consistent visual hierarchy
Table Pagination Strategies for Large Attribute Tables — companion technique for managing layout budgets when attribute data co-exists with scaled legends