Dynamic Legend Injection for Variable Datasets

Static cartographic templates degrade silently when dataset distributions shift: hardcoded class boundaries produce empty legend entries, colour ramps map to non-existent ranges, and oversized legend frames overlap critical map features. This workflow closes that gap by programmatically generating, positioning, and formatting legends that match the exact symbology, class breaks, and attribute structure present at runtime — part of the broader Dynamic Map & Data Embedding Workflows automation surface.

Prerequisites

Python 3.10+
Libraries: geopandas>=0.13, matplotlib>=3.7, shapely>=2.0, pandas>=2.0, mapclassify>=2.8, reportlab>=4.0
Data inputs: Validated GeoJSON, Shapefile, or GeoPackage with a consistent attribute schema
Font assets: TTF/OTF fonts installed system-wide or embedded in the rendering environment
CRS standardisation: All input layers share a common projected CRS before symbology calculation

Bash

pip install geopandas matplotlib shapely pandas mapclassify reportlab

Assumed prior knowledge: Automated Static Map Generation from GeoJSON covers CRS normalisation and the geopandas plot pipeline that feeds this legend workflow.

Pipeline Architecture

The diagram below shows end-to-end data flow from raw spatial input to PDF output, highlighting where dynamic legend state is computed and injected.

Step-by-Step Implementation

Step 1 — Schema Validation & Data Ingestion

Decouple ingestion from rendering so the same loader can feed multiple reporting modules. Validate that the target attribute column is present before any classification is attempted; failing early produces actionable error messages rather than cryptic renderer exceptions downstream.

Python

import geopandas as gpd
import logging
from typing import Optional

logger = logging.getLogger(__name__)

def load_and_validate_gdf(
    filepath: str,
    target_column: str,
    target_crs: int = 3857,
) -> gpd.GeoDataFrame:
    """Ingest spatial data, validate schema, and standardise CRS.

    Args:
        filepath: Path to GeoJSON, Shapefile, or GeoPackage.
        target_column: Attribute column used for classification and symbology.
        target_crs: EPSG code for output CRS. Defaults to Web Mercator (3857).

    Returns:
        A GeoDataFrame with null geometries removed and CRS normalised.
    """
    try:
        gdf = gpd.read_file(filepath)
    except Exception as exc:
        logger.error("Failed to load spatial file %s: %s", filepath, exc)
        raise

    if target_column not in gdf.columns:
        raise ValueError(
            f"Required column '{target_column}' missing from schema. "
            f"Available columns: {list(gdf.columns)}"
        )

    gdf = gdf.dropna(subset=["geometry"])

    if gdf.crs is None:
        raise ValueError("Input dataset has no CRS defined.")
    if gdf.crs.to_epsg() != target_crs:
        gdf = gdf.to_crs(epsg=target_crs)

    logger.info("Loaded %d features, CRS: EPSG:%d", len(gdf), target_crs)
    return gdf

Step 2 — Classification & Break Calculation

Apply mapclassify to determine legend structure at runtime. The function must handle edge cases where data variance is too low for meaningful breaks — a dataset with only two unique values cannot produce five Jenks classes without error.

Python

import mapclassify
import numpy as np
import pandas as pd
from typing import Tuple

def calculate_breaks(
    series: pd.Series,
    scheme: str = "quantiles",
    n_classes: int = 5,
) -> Tuple[list[float], list[str]]:
    """Compute classification breaks and generate human-readable labels.

    Args:
        series: Numeric attribute column from the GeoDataFrame.
        scheme: One of 'quantiles', 'natural_breaks', 'equal_interval'.
        n_classes: Target number of legend classes.

    Returns:
        A tuple of (breaks, labels). Both are empty lists if series is empty
        or has insufficient variance.
    """
    valid = series.dropna()
    if valid.empty:
        logger.warning("Empty series — returning zero-class legend.")
        return [], []

    # Clamp n_classes to the number of unique values to prevent classifier errors
    unique_count = valid.nunique()
    k = min(n_classes, unique_count)
    if k < 2:
        single_val = float(valid.iloc[0])
        return [single_val], [f"{single_val:.2f}"]

    scheme_map = {
        "quantiles": mapclassify.Quantiles,
        "natural_breaks": mapclassify.NaturalBreaks,
        "equal_interval": mapclassify.EqualInterval,
    }
    classifier_cls = scheme_map.get(scheme, mapclassify.Quantiles)
    classifier = classifier_cls(valid, k=k)

    breaks: list[float] = classifier.bins.tolist()
    lower_bounds = [float(valid.min())] + breaks[:-1]
    labels = [f"{lo:.2f} – {hi:.2f}" for lo, hi in zip(lower_bounds, breaks)]
    return breaks, labels

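NaturalBreaks is usually a good fit for environmental data with skewed distributions (elevation, pollution concentration). Note that mapclassify.NaturalBreaks approximates natural breaks with one-dimensional k-means clustering; mapclassify.FisherJenks computes the optimal Jenks partition when exact Jenks breaks are required. Quantiles produces visually balanced choropleth legends but can misrepresent outlier magnitude. Document the scheme in version-controlled report configs so classification decisions are auditable.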
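
One way to keep the classification choice auditable is to store it as a small, validated record that serialises to JSON. The sketch below assumes JSON report configs; the ClassificationConfig class, its field names, and the "pm25" column are hypothetical and are not part of mapclassify or of the pipeline above.

```python
import json
from dataclasses import asdict, dataclass

# Scheme names accepted by calculate_breaks() in Step 2.
ALLOWED_SCHEMES = ("quantiles", "natural_breaks", "equal_interval")


@dataclass(frozen=True)
class ClassificationConfig:
    """Hypothetical record of the classification choices behind one report."""

    column: str
    scheme: str = "quantiles"
    n_classes: int = 5

    def __post_init__(self) -> None:
        # Reject typos early instead of silently falling back to a default scheme.
        if self.scheme not in ALLOWED_SCHEMES:
            raise ValueError(f"Unknown scheme '{self.scheme}'")
        if self.n_classes < 1:
            raise ValueError("n_classes must be at least 1")


def dump_config(config: ClassificationConfig) -> str:
    """Serialise with sorted keys so diffs in version control stay stable."""
    return json.dumps(asdict(config), indent=2, sort_keys=True)


config = ClassificationConfig(column="pm25", scheme="natural_breaks", n_classes=6)
print(dump_config(config))
```

Committing the resulting JSON next to the report template gives reviewers a single place to see which scheme and class count produced a given legend.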

Step 3 — Symbology Mapping & Handle Generation

Construct matplotlib legend handles explicitly rather than relying on the library’s auto-detection, which silently fails when features are filtered or sparse. Explicit construction guarantees a 1-to-1 correspondence between breaks and rendered patches.

Python

import matplotlib.pyplot as plt
from matplotlib.patches import Patch
from matplotlib.colors import ListedColormap

def generate_legend_handles(
    breaks: list[float],
    labels: list[str],
    cmap_name: str = "viridis",
) -> Tuple[ListedColormap, list[Patch]]:
    """Create explicit matplotlib legend handles for dynamic injection.

    Args:
        breaks: Class boundaries from calculate_breaks().
        labels: Human-readable labels aligned with breaks.
        cmap_name: A matplotlib colormap name.

    Returns:
        A (ListedColormap, list[Patch]) tuple ready for ax.legend(handles=...).
    """
    n = len(breaks)
    if n == 0:
        return ListedColormap(["#cccccc"]), []

    cmap = plt.get_cmap(cmap_name, n)
    handles = [
        Patch(facecolor=cmap(i / max(n - 1, 1)), label=labels[i])
        for i in range(n)
    ]
    return cmap, handles

For categorical (nominal) datasets, pass a qualitative palette name such as "Set2" or "tab10" as cmap_name instead of a sequential ramp. When the dataset also drives table pagination across multi-page reports, pass the same colour assignments to both the map legend and the table header row to maintain visual consistency.
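
A minimal sketch of the categorical case, assuming the categories are plain strings: it builds one category-to-hex mapping that both the legend and a table renderer can consume. The helper names build_category_colours and legend_handles_from, and the land-cover values, are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex
from matplotlib.patches import Patch


def build_category_colours(categories, cmap_name="Set2"):
    """Map each distinct category to one hex colour from a qualitative palette."""
    ordered = sorted(set(categories))  # sorted so assignments are reproducible
    cmap = plt.get_cmap(cmap_name)
    if len(ordered) > cmap.N:  # cmap.N is the number of colours in the palette
        raise ValueError(
            f"{len(ordered)} categories exceed the {cmap.N} colours in '{cmap_name}'"
        )
    # Calling a colormap with an integer indexes its colour table directly.
    return {cat: to_hex(cmap(i)) for i, cat in enumerate(ordered)}


def legend_handles_from(mapping):
    """Build legend patches from the shared category-to-colour mapping."""
    return [Patch(facecolor=colour, label=cat) for cat, colour in mapping.items()]


mapping = build_category_colours(["forest", "urban", "water", "urban"])
handles = legend_handles_from(mapping)
# The same mapping can colour table header cells, e.g. mapping["urban"].
```

Because the mapping is a plain dict of hex strings, it can be passed to a table renderer without importing matplotlib there.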

Step 4 — Dynamic Layout Calculation

Derive the legend column count from the item count. If the item count exceeds max_items_per_col, switch to a multi-column layout rather than stretching a single column off-canvas. This logic also gates automatic scaling: if the legend would need more than two columns, the pipeline should invoke Automating Legend Scaling Based on Layer Complexity for font and patch size reduction.

Python

def calculate_legend_layout(
    handles: list[Patch],
    max_items_per_col: int = 8,
) -> dict:
    """Determine column count and legend anchor for ax.legend().

    Args:
        handles: Legend patch handles from generate_legend_handles().
        max_items_per_col: Maximum items before adding a second column.

    Returns:
        A dict of keyword arguments passable directly to ax.legend().
    """
    n_items = len(handles)
    n_cols = max(1, (n_items + max_items_per_col - 1) // max_items_per_col)

    return {
        "ncol": n_cols,
        "bbox_to_anchor": (1.02, 0.5),
        "loc": "center left",
        "borderaxespad": 0.1,
        "fancybox": True,
        "facecolor": "white",
        "edgecolor": "#aaaaaa",
        "framealpha": 0.92,
    }
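
The scaling gate described above is not part of calculate_legend_layout() itself. A minimal sketch of how it could be expressed, assuming a two-column ceiling; the function name plan_legend_columns and the needs_scaling flag are hypothetical:

```python
import math


def plan_legend_columns(n_items, max_items_per_col=8, max_cols=2):
    """Return the column count and whether the scaling step should run."""
    n_cols = max(1, math.ceil(n_items / max_items_per_col))
    # More columns than the layout tolerates means fonts and patches need shrinking.
    return {"ncol": n_cols, "needs_scaling": n_cols > max_cols}


print(plan_legend_columns(5))   # {'ncol': 1, 'needs_scaling': False}
print(plan_legend_columns(16))  # {'ncol': 2, 'needs_scaling': False}
print(plan_legend_columns(17))  # {'ncol': 3, 'needs_scaling': True}
```

A caller would check needs_scaling and, when it is set, hand off to the scaling routine before building the final ax.legend() keyword arguments.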

Step 5 — Report Assembly & PDF Export

Inject the computed handles and layout parameters into the figure canvas. Export via matplotlib’s native PDF backend for full vector fidelity — never rasterise at this stage, as rasterised legends produce fuzzy text at print resolution. For multi-page documents, coordinate page margins with the document architecture layout rules for spatial reports before committing to a figsize.

Python

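import matplotlib
matplotlib.use("Agg")  # headless backend; safest to select before importing pyplot
import matplotlib.pyplot as plt
import geopandas as gpd
import pandas as pd

def render_map_with_dynamic_legend(
    gdf: gpd.GeoDataFrame,
    target_column: str,
    breaks: list[float],
    labels: list[str],
    handles: list[Patch],
    layout: dict,
    output_path: str,
    cmap_name: str = "viridis",
) -> None:
    """Render spatial data with injected legend and export to PDF.

    Args:
        gdf: Validated, CRS-normalised GeoDataFrame.
        target_column: Column used for choropleth colouring.
        breaks: Classification boundaries (upper bound of each class).
        labels: Legend class labels aligned with breaks.
        handles: Matplotlib Patch objects for the legend.
        layout: Keyword args dict from calculate_legend_layout().
        output_path: Destination path for the PDF file.
        cmap_name: Colormap name; must match the one passed to
            generate_legend_handles() so map and legend colours agree.
    """
    if not breaks:
        raise ValueError("Cannot render a classified map without breaks.")
    n = len(breaks)
    cmap = plt.get_cmap(cmap_name, n)

    # Colour features by class index over the classifier's (lower, upper]
    # intervals. Plotting the raw column would bin it by equal intervals,
    # which disagrees with quantile or natural-breaks legends.
    class_idx = pd.cut(
        gdf[target_column],
        bins=[float("-inf"), *breaks],
        labels=False,
    )
    fig, ax = plt.subplots(figsize=(8, 6))

    gdf.assign(_legend_class=class_idx).plot(
        column="_legend_class",
        cmap=cmap,
        vmin=0,
        vmax=max(n - 1, 1),  # same scaling as the legend patches
        ax=ax,
        legend=False,
        edgecolor="black",
        linewidth=0.4,
    )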

    ax.legend(
        handles=handles,
        title=target_column.replace("_", " ").title(),
        **layout,
    )
    ax.set_axis_off()
    fig.tight_layout()

    fig.savefig(output_path, format="pdf", dpi=300, bbox_inches="tight")
    plt.close(fig)
    logger.info("Legend-injected map written to %s", output_path)

Production-Ready Script

The script below wires all five stages into a single callable entry point with full logging, error handling, and configurable parameters. Drop it into a CI pipeline or cron job without modification.

Python

#!/usr/bin/env python3
"""dynamic_legend_pipeline.py — production legend injection script.

Usage:
    python dynamic_legend_pipeline.py \
        --input regions.geojson \
        --column population_density \
        --scheme natural_breaks \
        --classes 6 \
        --cmap plasma \
        --output report_map.pdf
"""

from __future__ import annotations
import argparse
import logging
import sys

import matplotlib
matplotlib.use("Agg")

import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
from matplotlib.colors import ListedColormap
import mapclassify
import pandas as pd
from typing import Tuple

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s — %(message)s",
)
logger = logging.getLogger("dynamic_legend_pipeline")


# ── Ingestion ──────────────────────────────────────────────────────────────────

def load_and_validate_gdf(
    filepath: str, target_column: str, target_crs: int = 3857
) -> gpd.GeoDataFrame:
    gdf = gpd.read_file(filepath)
    if target_column not in gdf.columns:
        raise ValueError(f"Column '{target_column}' not found in {list(gdf.columns)}")
    gdf = gdf.dropna(subset=["geometry"])
    if gdf.crs is None:
        raise ValueError("Dataset has no CRS.")
    if gdf.crs.to_epsg() != target_crs:
        gdf = gdf.to_crs(epsg=target_crs)
    return gdf


# ── Classification ─────────────────────────────────────────────────────────────

def calculate_breaks(
    series: pd.Series, scheme: str = "quantiles", n_classes: int = 5
) -> Tuple[list[float], list[str]]:
    valid = series.dropna()
    if valid.empty:
        return [], []
    k = min(n_classes, valid.nunique())
    if k < 2:
        v = float(valid.iloc[0])
        return [v], [f"{v:.2f}"]
    scheme_map = {
        "quantiles": mapclassify.Quantiles,
        "natural_breaks": mapclassify.NaturalBreaks,
        "equal_interval": mapclassify.EqualInterval,
    }
    classifier = scheme_map.get(scheme, mapclassify.Quantiles)(valid, k=k)
    breaks = classifier.bins.tolist()
    lowers = [float(valid.min())] + breaks[:-1]
    labels = [f"{lo:.2f} – {hi:.2f}" for lo, hi in zip(lowers, breaks)]
    return breaks, labels


# ── Handles & Layout ───────────────────────────────────────────────────────────

def generate_legend_handles(
    breaks: list[float], labels: list[str], cmap_name: str = "viridis"
) -> Tuple[ListedColormap, list[Patch]]:
    n = len(breaks)
    if n == 0:
        return ListedColormap(["#cccccc"]), []
    cmap = plt.get_cmap(cmap_name, n)
    handles = [
        Patch(facecolor=cmap(i / max(n - 1, 1)), label=labels[i])
        for i in range(n)
    ]
    return cmap, handles


def calculate_legend_layout(handles: list[Patch], max_per_col: int = 8) -> dict:
    n_cols = max(1, (len(handles) + max_per_col - 1) // max_per_col)
    return {
        "ncol": n_cols,
        "bbox_to_anchor": (1.02, 0.5),
        "loc": "center left",
        "borderaxespad": 0.1,
        "fancybox": True,
        "facecolor": "white",
        "edgecolor": "#aaaaaa",
        "framealpha": 0.92,
    }


# ── Render & Export ────────────────────────────────────────────────────────────

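def render_map_with_dynamic_legend(
    gdf: gpd.GeoDataFrame,
    target_column: str,
    breaks: list[float],
    handles: list[Patch],
    layout: dict,
    output_path: str,
    cmap_name: str = "viridis",
) -> None:
    n = len(breaks)
    cmap = plt.get_cmap(cmap_name, n)
    # Colour by class index over the classifier's (lower, upper] intervals so
    # the map matches the legend patches for every scheme, not only
    # equal_interval.
    class_idx = pd.cut(
        gdf[target_column], bins=[float("-inf"), *breaks], labels=False
    )
    fig, ax = plt.subplots(figsize=(8, 6))
    gdf.assign(_legend_class=class_idx).plot(
        column="_legend_class", cmap=cmap, vmin=0, vmax=max(n - 1, 1),
        ax=ax, legend=False, edgecolor="black", linewidth=0.4,
    )
    ax.legend(
        handles=handles,
        title=target_column.replace("_", " ").title(),
        **layout,
    )
    ax.set_axis_off()
    fig.tight_layout()
    fig.savefig(output_path, format="pdf", dpi=300, bbox_inches="tight")
    plt.close(fig)
    logger.info("Written: %s", output_path)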


# ── CLI entrypoint ─────────────────────────────────────────────────────────────

def main() -> None:
    parser = argparse.ArgumentParser(description="Dynamic legend injection pipeline")
    parser.add_argument("--input", required=True)
    parser.add_argument("--column", required=True)
    parser.add_argument("--scheme", default="quantiles",
                        choices=["quantiles", "natural_breaks", "equal_interval"])
    parser.add_argument("--classes", type=int, default=5)
    parser.add_argument("--cmap", default="viridis")
    parser.add_argument("--output", default="output.pdf")
    args = parser.parse_args()

    try:
        gdf = load_and_validate_gdf(args.input, args.column)
        breaks, labels = calculate_breaks(gdf[args.column], args.scheme, args.classes)
        if not breaks:
            logger.error("No valid breaks produced — check input data.")
            sys.exit(1)
        _, handles = generate_legend_handles(breaks, labels, args.cmap)
        layout = calculate_legend_layout(handles)
        render_map_with_dynamic_legend(
            gdf, args.column, breaks, handles, layout, args.output, args.cmap
        )
    except Exception as exc:
        logger.exception("Pipeline failed: %s", exc)
        sys.exit(1)


if __name__ == "__main__":
    main()

Edge Cases & Advanced Configuration

Null & Outlier Handling

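Classifiers can fail or return degenerate classes when the input contains NaN or inf, and extreme outliers can squeeze most features into a single class. Coerce with pd.to_numeric(..., errors="coerce"), drop NaN and infinite values, then apply Winsorization before break calculation:

Python

import numpy as np
import pandas as pd
from scipy.stats import mstats  # scipy is an extra dependency beyond the prerequisites

def winsorise(series: pd.Series, limits: tuple[float, float] = (0.02, 0.02)) -> pd.Series:
    """Clamp the lowest and highest fractions of values (2% each by default)."""
    # errors="coerce" turns non-numeric entries into NaN but leaves inf untouched,
    # so infinities are replaced explicitly before dropping missing values.
    clean = pd.to_numeric(series, errors="coerce").replace([np.inf, -np.inf], np.nan).dropna()
    clipped = mstats.winsorize(clean.to_numpy(), limits=limits)
    return pd.Series(np.asarray(clipped), index=clean.index)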

Memory Management for Batch Processing

matplotlib retains figure references across savefig calls unless explicitly released. In batch jobs generating hundreds of maps, memory accumulates and triggers OS-level OOM kills. Always pair every plt.subplots() call with plt.close(fig):

Python

# Correct pattern for batch processing
for config in batch_configs:
    fig, ax = plt.subplots(figsize=(8, 6))
    # ... render ...
    fig.savefig(config["output_path"], format="pdf", dpi=300, bbox_inches="tight")
    plt.close(fig)  # critical — releases Agg backend memory

Multi-Format Outputs (PNG + PDF)

When the same legend must appear in both screen exports (PNG at 96 DPI) and print-ready PDFs (300 DPI), generate two separate fig.savefig calls rather than resampling. Legend text rendered at 96 DPI and scaled up to 300 DPI produces pixelated output incompatible with the print-ready page sizing standards for GIS reports workflow.
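
A minimal sketch of the two-call export, assuming one figure object serves both targets. The helper name export_screen_and_print and the file stem are hypothetical; the DPI values are the ones quoted above.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.patches import Patch


def export_screen_and_print(fig, stem, out_dir):
    """Save one figure twice: a 96 DPI PNG for screens and a vector PDF for print."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    png_path = out_dir / f"{stem}.png"
    pdf_path = out_dir / f"{stem}.pdf"
    # Each call renders from the figure itself, so nothing is resampled.
    fig.savefig(png_path, format="png", dpi=96, bbox_inches="tight")
    fig.savefig(pdf_path, format="pdf", dpi=300, bbox_inches="tight")
    return png_path, pdf_path
```

Close the figure with plt.close(fig) only after both calls have completed, so the second export does not need a re-render of the map.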

Headless CI/CD Environments

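Docker containers typically have no display server. Call matplotlib.use("Agg") before the first pyplot import so the backend choice is explicit and no GUI toolkit is probed. Recent matplotlib releases fall back to a non-interactive backend on their own when no display is found, but selecting Agg up front keeps behaviour identical across versions and base images: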

Python

# At the TOP of every script file, before any other matplotlib import
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

Add MPLBACKEND=Agg to your Docker environment file as a belt-and-suspenders safeguard:

DOCKERFILE

ENV MPLBACKEND=Agg

CRS-Aware Legend Positioning

Legends positioned with bbox_to_anchor in axis-relative coordinates can misalign when map projections distort aspect ratios (notably polar projections and conic projections at high latitudes). Convert to figure-relative coordinates when exporting to fixed-layout A4/Letter templates:

Python

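# Figure-relative anchor avoids CRS-induced aspect ratio distortion.
# Compute it once the layout is final (for example after fig.tight_layout()).
legend = ax.legend(handles=handles, **layout)
anchor_display = ax.transAxes.transform((1.02, 0.5))  # axes fraction -> display pixels
anchor_figure = fig.transFigure.inverted().transform(anchor_display)  # pixels -> figure fraction
legend.set_bbox_to_anchor(tuple(anchor_figure), transform=fig.transFigure)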

Troubleshooting

Symptom	Likely Cause	Resolution
`ValueError: k must be less than or equal to the number of unique values`	`n_classes` exceeds unique value count in column	Clamp `k = min(n_classes, series.nunique())` before calling the classifier
Legend patches render but labels are empty strings	`labels` list length does not match `breaks` length	Verify `zip(lowers, breaks)` in `calculate_breaks` — check off-by-one in `lower_bounds` construction
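PDF legend text is blurry	Legend or map was rasterised (for example `rasterized=True`, or a PNG embedded in the PDF); in vector output `dpi` only affects rasterised artists	Save directly with `format="pdf"` so text stays vector; keep `dpi=300` so any rasterised layers remain sharp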
Legend overflows into map body	`bbox_to_anchor` anchor outside axes bounds	Use `loc="lower right"` with `bbox_to_anchor=None` for in-axes placement when sidebar space is unavailable
`OOM killed` during batch run	Figure objects not released	Add `plt.close(fig)` after every `savefig`; use `matplotlib.use("Agg")` to avoid display server overhead
Colours do not match between map and table	Different cmap instances per module	Instantiate `cmap` once in `generate_legend_handles` and pass the same object to both rendering functions
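
Several of the symptoms above trace back to mismatched or malformed breaks and labels. A minimal pre-render check might look like the sketch below; the function name validate_legend_inputs is hypothetical and the checks are illustrative rather than exhaustive.

```python
import math


def validate_legend_inputs(breaks, labels):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(breaks) != len(labels):
        problems.append(f"{len(breaks)} breaks but {len(labels)} labels")
    if any(not math.isfinite(b) for b in breaks):
        problems.append("breaks contain NaN or inf")
    elif any(hi <= lo for lo, hi in zip(breaks, breaks[1:])):
        problems.append("breaks are not strictly increasing")
    if any(not str(label).strip() for label in labels):
        problems.append("at least one label is empty")
    return problems


print(validate_legend_inputs([1.0, 2.0], ["0.00 - 1.00", "1.00 - 2.00"]))  # []
print(validate_legend_inputs([1.0, 2.0], ["0.00 - 1.00"]))  # ['2 breaks but 1 labels']
```

Running this between calculate_breaks() and generate_legend_handles() turns a silent rendering defect into a logged, actionable message.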

Frequently Asked Questions

Why do static legend templates fail when dataset distributions change?

Static templates hardcode class boundaries and item counts. When data distributions shift — new sensor readings, updated cadastral boundaries, demographic revisions — the hardcoded breaks produce either empty classes or unlabelled overflow items. Dynamic injection recalculates breaks and legend geometry at runtime, so the output always matches the actual data.
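
The failure mode can be reproduced with a few lines of standard-library Python. The break values and both sample datasets below are invented purely for illustration.

```python
from bisect import bisect_left


def class_counts(values, breaks):
    """Count values per class, treating each break as an inclusive upper bound."""
    counts = [0] * len(breaks)
    overflow = 0  # values above the last break have no legend entry at all
    for v in values:
        idx = bisect_left(breaks, v)
        if idx == len(breaks):
            overflow += 1
        else:
            counts[idx] += 1
    return counts, overflow


hardcoded_breaks = [10.0, 20.0, 30.0, 40.0, 50.0]
last_year = [4.0, 12.0, 18.0, 25.0, 33.0, 41.0, 47.0]
this_year = [31.0, 36.0, 44.0, 52.0, 58.0, 63.0, 71.0]

print(class_counts(last_year, hardcoded_breaks))  # ([1, 2, 1, 1, 2], 0)
print(class_counts(this_year, hardcoded_breaks))  # ([0, 0, 0, 2, 1], 4)
```

With the shifted data, three legend classes are empty and four features fall outside every class, which is exactly the mismatch that recalculating breaks at runtime avoids.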

Which classification scheme should I use for continuous environmental data?

Natural Breaks (Jenks) minimises within-class variance and is a common default for environmental datasets with uneven distributions (e.g., rainfall, elevation). Use Quantiles when you need equal feature counts per class for choropleth readability. Use Equal Interval only for data with uniform distributions or when communicating absolute magnitude differences matters more than visual balance.

How do I prevent legend text overflow in multi-column layouts at print resolution?

Measure label widths with the renderer before finalising the layout: fig.canvas.get_renderer() together with Text.get_window_extent() returns each label’s rendered extent, and matplotlib’s font_manager resolves which font file is actually used. If the longest label exceeds (column_width - 2 * pad) at 300 DPI, either truncate with an ellipsis and give the full label in a footnote or caption, reduce font size proportionally (with a floor of about 7pt for print legibility; WCAG contrast criteria concern colour contrast, not point size), or widen the legend frame. Never let the renderer clip silently.
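
A minimal sketch of the measure-then-shrink approach, assuming the Agg backend and a legend that has already been attached to an axes. The helper names widest_label_points and shrink_to_fit, the 0.5pt step, and the 7pt floor are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.patches import Patch


def widest_label_points(legend, fig):
    """Return the widest legend label in points, as measured by the renderer."""
    renderer = fig.canvas.get_renderer()
    widths_px = [t.get_window_extent(renderer).width for t in legend.get_texts()]
    # Window extents are in pixels at the figure DPI; 72 points make one inch.
    return float(max(widths_px, default=0.0)) * 72.0 / fig.dpi


def shrink_to_fit(legend, fig, max_width_pt, floor_pt=7.0, step_pt=0.5):
    """Reduce label font size until labels fit or the floor is reached."""
    texts = legend.get_texts()
    size = texts[0].get_fontsize() if texts else floor_pt
    while widest_label_points(legend, fig) > max_width_pt and size - step_pt >= floor_pt:
        size -= step_pt
        for t in texts:
            t.set_fontsize(size)
    fits = bool(widest_label_points(legend, fig) <= max_width_pt)
    return size, fits
```

When fits comes back False at the floor size, the remaining options from the answer above apply: truncate the label or widen the legend frame.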

Detailed Guides in This Section

Automating Legend Scaling Based on Layer Complexity — dynamically adjusts patch size, font size, and column count when layer feature counts or class breadth exceed fixed layout thresholds.

Automated Static Map Generation from GeoJSON — establishes the GeoJSON ingestion and CRS normalisation pipeline that feeds this legend workflow.
Table Pagination Strategies for Large Attribute Tables — coordinates pagination logic when legends accompany multi-page attribute tables in the same document.
Chart-to-PDF Sync with Matplotlib — covers figure-canvas coordination between matplotlib chart outputs and ReportLab PDF canvases, relevant when embedding legend-annotated maps alongside charts.
Conditional Rendering for Missing Spatial Data — handles Jinja2-level fallback logic when the attribute column driving legend generation is absent from a template context.

Dynamic legend injection transforms static cartographic templates into self-adjusting reporting engines that maintain cartographic integrity regardless of input variability. Combined with the document architecture layout rules for spatial reports for page geometry and the CI/CD hardening patterns in this section, the pipeline scales to thousands of automated map generations without manual intervention.
