Iterating through shapefile attributes in ReportLab

Iterating through shapefile attributes in ReportLab requires a two-stage pipeline: first, parse the .shp/.dbf vector data using a dedicated GIS library; second, feed the cleaned, Python-native records into ReportLab’s Table or LongTable flowables. ReportLab’s PDF engine does not read geospatial formats natively, so all attribute iteration, type coercion, and encoding normalization must occur in memory before the data reaches the layout stage.

Environment & Dependency Matrix

Component | Minimum Version | Critical Notes
Python | 3.8+ | Modern type hints and stable pathlib integration.
ReportLab | 3.6.12+ | Table pagination is reported stable on large tables; test datasets over ~5k rows before relying on earlier builds.
pyshp | 2.3.0+ | Lightweight .dbf parser. An explicit encoding parameter prevents silent character corruption.
geopandas / fiona | 0.14+ / 1.9.0+ | Use only when CRS projection, geometry validation, or spatial joins precede table generation.
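
A quick way to confirm that an environment meets these minimums is to compare installed versions at startup. The sketch below is illustrative only: the helper names parse_version and check_minimums are hypothetical, the parser handles plain dotted numeric versions (pre-release suffixes are cut off), and the distribution names "reportlab" and "pyshp" are assumed to match what pip installed.

```python
from importlib import metadata  # standard library since Python 3.8


def parse_version(text: str) -> tuple:
    """Turn '3.6.12' into (3, 6, 12); stops at the first non-numeric part."""
    parts = []
    for chunk in text.split("."):
        if not chunk.isdigit():
            break
        parts.append(int(chunk))
    return tuple(parts)


def check_minimums(minimums: dict) -> dict:
    """Map each distribution name to 'ok', 'too old', or 'missing'."""
    status = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            status[name] = "missing"
            continue
        ok = parse_version(installed) >= parse_version(minimum)
        status[name] = "ok" if ok else "too old"
    return status


if __name__ == "__main__":
    print(check_minimums({"reportlab": "3.6.12", "pyshp": "2.3.0"}))
```

For anything beyond simple dotted versions, a dedicated version parser would be a safer choice than this tuple comparison.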

Step-by-Step Data Pipeline

  1. Open & Parse: Load the shapefile with explicit encoding. The .dbf attribute table is accessed via the library’s record iterator.
  2. Extract Headers: Skip the mandatory DeletionFlag field (index 0) and collect remaining column names.
  3. Sanitize Records: Convert None to empty strings, strip whitespace, and make sure every value is a Python str. ReportLab expects Unicode strings or UTF-8 encoded bytes; legacy cp1252 byte sequences fail at render time.
  4. Map to Layout: Pass the header row and sanitized records to reportlab.platypus.Table. Apply TableStyle for grid lines, alternating row colors, and header formatting. For enterprise-scale exports, consider Loop Mapping for Dynamic Attribute Tables to decouple data iteration from presentation rules.
  5. Build PDF Flow: Attach the table to a SimpleDocTemplate or BaseDocTemplate flow. ReportLab handles page breaks and margin constraints automatically, and repeats the header row on every page when the table is created with repeatRows=1.

Complete Implementation

Python
import os
import shapefile
from reportlab.lib.pagesizes import A4
from reportlab.lib import colors
from reportlab.lib.units import mm
from reportlab.platypus import (
    SimpleDocTemplate, Table, TableStyle, Paragraph, Spacer
)
from reportlab.lib.styles import getSampleStyleSheet

def safe_str(val: object, encoding: str = "cp1252") -> str:
    """Convert GIS record values to clean, stripped strings."""
    if val is None:
        return ""
    # Decode stray bytes with the same legacy encoding used by the reader
    if isinstance(val, bytes):
        return val.decode(encoding, errors="replace").strip()
    return str(val).strip()

def build_shapefile_report(shp_path: str, output_pdf: str, max_rows: int = 5000) -> None:
    # 1. Load shapefile with explicit encoding; replace undecodable bytes
    with shapefile.Reader(shp_path, encoding="cp1252", encodingErrors="replace") as sf:
        fields = [f[0] for f in sf.fields[1:]]  # Skip DeletionFlag
        
        # 2. Iterate records & sanitize
        table_data = [fields]  # Header row
        for i, record in enumerate(sf.iterRecords()):
            if i >= max_rows:
                break
            table_data.append([safe_str(v) for v in record])

    # 3. Configure ReportLab table
    # Column widths in points, scaled from header length and capped at 120 pt
    col_widths = [min(len(f) * 4 + 20, 120) for f in fields]
    table = Table(table_data, colWidths=col_widths, repeatRows=1)
    
    # 4. Apply styling
    table.setStyle(TableStyle([
        ("BACKGROUND", (0, 0), (-1, 0), colors.HexColor("#2C3E50")),
        ("TEXTCOLOR", (0, 0), (-1, 0), colors.white),
        ("ALIGN", (0, 0), (-1, 0), "CENTER"),
        ("FONTNAME", (0, 0), (-1, 0), "Helvetica-Bold"),
        ("FONTSIZE", (0, 0), (-1, -1), 9),
        ("BOTTOMPADDING", (0, 0), (-1, 0), 6),
        ("TOPPADDING", (0, 0), (-1, 0), 6),
        ("BACKGROUND", (0, 1), (-1, -1), colors.HexColor("#F8F9FA")),
        ("GRID", (0, 0), (-1, -1), 0.5, colors.grey),
        ("VALIGN", (0, 0), (-1, -1), "MIDDLE"),
        ("ROWBACKGROUNDS", (0, 1), (-1, -1), [colors.white, colors.HexColor("#F1F3F5")]),
    ]))

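    # 5. Build PDF document (create the output directory if it is missing)
    os.makedirs(os.path.dirname(output_pdf) or ".", exist_ok=True)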
    doc = SimpleDocTemplate(
        output_pdf,
        pagesize=A4,
        rightMargin=15*mm,
        leftMargin=15*mm,
        topMargin=15*mm,
        bottomMargin=15*mm,
    )
    
    styles = getSampleStyleSheet()
    elements = [
        Paragraph("Shapefile Attribute Report", styles["Title"]),
        Spacer(1*mm, 6*mm),
        table
    ]
    doc.build(elements)

# Usage: build_shapefile_report("data/municipalities.shp", "output/report.pdf")

Critical Considerations

Encoding & Character Safety

Shapefile .dbf tables often use legacy code pages such as cp1252 or latin1. pyshp decodes text fields as UTF-8 by default, so reading a legacy-encoded file without an explicit encoding raises UnicodeDecodeError, and guessing the wrong single-byte code page silently corrupts diacritics. Pass the correct encoding to shapefile.Reader, and set encodingErrors="replace" (or "ignore") when a few undecodable bytes should not abort the export. ReportLab expects Unicode strings or UTF-8 encoded bytes; bytes in any other encoding fail with a decode error during cell rendering. For advanced typography or CJK character support, embed a TrueType font via reportlab.pdfbase.ttfonts and register it before table construction.
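
As an illustration of decoding with a fallback, the following sketch tries strict UTF-8 first and falls back to a legacy code page with errors="replace". The helper name decode_dbf_bytes and the cp1252 default are assumptions made for this example; neither is part of pyshp or ReportLab.

```python
def decode_dbf_bytes(raw: bytes, legacy_encoding: str = "cp1252") -> str:
    """Decode raw attribute bytes into a clean str for ReportLab cells."""
    try:
        # Strict UTF-8 succeeds only when the bytes form valid UTF-8.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Legacy code page; undecodable bytes become U+FFFD instead of raising.
        text = raw.decode(legacy_encoding, errors="replace")
    return text.strip()


if __name__ == "__main__":
    print(decode_dbf_bytes("Zürich ".encode("cp1252")))
    print(decode_dbf_bytes("Kraków".encode("utf-8")))
```

The UTF-8-first order is a heuristic: a legacy byte sequence can occasionally be valid UTF-8 by coincidence, so a known source encoding should always take precedence over guessing.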

Memory & Pagination Limits

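ReportLab’s Table holds all row data in memory and computes its layout before drawing. For datasets exceeding roughly 10,000 records, switch to reportlab.platypus.LongTable or implement chunked iteration. LongTable is a Table subclass that uses a greedy algorithm for column-width calculation, trading some layout precision for speed on long tables; it does not stream rows, so memory use still scales with row count. To keep each layout pass small, emit the data as several smaller tables, and pair this with generator-based record parsing to avoid loading the entire .dbf into a list. When separating data extraction from layout generation, adopt Jinja2 Templating & Theme Logic patterns to keep styling rules externalized and reusable across multiple GIS exports.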
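
One possible shape for chunked iteration is a generator that slices any record iterator into fixed-size blocks and prepends the header row to each block, so that every block can become its own table. The names chunked_rows and chunk_size are hypothetical, and the sample generator below merely stands in for a pyshp record iterator.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence


def chunked_rows(
    header: Sequence[str],
    records: Iterable[Sequence[object]],
    chunk_size: int = 1000,
) -> Iterator[List[List[str]]]:
    """Yield table-ready blocks: the header row plus up to chunk_size rows."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(records)
    while True:
        # Pull at most chunk_size records without materializing the rest.
        block = list(islice(iterator, chunk_size))
        if not block:
            return
        rows = [["" if v is None else str(v).strip() for v in rec] for rec in block]
        yield [list(header)] + rows


if __name__ == "__main__":
    sample = ([i, f"Parcel {i}", None] for i in range(5))
    for chunk in chunked_rows(["ID", "NAME", "NOTE"], sample, chunk_size=2):
        print(len(chunk) - 1, "data rows")
```

Each yielded block can be passed to Table(block, repeatRows=1) and appended to the document flow, which bounds the size of every individual layout pass by chunk_size.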

Performance Optimization

  • Pre-calculate column widths: Measure header string lengths once rather than recalculating per row.
  • Keep layout work minimal: Leave splitByRow at its default and supply explicit colWidths (plus rowHeights when every row is a single line) so ReportLab does not have to measure each cell during layout.
  • Use pyshp over geopandas for pure attribute extraction: geopandas loads full geometries into memory, adding overhead when only .dbf fields are required.

See the official ReportLab Tables documentation for advanced styling hooks, and consult the pyshp repository for encoding flags and iterator optimizations.
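
The column-width advice can be sketched as a one-time estimate from header and sample cell lengths, scaled down when the total exceeds the printable width. The function name fit_col_widths and the per-character width are illustrative guesses rather than ReportLab defaults; the 12 pt padding assumes ReportLab's default 6 pt left and right cell padding. Measuring text with reportlab.pdfbase.pdfmetrics.stringWidth would be more accurate than counting characters.

```python
from typing import List, Sequence

A4_WIDTH_PT = 595.28          # approximate A4 page width in points
MARGIN_PT = 15 * 72 / 25.4    # 15 mm margin converted to points


def fit_col_widths(
    header: Sequence[str],
    sample_rows: Sequence[Sequence[str]],
    available: float = A4_WIDTH_PT - 2 * MARGIN_PT,
    char_pt: float = 5.0,      # assumed average glyph width at 9 pt
    padding_pt: float = 12.0,  # assumed left + right cell padding
) -> List[float]:
    """Estimate widths once from the longest text seen in each column."""
    longest = [len(h) for h in header]
    for row in sample_rows:
        for i, cell in enumerate(row[: len(longest)]):
            longest[i] = max(longest[i], len(cell))
    widths = [n * char_pt + padding_pt for n in longest]
    total = sum(widths)
    if total > available:
        # Shrink proportionally so the table fits between the margins.
        scale = available / total
        widths = [w * scale for w in widths]
    return widths
```

The result can be passed as colWidths when constructing the table. Sampling the first few hundred rows is usually enough for an estimate, although long outlier values in later rows may still wrap or overflow.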