Iterating through shapefile attributes in ReportLab
Iterating through shapefile attributes in ReportLab requires a two-stage pipeline: first, parse the .shp/.dbf vector data using a dedicated GIS library; second, feed the cleaned, Python-native records into ReportLab’s Table or LongTable flowables. ReportLab’s PDF engine does not read geospatial formats natively, so all attribute iteration, type coercion, and encoding normalization must occur in memory before the data reaches the layout stage.
Environment & Dependency Matrix
| Component | Minimum Version | Critical Notes |
|---|---|---|
| Python | 3.8+ | Predictable zip() behavior, modern type hints, and stable pathlib integration. |
| ReportLab | 3.6.12+ | Table pagination is memory-safe; earlier builds leak on datasets >5k rows. |
| pyshp | 2.3.0+ | Lightweight .dbf parser. Explicit encoding parameter prevents silent character corruption. |
| geopandas / fiona | 0.14+ / 1.9.0+ | Use only when CRS projection, geometry validation, or spatial joins precede table generation. |
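One way to verify the matrix before running an export is to compare installed versions against the minimums. The sketch below is only an illustration: the helper names `parse_version` and `check_minimums` are hypothetical, the version parsing is deliberately naive (leading digits of each dotted component), and the distribution names are assumed to be the PyPI names `reportlab` and `pyshp`.

```python
import re
from importlib import metadata

# Minimums copied from the table above; keys are assumed PyPI distribution names.
MINIMUMS = {"reportlab": "3.6.12", "pyshp": "2.3.0"}


def parse_version(text: str) -> tuple:
    """Turn '3.6.12' into (3, 6, 12), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first component without leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def check_minimums(minimums: dict = MINIMUMS) -> dict:
    """Return {name: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        ok = parse_version(installed) >= parse_version(minimum)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_minimums().items():
        print(f"{name}: installed={installed} meets_minimum={ok}")
```

Tuple comparison is what makes `3.6.9` sort below `3.6.12`, which plain string comparison would get wrong.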
Step-by-Step Data Pipeline
- Open & Parse: Load the shapefile with explicit encoding. The .dbf attribute table is accessed via the library’s record iterator.
- Extract Headers: Skip the mandatory DeletionFlag field (index 0) and collect the remaining column names.
- Sanitize Records: Convert None to empty strings, strip whitespace, and decode any bytes values to str. ReportLab’s text engine expects Unicode strings (or UTF-8 bytes) and fails on raw legacy cp1252 sequences.
- Map to Layout: Pass the header row and sanitized records to reportlab.platypus.Table. Apply TableStyle for grid lines, alternating row colors, and header formatting. For enterprise-scale exports, consider Loop Mapping for Dynamic Attribute Tables to decouple data iteration from presentation rules.
- Build PDF Flow: Attach the table to a SimpleDocTemplate or BaseDocTemplate flow. ReportLab automatically handles page breaks, header repetition (repeatRows=1), and margin constraints.
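The header-extraction and sanitizing steps can be tried without any shapefile on disk. The sketch below uses made-up sample data shaped like pyshp's field descriptors, (name, type, length, decimals) with DeletionFlag first. The helper names `sanitize` and `to_table_data` are hypothetical.

```python
# Made-up sample data in the shape pyshp exposes: field descriptors are
# (name, type, length, decimals) and the first one is always DeletionFlag.
sample_fields = [
    ("DeletionFlag", "C", 1, 0),
    ("NAME", "C", 50, 0),
    ("POP", "N", 10, 0),
]
sample_records = [
    ["  Zürich ", 421878],  # padded text and a numeric value
    [None, None],           # empty attributes arrive as None
]


def sanitize(value: object) -> str:
    """Return a stripped str for any record value; None becomes ''."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace").strip()
    return str(value).strip()


def to_table_data(fields: list, records: list) -> list:
    """Build the list-of-lists structure that a ReportLab Table accepts."""
    header = [field[0] for field in fields[1:]]  # skip DeletionFlag at index 0
    body = [[sanitize(value) for value in record] for record in records]
    return [header] + body


if __name__ == "__main__":
    for row in to_table_data(sample_fields, sample_records):
        print(row)
```

Every cell in the result is a plain `str`, so the layout stage never has to inspect types.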
Complete Implementation
```python
import os

import shapefile  # pyshp
from reportlab.lib import colors
from reportlab.lib.pagesizes import A4
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import mm
from reportlab.platypus import (
    Paragraph, SimpleDocTemplate, Spacer, Table, TableStyle
)


def safe_str(val: object) -> str:
    """Convert GIS record values to clean, stripped strings."""
    if val is None:
        return ""
    # Defensive fallback for values that arrive as undecoded bytes
    if isinstance(val, bytes):
        return val.decode("utf-8", errors="replace").strip()
    return str(val).strip()


def build_shapefile_report(shp_path: str, output_pdf: str, max_rows: int = 5000) -> None:
    # 1. Load shapefile with explicit encoding
    with shapefile.Reader(shp_path, encoding="cp1252") as sf:
        fields = [f[0] for f in sf.fields[1:]]  # Skip DeletionFlag

        # 2. Iterate records & sanitize while the reader is still open
        table_data = [fields]  # Header row
        for i, record in enumerate(sf.iterRecords()):
            if i >= max_rows:
                break
            table_data.append([safe_str(v) for v in record])

    # 3. Configure ReportLab table
    # Column widths in points, derived from header length and capped at 120 pt
    col_widths = [min(len(f) * 4 + 20, 120) for f in fields]
    table = Table(table_data, colWidths=col_widths, repeatRows=1)

    # 4. Apply styling
    table.setStyle(TableStyle([
        ("BACKGROUND", (0, 0), (-1, 0), colors.HexColor("#2C3E50")),
        ("TEXTCOLOR", (0, 0), (-1, 0), colors.white),
        ("ALIGN", (0, 0), (-1, 0), "CENTER"),
        ("FONTNAME", (0, 0), (-1, 0), "Helvetica-Bold"),
        ("FONTSIZE", (0, 0), (-1, -1), 9),
        ("BOTTOMPADDING", (0, 0), (-1, 0), 6),
        ("TOPPADDING", (0, 0), (-1, 0), 6),
        ("GRID", (0, 0), (-1, -1), 0.5, colors.grey),
        ("VALIGN", (0, 0), (-1, -1), "MIDDLE"),
        # Alternating body-row colors; the header row keeps its own background
        ("ROWBACKGROUNDS", (0, 1), (-1, -1), [colors.white, colors.HexColor("#F1F3F5")]),
    ]))

    # 5. Build PDF document (create the output directory if it is missing)
    os.makedirs(os.path.dirname(output_pdf) or ".", exist_ok=True)
    doc = SimpleDocTemplate(
        output_pdf,
        pagesize=A4,
        rightMargin=15 * mm,
        leftMargin=15 * mm,
        topMargin=15 * mm,
        bottomMargin=15 * mm,
    )
    styles = getSampleStyleSheet()
    elements = [
        Paragraph("Shapefile Attribute Report", styles["Title"]),
        Spacer(1, 6 * mm),
        table,
    ]
    doc.build(elements)


# Usage: build_shapefile_report("data/municipalities.shp", "output/report.pdf")
```
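The header-length heuristic in the implementation can add up to more than the printable width of an A4 page once a shapefile has many attribute columns. The sketch below scales the widths down proportionally when that happens. The function names are hypothetical, and the 6 pt frame padding per side is an assumption about ReportLab's default frame settings, so adjust it if your template differs.

```python
MM = 72 / 25.4  # points per millimetre


def printable_width_pt(page_width_mm: float = 210.0,
                       margin_mm: float = 15.0,
                       frame_padding_pt: float = 6.0) -> float:
    """Width available to a flowable: page minus both margins and frame padding.

    frame_padding_pt=6.0 is an assumed default, not a value read from ReportLab.
    """
    return (page_width_mm - 2 * margin_mm) * MM - 2 * frame_padding_pt


def fit_widths(headers: list, available_pt: float = None) -> list:
    """Return column widths in points that never exceed the available width."""
    if available_pt is None:
        available_pt = printable_width_pt()
    # Same heuristic as the implementation: 4 pt per character plus 20 pt, capped at 120 pt
    raw = [float(min(len(h) * 4 + 20, 120)) for h in headers]
    total = sum(raw)
    if total <= available_pt:
        return raw  # already fits, keep the heuristic widths
    scale = available_pt / total
    return [width * scale for width in raw]


if __name__ == "__main__":
    print(fit_widths(["NAME", "POP"]))
    print(fit_widths(["MUNICIPALITY_NAME"] * 10))
```

The result can be passed as `colWidths` in place of the raw heuristic. Very narrow columns will still clip their text, so this reduces overflow rather than guaranteeing readability.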
Critical Considerations
Encoding & Character Safety
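Shapefile .dbf tables historically default to cp1252 or latin1, and an optional .cpg sidecar file names the encoding when one is present. pyshp decodes text fields with the encoding passed to shapefile.Reader (UTF-8 by default), so a mismatched setting either raises UnicodeDecodeError or silently corrupts diacritics. Always pass the encoding explicitly and, where some character loss is acceptable, set encodingErrors="replace" or encodingErrors="ignore"; decode any bytes values that still reach your code during iteration. ReportLab’s PDF renderer expects clean Unicode strings; passing raw cp1252 bytes triggers a Unicode error during cell rendering. The built-in Helvetica font covers only Latin characters, so for advanced typography or CJK character support, embed a TrueType font via reportlab.pdfbase.ttfonts.TTFont and register it with pdfmetrics.registerFont before table construction.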
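Many shapefiles ship with a .cpg sidecar file that names the .dbf encoding, which gives a better starting point than guessing. The sketch below reads that file and falls back to cp1252 when it is missing or names an unknown codec. The helper names are hypothetical, and real .cpg files sometimes contain vendor-specific labels that Python's codec registry does not recognize, in which case the fallback applies.

```python
import codecs
from pathlib import Path


def normalize_cpg(declared: str, default: str = "cp1252") -> str:
    """Map the text of a .cpg file to a Python codec name, or return the default."""
    declared = declared.strip()
    if not declared:
        return default
    try:
        return codecs.lookup(declared).name
    except LookupError:
        return default  # unknown or vendor-specific label


def detect_dbf_encoding(shp_path: str, default: str = "cp1252") -> str:
    """Look for <name>.cpg next to <name>.shp and return a usable encoding."""
    cpg = Path(shp_path).with_suffix(".cpg")
    if not cpg.is_file():
        return default
    return normalize_cpg(cpg.read_text(encoding="ascii", errors="ignore"), default)


def decode_cell(raw: bytes, encoding: str) -> str:
    """Decode one raw attribute value, replacing undecodable bytes."""
    return raw.decode(encoding, errors="replace").strip()


if __name__ == "__main__":
    raw = b"Z\xfcrich "  # 'Zürich' stored as cp1252 bytes
    print(decode_cell(raw, "cp1252"))  # decodes cleanly
    print(decode_cell(raw, "utf-8"))   # wrong codec: the umlaut becomes U+FFFD
```

The detected name can then be supplied as the `encoding` argument of `shapefile.Reader` instead of a hard-coded value.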
Memory & Pagination Limits
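ReportLab’s Table holds all row data in memory and measures cells during layout. For datasets exceeding 10,000 records, switch to reportlab.platypus.LongTable or implement chunked iteration. LongTable is a Table subclass intended for long tables: it uses a greedy column-width calculation that favors speed over precision. It does not stream rows, so peak memory still grows with the row count; splitting the data into several smaller tables is the more dependable way to bound the cost of each layout pass. Pair this with generator-based record parsing, such as pyshp’s iterRecords(), to avoid loading the entire .dbf into a list up front. When separating data extraction from layout generation, adopt Jinja2 Templating & Theme Logic patterns to keep styling rules externalized and reusable across multiple GIS exports.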
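Chunked iteration can be sketched with a small generator that consumes any record iterator lazily and yields one table-sized block at a time, each led by the header row. The function name `chunk_rows` and the default chunk size are hypothetical choices, not values taken from ReportLab or pyshp.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunk_rows(header: List[str],
               records: Iterable[List[str]],
               size: int = 2000) -> Iterator[List[List[str]]]:
    """Yield table_data blocks of at most `size` body rows, each with the header first."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(records)
    while True:
        body = list(islice(iterator, size))  # pulls only `size` records at a time
        if not body:
            return
        yield [header] + body


if __name__ == "__main__":
    rows = ([str(i)] for i in range(5))  # stands in for a lazy record iterator
    for block in chunk_rows(["ID"], rows, size=2):
        print(block)
```

Each block can be turned into its own Table with repeatRows=1 and appended to the story. The story still holds every chunk until the document is built, so this limits the size of each individual table rather than the total memory footprint.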
Performance Optimization
- Pre-calculate column widths: Measure header string lengths once rather than recalculating per row.
- Skip unnecessary layout work: If your data contains no multi-line cells, pass explicit rowHeights along with colWidths so ReportLab does not have to measure each cell. Leave splitByRow at its default: splitByRow=False requests column-wise splitting, which ReportLab does not implement, and raises NotImplementedError as soon as the table has to break across pages.
- Use pyshp over geopandas for pure attribute extraction: geopandas loads full geometries into memory, adding overhead when only .dbf fields are required.

See the official ReportLab Tables documentation for advanced styling hooks, and consult the pyshp repository for encoding flags and iterator optimizations.