Data Gap Report

Jessar Industries · 2026-05-03 · Unified product database · Asset coverage index · Phase 1 content extraction pipeline (1,667 primary SKUs)
Total unique SKUs: 3,211 (across all sources)
Bilingual EN+FR: 2,659 (names in both languages)
Box artwork PDFs: 2,595 (provided by client)
Studio photos: 2,808 (Quasimodo product shots)
Have UPC: 2,506 (barcode populated)

Field coverage across all SKUs

Field                          Have   Missing  Coverage
Studio photo (Quasimodo)       2,808  403      87.4%
Product category               2,661  550      82.9%
EN name                        2,660  551      82.8%
FR name                        2,660  551      82.8%
Product subcategory            2,623  588      81.7%
Box artwork PDF                2,595  616      80.8%
UPC barcode                    2,506  705      78.0%
Short description              2,141  1,070    66.7%
Site categories                2,141  1,070    66.7%
Product images (current site)  2,138  1,073    66.6%
Appears in a catalog PDF       1,715  1,496    53.4%
WP full description            0      3,211    0.0%
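
Each row above follows from two numbers: the count of SKUs that have the field and the total of 3,211 unique SKUs. A minimal sketch of that arithmetic, assuming coverage is simply have / total rounded to one decimal (the function name `coverage` is illustrative, not part of the pipeline):

```python
def coverage(have: int, total: int = 3211) -> tuple:
    """Return (missing, coverage_percent) for one field.

    Assumes coverage = have / total, rounded to one decimal place.
    The default total is the report's count of unique SKUs.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= have <= total:
        raise ValueError("have must be between 0 and total")
    missing = total - have
    return missing, round(100 * have / total, 1)


# Studio photo row: 2,808 of 3,211 SKUs
print(coverage(2808))  # (403, 87.4)
```

The same call with `total=1667` applies to the primary-tier table later in the report.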

Per-SKU completeness score distribution

Score band  SKUs   Distribution
80–100      1,611  50.2%
60–79       3      0.1%
40–59       1,053  32.8%
20–39       521    16.2%
0–19        23     0.7%
0.7%

Migration action breakdown

Action       SKUs   Meaning
preserve     1,100  Live on jessar.ca AND active per masters → migrate as-is
add          567    Active per masters but missing from current site → must add
cleanup      81     Live on jessar.ca but discontinued per masters → remove before migration
investigate  366    Live on jessar.ca but not in any master → status unclear, investigate
skip         1,097  Not live and not active, OR excluded → out of scope
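
The five actions above reduce to a small decision rule over two facts per SKU: whether it is live on jessar.ca, and what the masters say about it. The sketch below is one way to express that rule. The function name, the string encoding of `master_status`, and the `excluded` flag are assumptions made for illustration; the report does not describe how the pipeline stores these values.

```python
from typing import Optional


def migration_action(live_on_site: bool,
                     master_status: Optional[str],
                     excluded: bool = False) -> str:
    """Map one SKU's flags to a migration action.

    master_status is "active", "discontinued", or None when the SKU
    appears in no master list (a hypothetical encoding).
    """
    if excluded:
        return "skip"  # explicitly out of scope
    if master_status == "active":
        # Active SKUs are kept if already live, otherwise added
        return "preserve" if live_on_site else "add"
    if live_on_site:
        # Live but not active: discontinued ones are removed,
        # anything else has unclear status
        return "cleanup" if master_status == "discontinued" else "investigate"
    return "skip"  # not live and not active


print(migration_action(True, None))  # investigate
```

Under this rule the preserve and add groups together are exactly the active SKUs: 1,100 + 567 = 1,667, the same count as the primary tier in the extraction section.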

Content Extraction Pipeline

Phase 1 output — 1,667 primary SKUs · box artwork PDFs + WooCommerce specs + Excel merge · 2026-05-03

Primary SKUs processed: 1,667 (entire primary tier)
PDFs extracted: 1,557 (110 have no PDF on drive)
Avg fields populated: 8.3/17 (across all primary SKUs)
Fully populated: 0 (no SKU hits all 17 fields*)
Weight sourced from Excel: 969 (Items-Jessar for Sean.xlsx)

Extracted field coverage — 1,667 primary SKUs

Fields marked ⚑ category-specific have lower nominal coverage because they only apply to a subset of products — see notes below the table.

Field                     Have   Missing  Coverage  Status / Source
Product name (EN)         1,556  111      93.3%     From PDF + DB
Product name (FR)         1,556  111      93.3%     From PDF + DB
Dimensions                1,570  97       94.2%     PDF text + Excel (W×L×D)
Weight                    1,005  662      60.3%     Excel poid-stoc (kg); gaps may lack data in any source
Features (EN)             1,030  637      61.8%     Bullet points from PDF
Features (FR)             988    679      59.3%     Bullet points from PDF
Country of origin         911    756      54.6%     From PDF text
Materials                 898    769      53.9%     From PDF text
Romance copy (EN)         822    845      49.3%     → Generate for 845 SKUs
Romance copy (FR)         765    902      45.9%     → Generate for 902 SKUs
Warranty                  480    1,187    28.8%     From PDF text
Voltage ⚑                 609    1,058    36.5%     76% within electrical/lighting (678 SKUs)
Wattage ⚑                 503    1,164    30.2%     60.5% within electrical/lighting (678 SKUs)
Capacity                  337    1,330    20.2%     Only for applicable products (cookware, storage)
Certifications ⚑          234    1,433    14.0%     → Logos in PDF images; needs vision pass
Care instructions (EN) ⚑  206    1,461    12.4%     33% within KITCH (608 SKUs); N/A for lighting
Care instructions (FR) ⚑  181    1,486    10.9%     33% within KITCH; N/A for lighting/electrical

* "Fully populated" at 17/17 would require certifications and care on every product, including bulbs and fixtures where those fields are structurally inapplicable. Adjusted completeness (excluding N/A fields per category) is tracked separately in content_gaps.csv.
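
The adjusted figure described in the footnote can be sketched as follows: drop the fields that do not apply to a product, then score the rest. This is only an illustration. The field identifiers, the function name, and the example N/A set are hypothetical, and the actual logic behind content_gaps.csv is not shown in this report.

```python
from typing import Iterable

# Hypothetical identifiers for the 17 extracted fields
ALL_FIELDS = (
    "name_en", "name_fr", "dimensions", "weight",
    "features_en", "features_fr", "country_of_origin", "materials",
    "romance_en", "romance_fr", "warranty", "voltage", "wattage",
    "capacity", "certifications", "care_en", "care_fr",
)


def adjusted_completeness(populated: Iterable[str],
                          not_applicable: Iterable[str]) -> float:
    """Percent of applicable fields that are populated, to one decimal."""
    populated = set(populated)
    not_applicable = set(not_applicable)
    applicable = [f for f in ALL_FIELDS if f not in not_applicable]
    if not applicable:
        return 0.0  # nothing to score
    hits = sum(1 for f in applicable if f in populated)
    return round(100 * hits / len(applicable), 1)


# Example: a bulb with six fields filled; care, certifications and
# capacity are treated as N/A (an assumed mapping for illustration)
bulb = {"name_en", "name_fr", "dimensions", "weight", "voltage", "wattage"}
na = {"care_en", "care_fr", "certifications", "capacity"}
print(adjusted_completeness(bulb, na))  # 46.2
print(adjusted_completeness(bulb, ()))  # 35.3 (nominal, out of 17)
```

The same six populated fields score 35.3% nominally but 46.2% once the four inapplicable fields leave the denominator.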

Extraction score distribution — 1,667 primary SKUs · out of 17 possible fields

Score band              SKUs  Distribution
80–100 (14–17 fields)   24    1.4%
60–79 (11–13 fields)    475   28.5%
40–59 (7–10 fields)     657   39.4%
20–39 (4–6 fields)      383   23.0%
0–19 (0–3 fields)       128   7.7%

The 128 SKUs at 0–19 are predominantly no-PDF records (no box artwork provided). The 383 at 20–39 typically have name + dimensions + weight from Excel, but no PDF text content.
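
To see which field counts land in which band, the sketch below assumes the extraction score is the plain share of the 17 fields that are populated, expressed as a percentage. The report does not state the formula, so treat this as one plausible reading; the function names are illustrative.

```python
def band_floor(fields_populated: int, total_fields: int = 17) -> int:
    """Return the lower bound (0, 20, 40, 60 or 80) of the score band.

    Assumes score = 100 * fields_populated / total_fields.
    """
    if not 0 <= fields_populated <= total_fields:
        raise ValueError("fields_populated out of range")
    score = 100 * fields_populated / total_fields
    # A perfect score of 100 belongs to the top band, which starts at 80
    return min(int(score // 20) * 20, 80)


def counts_per_band(total_fields: int = 17) -> dict:
    """Group every possible field count by the band it falls into."""
    bands = {}
    for n in range(total_fields + 1):
        bands.setdefault(band_floor(n, total_fields), []).append(n)
    return bands


print(counts_per_band())
# {0: [0, 1, 2, 3], 20: [4, 5, 6], 40: [7, 8, 9, 10],
#  60: [11, 12, 13], 80: [14, 15, 16, 17]}
```

Under this assumption each field count belongs to exactly one band: a SKU with 10 fields scores 58.8 and falls in 40–59, and a SKU with 3 fields scores 17.6 and falls in 0–19.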