USE CASE · RETAIL ANALYTICS · UK

Reducing Overstock Losses
With Scraped Aldi UK Data

Aldi UK’s price scrapes contain signals most retailers pay consultants millions to find. This use case shows how weekly product, pricing and availability data can cut overstock write-downs by flagging slow-movers before they become losses.

£4.2M
Estimated annual overstock write-down
28%
Loss reduction achievable via data
3.4×
ROI on scraping infrastructure
~1,900
Active SKUs scraped weekly

// 01 – The Problem

Why does Aldi specifically have an overstock problem?

Aldi UK operates on a high-velocity, lean SKU model – roughly 1,900 core lines compared to Tesco’s 40,000+. That sounds like it should make inventory easy. But there’s a catch: Aldi’s legendary Specialbuys (WIGIG – When It’s Gone, It’s Gone) aisle introduces 30–50 new non-food items every Thursday and Sunday. These have no sales history, no demand curve, and no reorder trigger.

When a Specialbuy misses its sales target – because the weather turned, a competitor ran a better deal, or the item just didn’t resonate – it doesn’t get repriced quietly. It sits at full price until Aldi decides to mark it down, by which point the opportunity cost of shelf space and working capital has already compounded.

⚠

The core tension: Aldi’s EDLP (everyday low price) model means it rarely runs mid-cycle promotions. Unlike Tesco or Sainsbury’s, which use loyalty-card deals to clear slow stock, Aldi has fewer levers to pull. By the time a markdown decision is made, write-downs are often 40–60% of original cost.

PROBLEM_01

No Sell-Through Visibility

Aldi doesn’t publish real-time stock levels. Buyers rely on weekly sales reports with a 5–7 day lag – too slow for Specialbuys with a 2-week shelf life.

−£1.1M/yr
PROBLEM_02

Demand Forecast Error

Seasonal and novelty items routinely miss forecast by ±35%. No external price benchmarking means buyers can’t tell if it’s a price issue or a demand issue.

−£1.8M/yr
PROBLEM_03

Competitor Blind Spots

When Lidl runs a parallel Specialbuy at 15% less, Aldi’s version stalls – but nobody realises it until the inventory review, weeks into the missed repricing window.

−£1.3M/yr

// 02 – The Data Pipeline

What does the Aldi UK scrape actually collect?

Aldi UK’s website (aldi.co.uk) exposes product listings, category pages, and the Specialbuys calendar – all accessible without authentication. A well-structured scraper can extract a rich dataset at two cadences: weekly core grocery prices and twice-weekly Specialbuy launches.

🌐
Source
aldi.co.uk product pages + Specialbuys calendar. Category sitemaps for full crawl.
HTML / JSON-LD
🕷️
Extract
Playwright/BeautifulSoup scraper. Captures name, price, SKU, availability, category, image alt text.
Python · Weekly
🔧
Transform
Normalise categories, deduplicate SKUs, compute week-over-week price delta and availability flags.
Pandas / dbt
🗄️
Store
Time-series table in PostgreSQL. Each row = one SKU snapshot per week. ~100K rows/year.
Postgres · S3
📊
Signal
Overstock risk score computed per SKU: velocity rank + price stasis + Specialbuy age + competitor gap.
Risk Model
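The Extract step can be prototyped without a browser at all when a page embeds JSON-LD product metadata. A minimal sketch, assuming an illustrative page snippet – the markup below is invented for demonstration, not Aldi’s actual HTML, and a production scraper would drive Playwright against live pages instead:

```python
import json
import re

# Hypothetical product-page fragment with schema.org JSON-LD metadata.
SAMPLE_HTML = """
<script type="application/ld+json">
{"@type": "Product", "name": "Gardenline 6-Piece Rattan Set",
 "sku": "ALU-GB-SB-091234",
 "offers": {"price": "299.99", "priceCurrency": "GBP",
            "availability": "https://schema.org/InStock"}}
</script>
"""

def extract_product(html: str) -> dict:
    """Pull the JSON-LD Product block out of a product page."""
    match = re.search(
        r'<script type="application/ld\+json">(.*?)</script>',
        html, re.DOTALL)
    data = json.loads(match.group(1))
    offer = data["offers"]
    return {
        "sku_id": data["sku"],
        "product_name": data["name"],
        "price_gbp": float(offer["price"]),
        # Map the schema.org URI onto the dataset's availability vocabulary.
        "availability": "in_stock"
            if offer["availability"].endswith("InStock") else "out_of_stock",
    }

record = extract_product(SAMPLE_HTML)
```

JSON-LD is worth preferring over CSS selectors where it exists: the schema.org field names are standardised, so the parser survives cosmetic redesigns of the page.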

Here’s what a single Aldi UK scraped product record looks like in the database:

// Scraped product record – aldi.co.uk · Week 14, 2025
{
  "sku_id": "ALU-GB-SB-091234",
  "product_name": "Gardenline 6-Piece Rattan Set",
  "category": "Specialbuys / Garden",
  "price_gbp": 299.99,
  "price_prev_week": 299.99,
  "price_delta_pct": 0.0,
  "availability": "in_stock",
  "availability_flag": "persistent",   // still in stock 3+ weeks
  "specialbuy_launch": "2025-03-27",
  "weeks_since_launch": 3,
  "competitor_price": 259.00,          // Lidl equivalent, same week
  "comp_gap_pct": 15.7,
  "overstock_risk_score": 87,          // 0–100, HIGH above 70
  "recommended_action": "MARKDOWN_NOW"
}
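The derived fields in the record – price delta, persistence flag, competitor gap – come from the Transform step. A sketch of that derivation, assuming the field names of the example record; the 3-week persistence threshold and the gap convention (relative to the competitor price) are illustrative assumptions:

```python
def derive_fields(current: dict, previous, weeks_in_stock: int) -> dict:
    """Compute week-over-week fields from raw snapshots.

    current/previous are raw scrape snapshots; previous may be None
    on a SKU's first appearance.
    """
    prev_price = previous["price_gbp"] if previous else current["price_gbp"]
    delta_pct = round(
        (current["price_gbp"] - prev_price) / prev_price * 100, 1)
    comp_gap_pct = round(
        (current["price_gbp"] - current["competitor_price"])
        / current["competitor_price"] * 100, 1)
    return {
        **current,
        "price_prev_week": prev_price,
        "price_delta_pct": delta_pct,
        # "persistent" = in stock for 3+ consecutive weekly snapshots
        "availability_flag": "persistent" if weeks_in_stock >= 3 else "normal",
        "comp_gap_pct": comp_gap_pct,
    }

row = derive_fields(
    {"price_gbp": 299.99, "competitor_price": 259.00},
    {"price_gbp": 299.99},
    weeks_in_stock=3)
```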

// 03 – Data Schema

Every field in the scraped dataset – and why it matters for overstock

Not all scraped fields are equal. Some exist purely for identification, others are the actual signal that tells you whether something is turning into dead stock. Here’s the full annotated schema.


// 04 – Live Dataset Sample

Scraped Aldi UK SKUs – Overstock Risk Dashboard

Below is a sample of 22 SKUs from a live scrape – a mix of core grocery lines and Specialbuys. The Risk Score column is the composite overstock signal. Filter by category and sort any column. Products scoring above 70 are flagged for immediate review.

[Interactive table – 22 records. Columns: Product · Category · Price · Wks Live · Comp Gap · Stock Level · Risk Score · Action]

// Risk Score = (weeks_live × 8) + (comp_gap × 1.2) + (stock_pct × 0.6) · capped 0–100 · scraped data illustrative
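The dashboard’s scoring formula, written out as code. The weights (8, 1.2, 0.6), the 0–100 cap, and the 70 review threshold come straight from the footnote above; the 75% remaining-stock input below is an assumed figure for the rattan set example:

```python
def overstock_risk_score(weeks_live: int, comp_gap_pct: float,
                         stock_pct: float) -> int:
    """Composite overstock signal, capped to 0-100. Scores above 70
    flag a SKU for immediate buyer review."""
    raw = weeks_live * 8 + comp_gap_pct * 1.2 + stock_pct * 0.6
    return int(max(0, min(100, raw)))

# The rattan set from the example record: 3 weeks live, 15.7% gap,
# assume 75% of initial stock still on shelf.
score = overstock_risk_score(3, 15.7, 75)
```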


// 05 – Sales Velocity

Which products are moving – and which are stalling?

Velocity is units sold per week, inferred from availability changes across scrape snapshots. A product that was “in_stock” week 1 and shows “low_stock” by week 3 has high velocity. One that stays “in_stock” for 5+ weeks without a price change is your overstock candidate.

// ESTIMATED WEEKLY VELOCITY – TOP 10 MONITORED SKUs
[Bar chart legend: Fast mover (>80/wk) · Moderate (30–80) · Slow (<30/wk)]

📡

How velocity is inferred from scraped data: Aldi doesn’t publish unit sales. But availability status (in_stock → low_stock → sold_out) gives a directional signal. Combined with store count estimates and category seasonality coefficients, you can model relative velocity within ±20% – good enough for a markdown trigger.
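The availability-transition heuristic above can be sketched as a simple classifier. The status values match the scrape’s availability field; the rank scoring and class boundaries are illustrative assumptions, not calibrated figures:

```python
# Ordered stock states: further along = more depleted.
STATUS_RANK = {"in_stock": 0, "low_stock": 1, "sold_out": 2}

def velocity_class(weekly_statuses: list) -> str:
    """Classify relative sell-through from a series of weekly snapshots.

    Measures how far the availability state moved per week observed.
    A SKU that never leaves in_stock is the overstock candidate.
    """
    ranks = [STATUS_RANK[s] for s in weekly_statuses]
    weeks = len(ranks) - 1
    if weeks == 0:
        return "unknown"            # one snapshot: no transition to read
    rate = (ranks[-1] - ranks[0]) / weeks
    if rate >= 0.5:
        return "fast"
    if rate > 0:
        return "moderate"
    return "slow"                   # no movement across snapshots
```

A real model would also fold in the store-count and seasonality coefficients mentioned above; this sketch keeps only the directional core.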


// 06 – Loss Breakdown

Where the £4.2M in overstock losses comes from

Overstock losses aren’t just the unsold stock. They include the opportunity cost of tied-up working capital, the cost of markdown decisions made too late, and logistics and disposal costs. The breakdown below shows estimated annual losses per category – and how much a data-driven approach could recover.

// TOTAL ANNUAL OVERSTOCK COST
£4,200,000
// RECOVERABLE VIA SCRAPE SIGNALS
£1,180,000
✓

The recovery mechanism is straightforward: When the scraper flags a Specialbuy as “persistent stock” in week 3 with a competitor gap above 10%, an automated alert triggers a buyer review. Moving the markdown decision from week 6 to week 3 doubles the sell-through window at a reduced price – turning a 55% write-down into a 25% markdown.
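The alert rule just described is simple enough to state directly as code. A sketch, using the thresholds from the text and the record shape from the earlier example; `markdown_alert` is a hypothetical helper name:

```python
def markdown_alert(record: dict) -> bool:
    """Trigger a buyer review when a Specialbuy is still persistent
    stock in week 3+ and a competitor undercuts by more than 10%."""
    return (
        record["availability_flag"] == "persistent"
        and record["weeks_since_launch"] >= 3
        and record["comp_gap_pct"] > 10.0
    )

# The rattan set from the example record fires the alert.
rattan = {"availability_flag": "persistent",
          "weeks_since_launch": 3,
          "comp_gap_pct": 15.7}
```

In practice this rule and the composite risk score would run side by side: the hard rule catches the textbook case early, the score ranks everything else.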


// 07 – Before vs After

How the scraping workflow changes the buying decision process

The difference isn’t in having better buyers – it’s in giving buyers the right data at the right time. Here’s what the Specialbuy management workflow looks like with and without scraped price intelligence.

Without Scrape Data
Flying Blind
🕐 Buyer reviews inventory weekly – 5–7 day lag on sales data from ERP system
❓ No competitor price visibility – Lidl running £40 lower is invisible until damage is done
📅 Markdown decision made at week 6–8 when stock is visibly stuck – too late
📉 Typical write-down: 50–60% of cost price. Disposal to clearance channels.
🔄 Buyer manually checks 30–50 new Specialbuy lines per week – cognitive overload
VS
With Scrape Data
Data-Driven Buying
⚡ Automated weekly scrape supplements ERP with real-time availability signals and price stasis flags
🎯 Competitor gap tracked automatically – buyer alerted when Lidl/B&Q diverges by >10%
📢 Markdown recommendation triggered at week 3 when risk score crosses 70 threshold
💰 Typical write-down: 20–25%. Sells through at reduced price before clearance needed.
🤖 Risk dashboard auto-prioritises top 10 highest-risk SKUs – buyer reviews in 20 mins/week

// 08 – ROI Calculator

Model the impact for your own retail operation

The numbers above are calibrated to Aldi UK’s scale. Plug in your own figures to estimate what a scraping-based overstock signal could save your operation.

// OVERSTOCK LOSS REDUCTION ESTIMATOR
[Interactive calculator – input: Annual Overstock Cost · outputs: Recoverable via Signals, Scrape Infra Cost/yr, Net ROI]
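The estimator’s arithmetic is straightforward. A sketch using the headline 28% recoverable share; the £268k/yr infrastructure cost is a hypothetical input chosen for illustration, not a figure from the use case:

```python
def roi_estimate(annual_overstock_cost: float,
                 recoverable_share: float,
                 infra_cost: float) -> dict:
    """Net ROI multiple on scraping infrastructure: recovered losses
    minus infra spend, expressed relative to infra spend."""
    recoverable = annual_overstock_cost * recoverable_share
    return {
        "recoverable_gbp": recoverable,
        "net_roi_multiple": round((recoverable - infra_cost) / infra_cost, 1),
    }

# Aldi-scale inputs: £4.2M losses, 28% recoverable, £268k/yr infra (assumed).
result = roi_estimate(4_200_000, 0.28, 268_000)
```

With those inputs the recoverable figure lands near the £1.18M quoted above, and the net multiple near the 3.4× headline ROI.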

// 09 – Limitations & Caveats

What this approach cannot tell you

Scraped data is genuinely powerful – but it isn’t a crystal ball. Anyone building a production overstock model on it needs to be honest about its gaps.

⚙

Availability ≠ units sold. Scraped availability flags (in_stock / low_stock) are a proxy signal, not actual sell-through data. A product can stay “in_stock” because Aldi replenished it from a regional depot – the signal would look like slow movement even if the item is selling fine. Always cross-reference with internal EPOS data when the signal fires.

🌐

Scraping is a signal layer, not a source of truth. The scraped dataset reflects Aldi’s online product listing โ€” which may not perfectly match in-store availability. Regional store differences, online-only exclusives and distribution delays all create noise. Treat risk scores as prioritisation cues, not hard decisions.

⚖

Legal & ToS considerations: Scraping aldi.co.uk for internal business intelligence sits in a grey area. Publicly accessible pricing data is generally permissible in UK law, but bulk automated requests may violate Aldi’s ToS. Always consult legal counsel, rate-limit requests, and explore whether Aldi’s data partnerships or price comparison feeds offer the same data through a compliant channel.