Filtering Fake Sale Prices Using Historical Averages

A retailer slaps a “was $89.99, now $44.99” banner on a product that has quietly sold at $46 for six months, and your competitive feed records a 50% markdown that never really happened. Anchor pricing and phantom “sales” are the single most corrosive noise source in a price index: they fire false repricing signals, inflate perceived competitor aggression, and bias every trend model downstream. This guide is a focused, runnable recipe for flagging those fake markdowns by comparing each new observation against a robust historical baseline rather than the retailer’s own inflated reference price.

It sits under the parent guide, Statistical Outlier Detection for Price Data, and assumes the monetary baseline is already clean: currencies are aligned per Converting Multi-Currency Prices to a Base Currency, and genuine multi-buy offers are resolved per Parsing Complex Promotional Discount Structures. This stage therefore only has to decide whether a discount is statistically real.

A fake sale is defined here precisely: a price point that violates the historical baseline threshold while lacking corresponding inventory-depletion signals or verified promotional metadata. The job of this stage is to compute that baseline robustly, gate each new reading against it, and never silently drop a row — flagged records are tagged, not deleted.

Prerequisites & Input Contract

Each record must arrive already normalized to a single base currency, tax-resolved, and matched to a canonical sku_id before the filter runs. Tax and shipping stripping is handled upstream by the tax and shipping cost normalization rules stage; if those fees leak into normalized_price, regional checkout variance will inflate the baseline and trigger false flags across multi-region feeds.

# Input contract: one normalized observation per (sku_id, timestamp).
# Sorted ascending by timestamp within each sku_id by the caller, or sorted here.
record = {
    "timestamp": "2026-06-21T00:00:00Z",  # ISO-8601, one snapshot per scrape cycle
    "sku_id": "p-44182",                   # canonical product id (post-matching)
    "normalized_price": 46.00,             # base-currency, tax/shipping stripped
    "promo_flag": False,                   # True if a verified promo was parsed upstream
}
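
The contract above can be checked mechanically before the filter runs. The following is a minimal sketch rather than part of the original pipeline: the function name validate_input and the specific checks are assumptions chosen for illustration.

```python
import pandas as pd

REQUIRED_COLUMNS = ["timestamp", "sku_id", "normalized_price", "promo_flag"]

def validate_input(df: pd.DataFrame) -> list[str]:
    """Return a list of contract violations; an empty list means the frame is usable."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        # Without the required columns the remaining checks cannot run.
        return [f"missing columns: {missing}"]
    problems = []
    if df["normalized_price"].isna().any() or (df["normalized_price"] <= 0).any():
        problems.append("normalized_price must be present and positive")
    if df.duplicated(subset=["sku_id", "timestamp"]).any():
        problems.append("more than one observation per (sku_id, timestamp)")
    if df["promo_flag"].dtype != bool:
        problems.append("promo_flag must be a boolean column")
    return problems
```

Returning a list instead of raising lets the caller decide whether a violation should quarantine the batch or merely be logged.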

Library versions used throughout: Python 3.11+, pandas>=2.1, and numpy>=1.26. Install with pip install "pandas>=2.1" "numpy>=1.26". Snapshots must be daily: a 90-day exponential window assumes one observation per SKU per day, and hourly feeds inject micro-noise that inflates the deviation estimate. Down-sample intraday captures to a daily close before this stage.
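
The down-sampling step can be sketched as follows. This is an illustration under stated assumptions: "daily close" is taken to mean the last capture in each UTC calendar day, a day counts as promoted if any of its captures carried a verified promo, and the helper name to_daily_close is hypothetical.

```python
import pandas as pd

def to_daily_close(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse intraday captures to one row per (sku_id, UTC day)."""
    df = df.copy()
    df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)
    df = df.sort_values(["sku_id", "timestamp"])
    # Bucket each capture into its UTC calendar day.
    df["day"] = df["timestamp"].dt.floor("D")
    daily = df.groupby(["sku_id", "day"], as_index=False).agg(
        normalized_price=("normalized_price", "last"),  # last capture of the day
        promo_flag=("promo_flag", "max"),               # True if any capture was a promo
    )
    return daily.rename(columns={"day": "timestamp"})
```

If your scrape cycle is aligned to a retailer's local trading day rather than UTC, convert the timestamps to that zone before flooring.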

Step-by-Step Implementation

The filter builds two independent baselines per SKU, derives a robust deviation score, and flags a reading only when it is both statistically improbable and directed downward (a fake sale is a phantom discount, not a price rise).

Step 1 — Enforce dtypes and per-SKU ordering

Rolling operations are order-sensitive and memory-hungry. Coerce types once, parse timestamps, and sort within each SKU so every window sees its own product’s history and nothing else.

import pandas as pd
import numpy as np

def _prepare(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["normalized_price"] = df["normalized_price"].astype(np.float32)
    df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)
    return df.sort_values(["sku_id", "timestamp"]).reset_index(drop=True)

Step 2 — Build a rolling EMA baseline

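An exponential moving average weights recent behaviour more heavily while preserving long-term equilibrium, so a one-day flash sale barely moves it, while a genuine sustained drop pulls it down gradually (at alpha = 0.03 an old price's weight halves roughly every 23 observations). Use an explicit alpha (the smoothing factor) and a min_periods warm-up equal to the window. Pass alpha on its own: ewm accepts only one of com, span, halflife, or alpha, and rejects span and alpha together. The warm-up also means the EMA is NaN for the first ema_window - 1 observations of each SKU, so no row can be flagged before then.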

def _ema_baseline(grouped, alpha: float, warmup: int) -> pd.Series:
    # Per-SKU exponential mean; min_periods suppresses output until the window fills.
    return grouped.transform(
        lambda x: x.ewm(alpha=alpha, adjust=False, min_periods=warmup).mean()
    )
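
To relate alpha to a window length, it helps to work the arithmetic. The sketch below uses the pandas span convention (alpha = 2 / (span + 1)) and the standard half-life formula; the helper names are illustrative only.

```python
import math

def alpha_from_span(span: float) -> float:
    """pandas' span convention: alpha = 2 / (span + 1)."""
    return 2.0 / (span + 1.0)

def half_life(alpha: float) -> float:
    """Observations needed for an old price's weight to fall by half."""
    return math.log(0.5) / math.log(1.0 - alpha)

print(round(alpha_from_span(90), 4))  # span-equivalent alpha for a 90-observation window
print(round(half_life(0.03), 1))      # half-life at the default ema_alpha
```

A 90-observation span corresponds to an alpha of about 0.022, so the default of 0.03 reacts slightly faster than a strict 90-day span would.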

Step 3 — Build a rolling median + MAD baseline

Fast-moving goods and seasonal electronics rarely follow a Gaussian distribution, so a mean-based spread is easily dragged by a single loss-leader. The median resists that, and the Median Absolute Deviation (MAD) gives a robust scale estimate. The MAD must itself be rolling — a series-wide median(|x − rolling_median|) collapses to one scalar per SKU and erases all temporal sensitivity.

def _robust_baseline(df, grouped, window: int):
    rolling_median = grouped.transform(
        lambda x: x.rolling(window=window, min_periods=1).median()
    )
    abs_dev = (df["normalized_price"] - rolling_median).abs()
    mad = abs_dev.groupby(df["sku_id"]).transform(
        lambda x: x.rolling(window=window, min_periods=1).median()
    )
    return rolling_median, mad
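
The difference between the two families of estimators is easy to see on a toy window. The prices below are made up for illustration: nine ordinary days plus one loss-leader.

```python
import numpy as np

# Illustrative prices: nine ordinary days plus one loss-leader at 9.99.
prices = np.array([45.5, 46.0, 46.2, 45.9, 46.1, 46.0, 45.8, 46.3, 46.0, 9.99])

mean, std = prices.mean(), prices.std()
median = np.median(prices)
mad = np.median(np.abs(prices - median))
scaled_mad = 1.4826 * mad  # comparable to a standard deviation under normality

print(f"mean={mean:.2f} std={std:.2f}")
print(f"median={median:.2f} scaled_mad={scaled_mad:.2f}")
```

The single loss-leader drags the mean several dollars below the typical price and inflates the standard deviation to more than ten dollars, while the median stays at 46.00 and the scaled MAD stays well under a dollar.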

Step 4 — Derive a robust z-score and an IQR band

Scale the MAD by the constant 1.4826 so it approximates a standard deviation for normally distributed data, giving a robust z-score that is far less twitchy than (x − mean) / std. Pair it with a rolling Interquartile Range (IQR) band as an independent second opinion.

def _scores(df, grouped, rolling_median, mad, window: int):
    df["z_score_mad"] = (
        (df["normalized_price"] - rolling_median) / (mad * 1.4826 + 1e-6)
    )
    q1 = grouped.transform(lambda x: x.rolling(window, min_periods=1).quantile(0.25))
    q3 = grouped.transform(lambda x: x.rolling(window, min_periods=1).quantile(0.75))
    df["iqr"] = q3 - q1
    return q1, q3
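
One consequence of the small constant in the denominator is worth seeing with numbers: when a window is perfectly flat the MAD is 0, and the score is then governed by the 1e-6 floor alone. The sketch below reproduces the z-score formula for single values; the mad_floor parameter is a hypothetical guard added for illustration and is not part of the recipe.

```python
def robust_z(price: float, median: float, mad: float,
             eps: float = 1e-6, mad_floor: float = 0.0) -> float:
    """Robust z-score; mad_floor is a hypothetical guard, not part of the recipe."""
    scale = max(mad, mad_floor) * 1.4826 + eps
    return (price - median) / scale

noisy = robust_z(44.99, median=46.0, mad=0.50)   # ordinary day-to-day noise
flat = robust_z(44.99, median=46.0, mad=0.0)     # perfectly flat history
floored = robust_z(44.99, median=46.0, mad=0.0, mad_floor=0.46)  # floor at 1% of median

print(noisy, flat, floored)
```

With a MAD of 0.50 the one-dollar move scores well inside a 2.5 threshold; with a MAD of 0 the same move scores in the millions. Whether to add a floor, and at what level, is a tuning decision for your own feed.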

Step 5 — Flag downward outliers and suppress cold starts

A reading is a fake-sale candidate when it breaches the z-score threshold or the IQR fence, sits below the EMA baseline, and carries no verified promo_flag. New SKUs without enough history are forced negative so a sparse window never produces a confident flag.

def compute_price_outlier_flags(
    df: pd.DataFrame,
    ema_window: int = 90,
    ema_alpha: float = 0.03,
    mad_window: int = 60,
    sigma_threshold: float = 2.5,
    iqr_multiplier: float = 1.5,
) -> pd.DataFrame:
    """Flag phantom markdowns against robust per-SKU historical baselines."""
    df = _prepare(df)
    grouped = df.groupby("sku_id")["normalized_price"]

    df["price_ema"] = _ema_baseline(grouped, ema_alpha, ema_window)
    rolling_median, mad = _robust_baseline(df, grouped, mad_window)
    df["mad"] = mad
    q1, q3 = _scores(df, grouped, rolling_median, mad, mad_window)

    sigma_violation = df["z_score_mad"].abs() > sigma_threshold
    iqr_violation = (
        (df["normalized_price"] < (q1 - iqr_multiplier * df["iqr"]))
        | (df["normalized_price"] > (q3 + iqr_multiplier * df["iqr"]))
    )
    df["is_fake_sale"] = (
        (sigma_violation | iqr_violation)
        & (df["normalized_price"] < df["price_ema"])  # only downward phantom drops
        & (~df["promo_flag"])                          # trust verified promos
    )
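    # Cold-start guard: too little history so far -> never flag.
    # cumcount() is the running per-SKU row count; transform("count") would be the
    # SKU's total and would let the early rows of a long series through.
    df.loc[grouped.cumcount() + 1 < mad_window, "is_fake_sale"] = False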
    return df

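Running this against a SKU with a perfectly flat $46.00 history (long enough to clear the 90-observation EMA warm-up) and a single one-day dip to $44.99 yields is_fake_sale = True on the dip row and False everywhere else. The displayed “was $89.99” reference price never enters the calculation; only observed prices do. The dip is flagged despite being small because a perfectly flat history has a MAD of 0, which leaves only the 1e-6 floor in the denominator, so any downward move exceeds sigma_threshold. On a realistically noisy ~$46 series the same reading is scored against the actual day-to-day spread and may well stay under the threshold.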

Recommended thresholds by category

| Category | `ema_window` | `ema_alpha` | `mad_window` | `sigma_threshold` | `iqr_multiplier` |
|---|---|---|---|---|---|
| Stable FMCG / grocery | 90 | 0.03 | 60 | 2.5 | 1.5 |
| Seasonal electronics | 120 | 0.02 | 90 | 3.0 | 2.0 |
| Fashion / apparel (genuine clearance) | 60 | 0.05 | 45 | 3.5 | 2.5 |
| High-churn marketplace resellers | 45 | 0.08 | 30 | 2.0 | 1.2 |

Loosen thresholds for categories with frequent legitimate clearance so real end-of-season markdowns are not misread as fakes; tighten them for stable staples where any large swing is suspect.
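
One way to carry these settings in code is a plain lookup keyed by category. The sketch below copies the values from the table; the category keys, the params_for helper, and the choice of the FMCG row as a fallback are assumptions made for illustration.

```python
# Values copied from the table above; the dictionary keys are hypothetical names.
CATEGORY_PARAMS = {
    "stable_fmcg": dict(ema_window=90, ema_alpha=0.03, mad_window=60,
                        sigma_threshold=2.5, iqr_multiplier=1.5),
    "seasonal_electronics": dict(ema_window=120, ema_alpha=0.02, mad_window=90,
                                 sigma_threshold=3.0, iqr_multiplier=2.0),
    "fashion_apparel": dict(ema_window=60, ema_alpha=0.05, mad_window=45,
                            sigma_threshold=3.5, iqr_multiplier=2.5),
    "marketplace_resellers": dict(ema_window=45, ema_alpha=0.08, mad_window=30,
                                  sigma_threshold=2.0, iqr_multiplier=1.2),
}

def params_for(category: str, default: str = "stable_fmcg") -> dict:
    """Return a copy of the thresholds for a category, falling back to a default."""
    return dict(CATEGORY_PARAMS.get(category, CATEGORY_PARAMS[default]))
```

The result can be splatted straight into the filter, for example compute_price_outlier_flags(electronics_df, **params_for("seasonal_electronics")), where electronics_df is a frame already restricted to that category.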

Verification & Testing

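Validate against a synthetic fixture with a known phantom dip, a genuine sustained drop, and a cold-start SKU. The fixture spans 160 days so that every assertion lands after the 90-observation EMA warm-up. These assertions double as regression guards when you later tune thresholds.

def test_fake_sale_filter():
    # 160 days: long enough to clear the 90-observation EMA warm-up.
    rng = pd.date_range("2026-01-01", periods=160, freq="D", tz="UTC")
    rows = []
    rows += [{"timestamp": t, "sku_id": "flat", "normalized_price": 46.0,
              "promo_flag": False} for t in rng]
    rows += [{"timestamp": t, "sku_id": "drop",
              "normalized_price": 46.0 if i < 100 else 30.0,  # real, sustained
              "promo_flag": False} for i, t in enumerate(rng)]
    # Cold-start SKU: five observations ending in a dip; must never be flagged.
    rows += [{"timestamp": t, "sku_id": "new",
              "normalized_price": 46.0 if i < 4 else 22.0,
              "promo_flag": False} for i, t in enumerate(rng[:5])]
    df = pd.DataFrame(rows)
    # Inject one phantom one-day dip into the flat SKU, after the warm-up.
    df.loc[(df.sku_id == "flat") & (df.timestamp == rng[110]), "normalized_price"] = 22.0

    out = compute_price_outlier_flags(df)
    flat = out[out.sku_id == "flat"]
    assert flat["is_fake_sale"].sum() == 1               # the single phantom dip
    assert bool(flat.iloc[110]["is_fake_sale"]) is True

    drop = out[out.sku_id == "drop"]
    # A sustained drop stays flagged only until the rolling median catches up
    # (about half of mad_window) and is clear after that.
    assert 1 <= drop["is_fake_sale"].sum() <= 30
    assert not drop["is_fake_sale"].iloc[-30:].any()

    assert not out.loc[out.sku_id == "new", "is_fake_sale"].any()
    print("fake-sale filter assertions passed")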

Beyond unit tests, sample 100 is_fake_sale = True rows weekly and have an analyst confirm whether each was a genuine phantom markdown. That hit-rate is your live precision and tells you whether sigma_threshold needs to move.
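
The weekly review loop can be sketched in a few lines. The function names and the analyst_confirmed column are hypothetical; the only assumption about the input is that it carries the is_fake_sale column produced by the filter.

```python
import pandas as pd

def sample_for_review(df: pd.DataFrame, n: int = 100, seed: int = 0) -> pd.DataFrame:
    """Draw up to n flagged rows for manual review (fewer if fewer were flagged)."""
    flagged = df[df["is_fake_sale"]]
    return flagged.sample(n=min(n, len(flagged)), random_state=seed)

def live_precision(reviewed: pd.DataFrame, label_col: str = "analyst_confirmed") -> float:
    """Share of reviewed flags an analyst confirmed; NaN when nothing was reviewed."""
    if reviewed.empty:
        return float("nan")
    return float(reviewed[label_col].astype(bool).mean())
```

Fixing the seed per week makes the sample reproducible, which helps when two analysts need to look at the same rows.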

Edge Cases & Gotchas

Genuine sustained markdowns. A real, permanent price cut breaches the baseline on day one and stays flagged until the new level makes up about half of the rolling window (roughly 30 daily observations at mad_window = 60), at which point the rolling median moves to the new price and the flags clear. If that is too slow, add a secondary trend-decay monitor that tracks consecutive EMA slope reversals over a 14-day horizon and exonerates a “drop” once the new level holds.
Gradual stair-step discounting. Competitors who lower price incrementally across 3–4 cycles can stay under a 2.5σ gate at every step. Compare the cumulative drop over the window against the baseline, not just the day-over-day delta.
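Cold-start SKUs. min_periods=1 keeps the rolling median, MAD, and quantiles from emitting NaN, but confidence is low until the window fills. Two things suppress flags in the meantime: the history guard at the end of compute_price_outlier_flags, and the EMA warm-up, which leaves price_ema as NaN (so the below-baseline comparison is False) until ema_window observations exist. Never let a 5-observation SKU produce a confident verdict.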
Forex contamination. If normalized_price is synced on real-time spot rates, intraday FX volatility looks like a phantom dip. Anchor every value to a daily mid-market snapshot upstream — see converting multi-currency prices to a base currency — before this stage ever runs.
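
For the stair-step case above, one possible cumulative check is sketched below. It measures each price against the highest price seen in the trailing window, which is an assumption chosen for simplicity; the function name and the default window are illustrative, and the frame is assumed to have a unique index.

```python
import pandas as pd

def cumulative_drop(df: pd.DataFrame, window: int = 60) -> pd.Series:
    """Fractional drop of each price from its SKU's rolling-window maximum."""
    df = df.sort_values(["sku_id", "timestamp"])
    rolling_max = (
        df.groupby("sku_id")["normalized_price"]
        .rolling(window, min_periods=1)
        .max()
        .reset_index(level=0, drop=True)  # back to the original row index
    )
    return 1.0 - df["normalized_price"] / rolling_max
```

A series that steps from 46 to 38 in two-dollar increments never moves more than about 5% in one day, yet its cumulative drop reaches roughly 17%, which a single cumulative threshold can catch.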

Performance Notes

The exponential mean is O(n) in observations per SKU, and the rolling median and quantiles are close to it (they add a factor that grows roughly with the logarithm of the window), so the whole pass is dominated by the groupby().transform() calls. On a single worker this comfortably handles low millions of rows. Past roughly 10M rows per day, groupby().transform() on an unchunked frame becomes the bottleneck and risks OOM kills. At that point, graduate to polars (whose lazy over expressions can fuse these windows) or dask.dataframe for out-of-core execution, and partition by sku_id so each window stays node-local.

Keep the per-row work vectorized. The lambdas in the steps above run once per SKU and hand off to vectorized rolling and ewm kernels; a row-wise .apply() with a Python callable bypasses the C extensions and can degrade throughput by 10–50× in high-frequency feeds.

Finally, validate flags against regulatory baselines such as the FTC Guides Against Deceptive Pricing (16 CFR Part 233) so the is_fake_sale column is defensible, not just statistical.
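
Before leaving pandas, one cheap step is to replace the per-group lambdas with the built-in grouped rolling API, which avoids calling a Python function per SKU. The sketch below shows the rolling median only; the function name is illustrative, the frame is assumed to have a unique index, and any speed-up should be measured on your own data.

```python
import pandas as pd

def rolling_median_builtin(df: pd.DataFrame, window: int) -> pd.Series:
    """Per-SKU rolling median using pandas' grouped rolling, with no Python lambda."""
    out = (
        df.groupby("sku_id")["normalized_price"]
        .rolling(window, min_periods=1)
        .median()
    )
    # The result is indexed by (sku_id, original index); drop the group level
    # and restore row order so it aligns with df again.
    return out.reset_index(level=0, drop=True).sort_index()
```

The same pattern applies to the rolling quantiles by swapping .median() for .quantile(0.25) or .quantile(0.75).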

Frequently Asked Questions

Why compare against a historical baseline instead of the retailer’s own “was” price? Because the “was” price is exactly the value being manipulated. Anchor pricing inflates the reference so a modest discount looks dramatic. The robust rolling baseline reflects what the product actually traded at, which is the only honest comparison point.
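
The gap between the two comparison points is easy to quantify with the numbers from the introduction. This is plain arithmetic, shown as a tiny sketch with an illustrative helper name.

```python
def discount(from_price: float, to_price: float) -> float:
    """Fractional markdown from one price to another."""
    return 1.0 - to_price / from_price

claimed = discount(89.99, 44.99)   # against the retailer's "was" price
observed = discount(46.00, 44.99)  # against the price the SKU actually traded at

print(f"claimed {claimed:.1%}, observed {observed:.1%}")
```

The banner advertises a markdown of about 50%, while the move relative to what buyers were actually paying is about 2%.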

Why use MAD and a robust z-score instead of mean and standard deviation? Retail price series are skewed and frequently contaminated by loss leaders. A single extreme value drags both the mean and the standard deviation, masking real anomalies. The median and MAD are resistant to that contamination, so the score stays meaningful even when 10–15% of the window is noise.

Will a genuine permanent price cut be flagged forever? No, but it is flagged for longer than a day or two. The flag persists until the new price makes up about half of the rolling window (roughly 30 daily observations at the default mad_window of 60); the rolling median then moves to the new level and treats it as the baseline. Pair the filter with a short trend-decay check if you want those initial flags cleared sooner, once the new price holds.
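
A very simple stand-in for such a check is to ask whether the price has stayed put for a fixed number of observations. The sketch below is a range-stability test, not the EMA slope monitor described under Edge Cases; the function name, the 14-observation default, and the 1% tolerance are assumptions. It expects rows sorted by timestamp within each SKU and a unique index.

```python
import pandas as pd

def level_has_held(df: pd.DataFrame, hold_days: int = 14, tol: float = 0.01) -> pd.Series:
    """True where a SKU's last `hold_days` prices all sit within `tol` of each other."""
    prices = df.groupby("sku_id")["normalized_price"]
    hi = prices.rolling(hold_days, min_periods=hold_days).max().reset_index(level=0, drop=True)
    lo = prices.rolling(hold_days, min_periods=hold_days).min().reset_index(level=0, drop=True)
    # NaN during the first hold_days - 1 rows compares as False, so nothing clears early.
    return ((hi - lo) / lo <= tol).reindex(df.index)
```

One way to apply it is df["is_fake_sale"] &= ~level_has_held(df), which clears a flag once the new level has held for the chosen horizon while leaving one-day dips flagged.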

Related

Statistical Outlier Detection for Price Data — the parent guide defining the robust estimators and the clean/outlier output contract this recipe plugs into.
Standardizing Unit Pricing Across Marketplaces — ensures the normalized_price you baseline against is comparable across UOM and marketplace.
Parsing Complex Promotional Discount Structures — supplies the verified promo_flag that exonerates legitimate markdowns from the fake-sale gate.
Converting Multi-Currency Prices to a Base Currency — the upstream FX alignment that keeps forex drift from masquerading as a phantom dip.
