Outputs

The pipeline produces three main outputs. Here's what each contains and how to use it.


Output 1: Ranked Products (Full Detail)

A comprehensive list of products with complete transparency about their scores.

What's Included

For each product:

Field              Description
product_id         Unique identifier
url                Sephora product page
love_score         Final score (0-1)
priority_strategy  How it was ranked (love, volume, etc.)

Score Components:

Field               Description
organic_quality     0-1
engagement_quality  0-1
authenticity        0-1
diversity           0-1
trend               0-1

Raw Metrics:

Field               Description
review_count        Total reviews
avg_rating          Overall average
organic_avg_rating  Unpaid reviewers only
organic_ratio       % genuine reviews
pct_negative        % 1-2 star reviews

Adjustments:

Field                  Description
triggered_adjustments  Which penalties/boosts fired
score_explanation      Human-readable summary

Example Entry

{
  "product_id": "P12345",
  "url": "https://sephora.com/product/...",
  "love_score": 0.82,
  "score_components": {
    "organic_quality": 0.85,
    "engagement_quality": 0.72,
    "authenticity": 0.88,
    "diversity": 0.65,
    "trend": 0.58
  },
  "triggered_adjustments": ["power_user_boost"],
  "score_explanation": "Strong organic quality (0.85), boosted by power user endorsement (+4%)"
}

Use Case

Use this output when you want to understand why a product ranks where it does. Every score component and adjustment is exposed for analysis.
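As a rough illustration of that kind of analysis, the sketch below parses JSONL lines shaped like the example entry and reports each product's weakest score component. The helper names `iter_records` and `weakest_component` are hypothetical, and the sketch assumes every record carries a `score_components` object as in the example.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def weakest_component(record: dict) -> Tuple[str, float]:
    """Return (name, value) of the lowest entry in score_components."""
    components = record["score_components"]
    name = min(components, key=components.get)
    return name, components[name]


# One line mirroring the example entry above (url left out for brevity).
sample = json.dumps({
    "product_id": "P12345",
    "love_score": 0.82,
    "score_components": {
        "organic_quality": 0.85,
        "engagement_quality": 0.72,
        "authenticity": 0.88,
        "diversity": 0.65,
        "trend": 0.58,
    },
    "triggered_adjustments": ["power_user_boost"],
})

for rec in iter_records([sample]):
    name, value = weakest_component(rec)
    print(f"{rec['product_id']}: love_score={rec['love_score']}, weakest={name} ({value})")
```

To run this against the real output, pass an open file handle for the ranked products file to `iter_records` instead of the in-memory sample.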


Output 2: Products Needing Details (Tiered List)

A prioritized list for the next phase: scraping product pages for ingredients, prices, and descriptions.

What's Included

Field           Description
product_id      Unique identifier
url             Sephora product page
priority_score  0-100 points
priority_tier   High / Medium / Low
review_count    Total reviews
avg_rating      Overall average

The Tiers

Tier    Score  Count   Action
High    70+    ~2,000  Scrape immediately
Medium  40-69  ~4,000  Scrape eventually
Low     15-39  ~3,000  Skip for now
Skip    <15    ~1,000  Ignore
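The tier boundaries above can be expressed as a small lookup. This is a minimal sketch, not the pipeline's actual code: the function name `priority_tier` is hypothetical, and it assumes fractional scores (for example 69.5) fall into the lower tier.

```python
def priority_tier(priority_score: float) -> str:
    """Map a 0-100 priority score to a tier name using the table's cutoffs."""
    if priority_score >= 70:
        return "High"
    if priority_score >= 40:
        return "Medium"
    if priority_score >= 15:
        return "Low"
    return "Skip"


for score in (85, 55, 20, 5):
    print(score, priority_tier(score))
```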

Use Case

Feed this to the product page scraper to get:

  • Current prices
  • Full ingredient lists
  • Product descriptions
  • How-to-use instructions
  • Size/volume information

💡 Why Separate Lists?

The ranked products list is for analysis. The tiered list is for action — it tells the scraper what to prioritize next.
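One way a scraper might consume the tiered list is to keep only the tiers it cares about and process the highest `priority_score` first. The sketch below assumes each JSONL line carries the `priority_tier` and `priority_score` fields listed above; the function name `scrape_queue` and the sample records are hypothetical.

```python
import json
from typing import Iterable, List, Sequence


def scrape_queue(lines: Iterable[str], tiers: Sequence[str] = ("High",)) -> List[dict]:
    """Return records in the requested tiers, highest priority_score first."""
    records = (json.loads(line) for line in lines if line.strip())
    wanted = [r for r in records if r["priority_tier"] in tiers]
    return sorted(wanted, key=lambda r: r["priority_score"], reverse=True)


# Hypothetical sample lines; real data would come from the tiered products file.
sample_lines = [
    json.dumps({"product_id": "P00001", "priority_score": 74, "priority_tier": "High"}),
    json.dumps({"product_id": "P00002", "priority_score": 91, "priority_tier": "High"}),
    json.dumps({"product_id": "P00003", "priority_score": 55, "priority_tier": "Medium"}),
    json.dumps({"product_id": "P00004", "priority_score": 20, "priority_tier": "Low"}),
]

for rec in scrape_queue(sample_lines):
    print(rec["product_id"], rec["priority_score"])
```

Passing `tiers=("High", "Medium")` would widen the queue once the High tier has been scraped.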


Output 3: Analytics Reports

Periodic reports that summarize the dataset at a macro level.

Report Contents

Executive Summary

  • Total reviews, products, users
  • Date range covered
  • Key statistics

Rating Distribution

  • 5-star: 64.3%
  • 4-star: 18.7%
  • 3-star: 7.5%
  • 2-star: 4.2%
  • 1-star: 5.3%
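These shares sum to 100%, and the 1-star and 2-star shares together (9.5%) are the dataset-wide counterpart of the per-product `pct_negative` field. A distribution like this could be computed from raw star ratings as sketched below; the function name `rating_distribution` and the sample ratings are hypothetical, and rounding to one decimal place is an assumption based on the figures shown.

```python
from collections import Counter
from typing import Dict, Iterable


def rating_distribution(ratings: Iterable[int]) -> Dict[int, float]:
    """Return the percentage of reviews at each star level, 5 down to 1."""
    counts = Counter(ratings)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no ratings supplied")
    return {star: round(100 * counts.get(star, 0) / total, 1) for star in (5, 4, 3, 2, 1)}


# Hypothetical ratings, not taken from the dataset.
dist = rating_distribution([5, 5, 5, 4, 1])
for star, pct in dist.items():
    print(f"{star}-star: {pct}%")
```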

Demographic Breakdown

  • Reviews by skin type
  • Reviews by skin tone
  • Reviews by age (where available)

Engagement Insights

  • Average helpful votes
  • Photo inclusion rate
  • Substantive review percentage

Top Products

  • Highest Love Score
  • Most reviewed
  • Best organic rating

Polarizing Products

  • High variance + many negatives
  • Love-it-or-hate-it patterns

Temporal Trends

  • Reviews by month
  • Reviews by year
  • Seasonal patterns

Formats

Reports are generated in two formats:

  • Markdown — Human-readable, good for documentation
  • JSON — Machine-readable, good for dashboards

Use Case

Use the reports to understand the dataset at a high level, spot trends, and prepare presentations.


File Locations

Output            Location
Ranked Products   data/products/products_to_scrape.jsonl
Tiered Products   data/products/products_for_details.jsonl
Analytics Report  analysis/reports/sephora_analytics_report_{timestamp}.md
Analytics Data    analysis/reports/sephora_analytics_data_{timestamp}.json
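Because report filenames embed a timestamp, a consumer usually needs to find the most recent one. The format of `{timestamp}` is not specified here, so the sketch below falls back on file modification time rather than parsing the name. The function name `latest_report` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def latest_report(
    reports_dir: str = "analysis/reports",
    pattern: str = "sephora_analytics_data_*.json",
) -> Optional[Path]:
    """Return the most recently modified file matching pattern, or None."""
    directory = Path(reports_dir)
    if not directory.is_dir():
        return None
    candidates = list(directory.glob(pattern))
    # Assumption: modification time tracks generation order.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


print(latest_report())
```

The same function can locate the Markdown report by passing `pattern="sephora_analytics_report_*.md"`.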

What's Next?

Now you know what the pipeline produces. Next, learn how to run it.

Next: How to Run → The execution sequence.