Outputs

The pipeline produces three main outputs. Here's what each contains and how to use it.

Output 1: Ranked Products (Full Detail)

The full ranked list of products, with each score broken down into its components, raw metrics, and adjustments.

What's Included

For each product:

Field	Description
`product_id`	Unique identifier
`url`	Sephora product page
`love_score`	Final score (0-1)
`priority_strategy`	How it was ranked (love, volume, etc.)

Score Components (nested under `score_components` in each entry, each on a 0-1 scale):

Field	Range
`organic_quality`	0-1
`engagement_quality`	0-1
`authenticity`	0-1
`diversity`	0-1
`trend`	0-1

Raw Metrics:

Field	Description
`review_count`	Total reviews
`avg_rating`	Average rating across all reviews
`organic_avg_rating`	Average rating from unpaid reviewers only
`organic_ratio`	% of reviews from unpaid (organic) reviewers
`pct_negative`	% of reviews rated 1-2 stars
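The raw metrics follow directly from a product's reviews. The sketch below shows one way to derive them. It assumes each review is a dict with a `rating` (1-5) and a hypothetical boolean `is_organic` flag marking unpaid reviewers. The pipeline's real field names and its exact definition of "organic" may differ, and percentages are expressed here on a 0-100 scale as an assumption.

```python
from statistics import mean

def raw_metrics(reviews: list[dict]) -> dict:
    """Compute raw review metrics for one product.

    Assumes each review looks like {"rating": 4, "is_organic": True};
    `is_organic` is a hypothetical flag for unpaid reviewers.
    """
    if not reviews:
        raise ValueError("need at least one review")
    ratings = [r["rating"] for r in reviews]
    organic = [r["rating"] for r in reviews if r["is_organic"]]
    negative = [x for x in ratings if x <= 2]  # 1-2 star reviews
    return {
        "review_count": len(ratings),
        "avg_rating": round(mean(ratings), 2),
        # None when there are no unpaid reviews, rather than a misleading 0
        "organic_avg_rating": round(mean(organic), 2) if organic else None,
        "organic_ratio": round(100 * len(organic) / len(ratings), 1),
        "pct_negative": round(100 * len(negative) / len(ratings), 1),
    }
```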

Adjustments:

Field	Description
`triggered_adjustments`	Which penalties/boosts fired
`score_explanation`	Human-readable summary

Example Entry (abridged)

```json
{
  "product_id": "P12345",
  "url": "https://sephora.com/product/...",
  "love_score": 0.82,
  "score_components": {
    "organic_quality": 0.85,
    "engagement_quality": 0.72,
    "authenticity": 0.88,
    "diversity": 0.65,
    "trend": 0.58
  },
  "triggered_adjustments": ["power_user_boost"],
  "score_explanation": "Strong organic quality (0.85), boosted by power user endorsement (+4%)"
}
```

Use Case

Use this list when you want to understand why a product ranks where it does. Every score can be traced back to its components, raw metrics, and triggered adjustments.
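The ranked list is a JSON Lines file, so each line holds one complete entry like the example, written on a single line rather than pretty-printed. The following sketch loads the list and prints a product's breakdown. It assumes entries carry the `score_components`, `triggered_adjustments`, and `score_explanation` keys shown in the example. The function names are illustrative.

```python
import json
from typing import Iterable, Iterator

def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-blank JSON Lines record."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

def breakdown(entry: dict) -> str:
    """Summarize why a product received its love_score."""
    parts = [f"{entry['product_id']}: love_score={entry['love_score']:.2f}"]
    # List components from strongest to weakest
    comps = sorted(entry["score_components"].items(),
                   key=lambda kv: kv[1], reverse=True)
    parts += [f"  {name}: {value:.2f}" for name, value in comps]
    adjustments = entry.get("triggered_adjustments") or ["none"]
    parts.append("  adjustments: " + ", ".join(adjustments))
    parts.append("  " + entry.get("score_explanation", ""))
    return "\n".join(parts)

# Usage, with the ranked list's path from "File Locations":
# with open("data/products/products_to_scrape.jsonl", encoding="utf-8") as f:
#     for entry in read_jsonl(f):
#         print(breakdown(entry))
```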

Output 2: Products Needing Details (Tiered List)

A prioritized list for the next phase: scraping product pages for ingredients, prices, and descriptions.

What's Included

Field	Description
`product_id`	Unique identifier
`url`	Sephora product page
`priority_score`	0-100 points
`priority_tier`	High / Medium / Low
`review_count`	Total reviews
`avg_rating`	Overall average

The Tiers

Tier	Priority score	Approx. products	Action
High	70+	~2,000	Scrape immediately
Medium	40-69	~4,000	Scrape eventually
Low	15-39	~3,000	Skip for now
Skip	<15	~1,000	Ignore
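The tier boundaries in the table map directly onto `priority_score`. Here is a sketch of that mapping. It assumes scores on the 0-100 scale and ranges that are inclusive at the lower bound, so 70 is High and 69.9 is Medium. How the pipeline itself treats fractional scores is not specified here.

```python
def priority_tier(score: float) -> str:
    """Map a 0-100 priority_score to its tier using the table's cut-offs."""
    if not 0 <= score <= 100:
        raise ValueError(f"priority_score out of range: {score}")
    if score >= 70:
        return "High"    # scrape immediately
    if score >= 40:
        return "Medium"  # scrape eventually
    if score >= 15:
        return "Low"     # skip for now
    return "Skip"        # ignore
```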

Use Case

Feed this to the product page scraper to get:

Current prices
Full ingredient lists
Product descriptions
How-to-use instructions
Size/volume information
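One way to turn the tiered list into a work queue for the scraper is to keep only the tiers you intend to scrape and order them by `priority_score`, highest first. This is a sketch. The `build_scrape_queue` name and the default choice of tiers are illustrative, and the real scraper may consume the file differently.

```python
def build_scrape_queue(products: list[dict],
                       tiers: tuple[str, ...] = ("High",)) -> list[str]:
    """Return product URLs in the requested tiers, highest priority first."""
    selected = [p for p in products if p["priority_tier"] in tiers]
    # Sort a copy so the caller's list keeps its original order
    selected.sort(key=lambda p: p["priority_score"], reverse=True)
    return [p["url"] for p in selected]
```

Passing `("High", "Medium")` extends the queue once the High tier is done, matching the "scrape eventually" action for Medium.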

💡 Why Separate Lists?

The ranked products list is for analysis. The tiered list is for action — it tells the scraper what to prioritize next.

Output 3: Analytics Reports

Periodic reports that summarize the dataset at a macro level.

Report Contents

Executive Summary

Total reviews, products, users
Date range covered
Key statistics

Rating Distribution

5-star: 64.3%
4-star: 18.7%
3-star: 7.5%
2-star: 4.2%
1-star: 5.3%
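The rating distribution is the share of reviews at each star level. This sketch computes it from a list of 1-5 star ratings, rounded to one decimal place as in the figures above. Because of rounding, the shares may not always sum to exactly 100%.

```python
from collections import Counter

def rating_distribution(ratings: list[int]) -> dict[int, float]:
    """Percentage of reviews at each star level, 5-star first."""
    if not ratings:
        return {}
    counts = Counter(ratings)
    total = len(ratings)
    return {stars: round(100 * counts[stars] / total, 1)
            for stars in range(5, 0, -1)}
```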

Demographic Breakdown

Reviews by skin type
Reviews by skin tone
Reviews by age (where available)

Engagement Insights

Average helpful votes
Photo inclusion rate
Substantive review percentage

Top Products

Highest Love Score
Most reviewed
Best organic rating
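Each of the three top-product lists sorts the ranked products on a different field from Output 1 (`love_score`, `review_count`, `organic_avg_rating`). The sketch below assumes those field names. The `top_n` helper and the default of 10 are illustrative. When ranking by rating, you may also want to require a minimum review count first.

```python
import heapq

def top_n(products: list[dict], field: str, n: int = 10) -> list[dict]:
    """Return the n products with the highest value for `field`, best first.

    Products where the field is missing or None are skipped.
    """
    candidates = [p for p in products if p.get(field) is not None]
    return heapq.nlargest(n, candidates, key=lambda p: p[field])

# The three "Top Products" lists:
# top_n(ranked, "love_score"), top_n(ranked, "review_count"),
# top_n(ranked, "organic_avg_rating")
```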

Polarizing Products

High rating variance combined with many negative reviews
Love-it-or-hate-it patterns
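The polarizing-products criterion combines two signals: high rating variance and a large share of negative reviews. This sketch flags a product when both exceed a threshold. The threshold values (variance above 2.0, more than 20% negative) are hypothetical placeholders, not the pipeline's actual settings. A love-it-or-hate-it pattern piles ratings at both ends of the scale, which is what drives the variance up.

```python
from statistics import pvariance

# Hypothetical thresholds, for illustration only
VARIANCE_THRESHOLD = 2.0
NEGATIVE_PCT_THRESHOLD = 20.0

def is_polarizing(ratings: list[int],
                  var_threshold: float = VARIANCE_THRESHOLD,
                  neg_threshold: float = NEGATIVE_PCT_THRESHOLD) -> bool:
    """Flag products whose ratings are both widely spread and often negative."""
    if len(ratings) < 2:
        return False
    pct_negative = 100 * sum(1 for r in ratings if r <= 2) / len(ratings)
    return pvariance(ratings) > var_threshold and pct_negative > neg_threshold
```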

Temporal Trends

Reviews by month
Reviews by year
Seasonal patterns
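Reviews by month and by year are simple counts over each review's date. This sketch assumes submission dates are available as `datetime.date` objects, since how the pipeline stores review dates is not covered here. You can then read seasonal patterns off the monthly counts by comparing the same month across years.

```python
from collections import Counter
from datetime import date

def reviews_by_period(dates: list[date]) -> tuple[dict[str, int], dict[int, int]]:
    """Count reviews per calendar month ("YYYY-MM") and per year, in date order."""
    by_month = Counter(d.strftime("%Y-%m") for d in dates)
    by_year = Counter(d.year for d in dates)
    return dict(sorted(by_month.items())), dict(sorted(by_year.items()))
```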

Formats

Reports are generated in two formats:

Markdown — Human-readable, good for documentation
JSON — Machine-readable, good for dashboards
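Generating both formats from one summary dict keeps them consistent. This sketch writes the Markdown and JSON files using the naming pattern from "File Locations". The `write_reports` function, the flat summary dict, and the `%Y%m%d_%H%M%S` timestamp format are all assumptions for illustration, since the actual timestamp format is not documented here.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Optional

def write_reports(summary: dict, out_dir: Path,
                  when: Optional[datetime] = None) -> tuple[Path, Path]:
    """Write one summary as Markdown (for people) and JSON (for dashboards)."""
    # Assumed timestamp format; the pipeline's real format may differ
    stamp = (when or datetime.now()).strftime("%Y%m%d_%H%M%S")
    out_dir.mkdir(parents=True, exist_ok=True)
    md_path = out_dir / f"sephora_analytics_report_{stamp}.md"
    json_path = out_dir / f"sephora_analytics_data_{stamp}.json"

    # Markdown: a heading plus one bullet per top-level statistic
    lines = ["# Sephora Analytics Report", ""]
    lines += [f"- **{key}**: {value}" for key, value in summary.items()]
    md_path.write_text("\n".join(lines) + "\n", encoding="utf-8")

    # JSON: the same data, unchanged, for machine consumption
    json_path.write_text(json.dumps(summary, indent=2), encoding="utf-8")
    return md_path, json_path
```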

Use Case

Use the reports to understand the dataset at a high level, spot trends, and build presentations.

File Locations

Output	Location
Ranked Products	`data/products/products_to_scrape.jsonl`
Tiered Products	`data/products/products_for_details.jsonl`
Analytics Report	`analysis/reports/sephora_analytics_report_{timestamp}.md`
Analytics Data	`analysis/reports/sephora_analytics_data_{timestamp}.json`
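Each run adds a new timestamped report rather than overwriting the last one. Picking the most recent file is then a matter of sorting filenames. This sketch assumes the `{timestamp}` part sorts lexicographically in chronological order, which holds for formats like `YYYYMMDD_HHMMSS` but not for something like `MM-DD-YYYY`. The `latest_report` helper is hypothetical.

```python
from pathlib import Path
from typing import Optional

def latest_report(reports_dir: Path,
                  pattern: str = "sephora_analytics_data_*.json") -> Optional[Path]:
    """Return the newest report file matching `pattern`, or None if none exist.

    Relies on the timestamp in the filename sorting chronologically
    (an assumption about the timestamp format).
    """
    candidates = sorted(reports_dir.glob(pattern))
    return candidates[-1] if candidates else None

# Usage: latest_report(Path("analysis/reports"))
```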

What's Next?

Now you know what the pipeline produces. Next, learn how to run it.

Next: How to Run (the execution sequence).
