Pipeline lineage

Data Sources and Lineage

Every number in an Imprint report traces back to a specific source through a documented path. Data flows from collection through analysis and scoring to final rendering. Here is the full pipeline.

01 Collection

Public data is gathered from three primary sources. The YouTube Data API provides channel statistics, video metadata, and upload history. Web scraping captures video transcripts, channel links, and business email addresses. DataForSEO supplies niche search volume and keyword data.

Data sources

YouTube Data API (Official API)
Channel statistics, video metadata, and upload history.
Terms: ToS-compliant, free tier

Web Scraping (Public web data)
Video transcripts, channel links, and business email.
Terms: Public pages only, no authentication bypass

DataForSEO (Licensed API)
Niche search volume and keyword data.
Terms: Commercial license

Key outputs

subscriberCount
videoCount
averageRecentViews
video transcripts
uploadCadence
channel email
niche keywords
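
As a rough illustration of the collection step, the sketch below normalizes a `channels.list` response (requested with `part=statistics`) from the YouTube Data API v3 into numeric fields. The API serializes these counters as strings, so they need converting before any arithmetic. The `ChannelStats` shape and the `toCount` and `parseChannelStatistics` helpers are hypothetical names for illustration, not Imprint's actual code.

```typescript
// Subset of the YouTube Data API v3 `channels.list` response (part=statistics).
// The API returns the counters as strings, not numbers.
interface ChannelsListResponse {
  items?: Array<{
    id: string;
    statistics?: {
      subscriberCount?: string;
      videoCount?: string;
    };
  }>;
}

// Hypothetical normalized shape handed to later pipeline steps.
interface ChannelStats {
  channelId: string;
  subscriberCount: number | null; // null when the field is missing or malformed
  videoCount: number | null;
}

// Convert a string counter to a non-negative integer, or null if it is unusable.
function toCount(value: string | undefined): number | null {
  if (value === undefined || value.trim() === "") return null;
  const n = Number(value);
  return Number.isInteger(n) && n >= 0 ? n : null;
}

// Read the first channel in the response; returns null when no channel matched.
function parseChannelStatistics(response: ChannelsListResponse): ChannelStats | null {
  const item = response.items?.[0];
  if (!item) return null;
  return {
    channelId: item.id,
    subscriberCount: toCount(item.statistics?.subscriberCount),
    videoCount: toCount(item.statistics?.videoCount),
  };
}
```

Returning `null` rather than `0` for a missing counter keeps "unknown" distinct from "zero", which matters for any later plausibility check.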

02 Analysis

Anthropic Claude Sonnet analyzes the scraped content against a structured analytical framework. It produces 23 structured fields per creator, including purchaseContext (8 sub-fields), riskFactors, partnershipProfile, audiencePersonas, competitiveProducts, and willingnessToPay. Analysis costs approximately $2 per creator.

Data sources

Anthropic Claude Sonnet (LLM analysis)
Structured content analysis against an analytical framework.
Terms: Commercial API license

Key outputs

purchaseContext
riskFactors
partnershipProfile
structuredDemographics
competitiveProducts
painPoints
willingnessToPay
imprintScore
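
To make the analysis output more concrete, here is a minimal sketch of how three of these fields might be flattened into report stats. It relies only on the paths named in the Traceability section of this page (`painPoints[].point`, `competitiveProducts[].name`, `riskFactors[].description`). The `CreatorAnalysis` and `AnalyzedStats` types, the helper names, and the trimming and de-duplication behavior are assumptions; the remaining fields are omitted because their shapes are not documented here.

```typescript
// Assumed partial shape of one creator's analysis, limited to documented paths.
interface CreatorAnalysis {
  painPoints: Array<{ point: string }>;
  competitiveProducts: Array<{ name: string }>;
  riskFactors: Array<{ description: string }>;
}

// Hypothetical flat shape consumed by a report.
interface AnalyzedStats {
  keyPainPoints: string[];
  competitiveProducts: string[];
  riskFactors: string[];
}

// Trim entries, drop empty ones, and de-duplicate while keeping first-seen order.
function cleanList(values: string[]): string[] {
  const seen = new Set<string>();
  const out: string[] = [];
  for (const raw of values) {
    const value = raw.trim();
    if (value !== "" && !seen.has(value)) {
      seen.add(value);
      out.push(value);
    }
  }
  return out;
}

// Follow each documented path and return plain string lists.
function extractAnalyzedStats(analysis: CreatorAnalysis): AnalyzedStats {
  return {
    keyPainPoints: cleanList(analysis.painPoints.map((p) => p.point)),
    competitiveProducts: cleanList(analysis.competitiveProducts.map((p) => p.name)),
    riskFactors: cleanList(analysis.riskFactors.map((r) => r.description)),
  };
}
```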

03 Scoring

The fromEnrichedAnalysis adapter normalizes raw analysis into a standardized ScoringCreator shape. Six-axis formulas compute niche-level scores from the normalized data. Validation gates (SP1) reject data that fails schema or plausibility checks before it enters scoring. Regression baselines (SP2) catch any formula drift across commits.
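
A validation gate of the kind described above could look roughly like the following sketch. It runs a schema check first and a plausibility check second, and rejects the record if either fails. The field subset, the `GateResult` type, the `validationGate` name, and the specific plausibility rule are all assumptions made for illustration; the actual SP1 rules are not documented on this page.

```typescript
// Result of a gate: either the record passes, or it is rejected with reasons.
type GateResult = { ok: true } | { ok: false; errors: string[] };

// Hypothetical subset of fields a record must carry before scoring.
const REQUIRED_NUMERIC_FIELDS = ["subscriberCount", "videoCount", "averageRecentViews"];

function validationGate(input: unknown): GateResult {
  if (typeof input !== "object" || input === null) {
    return { ok: false, errors: ["input is not an object"] };
  }
  const record = input as Record<string, unknown>;
  const errors: string[] = [];

  // Schema check: every required field is a finite, non-negative number.
  for (const key of REQUIRED_NUMERIC_FIELDS) {
    const value = record[key];
    if (typeof value !== "number" || !Number.isFinite(value) || value < 0) {
      errors.push(`${key} must be a non-negative finite number`);
    }
  }

  // Plausibility check (assumed rule): a channel with no videos cannot have recent views.
  if (errors.length === 0) {
    const videoCount = record["videoCount"] as number;
    const averageRecentViews = record["averageRecentViews"] as number;
    if (videoCount === 0 && averageRecentViews > 0) {
      errors.push("averageRecentViews is positive but videoCount is 0");
    }
  }

  return errors.length === 0 ? { ok: true } : { ok: false, errors };
}
```

Collecting every error instead of stopping at the first one makes a rejected record easier to diagnose, at the cost of a slightly longer function.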

Data sources

SP1 Validation Gates (Internal guardrail)
Schema and plausibility checks that reject bad data before scoring.
Terms: N/A

SP2 Regression Baselines (Internal guardrail)
Golden baselines and property tests that catch formula drift.
Terms: N/A

Key outputs

6 axis scores
composite Imprint Score
niche catalog entry
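
The idea behind a regression baseline can be sketched as a comparison between a formula's current output and a stored golden value. In the example below, `findDrift` and `BaselineCase` are hypothetical names, and `meanOfAxes` is only a stand-in formula (an unweighted mean of six axis scores); it is not Imprint's composite formula, which this page does not specify.

```typescript
// One golden case: a fixed input and the output recorded when the formula was approved.
interface BaselineCase<I> {
  name: string;
  input: I;
  expected: number;
}

// Re-run the formula on every golden input and report cases whose output moved.
function findDrift<I>(
  formula: (input: I) => number,
  baselines: BaselineCase<I>[],
  tolerance = 1e-9,
): string[] {
  const drifted: string[] = [];
  for (const baseline of baselines) {
    const actual = formula(baseline.input);
    // Written as a negated <= so that a NaN result also counts as drift.
    if (!(Math.abs(actual - baseline.expected) <= tolerance)) {
      drifted.push(`${baseline.name}: expected ${baseline.expected}, got ${actual}`);
    }
  }
  return drifted;
}

// Stand-in formula for the example only: unweighted mean of the axis scores.
function meanOfAxes(axes: number[]): number {
  return axes.reduce((sum, score) => sum + score, 0) / axes.length;
}
```

A test suite would typically call `findDrift` with the stored baselines on every commit and fail the build when the returned list is non-empty.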

04 Rendering

The niche catalog is compiled from scored creators. Five report generators produce HTML: State-of-Niche (free SEO), Niche Intelligence ($499), Brand-Creator Match ($1,499), Creator Due Diligence ($4,999), and Audience Behavior. A magazine companion is generated from a shared data loader.

Data sources

Niche Catalog (Internal data store)
Compiled catalog of scored creators grouped by niche.
Terms: N/A

Report Generators (Internal renderer)
Five HTML generators plus magazine companion renderer.
Terms: N/A

Key outputs

HTML reports
magazine exports
SEO pages
Stripe-gated delivery
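
As a small illustration of the rendering step, the sketch below turns one catalog entry into an HTML fragment and escapes any text that originated outside the pipeline. The `NicheSummary` shape, the function names, and the markup are assumptions for illustration and do not reflect the actual report templates.

```typescript
// Escape the characters that are significant in HTML text and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical minimal catalog entry.
interface NicheSummary {
  niche: string;
  compositeScore: number;
}

// Render one entry as an HTML fragment with the score shown to one decimal place.
function renderNicheSummary(summary: NicheSummary): string {
  const title = escapeHtml(summary.niche);
  const score = summary.compositeScore.toFixed(1);
  return `<section class="niche"><h2>${title}</h2><p>Imprint Score: ${score}</p></section>`;
}
```

Escaping `&` first matters: doing it later would re-escape the ampersands introduced by the other replacements.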

Traceability

Every stat traced to its source.

Each row below maps a report metric to the data source it originates from, the derivation path, and a confidence rating.

| Report stat | Source | Path | Confidence |
| --- | --- | --- | --- |
| Subscriber Count | YouTube Data API | channels.statistics.subscriberCount | Verified |
| Video Count | YouTube Data API | channels.statistics.videoCount | Verified |
| Average Recent Views | YouTube Data API | Computed from last 10 videos' viewCount | Calculated |
| Engagement Rate | Calculated | averageRecentViews / subscriberCount × 100 | Calculated |
| Upload Cadence | YouTube Data API + analysis | lastUploadDaysAgo + frequency pattern detection | Calculated |
| Current Health Score | 6-axis formula | 4 components (avgSubs, uploadCadence, libraryDepth, platformFootprint) | Calculated |
| Niche Trajectory Score | 6-axis formula | 5 components (demandYoY, platformMomentum, creatorActivity, seasonality, diversification) | Calculated |
| Engagement Depth Score | 6-axis formula | 4 components (nicheEngagementRate, commentQuality, topCreatorEngagement, shortFormShare) | Calculated |
| Purchase Intent Score | 6-axis formula | 4 components (WTP, context, income, friction) | Calculated |
| Audience Receptivity Score | 6-axis formula | 4 components (brandedSentiment, engagementRetention, audienceSignals, partnershipRetention) | Calculated |
| Partnership Readiness Score | 6-axis formula | 5 components (readiness, openness, sponsorships, mentions, rates) | Calculated |
| Key Pain Points | Claude analysis | painPoints[].point from transcript analysis | Analyzed |
| Competitive Products | Claude analysis | competitiveProducts[].name from content review | Analyzed |
| Willingness to Pay | Claude analysis | willingnessToPay from audience signal analysis | Estimated |
| Risk Factors | Claude analysis | riskFactors[].description from content review | Analyzed |
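
The two calculated view metrics above can be written out directly. The sketch below averages the view counts of the last 10 videos and then applies the stated formula, averageRecentViews / subscriberCount × 100. The function names, the newest-first ordering of the input, and the `null` return for empty or zero inputs are assumptions for illustration.

```typescript
// Average viewCount over the most recent videos (default window: 10).
// Assumes `viewCounts` is ordered newest first.
function averageRecentViews(viewCounts: number[], window = 10): number | null {
  const recent = viewCounts.slice(0, window);
  if (recent.length === 0) return null;
  return recent.reduce((sum, views) => sum + views, 0) / recent.length;
}

// Engagement rate as a percentage: averageRecentViews / subscriberCount × 100.
// Returns null when the subscriber count is zero or negative, to avoid dividing by zero.
function engagementRate(avgViews: number, subscriberCount: number): number | null {
  if (subscriberCount <= 0) return null;
  return (avgViews / subscriberCount) * 100;
}
```

For example, a channel with 10,000 subscribers whose recent videos average 2,500 views has an engagement rate of 25 under this formula.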

Confidence tiers

Verified: Directly from API response, no transformation
Calculated: Derived from verified inputs via known formula
Analyzed: Extracted by Claude from content review
Estimated: Inferred from indirect signals
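
One way to keep lineage machine-readable is to store each stat's source, path, and tier as data and look it up on demand. The sketch below does this for a handful of entries copied from this page. The `LineageEntry` type, the `LINEAGE` list, and the `describeLineage` function are hypothetical names, and the list is deliberately incomplete.

```typescript
type ConfidenceTier = "Verified" | "Calculated" | "Analyzed" | "Estimated";

// Tier meanings as stated on this page.
const TIER_MEANING: Record<ConfidenceTier, string> = {
  Verified: "Directly from API response, no transformation",
  Calculated: "Derived from verified inputs via known formula",
  Analyzed: "Extracted by Claude from content review",
  Estimated: "Inferred from indirect signals",
};

interface LineageEntry {
  stat: string;
  source: string;
  path: string;
  confidence: ConfidenceTier;
}

// A few entries copied from the traceability listing; not the full set.
const LINEAGE: LineageEntry[] = [
  { stat: "Subscriber Count", source: "YouTube Data API", path: "channels.statistics.subscriberCount", confidence: "Verified" },
  { stat: "Engagement Rate", source: "Calculated", path: "averageRecentViews / subscriberCount × 100", confidence: "Calculated" },
  { stat: "Key Pain Points", source: "Claude analysis", path: "painPoints[].point from transcript analysis", confidence: "Analyzed" },
  { stat: "Willingness to Pay", source: "Claude analysis", path: "willingnessToPay from audience signal analysis", confidence: "Estimated" },
];

// Build a one-line provenance note for a stat, or null if the stat is not listed.
function describeLineage(stat: string): string | null {
  const entry = LINEAGE.find((candidate) => candidate.stat === stat);
  if (!entry) return null;
  const meaning = TIER_MEANING[entry.confidence];
  return `${entry.stat}: ${entry.source} via ${entry.path} [${entry.confidence}: ${meaning}]`;
}
```

Typing the tier as a string union means a misspelled tier in the data fails at compile time rather than surfacing in a rendered report.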