by Imprint

Data Sources and Lineage

Every number in an Imprint report traces back to a specific source through a documented path. Data flows from collection through analysis and scoring to final rendering. Here is the full pipeline.

01 Collection

Public data is gathered from three primary sources. The YouTube Data API provides channel statistics, video metadata, and upload history. Web scraping captures video transcripts, channel links, and business email addresses. DataForSEO supplies niche search volume and keyword data.

Data sources

  • YouTube Data API

    Official API

    Channel statistics, video metadata, and upload history.

    ToS-compliant, free tier

  • Web Scraping

    Public web data

    Video transcripts, channel links, and business email.

    Public pages only, no authentication bypass

  • DataForSEO

    Licensed API

    Niche search volume and keyword data.

    Commercial license

Key outputs

  • subscriberCount
  • videoCount
  • averageRecentViews
  • video transcripts
  • uploadCadence
  • channel email
  • niche keywords
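
As a rough illustration of the collection step, the sketch below pulls subscriberCount and videoCount out of a YouTube Data API channels.list response. The function name and the sample response are hypothetical. The one API detail it relies on is that the statistics counts arrive as strings and need explicit conversion.

```python
from typing import Any


def parse_channel_statistics(response: dict[str, Any]) -> dict[str, int]:
    """Extract subscriberCount and videoCount from a channels.list response.

    Hypothetical helper for illustration, not Imprint's actual collector.
    """
    items = response.get("items", [])
    if not items:
        raise ValueError("channels.list response contains no items")
    stats = items[0]["statistics"]
    # The API returns these counts as strings, so convert them explicitly.
    return {
        "subscriberCount": int(stats["subscriberCount"]),
        "videoCount": int(stats["videoCount"]),
    }


# Made-up response fragment, trimmed to the fields used here.
sample_response = {
    "items": [
        {
            "id": "UC_example",
            "statistics": {
                "viewCount": "9000000",
                "subscriberCount": "120000",
                "videoCount": "340",
            },
        }
    ]
}

channel = parse_channel_statistics(sample_response)
print(channel)
```

A production collector would also need to handle channels that hide their subscriber count, in which case the subscriberCount field may be missing from the response.
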
02 Analysis

Anthropic Claude Sonnet analyzes scraped content against a structured analytical framework. It produces 23 structured fields per creator, including purchaseContext (8 sub-fields), riskFactors, partnershipProfile, audiencePersonas, competitiveProducts, and willingnessToPay. Each analysis costs approximately $2 per creator.

Data sources

  • Anthropic Claude Sonnet

    LLM analysis

    Structured content analysis against an analytical framework.

    Commercial API license

Key outputs

  • purchaseContext
  • riskFactors
  • partnershipProfile
  • structuredDemographics
  • competitiveProducts
  • painPoints
  • willingnessToPay
  • imprintScore
03 Scoring

The fromEnrichedAnalysis adapter normalizes raw analysis into a standardized ScoringCreator shape. Six-axis formulas compute niche-level scores from the normalized data. Validation gates (SP1) reject data that fails schema or plausibility checks before it enters scoring. Regression baselines (SP2) catch any formula drift across commits.
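
A validation gate of the kind described for SP1 might look like the following minimal sketch. The specific rules are assumptions made for illustration (the actual SP1 checks are not documented here): a schema check on three collection fields and one plausibility check.

```python
# Illustrative schema: field name -> accepted Python types (an assumption).
REQUIRED_FIELDS = {
    "subscriberCount": (int,),
    "videoCount": (int,),
    "averageRecentViews": (int, float),
}


def validate_creator(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the record passes."""
    errors = []
    # Schema check: every required field is present, numeric, and non-negative.
    for field, types in REQUIRED_FIELDS.items():
        value = record.get(field)
        if isinstance(value, bool) or not isinstance(value, types):
            errors.append(f"{field}: missing or wrong type")
        elif value < 0:
            errors.append(f"{field}: must not be negative")
    # Plausibility check (hypothetical rule): views require at least one video.
    if not errors and record["videoCount"] == 0 and record["averageRecentViews"] > 0:
        errors.append("averageRecentViews: positive views with zero videos")
    return errors


good = {"subscriberCount": 120000, "videoCount": 340, "averageRecentViews": 8500.0}
bad_schema = {"subscriberCount": "120000", "videoCount": 340}
implausible = {"subscriberCount": 500, "videoCount": 0, "averageRecentViews": 1200}

print(validate_creator(good))
print(validate_creator(bad_schema))
print(validate_creator(implausible))
```

Returning a list of errors rather than raising on the first failure lets a gate report every problem with a record in one pass.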

Data sources

  • SP1 Validation Gates

    Internal guardrail

    Schema and plausibility checks that reject bad data before scoring.

    N/A

  • SP2 Regression Baselines

    Internal guardrail

    Golden baselines and property tests that catch formula drift.

    N/A

Key outputs

  • 6 axis scores
  • composite Imprint Score
  • niche catalog entry
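
The regression baselines (SP2) can be pictured as a comparison between freshly computed scores and a stored golden copy. The sketch below assumes scores are kept as a name-to-number mapping. The axis keys, values, and tolerance are hypothetical.

```python
import math


def find_drift(
    current: dict[str, float],
    baseline: dict[str, float],
    tolerance: float = 1e-9,
) -> list[str]:
    """Return the names of scores that differ from the golden baseline."""
    drifted = []
    for name in sorted(baseline.keys() | current.keys()):
        if name not in current or name not in baseline:
            # A score that appears or disappears also counts as drift.
            drifted.append(name)
        elif not math.isclose(
            current[name], baseline[name], rel_tol=0.0, abs_tol=tolerance
        ):
            drifted.append(name)
    return drifted


# Made-up golden values for two of the six axes.
golden = {"currentHealth": 72.5, "purchaseIntent": 64.0}
unchanged = {"currentHealth": 72.5, "purchaseIntent": 64.0}
after_formula_change = {"currentHealth": 72.5, "purchaseIntent": 61.0}

print(find_drift(unchanged, golden))
print(find_drift(after_formula_change, golden))
```

Run against every commit, a check like this fails the build whenever a formula change moves a score that the baseline expected to stay fixed.
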
04 Rendering

The niche catalog is compiled from scored creators. Five report generators produce HTML: State-of-Niche (free SEO), Niche Intelligence ($499), Brand-Creator Match ($1,499), Creator Due Diligence ($4,999), and Audience Behavior. A magazine companion is generated from a shared data loader.

Data sources

  • Niche Catalog

    Internal data store

    Compiled catalog of scored creators grouped by niche.

    N/A

  • Report Generators

    Internal renderer

    Five HTML generators plus magazine companion renderer.

    N/A

Key outputs

  • HTML reports
  • magazine exports
  • SEO pages
  • Stripe-gated delivery
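
To make the catalog-to-HTML step concrete, here is a small sketch that groups scored creators by niche and renders one niche as an HTML fragment. The record fields name and niche, the sample creators, and the markup are all hypothetical. Only imprintScore comes from the outputs listed earlier.

```python
from collections import defaultdict
from html import escape


def build_catalog(creators: list[dict]) -> dict[str, list[dict]]:
    """Group scored creators by niche, highest imprintScore first."""
    catalog = defaultdict(list)
    for creator in creators:
        catalog[creator["niche"]].append(creator)
    for entries in catalog.values():
        entries.sort(key=lambda c: c["imprintScore"], reverse=True)
    return dict(catalog)


def render_niche(niche: str, entries: list[dict]) -> str:
    """Render one niche as a minimal HTML fragment, escaping all text."""
    rows = "".join(
        f"<li>{escape(c['name'])}: {c['imprintScore']}</li>" for c in entries
    )
    return f"<h2>{escape(niche)}</h2><ul>{rows}</ul>"


# Made-up scored creators.
scored = [
    {"name": "Channel A", "niche": "Home & Garden", "imprintScore": 71},
    {"name": "Channel B", "niche": "Home & Garden", "imprintScore": 84},
    {"name": "Channel C", "niche": "Woodworking", "imprintScore": 66},
]

catalog = build_catalog(scored)
fragment = render_niche("Home & Garden", catalog["Home & Garden"])
print(fragment)
```
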
Traceability

Every stat traced to its source.

Each row below maps a report metric to the data source it originates from, the derivation path, and a confidence rating.

Report stat: Subscriber Count
Source: YouTube Data API
Path: channels.statistics.subscriberCount
Confidence: Verified

Report stat: Video Count
Source: YouTube Data API
Path: channels.statistics.videoCount
Confidence: Verified

Report stat: Average Recent Views
Source: YouTube Data API
Path: Computed from last 10 videos' viewCount
Confidence: Calculated

Report stat: Engagement Rate
Source: Calculated
Path: averageRecentViews / subscriberCount × 100
Confidence: Calculated

Report stat: Upload Cadence
Source: YouTube Data API + analysis
Path: lastUploadDaysAgo + frequency pattern detection
Confidence: Calculated

Report stat: Current Health Score
Source: 6-axis formula
Path: 4 components (avgSubs, uploadCadence, libraryDepth, platformFootprint)
Confidence: Calculated

Report stat: Niche Trajectory Score
Source: 6-axis formula
Path: 5 components (demandYoY, platformMomentum, creatorActivity, seasonality, diversification)
Confidence: Calculated

Report stat: Engagement Depth Score
Source: 6-axis formula
Path: 4 components (nicheEngagementRate, commentQuality, topCreatorEngagement, shortFormShare)
Confidence: Calculated

Report stat: Purchase Intent Score
Source: 6-axis formula
Path: 4 components (WTP, context, income, friction)
Confidence: Calculated

Report stat: Audience Receptivity Score
Source: 6-axis formula
Path: 4 components (brandedSentiment, engagementRetention, audienceSignals, partnershipRetention)
Confidence: Calculated

Report stat: Partnership Readiness Score
Source: 6-axis formula
Path: 5 components (readiness, openness, sponsorships, mentions, rates)
Confidence: Calculated

Report stat: Key Pain Points
Source: Claude analysis
Path: painPoints[].point from transcript analysis
Confidence: Analyzed

Report stat: Competitive Products
Source: Claude analysis
Path: competitiveProducts[].name from content review
Confidence: Analyzed

Report stat: Willingness to Pay
Source: Claude analysis
Path: willingnessToPay from audience signal analysis
Confidence: Estimated

Report stat: Risk Factors
Source: Claude analysis
Path: riskFactors[].description from content review
Confidence: Analyzed
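
The two calculated rows for Average Recent Views and Engagement Rate can be worked through in a few lines. This is a sketch under stated assumptions: view counts are ordered newest first, the function names are hypothetical, and the sample numbers are made up. The ten-video window and the formula averageRecentViews / subscriberCount × 100 come from the rows above.

```python
from typing import Optional


def average_recent_views(view_counts: list[int], window: int = 10) -> float:
    """Mean viewCount over the most recent `window` videos (newest first)."""
    recent = view_counts[:window]
    if not recent:
        return 0.0
    return sum(recent) / len(recent)


def engagement_rate(avg_views: float, subscriber_count: int) -> Optional[float]:
    """averageRecentViews / subscriberCount x 100, as a percentage."""
    if subscriber_count <= 0:
        # Undefined for channels with no (or hidden) subscribers.
        return None
    return avg_views / subscriber_count * 100


views = [12000, 8000, 10000]          # made-up viewCount values
avg = average_recent_views(views)     # (12000 + 8000 + 10000) / 3 = 10000.0
rate = engagement_rate(avg, 80000)    # 10000 / 80000 x 100 = 12.5
print(avg, rate)
```

Note that this ratio compares views to subscribers, so it can exceed 100 when recent videos reach well beyond the subscriber base.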

Confidence tiers

  • Verified

    Directly from API response, no transformation

  • Calculated

    Derived from verified inputs via known formula

  • Analyzed

    Extracted by Claude from content review

  • Estimated

    Inferred from indirect signals
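
One way to keep these tiers machine-checkable is to store each traceability row as a typed record. The sketch below is an assumption about how such a table could be modelled, not a description of Imprint's code. The class and function names are hypothetical, and the four rows are copied from the table above.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    VERIFIED = "Verified"      # directly from API response, no transformation
    CALCULATED = "Calculated"  # derived from verified inputs via known formula
    ANALYZED = "Analyzed"      # extracted by Claude from content review
    ESTIMATED = "Estimated"    # inferred from indirect signals


@dataclass(frozen=True)
class LineageRow:
    stat: str
    source: str
    path: str
    confidence: Confidence


LINEAGE = [
    LineageRow("Subscriber Count", "YouTube Data API",
               "channels.statistics.subscriberCount", Confidence.VERIFIED),
    LineageRow("Engagement Rate", "Calculated",
               "averageRecentViews / subscriberCount × 100", Confidence.CALCULATED),
    LineageRow("Key Pain Points", "Claude analysis",
               "painPoints[].point from transcript analysis", Confidence.ANALYZED),
    LineageRow("Willingness to Pay", "Claude analysis",
               "willingnessToPay from audience signal analysis", Confidence.ESTIMATED),
]


def stats_with_confidence(rows: list[LineageRow], tier: Confidence) -> list[str]:
    """List the report stats that carry a given confidence tier."""
    return [row.stat for row in rows if row.confidence is tier]


print(stats_with_confidence(LINEAGE, Confidence.VERIFIED))
```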