Data Sources and Lineage
Every number in an Imprint report traces back to a specific source through a documented path. Data flows from collection through analysis and scoring to final rendering. Here is the full pipeline.
Public data is gathered from three primary sources. The YouTube Data API provides channel statistics, video metadata, and upload history. Web scraping captures video transcripts, channel links, and business email addresses. DataForSEO supplies niche search volume and keyword data.
Data sources
YouTube Data API
Official API: channel statistics, video metadata, and upload history.
ToS-compliant, free tier
Web Scraping
Public web data: video transcripts, channel links, and business email addresses.
Public pages only, no authentication bypass
DataForSEO
Licensed API: niche search volume and keyword data.
Commercial license
Key outputs
- subscriberCount
- videoCount
- averageRecentViews
- video transcripts
- uploadCadence
- channel email
- niche keywords
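To show how two collection-stage outputs could be derived from raw API data, here is a minimal sketch that computes averageRecentViews over the 10 most recent videos (assuming "average" means an arithmetic mean) and lastUploadDaysAgo. The `RecentVideo` shape loosely mirrors the `snippet.publishedAt` and `statistics.viewCount` fields of YouTube Data API video resources, where counts arrive as strings. The function name and the median-gap cadence heuristic are hypothetical, not Imprint's actual implementation.

```typescript
// Hypothetical, simplified view of a YouTube Data API video resource.
// The real API returns statistics.viewCount as a string.
interface RecentVideo {
  snippet: { publishedAt: string }; // ISO 8601 timestamp
  statistics: { viewCount?: string };
}

interface CollectionMetrics {
  averageRecentViews: number;
  lastUploadDaysAgo: number | null;
  medianUploadGapDays: number | null;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Assumption: "average" is the arithmetic mean over the newest `windowSize` videos.
function computeCollectionMetrics(
  videos: RecentVideo[],
  now: Date,
  windowSize = 10,
): CollectionMetrics {
  // Sort newest first so the window covers the most recent uploads.
  const sorted = videos
    .slice()
    .sort((a, b) => Date.parse(b.snippet.publishedAt) - Date.parse(a.snippet.publishedAt));
  const recent = sorted.slice(0, windowSize);

  // Skip videos with no reported view count instead of counting them as zero.
  const views = recent
    .filter((v) => v.statistics.viewCount !== undefined)
    .map((v) => Number(v.statistics.viewCount));
  const averageRecentViews =
    views.length === 0 ? 0 : views.reduce((sum, n) => sum + n, 0) / views.length;

  const lastUploadDaysAgo =
    recent.length === 0
      ? null
      : Math.floor((now.getTime() - Date.parse(recent[0].snippet.publishedAt)) / MS_PER_DAY);

  // Hypothetical cadence heuristic: median gap in days between consecutive uploads.
  const gaps: number[] = [];
  for (let i = 0; i + 1 < recent.length; i++) {
    const newer = Date.parse(recent[i].snippet.publishedAt);
    const older = Date.parse(recent[i + 1].snippet.publishedAt);
    gaps.push((newer - older) / MS_PER_DAY);
  }
  gaps.sort((a, b) => a - b);
  let medianUploadGapDays: number | null = null;
  if (gaps.length > 0) {
    const mid = Math.floor(gaps.length / 2);
    medianUploadGapDays = gaps.length % 2 === 1 ? gaps[mid] : (gaps[mid - 1] + gaps[mid]) / 2;
  }

  return { averageRecentViews, lastUploadDaysAgo, medianUploadGapDays };
}
```

A median gap is used here because a single long break or a burst of uploads would distort a mean interval; the real "frequency pattern detection" may work quite differently.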
Anthropic Claude Sonnet analyzes the scraped content against a structured analytical framework, producing 23 structured fields per creator, including purchaseContext (8 sub-fields), riskFactors, partnershipProfile, audiencePersonas, competitiveProducts, and willingnessToPay. Analysis costs approximately $2 per creator.
Data sources
Anthropic Claude Sonnet
LLM analysis: structured content analysis against an analytical framework.
Commercial API license
Key outputs
- purchaseContext
- riskFactors
- partnershipProfile
- structuredDemographics
- competitiveProducts
- painPoints
- willingnessToPay
- imprintScore
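The lineage table later on this page cites paths such as painPoints[].point and riskFactors[].description. As a sketch of what those paths imply, the TypeScript below models only those fields and extracts the report-facing lists. The interface names, the free-text `willingnessToPay` type, and the blank-entry filtering are assumptions; the full 23-field schema is not documented here.

```typescript
// Hypothetical subset of the per-creator analysis output. Only fields whose
// paths appear in the lineage table are modeled, and their shapes are assumed.
interface CreatorAnalysisSubset {
  painPoints: { point: string }[];
  competitiveProducts: { name: string }[];
  riskFactors: { description: string }[];
  // Assumption: modeled as free text; the real field may be structured.
  willingnessToPay: string;
}

interface ReportFacingStats {
  keyPainPoints: string[];
  competitiveProducts: string[];
  riskFactors: string[];
  willingnessToPay: string;
}

// Assumption: blank entries should be dropped rather than rendered.
function nonBlank(values: string[]): string[] {
  return values.map((v) => v.trim()).filter((v) => v.length > 0);
}

// Follows the documented paths painPoints[].point, competitiveProducts[].name,
// riskFactors[].description, and willingnessToPay.
function extractReportStats(analysis: CreatorAnalysisSubset): ReportFacingStats {
  return {
    keyPainPoints: nonBlank(analysis.painPoints.map((p) => p.point)),
    competitiveProducts: nonBlank(analysis.competitiveProducts.map((c) => c.name)),
    riskFactors: nonBlank(analysis.riskFactors.map((r) => r.description)),
    willingnessToPay: analysis.willingnessToPay.trim(),
  };
}
```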
The fromEnrichedAnalysis adapter normalizes raw analysis into a standardized ScoringCreator shape. Six-axis formulas compute niche-level scores from the normalized data. Validation gates (SP1) reject data that fails schema or plausibility checks before it enters scoring. Regression baselines (SP2) catch any formula drift across commits.
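The adapter-plus-gate pattern described above can be sketched as a single function that normalizes loosely typed input into a fixed shape and rejects it if schema or plausibility checks fail. This is loosely modeled on fromEnrichedAnalysis and SP1, not taken from them: `normalizeForScoring`, the reduced `ScoringCreator` fields, and the specific plausibility rules are all hypothetical.

```typescript
// Hypothetical, heavily reduced ScoringCreator shape.
interface ScoringCreator {
  channelId: string;
  niche: string;
  subscriberCount: number;
  videoCount: number;
  averageRecentViews: number;
}

type GateResult =
  | { ok: true; creator: ScoringCreator }
  | { ok: false; errors: string[] };

// Raw analysis arrives loosely typed, e.g. as parsed JSON.
type RawAnalysis = Record<string, unknown>;

// Accept numbers or numeric strings (YouTube counts often arrive as strings).
function toNumber(value: unknown): number | null {
  if (typeof value === "number") return Number.isFinite(value) ? value : null;
  if (typeof value === "string" && value.trim() !== "") {
    const n = Number(value);
    return Number.isFinite(n) ? n : null;
  }
  return null;
}

function normalizeForScoring(raw: RawAnalysis): GateResult {
  const errors: string[] = [];
  const rawChannelId = raw["channelId"];
  const rawNiche = raw["niche"];

  const channelId = typeof rawChannelId === "string" ? rawChannelId : "";
  const niche = typeof rawNiche === "string" ? rawNiche.trim().toLowerCase() : "";
  const subscriberCount = toNumber(raw["subscriberCount"]);
  const videoCount = toNumber(raw["videoCount"]);
  const averageRecentViews = toNumber(raw["averageRecentViews"]);

  // Schema checks: required fields present with usable types.
  if (channelId === "") errors.push("channelId missing");
  if (niche === "") errors.push("niche missing");
  if (subscriberCount === null) errors.push("subscriberCount not numeric");
  if (videoCount === null) errors.push("videoCount not numeric");
  if (averageRecentViews === null) errors.push("averageRecentViews not numeric");

  // Plausibility checks (assumed rules, for illustration only).
  if (subscriberCount !== null && subscriberCount < 0) errors.push("negative subscriberCount");
  if (videoCount !== null && (videoCount < 0 || !Number.isInteger(videoCount))) {
    errors.push("invalid videoCount");
  }
  if (averageRecentViews !== null && averageRecentViews < 0) {
    errors.push("negative averageRecentViews");
  }

  if (errors.length > 0 || subscriberCount === null || videoCount === null || averageRecentViews === null) {
    return { ok: false, errors };
  }
  return { ok: true, creator: { channelId, niche, subscriberCount, videoCount, averageRecentViews } };
}
```

Returning a discriminated union rather than throwing lets a batch job collect every rejection reason per creator before anything reaches the scoring formulas.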
Data sources
SP1 Validation Gates
Internal guardrail: schema and plausibility checks that reject bad data before scoring.
N/A
SP2 Regression Baselines
Internal guardrail: golden baselines and property tests that catch formula drift.
N/A
Key outputs
- 6 axis scores
- composite Imprint Score
- niche catalog entry
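The golden-baseline idea behind SP2 can be sketched like this: scores for fixed fixture creators are recorded once, and each later run recomputes them and flags any axis that moved by more than a small tolerance. The axis keys, fixture format, tolerance, and `scoreFn` parameter are hypothetical; the real baselines and property tests are not shown in this document.

```typescript
// Hypothetical keys for the six axis scores listed in the lineage table.
type Axis =
  | "currentHealth"
  | "nicheTrajectory"
  | "engagementDepth"
  | "purchaseIntent"
  | "audienceReceptivity"
  | "partnershipReadiness";

type AxisScores = Record<Axis, number>;

interface GoldenCase<I> {
  id: string;
  input: I;
  expected: AxisScores; // Scores recorded when the baseline was accepted.
}

interface Drift {
  caseId: string;
  axis: Axis;
  expected: number;
  actual: number;
}

const AXES: Axis[] = [
  "currentHealth",
  "nicheTrajectory",
  "engagementDepth",
  "purchaseIntent",
  "audienceReceptivity",
  "partnershipReadiness",
];

// Recompute every golden case and report each axis that drifted beyond the tolerance.
function findDrift<I>(
  cases: GoldenCase<I>[],
  scoreFn: (input: I) => AxisScores,
  tolerance = 1e-6,
): Drift[] {
  const drifts: Drift[] = [];
  for (const c of cases) {
    const actual = scoreFn(c.input);
    for (const axis of AXES) {
      if (Math.abs(actual[axis] - c.expected[axis]) > tolerance) {
        drifts.push({ caseId: c.id, axis, expected: c.expected[axis], actual: actual[axis] });
      }
    }
  }
  return drifts;
}
```

In CI, a non-empty result would fail the build unless the baseline is deliberately re-recorded alongside the formula change.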
The niche catalog is compiled from scored creators. Five report generators produce HTML reports: State-of-Niche (free, for SEO), Niche Intelligence ($499), Brand-Creator Match ($1,499), Creator Due Diligence ($4,999), and Audience Behavior. A magazine companion is generated from a shared data loader.
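Compiling a catalog of scored creators grouped by niche is essentially a group-by. The sketch below assumes each scored creator carries a niche label and a composite score, and sorts each niche's members by that score; the `ScoredCreator` and `NicheCatalogEntry` shapes and the ordering rules are hypothetical.

```typescript
interface ScoredCreator {
  channelId: string;
  niche: string;
  compositeScore: number; // Hypothetical field holding the composite Imprint Score.
}

interface NicheCatalogEntry {
  niche: string;
  creatorCount: number;
  creators: ScoredCreator[]; // Highest composite score first.
}

function compileNicheCatalog(creators: ScoredCreator[]): NicheCatalogEntry[] {
  // Group creators into per-niche buckets.
  const byNiche = new Map<string, ScoredCreator[]>();
  for (const c of creators) {
    const bucket = byNiche.get(c.niche);
    if (bucket) bucket.push(c);
    else byNiche.set(c.niche, [c]);
  }

  const entries: NicheCatalogEntry[] = [];
  byNiche.forEach((members, niche) => {
    members.sort((a, b) => b.compositeScore - a.compositeScore);
    entries.push({ niche, creatorCount: members.length, creators: members });
  });

  // Alphabetical niche order keeps catalog output stable between runs.
  entries.sort((a, b) => a.niche.localeCompare(b.niche));
  return entries;
}
```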
Data sources
Niche Catalog
Internal data store: compiled catalog of scored creators, grouped by niche.
N/A
Report Generators
Internal renderer: five HTML report generators plus the magazine companion renderer.
N/A
Key outputs
- HTML reports
- magazine exports
- SEO pages
- Stripe-gated delivery
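The report line-up and prices above map onto a small lookup that a Stripe-gated delivery step could consult. In this sketch the `ReportType` keys, cents-based prices, the `paymentConfirmed` flag, and the decision logic are assumptions; no Stripe API calls are shown, and the Audience Behavior price is left unset because it is not stated here.

```typescript
type ReportType =
  | "stateOfNiche"
  | "nicheIntelligence"
  | "brandCreatorMatch"
  | "creatorDueDiligence"
  | "audienceBehavior";

// Prices in US cents; null means the price is not specified in this document.
const REPORT_PRICES_CENTS: Record<ReportType, number | null> = {
  stateOfNiche: 0, // Free SEO report.
  nicheIntelligence: 49900,
  brandCreatorMatch: 149900,
  creatorDueDiligence: 499900,
  audienceBehavior: null,
};

type DeliveryDecision = "serve" | "requirePayment" | "priceUnknown";

// Hypothetical gate: free reports are served directly, paid reports need a confirmed payment.
function deliveryDecision(report: ReportType, paymentConfirmed: boolean): DeliveryDecision {
  const price = REPORT_PRICES_CENTS[report];
  if (price === null) return "priceUnknown";
  if (price === 0 || paymentConfirmed) return "serve";
  return "requirePayment";
}
```

Storing prices as integer cents avoids floating-point rounding in amounts, which is also how Stripe expresses USD amounts.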
Every stat traced to its source.
Each row below maps a report metric to the data source it originates from, the derivation path, and a confidence rating.
| Report stat | Source | Path | Confidence |
| --- | --- | --- | --- |
| Subscriber Count | YouTube Data API | channels.statistics.subscriberCount | Verified |
| Video Count | YouTube Data API | channels.statistics.videoCount | Verified |
| Average Recent Views | YouTube Data API | Computed from the viewCount of the last 10 videos | Calculated |
| Engagement Rate | Calculated | averageRecentViews / subscriberCount × 100 | Calculated |
| Upload Cadence | YouTube Data API + analysis | lastUploadDaysAgo + frequency pattern detection | Calculated |
| Current Health Score | 6-axis formula | 4 components (avgSubs, uploadCadence, libraryDepth, platformFootprint) | Calculated |
| Niche Trajectory Score | 6-axis formula | 5 components (demandYoY, platformMomentum, creatorActivity, seasonality, diversification) | Calculated |
| Engagement Depth Score | 6-axis formula | 4 components (nicheEngagementRate, commentQuality, topCreatorEngagement, shortFormShare) | Calculated |
| Purchase Intent Score | 6-axis formula | 4 components (WTP, context, income, friction) | Calculated |
| Audience Receptivity Score | 6-axis formula | 4 components (brandedSentiment, engagementRetention, audienceSignals, partnershipRetention) | Calculated |
| Partnership Readiness Score | 6-axis formula | 5 components (readiness, openness, sponsorships, mentions, rates) | Calculated |
| Key Pain Points | Claude analysis | painPoints[].point from transcript analysis | Analyzed |
| Competitive Products | Claude analysis | competitiveProducts[].name from content review | Analyzed |
| Willingness to Pay | Claude analysis | willingnessToPay from audience signal analysis | Estimated |
| Risk Factors | Claude analysis | riskFactors[].description from content review | Analyzed |
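The Engagement Rate row is a plain ratio, so it can be checked by hand: a channel averaging 25,000 recent views with 500,000 subscribers gives 25,000 / 500,000 × 100 = 5. Note that this definition is view-based: it measures recent reach relative to the subscriber base, not likes or comments. The sketch below implements the documented formula; returning null for a zero or invalid subscriber count is an assumption about edge-case handling, not documented behavior.

```typescript
// Engagement Rate = averageRecentViews / subscriberCount × 100 (from the lineage table).
// Assumption: zero, negative, or non-finite inputs yield null instead of Infinity or NaN.
function engagementRate(averageRecentViews: number, subscriberCount: number): number | null {
  if (!Number.isFinite(subscriberCount) || subscriberCount <= 0) return null;
  if (!Number.isFinite(averageRecentViews) || averageRecentViews < 0) return null;
  // Multiplying first avoids rounding the intermediate fraction (e.g. 0.05) before scaling.
  return (averageRecentViews * 100) / subscriberCount;
}

console.log(engagementRate(25000, 500000)); // 5
```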
Confidence tiers
- Verified: directly from the API response, with no transformation
- Calculated: derived from verified inputs via a known formula
- Analyzed: extracted by Claude from content review
- Estimated: inferred from indirect signals
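One way to keep these tiers attached to every rendered number is to record lineage as typed data. A minimal sketch, assuming a hypothetical `LineageEntry` record that is not part of the documented system; the sample rows are copied from the lineage table, and the `atLeast` helper assumes the tiers run from most to least direct in the order listed.

```typescript
type Confidence = "Verified" | "Calculated" | "Analyzed" | "Estimated";

// Assumption: tiers are ordered from most to least direct, as listed above.
const CONFIDENCE_ORDER: Confidence[] = ["Verified", "Calculated", "Analyzed", "Estimated"];

interface LineageEntry {
  stat: string;
  source: string;
  path: string;
  confidence: Confidence;
}

// Sample rows taken from the lineage table.
const LINEAGE: LineageEntry[] = [
  { stat: "Subscriber Count", source: "YouTube Data API", path: "channels.statistics.subscriberCount", confidence: "Verified" },
  { stat: "Engagement Rate", source: "Calculated", path: "averageRecentViews / subscriberCount × 100", confidence: "Calculated" },
  { stat: "Key Pain Points", source: "Claude analysis", path: "painPoints[].point from transcript analysis", confidence: "Analyzed" },
  { stat: "Willingness to Pay", source: "Claude analysis", path: "willingnessToPay from audience signal analysis", confidence: "Estimated" },
];

// Keep only entries at or above a minimum tier, e.g. to badge directly sourced stats.
function atLeast(entries: LineageEntry[], minimum: Confidence): LineageEntry[] {
  const limit = CONFIDENCE_ORDER.indexOf(minimum);
  return entries.filter((e) => CONFIDENCE_ORDER.indexOf(e.confidence) <= limit);
}
```

Using a string-literal union for the tier means a typo such as "Verifed" fails to compile instead of silently producing an unlabeled stat.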