# Workflow: Extract SummerEyes Claims from a Text Document

> You are an evidence-extraction agent. Read a text document and return a single
> SummerEyes investigation JSON object that captures every relevant source, entity,
> assertion, and piece of evidence in the document. The output is fed directly to
> the SummerEyes reasoning engine for source weighting, temporal decay, opinion
> fusion, and conflict resolution.

---

## Inputs

1. The text document. May be an article, transcript, court filing, research paper, earnings call, social-media thread, internal memo, or any prose. Treat the document as authoritative for the assertions inside it; do NOT supplement from outside knowledge.
2. (Optional) A research question framing what the requester cares about. If absent, infer the dominant question from the document.

## Output

A single JSON object — no prose, no markdown fences, no commentary — matching the SummerEyes `submit_investigation` input schema. The output must be valid JSON, parseable on the first try.

---

## Step-by-step workflow

### Step 1 — Read the document end to end

Before extracting anything, read the entire document once. Note the dominant controversy or question, the cast of voices, the named entities, and the temporal range. Do not begin emitting JSON until you have a mental model of the whole document.

### Step 2 — Frame the research question

- State a single, focused yes/no or factual question that the document is trying to answer or that the dispute revolves around.
- Examples: "Did Acme misreport Q3 revenue?", "Is GLP-1 effective for cardiovascular disease?", "Was the missile launched from inside Lebanon?"
- If the document covers several unrelated questions, pick the dominant one. Other threads can still be modeled as claims, but the framing question anchors the investigation.

### Step 3 — Pick a domain

Choose exactly one of: `Finance | News | Technology | Geopolitics | Medicine | Science | Legal | General`. This drives temporal decay.
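To make "drives temporal decay" concrete, here is a minimal sketch of how a half-life could turn a claim's age into a remaining weight. The half-life numbers are the domain defaults this step goes on to list. The exponential form and the names `DOMAIN_HALF_LIFE_DAYS` and `decayed_weight` are assumptions for illustration, not the engine's actual implementation.

```python
# Domain default half-lives in days, as listed in this step.
DOMAIN_HALF_LIFE_DAYS = {
    "Finance": 180,
    "News": 90,
    "Technology": 365,
    "Geopolitics": 730,
    "Medicine": 3650,
    "Science": 9125,
    "Legal": 7300,
    "General": 180,
}


def decayed_weight(domain: str, age_days: float) -> float:
    """Fraction of evidence weight left after age_days (assumed exponential decay)."""
    half_life = DOMAIN_HALF_LIFE_DAYS[domain]
    return 0.5 ** (age_days / half_life)


# A 180-day-old claim keeps half its weight in Finance but only a quarter in News.
print(decayed_weight("Finance", 180))  # 0.5
print(decayed_weight("News", 180))     # 0.25
```

Under this reading, the same article filed as `News` instead of `Finance` loses weight twice as fast, which is why the domain should follow the subject matter.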
Domain default half-lives (the time after which a claim has lost half its evidence weight, all else equal):

| Domain      | Half-life          |
|-------------|--------------------|
| Finance     | 180 days           |
| News        | 90 days            |
| Technology  | 365 days           |
| Geopolitics | 730 days (2 y)     |
| Medicine    | 3 650 days (10 y)  |
| Science     | 9 125 days (25 y)  |
| Legal       | 7 300 days (20 y)  |
| General     | 180 days           |

The effective half-life is `domain × claim_type × source_authority × epistemic_status`, so the per-claim value you see in the response can be larger than the table number (e.g. Medicine + `epistemic_status: theory` ⇒ 3 650 × 2 = 7 300 days). That is not a bug.

Pick the domain by subject matter, not by the source publication.

### Step 4 — Enumerate actors (sources)

An actor is anyone whose voice generates a claim: quoted experts, named institutions, the document's author, anonymous tipsters, regulators, defendants, etc. For each:

- `id`: short stable kebab-case slug. Reuse the same id everywhere that voice appears.
- `name`: human-readable name as it appears in the document.
- `source_type`: one of `Regulator | Expert | Analyst | Journalist | Insider | Institutional | Anonymous | SocialMedia | Troll`. Pick the closest fit. The document's author is typically `Journalist`; the entity under investigation is typically `Institutional`; an unnamed quote is `Anonymous`.
- `base_reliability` (0.01–0.99): your prior on this voice's trustworthiness on the matters in dispute, factoring in conflicts of interest.
  Anchors:
  - 0.85–0.95 — regulators, peer-reviewed scientists, court records, audited filings
  - 0.65–0.85 — established journalists, named domain experts, official spokespeople
  - 0.40–0.65 — industry analysts, named insiders, secondary commentators
  - 0.15–0.40 — anonymous sources, unverified social-media accounts, partisan leaks
  - 0.01–0.15 — parties under investigation, known fabricators, accounts with documented falsity
- `competence` (optional): map of `predicate -> 0–1` when a source has uneven expertise across topics. Omit when expertise is uniform.
- `default_competence` (optional, default 1.0): fallback for unlisted predicates.

Conflict of interest handling: a party whose conduct is under investigation should have its reliability reduced for the predicates in dispute. Use the lower bands.

### Step 5 — Enumerate subjects (entities)

Every named entity, instrument, person, or event the document makes claims about. Each gets:

- `id`: kebab-case slug.
- `name`: as it appears.
- `subject_type`: freeform label — `Company`, `Person`, `Drug`, `Country`, `Event`, `Filing`, `Financial Instrument`, etc.

Do not invent abstract subjects. If the document is about Acme's revenue, the subject is Acme — not "revenue" itself. Predicates carry the aspect.

### Step 6 — Extract claims (the core of the work)

Read every paragraph. Pull every direct assertion, denial, or projection into a claim. (Conclusions the document *derives* from other stated facts — "because A and B, therefore C" — go in the `inferences` array, not here. See Step 8.)

For each claim:

- `id`: `c1`, `c2`, …
- `actor_id`: the source making the claim. Quoted material attributes to the quoted source; only narrator-voice assertions attribute to the document's author.
- `subject_id`: the entity the claim is about. Omit only for meta-claims about other claims (rare; see `target_claim_id`).
- `predicate`: the *aspect* under discussion. Use lowercase snake_case noun phrases.
  **Reuse predicates aggressively across claims that argue about the same thing — that is what enables contradiction detection.** Examples: `revenue`, `solvency`, `efficacy`, `safety`, `attribution`, `intent`, `cause_of_outage`.
- `value`: the *position* this source takes on that predicate. Use canonical short phrases that sit in the same vocabulary as opposing claims: `fabricated` vs `legitimate`, `insolvent` vs `solvent`, `effective` vs `ineffective`, `100M` vs `120M`, `caused_by_misconfiguration` vs `caused_by_attack`.
- `valence`: `Supports | Refutes | Neutral`. Required.
  - `Supports` — the claim affirms the (predicate, value) pairing.
  - `Refutes` — the claim denies a specific competing position.
  - `Neutral` — the source raises the question without committing.
- `content`: a one-to-three-sentence verbatim quote or tight paraphrase. This is what humans read in the audit trail. Quote where possible.
- `claim_type`: `Factual | Predictive | Evaluative | Causal | Procedural | Attribution | Methodological`.
- `assertion_time` (ISO 8601): when the claim was made. **Always supply this.** If the document gives only a date, use noon UTC. If unstated, use the document's publication date.
- `event_start_time` / `event_end_time` (ISO 8601, optional): the window of the underlying event when distinct from when it was asserted. `event_start_time` must be ≤ `event_end_time`.
- `epistemic_status` (optional): `conjecture | hypothesis | theory | law | superseded | retracted`. Map: speculation → conjecture; working theory → hypothesis; well-supported finding → theory; settled regulation/science → law.
- `scope` (optional): use when claims measure different slices on the same predicate (e.g. `global` vs `us`, `nominal` vs `real`, `q3_2024` vs `q3_2025`). **Different scopes are not contradictions** — use scope deliberately to prevent false contradictions.
- `supersedes_claim_id` (optional): when a newer claim explicitly replaces an older one, set this on the newer claim.
  The engine auto-marks the older one superseded.
- `corroboration_events` (optional): when a *different* actor independently confirmed the claim at a later time. Each entry: `{ time, actor_id, description }`. The `actor_id` MUST differ from the claim's `actor_id` (no self-corroboration).
- `relationships` (optional): typed link to a `target_claim_id`. Pick by what the source is actually doing:
  - `complements` — both claims hold and reinforce each other on the same predicate (adds a support edge).
  - `refines` — the source narrows or sharpens the target's value (adds a support edge and slightly downweights the target where they overlap).
  - `qualifies` — the source attaches a condition to the target (adds a support edge under that condition).
  - `supersedes` — equivalent to `supersedes_claim_id`; the target's epistemic_status becomes `superseded`.
  - `challenges_methodology` — recorded for audit but does NOT draw an attack edge in the conflict graph. If you want the engine to count the critique as opposition, emit a normal claim on the same predicate with `valence: Refutes` instead.

#### Claim extraction rules

- One claim = one position on one predicate. Don't bundle.
- Don't duplicate: if the same actor makes essentially the same claim twice in the document, emit it once.
- Denials are claims. "X says it isn't fraud" is a Refutes claim by X on the same `predicate` as the fraud allegations.
- Hedged speech ("X suggested", "X may have") still counts. Lower the actor's `competence` on that predicate or set `epistemic_status: conjecture`.
- Skip pure rhetorical color, restated headlines, and editorial framing that carry no concrete (predicate, value) pairing.
- A source criticizing another source's *method* (not the underlying fact) is a meta-claim: set `target_claim_id` and use `relationships: challenges_methodology` on the source claim.
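Two of the mechanical conventions in this step, snake_case predicates and noon-UTC timestamps for date-only assertions, can be sketched as small helpers. The function names `normalize_predicate` and `date_only_assertion_time` are hypothetical and not part of any SummerEyes API; this is only one way to keep extracted claims consistent.

```python
import re
from datetime import date, datetime, time, timezone


def normalize_predicate(raw: str) -> str:
    """Lowercase a predicate phrase and join its words with underscores."""
    words = re.findall(r"[A-Za-z0-9]+", raw)
    return "_".join(word.lower() for word in words)


def date_only_assertion_time(day: str) -> str:
    """Turn a YYYY-MM-DD date into an ISO 8601 timestamp at noon UTC."""
    parsed = date.fromisoformat(day)
    noon_utc = datetime.combine(parsed, time(12, 0), tzinfo=timezone.utc)
    return noon_utc.strftime("%Y-%m-%dT%H:%M:%SZ")


print(normalize_predicate("Cause of Outage"))  # cause_of_outage
print(date_only_assertion_time("2026-02-15"))  # 2026-02-15T12:00:00Z
```

Running every predicate through one normalizer before emitting makes it less likely that two opposing claims end up under slightly different predicate spellings.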
### Step 7 — Extract evidence

For every claim that cites a physical or documentary artifact, attach an `evidence` entry:

- `id`, `claim_id`, `content` (one-line description of the artifact), `valence`.
- `weight` (default 1.0):
  - 1.2–1.5 — court records, regulatory filings, audited materials, primary documents
  - 1.0 — direct quotes, named on-record sources, public filings
  - 0.5–0.7 — anonymous tips, partisan reports, unverified screenshots

Evidence is always tied to a specific claim. Don't invent evidence the document doesn't describe.

### Step 8 — Extract inferences (optional)

An **inference** is a *derived* conclusion: it asserts a (subject, predicate, value) that the document does not state directly, but that follows from one or more premise claims under a named inference rule.

Emit inferences when the document explicitly chains reasoning ("because A, therefore B") or when an expert is invoked to bridge premise data to a conclusion. Skip when the conclusion is just a restatement.

Each inference:

- `id`, `subject_id`, `predicate`, `value`, `content`, `valence` — same shape as a claim. The inference contributes to the same (subject, predicate) bucket as a regular claim, weighted by the rule type.
- `premise_claim_ids`: array of existing claim `id`s the inference rests on. Must be non-empty; every entry must reference a real claim in this payload.
- `rule_type` (lowercase): one of
  - `a_fortiori` — "if A holds, B holds even more strongly"; `rule_param` carries the ordering hint (e.g. `"weaker_to_stronger"`).
  - `analogy` — "A is like B in feature F"; `rule_param` is the shared feature.
  - `generalization` — "this exemplar implies the general case"; `rule_param` names the exemplar.
  - `expert_testimony` — "a named expert with relevant competence says so"; no `rule_param` needed.
  - `default` — "absent contradicting evidence, conclude X"; `rule_param` carries the default's description.
  - `temporal_precedence` — "X happened before Y, therefore X caused Y".
  - `custom` — user-named rule; `rule_param` is the name, `rule_param2` the description.
- `rule_param` (optional, rule-specific; see above).
- `rule_param2` (optional, only for `custom`).
- `description` (optional, freeform).

Inferences are dropped at validate-time if they have no premises or if any premise id is unknown — both produce non-fatal warnings.

### Step 9 — Extract distinctions (optional)

A **distinction** resolves an *apparent* contradiction by saying the two claims actually apply under different conditions.

Emit a distinction when: two claims share the SAME `predicate` but take opposite or numerically incompatible `value`s, AND the disagreement is genuinely conditional — not just one source being wrong.

The two arguments **must share a predicate** for the distinction to fire a conditional conclusion in `accumulative_analysis`. Cross-predicate distinctions are accepted but produce nothing observable; don't emit them.

Each distinction:

- `id`: kebab-case slug.
- `argument_a`: claim id of the first claim.
- `argument_b`: claim id of the second claim (different from `argument_a`).
- `feature`: the dimension that differentiates the cases — e.g. `accounting_basis`, `scope`, `timeframe`, `jurisdiction`, `population`.
- `condition_a`: the value of `feature` under which `argument_a` holds — e.g. `"GAAP"`, `"global"`, `"q3_2024"`, `"EU"`.
- `condition_b`: the value of `feature` under which `argument_b` holds — e.g. `"non-GAAP"`, `"US"`, `"q3_2025"`, `"US"`.
- `description` (optional): one-line gloss for the audit trail.

When the engine judges a payload with at least one same-predicate distinction, the response includes `accumulative_analysis.conditional_conclusions[]` — one entry per distinguished pair — naming the predicate, the feature, and which condition each branch holds under. Without a distinction the engine treats the same inputs as a raw contradiction in `conflict_analysis`.
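The same-predicate requirement can be checked before emitting. The sketch below sorts distinctions into those that would fire and those that would be inert. `split_distinctions` is a hypothetical helper, and the toy payload is illustrative data rather than anything taken from a real document.

```python
from typing import List, Tuple


def split_distinctions(payload: dict) -> Tuple[List[str], List[str]]:
    """Return (active_ids, inert_ids): active means both arguments share a predicate."""
    predicate_of = {c["id"]: c["predicate"] for c in payload.get("claims", [])}
    active: List[str] = []
    inert: List[str] = []
    for d in payload.get("distinctions", []):
        pred_a = predicate_of.get(d["argument_a"])
        pred_b = predicate_of.get(d["argument_b"])
        # Unknown claim ids, identical arguments, or differing predicates are all inert.
        fires = (
            pred_a is not None
            and pred_a == pred_b
            and d["argument_a"] != d["argument_b"]
        )
        (active if fires else inert).append(d["id"])
    return active, inert


toy = {
    "claims": [
        {"id": "c1", "predicate": "revenue"},
        {"id": "c2", "predicate": "revenue"},
        {"id": "c3", "predicate": "solvency"},
    ],
    "distinctions": [
        {"id": "d1", "argument_a": "c1", "argument_b": "c2"},
        {"id": "d2", "argument_a": "c1", "argument_b": "c3"},
    ],
}
print(split_distinctions(toy))  # (['d1'], ['d2'])
```

Anything that lands in the inert list should either be dropped or fixed by unifying the predicate names on the two claims.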
**Heuristic to spot distinction candidates while reading**: if two claims on the same predicate would each be defensible to a fair-minded reader, that is almost always a `scope` or `condition` distinction, not a fight. Same predicate + same scope + opposite valence is a real disagreement.

### Step 10 — Validate before emitting

Run this checklist. Fix violations before output.

- [ ] Every `actor_id` and `subject_id` referenced in claims is defined in `actors`/`subjects`.
- [ ] Every `claim_id` in evidence is defined in `claims`.
- [ ] Every claim has a `valence`. Always.
- [ ] Every claim has an `assertion_time` whenever inferable.
- [ ] Opposing claims share the same `predicate` so contradictions actually fire. If you have disagreement in the document but no shared predicate, you have modeled it wrong — go back and unify predicate names.
- [ ] Numeric values use consistent K/M/B/T suffixes.
- [ ] Every `base_reliability` is in 0.01–0.99 and reflects any conflict of interest.
- [ ] `event_start_time` ≤ `event_end_time` where both are set.
- [ ] No `corroboration_events` entry has the same `actor_id` as its parent claim.
- [ ] Every `inferences[].premise_claim_ids` entry references an existing claim id.
- [ ] Every `distinctions[].argument_a` and `argument_b` references existing claim ids; `argument_a` ≠ `argument_b`; the two claims share a `predicate` (otherwise the distinction will be inert).
- [ ] You have NOT added sources, claims, inferences, distinctions, or evidence absent from the document.

### Step 11 — Emit

Return one JSON object. No prose. No markdown fences. No trailing commentary. Just JSON.
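Several items on the Step 10 checklist are purely referential and can be verified in code before the payload is emitted. The following is a minimal sketch, assuming the payload is already a Python dict. `preflight` is a hypothetical name, the draft payload is invented for illustration, and the sketch covers only id references, valence, reliability range, self-corroboration, and inference premises.

```python
import json
from typing import List


def preflight(payload: dict) -> List[str]:
    """Return checklist violations; an empty list means these checks passed."""
    problems: List[str] = []
    actor_ids = {a["id"] for a in payload.get("actors", [])}
    subject_ids = {s["id"] for s in payload.get("subjects", [])}
    claims = payload.get("claims", [])
    claim_ids = {c["id"] for c in claims}

    for actor in payload.get("actors", []):
        if not 0.01 <= actor["base_reliability"] <= 0.99:
            problems.append(f"actor {actor['id']}: base_reliability outside 0.01-0.99")

    for claim in claims:
        cid = claim["id"]
        if claim.get("actor_id") not in actor_ids:
            problems.append(f"claim {cid}: unknown actor_id")
        if "subject_id" in claim and claim["subject_id"] not in subject_ids:
            problems.append(f"claim {cid}: unknown subject_id")
        if claim.get("valence") not in {"Supports", "Refutes", "Neutral"}:
            problems.append(f"claim {cid}: missing or invalid valence")
        for event in claim.get("corroboration_events", []):
            if event["actor_id"] == claim.get("actor_id"):
                problems.append(f"claim {cid}: self-corroboration")

    for item in payload.get("evidence", []):
        if item["claim_id"] not in claim_ids:
            problems.append(f"evidence {item['id']}: unknown claim_id")

    for inference in payload.get("inferences", []):
        premises = inference.get("premise_claim_ids", [])
        if not premises or any(p not in claim_ids for p in premises):
            problems.append(f"inference {inference['id']}: empty or unknown premises")

    return problems


# Invented draft with three deliberate mistakes: reliability of 1.0,
# a claim without valence, and evidence pointing at a missing claim.
draft = {
    "research_question": "Did Acme misreport Q3 revenue?",
    "actors": [{"id": "acme", "name": "Acme", "source_type": "Institutional",
                "base_reliability": 1.0}],
    "subjects": [{"id": "acme-co", "name": "Acme", "subject_type": "Company"}],
    "claims": [{"id": "c1", "actor_id": "acme", "subject_id": "acme-co",
                "predicate": "revenue", "value": "legitimate",
                "content": "Acme says its Q3 revenue was reported correctly."}],
    "evidence": [{"id": "e1", "claim_id": "c9", "content": "Press release",
                  "valence": "Supports"}],
}

problems = preflight(draft)
if problems:
    print("\n".join(problems))
else:
    # Emit exactly one JSON object and nothing else.
    print(json.dumps(draft, ensure_ascii=False))
```

Judgment-based checklist items, such as whether a reliability score reflects a conflict of interest or whether anything was added that the document does not contain, still need a manual pass.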
---

## Output JSON shape

```json
{
  "research_question": "string",
  "domain": "Finance | News | Technology | Geopolitics | Medicine | Science | Legal | General",
  "actors": [
    {
      "id": "kebab-case",
      "name": "string",
      "source_type": "Analyst | Journalist | Expert | Insider | Regulator | Institutional | Anonymous | SocialMedia | Troll",
      "base_reliability": 0.01,
      "competence": { "predicate_name": 0.0 },
      "default_competence": 1.0
    }
  ],
  "subjects": [
    { "id": "kebab-case", "name": "string", "subject_type": "string" }
  ],
  "claims": [
    {
      "id": "c1",
      "actor_id": "actor-id",
      "subject_id": "subject-id",
      "predicate": "snake_case_aspect",
      "value": "canonical_position",
      "content": "Verbatim or close-paraphrase of the assertion.",
      "valence": "Supports | Refutes | Neutral",
      "claim_type": "Factual | Predictive | Evaluative | Causal | Procedural | Attribution | Methodological",
      "assertion_time": "ISO 8601",
      "event_start_time": "ISO 8601",
      "event_end_time": "ISO 8601",
      "epistemic_status": "conjecture | hypothesis | theory | law | superseded | retracted",
      "scope": "string",
      "supersedes_claim_id": "c0",
      "corroboration_events": [
        { "time": "ISO 8601", "actor_id": "different-actor", "description": "string" }
      ],
      "relationships": [
        { "relationship_type": "complements | refines | qualifies | supersedes | challenges_methodology", "target_claim_id": "c0" }
      ]
    }
  ],
  "evidence": [
    { "id": "e1", "claim_id": "c1", "content": "Description of the artifact.", "valence": "Supports | Refutes | Neutral", "weight": 1.0 }
  ],
  "inferences": [
    {
      "id": "inf1",
      "subject_id": "subject-id",
      "predicate": "snake_case_aspect",
      "value": "canonical_position",
      "content": "One-line gloss of the derived conclusion.",
      "valence": "Supports | Refutes | Neutral",
      "premise_claim_ids": ["c1", "c2"],
      "rule_type": "a_fortiori | analogy | generalization | expert_testimony | default | temporal_precedence | custom",
      "rule_param": "rule-specific",
      "description": "Optional"
    }
  ],
  "distinctions": [
    {
      "id": "d1",
      "argument_a": "c1",
      "argument_b": "c2",
      "feature": "snake_case_dimension",
      "condition_a": "value-of-feature-for-a",
      "condition_b": "value-of-feature-for-b",
      "description": "Optional one-line audit gloss"
    }
  ]
}
```

Top-level required fields: `research_question`, `actors`, `subjects`, `claims`. Optional top-level arrays: `evidence`, `inferences`, `distinctions`.

Per-claim required fields: `id`, `actor_id`, `predicate`, `value`, `content`, `valence`.

Per-inference required fields: `id`, `subject_id`, `predicate`, `value`, `content`, `valence`, `premise_claim_ids` (non-empty), `rule_type`.

Per-distinction required fields: `id`, `argument_a`, `argument_b`, `feature`, `condition_a`, `condition_b`.

---

## Micro-example

Source paragraph:

> Sherlock Holmes, after examining ledgers obtained from the Diogenes Club on
> 15 February 2026, declared that the reported revenue of Moriarty Enterprises
> "bears no correlation to actual client payments." Professor Moriarty denied
> the allegation the same day, citing a clean audit by Milverton & Associates,
> noting that the audit was prepared on a non-GAAP basis while Holmes's
> ledgers reflected GAAP reporting.

Extracted fragment:

```json
{
  "actors": [
    { "id": "holmes", "name": "Sherlock Holmes", "source_type": "Expert", "base_reliability": 0.92 },
    { "id": "moriarty", "name": "Professor Moriarty", "source_type": "Institutional", "base_reliability": 0.30 }
  ],
  "subjects": [
    { "id": "moriarty-ent", "name": "Moriarty Enterprises", "subject_type": "Company" }
  ],
  "claims": [
    {
      "id": "c1",
      "actor_id": "holmes",
      "subject_id": "moriarty-ent",
      "predicate": "revenue",
      "value": "fabricated",
      "content": "After examining ledgers obtained from the Diogenes Club, Holmes declared that the reported revenue of Moriarty Enterprises bears no correlation to actual client payments.",
      "valence": "Supports",
      "claim_type": "Factual",
      "epistemic_status": "theory",
      "assertion_time": "2026-02-15T12:00:00Z"
    },
    {
      "id": "c2",
      "actor_id": "moriarty",
      "subject_id": "moriarty-ent",
      "predicate": "revenue",
      "value": "fabricated",
      "content": "Moriarty denied the allegation, citing a clean audit by Milverton & Associates prepared on a non-GAAP basis.",
      "valence": "Refutes",
      "claim_type": "Factual",
      "assertion_time": "2026-02-15T12:00:00Z"
    }
  ],
  "evidence": [
    { "id": "e1", "claim_id": "c1", "content": "Ledgers obtained from the Diogenes Club", "valence": "Supports", "weight": 1.2 },
    { "id": "e2", "claim_id": "c2", "content": "Audit report by Milverton & Associates", "valence": "Supports", "weight": 0.5 }
  ],
  "distinctions": [
    {
      "id": "d1",
      "argument_a": "c1",
      "argument_b": "c2",
      "feature": "accounting_basis",
      "condition_a": "GAAP",
      "condition_b": "non-GAAP",
      "description": "Holmes's ledgers reflect GAAP reporting; Moriarty's audit was prepared on a non-GAAP basis. The disagreement is conditional, not substantive."
    }
  ]
}
```

Two things to note in this example:

1. Both claims share `predicate: "revenue"` and `value: "fabricated"` but carry opposite valences — that is what lets the engine see the contradiction in `conflict_analysis`.
2. The `distinction` says the contradiction is conditional on `accounting_basis`.
   When the engine judges this payload it will produce one `accumulative_analysis.conditional_conclusions[]` entry, so a viewer sees "revenue is fabricated under GAAP; revenue is legitimate under non-GAAP" rather than a flat winner-takes-all. Drop the distinction and the same inputs render as a raw contradiction with Moriarty losing on credibility.

---

## Failure modes to avoid

- Inventing sources, claims, or evidence not present in the document.
- Treating different `scope` claims as contradictions (use `scope` to prevent this).
- Using freeform predicate names that vary across opposing claims — contradictions then fail to fire.
- Setting `base_reliability` to 1.0 (clamped, and implies impossible certainty).
- Confusing the document's author with quoted sources.
- Flattening hedged speech into bare assertions.
- Dropping `valence`. Without valence the engine cannot detect contradictions.
- Self-corroborating: `corroboration_events[].actor_id` must differ from the claim's actor.
- Adding claims for editorial framing or rhetorical color with no (predicate, value).
- Modeling conditional disagreement as a raw contradiction. If two claims on the same predicate are each defensible under a different condition (GAAP vs non-GAAP, global vs US, q3_2024 vs q3_2025), emit a `distinction` — not just two opposite-valence claims.
- Emitting a `distinction` whose two arguments target *different* predicates. The engine accepts it but produces nothing observable; same-predicate is the trigger for `accumulative_analysis`.
- Reaching for `relationships: challenges_methodology` to express opposition. It does not draw an attack edge. Use `valence: Refutes` on a same-predicate claim instead.
- Emitting an `inference` whose `premise_claim_ids` references claims that don't exist in this payload. The inference will be dropped at validate time with a warning.
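The drifting-predicate failure mode is easy to miss by eye. One rough way to surface it, following the pattern of the micro-example where a denial reuses the allegation's predicate and value with `valence: Refutes`, is to list every `Refutes` claim that has no matching `Supports` claim. `unmatched_refutes` is a hypothetical helper and the sample claims are illustrative. A hit is a prompt for a second look, not proof of an error, and this is not how the engine itself detects contradictions.

```python
from typing import List, Optional, Set, Tuple

Key = Tuple[Optional[str], str, str]


def unmatched_refutes(claims: List[dict]) -> List[str]:
    """Ids of Refutes claims with no Supports claim on the same subject, predicate, and value."""
    supported: Set[Key] = {
        (c.get("subject_id"), c["predicate"], c["value"])
        for c in claims
        if c["valence"] == "Supports"
    }
    return [
        c["id"]
        for c in claims
        if c["valence"] == "Refutes"
        and (c.get("subject_id"), c["predicate"], c["value"]) not in supported
    ]


sample = [
    {"id": "c1", "subject_id": "moriarty-ent", "predicate": "revenue",
     "value": "fabricated", "valence": "Supports"},
    # Predicate drifted from "revenue" to "reported_revenue", so the denial is orphaned.
    {"id": "c2", "subject_id": "moriarty-ent", "predicate": "reported_revenue",
     "value": "fabricated", "valence": "Refutes"},
]
print(unmatched_refutes(sample))  # ['c2']

sample[1]["predicate"] = "revenue"  # unify the predicate name
print(unmatched_refutes(sample))  # []
```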
---

## Reference

- Engine overview: https://summereyes.vip/content/llms.txt
- Full schema and field reference: https://summereyes.vip/content/llms-full.txt
- Worked example: https://summereyes.vip/docs
- API reference: https://api.summereyes.vip/openapi-docs