Entity Extraction

Automatically extract and analyze people, organizations, money amounts, and other key entities from your case documents.

What is Entity Extraction?

Entity extraction uses AI to automatically identify and categorize important information in your documents:

Entity Type      Examples
People           John Smith, Dr. Sarah Martinez, Michael Chen
Organizations    Acme Corp, Global Holdings Inc, Department of Justice
Money            $500,000; $1.5 million; 125,000 shares
Dates            August 7, 2024; March 2024
Emails           john@company.com, contact@acme.com
Phones           (555) 123-4567

How It Works

BillionLens uses Grounded Extraction with citation verification:

  1. Document Search - AI searches all indexed documents
  2. Entity Detection - Identifies entity mentions with citations
  3. Verification - Each entity is verified against source documents
  4. Description Generation - AI creates contextual descriptions
  5. Alias Detection - Groups name variations (e.g., "Mike" = "Michael Chen")
  6. Deduplication - Merges duplicate entities intelligently

Citation-Required

Every extracted entity must be verified against the source documents. This prevents hallucinated entities and ensures accuracy for e-discovery.

Running Entity Extraction

Extract Entities

  1. Navigate to your case
  2. Click the Entities card
  3. Click the Extract Entities button
  4. Wait for extraction to complete
  5. View extracted entities organized by type

Progress Tracking

During extraction, you'll see:

  • Progress percentage
  • Current extraction phase
  • Entity counts as they're discovered

Viewing Entities

Entity Tabs

Entities are organized by type:

Tab              Description
All              All entity types combined
People           Person names
Organizations    Companies, agencies, institutions
Money            Dollar amounts and financial values
Dates            Important dates
Emails           Email addresses

Entity Details

Click the Context link on any entity to see:

  • Description - AI-generated explanation of the entity's role
  • Also Known As - Alias names (for people)
  • Source Documents - Which files mention this entity

Example Entity

John Smith (Person)

  • Description: "John Smith is the Chief Technology Officer at Acme Corp, responsible for overseeing product development and engineering teams."
  • Also Known As: J. Smith, Johnny

Alias Detection & Deduplication

Automatic Alias Detection

BillionLens automatically detects when different names refer to the same person:

Examples:

  • "John Smith" = "Johnny Smith" = "J. Smith"
  • "Michael Chen" = "Mike Chen" = "M. Chen"
  • "Dr. Martinez" = "Sarah Martinez" = "S. Martinez"

How It Works

The AI uses several strategies:

  • Middle names - "John Michael Smith" matches "Michael Smith"
  • Nicknames - "Mike" is recognized as a nickname for "Michael"
  • Abbreviations - "J. Smith" matches "John Smith"
  • Typos - "Jhon" matches "John"
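
To make the four strategies concrete, here is a minimal Python sketch of how such name matching could work. It is an illustration only, not BillionLens's actual implementation: the `NICKNAMES` table, the function names, and the 0.7 similarity threshold are all assumptions made for this example. It also requires a given name and a surname on both sides, so it does not cover title-only forms such as "Dr. Martinez".

```python
from difflib import SequenceMatcher

# Tiny illustrative nickname table (assumption); real coverage would be far larger.
NICKNAMES = {"mike": "michael", "johnny": "john"}


def normalize(token: str) -> str:
    """Lowercase a name token, drop trailing periods, and expand known nicknames."""
    token = token.lower().rstrip(".")
    return NICKNAMES.get(token, token)


def tokens_match(a: str, b: str, typo_threshold: float = 0.7) -> bool:
    """Compare two name tokens: exact match, initial, or close spelling."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return False
    if a == b:
        return True
    # Abbreviations: a single letter matches any token starting with it.
    if len(a) == 1 or len(b) == 1:
        return a[0] == b[0]
    # Typos: tolerate small spelling differences (threshold is an assumption).
    return SequenceMatcher(None, a, b).ratio() >= typo_threshold


def is_alias(name_a: str, name_b: str) -> bool:
    """Heuristic alias test for two person names written as 'Given ... Surname'."""
    a, b = name_a.split(), name_b.split()
    if len(a) < 2 or len(b) < 2:
        return False  # this sketch needs a given name and a surname on both sides
    if not tokens_match(a[-1], b[-1]):
        return False  # surnames must agree
    shorter, longer = sorted((a[:-1], b[:-1]), key=len)
    # Middle names: every given name in the shorter form must match one in the longer form.
    return all(any(tokens_match(s, l) for l in longer) for s in shorter)


for pair in [
    ("John Michael Smith", "Michael Smith"),  # middle name
    ("Michael Chen", "Mike Chen"),            # nickname
    ("J. Smith", "John Smith"),               # abbreviation
    ("Jhon Smith", "John Smith"),             # typo
    ("John Smith", "Sarah Martinez"),         # different people
]:
    print(pair, is_alias(*pair))
```

Note that the initial rule is deliberately permissive: "J. Smith" would also match "James Smith". Heuristics like these produce candidates, which is one reason alias groupings for critical entities are worth a manual check.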

Merged Entities

When aliases are detected:

  • One canonical name is chosen (usually the most complete)
  • Other variations are stored as "Also Known As"
  • All document references are preserved
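
The merge rules above can be sketched in a few lines of Python. This is a hedged illustration rather than the product's actual logic: the `MergedEntity` class, the `merge_mentions` function, and the sample file names are hypothetical, and "most complete" is approximated as the name with the most words, then the most characters. The function assumes a non-empty list of mentions already judged to refer to the same person.

```python
from dataclasses import dataclass


@dataclass
class MergedEntity:
    """Hypothetical result of merging several name variations."""
    canonical_name: str
    also_known_as: list[str]
    source_documents: list[str]


def merge_mentions(mentions: list[tuple[str, str]]) -> MergedEntity:
    """Merge (name, document_id) pairs that refer to the same person."""
    names = sorted({name for name, _ in mentions})
    # "Most complete" is approximated as most words, then most characters.
    canonical = max(names, key=lambda n: (len(n.split()), len(n)))
    return MergedEntity(
        canonical_name=canonical,
        also_known_as=[n for n in names if n != canonical],
        # Every document reference is kept, regardless of which variation it used.
        source_documents=sorted({doc for _, doc in mentions}),
    )


# Illustrative sample data; the file names are made up for this example.
mentions = [
    ("Mike Chen", "email_017.eml"),
    ("Michael Chen", "bylaws.pdf"),
    ("M. Chen", "minutes.docx"),
    ("Michael Chen", "minutes.docx"),
]
merged = merge_mentions(mentions)
print(merged.canonical_name)    # Michael Chen
print(merged.also_known_as)     # ['M. Chen', 'Mike Chen']
print(merged.source_documents)  # ['bylaws.pdf', 'email_017.eml', 'minutes.docx']
```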

Entity Descriptions

AI-Generated Descriptions

Each entity gets a contextual description based on document content:

Good description:

"Michael Chen holds the roles of COO/Representative and Chief Executive Officer, and is also listed as a founder and initial member of the board of directors."

Not just quotes:

Descriptions explain the entity's role and significance, not just raw quotes from documents.

Description Sources

Descriptions are generated by:

  1. Finding all mentions of the entity in documents
  2. Understanding the context of each mention
  3. Synthesizing a concise explanation
  4. Verifying against source citations

Searching Entities

Filter by Type

Click the entity type tabs to filter:

  • View only People
  • View only Organizations
  • View only Money amounts
  • View only Dates or Emails

Use the search box to find specific entities:

  • Search for "Smith" to find all Smith entities
  • Search for company names
  • Search for specific amounts
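
If you work with an exported list of entities outside the interface, the same two operations (filter by type, then search by text) might look like the sketch below. The record layout (`name` and `type` keys), the `filter_entities` function, and the case-insensitive substring matching are assumptions for illustration; the search box itself may match differently.

```python
def filter_entities(entities, entity_type=None, query=""):
    """Keep entities of the given type whose name contains the query text."""
    q = query.casefold()
    return [
        e for e in entities
        if (entity_type is None or e["type"] == entity_type)
        and q in e["name"].casefold()
    ]


# Sample records built from the example entities earlier on this page.
entities = [
    {"name": "John Smith", "type": "PERSON"},
    {"name": "Dr. Sarah Martinez", "type": "PERSON"},
    {"name": "Acme Corp", "type": "ORGANIZATION"},
    {"name": "Global Holdings Inc", "type": "ORGANIZATION"},
    {"name": "$500,000", "type": "MONEY"},
]

people = filter_entities(entities, entity_type="PERSON")
smiths = filter_entities(entities, query="smith")
amounts = filter_entities(entities, entity_type="MONEY", query="500")

print([e["name"] for e in people])   # ['John Smith', 'Dr. Sarah Martinez']
print([e["name"] for e in smiths])   # ['John Smith']
print([e["name"] for e in amounts])  # ['$500,000']
```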

Use Cases

Building Witness Lists

Extract PERSON entities to identify:

  • Potential witnesses
  • Parties involved
  • Key decision makers
  • People who signed documents

Identifying Key Players

Find ORGANIZATION entities to see:

  • Companies involved
  • Law firms
  • Government agencies
  • Banks and financial institutions

Financial Analysis

Extract MONEY entities to:

  • Track investment amounts
  • Identify financial transactions
  • Find payment records
  • Understand ownership stakes

Contact Discovery

Find EMAIL entities to:

  • Build contact lists
  • Identify key communicators
  • Track correspondence patterns

Accuracy & Verification

Citation-Required Extraction

Unlike simple NER (Named Entity Recognition), BillionLens uses grounded extraction:

Traditional NER      BillionLens Grounded Extraction
Pattern matching     AI with document search
May hallucinate      Citation-required
No context           Full document context
No descriptions      AI-generated descriptions

Verification Process

Every entity must pass verification:

  1. Entity name must appear in source documents
  2. Citation must be retrievable
  3. Description must be grounded in document text
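
The three checks can be pictured as a function that returns the list of checks an entity fails. The Python sketch below is a simplified illustration under stated assumptions, not the verification BillionLens actually runs: the dictionary layout, the function names, and the sample documents are hypothetical, and check 3 uses a crude word-overlap proxy (with an arbitrary 0.6 threshold) in place of real grounding analysis.

```python
import re


def content_words(text: str) -> set[str]:
    """Lowercased alphabetic words longer than three letters."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}


def verify_entity(entity: dict, documents: dict[str, str],
                  min_overlap: float = 0.6) -> list[str]:
    """Return the failed checks; an empty list means the entity passes."""
    failures = []

    # Check 1: the entity name appears somewhere in the source documents.
    if not any(entity["name"] in text for text in documents.values()):
        failures.append("name not found in source documents")

    # Check 2: every citation resolves to a known document containing the quote.
    if not entity["citations"]:
        failures.append("no citations provided")
    cited_text = []
    for c in entity["citations"]:
        text = documents.get(c["document_id"])
        if text is None or c["quote"] not in text:
            failures.append(f"citation not retrievable: {c['document_id']}")
        else:
            cited_text.append(text)

    # Check 3 (crude proxy): most content words of the description occur in the cited text.
    desc = content_words(entity["description"])
    source = content_words(" ".join(cited_text))
    overlap = len(desc & source) / len(desc) if desc else 0.0
    if overlap < min_overlap:
        failures.append("description not grounded in cited text")
    return failures


# Illustrative sample data; document names and the second entity are made up.
documents = {
    "bylaws.pdf": "Michael Chen is listed as a founder and initial member "
                  "of the board of directors.",
}
good = {
    "name": "Michael Chen",
    "citations": [{"document_id": "bylaws.pdf",
                   "quote": "Michael Chen is listed as a founder"}],
    "description": "Michael Chen is a founder and initial member of the board of directors.",
}
bad = {
    "name": "Robert Jones",
    "citations": [{"document_id": "missing.pdf", "quote": "Robert Jones signed"}],
    "description": "Robert Jones approved the wire transfer.",
}

print(verify_entity(good, documents))  # []
print(verify_entity(bad, documents))
```

Returning the reasons, rather than a bare pass/fail, makes it easier to see why a particular entity did not survive verification.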

Rejected Entities

Entities that can't be verified are rejected:

  • Prevents hallucination
  • Ensures e-discovery accuracy
  • Logs rejected entities for review
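
As a rough sketch of the reject-and-log behavior, the Python example below splits candidates into accepted and rejected lists and writes a log entry for each rejection. All names here (`partition_candidates`, `citation_check`, the logger name, the sample candidates) are hypothetical, and the stand-in check only tests whether a candidate has any citation at all; where and how BillionLens records rejected entities is not shown here.

```python
import logging

logger = logging.getLogger("entity_extraction")  # hypothetical logger name


def partition_candidates(candidates, failures_for):
    """Split candidates into accepted and rejected, logging each rejection."""
    accepted, rejected = [], []
    for candidate in candidates:
        failures = failures_for(candidate)
        if failures:
            logger.warning("Rejected entity %r: %s",
                           candidate["name"], "; ".join(failures))
            # Keep the reasons so a reviewer can inspect rejections later.
            rejected.append({"entity": candidate, "reasons": failures})
        else:
            accepted.append(candidate)
    return accepted, rejected


def citation_check(candidate):
    """Stand-in verification: a candidate fails if it has no citation."""
    return [] if candidate["citations"] else ["no citation"]


# Illustrative candidates; the second one is made up to show a rejection.
candidates = [
    {"name": "Acme Corp", "citations": ["contract.pdf"]},
    {"name": "Initech", "citations": []},
]
accepted, rejected = partition_candidates(candidates, citation_check)

print([c["name"] for c in accepted])            # ['Acme Corp']
print([r["entity"]["name"] for r in rejected])  # ['Initech']
```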

Best Practices

Review Important Entities

  • Manually verify critical entity extractions
  • Cross-reference with source documents
  • Check for missing entities

Use Descriptions

  • Read AI-generated descriptions for context
  • Descriptions explain relationships and roles
  • More useful than just names

Combine with Chat

  • Use entity extraction to find candidates
  • Use AI chat to explore relationships
  • Ask follow-up questions about specific entities

Re-extract When Needed

  • Re-run extraction if you add new documents
  • Previous entities are cleared before re-extraction
  • Ensures consistency

Next: Learn about Timeline Visualization.