Why Your Codebase is Invisible to AI (And What to Do About It)
I watched GitHub Copilot suggest the same validation logic three times in one week. Different syntax. Different variable names. Same exact purpose.
The AI wasn't broken. My codebase was invisible.

AI models can only see what fits in their context window. If your logic is fragmented across many files, most of it never makes it into that window.
Here's the problem: AI can write code, but it can't see your patterns. Not the way humans do. When you have the same logic scattered across different files with different names, AI treats each one as unique. So it solves it again. And again.
The Context Window Crisis
Every time your AI assistant helps with code, it needs context. It reads your file, follows imports, understands dependencies. All of this costs tokens. The more fragmented your code, the more tokens you burn.
Let me show you a real example from building ReceiptClaimer.
Example 1: User Validation - The Hard Way
I had user validation logic spread across 8 files:
- api/auth/validate-email.ts
- api/auth/validate-password.ts
- api/users/check-email-exists.ts
- api/users/validate-username.ts
- lib/validators/email.ts
- lib/validators/password-strength.ts
- utils/auth/email-format.ts
- utils/validation/user-fields.ts
Each file: 80–150 lines. Different patterns. Different error handling. Different import chains.
When AI needed to help with user validation, it had to:
- Read the current file (200 tokens)
- Follow imports to understand the pattern (3,200 tokens)
- Pull in dependencies to match types (5,800 tokens)
- Scan similar files to understand conventions (3,250 tokens)
Total context cost: 12,450 tokens per request.
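The steps above can be sketched as a rough context-budget estimator. This is a toy model, not the real accounting: the chars/4 ratio is a common rough approximation for token counts, and the file contents and import map below are made up for illustration.

```typescript
// Toy context-budget estimator: sum estimated tokens over a file and
// everything it transitively imports. chars/4 is a rough token heuristic.
function estimateTokens(source: string): number {
  return Math.ceil(source.length / 4);
}

function contextBudget(
  files: Record<string, string>,     // file path -> source text
  imports: Record<string, string[]>, // file path -> imported file paths
  entry: string
): number {
  const seen = new Set<string>();
  const stack = [entry];
  let total = 0;
  while (stack.length > 0) {
    const file = stack.pop()!;
    if (seen.has(file)) continue; // count each file once
    seen.add(file);
    total += estimateTokens(files[file] ?? "");
    stack.push(...(imports[file] ?? []));
  }
  return total;
}

// Fragmented code pulls in more files, so the budget grows with every import.
const files = { "a.ts": "x".repeat(800), "b.ts": "x".repeat(1200) };
const imports = { "a.ts": ["b.ts"], "b.ts": [] };
console.log(contextBudget(files, imports, "a.ts")); // 200 + 300 = 500 tokens
```

The point of the model: every extra file in the import closure adds its whole token cost to every request that touches the entry file.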
Example 2: User Validation - The Smart Way
After refactoring, I consolidated to 2 files:
- lib/user-validation/index.ts - All validation logic
- lib/user-validation/types.ts - Shared types
Each file: 200–250 lines. Single pattern. Clear error handling. Minimal imports.
Now when AI helps with user validation, it reads one cohesive module instead of chasing eight import chains, at a fraction of the 12,450-token context cost above.
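To give a feel for what "single pattern, clear error handling" means in practice, here is a minimal sketch of what a consolidated module like lib/user-validation/index.ts might export. The names and rules are illustrative; the post doesn't show ReceiptClaimer's actual code.

```typescript
// Hypothetical consolidated validation module: one result shape, one
// error-handling convention, no cross-file import chains.
export type ValidationResult = { valid: boolean; errors: string[] };

const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

export function validateEmail(email: string): ValidationResult {
  return EMAIL_RE.test(email)
    ? { valid: true, errors: [] }
    : { valid: false, errors: ["Invalid email format"] };
}

export function validatePassword(password: string): ValidationResult {
  const errors: string[] = [];
  if (password.length < 8) errors.push("Password must be at least 8 characters");
  if (!/[0-9]/.test(password)) errors.push("Password must contain a digit");
  return { valid: errors.length === 0, errors };
}
```

Because every validator returns the same ValidationResult shape, an AI assistant that reads one function has effectively learned the convention for all of them.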
Three Ways Your Code Becomes Invisible
1. Semantic Duplicates: Same Logic, Different Disguise
Traditional linters catch copy-paste duplication. They're useless for semantic duplicates.
Here's what I mean. Both functions do the exact same thing:
// File: api/receipts/validate.ts
function checkReceiptData(data: any): boolean {
  if (!data.merchant) return false;
  if (!data.amount) return false;
  if (data.amount <= 0) return false;
  if (!data.date) return false;
  return true;
}
// File: lib/validators/receipt-validator.ts
export function isValidReceipt(receipt: ReceiptInput): boolean {
  const hasRequiredFields = receipt.merchant &&
    receipt.amount &&
    receipt.date;
  const hasPositiveAmount = receipt.amount > 0;
  return hasRequiredFields && hasPositiveAmount;
}
2. Domain Fragmentation: Scattered Logic That Bleeds Tokens
Receipt Processing (fragmented):
src/
  api/
    receipts/
      upload.ts          # Handles file upload
      extract.ts         # Calls OCR service
      parse.ts           # Parses OCR response
  lib/
    ocr/
      google-vision.ts   # Google Vision integration
      openai-vision.ts   # OpenAI Vision integration
    parsers/
      receipt-parser.ts  # Parsing logic
  services/
    receipt-service.ts   # Business logic
  utils/
    file-upload.ts       # S3 upload helper
Receipt Processing (consolidated):
src/
  domains/
    receipt-processing/
      index.ts       # Public API
      ocr-service.ts # OCR abstraction
      parser.ts      # Parsing logic
      storage.ts     # S3 operations
      types.ts       # Shared types
3. Low Cohesion: Mixed Concerns That Confuse Everyone
Instead of one file doing everything, you have files that do unrelated things. AI can't figure out what the file is for.
// lib/utils/helpers.ts (820 lines)
export function formatCurrency(amount: number): string { ... }
export function parseDate(dateStr: string): Date { ... }
export function uploadToS3(file: Buffer): Promise<string> { ... }
export function validateEmail(email: string): boolean { ... }
export function generateToken(): string { ... }
export function calculateGST(amount: number): number { ... }
export function hashPassword(pwd: string): Promise<string> { ... }
How to Measure Invisibility
You can't fix what you can't measure. So I built tools to measure these three dimensions.
Measuring Semantic Duplicates
- Parse code into AST (Abstract Syntax Trees)
- Extract semantic tokens (variable names → generic placeholders)
- Calculate Jaccard similarity (set-based comparison)
// Function A
function validateUser(user) {
  if (!user.email) return false;
  if (!user.password) return false;
  return true;
}
// Function B
function checkUserValid(data) {
  const hasEmail = !!data.email;
  const hasPassword = !!data.password;
  return hasEmail && hasPassword;
}
Function A tokens: [if, not, property, return, false, return, true]
Function B tokens: [const, property, return, and]
Jaccard similarity: 0.78 (78% similar)
Anything above 0.70? Probably a semantic duplicate worth reviewing.
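The Jaccard step itself is simple once the tokens are extracted. A minimal set-based version might look like this (a sketch, not @aiready/pattern-detect's actual implementation; the real tool tokenizes from the AST first):

```typescript
// Jaccard similarity: |intersection| / |union| over two token sets.
function jaccardSimilarity(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  let intersection = 0;
  for (const token of setA) {
    if (setB.has(token)) intersection++;
  }
  const union = new Set([...setA, ...setB]).size;
  return union === 0 ? 0 : intersection / union;
}

// Identical token sets score 1.0; disjoint sets score 0.0.
console.log(jaccardSimilarity(["if", "not", "return"], ["if", "not", "return"])); // 1
```

Because it compares sets rather than sequences, two functions can differ in ordering, naming, and style and still score high, which is exactly what semantic-duplicate detection needs.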
Tool: npx @aiready/pattern-detect
Measuring Fragmentation
Context budget tells you how many tokens AI needs to understand a file.
I built @aiready/context-analyzer to measure:
- Import depth - How many levels deep do imports go?
- Context budget - Total tokens needed to understand this file
- Cohesion score - Are imports related to each other?
- Fragmentation score - Is this domain split across files?
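Import depth, for instance, is just the longest chain you can follow from a file through its imports. A simplified version (my own sketch over a prebuilt import map, not context-analyzer's internals) could look like:

```typescript
// Import depth: length of the longest import chain starting at a file.
type ImportGraph = Map<string, string[]>;

function importDepth(
  graph: ImportGraph,
  file: string,
  path = new Set<string>() // files on the current chain, to break cycles
): number {
  if (path.has(file)) return 0;
  path.add(file);
  const deps = graph.get(file) ?? [];
  const depth =
    deps.length === 0
      ? 0
      : 1 + Math.max(...deps.map((dep) => importDepth(graph, dep, path)));
  path.delete(file); // backtrack so sibling branches are measured fully
  return depth;
}

// upload.ts -> receipt-service.ts -> receipt-parser.ts is the longest chain.
const graph: ImportGraph = new Map([
  ["upload.ts", ["receipt-service.ts", "file-upload.ts"]],
  ["receipt-service.ts", ["receipt-parser.ts"]],
  ["receipt-parser.ts", []],
  ["file-upload.ts", []],
]);
console.log(importDepth(graph, "upload.ts")); // 2
```

The file names in the example graph are borrowed from the fragmented layout above; in a real analyzer the graph would be built by parsing import statements.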
src/api/receipts/upload.ts
Import depth: 7 levels
Context budget: 12,450 tokens
Cohesion: 0.34 (low - mixed concerns)
Fragmentation: 0.78 (high - scattered domain)
High fragmentation + low cohesion = AI will struggle.
Tool: npx @aiready/context-analyzer
Measuring Consistency (Coming Soon)
The third dimension: pattern consistency.
Do you handle errors the same way everywhere? Use the same naming conventions? Follow the same async patterns?
I'm building @aiready/consistency to detect:
- Mixed error handling patterns (try-catch vs callbacks vs promises)
- Inconsistent naming (camelCase vs snake_case)
- Import style drift (ES modules vs require)
- Async pattern mixing (async/await vs .then())
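As a flavor of how detection like this can work, here's a tiny naming-convention check of my own (illustrative only; it isn't @aiready/consistency's implementation):

```typescript
// Classify identifiers as camelCase or snake_case; a file that yields
// more than one style is mixing conventions.
function namingStyles(identifiers: string[]): Set<string> {
  const styles = new Set<string>();
  for (const id of identifiers) {
    if (/^[a-z][a-z0-9]*(?:_[a-z0-9]+)+$/.test(id)) styles.add("snake_case");
    else if (/^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+$/.test(id)) styles.add("camelCase");
  }
  return styles;
}

// Mixing parseDate with upload_file trips the check.
const styles = namingStyles(["parseDate", "upload_file", "formatCurrency"]);
console.log(styles.size > 1 ? "mixed naming conventions" : "consistent"); // prints "mixed naming conventions"
```

The same pattern generalizes: classify each occurrence of an error-handling or async idiom, then flag files (or whole codebases) where more than one style shows up.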
Status: Beta release next week.
The ReceiptClaimer Results
I ran these tools on my own codebase — ReceiptClaimer, an AI-powered receipt tracker for Australian taxpayers. Here's what I found:
Before Measurement
- Semantic duplicates: 23 patterns repeated 87 times
- Average import depth: 5.8 levels
- Average context budget: 8,200 tokens per file
- Cohesion score: 0.42 (poor)
- Monthly AI costs: ~$380 (estimated)
After Refactoring (4 weeks)
- Semantic duplicates: 3 patterns repeated 8 times (-87%)
- Average import depth: 2.9 levels (-50%)
- Average context budget: 2,100 tokens per file (-74%)
- Cohesion score: 0.89 (excellent)
- Monthly AI costs: ~$95 (estimated)
Time invested: 40 hours over 4 weeks
Annual savings: $3,420 in AI costs ($380 - $95 = $285 saved per month, times 12)
Payback period: ~12.6 months (probably faster once velocity gains are counted)
What You Can Do Today
You don't need to refactor everything. Start with measurement.
Step 1: Measure Your Semantic Duplicates
npx @aiready/pattern-detect
Look for:
- Similarity scores > 70%
- Patterns repeated 3+ times
- Core domains (auth, validation, API handlers)
Step 2: Measure Your Fragmentation
npx @aiready/context-analyzer
Look for:
- Import depth > 5 levels
- Context budget > 8,000 tokens
- Cohesion score < 0.50
- Files with fragmentation > 0.70
Step 3: Pick ONE Domain to Fix
Don't refactor everything. Pick your most painful domain:
- The one where AI suggestions are worst
- The one where code reviews take longest
- The one where new developers get confused
Focus there. Consolidate files. Extract common patterns. Measure again.
Step 4: Track Improvements
Run the tools weekly. Watch the metrics improve. Share results with your team.
The goal isn't perfect code. It's visible code.
Next in This Series
In Part 3, I'll dive deep into the technical details: "Building AIReady: Metrics That Actually Matter"
We'll explore:
- Why traditional metrics (cyclomatic complexity, code coverage) miss AI problems
- How Jaccard similarity works on AST tokens (with diagrams)
- The three dimensions of AI-readiness and how they interact
- Design decisions: Why I built a hub-and-spoke architecture
- Open source philosophy: Free forever, configurable by design
Until then, run the tools. Measure your codebase. See how invisible it really is.
Try it yourself:
- GitHub: github.com/caopengau/aiready-cli
- Docs: aiready.dev
- Report issues: github.com/caopengau/aiready-cli/issues
Want to support this work?
- ⭐ Star the repo
- 🐛 Report issues you find
- 💬 Share your results (I read every comment)
Read the full series:
- Part 1: The AI Code Debt Tsunami is Here (And We're Not Ready)
- Part 2: Why Your Codebase is Invisible to AI (And What to Do About It) ← You are here
- Part 3: AI Code Quality Metrics That Actually Matter
- Part 4: Deep Dive: Semantic Duplicate Detection with AST Analysis
- Part 5: The Hidden Cost of Import Chains
- Part 6: Visualizing the Invisible: Seeing the Shape of AI Code Debt
*Peng Cao is the founder of ReceiptClaimer and creator of AIReady, an open-source suite for measuring and optimising codebases for AI adoption.*
Join the Discussion
Have questions or want to share your AI code quality story? Drop them below. I read every comment.