AI Context Flow


@cruise_chen I’m so glad you asked, because we’ve built a sophisticated multi-layered memory indexing system that mimics how human memory actually works.

Here’s how we index different memory contexts:

1. Three Tiers of Memory (Like Human Memory)

Short-term memory: We preserve the last 3 conversation turns verbatim – this is your “working memory” that keeps the immediate context fresh

Mid-term memory: Older conversations get intelligently summarized by an LLM, extracting key facts, decisions, and entities – think of it as your brain consolidating information while you sleep

Long-term memory: All uploaded documents and contexts are converted to semantic vectors (1024-dimensional embeddings using BGE-large-en-v1.5) and stored in AWS S3 Vectors for retrieval
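To make the tiering concrete, here's a minimal Python sketch of how a turn or document chunk might flow through the three tiers. The class, method names, and the summarize/embed callables are illustrative placeholders, not our actual API:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryTiers:
    """Illustrative three-tier memory store (all names are hypothetical)."""
    short_term: list = field(default_factory=list)   # last N turns, kept verbatim
    mid_term: list = field(default_factory=list)     # LLM-generated summaries of older turns
    long_term: list = field(default_factory=list)    # (embedding, chunk) pairs for retrieval
    max_turns: int = 3

    def add_turn(self, turn: str, summarize):
        """Keep the newest turns verbatim; consolidate older ones into summaries."""
        self.short_term.append(turn)
        if len(self.short_term) > self.max_turns:
            oldest = self.short_term.pop(0)
            self.mid_term.append(summarize(oldest))

    def add_document_chunk(self, chunk: str, embed):
        """Long-term memory stores semantic vectors for later similarity search."""
        self.long_term.append((embed(chunk), chunk))
```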

2. Intelligent Query-Data Separation

Before indexing, we use an LLM to analyze user input and separate it into:

QUERY: What they’re asking

DATA: What they’re providing

This prevents “memory pollution” where questions get mixed with actual information you want to remember.
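A rough sketch of what that separation step could look like, assuming a generic call_llm helper and a JSON-returning prompt (both hypothetical; the real prompt is more involved):

```python
import json

SEPARATION_PROMPT = """Split the user's message into two parts and return JSON:
- "query": what the user is asking
- "data": factual content the user is providing to be remembered
Message: {message}"""

def separate_query_and_data(message: str, call_llm) -> dict:
    """call_llm is any function that sends a prompt to an LLM and returns its text reply."""
    reply = call_llm(SEPARATION_PROMPT.format(message=message))
    parsed = json.loads(reply)
    # Only the "data" part gets indexed into memory; the "query" part drives retrieval.
    return {"query": parsed.get("query", ""), "data": parsed.get("data", "")}
```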

3. Multi-Tenant Isolation

Every memory is indexed with hierarchical metadata:

userId → profileId → context/file → chunks

This means your memories are completely isolated per user and per profile (like having separate notebooks for different projects).
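Conceptually, every chunk ends up addressable under a tenant-scoped key. The helper below is a hypothetical illustration of that hierarchy, not the actual S3 Vectors key format:

```python
def memory_key(user_id: str, profile_id: str, context_id: str, chunk_index: int) -> str:
    """Hypothetical hierarchical key: each chunk is namespaced by user and profile,
    so retrieval can be filtered to a single tenant and profile."""
    return f"{user_id}/{profile_id}/{context_id}/chunk-{chunk_index:05d}"

# Example: all queries from Alice's "work" profile only ever see keys under "alice/work/..."
print(memory_key("alice", "work", "design-doc.pdf", 3))
# -> alice/work/design-doc.pdf/chunk-00003
```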

4. Semantic Chunking & Retrieval

Documents aren’t stored as raw text – we:

– Break them into semantic chunks

– Generate vector embeddings (capturing meaning, not just keywords)

– Use cosine similarity for retrieval (finding conceptually related content, even with different wording)
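For the curious, cosine-similarity retrieval boils down to something like this (a plain-Python sketch; in practice the vector store does this for you at scale):

```python
import math

def cosine_similarity(a: list, b: list) -> float:
    """Similarity between two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query_vec: list, indexed: list, top_k: int = 5):
    """Return the top_k (score, chunk) pairs most similar to the query embedding.
    indexed is a list of (embedding, chunk) pairs, as in the earlier sketch."""
    scored = [(cosine_similarity(query_vec, vec), chunk) for vec, chunk in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```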

5. Context-Aware Optimization

We dynamically optimize what memory to use based on:

– Token budget (no overwhelming the AI with too much history)

– Semantic relevance (only pull memories that matter for the current query)

– Conversation continuity (balance between efficiency and context preservation)
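In spirit, this optimization is a relevance-ranked selection under a token budget. Here's a simplified greedy sketch, with count_tokens standing in for whatever tokenizer the target model uses (again, illustrative rather than our actual implementation):

```python
def build_context(candidates: list, token_budget: int, count_tokens) -> list:
    """Greedy sketch: take memories in order of relevance score until the budget is spent.
    candidates is a list of (relevance_score, memory_text) pairs."""
    selected, used = [], 0
    for score, memory in sorted(candidates, key=lambda pair: pair[0], reverse=True):
        cost = count_tokens(memory)
        if used + cost > token_budget:
            continue  # skip memories that would blow the token budget
        selected.append(memory)
        used += cost
    return selected
```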

The magic? Unlike traditional keyword search, our vector-based indexing understands meaning.

Ask “What’s my Python code for authentication?” and it’ll find your login implementation even if you never used the word “authentication” in your original document.

It’s serverless, scales automatically, and because we’re using AWS S3 Vectors, there’s no infrastructure to manage – just pure memory intelligence!

Hope this gives you a sense of how much work has gone into this seemingly small memory extension!

Would love your feedback if you are deep into the AI memory space!
