State Machines: Precise Markdown Processing
+-----------------------------------------------------------------------+
| |
| THE CENTER |
| |
+-----------------------------------------------------------------------+
| |
| "CodexMono is The Brick - the fundamental unit that enables |
| Monokinetics: unified Human + AI experience through predictable, |
| trustable visual alignment." |
| |
| State machines implement The Brick principle by providing |
| deterministic, character-by-character processing that guarantees |
| predictable output. |
| |
| Just as VTE Parsers process escape sequences atomically to build |
| terminal grid cells (The Brick of terminal output), markdown state |
| machines process formatting markers atomically to build range |
| objects (The Brick of document structure). |
| |
+-----------------------------------------------------------------------+
The Terminal-to-Document Bridge
+-----------------------------------------------------------------------+
| |
| SEAMLESS WORKFLOW: TERMINAL TO DOCUMENT |
| |
+-----------------------------------------------------------------------+
| |
| VTE State Machine (Terminal) |
| | |
| v |
| Grid Cells (Terminal Brick) |
| | |
| | Copy |
| v |
| VmdBox Paste Detection |
| | |
| v |
| Markdown State Machine (Document) |
| | |
| v |
| Range Objects (Document Brick) |
| | |
| v |
| Human + AI Aligned Visual Experience |
| |
+-----------------------------------------------------------------------+
The Regex Problem
Many traditional markdown parsers rely on regex patterns. In an interactive editor this causes a fundamental problem: a pattern can match characters that belong to a different construct.
+-----------------------------------------------------------------------+
| |
| THE REGEX OVERLAPPING PROBLEM |
| |
+-----------------------------------------------------------------------+
| |
| Input: "**bold** and *italic*" |
| |
| WHAT REGEX SEES: |
| +----------------------------------------------------------------+ |
| | | |
| | Bold pattern matches: "**bold**" OK | |
| | | |
| | Italic pattern matches: "*bold*" AND "*italic*" | |
| | ^^^^^^^^ | |
| | FALSE POSITIVE! | |
| | | |
| +----------------------------------------------------------------+ |
| |
| WHY IT FAILS: |
| +----------------------------------------------------------------+ |
| | | |
| | Regex sees PATTERNS, not STRUCTURE | |
| | | |
| | It matches character sequences without understanding context | |
| | | |
| | "*bold*" matches italic pattern even though those asterisks | |
| | are PART OF the bold construct | |
| | | |
| +----------------------------------------------------------------+ |
| |
+-----------------------------------------------------------------------+
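The false positive in the diagram can be reproduced with two deliberately naive patterns. This is a minimal sketch: the patterns `BOLD` and `ITALIC` are illustrative assumptions, not taken from any particular parser.

```python
import re

# Hypothetical naive patterns: each looks only for its own delimiters and
# knows nothing about constructs matched by the other pattern.
BOLD = re.compile(r"\*\*(?=\S)([^*]+?)(?<=\S)\*\*")
ITALIC = re.compile(r"\*(?=\S)([^*]+?)(?<=\S)\*")

text = "**bold** and *italic*"

bold_hits = [m.group(0) for m in BOLD.finditer(text)]
italic_hits = [m.group(0) for m in ITALIC.finditer(text)]

print(bold_hits)    # the bold construct
print(italic_hits)  # includes "*bold*", carved out of the bold construct
```

The italic pattern reports `*bold*` because the inner asterisks of `**bold**` look exactly like italic delimiters when context is ignored.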
The State Machine Solution
A Finite State Machine (FSM) processes text character-by-character, maintaining explicit state about what construct is being parsed.
+-----------------------------------------------------------------------+
| |
| THE THREE ESSENTIAL PROPERTIES (The Brick Requires) |
| |
+-----------------------------------------------------------------------+
| |
| 1. DETERMINISTIC |
| +----------------------------------------------------------+ |
| | Same input ALWAYS produces same output | |
| | No ambiguity, no surprises | |
| +----------------------------------------------------------+ |
| |
| 2. ATOMIC |
| +----------------------------------------------------------+ |
| | Each character is processed as a discrete unit | |
| | No partial states visible to user | |
| +----------------------------------------------------------+ |
| |
| 3. PREDICTABLE |
| +----------------------------------------------------------+ |
| | Behavior can be traced through explicit state transitions| |
| | Debuggable, verifiable, trustable | |
| +----------------------------------------------------------+ |
| |
+-----------------------------------------------------------------------+
State Transition Diagram
+-----------------------------------------------------------------------+
| |
| STATE MACHINE FOR MARKDOWN INLINE FORMATTING |
| |
+-----------------------------------------------------------------------+
| |
|   +--------+  *   +----------+  *   +----------+  *   +------------+  |
|   | NORMAL |----->| ONE_STAR |----->| TWO_STAR |----->| THREE_STAR |  |
|   +--------+      +----------+      +----------+      +------------+  |
|       ^                |                 |                  |         |
|       |                | not *           | not *            | not *   |
|       |                v                 v                  v         |
|       |          +-----------+      +---------+     +---------------+ |
|       |          | IN_ITALIC |      | IN_BOLD |     | IN_BOLDITALIC | |
|       |          +-----------+      +---------+     +---------------+ |
|       |                | *               | **               | ***     |
|       |                |                 |                  |         |
|       +----------------+-----------------+------------------+         |
|                              EMIT RANGE                               |
| |
+-----------------------------------------------------------------------+
The Critical Decision Points
+-----------------------------------------------------------------------+
| |
| HOW THE STATE MACHINE DECIDES |
| |
+-----------------------------------------------------------------------+
| |
| First * |
| +----------------------------------------------------------------+ |
| | Could be italic OR start of bold | |
| | Decision: WAIT | |
| +----------------------------------------------------------------+ |
| |
| Second * |
| +----------------------------------------------------------------+ |
| | Could be bold OR start of bolditalic | |
| | Decision: WAIT | |
| +----------------------------------------------------------------+ |
| |
| Third * |
| +----------------------------------------------------------------+ |
| | Must be bolditalic | |
| | Decision: WAIT for content | |
| +----------------------------------------------------------------+ |
| |
| Non-* character after markers |
| +----------------------------------------------------------------+ |
| | CONFIRMS the construct type | |
| | Decision: Enter content accumulation state | |
| +----------------------------------------------------------------+ |
| |
| The "waiting states" prevent premature commitment to a construct |
| type. This is WHY the output is unambiguous: no marker is classified |
| until the character after it has been seen. |
| |
+-----------------------------------------------------------------------+
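The waiting behaviour above can be sketched as a small function that counts markers and commits only when a non-`*` character arrives. The function name `classify_opening` and the choice to reject runs of four or more stars are assumptions made for this illustration.

```python
def classify_opening(text, start=0):
    """Count '*' markers from `start`; commit only when a non-'*' arrives.

    Returns (construct_type, content_start) or None. Rejecting runs longer
    than three stars is an assumption of this sketch.
    """
    names = {1: "italic", 2: "bold", 3: "bolditalic"}
    stars = 0
    i = start
    # Waiting states: ONE_STAR, TWO_STAR, THREE_STAR. Nothing is decided yet.
    while i < len(text) and text[i] == "*" and stars < 3:
        stars += 1
        i += 1
    # No marker at all, input ended while waiting, or a fourth star followed.
    if stars == 0 or i >= len(text) or text[i] == "*":
        return None
    # The non-'*' character at index i confirms the construct type.
    return names[stars], i


print(classify_opening("*a"))
print(classify_opening("**a"))
print(classify_opening("***a"))
```

Because the decision is deferred until the confirming character, the first `*` of `**bold**` is never reported as an italic opener.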
Range Object: The Document Brick
When parsing completes, the state machine produces Range Objects - precise position markers for every formatting construct.
+-----------------------------------------------------------------------+
| |
| RANGE OBJECT ANATOMY |
| |
+-----------------------------------------------------------------------+
| |
| Input: "Hello **world** there" |
| |
| Visual Breakdown: |
| +------------------------------------------------------------------+ |
| | H  e  l  l  o     *  *  w  o  r  l  d  *  *     t  h  e  r  e    | |
| | 0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20   | |
| |                   |     |              |     |                   | |
| |                   |     |              |     +--closeEnd         | |
| |                   |     |              +--closeStart             | |
| |                   |     +--contentStart                          | |
| |                   +--openStart                                   | |
| |                   [OPEN][CONTENT......][CLOSE]                   | |
| +------------------------------------------------------------------+ |
| |
+-----------------------------------------------------------------------+
| |
| WHY SIX POSITIONS? |
| |
| +-------------------------------+ +-------------------------------+ |
| | Smart Backspace | | Syntax Highlighting | |
| | Delete markers while | | Style markers differently | |
| | preserving content | | from content | |
| +-------------------------------+ +-------------------------------+ |
| |
| +-------------------------------+ +-------------------------------+ |
| | Copy-Paste | | Cursor Navigation | |
| | Extract just content | | Jump between regions | |
| | or full markdown | | | |
| +-------------------------------+ +-------------------------------+ |
| |
+-----------------------------------------------------------------------+
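A Range Object for the example above might look like the following sketch. The diagram names four positions; the field names `open_end` and `content_end`, and the use of half-open offsets, are assumptions made here to reach six.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Hypothetical six-position range using half-open [start, end) offsets."""
    kind: str
    open_start: int
    open_end: int
    content_start: int
    content_end: int
    close_start: int
    close_end: int

    def content(self, text):
        """Just the formatted text, without markers (copy as plain text)."""
        return text[self.content_start:self.content_end]

    def full(self, text):
        """Markers plus content (copy as markdown)."""
        return text[self.open_start:self.close_end]


text = "Hello **world** there"
bold = Range("bold", open_start=6, open_end=8,
             content_start=8, content_end=13,
             close_start=13, close_end=15)

print(bold.content(text))
print(bold.full(text))
```

Keeping marker and content offsets separate is what lets one structure serve backspace, highlighting, copy-paste, and cursor navigation.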
Execution Trace: Parsing "**bold**"
+-----------------------------------------------------------------------+
| |
| STEP-BY-STEP: HOW "**bold**" IS PARSED |
| |
+-----------------------------------------------------------------------+
| |
| Character 1: '*' |
| +----------------------------------------------------------------+ |
| | NORMAL --> ONE_STAR | |
| | "First *, could be italic or part of bold" | |
| +----------------------------------------------------------------+ |
| |
| Character 2: '*' |
| +----------------------------------------------------------------+ |
| | ONE_STAR --> TWO_STAR | |
| | "Second *, could be bold or part of bolditalic" | |
| +----------------------------------------------------------------+ |
| |
| Character 3: 'b' |
| +----------------------------------------------------------------+ |
| | TWO_STAR --> IN_BOLD | |
| | "Non-whitespace after ** CONFIRMS bold construct" | |
| +----------------------------------------------------------------+ |
| |
| Characters 4-6: 'old' |
| +----------------------------------------------------------------+ |
| | IN_BOLD --> IN_BOLD | |
| | "Normal content, keep accumulating" | |
| +----------------------------------------------------------------+ |
| |
| Character 7: '*' (with lookahead: next is '*') |
| +----------------------------------------------------------------+ |
| | IN_BOLD --> NORMAL | |
| | "Found closing **, EMIT RANGE, reset" | |
| +----------------------------------------------------------------+ |
| |
| RESULT: Range { type: 'bold', positions: [0-8] } |
| |
+-----------------------------------------------------------------------+
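The trace above can be reproduced with a compact state machine. This is a simplified sketch rather than the actual implementation: the function name `parse`, the dictionary shape of the emitted ranges, and the handling of unclosed or four-star runs (silently dropped) are all assumptions.

```python
def parse(text):
    """Return (ranges, trace) for *, ** and *** constructs in `text`."""
    waiting = ["NORMAL", "ONE_STAR", "TWO_STAR", "THREE_STAR"]
    inside = {1: ("IN_ITALIC", "italic"),
              2: ("IN_BOLD", "bold"),
              3: ("IN_BOLDITALIC", "bolditalic")}
    ranges, trace = [], []
    state, stars, open_start = "NORMAL", 0, 0
    i = 0
    while i < len(text):
        ch, prev, step = text[i], state, 1
        if state in waiting:
            if ch == "*" and stars < 3:
                if stars == 0:
                    open_start = i           # remember where the markers began
                stars += 1
                state = waiting[stars]       # still waiting, no commitment yet
            elif ch != "*" and stars > 0:
                state = inside[stars][0]     # a non-* confirms the construct
            else:
                state, stars = "NORMAL", 0   # plain text or a fourth '*': reset
        elif text.startswith("*" * stars, i):
            # Lookahead found the full closing run: emit the range and reset.
            step = stars
            ranges.append({"type": inside[stars][1],
                           "start": open_start, "end": i + stars})
            state, stars = "NORMAL", 0
        # Otherwise we are inside a construct and keep accumulating content.
        trace.append((ch, prev, state))
        i += step
    return ranges, trace


ranges, trace = parse("**bold**")
for ch, before, after in trace:
    print(repr(ch), before, "-->", after)
print(ranges)
```

Each input character is visited once and the closing run is consumed in a single step, so the pass stays linear in the length of the text.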
Smart Backspace: Synchronized Marker Deletion
In a hybrid editor where markers are visible, users expect intuitive deletion behavior.
+-----------------------------------------------------------------------+
| |
| THE SMART BACKSPACE PHILOSOPHY |
| |
+-----------------------------------------------------------------------+
| |
| Users think of "**bold**" as a SINGLE construct, |
| not eight separate characters. |
| |
| When they delete one marker, they expect the symmetric |
| marker to also delete. This maintains the fundamental |
| property of markdown: PAIRED DELIMITERS. |
| |
+-----------------------------------------------------------------------+
Naive vs Smart Approach
+-----------------------------------------------------------------------+
| |
| NAIVE APPROACH |
| |
+-----------------------------------------------------------------------+
| |
| User wants to remove bold from "**hello**" |
| |
| Step 1: Position cursor, backspace twice to remove ** |
| Step 2: Position cursor, backspace twice to remove ** |
| |
| Total: 4 backspaces + 2 cursor movements |
| |
| Problem: Tedious, error-prone (easy to delete one side only) |
| |
+-----------------------------------------------------------------------+
+-----------------------------------------------------------------------+
| |
| SMART BACKSPACE |
| |
+-----------------------------------------------------------------------+
| |
| User wants to remove bold from "**hello**" |
| |
| Step 1: Position cursor at any marker |
| Step 2: Backspace once |
| |
| Result: "*hello*" (both markers reduced by 1; bold becomes italic) |
| Total: 1 backspace |
| |
| Why better: Intuitive, fast, maintains symmetry |
| |
+-----------------------------------------------------------------------+
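A single smart-backspace step can be sketched as follows, assuming the editor already knows the construct's span from its range. The function name `smart_backspace` and its half-open `open_start`/`close_end` parameters are hypothetical.

```python
def smart_backspace(text, open_start, close_end):
    """Remove one '*' from each side of text[open_start:close_end].

    Offsets are half-open. Anything that is not a '*'-delimited span is
    returned unchanged.
    """
    too_short = close_end - open_start < 3
    if too_short or text[open_start] != "*" or text[close_end - 1] != "*":
        return text
    # Delete the closing marker first so open_start is still valid afterwards.
    text = text[:close_end - 1] + text[close_end:]
    return text[:open_start] + text[open_start + 1:]


print(smart_backspace("**hello**", 0, 9))
print(smart_backspace("say **hello** now", 4, 13))
```

One call keeps the delimiters paired, so the document never passes through an unbalanced state such as `*hello**`.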
Why Delete Closing First?
+-----------------------------------------------------------------------+
| |
| THE POSITION SHIFT PROBLEM |
| |
+-----------------------------------------------------------------------+
| |
| WRONG ORDER: Delete opening first |
| +----------------------------------------------------------------+ |
| | Original: "**bold**" | |
| | Positions: 0 1 2 3 4 5 6 7 | |
| | | |
| | Delete at position 1: | |
| | Result: "*bold**" | |
| | ALL LATER POSITIONS SHIFTED! Old pos 7 is now 6 | |
| | | |
| | Try to delete at position 7 (saved earlier): | |
| | ERROR: Position 7 is now past the end of the string! | |
| +----------------------------------------------------------------+ |
| |
| CORRECT ORDER: Delete closing first |
| +----------------------------------------------------------------+ |
| | Original: "**bold**" | |
| | Positions: 0 1 2 3 4 5 6 7 | |
| | | |
| | Delete at position 7 (last *): | |
| | Result: "**bold*" | |
| | Opening positions UNCHANGED (0-1) | |
| | | |
| | Delete at position 1: | |
| | Result: "*bold*" | |
| | CORRECT! | |
| +----------------------------------------------------------------+ |
| |
| GENERAL PRINCIPLE: |
| When deleting multiple positions, always delete from |
| HIGHEST position to LOWEST. |
| |
+-----------------------------------------------------------------------+
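The general principle can be checked with a short experiment. Both helper names below are hypothetical; the second one exists only to show what goes wrong when saved positions are applied in ascending order.

```python
def delete_positions(text, positions):
    """Delete the characters at `positions`, highest index first."""
    chars = list(text)
    for pos in sorted(set(positions), reverse=True):
        del chars[pos]  # lower indices are unaffected by this deletion
    return "".join(chars)


def delete_in_given_order(text, positions):
    """Apply the saved positions in the order given (for comparison only)."""
    for pos in positions:
        text = text[:pos] + text[pos + 1:]
    return text


print(delete_positions("**bold**", [1, 7]))       # both markers reduced
print(delete_in_given_order("**bold**", [1, 7]))  # closing marker survives
```

In the ascending case the saved index 7 falls past the end of the shortened string, so Python's slicing deletes nothing and the markers end up unbalanced.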
Gradual Degradation Pattern
+-----------------------------------------------------------------------+
| |
| GRADUAL MARKER DEGRADATION |
| |
+-----------------------------------------------------------------------+
| |
| Starting: "***bolditalic***" |
| Visual: (bold + italic styled) |
| |
| After 1st backspace: |
| +-------------------------------+ |
| | Content: "**bolditalic**" | |
| | Visual: (bold styled) | |
| | Why: Removed outer layer | |
| +-------------------------------+ |
| |
| After 2nd backspace: |
| +-------------------------------+ |
| | Content: "*bolditalic*" | |
| | Visual: (italic styled) | |
| | Why: Removed bold layer | |
| +-------------------------------+ |
| |
| After 3rd backspace: |
| +-------------------------------+ |
| | Content: "bolditalic" | |
| | Visual: (plain text) | |
| | Why: Removed final layer | |
| +-------------------------------+ |
| |
| NATURAL PROGRESSION: bolditalic -> bold -> italic -> plain |
| |
| Why natural? *** = * inside * inside * (nested layers) |
| Backspace peels one layer at a time, maintaining valid state. |
| |
+-----------------------------------------------------------------------+
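The degradation sequence can be sketched by treating the number of paired asterisks as a nesting depth. The names `marker_depth`, `peel`, and `degrade`, and the rule that a construct needs at least one content character, are assumptions of this illustration.

```python
STYLE_BY_DEPTH = {0: "plain", 1: "italic", 2: "bold", 3: "bolditalic"}


def marker_depth(s):
    """Count matching '*' pairs around `s`, up to three layers."""
    depth = 0
    while (depth < 3
           and len(s) > 2 * (depth + 1)   # keep at least one content character
           and s[depth] == "*"
           and s[-1 - depth] == "*"):
        depth += 1
    return depth


def peel(s):
    """One smart backspace: strip a single layer from both sides."""
    return s[1:-1] if marker_depth(s) > 0 else s


def degrade(s):
    """List every (text, style) step from `s` down to plain text."""
    steps = [(s, STYLE_BY_DEPTH[marker_depth(s)])]
    while marker_depth(s) > 0:
        s = peel(s)
        steps.append((s, STYLE_BY_DEPTH[marker_depth(s)]))
    return steps


for content, style in degrade("***bolditalic***"):
    print(content, "->", style)
```

Every intermediate string is itself a valid construct, which is why each backspace leaves the document in a renderable state.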
Copy-Paste Fidelity: VmdBox
When users paste ASCII art or box-drawing content, several problems can occur:
+-----------------------------------------------------------------------+
| |
| COPY-PASTE PROBLEMS IN TRADITIONAL EDITORS |
| |
+-----------------------------------------------------------------------+
| |
| PROBLEM 1: Line Wrapping |
| +----------------------------------------------------------------+ |
| | Original:        After paste:                                  | |
| | +---------+      +-------                                      | |
| | | Box     |      --+                                           | |
| | +---------+      | Box                                         | |
| |                  |                                             | |
| | Why: Rich text editors wrap based on container width | |
| +----------------------------------------------------------------+ |
| |
| PROBLEM 2: Font Mismatch |
| +----------------------------------------------------------------+ |
| | Box drawing assumes monospace fonts | |
| | Proportional fonts break all alignment | |
| +----------------------------------------------------------------+ |
| |
| PROBLEM 3: Lost Whitespace |
| +----------------------------------------------------------------+ |
| | Multiple spaces collapse to single space | |
| | "    indented" becomes " indented" | |
| +----------------------------------------------------------------+ |
| |
+-----------------------------------------------------------------------+
The VmdBox Solution
+-----------------------------------------------------------------------+
| |
| VMDBOX: PERFECT TERMINAL PRESERVATION |
| |
+-----------------------------------------------------------------------+
| |
| TERMINAL OUTPUT VMDBOX INPUT |
| |
| +---------------------------+ +---------------------------+ |
| | $ tree | | Paste terminal output | |
| | . | | into VmdBox | |
| | +-- src | | | |
| | | +-- main.js | | PERFECT PRESERVATION: | |
| | | +-- index.html | | - Box drawing chars | |
| | +-- package.json | | - Alignment | |
| +---------------------------+ | - Whitespace | |
| | +---------------------------+ |
| | | |
| +------------------------------------+ |
| | |
| +--------------------------------+ |
| | SAME VISUAL FIDELITY | |
| | Terminal -> Copy -> Paste | |
| | -> VmdBox -> Same appearance | |
| +--------------------------------+ |
| |
+-----------------------------------------------------------------------+
How VmdBox Works
+-----------------------------------------------------------------------+
| |
| VMDBOX DETECTION FLOW |
| |
+-----------------------------------------------------------------------+
| |
| User pastes content |
| | |
| v |
| +-------------------+ |
| | Pattern Detection | |
| +-------------------+ |
| | |
| | Box-drawing characters? |
| | Table patterns? |
| | Chart patterns? |
| | |
| +---> NO ---> Normal paste (rich text) |
| | |
| +---> YES (2+ box lines) |
| | |
| v |
| +-------------------+ |
| | Create VmdBox | |
| +-------------------+ |
| | |
| v |
| +-------------------+ |
| | Apply Properties | |
| | - Monospace font | |
| | - Preserve spaces | |
| | - No word wrap | |
| | - Auto-scale | |
| +-------------------+ |
| | |
| v |
| Perfect preservation! |
| |
+-----------------------------------------------------------------------+
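The detection step might be approximated as below. Only the "2+ box lines" threshold comes from the flow above; the function names, the ASCII border and row patterns, and the use of the Unicode Box Drawing block (U+2500 to U+257F) are assumptions, and table or chart detection is not attempted.

```python
import re

# Hypothetical heuristics for ASCII-drawn boxes such as "+-----+" and "| x |".
ASCII_BORDER = re.compile(r"^\s*\+[-=+]{2,}\+\s*$")
ASCII_ROW = re.compile(r"^\s*\|.*\|\s*$")


def is_box_line(line):
    """True if the line contains box-drawing characters or looks like a box row."""
    if any("\u2500" <= ch <= "\u257f" for ch in line):
        return True
    return bool(ASCII_BORDER.match(line) or ASCII_ROW.match(line))


def should_create_vmdbox(pasted, min_box_lines=2):
    """Apply the '2+ box lines' rule to pasted text."""
    box_lines = sum(1 for line in pasted.splitlines() if is_box_line(line))
    return box_lines >= min_box_lines


print(should_create_vmdbox("+-----+\n| Box |\n+-----+"))
print(should_create_vmdbox("just a sentence\nand another"))
```

Requiring at least two matching lines keeps a stray `|` or `+` in ordinary prose from triggering the monospace treatment.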
Auto-Scaling Visualization
+-----------------------------------------------------------------------+
| |
| AUTO-SCALING: FIT WIDE CONTENT |
| |
+-----------------------------------------------------------------------+
| |
| Original content (1000px wide): |
| +------------------------------------------------------------------+ |
| | +--------------------------------------------------------------+ | |
| | | Very Wide ASCII Art | | |
| | +--------------------------------------------------------------+ | |
| +------------------------------------------------------------------+ |
| |
| Container width: 500px |
| Scale factor: 500/1000 = 0.5 (50%) |
| |
| After scaling: |
| +--------------------------------+ |
| | +----------------------------+ | |
| | | Very Wide ASCII Art | | <-- Scaled to 50% |
| | +----------------------------+ | All proportions preserved |
| +--------------------------------+ |
| |
| Key: Character spacing, line height, and alignment are preserved |
| because uniform scaling applies to both dimensions |
| |
+-----------------------------------------------------------------------+
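The scale computation is simple arithmetic, sketched here with hypothetical helper names. Capping the factor at 1.0, so that narrow content is never enlarged, is an assumption not stated in the diagram.

```python
def scale_factor(content_width_px, container_width_px):
    """Uniform scale that fits the content into the container width."""
    if content_width_px <= 0:
        raise ValueError("content width must be positive")
    # Assumption: content that already fits is left at its natural size.
    return min(1.0, container_width_px / content_width_px)


def scaled_size(width_px, height_px, container_width_px):
    """Apply the same factor to both dimensions so proportions are kept."""
    factor = scale_factor(width_px, container_width_px)
    return width_px * factor, height_px * factor


print(scale_factor(1000, 500))      # the 500/1000 case from the diagram
print(scaled_size(1000, 200, 500))
```

Using one factor for both axes is what preserves character spacing and line height relative to each other.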
Parallel State Machines: Terminal and Document
+-----------------------------------------------------------------------+
| |
| THE SHARED ARCHITECTURE |
| |
+-----------------------------------------------------------------------+
| |
| TERMINAL (VTE Parser) DOCUMENT (Markdown Parser) |
| |
| +---------------------------+ +---------------------------+ |
| | Input: Byte stream | | Input: Text string | |
| | State: VTE states | | State: NORMAL/IN_BOLD/etc | |
| | Output: Grid cells | | Output: Range objects | |
| +---------------------------+ +---------------------------+ |
| | | |
| +--------------------------------+ |
| | |
| +-------------------------------+ |
| | SHARED PRINCIPLES | |
| +-------------------------------+ |
| | | |
| | - Character-by-character | |
| | - Explicit state tracking | |
| | - No backtracking needed | |
| | - Linear time complexity | |
| | - Deterministic output | |
| | | |
| +-------------------------------+ |
| |
| The Brick Property: Both systems process input atomically, |
| building up structured output incrementally. |
| |
+-----------------------------------------------------------------------+
SMPC Discovery: Nested Italic = Bold
A key insight that simplifies the entire implementation:
+-----------------------------------------------------------------------+
| |
| THE CONCEPTUAL SHIFT |
| |
+-----------------------------------------------------------------------+
| |
| TRADITIONAL VIEW (Separate Constructs): |
| +----------------------------------------------------------------+ |
| | * = italic | |
| | ** = bold (separate construct) | |
| | *** = bolditalic (yet another construct) | |
| | | |
| | Result: Three different handling paths needed | |
| +----------------------------------------------------------------+ |
| |
| SMPC VIEW (Nested Single Construct): |
| +----------------------------------------------------------------+ |
| | * = italic | |
| | ** = *italic inside italic* = bold appearance | |
| | *** = *italic inside bold* = bolditalic appearance | |
| | | |
| | Result: Single * handling with nesting! | |
| +----------------------------------------------------------------+ |
| |
| This makes gradual degradation NATURAL: |
| *** -> ** -> * -> (none) is just "remove one layer at a time" |
| |
| Why SMPC: Simplicity emerges from recognizing that |
| ** is just * nested inside *. |
| |
+-----------------------------------------------------------------------+
Summary
+-----------------------------------------------------------------------+
| |
| STATE MACHINES: THE FOUNDATION OF TRUST |
| |
+-----------------------------------------------------------------------+
| |
| +-------------------+ |
| | VTE Parser | Terminal escape sequence processing |
| +-------------------+ |
| | |
| | Same algorithm pattern |
| v |
| +-------------------+ |
| | Markdown Parser | Document formatting processing |
| +-------------------+ |
| | |
| | Enables |
| v |
| +-------------------+ |
| | Smart Backspace | Synchronized marker deletion |
| +-------------------+ |
| | |
| | Combines with |
| v |
| +-------------------+ |
| | VmdBox | Perfect terminal-to-document copy-paste |
| +-------------------+ |
| | |
| | Results in |
| v |
| +-------------------+ |
| | Human + AI | Unified visual experience |
| | Alignment | Predictable, trustable output |
| +-------------------+ |
| |
| "State machines bring the precision of terminal parsing to |
| document formatting - each character is processed atomically, |
| building up meaning incrementally." |
| |
+-----------------------------------------------------------------------+
Related:
variablemd/hybrid-editor/architecture.mdx, variablemd/core-concepts/codexmono.mdx