Rendered from notes/18-full-project-handoff.md.

# Coco-Wiki / CodeWiki Tokio Handoff

Date: 2026-06-22

This is the current handoff for the Tokio documentation comparison project. It records what was tried, what is currently hosted, where the relevant code lives, how CodeWiki generates its pages, how Coco-Wiki currently generates its pages, and how to rerun the important pieces.

## Hosted Result

- Coco-Wiki Tokio: <https://tokio-wiki-compare.pages.dev/coco/>
- CodeWiki Tokio snapshot: <https://tokio-wiki-compare.pages.dev/codewiki/>
- Side-by-side compare page: <https://tokio-wiki-compare.pages.dev/compare/>
- Handoff page: <https://tokio-wiki-compare.pages.dev/handoff/>
- Current verified preview deployment: <https://5434405a.tokio-wiki-compare.pages.dev/>

Current notable result:

- The Coco multi-thread scheduler page has been replaced with a CodeWiki-learned full-source prompt prototype.
- Page URL: <https://tokio-wiki-compare.pages.dev/coco/#tokio_runtime_scheduler_multi_thread>
- Markdown URL: <https://tokio-wiki-compare.pages.dev/coco/pages/tokio_runtime_scheduler_multi_thread.md>
- It is 21.6KB, contains 4 Mermaid diagrams, and was validated with Mermaid 11.9.0 after syntax fixes.

Important scope note:

- Only `tokio_runtime_scheduler_multi_thread` has the CodeWiki-learned full-source style applied.
- The other 23 Coco pages are still the previous strong-model rewrite over frozen `ComponentDossier` / `ModuleNotebook` artifacts.

## Local Paths

Main Coco-Wiki repo:

```text
/Users/zhhanz/Documents/rust/src/coco-wiki
```

Cloned CodeWiki source:

```text
/tmp/CodeWiki-src
```

Static comparison site root:

```text
/tmp/tokio-wiki-compare-site
```

Clean deploy directory, recreated before deploy:

```text
/tmp/tokio-wiki-compare-deploy
```

Coco generated deliverable:

```text
deliverables/tokio-research-wiki
```

Current important deliverable subpaths:

```text
deliverables/tokio-research-wiki/index.html
deliverables/tokio-research-wiki/manifest.json
deliverables/tokio-research-wiki/pages/
deliverables/tokio-research-wiki/pages-strong/
deliverables/tokio-research-wiki/pages-deterministic-v2/
deliverables/tokio-research-wiki/pages-codewiki-learned/
deliverables/tokio-research-wiki/artifacts/dossiers/
deliverables/tokio-research-wiki/artifacts/research/
deliverables/tokio-research-wiki/artifacts/notebooks/
deliverables/tokio-research-wiki/artifacts/sections/
deliverables/tokio-research-wiki/codewiki-learned-writer-prompt.md
```

Hosted site copy:

```text
/tmp/tokio-wiki-compare-site/coco/index.html
/tmp/tokio-wiki-compare-site/coco/pages/
/tmp/tokio-wiki-compare-site/coco/pages-codewiki-learned/
/tmp/tokio-wiki-compare-site/codewiki/index.html
/tmp/tokio-wiki-compare-site/assets/codewiki-pages.json
/tmp/tokio-wiki-compare-site/assets/codewiki-module-tree.json
/tmp/tokio-wiki-compare-site/handoff/index.html
```

## What Was Tried

1. Design review and v0.3 architecture direction

   The original `coco-wiki-DESIGN.md` was reviewed against RepoDoc, RepoSummary, Skyframe, Salsa, SCIP, RANGER, and Aider repo map ideas. The recommended design became:

   ```text
   Boundary-first Structural Tree
   + Capability Lenses
   + CodeIntel Graph
   + Capability Graph
   + Artifact Graph
   + Red-Green evaluation
   ```

2. Implementation correctness rounds

   Multiple implementation review rounds fixed root correctness issues around artifact persistence, verification gating, exact dirty checking, candidate impact vs exact impact, revision digests, dependency equality, Ask routing, CAS concurrency, relationship indexing, incremental snapshot restamping, build gating, and structured claims/evidence/verification storage. The later reported state was:

   ```text
   workspace tests: 205 passing
   acceptance harness: 23/23 green, later 31/31 for added acceptance checks
   ```

3. Source-rich GLM 5.2 run

   Coco-Wiki was wired to include source snippets as deterministic evidence and attempted a GLM 5.2 source-rich generation pass. It published 24/24 pages with verification passing, and a clean rerun avoided model calls.

   Result: better than symbol-only pages, but still too shallow compared with CodeWiki. The main problem was not only prompt wording or source snippets. The writer still lacked a CodeWiki-like module research context and a people-oriented module guide structure.

4. ResearchQuestionArtifact / ModuleNotebook executable v1

   `cw-research` was added as an explicit semantic research layer:

   ```text
   ModuleSpec
   -> ComponentDossier
   -> Completeness Gate
   -> ResearchQuestionArtifact
   -> ModuleNotebook
   -> PageSectionArtifact
   -> Markdown
   -> HTML viewer
   ```

   Current generated Tokio artifact counts:

   ```text
   pages: 24
   component dossiers: 24
   research artifacts: 120
   notebooks: 24
   section artifact sets: 24
   frozen citations: 390
   recorded query footprints: 3165
   known gaps: 0
   ```

5. Strong-model writer pass

   All 24 pages were rewritten with local `claude -p --model sonnet` from frozen deterministic evidence. This improved prose but still did not match CodeWiki's scheduler page quality.

6. CodeWiki prompt/source study

   CodeWiki was cloned to `/tmp/CodeWiki-src`, and the relevant files were read:

   ```text
   /tmp/CodeWiki-src/codewiki/src/be/prompt_template.py
   /tmp/CodeWiki-src/codewiki/src/be/documentation_generator.py
   /tmp/CodeWiki-src/codewiki/src/be/backend.py
   /tmp/CodeWiki-src/codewiki/src/be/caw_backend.py
   /tmp/CodeWiki-src/codewiki/src/be/agent_tools/read_code_components.py
   /tmp/CodeWiki-src/codewiki/src/be/agent_tools/generate_sub_module_documentations.py
   ```

   The key finding: CodeWiki quality is not from a magic prompt. It comes from module tree + full source grouped by file + agentic submodule docs + parent overview synthesis.

7. CodeWiki-learned scheduler prototype

   A new prompt contract was written:

   ```text
   deliverables/tokio-research-wiki/codewiki-learned-writer-prompt.md
   ```

   A reusable generator script was added:

   ```text
   scripts/generate_codewiki_learned_scheduler.py
   ```

   The generated page was applied to:

   ```text
   deliverables/tokio-research-wiki/pages-codewiki-learned/tokio_runtime_scheduler_multi_thread.md
   deliverables/tokio-research-wiki/pages/tokio_runtime_scheduler_multi_thread.md
   /tmp/tokio-wiki-compare-site/coco/pages/tokio_runtime_scheduler_multi_thread.md
   ```

8. Mermaid 11.9.0 fix

   The first CodeWiki-learned page had Mermaid syntax errors. The fixed problems were:

   ```text
   subgraph id with hyphen: Per-Worker
   subgraph label with brackets: Remote[i]
   state labels with :: such as Launch::launch()
   sequence message text with semicolons
   ```

   The current 4 Mermaid blocks parse under local Mermaid 11.9.0.
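
The four problem classes listed above are mechanical enough to pre-check before running the real parser. The following is a minimal sketch, not part of the repo: `mermaid_blocks` and `lint_block` are hypothetical helper names, and the regex and string checks are rough heuristics that can produce false positives. It does not replace validation against Mermaid 11.9.0 itself.

```python
import re

FENCE = "`" * 3  # built at runtime so this snippet can live inside Markdown
MERMAID_FENCE = re.compile(FENCE + r"mermaid[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def mermaid_blocks(markdown: str) -> list[str]:
    """Return the body of every fenced mermaid block in a Markdown string."""
    return [match.group(1) for match in MERMAID_FENCE.finditer(markdown)]


def lint_block(block: str) -> list[str]:
    """Heuristically flag the four constructs that broke the first page."""
    lines = [line.strip() for line in block.splitlines() if line.strip()]
    kind = lines[0].split()[0] if lines else ""  # e.g. flowchart, stateDiagram-v2
    problems = []
    for line in lines[1:]:
        if line.startswith("subgraph "):
            rest = line[len("subgraph "):].strip()
            # The id is everything before the first space or opening bracket.
            ident = re.split(r"[\s\[]", rest, maxsplit=1)[0]
            label = rest[len(ident):].strip()
            if "-" in ident:
                problems.append(f"hyphen in subgraph id: {ident}")
            # One outer [..] pair is the normal label wrapper; anything left is nested.
            if label.startswith("[") and label.endswith("]"):
                label = label[1:-1]
            if "[" in label or "]" in label:
                problems.append(f"brackets inside subgraph label: {label}")
        elif kind.startswith("stateDiagram") and "::" in line:
            problems.append(f":: in state label: {line}")
        elif kind == "sequenceDiagram" and ":" in line:
            if ";" in line.split(":", 1)[1]:
                problems.append(f"semicolon in sequence message: {line}")
    return problems
```

A generator could run `lint_block` over each result of `mermaid_blocks(page_text)` and fail fast before invoking the Node-based parser.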

## How CodeWiki Works

CodeWiki's main pipeline is in:

```text
/tmp/CodeWiki-src/codewiki/src/be/documentation_generator.py
```

The flow is:

```text
DependencyGraphBuilder
-> components + leaf nodes
-> LLM clustering into module_tree
-> leaf modules first
-> parent modules after children
-> repository overview after module docs
-> metadata.json / module_tree.json / overview.md / module pages
```

Important CodeWiki prompt files:

```text
/tmp/CodeWiki-src/codewiki/src/be/prompt_template.py
```

Key prompt pieces:

- `SYSTEM_PROMPT`: comprehensive system documentation, architecture, component relationships, sub-module docs, Mermaid diagrams, and tool workflow.
- `LEAF_SYSTEM_PROMPT`: similar but without recursive sub-module generation.
- `USER_PROMPT`: gives `<MODULE_TREE>` and `<CORE_COMPONENT_CODES>`.
- `CLUSTER_REPO_PROMPT` / `CLUSTER_MODULE_PROMPT`: ask the model to group component ids into semantic modules.
- `format_user_prompt`: groups core components by file and injects full file content.

The most important quality detail is this CodeWiki behavior:

```text
for each core component id:
  group by component.relative_path

for each file:
  write:
    # File: path
    ## Core Components in this file
    - component ids
    ## File Content
    full file text
```

This means CodeWiki writers are often reading the actual implementation, not only extracted symbols or snippets.

Important CodeWiki tools:

```text
/tmp/CodeWiki-src/codewiki/src/be/agent_tools/read_code_components.py
/tmp/CodeWiki-src/codewiki/src/be/agent_tools/generate_sub_module_documentations.py
/tmp/CodeWiki-src/codewiki/src/be/agent_tools/str_replace_editor.py
```

What they do:

- `read_code_components`: lets the agent fetch source for additional component ids.
- `generate_sub_module_documentation`: recursively runs sub-agents for complex modules.
- `str_replace_editor`: writes docs through CodeWiki's controlled editor path.
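
The group-by-file behavior described above can be sketched as follows. This is an illustration written from the outline in these notes, not a copy of CodeWiki's `format_user_prompt`: the `Component` class, the `format_core_components` name, and the `read_file` callback are all hypothetical stand-ins.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Component:
    """Hypothetical stand-in for a CodeWiki core component record."""
    component_id: str
    relative_path: str


def format_core_components(
    components: Iterable[Component],
    read_file: Callable[[str], str],
) -> str:
    """Group component ids by file and emit one full-source section per file."""
    ids_by_file: dict[str, list[str]] = defaultdict(list)
    for component in components:
        ids_by_file[component.relative_path].append(component.component_id)

    sections = []
    for path, ids in ids_by_file.items():  # insertion order: first-seen file first
        lines = [f"# File: {path}", "", "## Core Components in this file"]
        lines += [f"- {component_id}" for component_id in ids]
        lines += ["", "## File Content", read_file(path)]
        sections.append("\n".join(lines))
    return "\n\n".join(sections)
```

The design point is that each file's text is injected once, however many components it contains, so the writer sees whole implementations rather than per-symbol fragments.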

CodeWiki backend abstraction:

```text
/tmp/CodeWiki-src/codewiki/src/be/backend.py
```

It supports:

- API mode through pydantic-ai/litellm.
- Subscription CLI mode through `caw`, using local `claude` or `codex`.

CodeWiki installed CLI examples from its README:

```bash
cd /tmp/CodeWiki-src
python3.12 -m venv .venv
source .venv/bin/activate
pip install -e .

# Claude subscription mode, after `claude login`
codewiki config set \
  --provider claude-code \
  --main-model sonnet \
  --cluster-model sonnet

codewiki generate --output /tmp/codewiki-tokio-docs --verbose
```

The hosted CodeWiki Tokio snapshot is partial:

```text
/tmp/tokio-wiki-compare-site/assets/codewiki-pages.json
/tmp/tokio-wiki-compare-site/assets/codewiki-module-tree.json
```

Snapshot stats:

```text
generated pages: 17
known complete: false
stopped at: runtime_scheduler_multi_thread / worker_core
```

## How Coco-Wiki Works In This Project

Core design code:

```text
crates/cw-research/src/lib.rs
crates/cw-research/src/tokio_wiki.rs
crates/cw-fastcontext/src/lib.rs
crates/cw-cli/src/commands/research.rs
```

Deterministic Tokio wiki generator:

```text
crates/cw-research/src/tokio_wiki.rs
```

It defines 24 `ModuleSpec`s for Tokio and materializes:

```text
ComponentDossier
ResearchQuestionArtifact
ModuleNotebook
PageSectionArtifact
Markdown
single-file HTML viewer
manifest
```

This generator does not call a model. It is reproducible and uses write-if-changed semantics so clean reruns skip unchanged outputs.
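
Write-if-changed is simple to state in code. The sketch below shows the general technique in Python for illustration only; the actual generator is Rust in `crates/cw-research/src/tokio_wiki.rs`, and `write_if_changed` here is a hypothetical name, not its API.

```python
from pathlib import Path


def write_if_changed(path: Path, content: str) -> bool:
    """Write content only when it differs from what is already on disk.

    Returns True if the file was written, False if it was left untouched.
    Skipping the write keeps the file's mtime stable, so downstream tools
    that key on modification time also see a clean rerun as a no-op.
    """
    data = content.encode("utf-8")
    if path.exists() and path.read_bytes() == data:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return True
```

This only helps if the content itself is deterministic: a timestamp or random id serialized into a page makes every rerun look changed.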

FastContext boundary:

```text
crates/cw-fastcontext/src/lib.rs
```

It implements an OpenAI-compatible client for `microsoft/FastContext-1.0-4B-SFT` and read-only repository tools:

```text
READ
GLOB
GREP
```

It records footprints:

```text
Read(path, revision_fp, start_line, end_line)
Glob(pattern, path_set_fp, path_count)
Grep(pattern, scope, match_set_fp, match_count)
SymbolLookup(query, result_set_fp, result_count)
GraphLookup(seed, relation_filter, adjacency_fp, edge_count)
SemanticFrontier(query, candidate_set_fp, selected_k, frontier_m, cutoff)
```

FastContext is deliberately a research scout, not a live writer tool. Its output should be frozen as `ResearchQuestionArtifact` before any page writer sees it.

## How To Generate The Deterministic Coco Tokio Wiki

From the repo root:

```bash
cd /Users/zhhanz/Documents/rust/src/coco-wiki

cargo run -p cw-cli --quiet -- research tokio-wiki \
  --repo ~/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.52.3 \
  --output deliverables/tokio-research-wiki
```

Expected output is JSON like:

```text
page_count: 24
component_dossiers: 24
research_artifacts: 120
notebooks: 24
section_artifacts: 24
html_index: deliverables/tokio-research-wiki/index.html
manifest_path: deliverables/tokio-research-wiki/manifest.json
```

Clean rerun should report mostly unchanged output. If it rewrites everything, check for volatile fields being serialized into the manifest or generated pages.

Open locally:

```bash
open deliverables/tokio-research-wiki/index.html
```

## How To Validate A Research Artifact

Example:

```bash
cargo run -p cw-cli --quiet -- research validate-footprints \
  --repo ~/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.52.3 \
  --json \
  deliverables/tokio-research-wiki/artifacts/research/tokio_runtime_scheduler_multi_thread_control_flow.json
```

Green means the recorded read/glob/grep/index-style footprints still match the current source tree.
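
To make the "green" check concrete, here is a minimal sketch of footprint revalidation for the `Read` and `Glob` shapes. The field names follow the footprint list in these notes, but everything else is an assumption: the real implementation is Rust, and the fingerprint scheme used here (SHA-256 over file bytes, or over the sorted relative paths) is a guess for illustration, as are the `record_*` and `is_green` names.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ReadFootprint:
    path: str
    revision_fp: str  # assumed: SHA-256 of the whole file's bytes
    start_line: int
    end_line: int


@dataclass(frozen=True)
class GlobFootprint:
    pattern: str
    path_set_fp: str  # assumed: SHA-256 over the sorted matching paths
    path_count: int


def record_read(repo: Path, path: str, start_line: int, end_line: int) -> ReadFootprint:
    digest = hashlib.sha256((repo / path).read_bytes()).hexdigest()
    return ReadFootprint(path, digest, start_line, end_line)


def record_glob(repo: Path, pattern: str) -> GlobFootprint:
    paths = sorted(p.relative_to(repo).as_posix() for p in repo.glob(pattern))
    digest = hashlib.sha256("\0".join(paths).encode("utf-8")).hexdigest()
    return GlobFootprint(pattern, digest, len(paths))


def is_green(repo: Path, footprint) -> bool:
    """Re-run the recorded query and compare it with the frozen footprint."""
    if isinstance(footprint, ReadFootprint):
        current = record_read(repo, footprint.path, footprint.start_line, footprint.end_line)
    else:
        current = record_glob(repo, footprint.pattern)
    return current == footprint
```

Under this scheme, editing a file that was read, or adding a file that a recorded glob would now match, turns the corresponding footprint red.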

## How To Run FastContext From Rust

Serve FastContext with SGLang or vLLM. The notes assume SGLang:

```bash
python3 -m sglang.launch_server \
  --model-path ~/.cache/huggingface/fastcontext/FastContext-1.0-4B-SFT \
  --tool-call-parser qwen \
  --context-length 262144 \
  --trust-remote-code \
  --dtype bfloat16 \
  --host 127.0.0.1 \
  --port 30000 \
  --tp-size 1 \
  --mem-fraction-static 0.8
```

Run a live scout query:

```bash
cargo run -p cw-cli -- fastcontext explore \
  --repo ~/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.52.3 \
  "Find where the multi-thread scheduler parks and unparks workers."
```

Freeze a research artifact:

```bash
cargo run -p cw-cli -- research explore \
  --repo ~/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.52.3 \
  --module runtime::scheduler::multi_thread \
  --question-id worker_lifecycle \
  --output /tmp/worker_lifecycle.json \
  "How does the multi-thread worker lifecycle run, park, unpark, and shut down?"
```

Aggregate artifacts into a notebook:

```bash
cargo run -p cw-cli -- research notebook \
  --module runtime::scheduler::multi_thread \
  --output /tmp/runtime_scheduler_multi_thread_notebook.json \
  /tmp/worker_lifecycle.json
```

## How To Generate The CodeWiki-Learned Scheduler Page

The reusable script is:

```text
scripts/generate_codewiki_learned_scheduler.py
```

Prompt-only dry run:

```bash
scripts/generate_codewiki_learned_scheduler.py \
  --no-call \
  --prompt-out /tmp/scheduler_prompt.md
```

Generate with local Claude CLI:

```bash
scripts/generate_codewiki_learned_scheduler.py \
  --model sonnet \
  --max-budget-usd 2.50
```

The script sends the prompt through stdin to avoid OS argument length limits.
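
The stdin approach matters because a full-source prompt can exceed what the OS accepts as a single command-line argument. A minimal sketch of the pattern, assuming nothing about the script's internals (`run_writer` is a hypothetical name):

```python
import subprocess


def run_writer(command: list[str], prompt: str, timeout_s: float = 600.0) -> str:
    """Run a CLI writer, feeding the prompt on stdin instead of argv.

    argv is subject to OS size limits, while a stdin pipe is not,
    so large full-source prompts go through the pipe.
    """
    result = subprocess.run(
        command,
        input=prompt,         # delivered to the child's stdin
        capture_output=True,  # collect stdout and stderr
        text=True,            # str in, str out
        timeout=timeout_s,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

With the writer used earlier in this project, the call would look like `run_writer(["claude", "-p", "--model", "sonnet"], prompt_text)`; the exact flags the script passes are not recorded in these notes.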

The script writes:

```text
deliverables/tokio-research-wiki/pages-codewiki-learned/tokio_runtime_scheduler_multi_thread.md
```

Apply that generated page to the local viewer and hosted-site copy:

```bash
scripts/apply_tokio_page_to_viewer.py \
  --markdown deliverables/tokio-research-wiki/pages-codewiki-learned/tokio_runtime_scheduler_multi_thread.md
```

This updates:

```text
deliverables/tokio-research-wiki/pages/tokio_runtime_scheduler_multi_thread.md
deliverables/tokio-research-wiki/index.html
/tmp/tokio-wiki-compare-site/coco/pages/tokio_runtime_scheduler_multi_thread.md
/tmp/tokio-wiki-compare-site/coco/pages-codewiki-learned/tokio_runtime_scheduler_multi_thread.md
/tmp/tokio-wiki-compare-site/coco/index.html
```

## How To Validate Mermaid 11.9.0

The current scheduler page had to be made Mermaid-safe. Local validation used Mermaid 11.9.0 in `/tmp/tokio-wiki-compare-site/node_modules`.

Install dependencies if needed:

```bash
cd /tmp/tokio-wiki-compare-site
npm install -D mermaid@11.9.0 jsdom dompurify
```

The temporary validation script used during this run is:

```text
/tmp/parse_mermaid_dom.mjs
```

Run:

```bash
node /tmp/parse_mermaid_dom.mjs \
  /Users/zhhanz/Documents/rust/src/coco-wiki/deliverables/tokio-research-wiki/pages/tokio_runtime_scheduler_multi_thread.md
```

Expected:

```text
block 1: ok
block 2: ok
block 3: ok
block 4: ok
```

Mermaid 11.9.0 guardrails for future generated pages:

```text
avoid hyphens in subgraph ids
avoid [brackets] inside subgraph labels
avoid :: in state transition labels
avoid semicolons in sequence message text
avoid raw HTML-ish generic text such as <Core> in diagram labels
```

## How To Deploy The Comparison Site

Wrangler is installed locally in:

```text
/tmp/tokio-wiki-compare-site/node_modules/.bin/wrangler
```

The project cache identifies:

```text
project_name: tokio-wiki-compare
```

Recreate a clean deploy directory. This avoids uploading `node_modules`, `package.json`, `.wrangler`, or lockfiles:

```bash
rm -rf /tmp/tokio-wiki-compare-deploy
mkdir -p /tmp/tokio-wiki-compare-deploy

rsync -a --delete \
  --exclude node_modules \
  --exclude package.json \
  --exclude package-lock.json \
  --exclude .wrangler \
  /tmp/tokio-wiki-compare-site/ \
  /tmp/tokio-wiki-compare-deploy/
```

Deploy:

```bash
cd /tmp/tokio-wiki-compare-site

./node_modules/.bin/wrangler pages deploy \
  /tmp/tokio-wiki-compare-deploy \
  --project-name tokio-wiki-compare
```

Last successful deploy preview:

```text
https://5434405a.tokio-wiki-compare.pages.dev
```

Verify production content:

```bash
python3 - <<'PY'
import time, urllib.request

urls = [
    f"https://tokio-wiki-compare.pages.dev/coco/pages/tokio_runtime_scheduler_multi_thread.md?cb={int(time.time())}",
    f"https://tokio-wiki-compare.pages.dev/coco/?cb={int(time.time())}",
    f"https://tokio-wiki-compare.pages.dev/handoff/?cb={int(time.time())}",
]
for url in urls:
    text = urllib.request.urlopen(urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"}), timeout=30).read().decode()
    print(url, len(text))
    for phrase in [
        "Task Source Decision Table",
        "Worker Lifecycle and Core Hand-off",
        "CodeWiki-learned scheduler prototype",
        "Coco-Wiki / CodeWiki Tokio Handoff",
    ]:
        if phrase in text:
            print(" found", phrase)
PY
```

## How To Run The Normal Coco-Wiki CLI Flow

For a real repo flow, not the special Tokio research generator:

```bash
cargo run -p cw-cli -- init --force
cargo run -p cw-cli -- index run
cargo run -p cw-cli -- plan run
cargo run -p cw-cli -- build run
cargo run -p cw-cli -- ask "how does Scheduler work?"
cargo run -p cw-cli -- verify run
cargo run -p cw-cli -- explain-dirty run tokio_runtime_scheduler_multi_thread
```

Runtime options:

```bash
cargo run -p cw-cli -- build run --runtime native
cargo run -p cw-cli -- build run --runtime cocoindex
```

The CocoIndex runtime adapter examples live under:

```text
examples/cocoindex_runtime_demo/
examples/cocoindex_cw_runtime_adapter/
```

Example scripts:

```bash
examples/cocoindex_runtime_demo/run.sh
examples/cocoindex_cw_runtime_adapter/run.sh
examples/cocoindex_cw_runtime_adapter/cocoindex_e2e_incremental.sh
```

## Current Quality Assessment

The latest scheduler page is materially closer to CodeWiki because it is now shaped as a human module guide:

```text
introduction
how it fits
architecture overview
key design principles
sub-module documentation
task scheduling flow
worker lifecycle
shutdown procedure
state/concurrency invariants
performance notes
evidence map
incrementality note
```

The prior Coco page was accurate but read like an artifact/evidence report. CodeWiki's better page quality came from giving the model full source context and a documentation task framed around maintainers.

Remaining gap:

- Only one Coco page has been regenerated with the CodeWiki-learned contract.
- CodeWiki still has a stronger recursive module-doc workflow.
- Coco still needs this contract generalized across all 24 Tokio pages and then moved into the production buildgraph, instead of remaining a special scheduler script.

Recommended next step:

```text
generalize scripts/generate_codewiki_learned_scheduler.py into a page-id driven generator:

page_id
-> manifest entry
-> source-file pack from dossier/source spans
-> CodeWiki-learned prompt
-> writer
-> Mermaid validation
-> apply_tokio_page_to_viewer.py
```

After that, batch regenerate the 5 golden pages first:

```text
tokio_runtime
tokio_runtime_scheduler
tokio_runtime_scheduler_multi_thread
tokio_runtime_io_driver
tokio_sync_mpsc
```

Then regenerate all 24 pages only if the golden-page quality holds.

## Files Added In This Handoff

```text
scripts/generate_codewiki_learned_scheduler.py
scripts/apply_tokio_page_to_viewer.py
notes/18-full-project-handoff.md
```

## Security / Secrets Note

Do not put provider keys into this repo, the handoff page, command history snippets, or generated docs. Use local CLI auth (`claude login`, `codex login`) or environment variables outside committed files. The hosted handoff intentionall