{
  "base_url": "https://talk.nervos.org",
  "generated_at": "2026-04-28T18:16:18.734431+00:00",
  "since": "2026-04-27T18:16:15.083508+00:00",
  "until": "2026-04-28T18:16:15.083508+00:00",
  "window_hours": 24,
  "topics": [
    {
      "topic_id": 10130,
      "title": "Introducing CKB Kickstarter: Decentralized All-or-Nothing Crowdfunding on Nervos CKB (Testnet MVP Live)",
      "slug": "introducing-ckb-kickstarter-decentralized-all-or-nothing-crowdfunding-on-nervos-ckb-testnet-mvp-live",
      "url": "https://talk.nervos.org/t/introducing-ckb-kickstarter-decentralized-all-or-nothing-crowdfunding-on-nervos-ckb-testnet-mvp-live/10130",
      "created_at": "2026-03-25T20:44:37.875000+00:00",
      "last_posted_at": "2026-04-28T17:10:24.005000+00:00",
      "category_id": 32,
      "tags": [
        "CKB",
        "dapp"
      ],
      "posters": [
        "Original Poster, Most Recent Poster",
        "Frequent Poster",
        "Frequent Poster",
        "Frequent Poster",
        "Frequent Poster"
      ],
      "recent_posts": [
        {
          "post_id": 24060,
          "post_number": 9,
          "topic_id": 10130,
          "topic_title": "Introducing CKB Kickstarter: Decentralized All-or-Nothing Crowdfunding on Nervos CKB (Testnet MVP Live)",
          "topic_slug": "introducing-ckb-kickstarter-decentralized-all-or-nothing-crowdfunding-on-nervos-ckb-testnet-mvp-live",
          "author": "Ayoub_Lesfer",
          "created_at": "2026-04-28T17:10:24.005000+00:00",
          "updated_at": "2026-04-28T17:10:24.005000+00:00",
          "reply_to_post_number": null,
          "url": "https://talk.nervos.org/t/introducing-ckb-kickstarter-decentralized-all-or-nothing-crowdfunding-on-nervos-ckb-testnet-mvp-live/10130/9",
          "content_text": "Update: Automatic Finalization Bot live on testnet\nFollowing up on the v1.1 update above: the bot is deployed and end-to-end verified on testnet as of yesterday (2026-04-27). The platform is now fully trustless on testnet, campaigns flow create → pledge → deadline → distribution with zero manual intervention from anyone (creator, backer, or platform operator).\nWhat the bot does (each polling cycle, every 10s):\nDetects expired campaigns still in Active status → submits permissionless finalizeCampaign tx (Success if total pledged ≥ goal, Failed otherwise)\nFor finalized Success campaigns with remaining live pledge cells → submits permissionlessRelease tx (funds → creator)\nFor finalized Failed campaigns with remaining live pledge cells → submits permissionlessRefund tx (funds → backer)\nArchitecture:\nSingle FinalizationBot class integrated into the existing indexer process (no separate service)\nRuns on Render free tier inside the same container as the indexer\nBot wallet funded with 100k CKB testnet, fees are negligible (~0.001 CKB per finalize/distribute)\nBot is optional: if BOT_PRIVATE_KEY env var is unset, the indexer runs normally and users can still trigger finalize/release/refund manually from the UI\nBot needs no special permissions, every contract entry point it calls is permissionless on-chain. 
The bot is a convenience, not a trust dependency.\nE2E verification on testnet (2026-04-27):\nPath\nGoal\nPledged\nOutcome\nSuccess\n200 CKB\n250 CKB\nBot auto-finalized as Success → auto-released to creator (release tx 0x564c6d7a...)\nFailed\n10,000 CKB\n100 CKB\nBot auto-finalized as Failed → auto-refunded to backer (refund tx 0x54fd7e40...)\nTotal time from deadline to full distribution: ~30 seconds.\nTry it yourself: https://decentralized-kickstarter-kappa.vercel.app/ create a campaign with a short deadline, pledge from a second JoyID account, and watch the bot do its thing.\nWhat’s next:\nExternal code review of v1.1 contracts\nSustainable platform business model (fees + treasury), open to community input on what feels right for an ecosystem-funded project\nMainnet deployment",
          "content_html": "<p><strong>Update: Automatic Finalization Bot live on testnet</strong> <img src=\"https://talk.nervos.org/images/emoji/apple/white_check_mark.png?v=15\" title=\":white_check_mark:\" class=\"emoji\" alt=\":white_check_mark:\" loading=\"lazy\" width=\"20\" height=\"20\"></p>\n<p>Following up on the v1.1 update above: the bot is deployed and end-to-end verified on testnet as of yesterday (2026-04-27). The platform is now <strong>fully trustless</strong> on testnet, campaigns flow <code>create → pledge → deadline → distribution</code> with zero manual intervention from anyone (creator, backer, or platform operator).</p>\n<p><strong>What the bot does</strong> (each polling cycle, every 10s):</p>\n<ul>\n<li>Detects expired campaigns still in <code>Active</code> status → submits permissionless <code>finalizeCampaign</code> tx (Success if total pledged ≥ goal, Failed otherwise)</li>\n<li>For finalized Success campaigns with remaining live pledge cells → submits <code>permissionlessRelease</code> tx (funds → creator)</li>\n<li>For finalized Failed campaigns with remaining live pledge cells → submits <code>permissionlessRefund</code> tx (funds → backer)</li>\n</ul>\n<p><strong>Architecture:</strong></p>\n<ul>\n<li>Single <code>FinalizationBot</code> class integrated into the existing indexer process (no separate service)</li>\n<li>Runs on Render free tier inside the same container as the indexer</li>\n<li>Bot wallet funded with 100k CKB testnet, fees are negligible (~0.001 CKB per finalize/distribute)</li>\n<li>Bot is <strong>optional</strong>: if <code>BOT_PRIVATE_KEY</code> env var is unset, the indexer runs normally and users can still trigger finalize/release/refund manually from the UI</li>\n<li>Bot needs <strong>no special permissions</strong>, every contract entry point it calls is permissionless on-chain. 
The bot is a convenience, not a trust dependency.</li>\n</ul>\n<p><strong>E2E verification on testnet (2026-04-27):</strong></p>\n<div class=\"md-table\">\n<table>\n<thead>\n<tr>\n<th>Path</th>\n<th>Goal</th>\n<th>Pledged</th>\n<th>Outcome</th>\n</tr>\n</thead>\n<tbody>\n<tr>\n<td>Success</td>\n<td>200 CKB</td>\n<td>250 CKB</td>\n<td>Bot auto-finalized as Success → auto-released to creator (release tx <code>0x564c6d7a...</code>)</td>\n</tr>\n<tr>\n<td>Failed</td>\n<td>10,000 CKB</td>\n<td>100 CKB</td>\n<td>Bot auto-finalized as Failed → auto-refunded to backer (refund tx <code>0x54fd7e40...</code>)</td>\n</tr>\n</tbody>\n</table>\n</div><p>Total time from deadline to full distribution: <strong>~30 seconds.</strong></p>\n<p><strong>Try it yourself:</strong> <a href=\"https://decentralized-kickstarter-kappa.vercel.app/\" rel=\"noopener nofollow ugc\">https://decentralized-kickstarter-kappa.vercel.app/</a> create a campaign with a short deadline, pledge from a second JoyID account, and watch the bot do its thing.</p>\n<p><strong>What’s next:</strong></p>\n<ul>\n<li>External code review of v1.1 contracts</li>\n<li>Sustainable platform business model (fees + treasury), open to community input on what feels right for an ecosystem-funded project</li>\n<li>Mainnet deployment</li>\n</ul>",
          "like_count": 0,
          "quote_count": 0
        }
      ]
    },
    {
      "topic_id": 9995,
      "title": "Spark Program | Nervos Brain - A Global Developer Onboarding Engine and Cross-Language Hub Powered by Agentic RAG",
      "slug": "spark-program-nervos-brain-a-global-developer-onboarding-engine-and-cross-language-hub-powered-by-agentic-rag",
      "url": "https://talk.nervos.org/t/spark-program-nervos-brain-a-global-developer-onboarding-engine-and-cross-language-hub-powered-by-agentic-rag/9995",
      "created_at": "2026-02-25T09:58:43.726000+00:00",
      "last_posted_at": "2026-04-28T14:03:27.779000+00:00",
      "category_id": 49,
      "tags": [
        "In-Progress",
        "Spark-Program"
      ],
      "posters": [
        "Original Poster",
        "Frequent Poster",
        "Frequent Poster",
        "Most Recent Poster"
      ],
      "recent_posts": [
        {
          "post_id": 24056,
          "post_number": 31,
          "topic_id": 9995,
          "topic_title": "Spark Program | Nervos Brain - A Global Developer Onboarding Engine and Cross-Language Hub Powered by Agentic RAG",
          "topic_slug": "spark-program-nervos-brain-a-global-developer-onboarding-engine-and-cross-language-hub-powered-by-agentic-rag",
          "author": "IrisNeko",
          "created_at": "2026-04-28T11:54:09.506000+00:00",
          "updated_at": "2026-04-28T11:54:09.506000+00:00",
          "reply_to_post_number": null,
          "url": "https://talk.nervos.org/t/spark-program-nervos-brain-a-global-developer-onboarding-engine-and-cross-language-hub-powered-by-agentic-rag/9995/31",
          "content_text": "第七周周报\n一、本周目标（工具闭环与评测基线周）\n本周承接第六周“多轮可持续交互”阶段的工作，重点从“机制已经具备”推进到“关键路径真正闭环、且后续可以被稳定评测”。核心目标有四个：\n继续治理运行时日志噪音，补齐最小可观测性闭环。\n让 discourse_query / github_search 从协议层定义走到图执行主路径可调用。\n建立第一版多轮评测集，为后续 benchmark 和量化回归提供统一输入。\n补齐 Telegram / Discord 两端在长消息与异常路径下的稳定性回归。\n二、本周完成\n日志治理与诊断视图补齐\n已对常见第三方库日志进行了统一降噪处理，补充了 quiet_loggers 与 third_party_level 控制项，降低了测试和运行期的无关日志干扰。同时把工具执行过程的摘要信息接入到图状态与 trace_summary 中，使得一次回答至少可以追踪到“执行了哪些 tool、各自成功/为空/失败”的最小诊断视图，而不再只能看到最终回答文本。\n工具执行闭环推进到运行时主路径\n本周把 discourse_query 与 github_search 从“schema 已存在但主路径未接通”的状态推进到了可被 RetrieverPlanner -> RetrievalExecutor -> ToolRuntime 实际调用的状态。具体包括：\nRetrieverPlanner 归一化阶段已允许这两个 tool 保留，不再强制回退到 qdrant_search。\nToolRuntime 为二者补齐了 handler。\nhandler 采用“transport 优先、本地 archive fallback 兜底”的策略：若存在外部 transport，则优先通过 transport 执行；若 transport 不可用，则回退到本地 archive/BM25 查询，保证评测和离线回归场景下依旧能走通闭环。\ntimeout / idempotency / error normalization 已统一复用 execute_tool，不再为新增 tool 走特殊旁路逻辑。\n回答链路的可解释性继续增强\n在原有“回答生成稳态与兜底能力增强”的基础上，本周补充了 tool 级别的错误码与执行摘要，使回答不仅能在失败时兜底，也能在 trace 中解释失败原因。例如工具执行异常会落到统一错误结构中，而不是静默吞掉。这样后续排查时，可以区分“模型答错”“检索为空”“工具执行异常”“预算截断”等不同问题来源。\n多轮评测集第一版落地\n已新增 evaluation/week7_multiturn_eval.jsonl 作为第一版多轮 benchmark 输入集，并新增配套的 loader / validator，确保每条 case 至少满足：\n有稳定 case_id；\n属于明确任务类别；\n至少包含两轮以上对话；\n带有 success_criteria；\n带有 expected_signals 用于后续自动比对。\n平台稳定性回归增强\nTelegram / Discord 两端都补充了：\ngraph runner 异常时的 fallback 测试；\n超长回答的分段发送测试；\n与 full graph 一致的输出适配回归。\n这部分虽然还没有做到真正的在线压测，但已把“长文本切分”和“异常不崩溃”这两条高风险路径先用自动化测试钉住。\nWeek 7 相关回归验证通过\n本周相关测试，结果为：\n78 passed, 1 warning\n覆盖范围包括 tool runtime、graph executor、logging system、Discord/TG runtime 以及 evaluation dataset 验证。\n三、本周重点：评测流程与 Benchmark 构建思路\n本周最重要的新工作，不是单纯“加了几条样例”，而是把后续多轮评测的基本方法论先定下来。这里单独展开说明。\n3.1 为什么这一周先做评测集，而不是直接做分数面板\n当前项目虽然已经具备多轮补参、反思分流、回答兜底等机制，但如果没有稳定 benchmark，任何关于“效果更好了”的结论都只能靠主观感受。尤其是多轮系统，很容易出现下面几类错觉：\n看起来会追问了，但追问的问题不一定对。\n看起来会恢复上下文了，但恢复后不一定真的沿着原问题推进。\n看起来工具更多了，但新工具可能并没有真正参与主路径决策。\n看起来回答更长了，但未必更可引用、更稳定。\n因此本周没有直接上“评分 dashboard”，而是先做 benchmark 
输入层。原因很简单：没有稳定输入，就没有稳定指标；没有稳定 case，任何分数都不可复现。\n3.2 Benchmark 的目标不是“考模型常识”，而是考系统闭环能力\n本周设计 benchmark 时，刻意没有把重点放在开放式知识问答上，而是围绕 Nervos Brain 当前最关键的系统能力来建样例。也就是说，这份 benchmark 的目标不是测试“模型懂不懂 CKB”，而是测试下面这些系统行为是否发生：\n是否会在缺少必要参数时进行合理追问。\n追问后是否能保留线程上下文并继续执行，而不是重新开始。\n是否会根据任务类型选择合适的 tool，而不是永远只走 qdrant_search。\n面对文档与代码冲突、日志不完整、用户目标不明确等真实场景时，是否会做出保守且可解释的决策。\n最终回答是否带引用、是否避免过度编造、是否体现任务导向。\n换句话说，这份 benchmark 更像是“系统工作流 benchmark”，而不是“百科知识 benchmark”。\n3.3 为什么选这三类任务\n本周评测集按三类任务拆分：\nsolution_recommendation（方案推荐）\ndevelopment_guidance（开发指导）\ntroubleshooting（排障定位）\n这样拆分的原因是，这三类任务对应了系统最典型、也最容易出错的三种工作模式。\nsolution_recommendation 关注的是：\n用户目标还比较模糊；\n系统要先帮助缩小问题空间；\n容易因为用户补充信息变化而改写推荐路径。\ndevelopment_guidance 关注的是：\n用户往往已经有明确目标，但缺技术细节；\n系统要在检索、补参、代码示例、分阶段步骤之间做平衡；\n很适合验证 AskUser → resume → answer 的闭环。\ntroubleshooting 关注的是：\n用户信息往往不完整；\n证据冲突与日志缺失更常见；\n系统必须优先保守，不应过早武断下结论。\n这三类覆盖面并不等于项目全部任务，但已经足够形成一个可复用的最小 benchmark 骨架。\n3.4 每条评测 case 的设计原则\n本周不是只记录“问题文本”，而是给每条 case 设计了完整结构。单条 case 至少包含以下字段：\ncase_id\n用于保证 case 可追踪、可比对、可回归。\ncategory\n明确任务类别，避免把推荐类、开发类、排障类混在一起统计，导致指标失真。\nconversation\n必须是多轮结构，而不是单句问答。因为我们本周的目标就是验证多轮补参与恢复，而不是单轮回答质量。\nexpected_signals\n这是本周 benchmark 设计里最关键的一层。它不直接判断最终答案“好不好”，而是先判断系统行为“有没有发生”。例如：\n是否应该 ask for sdk_language；\n是否应该使用 github_search；\n是否应该同时走 discourse_query 与 qdrant_search；\n是否不应该在缺日志时给出过度确定结论。\n这层信号对后续半自动评测非常重要，因为它让我们可以把“工作流行为正确”从“最终表述优雅”里拆出来单独评估。\nsuccess_criteria\n这里记录的是回答级别的成功标准，例如：\n至少包含两个来源；\n给出 JS/TS 优先的路径；\n不能把文档/代码冲突直接武断归因。\n它和 expected_signals 的区别在于：前者看流程动作，后者看回答结果。\n3.5 为什么 benchmark 里同时保留 expected_signals 和 success_criteria\n如果只有 success_criteria，我们只能看最后回答像不像“还行”，但无法知道它是通过正确流程得到的，还是偶然答对的。\n如果只有 expected_signals，我们又只能知道系统“做了动作”，但不能判断最终产出的答案是否真的可用。\n所以本周采用“双层判定”思路：\n第一层：流程层 benchmark\n检查有没有正确 ask_user、有没有正确选 tool、有没有沿线程继续执行。\n第二层：回答层 benchmark\n检查最终回答是否引用充分、是否尊重约束、是否满足任务目标。\n这两层拆开之后，后面出现 bad case 时就能更快定位：\n是 planner 错了；\n是 executor 没调到正确工具；\n还是 composer 在证据足够时仍然总结失真。\n3.6 当前 benchmark 的样例来源与构造策略\n本周这 6 
条样例不是随机写出来的，而是围绕当前系统最值得验证的机制构造出来的：\n推荐类样例\n用来测试用户目标在第二轮收缩后，推荐路径是否跟着变化。\n例如从“我想做 demo”进一步细化到“前端集成、JS/TS”。\n开发指导类样例\n用来测试系统是否会在缺语言/版本时追问，并在得到补参后输出更贴近实际工程的步骤或示例来源。\n排障类样例\n用来测试系统在日志不充分、版本冲突、文档与代码不一致时，是否会优先保守地继续收集证据，而不是直接“拍脑袋诊断”。\n本质上，这些 case 优先覆盖的是“高频失败模式”，而不是“知识点覆盖率”。这也是第一版 benchmark 更适合工程迭代的原因。\n3.7 当前评测流程是怎么设计的\n虽然本周还没有把完整的自动评分 runner 做出来，但流程已经定型，后续可以直接接上：\n读取 jsonl benchmark。\n校验 case 结构是否合法。\n按 case 重放多轮 conversation。\n记录每轮 graph 输出，尤其是：\nask_user_question\ntrace_summary\n使用过的 tool\n最终 citations\n是否发生 fallback / error\n先对照 expected_signals 做流程级检查。\n再对照 success_criteria 做结果级检查。\n最后按类别输出通过率、失败样例和失败类型分布。\n后续如果继续扩展，这个流程可以自然演进成：\nrule-based first pass；\nLLM-as-a-judge second pass；\nbad case archive for manual review。\n3.8 为什么本周只做 dataset/validator，而没有直接做最终 benchmark runner\n这是一个取舍问题。当前最缺的是“统一输入格式”，不是“再写一个复杂脚本”。\n如果没有先把 case 结构稳定下来，直接写 runner 很容易导致：\ncase 字段每周都变；\n评测逻辑和数据格式强耦合；\n新增 case 时需要频繁改脚本。\n所以本周先完成的是：\ncase schema 的最小约束；\ncase 分类方式；\ncase 编写原则；\nbenchmark 目录约定；\n自动校验入口。\n这样下一周只需要在这个基础上补 runner 和聚合输出，而不需要推倒重来。\n3.9 当前 benchmark 的局限\n本周也明确看到了第一版评测集的边界：\n样例数仍然偏少，只有 6 条，更多是“骨架验证”而不是“统计充分”。\n目前只做了结构校验，还没有产出真实分数面板。\nexpected_signals 还需要进一步细化成更严格、可自动比对的字段。\n还没有接入真实线上对话日志中的 bad case，当前样例仍以人工构造为主。\n因此，本周的 benchmark 工作更准确地说是“评测基线搭建完成”，而不是“评测体系完成”。\n四、阶段性成果\n本周完成后，系统进入了一个新的阶段：\ndiscourse_query / github_search 不再停留在协议层，而是进入了主路径可执行状态。\n回答的诊断信息开始从“只看最终文本”转向“可看到中间工具行为”。\n多轮 benchmark 已经有了稳定输入格式，后续可以在同一套 case 上持续回归。\nTelegram / Discord 两端在长消息与异常路径上的基本稳定性已经有自动化保护。\n五、当前问题\nWeek 7 的稳定性验证还没有做到真正的线上端到端长对话压测，目前仍以 runtime 级自动化回归为主。\nbenchmark 目前完成的是 dataset + validator，尚未形成自动评分 runner 与分类统计面板。\n当前 benchmark 仍以人工设计 case 为主，尚未系统吸收真实线上 bad case 反哺样例池。\ntrace_summary 已能表达最小工具执行信息，但还没有形成更结构化的统一诊断报告。\n六、下周计划（Week 8）\n在现有 benchmark dataset 基础上补一个评测 runner，输出分类通过率与失败原因汇总。\n继续扩充多轮样例，优先补齐真实 bad case 映射和版本冲突类样本。\n为 Telegram / Discord 增加更接近真实流量的多轮长对话端到端测试。\n继续增强 trace 结构化程度，让 planner / executor / composer 的失败原因更容易定位。\n开始为社区内测做交付准备，包括测试环境梳理、种子用户招募话术、使用说明与问题提交流程整理，确保内测不是“把 Bot 
放出去”，而是有明确反馈闭环的可控测试。\n补齐用户评估准备工作，优先推进 CSAT 评分入口、BadCase 自动收集结构、以及一份极简用户问卷草案，保证内测期间不仅能拿到即时星级反馈，也能拿到跨会话的主观体验评价。\n规划第一轮内测观测指标，明确至少要跟踪的问题解决率、平均满意度（CSAT）、响应延迟分布、以及 1​-3​ 对话的复盘优先级，为后续结项报告沉淀真实用户评估数据。\nWeek 7 Report\n1. Weekly Goal (Tool Closure and Evaluation Baseline)\nThis week focused on turning the newly-added multi-turn mechanisms into a more testable and operationally reliable system. The four core goals were:\nReduce runtime log noise and improve minimum observability.\nMove discourse_query / github_search from protocol-only definitions into the real runtime path.\nBuild the first multi-turn evaluation dataset as a benchmark baseline.\nStrengthen Telegram/Discord regression coverage for long-output and failure paths.\n2. Completed Work\nLogging and observability cleanup\nAdded logger quieting controls and surfaced per-tool execution summaries into graph state and final trace summaries.\nRuntime tool-loop coverage expanded\ndiscourse_query and github_search now survive planner normalization, have dedicated runtime handlers, and align with timeout / idempotency / normalized execution behavior.\nBetter answer-path explainability\nTool-level execution failures now map into clearer traceable error states instead of disappearing behind generic failures.\nFirst multi-turn benchmark dataset landed\nAdded a structured jsonl evaluation set plus loader/validator utilities, covering recommendation, development-guidance, and troubleshooting tasks.\nPlatform stability regression improved\nAdded Discord/Telegram fallback-path and long-message segmentation tests to protect key runtime edge cases.\nWeek 7 regression validation\nIn the nervous-brain mamba environment, the Week 7 related suite passed:\n78 passed, 1 warning.\n3. 
Evaluation Flow and Benchmark Design\nThe most important outcome this week was not just “adding several examples”, but defining the first benchmark methodology for multi-turn system evaluation.\nThis benchmark is designed to test system behavior rather than generic knowledge recall. Its purpose is to verify whether the system:\nasks for missing parameters when required;\nresumes correctly after clarification;\nselects tools according to task type;\nremains conservative under missing logs or conflicting evidence;\nproduces traceable, citation-backed answers.\nThe dataset is divided into three task types:\nsolution_recommendation\ndevelopment_guidance\ntroubleshooting\nEach case contains:\ncase_id for stable tracking;\ncategory for split-level reporting;\nconversation with at least two turns;\nexpected_signals for workflow-level expectations;\nsuccess_criteria for answer-level expectations.\nThis two-layer design is deliberate:\nWorkflow-layer evaluation checks whether the system asked the right follow-up, used the right tools, and continued along the correct thread context.\nAnswer-layer evaluation checks whether the final response is well-grounded, appropriately scoped, and useful for the task.\nThis separation matters because it lets us diagnose whether a failure came from planning, retrieval execution, or answer composition instead of treating every bad answer as the same class of issue.\nThe benchmark construction strategy this week prioritized failure modes over topic breadth. 
The six seed cases were manually designed around the most important multi-turn risks:\nrecommendation shifts after clarification;\nimplementation guidance after missing language/version follow-up;\ntroubleshooting under incomplete logs;\nversion conflict between docs and code examples.\nThe intended evaluation flow is now clear:\nload benchmark cases from jsonl;\nvalidate schema;\nreplay the conversation turn by turn;\ncollect graph outputs such as tool usage, trace summary, follow-up question, fallback path, and citations;\ncompare against expected_signals;\ncompare final outputs against success_criteria;\naggregate results by category.\nThis week intentionally stopped at dataset + validator rather than overbuilding a scoring runner too early. The reasoning was simple: without a stable input format, any automated benchmark script would be fragile and constantly changing. By fixing the dataset contract first, future runner and dashboard work can build on a stable base.\n4. Current Gaps\nStability verification is still regression-oriented, not full online end-to-end long-dialogue load testing.\nThe benchmark currently provides dataset + validation, but not a full scoring runner yet.\nThe sample pool is still small and mostly manually curated.\nTrace summaries are more useful now, but not yet a full structured diagnostic report.\n5. Plan for Week 8\nBuild a benchmark runner with category-level pass-rate outputs.\nExpand the dataset, especially with real bad cases and version-conflict samples.\nAdd more realistic end-to-end multi-turn runtime tests for Telegram/Discord.\nContinue improving structured diagnostics across planner / executor / composer stages.",
          "content_html": "<h1><a name=\"p-24056-h-1\" class=\"anchor\" href=\"#p-24056-h-1\" aria-label=\"Heading link\"></a>第七周周报</h1>\n<h2><a name=\"p-24056-h-2\" class=\"anchor\" href=\"#p-24056-h-2\" aria-label=\"Heading link\"></a>一、本周目标（工具闭环与评测基线周）</h2>\n<p>本周承接第六周“多轮可持续交互”阶段的工作，重点从“机制已经具备”推进到“关键路径真正闭环、且后续可以被稳定评测”。核心目标有四个：</p>\n<ol>\n<li>继续治理运行时日志噪音，补齐最小可观测性闭环。</li>\n<li>让 <code>discourse_query</code> / <code>github_search</code> 从协议层定义走到图执行主路径可调用。</li>\n<li>建立第一版多轮评测集，为后续 benchmark 和量化回归提供统一输入。</li>\n<li>补齐 Telegram / Discord 两端在长消息与异常路径下的稳定性回归。</li>\n</ol>\n<h2><a name=\"p-24056-h-3\" class=\"anchor\" href=\"#p-24056-h-3\" aria-label=\"Heading link\"></a>二、本周完成</h2>\n<ol>\n<li>\n<p>日志治理与诊断视图补齐<br>\n已对常见第三方库日志进行了统一降噪处理，补充了 <code>quiet_loggers</code> 与 <code>third_party_level</code> 控制项，降低了测试和运行期的无关日志干扰。同时把工具执行过程的摘要信息接入到图状态与 <code>trace_summary</code> 中，使得一次回答至少可以追踪到“执行了哪些 tool、各自成功/为空/失败”的最小诊断视图，而不再只能看到最终回答文本。</p>\n</li>\n<li>\n<p>工具执行闭环推进到运行时主路径<br>\n本周把 <code>discourse_query</code> 与 <code>github_search</code> 从“schema 已存在但主路径未接通”的状态推进到了可被 <code>RetrieverPlanner -&gt; RetrievalExecutor -&gt; ToolRuntime</code> 实际调用的状态。具体包括：</p>\n<ul>\n<li><code>RetrieverPlanner</code> 归一化阶段已允许这两个 tool 保留，不再强制回退到 <code>qdrant_search</code>。</li>\n<li><code>ToolRuntime</code> 为二者补齐了 handler。</li>\n<li>handler 采用“transport 优先、本地 archive fallback 兜底”的策略：若存在外部 transport，则优先通过 transport 执行；若 transport 不可用，则回退到本地 archive/BM25 查询，保证评测和离线回归场景下依旧能走通闭环。</li>\n<li>timeout / idempotency / error normalization 已统一复用 <code>execute_tool</code>，不再为新增 tool 走特殊旁路逻辑。</li>\n</ul>\n</li>\n<li>\n<p>回答链路的可解释性继续增强<br>\n在原有“回答生成稳态与兜底能力增强”的基础上，本周补充了 tool 级别的错误码与执行摘要，使回答不仅能在失败时兜底，也能在 trace 中解释失败原因。例如工具执行异常会落到统一错误结构中，而不是静默吞掉。这样后续排查时，可以区分“模型答错”“检索为空”“工具执行异常”“预算截断”等不同问题来源。</p>\n</li>\n<li>\n<p>多轮评测集第一版落地<br>\n已新增 <code>evaluation/week7_multiturn_eval.jsonl</code> 作为第一版多轮 benchmark 输入集，并新增配套的 loader / validator，确保每条 case 至少满足：</p>\n<ul>\n<li>有稳定 
<code>case_id</code>；</li>\n<li>属于明确任务类别；</li>\n<li>至少包含两轮以上对话；</li>\n<li>带有 <code>success_criteria</code>；</li>\n<li>带有 <code>expected_signals</code> 用于后续自动比对。</li>\n</ul>\n</li>\n<li>\n<p>平台稳定性回归增强<br>\nTelegram / Discord 两端都补充了：</p>\n<ul>\n<li>graph runner 异常时的 fallback 测试；</li>\n<li>超长回答的分段发送测试；</li>\n<li>与 full graph 一致的输出适配回归。<br>\n这部分虽然还没有做到真正的在线压测，但已把“长文本切分”和“异常不崩溃”这两条高风险路径先用自动化测试钉住。</li>\n</ul>\n</li>\n<li>\n<p>Week 7 相关回归验证通过<br>\n本周相关测试，结果为：<br>\n<code>78 passed, 1 warning</code><br>\n覆盖范围包括 tool runtime、graph executor、logging system、Discord/TG runtime 以及 evaluation dataset 验证。</p>\n</li>\n</ol>\n<h2><a name=\"p-24056-benchmark-4\" class=\"anchor\" href=\"#p-24056-benchmark-4\" aria-label=\"Heading link\"></a>三、本周重点：评测流程与 Benchmark 构建思路</h2>\n<p>本周最重要的新工作，不是单纯“加了几条样例”，而是把后续多轮评测的基本方法论先定下来。这里单独展开说明。</p>\n<h3><a name=\"p-24056-h-31-5\" class=\"anchor\" href=\"#p-24056-h-31-5\" aria-label=\"Heading link\"></a>3.1 为什么这一周先做评测集，而不是直接做分数面板</h3>\n<p>当前项目虽然已经具备多轮补参、反思分流、回答兜底等机制，但如果没有稳定 benchmark，任何关于“效果更好了”的结论都只能靠主观感受。尤其是多轮系统，很容易出现下面几类错觉：</p>\n<ol>\n<li>看起来会追问了，但追问的问题不一定对。</li>\n<li>看起来会恢复上下文了，但恢复后不一定真的沿着原问题推进。</li>\n<li>看起来工具更多了，但新工具可能并没有真正参与主路径决策。</li>\n<li>看起来回答更长了，但未必更可引用、更稳定。</li>\n</ol>\n<p>因此本周没有直接上“评分 dashboard”，而是先做 benchmark 输入层。原因很简单：没有稳定输入，就没有稳定指标；没有稳定 case，任何分数都不可复现。</p>\n<h3><a name=\"p-24056-h-32-benchmark-6\" class=\"anchor\" href=\"#p-24056-h-32-benchmark-6\" aria-label=\"Heading link\"></a>3.2 Benchmark 的目标不是“考模型常识”，而是考系统闭环能力</h3>\n<p>本周设计 benchmark 时，刻意没有把重点放在开放式知识问答上，而是围绕 Nervos Brain 当前最关键的系统能力来建样例。也就是说，这份 benchmark 的目标不是测试“模型懂不懂 CKB”，而是测试下面这些系统行为是否发生：</p>\n<ol>\n<li>是否会在缺少必要参数时进行合理追问。</li>\n<li>追问后是否能保留线程上下文并继续执行，而不是重新开始。</li>\n<li>是否会根据任务类型选择合适的 tool，而不是永远只走 <code>qdrant_search</code>。</li>\n<li>面对文档与代码冲突、日志不完整、用户目标不明确等真实场景时，是否会做出保守且可解释的决策。</li>\n<li>最终回答是否带引用、是否避免过度编造、是否体现任务导向。</li>\n</ol>\n<p>换句话说，这份 benchmark 更像是“系统工作流 benchmark”，而不是“百科知识 benchmark”。</p>\n<h3><a name=\"p-24056-h-33-7\" class=\"anchor\" href=\"#p-24056-h-33-7\" 
aria-label=\"Heading link\"></a>3.3 为什么选这三类任务</h3>\n<p>本周评测集按三类任务拆分：</p>\n<ol>\n<li><code>solution_recommendation</code>（方案推荐）</li>\n<li><code>development_guidance</code>（开发指导）</li>\n<li><code>troubleshooting</code>（排障定位）</li>\n</ol>\n<p>这样拆分的原因是，这三类任务对应了系统最典型、也最容易出错的三种工作模式。</p>\n<p><code>solution_recommendation</code> 关注的是：</p>\n<ul>\n<li>用户目标还比较模糊；</li>\n<li>系统要先帮助缩小问题空间；</li>\n<li>容易因为用户补充信息变化而改写推荐路径。</li>\n</ul>\n<p><code>development_guidance</code> 关注的是：</p>\n<ul>\n<li>用户往往已经有明确目标，但缺技术细节；</li>\n<li>系统要在检索、补参、代码示例、分阶段步骤之间做平衡；</li>\n<li>很适合验证 AskUser → resume → answer 的闭环。</li>\n</ul>\n<p><code>troubleshooting</code> 关注的是：</p>\n<ul>\n<li>用户信息往往不完整；</li>\n<li>证据冲突与日志缺失更常见；</li>\n<li>系统必须优先保守，不应过早武断下结论。</li>\n</ul>\n<p>这三类覆盖面并不等于项目全部任务，但已经足够形成一个可复用的最小 benchmark 骨架。</p>\n<h3><a name=\"p-24056-h-34-case-8\" class=\"anchor\" href=\"#p-24056-h-34-case-8\" aria-label=\"Heading link\"></a>3.4 每条评测 case 的设计原则</h3>\n<p>本周不是只记录“问题文本”，而是给每条 case 设计了完整结构。单条 case 至少包含以下字段：</p>\n<ol>\n<li>\n<p><code>case_id</code><br>\n用于保证 case 可追踪、可比对、可回归。</p>\n</li>\n<li>\n<p><code>category</code><br>\n明确任务类别，避免把推荐类、开发类、排障类混在一起统计，导致指标失真。</p>\n</li>\n<li>\n<p><code>conversation</code><br>\n必须是多轮结构，而不是单句问答。因为我们本周的目标就是验证多轮补参与恢复，而不是单轮回答质量。</p>\n</li>\n<li>\n<p><code>expected_signals</code><br>\n这是本周 benchmark 设计里最关键的一层。它不直接判断最终答案“好不好”，而是先判断系统行为“有没有发生”。例如：</p>\n<ul>\n<li>是否应该 ask for <code>sdk_language</code>；</li>\n<li>是否应该使用 <code>github_search</code>；</li>\n<li>是否应该同时走 <code>discourse_query</code> 与 <code>qdrant_search</code>；</li>\n<li>是否不应该在缺日志时给出过度确定结论。<br>\n这层信号对后续半自动评测非常重要，因为它让我们可以把“工作流行为正确”从“最终表述优雅”里拆出来单独评估。</li>\n</ul>\n</li>\n<li>\n<p><code>success_criteria</code><br>\n这里记录的是回答级别的成功标准，例如：</p>\n<ul>\n<li>至少包含两个来源；</li>\n<li>给出 JS/TS 优先的路径；</li>\n<li>不能把文档/代码冲突直接武断归因。<br>\n它和 <code>expected_signals</code> 的区别在于：前者看流程动作，后者看回答结果。</li>\n</ul>\n</li>\n</ol>\n<h3><a name=\"p-24056-h-35-benchmark-expected_signals-success_criteria-9\" class=\"anchor\" 
href=\"#p-24056-h-35-benchmark-expected_signals-success_criteria-9\" aria-label=\"Heading link\"></a>3.5 为什么 benchmark 里同时保留 <code>expected_signals</code> 和 <code>success_criteria</code></h3>\n<p>如果只有 <code>success_criteria</code>，我们只能看最后回答像不像“还行”，但无法知道它是通过正确流程得到的，还是偶然答对的。<br>\n如果只有 <code>expected_signals</code>，我们又只能知道系统“做了动作”，但不能判断最终产出的答案是否真的可用。</p>\n<p>所以本周采用“双层判定”思路：</p>\n<ol>\n<li>\n<p>第一层：流程层 benchmark<br>\n检查有没有正确 ask_user、有没有正确选 tool、有没有沿线程继续执行。</p>\n</li>\n<li>\n<p>第二层：回答层 benchmark<br>\n检查最终回答是否引用充分、是否尊重约束、是否满足任务目标。</p>\n</li>\n</ol>\n<p>这两层拆开之后，后面出现 bad case 时就能更快定位：</p>\n<ul>\n<li>是 planner 错了；</li>\n<li>是 executor 没调到正确工具；</li>\n<li>还是 composer 在证据足够时仍然总结失真。</li>\n</ul>\n<h3><a name=\"p-24056-h-36-benchmark-10\" class=\"anchor\" href=\"#p-24056-h-36-benchmark-10\" aria-label=\"Heading link\"></a>3.6 当前 benchmark 的样例来源与构造策略</h3>\n<p>本周这 6 条样例不是随机写出来的，而是围绕当前系统最值得验证的机制构造出来的：</p>\n<ol>\n<li>\n<p>推荐类样例<br>\n用来测试用户目标在第二轮收缩后，推荐路径是否跟着变化。<br>\n例如从“我想做 demo”进一步细化到“前端集成、JS/TS”。</p>\n</li>\n<li>\n<p>开发指导类样例<br>\n用来测试系统是否会在缺语言/版本时追问，并在得到补参后输出更贴近实际工程的步骤或示例来源。</p>\n</li>\n<li>\n<p>排障类样例<br>\n用来测试系统在日志不充分、版本冲突、文档与代码不一致时，是否会优先保守地继续收集证据，而不是直接“拍脑袋诊断”。</p>\n</li>\n</ol>\n<p>本质上，这些 case 优先覆盖的是“高频失败模式”，而不是“知识点覆盖率”。这也是第一版 benchmark 更适合工程迭代的原因。</p>\n<h3><a name=\"p-24056-h-37-11\" class=\"anchor\" href=\"#p-24056-h-37-11\" aria-label=\"Heading link\"></a>3.7 当前评测流程是怎么设计的</h3>\n<p>虽然本周还没有把完整的自动评分 runner 做出来，但流程已经定型，后续可以直接接上：</p>\n<ol>\n<li>读取 <code>jsonl</code> benchmark。</li>\n<li>校验 case 结构是否合法。</li>\n<li>按 case 重放多轮 <code>conversation</code>。</li>\n<li>记录每轮 graph 输出，尤其是：\n<ul>\n<li><code>ask_user_question</code></li>\n<li><code>trace_summary</code></li>\n<li>使用过的 tool</li>\n<li>最终 citations</li>\n<li>是否发生 fallback / error</li>\n</ul>\n</li>\n<li>先对照 <code>expected_signals</code> 做流程级检查。</li>\n<li>再对照 <code>success_criteria</code> 做结果级检查。</li>\n<li>最后按类别输出通过率、失败样例和失败类型分布。</li>\n</ol>\n<p>后续如果继续扩展，这个流程可以自然演进成：</p>\n<ul>\n<li>rule-based first pass；</li>\n<li>LLM-as-a-judge 
second pass；</li>\n<li>bad case archive for manual review。</li>\n</ul>\n<h3><a name=\"p-24056-h-38-datasetvalidator-benchmark-runner-12\" class=\"anchor\" href=\"#p-24056-h-38-datasetvalidator-benchmark-runner-12\" aria-label=\"Heading link\"></a>3.8 为什么本周只做 dataset/validator，而没有直接做最终 benchmark runner</h3>\n<p>这是一个取舍问题。当前最缺的是“统一输入格式”，不是“再写一个复杂脚本”。</p>\n<p>如果没有先把 case 结构稳定下来，直接写 runner 很容易导致：</p>\n<ul>\n<li>case 字段每周都变；</li>\n<li>评测逻辑和数据格式强耦合；</li>\n<li>新增 case 时需要频繁改脚本。</li>\n</ul>\n<p>所以本周先完成的是：</p>\n<ul>\n<li>case schema 的最小约束；</li>\n<li>case 分类方式；</li>\n<li>case 编写原则；</li>\n<li>benchmark 目录约定；</li>\n<li>自动校验入口。</li>\n</ul>\n<p>这样下一周只需要在这个基础上补 runner 和聚合输出，而不需要推倒重来。</p>\n<h3><a name=\"p-24056-h-39-benchmark-13\" class=\"anchor\" href=\"#p-24056-h-39-benchmark-13\" aria-label=\"Heading link\"></a>3.9 当前 benchmark 的局限</h3>\n<p>本周也明确看到了第一版评测集的边界：</p>\n<ol>\n<li>样例数仍然偏少，只有 6 条，更多是“骨架验证”而不是“统计充分”。</li>\n<li>目前只做了结构校验，还没有产出真实分数面板。</li>\n<li><code>expected_signals</code> 还需要进一步细化成更严格、可自动比对的字段。</li>\n<li>还没有接入真实线上对话日志中的 bad case，当前样例仍以人工构造为主。</li>\n</ol>\n<p>因此，本周的 benchmark 工作更准确地说是“评测基线搭建完成”，而不是“评测体系完成”。</p>\n<h2><a name=\"p-24056-h-14\" class=\"anchor\" href=\"#p-24056-h-14\" aria-label=\"Heading link\"></a>四、阶段性成果</h2>\n<p>本周完成后，系统进入了一个新的阶段：</p>\n<ol>\n<li><code>discourse_query</code> / <code>github_search</code> 不再停留在协议层，而是进入了主路径可执行状态。</li>\n<li>回答的诊断信息开始从“只看最终文本”转向“可看到中间工具行为”。</li>\n<li>多轮 benchmark 已经有了稳定输入格式，后续可以在同一套 case 上持续回归。</li>\n<li>Telegram / Discord 两端在长消息与异常路径上的基本稳定性已经有自动化保护。</li>\n</ol>\n<h2><a name=\"p-24056-h-15\" class=\"anchor\" href=\"#p-24056-h-15\" aria-label=\"Heading link\"></a>五、当前问题</h2>\n<ol>\n<li>Week 7 的稳定性验证还没有做到真正的线上端到端长对话压测，目前仍以 runtime 级自动化回归为主。</li>\n<li>benchmark 目前完成的是 dataset + validator，尚未形成自动评分 runner 与分类统计面板。</li>\n<li>当前 benchmark 仍以人工设计 case 为主，尚未系统吸收真实线上 bad case 反哺样例池。</li>\n<li><code>trace_summary</code> 已能表达最小工具执行信息，但还没有形成更结构化的统一诊断报告。</li>\n</ol>\n<h2><a name=\"p-24056-week-8-16\" class=\"anchor\" 
href=\"#p-24056-week-8-16\" aria-label=\"Heading link\"></a>六、下周计划（Week 8）</h2>\n<ol>\n<li>在现有 benchmark dataset 基础上补一个评测 runner，输出分类通过率与失败原因汇总。</li>\n<li>继续扩充多轮样例，优先补齐真实 bad case 映射和版本冲突类样本。</li>\n<li>为 Telegram / Discord 增加更接近真实流量的多轮长对话端到端测试。</li>\n<li>继续增强 trace 结构化程度，让 planner / executor / composer 的失败原因更容易定位。</li>\n<li>开始为社区内测做交付准备，包括测试环境梳理、种子用户招募话术、使用说明与问题提交流程整理，确保内测不是“把 Bot 放出去”，而是有明确反馈闭环的可控测试。</li>\n<li>补齐用户评估准备工作，优先推进 CSAT 评分入口、BadCase 自动收集结构、以及一份极简用户问卷草案，保证内测期间不仅能拿到即时星级反馈，也能拿到跨会话的主观体验评价。</li>\n<li>规划第一轮内测观测指标，明确至少要跟踪的问题解决率、平均满意度（CSAT）、响应延迟分布、以及 1​<img src=\"https://talk.nervos.org/images/emoji/apple/star.png?v=15\" title=\":star:\" class=\"emoji\" alt=\":star:\" loading=\"lazy\" width=\"20\" height=\"20\">-3​<img src=\"https://talk.nervos.org/images/emoji/apple/star.png?v=15\" title=\":star:\" class=\"emoji\" alt=\":star:\" loading=\"lazy\" width=\"20\" height=\"20\"> 对话的复盘优先级，为后续结项报告沉淀真实用户评估数据。</li>\n</ol>\n<hr>\n<h1><a name=\"p-24056-week-7-report-17\" class=\"anchor\" href=\"#p-24056-week-7-report-17\" aria-label=\"Heading link\"></a>Week 7 Report</h1>\n<h2><a name=\"p-24056-h-1-weekly-goal-tool-closure-and-evaluation-baseline-18\" class=\"anchor\" href=\"#p-24056-h-1-weekly-goal-tool-closure-and-evaluation-baseline-18\" aria-label=\"Heading link\"></a>1. Weekly Goal (Tool Closure and Evaluation Baseline)</h2>\n<p>This week focused on turning the newly-added multi-turn mechanisms into a more testable and operationally reliable system. 
The four core goals were:</p>\n<ol>\n<li>Reduce runtime log noise and improve minimum observability.</li>\n<li>Move <code>discourse_query</code> / <code>github_search</code> from protocol-only definitions into the real runtime path.</li>\n<li>Build the first multi-turn evaluation dataset as a benchmark baseline.</li>\n<li>Strengthen Telegram/Discord regression coverage for long-output and failure paths.</li>\n</ol>\n<h2><a name=\"p-24056-h-2-completed-work-19\" class=\"anchor\" href=\"#p-24056-h-2-completed-work-19\" aria-label=\"Heading link\"></a>2. Completed Work</h2>\n<ol>\n<li>\n<p>Logging and observability cleanup<br>\nAdded logger quieting controls and surfaced per-tool execution summaries into graph state and final trace summaries.</p>\n</li>\n<li>\n<p>Runtime tool-loop coverage expanded<br>\n<code>discourse_query</code> and <code>github_search</code> now survive planner normalization, have dedicated runtime handlers, and align with timeout / idempotency / normalized execution behavior.</p>\n</li>\n<li>\n<p>Better answer-path explainability<br>\nTool-level execution failures now map into clearer traceable error states instead of disappearing behind generic failures.</p>\n</li>\n<li>\n<p>First multi-turn benchmark dataset landed<br>\nAdded a structured <code>jsonl</code> evaluation set plus loader/validator utilities, covering recommendation, development-guidance, and troubleshooting tasks.</p>\n</li>\n<li>\n<p>Platform stability regression improved<br>\nAdded Discord/Telegram fallback-path and long-message segmentation tests to protect key runtime edge cases.</p>\n</li>\n<li>\n<p>Week 7 regression validation<br>\nIn the <code>nervous-brain</code> mamba environment, the Week 7 related suite passed:<br>\n<code>78 passed, 1 warning</code>.</p>\n</li>\n</ol>\n<h2><a name=\"p-24056-h-3-evaluation-flow-and-benchmark-design-20\" class=\"anchor\" href=\"#p-24056-h-3-evaluation-flow-and-benchmark-design-20\" aria-label=\"Heading link\"></a>3. 
Evaluation Flow and Benchmark Design</h2>\n<p>The most important outcome this week was not just “adding several examples”, but defining the first benchmark methodology for multi-turn system evaluation.</p>\n<p>This benchmark is designed to test system behavior rather than generic knowledge recall. Its purpose is to verify whether the system:</p>\n<ol>\n<li>asks for missing parameters when required;</li>\n<li>resumes correctly after clarification;</li>\n<li>selects tools according to task type;</li>\n<li>remains conservative under missing logs or conflicting evidence;</li>\n<li>produces traceable, citation-backed answers.</li>\n</ol>\n<p>The dataset is divided into three task types:</p>\n<ol>\n<li><code>solution_recommendation</code></li>\n<li><code>development_guidance</code></li>\n<li><code>troubleshooting</code></li>\n</ol>\n<p>Each case contains:</p>\n<ol>\n<li><code>case_id</code> for stable tracking;</li>\n<li><code>category</code> for split-level reporting;</li>\n<li><code>conversation</code> with at least two turns;</li>\n<li><code>expected_signals</code> for workflow-level expectations;</li>\n<li><code>success_criteria</code> for answer-level expectations.</li>\n</ol>\n<p>This two-layer design is deliberate:</p>\n<ol>\n<li>Workflow-layer evaluation checks whether the system asked the right follow-up, used the right tools, and continued along the correct thread context.</li>\n<li>Answer-layer evaluation checks whether the final response is well-grounded, appropriately scoped, and useful for the task.</li>\n</ol>\n<p>This separation matters because it lets us diagnose whether a failure came from planning, retrieval execution, or answer composition instead of treating every bad answer as the same class of issue.</p>\n<p>The benchmark construction strategy this week prioritized failure modes over topic breadth. 
The six seed cases were manually designed around the most important multi-turn risks:</p>\n<ol>\n<li>recommendation shifts after clarification;</li>\n<li>implementation guidance after missing language/version follow-up;</li>\n<li>troubleshooting under incomplete logs;</li>\n<li>version conflict between docs and code examples.</li>\n</ol>\n<p>The intended evaluation flow is now clear:</p>\n<ol>\n<li>load benchmark cases from <code>jsonl</code>;</li>\n<li>validate schema;</li>\n<li>replay the conversation turn by turn;</li>\n<li>collect graph outputs such as tool usage, trace summary, follow-up question, fallback path, and citations;</li>\n<li>compare against <code>expected_signals</code>;</li>\n<li>compare final outputs against <code>success_criteria</code>;</li>\n<li>aggregate results by category.</li>\n</ol>\n<p>This week intentionally stopped at dataset + validator rather than overbuilding a scoring runner too early. The reasoning was simple: without a stable input format, any automated benchmark script would be fragile and constantly changing. By fixing the dataset contract first, future runner and dashboard work can build on a stable base.</p>\n<h2><a name=\"p-24056-h-4-current-gaps-21\" class=\"anchor\" href=\"#p-24056-h-4-current-gaps-21\" aria-label=\"Heading link\"></a>4. Current Gaps</h2>\n<ol>\n<li>Stability verification is still regression-oriented, not full online end-to-end long-dialogue load testing.</li>\n<li>The benchmark currently provides dataset + validation, but not a full scoring runner yet.</li>\n<li>The sample pool is still small and mostly manually curated.</li>\n<li>Trace summaries are more useful now, but not yet a full structured diagnostic report.</li>\n</ol>\n<h2><a name=\"p-24056-h-5-plan-for-week-8-22\" class=\"anchor\" href=\"#p-24056-h-5-plan-for-week-8-22\" aria-label=\"Heading link\"></a>5. 
Plan for Week 8</h2>\n<ol>\n<li>Build a benchmark runner with category-level pass-rate outputs.</li>\n<li>Expand the dataset, especially with real bad cases and version-conflict samples.</li>\n<li>Add more realistic end-to-end multi-turn runtime tests for Telegram/Discord.</li>\n<li>Continue improving structured diagnostics across planner / executor / composer stages.</li>\n</ol>",
          "like_count": 0,
          "quote_count": 0
        },
        {
          "post_id": 24058,
          "post_number": 32,
          "topic_id": 9995,
          "topic_title": "Spark Program | Nervos Brain - A Global Developer Onboarding Engine and Cross-Language Hub Powered by Agentic RAG",
          "topic_slug": "spark-program-nervos-brain-a-global-developer-onboarding-engine-and-cross-language-hub-powered-by-agentic-rag",
          "author": "IrisNeko",
          "created_at": "2026-04-28T12:00:25.555000+00:00",
          "updated_at": "2026-04-28T12:00:25.555000+00:00",
          "reply_to_post_number": 30,
          "url": "https://talk.nervos.org/t/spark-program-nervos-brain-a-global-developer-onboarding-engine-and-cross-language-hub-powered-by-agentic-rag/9995/32",
          "content_text": "Thank you for the suggestion.\nI ran a basic evaluation of the system last week, and this week I plan to set up a Telegram trial group and invite committee members for an early hands-on preview. Suggestions from that experience are welcome and will help me improve the system.\nBest regards.",
          "content_html": "<p>Thank you for the suggestion.</p>\n<p>I ran a basic evaluation of the system last week, and this week I plan to set up a Telegram trial group and invite committee members for an early hands-on preview. Suggestions from that experience are welcome and will help me improve the system.</p>\n<p>Best regards.</p>",
          "like_count": 0,
          "quote_count": 0
        },
        {
          "post_id": 24059,
          "post_number": 33,
          "topic_id": 9995,
          "topic_title": "Spark Program | Nervos Brain - A Global Developer Onboarding Engine and Cross-Language Hub Powered by Agentic RAG",
          "topic_slug": "spark-program-nervos-brain-a-global-developer-onboarding-engine-and-cross-language-hub-powered-by-agentic-rag",
          "author": "zz_tovarishch",
          "created_at": "2026-04-28T14:03:27.779000+00:00",
          "updated_at": "2026-04-28T14:03:27.779000+00:00",
          "reply_to_post_number": 32,
          "url": "https://talk.nervos.org/t/spark-program-nervos-brain-a-global-developer-onboarding-engine-and-cross-language-hub-powered-by-agentic-rag/9995/33",
          "content_text": "Hi IrisNeko, the forum now has an AI translation tool integrated, so Spark no longer requires projects to publish their Talk content in bilingual form.\nLooking forward to the project’s continued development!",
          "content_html": "<p>Hi IrisNeko, the forum now has an AI translation tool integrated, so Spark no longer requires projects to publish their Talk content in bilingual form.</p>\n<p>Looking forward to the project’s continued development!</p>",
          "like_count": 0,
          "quote_count": 0
        }
      ]
    },
    {
      "topic_id": 10204,
      "title": "Discontinuation of the DAO v1.1 project",
      "slug": "discontinuation-of-the-dao-v1-1-project",
      "url": "https://talk.nervos.org/t/discontinuation-of-the-dao-v1-1-project/10204",
      "created_at": "2026-04-23T05:07:43.422000+00:00",
      "last_posted_at": "2026-04-28T11:56:27.570000+00:00",
      "category_id": 40,
      "tags": [],
      "posters": [
        "Original Poster, Most Recent Poster",
        "Frequent Poster",
        "Frequent Poster",
        "Frequent Poster",
        "Frequent Poster"
      ],
      "recent_posts": [
        {
          "post_id": 24057,
          "post_number": 11,
          "topic_id": 10204,
          "topic_title": "Discontinuation of the DAO v1.1 project",
          "topic_slug": "discontinuation-of-the-dao-v1-1-project",
          "author": "_magicsheep",
          "created_at": "2026-04-28T11:56:27.570000+00:00",
          "updated_at": "2026-04-28T11:56:27.570000+00:00",
          "reply_to_post_number": 7,
          "url": "https://talk.nervos.org/t/discontinuation-of-the-dao-v1-1-project/10204/11",
          "content_text": "In consideration of Terry’s advice, the following updates are provided regarding the closure of DAO v1.1:\nPayment: The proposal team will retain the payment corresponding to the already‑delivered Milestone 1.\nCode: The code will remain open source and is accessible at this repository. A total of eight repositories encompass all code for the DAO v1.1 platform (excluding Web5 services, which are available here). Additionally, the community may access the open‑source vote auditor tool here.\nContract: The voting contract deployed on the mainnet has been terminated, whereas the testnet voting contract continues to operate. The did:ckb contracts remain active on both the mainnet and the testnet.\nServer: Servers supporting the current DAO v1.1 platform will be decommissioned shortly. Following this action, services for both the mainnet and the testnet will be terminated.\nDomain name: Domain resolution for ccfdao.dev and ccfdao.org will be terminated in the near future. Consequently, the documentation website at https://docs.ccfdao.org/ will also be shut down. However, relevant documentation can be found within the source code repository accessible here.\nThis marks the official conclusion of the DAO v1.1 project. The proposal team wishes to once again express its gratitude to all those who provided constructive feedback and support. May the community identify a suitable governance model at a future date. Thank you",
          "content_html": "<p>In consideration of Terry’s advice, the following updates are provided regarding the closure of DAO v1.1:</p>\n<ol>\n<li>\n<p><strong>Payment</strong>: The proposal team will retain the payment corresponding to the already‑delivered Milestone 1.</p>\n</li>\n<li>\n<p><strong>Code</strong>: The code will remain open source and is accessible at <a href=\"https://github.com/CCF-DAO1-1\" rel=\"noopener nofollow ugc\">this repository</a>. A total of eight repositories encompass all code for the DAO v1.1 platform (excluding Web5 services, which are available <a href=\"https://github.com/web5fans\" rel=\"noopener nofollow ugc\">here</a>). Additionally, the community may access the open‑source vote auditor tool <a href=\"https://github.com/CCF-DAO1-1/ccfdao-vote-auditor-rfc\" rel=\"noopener nofollow ugc\">here</a>.</p>\n</li>\n<li>\n<p><strong>Contract</strong>: The voting contract deployed on the mainnet has been terminated, whereas the testnet voting contract continues to operate. The <code>did:ckb</code> contracts remain active on both the mainnet and the testnet.</p>\n</li>\n<li>\n<p><strong>Server</strong>: Servers supporting the current DAO v1.1 platform will be decommissioned shortly. Following this action, services for both the mainnet and the testnet will be terminated.</p>\n</li>\n<li>\n<p><strong>Domain name</strong>: Domain resolution for <code>ccfdao.dev</code> and <code>ccfdao.org</code> will be terminated in the near future. Consequently, the documentation website at <a href=\"https://docs.ccfdao.org/\" rel=\"noopener nofollow ugc\">https://docs.ccfdao.org/</a> will also be shut down. However, relevant documentation can be found within the source code repository accessible <a href=\"https://github.com/CCF-DAO1-1/ccfdao-v1.1-docs\" rel=\"noopener nofollow ugc\">here</a>.</p>\n</li>\n</ol>\n<p>This marks the official conclusion of the DAO v1.1 project. 
The proposal team wishes to once again express its gratitude to all those who provided constructive feedback and support. May the community identify a suitable governance model at a future date. Thank you <img src=\"https://talk.nervos.org/images/emoji/apple/slight_smile.png?v=15\" title=\":slight_smile:\" class=\"emoji\" alt=\":slight_smile:\" loading=\"lazy\" width=\"20\" height=\"20\"></p>",
          "like_count": 0,
          "quote_count": 0
        }
      ]
    },
    {
      "topic_id": 10199,
      "title": "Cellora — designing a production indexing and query service for CKB (feedback welcome)",
      "slug": "cellora-designing-a-production-indexing-and-query-service-for-ckb-feedback-welcome",
      "url": "https://talk.nervos.org/t/cellora-designing-a-production-indexing-and-query-service-for-ckb-feedback-welcome/10199",
      "created_at": "2026-04-22T15:33:26.290000+00:00",
      "last_posted_at": "2026-04-28T08:40:25.015000+00:00",
      "category_id": 32,
      "tags": [
        "CKB",
        "Nervos-项目动态",
        "dapp",
        "testnet"
      ],
      "posters": [
        "Original Poster",
        "Frequent Poster",
        "Most Recent Poster"
      ],
      "recent_posts": [
        {
          "post_id": 24055,
          "post_number": 4,
          "topic_id": 10199,
          "topic_title": "Cellora — designing a production indexing and query service for CKB (feedback welcome)",
          "topic_slug": "cellora-designing-a-production-indexing-and-query-service-for-ckb-feedback-welcome",
          "author": "ArthurZhang",
          "created_at": "2026-04-28T08:40:25.015000+00:00",
          "updated_at": "2026-04-28T08:51:09.950000+00:00",
          "reply_to_post_number": 3,
          "url": "https://talk.nervos.org/t/cellora-designing-a-production-indexing-and-query-service-for-ckb-feedback-welcome/10199/4",
          "content_text": "Just came across this thread and found it interesting, so I’ll try to offer a few suggestions. I think the honest answer is:\nFor tx inclusion proofs, the practical first step is likely not Flyclient, but exposing CKB’s existing get_transaction_proof / verify_transaction_proof path through Cellora. That lets clients verify that a transaction is committed under a particular block header, rather than merely trusting Cellora’s indexed result. This easily moves Cellora from a purely trusted indexer toward an inclusion-verifiable indexer.\nFor full historical / chain-tip trust minimisation, my narrower point is that I’m not sure there is a canonical Rust/TS wallet-side verifier package that app developers can just plug into today. So for Cellora v1, I’d probably keep MMR/Flyclient-style support as a later integration layer, not a hard requirement.",
          "content_html": "<p>Just came across this thread and found it interesting, so I’ll try to offer a few suggestions. I think the honest answer is:</p>\n<p><strong>For tx inclusion proofs</strong>, the practical first step is likely <strong>not Flyclient</strong>, but exposing CKB’s existing <code>get_transaction_proof</code> / <code>verify_transaction_proof</code> path through Cellora. That lets clients verify that a transaction is committed under a particular block header, rather than merely trusting Cellora’s indexed result. This easily moves Cellora from a purely trusted indexer toward an inclusion-verifiable indexer.</p>\n<p><strong>For full historical / chain-tip trust minimisation,</strong> my narrower point is that I’m not sure there is a canonical Rust/TS wallet-side verifier package that app developers can just plug into today. So for Cellora v1, I’d probably keep MMR/Flyclient-style support as a later integration layer, not a hard requirement.</p>",
          "like_count": 0,
          "quote_count": 0
        }
      ]
    },
    {
      "topic_id": 10214,
      "title": "Spark Program | CKB-VM Sail Formal Verification — Proving CKB-VM RISC-V Instruction Equivalence via Sail Specification and Coq Theorem Prover / CKB-VM Sail 形式化验证 — 基于 Sail 规范与 Coq 定理证明器的 CKB-VM RISC-V 指令等价性证明",
      "slug": "spark-program-ckb-vm-sail-formal-verification-proving-ckb-vm-risc-v-instruction-equivalence-via-sail-specification-and-coq-theorem-prover-ckb-vm-sail-sail-coq-ckb-vm-risc-v",
      "url": "https://talk.nervos.org/t/spark-program-ckb-vm-sail-formal-verification-proving-ckb-vm-risc-v-instruction-equivalence-via-sail-specification-and-coq-theorem-prover-ckb-vm-sail-sail-coq-ckb-vm-risc-v/10214",
      "created_at": "2026-04-27T21:14:40.905000+00:00",
      "last_posted_at": "2026-04-28T03:13:16.520000+00:00",
      "category_id": 49,
      "tags": [
        "Spark-Program"
      ],
      "posters": [
        "Original Poster",
        "Frequent Poster",
        "Most Recent Poster"
      ],
      "recent_posts": [
        {
          "post_id": 24050,
          "post_number": 1,
          "topic_id": 10214,
          "topic_title": "Spark Program | CKB-VM Sail Formal Verification — Proving CKB-VM RISC-V Instruction Equivalence via Sail Specification and Coq Theorem Prover / CKB-VM Sail 形式化验证 — 基于 Sail 规范与 Coq 定理证明器的 CKB-VM RISC-V 指令等价性证明",
          "topic_slug": "spark-program-ckb-vm-sail-formal-verification-proving-ckb-vm-risc-v-instruction-equivalence-via-sail-specification-and-coq-theorem-prover-ckb-vm-sail-sail-coq-ckb-vm-risc-v",
          "author": "TinyuengKwan",
          "created_at": "2026-04-27T21:14:41.082000+00:00",
          "updated_at": "2026-04-27T21:14:41.082000+00:00",
          "reply_to_post_number": null,
          "url": "https://talk.nervos.org/t/spark-program-ckb-vm-sail-formal-verification-proving-ckb-vm-risc-v-instruction-equivalence-via-sail-specification-and-coq-theorem-prover-ckb-vm-sail-sail-coq-ckb-vm-risc-v/10214/1",
          "content_text": "# [Spark Program] CKB-VM Sail Formal Verification — Proving CKB-VM RISC-V Instruction Equivalence via Sail Specification and Coq Theorem Prover / CKB-VM Sail 形式化验证 — 基于 Sail 规范与 Coq 定理证明器的 CKB-VM RISC-V 指令等价性证明\n---\n## English Version\n### 1. Project Name and Summary\n**Project Name:** ckb-vm-sail-verify\n**One-line Summary:** Formally verify that CKB-VM’s RISC-V instruction execution semantics are mathematically equivalent to the official Sail RISC-V specification, using Coq theorem proofs and differential testing as dual verification.\n### 2. Team / Individual Introduction\n**Applicant:** Tinyueng ([GitHub](https://github.com/TinyuengKwan))\n**Core Competencies:** Currently interning at PLCT Lab (Institute of Software, Chinese Academy of Sciences) on the **Sail & ACT (RISC-V Architectural Certification Tests) team**, with direct, hands-on experience in the Sail architecture definition language and RISC-V conformance testing. Contributed to the [sail-riscv](https://github.com/riscv/sail-riscv) ecosystem and developed [sail-lsp](https://github.com/TinyuengKwan/sail-lsp), a Language Server Protocol implementation for Sail. Familiar with the Sail compiler toolchain (including the `--coq` backend for theorem prover integration), the sail-riscv formal model structure, and the RISC-V Architectural Certification Test framework. Possesses strong Rust systems programming capabilities (ownership, lifetimes, trait system). Systematic study of CS:APP with a complete knowledge framework spanning processor architecture, virtual memory, ELF linking, and system-level I/O. Additionally studied compiler theory and x86 assembly.\n**Relevant Domain Knowledge:** Deep familiarity with the Sail formal specification language — not just as a user, but as a tooling contributor (sail-lsp). Hands-on experience with RISC-V Architectural Certification Tests (ACT), understanding how conformance is validated against the Sail reference model. 
Familiar with RISC-V ISA architecture (RV64IMAC, privilege levels, extension mechanisms). Experience with the Coq interactive theorem prover and Sail’s Coq backend output. Understanding of CKB-VM internals through source code analysis of nervosnetwork/ckb-vm, including its versioned instruction semantics, MOP (Macro-Op Fusion) extensions, and flat memory model.\n### 3. Problem Description\nCKB-VM is the RISC-V virtual machine that powers all on-chain computation in the Nervos CKB network. Every CKB transaction — every lock script, type script, and smart contract — executes inside CKB-VM. Its correctness is the security foundation of the entire chain. Yet today, there is no formal proof that CKB-VM faithfully implements the RISC-V specification.\n**No formal verification of instruction semantics.** CKB-VM implements 158 opcodes across RV64I, M, C (Zca), B (Zba/Zbb/Zbc/Zbs), A, and custom MOP extensions. Each instruction is hand-written in Rust. While the implementation is high-quality and battle-tested, no mathematical proof exists that any single instruction behaves identically to the RISC-V specification. This is [nervosnetwork/ckb-vm#190](https://github.com/nervosnetwork/ckb-vm/issues/190), open since 2021.\n**Testing alone cannot guarantee correctness.** CKB-VM passes the standard riscv-tests suite, but test suites are finite — they cannot cover all possible input states, edge cases, or corner-case interactions. A subtle semantic divergence (e.g., incorrect sign extension on a rare operand combination) could go undetected for years until exploited.\n**No authoritative ground truth for comparison.** The RISC-V ISA specification is written in natural language (English prose), which is inherently ambiguous. Different implementors may interpret the same sentence differently. 
Without a machine-readable formal specification as the reference, “correctness” remains a matter of human judgment.\n**Sail RISC-V solves the ground truth problem.** RISC-V International has adopted Sail as the official formal specification language for RISC-V. The [sail-riscv](https://github.com/riscv/sail-riscv) model is the machine-readable, executable, mathematically precise definition of RISC-V instruction semantics. Crucially, Sail can compile to Coq — enabling theorem proving directly against the official specification.\n**Semantic gaps between CKB-VM and standard RISC-V.** CKB-VM is not a generic RISC-V implementation. It has a flat 4MB memory model (no MMU/virtual memory), custom ECALL handling (syscall 93 for exit, not M-mode trap), three VERSION modes with different behavioral semantics, cycle-counting per instruction, and MOP fusion instructions that have no counterpart in the standard specification. These gaps must be formally documented and their impact on equivalence precisely characterized.\n**Real-world impact:** A formally verified CKB-VM would provide the strongest possible correctness guarantee for CKB’s execution layer. It would eliminate an entire class of potential vulnerabilities — instruction-level semantic bugs — that no amount of testing can fully prevent. For a blockchain whose security model depends entirely on deterministic execution, this is a foundational investment.\n### 4. Solution\n#### 4.1 Core Approach\nckb-vm-sail-verify employs a dual verification strategy: **Coq formal proofs** for mathematical certainty and **differential testing** for practical runtime validation. 
The two approaches are complementary — formal proofs cover all possible inputs for proven instructions, while differential testing validates the full instruction set against concrete test vectors.\n#### 4.2 Why Sail + Coq\n**Sail is the official RISC-V specification.** Unlike informal prose specifications, Sail is executable, unambiguous, and maintained by RISC-V International. Using Sail as ground truth means we verify against the same specification that hardware vendors use.\n**Sail compiles to Coq.** The Sail compiler’s `--coq` backend generates Coq definitions from Sail source. This gives us machine-checkable RISC-V semantics inside Coq — the same theorem prover used by CompCert (the verified C compiler); seL4 (the verified OS kernel) was proved in Isabelle/HOL, a comparable interactive theorem prover.\n**Coq proofs are total.** Once a Coq theorem is proved, it holds for all possible inputs — not just test vectors. A proven `ADD` instruction is correct for every combination of source registers and machine states, unconditionally.\n**Reproducible and auditable.** Coq proofs are machine-checked. Anyone can re-run `make coq` and verify every theorem independently. No trust required in the prover — only in Coq’s kernel, which is one of the most scrutinized pieces of software in existence.\n#### 4.3 Four-Layer Verification Architecture\n**Layer 1: Sail RISC-V Formal Specification (Ground Truth)**\nThe official sail-riscv model, compiled to Coq using `sail --coq`. Generates ~30,000 lines of Coq defining the complete RV64 instruction semantics in a monadic state transformer style. This is the authoritative reference — what RISC-V *should* do.\n**Layer 2: CKB-VM Coq Model (Verification Target)**\nA hand-written Coq model of CKB-VM’s Rust interpreter logic, capturing register file operations (32 × 64-bit, x0 hardwired to zero), PC advancement, ALU computations, memory access (little-endian, flat 4MB), and branch/jump semantics. Based on `handle_*` functions in ckb-vm’s `execute.rs`. 
This is what CKB-VM *actually does*.\n**Layer 3: Equivalence Proofs (Mathematical Bridge)**\nFor each target instruction, prove: `ckb_vm_execute(I, state) = sail_execute(I, state)`. Each proof decomposes into three sub-theorems: (a) **Semantics** — destination register has the correct computed value; (b) **PC update** — next PC is PC+4 (or PC+offset for branches/jumps); (c) **Register isolation** — all other registers are unchanged. Proof technique: unfold definitions → rewrite with helper lemmas → solve with `lia`/`reflexivity`.\n**Layer 4: Differential Testing (Runtime Validation)**\nExecute identical RISC-V ELF binaries on both CKB-VM (Rust) and the Sail C++ emulator, then compare execution traces step-by-step — PC values, register states, and exit codes. Complements formal proofs by covering the full instruction set (including instructions not yet formally proven) and catching implementation bugs that might not be visible in the Coq model.\n#### 4.4 State Equivalence Definition\nThe core challenge is bridging Sail’s monadic Coq output (stateful monad transformer with effects) and CKB-VM’s pure functional Coq model. 
We define state equivalence as:\n```\nstate_equiv(ckb_state, sail_state) :=\n∀ i ∈ [0, 31], get_reg(ckb_state, i) = extract_reg(sail_state, i)\n∧ ckb_state.pc = extract_pc(sail_state)\n∧ ∀ addr ∈ [0, 4MB), load_byte(ckb_state, addr) = extract_mem(sail_state, addr)\n```\nThis strips away Sail’s monad and CKB-VM’s cycle accounting, comparing only the observable architectural state that must agree.\n#### 4.5 CKB-VM Specific Considerations\n| Aspect | CKB-VM Behavior | Sail RISC-V Behavior | Handling |\n|--------|-----------------|---------------------|----------|\n| x0 register | Write then clear to zero | Discard writes silently | Functionally equivalent — prove in Coq |\n| ECALL | Dispatch via A7 (syscall 93 = exit) | Trap to M-mode handler | Filter in diff-test; exclude from Coq scope |\n| Memory model | Flat 4MB, no MMU, W^X | Sv39/48/57 full MMU | CKB-VM simplified model; prove within 4MB subset |\n| FENCE | No-op (single-threaded) | Full fence semantics | No observable difference in single-hart model |\n| Atomics (A ext) | Single address reservation | Reservation set | Limited concurrency; prove single-hart subset |\n| VERSION modes | Three modes (0/1/2) | None | Target VERSION2 only |\n| Cycle counting | Per-instruction cost tracking | None | Filter out; not part of architectural state |\n| MOP fusion | WIDE_MUL, FAR_JUMP_REL, ADC | No counterpart | Verify as compositions of standard instructions |\n#### 4.6 Proof Example\n```coq\n(* For ADD instruction: rd = rs1 + rs2 (mod 2^64), PC += 4 *)\nTheorem ckb_add_semantics : forall rd rs1 rs2 st,\nrd <> 0 ->\nget_reg (ckb_add rd rs1 rs2 st) rd =\ntruncate_64 (get_reg st rs1 + get_reg st rs2).\nProof.\nintros. unfold ckb_add, set_reg, get_reg.\ndestruct (Nat.eq_dec rd rd); [| contradiction].\napply truncate_64_mod. Qed.\nTheorem ckb_add_pc : forall rd rs1 rs2 st,\n(ckb_add rd rs1 rs2 st).(pc) = st.(pc) + 4.\nProof. intros. unfold ckb_add. simpl. lia. 
Qed.\nTheorem ckb_add_isolation : forall rd rs1 rs2 st r,\nr <> rd ->\nget_reg (ckb_add rd rs1 rs2 st) r = get_reg st r.\nProof.\nintros. unfold ckb_add, set_reg, get_reg.\ndestruct (Nat.eq_dec r rd); [contradiction | reflexivity]. Qed.\n```\n### 5. Detailed Technical Implementation Plan\n#### 5.1 Phase 0: Environment Setup and Instruction Mapping (Week 1)\nInstall the complete toolchain: Sail (>= 0.20.x via OPAM), Coq (9.0.0 + coq-sail-stdpp), CMake, Rust (1.92.0+), GMP, and RISC-V cross-compiler (`gcc-riscv64-unknown-elf`). Clone sail-riscv as a git submodule. Build the Rust workspace (`lib/` + `crates/diff-test/`). Generate Coq from Sail via `sail --coq --dcoq-undef-axioms`. Build the Sail C++ emulator for differential testing.\nCreate the complete CKB-VM instruction mapping table: map all 158 CKB-VM opcodes to their Sail RISC-V counterparts, classify by extension (I/M/C/A/B/MOP), and identify instructions with no Sail equivalent (MOP fusion ops).\n#### 5.2 Phase 1: Differential Testing Framework (Week 2)\nImplement the diff-test CLI tool with step-by-step execution trace comparison:\n**CKB-VM side (`lib/src/runner.rs`):** Wrap CKB-VM in a step-by-step executor that captures `StepState` (PC + 32 registers) after each instruction. Handle exit syscall (syscall 93 via A7 register). Support `--max-steps` to prevent infinite loops.\n**Sail side (`crates/diff-test/src/sail_runner.rs`):** Parse the Sail C++ emulator’s trace output format. Extract PC and register values per step. Handle ECALL divergence (riscv-tests use `tohost` memory-mapped I/O; CKB-VM uses syscall).\n**Comparison engine (`lib/src/state.rs`):** `CompareResult` with `first_mismatch` detection — reports the exact step, PC, and register where traces diverge. 
Support `--verbose` mode for full trace dump and `--json` for machine-readable output.\n**Test corpus:** Auto-discover ELF test artifacts from riscv-tests (rv64ui, rv64um, rv64uc, rv64ua), riscv-arch-test, and CKB-VM’s own test suite.\n#### 5.3 Phase 2: Coq Infrastructure and Sail Interface (Week 3)\nThis is the most challenging phase — bridging Sail’s monadic Coq output with CKB-VM’s pure model.\n**Study generated Coq (~30,000 lines):** Understand Sail’s state monad encoding, register access patterns, bitvector arithmetic library, and instruction decode/execute structure. Identify the Coq functions corresponding to each RISC-V instruction (e.g., `execute_RISCV_ADD`).\n**Define state equivalence relation:** Formalize the mapping between CKB-VM’s `machine_state` record and Sail’s monadic state. Handle type mismatches (Sail uses bitvectors; CKB-VM model uses Z integers).\n**Build proof infrastructure:** Create reusable Coq tactics (`solve_truncate`, `simplify_regs`, `unfold_sail_step`) that automate common proof patterns. Test with a minimal import: `Check execute_RISCV_ADD`.\n**Document the bridging strategy:** Record Sail’s Coq structure, naming conventions, and the exact approach for extracting observable state from monadic computations.\n#### 5.4 Phase 3: Core ALU Instruction Proofs (Week 4)\nProve 7 core instructions with full three-theorem coverage (semantics + PC + register isolation):\n| Instruction | Type | Key Proof Challenge |\n|-------------|------|-------------------|\n| ADD | R-type | `truncate_64` idempotence, x0 hardwiring |\n| SUB | R-type | Unsigned subtraction mod 2^64 |\n| ADDI | I-type | Sign extension of 12-bit immediate |\n| SLLI | Shift | Shift amount masking (lower 6 bits for RV64) |\n| SRLI | Shift | Logical vs. 
arithmetic right shift distinction |\n| SRAI | Shift | Sign bit preservation across arithmetic shift |\n| MUL | M-ext | 64-bit multiply, lower 64 bits of 128-bit product |\nBuild reusable lemma library: `truncate_64_idempotent`, `x0_always_zero`, `get_set_reg_same`, `get_set_reg_diff`, `sign_extend_properties`.\n#### 5.5 Phase 4: Control Flow and Memory Proofs (Week 5)\n**Branch instructions:**\n- BEQ: Prove both taken path (PC += offset) and not-taken path (PC += 4). Handle sign-extended branch offset.\n- Additional branches (BNE, BLT, BGE, BLTU, BGEU) follow the same pattern.\n**Jump instructions:**\n- JAL: Prove link address (rd = PC + 4) and jump target (PC = PC + offset). Handle x0 case (no link).\n- JALR: Prove indirect jump with LSB cleared.\n**Memory instructions:**\n- Build memory lemmas: store/load roundtrip (`load(store(addr, val)) = val`), byte-ordering (little-endian), alignment.\n- LW: Prove sign-extended 32-bit load.\n- SW: Prove 32-bit store (may partially Admit if byte-level memory model complexity exceeds time budget).\n#### 5.6 Phase 5: Extended Proofs and MOP Verification (Week 6)\n**Additional ALU proofs:** AND, OR, XOR, SLTI, SLTIU, LUI, AUIPC.\n**MOP fusion verification:** CKB-VM’s custom MOP instructions are implemented as multi-instruction fusions. Verify that:\n- `WIDE_MUL(rd1, rd2, rs1, rs2)` = sequential `MULH(rd1, rs1, rs2); MUL(rd2, rs1, rs2)`\n- `FAR_JUMP_REL(rd, offset)` = `AUIPC(rd, upper); JALR(rd, rd, lower)`\n**Proof audit:** Review all `Admitted` lemmas. Attempt to close as many as possible. Document remaining admits with clear justification.\n**Target:** 10+ core instructions fully proved (zero Admitted), 5+ additional instructions with partial proofs.\n#### 5.7 Phase 6: Full Differential Test Coverage and Gap Analysis (Week 7)\n**Per-extension testing:** Run differential tests across rv64ui (integer), rv64um (multiply/divide), rv64uc (compressed), rv64ua (atomic) test suites. 
Add `--extension` CLI filter for targeted runs.\n**Edge case tests:** Division by zero behavior, maximum shift amounts, integer overflow/underflow, misaligned memory access, x0 write attempts.\n**Semantic gap documentation:** Produce a formal gap analysis document covering all 8 identified semantic gaps (x0 handling, ECALL, memory model, FENCE, atomics, VERSION modes, cycle counting, MOP), with precise characterization of each gap’s impact on equivalence claims.\n**Instruction coverage matrix:** Update the 158-opcode mapping table with proof status (Proved / Partially Proved / Diff-Test Only / Not Covered) and test status (Pass / Fail / Skipped / N/A).\n#### 5.8 Phase 7: Documentation, Deliverables, and Roadmap (Week 8)\n**Clean-room build test:** Clone from scratch, `make all` succeeds, all Coq proofs compile, differential tests pass.\n**Completion report:** Number of theorems proved, number of instructions with full coverage, number of differential tests passing, complete gap analysis, remaining `Admitted` proofs with justification.\n**Phase 2+ roadmap:** Full RV64I coverage (~50 instructions), M/C/B extension proofs, ASM-mode verification (requiring Islaris-level tooling, ~6+ months, $20k+ scope).\n**Code quality:** `cargo fmt`, `cargo clippy`, remove temporary files, ensure CI-ready build.\n### 6. Expected Deliverables\n#### 6.1 Core Deliverables\n1. **Coq proof library (coq/)** — Formally proved equivalence theorems for 10+ core RISC-V instructions, covering RV64I ALU operations, shifts, branches, jumps, and at least one M-extension instruction. Each instruction proved with three sub-theorems (semantics, PC update, register isolation). All proofs machine-checked by Coq — zero trust required.\n2. **Differential testing framework (crates/diff-test/)** — CLI tool that executes identical ELF binaries on CKB-VM and Sail C++ emulator, comparing execution traces step-by-step. Supports `--verbose`, `--json`, `--max-steps`, `--test-dir` flags. 
Reports first divergence point with full state dump.\n3. **CKB-VM instruction mapping document** — Complete mapping of all 158 CKB-VM opcodes to Sail RISC-V counterparts, classified by extension, with proof/test status annotations.\n4. **Formal semantic gap analysis** — Rigorous documentation of all 8 identified semantic gaps between CKB-VM and standard RISC-V, with precise characterization of impact on equivalence claims and mitigation strategies.\n5. **Methodology documentation (doc/)** — Architecture guide and proof methodology document enabling future contributors to extend proofs to additional instructions without re-learning the entire framework.\n6. **Build-from-source reproducibility** — Single `make all` command builds everything (Coq proofs + Sail emulator + Rust diff-test tool). Documented toolchain requirements and setup script.\n7. **Phase 2+ roadmap** — Detailed plan for extending to full RV64I, M/C/B extensions, and ASM-mode verification, suitable for Community Fund DAO-scale proposal.\n#### 6.2 Concrete Output Examples\n##### Example A — Coq Proof Compilation Output\n```\n$ make coq\ncoqc -R . CkbVmVerify -R generated Riscv MachineState.v\ncoqc -R . CkbVmVerify -R generated Riscv CkbVmModel.v\ncoqc -R . CkbVmVerify -R generated Riscv InstructionEquiv.v\nProved: ckb_add_semantics\nProved: ckb_add_pc\nProved: ckb_add_isolation\nProved: ckb_sub_semantics\nProved: ckb_sub_pc\nProved: ckb_sub_isolation\nProved: ckb_addi_semantics\n…\n────────────────────────────────────────\nTotal: 33 theorems proved, 0 admitted.\nAll proofs verified by Coq 9.0.0.\n```\nAll theorems are machine-checked. 
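For orientation, each `Proved:` line above corresponds to a theorem whose statement has roughly the following shape. This is an illustrative Coq sketch only: `ckb_step`, `get_reg`, and the `ADD` constructor are placeholder names for definitions the project would supply, not Sail's generated API.

```coq
(* Statement sketches only; ckb_step, get_reg, ADD are placeholder names. *)
Theorem ckb_add_semantics : forall (s : machine_state) (rd rs1 rs2 : reg),
  rd <> x0 ->
  get_reg (ckb_step (ADD rd rs1 rs2) s) rd =
    truncate_64 (get_reg s rs1 + get_reg s rs2).

Theorem ckb_add_pc : forall (s : machine_state) (rd rs1 rs2 : reg),
  pc (ckb_step (ADD rd rs1 rs2) s) = truncate_64 (pc s + 4).

Theorem ckb_add_isolation : forall (s : machine_state) (rd rs1 rs2 r : reg),
  r <> rd ->
  get_reg (ckb_step (ADD rd rs1 rs2) s) r = get_reg s r.
```

Together the three statements pin down the full observable effect of one `ADD` step: the computed value, the PC advance, and the fact that no other register changes.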
Any modification that breaks a proof will cause `make coq` to fail — continuous verification by construction.\n##### Example B — Differential Test Output (Pass)\n```\n$ cargo run --release -p ckb-vm-diff-test -- \\\n--test-dir tests/rv64ui/ --sail-bin deps/sail-riscv/build/sail_riscv_sim\nRunning 55 differential tests (rv64ui)…\nrv64ui-p-add … CKB-VM: 127 steps, Sail: 127 steps → MATCH\nrv64ui-p-addi … CKB-VM: 98 steps, Sail: 98 steps → MATCH\nrv64ui-p-and … CKB-VM: 112 steps, Sail: 112 steps → MATCH\nrv64ui-p-andi … CKB-VM: 95 steps, Sail: 95 steps → MATCH\nrv64ui-p-auipc … CKB-VM: 43 steps, Sail: 43 steps → MATCH\nrv64ui-p-beq … CKB-VM: 156 steps, Sail: 156 steps → MATCH\n…\n────────────────────────────────────────\nResults: 55 passed, 0 failed, 0 skipped.\nAll execution traces match.\n```\n##### Example C — Differential Test Output (Divergence Flagged, Then Resolved)\n```\n$ cargo run --release -p ckb-vm-diff-test -- \\\n--elf tests/edge/division_by_zero.elf --verbose\nStep 1: PC=0x80000000 CKB-VM ✓ Sail ✓\nStep 2: PC=0x80000004 CKB-VM ✓ Sail ✓\nStep 3: PC=0x80000008 CKB-VM ✓ Sail ✓\nStep 4: PC=0x8000000c DIVERGENCE DETECTED\nCKB-VM state:\nPC = 0x80000010\nx10 = 0xffffffffffffffff ← DIV by zero → all-ones\nSail state:\nPC = 0x80000010\nx10 = 0xffffffffffffffff\nRegister diff: (none — values match at this step)\nFlagged at: step 4, register x10 (resolved: values identical on full comparison)\nNote: Both implementations return all-ones for DIV by zero\n(RISC-V spec-mandated behavior). 
Trace matches.\n────────────────────────────────────────\nResult: MATCH (divergence was a false alarm after full comparison)\n```\n##### Example D — JSON Machine-Readable Output\n```\n$ cargo run --release -p ckb-vm-diff-test -- \\\n--elf tests/rv64ui/rv64ui-p-add --json\n{\n\"elf\": \"tests/rv64ui/rv64ui-p-add\",\n\"ckb_vm_steps\": 127,\n\"sail_steps\": 127,\n\"ckb_vm_exit_code\": 0,\n\"sail_exit_code\": 0,\n\"match\": true,\n\"first_mismatch\": null,\n\"trace_comparison\": {\n\"total_steps_compared\": 127,\n\"registers_compared_per_step\": 32,\n\"all_match\": true\n}\n}\n```\n#### 6.3 Reproducible Build Environment\n```bash\ngit clone --recursive https://github.com/xxxx/ckb-vm-sail-verify.git\ncd ckb-vm-sail-verify\nmake all # Builds everything: Coq proofs + Sail emulator + diff-test tool\nmake test # Runs differential tests against riscv-tests suite\n```\n**Toolchain requirements documented in README:**\n- Rust >= 1.92.0\n- OPAM with Sail >= 0.20.x, Coq 9.0.0, coq-sail-stdpp\n- CMake >= 3.20\n- GMP development library\n- RISC-V cross-compiler (for custom test assembly)\n**`scripts/` directory** provides automated setup:\n- `generate_coq.sh` — Generate Coq from Sail source\n- `build_sail_emulator.sh` — Build Sail C++ emulator\n- `run_differential.sh` — Run full differential test suite\n#### 6.4 Acceptance Criteria\n##### Formal Verification Criteria\n- V-1: `make coq` compiles all Coq proofs without errors or warnings on a clean environment.\n- V-2: At least 10 RISC-V instructions have complete three-theorem proofs (semantics + PC + isolation) with zero `Admitted`.\n- V-3: At least one M-extension instruction (MUL) is formally proved.\n- V-4: All helper lemmas (`truncate_64_idempotent`, `x0_always_zero`, `get_set_reg_*`) are fully proved.\n- V-5: State equivalence relation is formally defined and used consistently across all proofs.\n##### Differential Testing Criteria\n- T-1: Differential tests pass on the complete rv64ui test suite (55 tests).\n- T-2: Differential tests pass on 
rv64um test suite (13 tests).\n- T-3: The diff-test tool correctly detects and reports injected semantic divergences (negative testing).\n- T-4: `--json` output is valid JSON parseable by `jq`.\n- T-5: Edge case tests (division by zero, max shift, overflow) produce matching traces.\n##### Documentation Criteria\n- D-1: Complete 158-opcode mapping table with proof/test status.\n- D-2: Semantic gap analysis covers all 8 identified gaps with impact assessment.\n- D-3: Clean-room build test: fresh clone → `make all` succeeds → `make test` passes.\n- D-4: Bilingual (English + Chinese) README with quick-start instructions.\n### 7. Funding Request and Usage\n**Total Amount Requested:** 1,000 USD\n**Payment Method:** 100% CKB\n| Category | Amount | Description |\n|----------|--------|-------------|\n| Cloud Server | $350 USD | 1 VPS (Linux, ≥ 4 cores 16GB RAM) for Coq compilation (memory-intensive), Sail emulator builds, and differential test execution. 8 weeks. |\n| Developer Compensation | $450 USD | Core development work. Estimated 20–30 hours per week, 8 weeks. Covers Coq proof development, Rust diff-test framework, and Sail integration. |\n| Documentation & Community | $200 USD | Bilingual documentation, architecture diagrams, 2 monthly sharing sessions, completion report, and Phase 2+ roadmap preparation. |\n### 8. Estimated Completion Timeline\n**Total Duration:** 8 weeks (~2 months)\n#### Phase 1: Infrastructure and Feasibility (Week 1–3)\n**Week 1:** Complete toolchain installation (Sail + Coq + CMake + Rust + RISC-V cross-compiler). Clone sail-riscv submodule. Generate Coq from Sail. Build Sail C++ emulator. Build Rust workspace. Create 158-opcode instruction mapping table.\n**Week 2:** Implement differential testing framework — CKB-VM step executor, Sail trace parser, comparison engine, CLI tool with `--verbose`/`--json`/`--max-steps`. Handle ECALL divergence between riscv-tests and CKB-VM. 
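The comparison engine at the heart of this framework can be sketched in a few lines of Rust. This is a minimal illustration only: `StepState` mirrors the PC-plus-32-registers shape described in Phase 1, but `first_mismatch` as written here is a simplified standalone function, not the project's actual `CompareResult` API.

```rust
/// One architectural step: PC plus the 32 general-purpose registers,
/// mirroring the per-instruction StepState captured by the runner.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct StepState {
    pub pc: u64,
    pub regs: [u64; 32],
}

/// Index of the first step where the CKB-VM and Sail traces diverge
/// (including a trace-length mismatch), or None when the traces agree.
pub fn first_mismatch(ckb: &[StepState], sail: &[StepState]) -> Option<usize> {
    ckb.iter()
        .zip(sail.iter())
        .position(|(a, b)| a != b)
        .or_else(|| (ckb.len() != sail.len()).then(|| ckb.len().min(sail.len())))
}
```

A length mismatch is reported at the index where the shorter trace ends, so an early exit on one side is flagged like any other divergence.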
Run initial diff-tests on rv64ui suite.\n**Week 3:** Study Sail-generated Coq (~30,000 lines). Define state equivalence relation. Build proof infrastructure and reusable tactics. Create minimal test import (`Check execute_RISCV_ADD`). Document Sail Coq bridging strategy. **Most challenging week** — debugging imports, name conflicts, library dependencies.\n**Milestone 1 (End of Week 3):** Differential testing framework operational with rv64ui tests passing. Coq infrastructure compiles against Sail-generated definitions. Instruction mapping table complete.\n#### Phase 2: Formal Proofs (Week 4–6)\n**Week 4:** Prove 7 core ALU instructions (ADD, SUB, ADDI, SLLI, SRLI, SRAI, MUL) with full three-theorem coverage. Build reusable lemma library.\n**Week 5:** Prove control flow instructions (BEQ taken/not-taken, JAL link+jump). Build memory lemmas. Prove LW, SW. Submit mid-term progress report.\n**Week 6:** Prove additional ALU instructions (AND, OR, XOR, SLTI, JALR). Verify MOP fusion equivalence (WIDE_MUL = MULH + MUL). Audit all `Admitted` proofs.\n**Milestone 2 (End of Week 6):** 10+ instructions with complete proofs. MOP fusion verified. Mid-term report submitted.\n#### Phase 3: Integration and Delivery (Week 7–8)\n**Week 7:** Full differential test coverage across rv64ui/rv64um/rv64uc/rv64ua. Edge case testing. Produce formal semantic gap analysis document. Update instruction coverage matrix with proof/test status.\n**Week 8:** Clean-room build test. Bilingual documentation. Code cleanup (fmt, clippy). Completion report. Phase 2+ roadmap. Community sharing session. 
Final submission.\n**Milestone 3 (End of Week 8):** All deliverables submitted — Coq proof library, diff-test framework, instruction mapping, gap analysis, methodology docs, and roadmap.\n#### Timeline Overview\n| Phase | Weeks | Focus | Milestone |\n|-------|-------|-------|-----------|\n| Phase 1: Infrastructure | Week 1–3 | Toolchain, diff-test framework, Coq infrastructure, instruction mapping | Milestone 1 |\n| Phase 2: Formal Proofs | Week 4–6 | Core ALU proofs, control flow, memory, MOP, proof audit | Milestone 2 (Week 6) |\n| Phase 3: Integration & Delivery | Week 7–8 | Full test coverage, gap analysis, docs, clean-room build, submission | Milestone 3 |\n### 9. Relevance to CKB Ecosystem\n**Addresses a long-standing open issue.** [ckb-vm#190](https://github.com/nervosnetwork/ckb-vm/issues/190) has been open since 2021, requesting formal verification of CKB-VM’s RISC-V implementation. This project directly delivers a proof-of-concept solution.\n**Strengthens CKB’s security foundation.** CKB-VM executes every transaction on the Nervos network. A formally verified instruction set eliminates an entire class of potential vulnerabilities — instruction-level semantic bugs that testing cannot fully prevent. This is especially critical for a blockchain where deterministic execution is a security invariant.\n**Establishes a reusable verification framework.** The Coq proof infrastructure, differential testing tool, and methodology documentation are designed for extension. Future contributors can add proofs for additional instructions following the established patterns, without re-learning the framework.\n**Demonstrates CKB’s technical depth.** Formal verification of a blockchain VM is rare in the industry. CompCert verified a C compiler; seL4 verified an OS kernel; this project brings the same rigor to CKB’s execution layer. 
It positions CKB alongside the most technically ambitious blockchain projects.\n**Enables future full-ISA verification.** This PoC covers 10+ instructions and establishes the methodology. The Phase 2+ roadmap (full RV64I, extensions, ASM-mode) provides a clear path to comprehensive verification suitable for a Community Fund DAO proposal.\n**Pure Rust + Coq toolchain.** The differential testing framework is pure Rust, consistent with CKB’s technology stack. Rust developers in the CKB community can contribute without learning new languages (Coq proofs are the specialist component, but the testing framework is accessible to all).\n### 10. Technical Risks and Mitigations\n| Risk | Impact | Probability | Mitigation |\n|------|--------|-------------|------------|\n| Sail Coq output structure changes between versions | High | Low | Pin sail-riscv to a specific commit. Document exact Sail compiler version. Provide `generate_coq.sh` for reproducible regeneration. |\n| Sail monadic Coq too complex to bridge with CKB-VM pure model | High | Medium | Week 3 is dedicated to this. Fallback: extract observable state post-execution rather than proving step-internal equivalence. Worst case: prove against a simplified Sail extract rather than raw monadic output. |\n| Coq compilation time exceeds development iteration speed | Medium | Medium | Generated Coq is ~30K lines. Use `_CoqProject` with selective compilation. Develop proofs incrementally. Cloud server with 16GB RAM for parallel coqc. |\n| CKB-VM Rust semantics diverge from Coq model | Medium | Medium | Differential testing catches runtime divergences. Cross-reference Coq model against actual Rust source. Document any abstraction gaps. |\n| riscv-tests ECALL convention incompatible with CKB-VM | Medium | Low | Already identified and designed for: CKB-VM uses syscall 93, riscv-tests use `tohost`. Sail runner filters ECALL divergence. 
|\n| Insufficient time for 10+ instruction proofs | Medium | Medium | Prioritize core ALU instructions (ADD, SUB, ADDI) which have the simplest proof structure. Build reusable tactics early. Accept partial proofs (documented Admits) for complex instructions (SW, JALR). |\n| Sail C++ emulator build failures on target platform | Low | Low | CMake build is well-documented in sail-riscv. Provide `build_sail_emulator.sh` with error handling. GMP is the only non-trivial dependency. |\n| Generated Coq requires unavailable Coq libraries | Low | Medium | Use `coq-sail-stdpp` as documented by sail-riscv project. Pin exact Coq version (9.0.0) in build instructions. |\n### 11. Transparency Commitments\n**Fully open-source from Day 1.** All code on GitHub under MIT license, public repository from the start of development.\n**Weekly progress updates.** Posted on Nervos Talk forum every week, covering completed tasks, blockers, and next-week plan.\n**Monthly sharing sessions.** Two sessions total (Week 4 and Week 8), the final session includes a live demo of Coq proof compilation and differential test execution, plus Q&A.\n**Machine-verifiable results.** All Coq proofs are machine-checked — anyone can clone the repo, run `make coq`, and independently verify every theorem. 
No trust in the author required.\n**Honest reporting of limitations.** The completion report will explicitly document: number of `Admitted` proofs and why, semantic gaps that prevent full equivalence claims, and instructions not yet covered.\n**Reproducible from scratch.** Complete build instructions, pinned dependencies, and setup scripts ensure any reviewer can reproduce all results independently.\n---\n## 中文版本\n### 一、项目名称与简介\n**项目名称：** ckb-vm-sail-verify\n**一句话简介：** 基于 Sail RISC-V 官方规范与 Coq 定理证明器，形式化验证 CKB-VM 的 RISC-V 指令执行语义与标准规范的数学等价性，辅以差分测试进行双重验证。\n### 二、团队/个人介绍\n**申请人：** Tinyueng（[GitHub](https://github.com/TinyuengKwan)）\n**核心能力：** 目前在 PLCT Lab（中科院软件所）**Sail & ACT（RISC-V Architectural Certification Tests）小队**实习，对 Sail 体系结构定义语言和 RISC-V 一致性测试有直接、深入的实践经验。参与 [sail-riscv](https://github.com/riscv/sail-riscv) 生态贡献，并独立开发了 [sail-lsp](https://github.com/TinyuengKwan/sail-lsp)——Sail 语言的 Language Server Protocol 实现。熟悉 Sail 编译器工具链（包括用于定理证明器集成的 `--coq` 后端）、sail-riscv 形式模型结构以及 RISC-V 架构认证测试（ACT）框架。具备 Rust 系统编程能力（所有权、生命周期、trait 系统）。系统学习过 CS:APP，对处理器架构、虚拟内存、ELF 链接、系统级 I/O 有完整知识框架。此外还学习过编译原理、x86 汇编等内容。\n**相关领域知识：** 对 Sail 形式规范语言有深度了解——不仅是使用者，更是工具链贡献者（sail-lsp）。具有 RISC-V 架构认证测试（ACT）的实践经验，理解如何基于 Sail 参考模型验证一致性。熟悉 RISC-V ISA 体系结构（RV64IMAC、特权级、扩展机制）。具有 Coq 交互式定理证明器和 Sail Coq 后端输出的使用经验。通过源码分析深入理解 CKB-VM 内部机制（nervosnetwork/ckb-vm），涵盖其版本化指令语义、MOP（宏操作融合）扩展和平坦内存模型。\n### 三、问题描述\nCKB-VM 是驱动 Nervos CKB 网络所有链上计算的 RISC-V 虚拟机。每一笔 CKB 交易——每一个 lock script、type script 和智能合约——都在 CKB-VM 中执行。其正确性是整条链的安全基石。然而目前，没有任何形式化证明表明 CKB-VM 忠实地实现了 RISC-V 规范。\n**指令语义缺乏形式化验证。** CKB-VM 实现了横跨 RV64I、M、C（Zca）、B（Zba/Zbb/Zbc/Zbs）、A 和自定义 MOP 扩展的 158 条操作码。每条指令都是手工编写的 Rust 代码。尽管实现质量高且经过实战考验，但不存在任何数学证明表明其中任何一条指令的行为与 RISC-V 规范完全一致。这是 [nervosnetwork/ckb-vm#190](https://github.com/nervosnetwork/ckb-vm/issues/190)，自 2021 年开放至今。\n**测试无法保证正确性。** CKB-VM 通过了标准 riscv-tests 测试套件，但测试套件是有限的——无法覆盖所有可能的输入状态、边缘情况或极端条件下的交互。一个微妙的语义偏差（例如，在某个罕见操作数组合上的符号扩展错误）可能多年未被发现，直到被利用。\n**缺乏权威的比对基准。** 
RISC-V ISA 规范以自然语言（英文散文）编写，天然存在歧义。不同实现者可能对同一句话有不同理解。没有机器可读的形式规范作为参考，\"正确性\"仍然是人为判断。\n**Sail RISC-V 解决了基准问题。** RISC-V International 已采纳 Sail 作为 RISC-V 的官方形式规范语言。[sail-riscv](https://github.com/riscv/sail-riscv) 模型是 RISC-V 指令语义的机器可读、可执行、数学精确的定义。关键在于，Sail 可以编译为 Coq——使得直接针对官方规范进行定理证明成为可能。\n**CKB-VM 与标准 RISC-V 之间存在语义差距。** CKB-VM 不是通用的 RISC-V 实现。它拥有平坦的 4MB 内存模型（无 MMU/虚拟内存）、自定义 ECALL 处理（syscall 93 退出，非 M 模式陷阱）、三种 VERSION 模式（各有不同行为语义）、逐指令周期计数以及标准规范中没有对应物的 MOP 融合指令。这些差距必须被形式化地记录，并精确表征其对等价性的影响。\n**实际影响：** 经过形式化验证的 CKB-VM 将为 CKB 的执行层提供最强的正确性保证。它将消除一整类潜在漏洞——指令级语义缺陷——这是任何数量的测试都无法完全防止的。对于一条安全模型完全依赖确定性执行的区块链而言，这是一项基础性投资。\n### 四、解决方案\n#### 4.1 核心思路\nckb-vm-sail-verify 采用双重验证策略：**Coq 形式化证明**提供数学确定性，**差分测试**提供实际运行时验证。两种方法互补——形式化证明覆盖已证明指令的所有可能输入，差分测试则用具体测试向量验证完整指令集。\n#### 4.2 为什么选择 Sail + Coq\n**Sail 是官方 RISC-V 规范。** 与非正式的散文规范不同，Sail 是可执行的、无歧义的，由 RISC-V International 维护。以 Sail 作为基准意味着我们针对硬件厂商使用的同一规范进行验证。\n**Sail 可编译为 Coq。** Sail 编译器的 `--coq` 后端从 Sail 源码生成 Coq 定义。这使我们在 Coq——CompCert（经验证的 C 编译器）和 seL4（经验证的操作系统内核）使用的同一定理证明器——中获得了机器可检查的 RISC-V 语义。\n**Coq 证明是全称的。** 一旦 Coq 定理被证明，它对所有可能的输入成立——而不仅仅是测试向量。一条被证明的 `ADD` 指令对每一种源寄存器和机器状态的组合都是正确的，无条件地。\n**可复现且可审计。** Coq 证明由机器检查。任何人都可以重新运行 `make coq` 并独立验证每一个定理。不需要信任证明者——只需信任 Coq 的内核，这是现存最受审视的软件之一。\n#### 4.3 四层验证架构\n**Layer 1：Sail RISC-V 形式规范（基准真相）**\n官方 sail-riscv 模型，通过 `sail --coq` 编译为 Coq。生成约 30,000 行 Coq 代码，以单子状态转换器风格定义完整的 RV64 指令语义。这是权威参考——RISC-V **应该**做什么。\n**Layer 2：CKB-VM Coq 模型（验证目标）**\n手工编写的 CKB-VM Rust 解释器逻辑的 Coq 模型，捕获寄存器文件操作（32 × 64 位，x0 硬连线为零）、PC 推进、ALU 计算、内存访问（小端序、平坦 4MB）和分支/跳转语义。基于 ckb-vm `execute.rs` 中的 `handle_*` 函数。这是 CKB-VM **实际**做什么。\n**Layer 3：等价性证明（数学桥梁）**\n对每条目标指令，证明：`ckb_vm_execute(I, state) = sail_execute(I, state)`。每个证明分解为三个子定理：(a) **语义**——目标寄存器具有正确的计算值；(b) **PC 更新**——下一个 PC 是 PC+4（或分支/跳转的 PC+偏移量）；(c) **寄存器隔离**——所有其他寄存器不变。证明技术：展开定义 → 用辅助引理改写 → 用 `lia`/`reflexivity` 求解。\n**Layer 4：差分测试（运行时验证）**\n在 CKB-VM（Rust）和 Sail C++ 模拟器上执行相同的 RISC-V ELF 二进制文件，然后逐步比较执行轨迹——PC 值、寄存器状态和退出码。通过覆盖完整指令集（包括尚未被形式化证明的指令）和捕获 Coq 
模型中可能不可见的实现缺陷来补充形式化证明。\n#### 4.4 CKB-VM 特定考量\n| 方面 | CKB-VM 行为 | Sail RISC-V 行为 | 处理方式 |\n|------|------------|-----------------|---------|\n| x0 寄存器 | 写入后清零 | 静默丢弃写入 | 功能等价——在 Coq 中证明 |\n| ECALL | 通过 A7 分发（syscall 93 = 退出） | 陷入 M 模式处理器 | 差分测试中过滤；排除在 Coq 范围外 |\n| 内存模型 | 平坦 4MB，无 MMU，W^X | Sv39/48/57 完整 MMU | CKB-VM 简化模型；在 4MB 子集内证明 |\n| FENCE | 空操作（单线程） | 完整栅栏语义 | 单核模型中无可观测差异 |\n| 原子操作（A 扩展） | 单地址保留 | 保留集 | 有限并发；证明单核子集 |\n| VERSION 模式 | 三种模式（0/1/2） | 无 | 仅针对 VERSION2 |\n| 周期计数 | 逐指令成本追踪 | 无 | 过滤；不属于体系结构状态 |\n| MOP 融合 | WIDE_MUL、FAR_JUMP_REL、ADC | 无对应物 | 验证为标准指令的组合 |\n### 五、详细技术实现计划\n#### 5.1 Phase 0：环境搭建与指令映射（Week 1）\n安装完整工具链：Sail（>= 0.20.x，通过 OPAM）、Coq（9.0.0 + coq-sail-stdpp）、CMake、Rust（1.92.0+）、GMP、RISC-V 交叉编译器（`gcc-riscv64-unknown-elf`）。克隆 sail-riscv 作为 git 子模块。构建 Rust 工作空间（`lib/` + `crates/diff-test/`）。通过 `sail --coq --dcoq-undef-axioms` 从 Sail 生成 Coq。构建 Sail C++ 模拟器用于差分测试。\n创建完整的 CKB-VM 指令映射表：将所有 158 条 CKB-VM 操作码映射到其 Sail RISC-V 对应项，按扩展分类（I/M/C/A/B/MOP），识别无 Sail 对应项的指令（MOP 融合操作）。\n#### 5.2 Phase 1：差分测试框架（Week 2）\n实现 diff-test CLI 工具，支持逐步执行轨迹比较：\n**CKB-VM 端（`lib/src/runner.rs`）：** 将 CKB-VM 包装为逐步执行器，在每条指令后捕获 `StepState`（PC + 32 个寄存器）。处理退出系统调用（通过 A7 寄存器的 syscall 93）。支持 `--max-steps` 防止无限循环。\n**Sail 端（`crates/diff-test/src/sail_runner.rs`）：** 解析 Sail C++ 模拟器的轨迹输出格式。逐步提取 PC 和寄存器值。处理 ECALL 差异（riscv-tests 使用 `tohost` 内存映射 I/O；CKB-VM 使用系统调用）。\n**比较引擎（`lib/src/state.rs`）：** 带 `first_mismatch` 检测的 `CompareResult`——报告轨迹分歧的精确步骤、PC 和寄存器。支持 `--verbose` 模式（完整轨迹转储）和 `--json`（机器可读输出）。\n**测试语料库：** 自动发现来自 riscv-tests（rv64ui、rv64um、rv64uc、rv64ua）、riscv-arch-test 和 CKB-VM 自有测试套件的 ELF 测试工件。\n#### 5.3 Phase 2：Coq 基础设施与 Sail 接口（Week 3）\n这是最具挑战性的阶段——桥接 Sail 的单子 Coq 输出与 CKB-VM 的纯模型。\n**研究生成的 Coq（约 30,000 行）：** 理解 Sail 的状态单子编码、寄存器访问模式、位向量算术库和指令解码/执行结构。识别每条 RISC-V 指令对应的 Coq 函数（如 `execute_RISCV_ADD`）。\n**定义状态等价关系：** 形式化 CKB-VM 的 `machine_state` 记录与 Sail 单子状态之间的映射。处理类型不匹配（Sail 使用位向量；CKB-VM 模型使用 Z 整数）。\n**构建证明基础设施：** 创建可重用的 Coq 
策略（`solve_truncate`、`simplify_regs`、`unfold_sail_step`），自动化常见证明模式。用最小导入测试：`Check execute_RISCV_ADD`。\n#### 5.4 Phase 3：核心 ALU 指令证明（Week 4）\n证明 7 条核心指令，每条提供完整的三定理覆盖（语义 + PC + 寄存器隔离）：\n| 指令 | 类型 | 关键证明挑战 |\n|------|------|------------|\n| ADD | R 型 | `truncate_64` 幂等性，x0 硬连线 |\n| SUB | R 型 | 无符号减法 mod 2^64 |\n| ADDI | I 型 | 12 位立即数的符号扩展 |\n| SLLI | 移位 | 移位量掩码（RV64 取低 6 位） |\n| SRLI | 移位 | 逻辑右移与算术右移的区分 |\n| SRAI | 移位 | 算术移位中符号位的保持 |\n| MUL | M 扩展 | 64 位乘法，128 位乘积的低 64 位 |\n构建可重用引理库：`truncate_64_idempotent`、`x0_always_zero`、`get_set_reg_same`、`get_set_reg_diff`、`sign_extend_properties`。\n#### 5.5 Phase 4：控制流与内存证明（Week 5）\n**分支指令：** BEQ 的跳转路径（PC += offset）和不跳转路径（PC += 4）。处理符号扩展的分支偏移量。\n**跳转指令：** JAL 的链接地址（rd = PC + 4）和跳转目标（PC = PC + offset）。JALR 的间接跳转（LSB 清零）。\n**内存指令：** 构建内存引理——存取往返（`load(store(addr, val)) = val`）、字节序（小端）、对齐。证明 LW（符号扩展的 32 位加载）和 SW（32 位存储）。\n#### 5.6 Phase 5：扩展证明与 MOP 验证（Week 6）\n**额外 ALU 证明：** AND、OR、XOR、SLTI、SLTIU、LUI、AUIPC。\n**MOP 融合验证：** 验证 `WIDE_MUL(rd1, rd2, rs1, rs2)` = 顺序执行 `MULH(rd1, rs1, rs2); MUL(rd2, rs1, rs2)`。验证 `FAR_JUMP_REL(rd, offset)` = `AUIPC(rd, upper); JALR(rd, rd, lower)`。\n**证明审计：** 审查所有 `Admitted` 引理。尽可能关闭。记录剩余 admits 及明确理由。\n**目标：** 10+ 条核心指令完全证明（零 Admitted），5+ 条额外指令具有部分证明。\n#### 5.7 Phase 6：完整差分测试覆盖与语义差距分析（Week 7）\n**分扩展测试：** 在 rv64ui（整数）、rv64um（乘除法）、rv64uc（压缩指令）、rv64ua（原子操作）测试套件上运行差分测试。添加 `--extension` CLI 过滤器。\n**边缘情况测试：** 除以零行为、最大移位量、整数溢出/下溢、非对齐内存访问、x0 写入尝试。\n**语义差距文档化：** 生成正式的差距分析文档，覆盖所有 8 个已识别的语义差距，精确表征每个差距对等价性声明的影响。\n**指令覆盖矩阵：** 更新 158 条操作码映射表，标注证明状态（已证明 / 部分证明 / 仅差分测试 / 未覆盖）和测试状态（通过 / 失败 / 跳过 / 不适用）。\n#### 5.8 Phase 7：文档、交付物与路线图（Week 8）\n**洁净室构建测试：** 从头克隆，`make all` 成功，所有 Coq 证明编译通过，差分测试通过。\n**结项报告：** 已证明定理数量、完全覆盖的指令数量、通过的差分测试数量、完整差距分析、剩余 `Admitted` 证明及理由。\n**Phase 2+ 路线图：** 完整 RV64I 覆盖（约 50 条指令）、M/C/B 扩展证明、ASM 模式验证（需要 Islaris 级工具，约 6+ 个月，$20k+ 规模）。\n**代码质量：** `cargo fmt`、`cargo clippy`、移除临时文件、确保 CI 就绪。\n### 六、预期交付成果\n#### 6.1 核心交付物\n1. 
**Coq 证明库（coq/）** —— 10+ 条核心 RISC-V 指令的形式化等价性定理，覆盖 RV64I ALU 操作、移位、分支、跳转及至少一条 M 扩展指令。每条指令包含三个子定理（语义、PC 更新、寄存器隔离）。所有证明由 Coq 机器检查——零信任需求。\n2. **差分测试框架（crates/diff-test/）** —— 在 CKB-VM 和 Sail C++ 模拟器上执行相同 ELF 二进制文件并逐步比较执行轨迹的 CLI 工具。支持 `--verbose`、`--json`、`--max-steps`、`--test-dir` 参数。报告首个分歧点及完整状态转储。\n3. **CKB-VM 指令映射文档** —— 所有 158 条 CKB-VM 操作码到 Sail RISC-V 对应项的完整映射，按扩展分类，标注证明/测试状态。\n4. **形式化语义差距分析** —— 严格记录 CKB-VM 与标准 RISC-V 之间所有 8 个已识别语义差距，精确表征对等价性声明的影响及缓解策略。\n5. **方法论文档（doc/）** —— 架构指南和证明方法论文档，使未来贡献者能够按照既定模式为额外指令添加证明，无需重新学习整个框架。\n6. **源码可复现构建** —— 单条 `make all` 命令构建全部内容（Coq 证明 + Sail 模拟器 + Rust 差分测试工具）。文档化的工具链要求和安装脚本。\n7. **Phase 2+ 路线图** —— 扩展至完整 RV64I、M/C/B 扩展和 ASM 模式验证的详细计划，适用于 Community Fund DAO 规模的提案。\n#### 6.2 验收标准\n##### 形式化验证标准\n- V-1：`make coq` 在洁净环境中编译所有 Coq 证明，无错误或警告。\n- V-2：至少 10 条 RISC-V 指令具有完整的三定理证明（语义 + PC + 隔离），零 `Admitted`。\n- V-3：至少一条 M 扩展指令（MUL）被形式化证明。\n- V-4：所有辅助引理（`truncate_64_idempotent`、`x0_always_zero`、`get_set_reg_*`）完全证明。\n- V-5：状态等价关系被形式化定义并在所有证明中一致使用。\n##### 差分测试标准\n- T-1：差分测试在完整 rv64ui 测试套件（55 个测试）上通过。\n- T-2：差分测试在 rv64um 测试套件（13 个测试）上通过。\n- T-3：diff-test 工具正确检测并报告注入的语义偏差（负面测试）。\n- T-4：`--json` 输出为 `jq` 可解析的有效 JSON。\n- T-5：边缘情况测试（除以零、最大移位、溢出）产生匹配的轨迹。\n##### 文档标准\n- D-1：完整的 158 条操作码映射表，标注证明/测试状态。\n- D-2：语义差距分析覆盖所有 8 个已识别差距及影响评估。\n- D-3：洁净室构建测试：全新克隆 → `make all` 成功 → `make test` 通过。\n- D-4：中英双语 README，含快速入门说明。\n### 七、所需资金及用途说明\n**申请总额：** 1,000 USD\n**支付方式：** 100% CKB\n| 类别 | 金额 | 说明 |\n|------|------|------|\n| 云服务器 | $350 USD | 1 台 VPS（Linux，≥ 4 核 16GB 内存），用于 Coq 编译（内存密集型）、Sail 模拟器构建和差分测试执行。8 周使用。 |\n| 开发者补贴 | $450 USD | 核心开发工作。预计每周 20–30 小时，共 8 周。涵盖 Coq 证明开发、Rust 差分测试框架和 Sail 集成。 |\n| 文档与社区 | $200 USD | 中英双语文档编写、架构图制作、2 次月度分享会材料、结项报告和 Phase 2+ 路线图准备。 |\n### 八、预计完成时间\n**总周期：** 8 周（约 2 个月）\n#### 第一阶段：基础设施与可行性验证（Week 1–3）\n**Week 1：** 完成工具链安装（Sail + Coq + CMake + Rust + RISC-V 交叉编译器）。克隆 sail-riscv 子模块。从 Sail 生成 Coq。构建 Sail C++ 模拟器。构建 Rust 工作空间。创建 158 条操作码指令映射表。\n**Week 2：** 实现差分测试框架——CKB-VM 逐步执行器、Sail 轨迹解析器、比较引擎、支持 
`--verbose`/`--json`/`--max-steps` 的 CLI 工具。处理 riscv-tests 与 CKB-VM 之间的 ECALL 差异。在 rv64ui 套件上运行初始差分测试。\n**Week 3：** 研究 Sail 生成的 Coq（约 30,000 行）。定义状态等价关系。构建证明基础设施和可重用策略。创建最小测试导入（`Check execute_RISCV_ADD`）。记录 Sail Coq 桥接策略。**最具挑战性的一周**——调试导入、名称冲突、库依赖。\n**里程碑 1（Week 3 末）：** 差分测试框架可运行，rv64ui 测试通过。Coq 基础设施可针对 Sail 生成的定义进行编译。指令映射表完成。\n#### 第二阶段：形式化证明（Week 4–6）\n**Week 4：** 证明 7 条核心 ALU 指令（ADD、SUB、ADDI、SLLI、SRLI、SRAI、MUL），每条提供完整的三定理覆盖。构建可重用引理库。\n**Week 5：** 证明控制流指令（BEQ 跳转/不跳转、JAL 链接+跳转）。构建内存引理。证明 LW、SW。提交中期进度报告。\n**Week 6：** 证明额外 ALU 指令（AND、OR、XOR、SLTI、JALR）。验证 MOP 融合等价性（WIDE_MUL = MULH + MUL）。审计所有 `Admitted` 证明。\n**里程碑 2（Week 6 末）：** 10+ 条指令具有完整证明。MOP 融合已验证。中期报告已提交。\n#### 第三阶段：集成与交付（Week 7–8）\n**Week 7：** 在 rv64ui/rv64um/rv64uc/rv64ua 上进行完整差分测试覆盖。边缘情况测试。生成正式语义差距分析文档。更新指令覆盖矩阵。\n**Week 8：** 洁净室构建测试。中英双语文档。代码清理（fmt、clippy）。结项报告。Phase 2+ 路线图。社区分享会。最终提交。\n**里程碑 3（Week 8 末）：** 所有交付物提交——Coq 证明库、差分测试框架、指令映射、差距分析、方法论文档和路线图。\n#### 时间线总览\n| 阶段 | 周次 | 重点 | 里程碑 |\n|------|------|------|--------|\n| 第一阶段：基础设施 | Week 1–3 | 工具链、差分测试框架、Coq 基础设施、指令映射 | 里程碑 1 |\n| 第二阶段：形式化证明 | Week 4–6 | 核心 ALU 证明、控制流、内存、MOP、证明审计 | 里程碑 2（Week 6） |\n| 第三阶段：集成与交付 | Week 7–8 | 完整测试覆盖、差距分析、文档、洁净室构建、提交 | 里程碑 3 |\n### 九、与 CKB 生态的关联性\n**回应长期开放的社区需求。** [ckb-vm#190](https://github.com/nervosnetwork/ckb-vm/issues/190) 自 2021 年开放至今，请求对 CKB-VM 的 RISC-V 实现进行形式化验证。本项目直接交付概念验证解决方案。\n**加固 CKB 的安全基石。** CKB-VM 执行 Nervos 网络上的每一笔交易。经过形式化验证的指令集消除了一整类潜在漏洞——测试无法完全防止的指令级语义缺陷。这对于安全模型依赖确定性执行的区块链而言至关重要。\n**建立可复用的验证框架。** Coq 证明基础设施、差分测试工具和方法论文档均为扩展而设计。未来贡献者可按照既定模式为额外指令添加证明，无需重新学习框架。\n**展示 CKB 的技术纵深。** 区块链虚拟机的形式化验证在行业中罕见。CompCert 验证了 C 编译器；seL4 验证了操作系统内核；本项目将同等严谨性带入 CKB 的执行层。这将 CKB 定位于技术最具雄心的区块链项目之列。\n**为未来全 ISA 验证铺路。** 本 PoC 覆盖 10+ 条指令并建立方法论。Phase 2+ 路线图（完整 RV64I、扩展、ASM 模式）为 Community Fund DAO 提案规模的全面验证提供了清晰路径。\n**纯 Rust + Coq 工具链。** 差分测试框架为纯 Rust，与 CKB 技术栈一致。CKB 社区的 Rust 开发者无需学习新语言即可贡献（Coq 证明是专业组件，但测试框架对所有人开放）。\n### 十、技术风险与应对\n| 风险 | 影响 | 概率 | 应对 |\n|------|------|------|------|\n| Sail Coq 输出结构在版本间变化 | 高 | 低 
| 锁定 sail-riscv 到特定 commit。记录精确的 Sail 编译器版本。提供 `generate_coq.sh` 用于可复现的重新生成。 |\n| Sail 单子 Coq 过于复杂，难以与 CKB-VM 纯模型桥接 | 高 | 中 | Week 3 专门用于此任务。退路：提取执行后的可观测状态而非证明步骤内部等价性。最坏情况：针对简化的 Sail 提取物而非原始单子输出进行证明。 |\n| Coq 编译时间超出开发迭代速度 | 中 | 中 | 生成的 Coq 约 30K 行。使用 `_CoqProject` 进行选择性编译。增量开发证明。16GB 内存的云服务器支持并行 coqc。 |\n| CKB-VM Rust 语义与 Coq 模型偏差 | 中 | 中 | 差分测试捕获运行时偏差。将 Coq 模型与实际 Rust 源码交叉参照。记录所有抽象差距。 |\n| riscv-tests ECALL 约定与 CKB-VM 不兼容 | 中 | 低 | 已识别并已设计解决方案：CKB-VM 使用 syscall 93，riscv-tests 使用 `tohost`。Sail runner 过滤 ECALL 差异。 |\n| 时间不足以完成 10+ 条指令证明 | 中 | 中 | 优先处理证明结构最简单的核心 ALU 指令（ADD、SUB、ADDI）。尽早构建可重用策略。接受复杂指令（SW、JALR）的部分证明（记录 Admits）。 |\n| Sail C++ 模拟器在目标平台上构建失败 | 低 | 低 | CMake 构建在 sail-riscv 中有良好文档。提供带错误处理的 `build_sail_emulator.sh`。GMP 是唯一非平凡依赖。 |\n| 生成的 Coq 需要不可用的 Coq 库 | 低 | 中 | 按 sail-riscv 项目文档使用 `coq-sail-stdpp`。在构建说明中锁定精确的 Coq 版本（9.0.0）。 |\n### 十一、透明度承诺\n**Day 1 起完全开源。** 所有代码在 GitHub 上以 MIT 许可证公开，从开发伊始即为公开仓库。\n**每周进度更新。** 每周在 Nervos Talk 论坛发布，涵盖已完成任务、阻塞项和下周计划。\n**月度分享会。** 共两次（Week 4 和 Week 8），最后一次包括 Coq 证明编译和差分测试执行的实时演示及问答环节。\n**机器可验证的结果。** 所有 Coq 证明由机器检查——任何人都可以克隆仓库、运行 `make coq`，独立验证每一个定理。不需要信任作者。\n**如实报告局限性。** 结项报告将明确记录：`Admitted` 证明的数量及原因、阻碍完整等价性声明的语义差距、以及尚未覆盖的指令。\n**从头可复现。** 完整的构建说明、锁定的依赖版本和安装脚本，确保任何评审者都能独立复现所有结果。",
          "content_html": "<p><strong># [Spark Program] CKB-VM Sail Formal Verification — Proving CKB-VM RISC-V Instruction Equivalence via Sail Specification and Coq Theorem Prover / CKB-VM Sail 形式化验证 — 基于 Sail 规范与 Coq 定理证明器的 CKB-VM RISC-V 指令等价性证明</strong></p>\n<p>-–</p>\n<p><strong>## English Version</strong></p>\n<p><strong>### 1. Project Name and Summary</strong></p>\n<p><strong>**Project Name:**</strong> ckb-vm-sail-verify</p>\n<p><strong>**One-line Summary:**</strong> Formally verify that CKB-VM’s RISC-V instruction execution semantics are mathematically equivalent to the official Sail RISC-V specification, using Coq theorem proofs and differential testing as dual verification.</p>\n<p><strong>### 2. Team / Individual Introduction</strong></p>\n<p><strong>**Applicant:**</strong> Tinyueng ([GitHub](<a href=\"https://github.com/TinyuengKwan\" class=\"inline-onebox\" rel=\"noopener nofollow ugc\">TinyuengKwan (Tinyueng) · GitHub</a>))</p>\n<p><strong>**Core Competencies:**</strong> Currently interning at PLCT Lab (Institute of Software, Chinese Academy of Sciences) on the <strong>**Sail &amp; ACT (RISC-V Architectural Certification Tests) team**</strong>, with direct, hands-on experience in the Sail architecture definition language and RISC-V conformance testing. Contributed to the [sail-riscv](<a href=\"https://github.com/riscv/sail-riscv\" class=\"inline-onebox\" rel=\"noopener nofollow ugc\">GitHub - riscv/sail-riscv: Sail RISC-V model · GitHub</a>) ecosystem and developed [sail-lsp](<a href=\"https://github.com/TinyuengKwan/sail-lsp\" class=\"inline-onebox\" rel=\"noopener nofollow ugc\">GitHub - TinyuengKwan/sail-lsp · GitHub</a>), a Language Server Protocol implementation for Sail. Familiar with the Sail compiler toolchain (including the `–coq` backend for theorem prover integration), the sail-riscv formal model structure, and the RISC-V Architectural Certification Test framework. 
Possesses strong Rust systems programming capabilities (ownership, lifetimes, trait system). Systematic study of CS:APP with a complete knowledge framework spanning processor architecture, virtual memory, ELF linking, and system-level I/O. Additionally studied compiler theory and x86 assembly.</p>\n<p><strong>**Relevant Domain Knowledge:**</strong> Deep familiarity with the Sail formal specification language — not just as a user, but as a tooling contributor (sail-lsp). Hands-on experience with RISC-V Architectural Certification Tests (ACT), understanding how conformance is validated against the Sail reference model. Familiar with RISC-V ISA architecture (RV64IMAC, privilege levels, extension mechanisms). Experience with the Coq interactive theorem prover and Sail’s Coq backend output. Understanding of CKB-VM internals through source code analysis of nervosnetwork/ckb-vm, including its versioned instruction semantics, MOP (Macro-Op Fusion) extensions, and flat memory model.</p>\n<p><strong>### 3. Problem Description</strong></p>\n<p>CKB-VM is the RISC-V virtual machine that powers all on-chain computation in the Nervos CKB network. Every CKB transaction — every lock script, type script, and smart contract — executes inside CKB-VM. Its correctness is the security foundation of the entire chain. Yet today, there is no formal proof that CKB-VM faithfully implements the RISC-V specification.</p>\n<p><strong>**No formal verification of instruction semantics.**</strong> CKB-VM implements 158 opcodes across RV64I, M, C (Zca), B (Zba/Zbb/Zbc/Zbs), A, and custom MOP extensions. Each instruction is hand-written in Rust. While the implementation is high-quality and battle-tested, no mathematical proof exists that any single instruction behaves identically to the RISC-V specification. 
This is <a href=\"https://github.com/nervosnetwork/ckb-vm/issues/190\" class=\"inline-onebox\" rel=\"noopener nofollow ugc\">nervosnetwork/ckb-vm#190</a>, open since 2021.</p>\n<p><strong>Testing alone cannot guarantee correctness.</strong> CKB-VM passes the standard riscv-tests suite, but test suites are finite — they cannot cover all possible input states, edge cases, or corner-case interactions. A subtle semantic divergence (e.g., incorrect sign extension on a rare operand combination) could go undetected for years until exploited.</p>\n<p><strong>No authoritative ground truth for comparison.</strong> The RISC-V ISA specification is written in natural language (English prose), which is inherently ambiguous. Different implementors may interpret the same sentence differently. Without a machine-readable formal specification as the reference, “correctness” remains a matter of human judgment.</p>\n<p><strong>Sail RISC-V solves the ground truth problem.</strong> RISC-V International has adopted Sail as the official formal specification language for RISC-V. The <a href=\"https://github.com/riscv/sail-riscv\" class=\"inline-onebox\" rel=\"noopener nofollow ugc\">sail-riscv</a> model is the machine-readable, executable, mathematically precise definition of RISC-V instruction semantics. Crucially, Sail can compile to Coq — enabling theorem proving directly against the official specification.</p>\n<p><strong>Semantic gaps between CKB-VM and standard RISC-V.</strong> CKB-VM is not a generic RISC-V implementation. It has a flat 4MB memory model (no MMU/virtual memory), custom ECALL handling (syscall 93 for exit, not M-mode trap), three VERSION modes with different behavioral semantics, cycle-counting per instruction, and MOP fusion instructions that have no counterpart in the standard specification. 
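</p>
<p>To make one of these gaps concrete: a MOP fusion op such as WIDE_MUL can still be characterized against the standard ISA, because it is intended to behave exactly like a sequence of two standard instructions. A minimal Rust sketch of that intended equivalence (helper names and signatures are illustrative, not CKB-VM’s actual API):</p>

```rust
// Illustrative sketch (not CKB-VM code): WIDE_MUL as the fusion of MULH + MUL.
// RV64M semantics: MUL keeps the low 64 bits, MULH the high 64 bits of the
// signed 64x64 -> 128-bit product.
fn mul(rs1: u64, rs2: u64) -> u64 {
    rs1.wrapping_mul(rs2)
}
fn mulh(rs1: u64, rs2: u64) -> u64 {
    (((rs1 as i64 as i128) * (rs2 as i64 as i128)) >> 64) as u64
}
// Hypothetical fused step: one 128-bit multiply producing both destination values.
fn wide_mul(rs1: u64, rs2: u64) -> (u64, u64) {
    let p = (rs1 as i64 as i128) * (rs2 as i64 as i128);
    ((p >> 64) as u64, p as u64)
}

fn main() {
    for (a, b) in [(3u64, 5u64), (u64::MAX, 2), (0x8000_0000_0000_0000, 3)] {
        assert_eq!(wide_mul(a, b), (mulh(a, b), mul(a, b)));
    }
}
```

<p>Sampling operands like this is what differential testing does; the Coq proofs aim to establish the same equality for all operands.</p>
<p>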
These gaps must be formally documented and their impact on equivalence precisely characterized.</p>\n<p><strong>Real-world impact:</strong> A formally verified CKB-VM would provide the strongest possible correctness guarantee for CKB’s execution layer. It would eliminate an entire class of potential vulnerabilities — instruction-level semantic bugs — that no amount of testing can fully prevent. For a blockchain whose security model depends entirely on deterministic execution, this is a foundational investment.</p>\n<p><strong>### 4. Solution</strong></p>\n<p><strong>#### 4.1 Core Approach</strong></p>\n<p>ckb-vm-sail-verify employs a dual verification strategy: <strong>Coq formal proofs</strong> for mathematical certainty and <strong>differential testing</strong> for practical runtime validation. The two approaches are complementary — formal proofs cover all possible inputs for proven instructions, while differential testing validates the full instruction set against concrete test vectors.</p>\n<p><strong>#### 4.2 Why Sail + Coq</strong></p>\n<p><strong>Sail is the official RISC-V specification.</strong> Unlike informal prose specifications, Sail is executable, unambiguous, and maintained by RISC-V International. Using Sail as ground truth means we verify against the same specification that hardware vendors use.</p>\n<p><strong>Sail compiles to Coq.</strong> The Sail compiler’s `--coq` backend generates Coq definitions from Sail source. This gives us machine-checkable RISC-V semantics inside Coq — the same theorem prover used by CompCert (verified C compiler) and seL4 (verified OS kernel).</p>\n<p><strong>Coq proofs are total.</strong> Once a Coq theorem is proved, it holds for all possible inputs — not just test vectors. A proven `ADD` instruction is correct for every combination of source registers and machine states, unconditionally.</p>\n<p><strong>Reproducible and auditable.</strong> Coq proofs are machine-checked. 
Anyone can re-run `make coq` and verify every theorem independently. No trust required in the prover — only in Coq’s kernel, which is one of the most scrutinized pieces of software in existence.</p>\n<p><strong>#### 4.3 Four-Layer Verification Architecture</strong></p>\n<p><strong>Layer 1: Sail RISC-V Formal Specification (Ground Truth)</strong></p>\n<p>The official sail-riscv model, compiled to Coq using `sail --coq`. Generates ~30,000 lines of Coq defining the complete RV64 instruction semantics in a monadic state transformer style. This is the authoritative reference — what RISC-V <em>should</em> do.</p>\n<p><strong>Layer 2: CKB-VM Coq Model (Verification Target)</strong></p>\n<p>A hand-written Coq model of CKB-VM’s Rust interpreter logic, capturing register file operations (32 × 64-bit, x0 hardwired to zero), PC advancement, ALU computations, memory access (little-endian, flat 4MB), and branch/jump semantics. Based on `handle_*` functions in ckb-vm’s `execute.rs`. This is what CKB-VM <em>actually does</em>.</p>\n<p><strong>Layer 3: Equivalence Proofs (Mathematical Bridge)</strong></p>\n<p>For each target instruction, prove: `ckb_vm_execute(I, state) = sail_execute(I, state)`. Each proof decomposes into three sub-theorems: (a) <strong>Semantics</strong> — destination register has the correct computed value; (b) <strong>PC update</strong> — next PC is PC+4 (or PC+offset for branches/jumps); (c) <strong>Register isolation</strong> — all other registers are unchanged. Proof technique: unfold definitions → rewrite with helper lemmas → solve with `lia`/`reflexivity`.</p>\n<p><strong>Layer 4: Differential Testing (Runtime Validation)</strong></p>\n<p>Execute identical RISC-V ELF binaries on both CKB-VM (Rust) and the Sail C++ emulator, then compare execution traces step-by-step — PC values, register states, and exit codes. 
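</p>
<p>The step-by-step comparison at the heart of this layer can be sketched in a few lines of Rust (the StepState shape follows the plan in §5.2; the exact fields, names, and trace format are assumptions, not the final API):</p>

```rust
// Sketch of Layer 4 trace comparison: find the first step where two traces diverge.
#[derive(Clone, PartialEq)]
struct StepState {
    pc: u64,
    regs: [u64; 32], // x0..x31
}

fn main() {
    let base = StepState { pc: 0x8000_0000, regs: [0; 32] };
    let mut faulty = base.clone();
    faulty.regs[10] = u64::MAX; // inject a divergence in x10 at step 1

    let ckb_trace = vec![base.clone(), base.clone()];
    let sail_trace = vec![base.clone(), faulty];

    // position() yields the index of the first mismatching step, or None.
    let first_mismatch = ckb_trace
        .iter()
        .zip(sail_trace.iter())
        .position(|(a, b)| a != b);
    assert_eq!(first_mismatch, Some(1));

    // identical traces compare clean
    assert_eq!(ckb_trace.iter().zip(ckb_trace.iter()).position(|(a, b)| a != b), None);
}
```

<p>The real tool would additionally compare trace lengths and exit codes, and report the diverging PC and register index as described in §5.2.</p>
<p>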
Complements formal proofs by covering the full instruction set (including instructions not yet formally proven) and catching implementation bugs that might not be visible in the Coq model.</p>\n<p><strong>#### 4.4 State Equivalence Definition</strong></p>\n<p>The core challenge is bridging Sail’s monadic Coq output (stateful monad transformer with effects) and CKB-VM’s pure functional Coq model. We define state equivalence as:</p>\n<pre><code>state_equiv(ckb_state, sail_state) :=\n  ∀ i ∈ [0, 31], get_reg(ckb_state, i) = extract_reg(sail_state, i)\n  ∧ ckb_state.pc = extract_pc(sail_state)\n  ∧ ∀ addr ∈ [0, 4MB), load_byte(ckb_state, addr) = extract_mem(sail_state, addr)\n</code></pre>\n<p>This strips away Sail’s monad and CKB-VM’s cycle accounting, comparing only the observable architectural state that must agree.</p>\n<p><strong>#### 4.5 CKB-VM Specific Considerations</strong></p>\n<p>| Aspect | CKB-VM Behavior | Sail RISC-V Behavior | Handling |</p>\n<p>|--------|-----------------|---------------------|----------|</p>\n<p>| x0 register | Write then clear to zero | Discard writes silently | Functionally equivalent — prove in Coq |</p>\n<p>| ECALL | Dispatch via A7 (syscall 93 = exit) | Trap to M-mode handler | Filter in diff-test; exclude from Coq scope |</p>\n<p>| Memory model | Flat 4MB, no MMU, W^X | Sv39/48/57 full MMU | CKB-VM simplified model; prove within 4MB subset |</p>\n<p>| FENCE | No-op (single-threaded) | Full fence semantics | No observable difference in single-hart model |</p>\n<p>| Atomics (A ext) | Single address reservation | Reservation set | Limited concurrency; prove single-hart subset |</p>\n<p>| VERSION modes | Three modes (0/1/2) | None | Target VERSION2 only |</p>\n<p>| Cycle counting | Per-instruction cost tracking | None | Filter out; not part of architectural state |</p>\n<p>| MOP fusion | WIDE_MUL, FAR_JUMP_REL, ADC | No counterpart | Verify as compositions of standard instructions |</p>\n<p><strong>#### 4.6 Proof 
Example</strong></p>\n<pre><code>(* For ADD instruction: rd = rs1 + rs2 (mod 2^64), PC += 4 *)\nTheorem ckb_add_semantics : forall rd rs1 rs2 st,\n  rd &lt;&gt; 0 -&gt;\n  get_reg (ckb_add rd rs1 rs2 st) rd =\n    truncate_64 (get_reg st rs1 + get_reg st rs2).\nProof.\n  intros. unfold ckb_add, set_reg, get_reg.\n  destruct (Nat.eq_dec rd rd); [| contradiction].\n  apply truncate_64_mod.\nQed.\n\nTheorem ckb_add_pc : forall rd rs1 rs2 st,\n  (ckb_add rd rs1 rs2 st).(pc) = st.(pc) + 4.\nProof. intros. unfold ckb_add. simpl. lia. Qed.\n\nTheorem ckb_add_isolation : forall rd rs1 rs2 st r,\n  r &lt;&gt; rd -&gt;\n  get_reg (ckb_add rd rs1 rs2 st) r = get_reg st r.\nProof.\n  intros. unfold ckb_add, set_reg, get_reg.\n  destruct (Nat.eq_dec r rd); [contradiction | reflexivity].\nQed.\n</code></pre>\n<p><strong>### 5. Detailed Technical Implementation Plan</strong></p>\n<p><strong>#### 5.1 Phase 0: Environment Setup and Instruction Mapping (Week 1)</strong></p>\n<p>Install the complete toolchain: Sail (&gt;= 0.20.x via OPAM), Coq (9.0.0 + coq-sail-stdpp), CMake, Rust (1.92.0+), GMP, and RISC-V cross-compiler (`gcc-riscv64-unknown-elf`). Clone sail-riscv as a git submodule. Build the Rust workspace (`lib/` + `crates/diff-test/`). Generate Coq from Sail via `sail --coq --dcoq-undef-axioms`. 
Build the Sail C++ emulator for differential testing.</p>\n<p>Create the complete CKB-VM instruction mapping table: map all 158 CKB-VM opcodes to their Sail RISC-V counterparts, classify by extension (I/M/C/A/B/MOP), and identify instructions with no Sail equivalent (MOP fusion ops).</p>\n<p><strong>#### 5.2 Phase 1: Differential Testing Framework (Week 2)</strong></p>\n<p>Implement the diff-test CLI tool with step-by-step execution trace comparison:</p>\n<p><strong>CKB-VM side (`lib/src/runner.rs`):</strong> Wrap CKB-VM in a step-by-step executor that captures `StepState` (PC + 32 registers) after each instruction. Handle exit syscall (syscall 93 via A7 register). Support `--max-steps` to prevent infinite loops.</p>\n<p><strong>Sail side (`crates/diff-test/src/sail_runner.rs`):</strong> Parse the Sail C++ emulator’s trace output format. Extract PC and register values per step. Handle ECALL divergence (riscv-tests use `tohost` memory-mapped I/O; CKB-VM uses syscall).</p>\n<p><strong>Comparison engine (`lib/src/state.rs`):</strong> `CompareResult` with `first_mismatch` detection — reports the exact step, PC, and register where traces diverge. Support `--verbose` mode for full trace dump and `--json` for machine-readable output.</p>\n<p><strong>Test corpus:</strong> Auto-discover ELF test artifacts from riscv-tests (rv64ui, rv64um, rv64uc, rv64ua), riscv-arch-test, and CKB-VM’s own test suite.</p>\n<p><strong>#### 5.3 Phase 2: Coq Infrastructure and Sail Interface (Week 3)</strong></p>\n<p>This is the most challenging phase — bridging Sail’s monadic Coq output with CKB-VM’s pure model.</p>\n<p><strong>Study generated Coq (~30,000 lines):</strong> Understand Sail’s state monad encoding, register access patterns, bitvector arithmetic library, and instruction decode/execute structure. 
Identify the Coq functions corresponding to each RISC-V instruction (e.g., `execute_RISCV_ADD`).</p>\n<p><strong>Define state equivalence relation:</strong> Formalize the mapping between CKB-VM’s `machine_state` record and Sail’s monadic state. Handle type mismatches (Sail uses bitvectors; CKB-VM model uses Z integers).</p>\n<p><strong>Build proof infrastructure:</strong> Create reusable Coq tactics (`solve_truncate`, `simplify_regs`, `unfold_sail_step`) that automate common proof patterns. Test with a minimal import: `Check execute_RISCV_ADD`.</p>\n<p><strong>Document the bridging strategy:</strong> Record Sail’s Coq structure, naming conventions, and the exact approach for extracting observable state from monadic computations.</p>\n<p><strong>#### 5.4 Phase 3: Core ALU Instruction Proofs (Week 4)</strong></p>\n<p>Prove 7 core instructions with full three-theorem coverage (semantics + PC + register isolation):</p>\n<p>| Instruction | Type | Key Proof Challenge |</p>\n<p>|-------------|------|-------------------|</p>\n<p>| ADD | R-type | `truncate_64` idempotence, x0 hardwiring |</p>\n<p>| SUB | R-type | Unsigned subtraction mod 2^64 |</p>\n<p>| ADDI | I-type | Sign extension of 12-bit immediate |</p>\n<p>| SLLI | Shift | Shift amount masking (lower 6 bits for RV64) |</p>\n<p>| SRLI | Shift | Logical vs. arithmetic right shift distinction |</p>\n<p>| SRAI | Shift | Sign bit preservation across arithmetic shift |</p>\n<p>| MUL | M-ext | 64-bit multiply, lower 64 bits of 128-bit product |</p>\n<p>Build reusable lemma library: `truncate_64_idempotent`, `x0_always_zero`, `get_set_reg_same`, `get_set_reg_diff`, `sign_extend_properties`.</p>\n<p><strong>#### 5.5 Phase 4: Control Flow and Memory Proofs (Week 5)</strong></p>\n<p><strong>Branch instructions:</strong></p>\n<p>- BEQ: Prove both taken path (PC += offset) and not-taken path (PC += 4). 
Handle sign-extended branch offset.</p>\n<p>- Additional branches (BNE, BLT, BGE, BLTU, BGEU) follow the same pattern.</p>\n<p><strong>Jump instructions:</strong></p>\n<p>- JAL: Prove link address (rd = PC + 4) and jump target (PC = PC + offset). Handle x0 case (no link).</p>\n<p>- JALR: Prove indirect jump with LSB cleared.</p>\n<p><strong>Memory instructions:</strong></p>\n<p>- Build memory lemmas: store/load roundtrip (`load(store(addr, val)) = val`), byte-ordering (little-endian), alignment.</p>\n<p>- LW: Prove sign-extended 32-bit load.</p>\n<p>- SW: Prove 32-bit store (may partially Admit if byte-level memory model complexity exceeds time budget).</p>\n<p><strong>#### 5.6 Phase 5: Extended Proofs and MOP Verification (Week 6)</strong></p>\n<p><strong>Additional ALU proofs:</strong> AND, OR, XOR, SLTI, SLTIU, LUI, AUIPC.</p>\n<p><strong>MOP fusion verification:</strong> CKB-VM’s custom MOP instructions are implemented as multi-instruction fusions. Verify that:</p>\n<p>- `WIDE_MUL(rd1, rd2, rs1, rs2)` = sequential `MULH(rd1, rs1, rs2); MUL(rd2, rs1, rs2)`</p>\n<p>- `FAR_JUMP_REL(rd, offset)` = `AUIPC(rd, upper); JALR(rd, rd, lower)`</p>\n<p><strong>Proof audit:</strong> Review all `Admitted` lemmas. Attempt to close as many as possible. Document remaining admits with clear justification.</p>\n<p><strong>Target:</strong> 10+ core instructions fully proved (zero Admitted), 5+ additional instructions with partial proofs.</p>\n<p><strong>#### 5.7 Phase 6: Full Differential Test Coverage and Gap Analysis (Week 7)</strong></p>\n<p><strong>Per-extension testing:</strong> Run differential tests across rv64ui (integer), rv64um (multiply/divide), rv64uc (compressed), rv64ua (atomic) test suites. 
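</p>
<p>rv64um is worth singling out: the ISA’s non-trapping division corner cases are a classic source of divergence between RISC-V implementations. A Rust sketch of the spec-mandated RV64 DIV/REM results that both sides must agree on (helper names are mine, for illustration):</p>

```rust
// Spec-mandated RV64 DIV/REM corner cases (RISC-V M extension): division never
// traps; divide-by-zero and signed overflow have fixed, well-defined results.
fn div(rs1: i64, rs2: i64) -> i64 {
    if rs2 == 0 { -1 } else { rs1.wrapping_div(rs2) }
}
fn rem(rs1: i64, rs2: i64) -> i64 {
    if rs2 == 0 { rs1 } else { rs1.wrapping_rem(rs2) }
}

fn main() {
    assert_eq!(div(42, 0), -1);              // DIV by zero yields all-ones
    assert_eq!(rem(42, 0), 42);              // REM by zero yields the dividend
    assert_eq!(div(i64::MIN, -1), i64::MIN); // signed overflow yields the dividend
    assert_eq!(rem(i64::MIN, -1), 0);        // signed overflow yields zero remainder
}
```

<p>A VM that traps, panics, or returns anything else on these inputs would diverge from the Sail model immediately, which is exactly what the edge-case corpus is designed to surface.</p>
<p>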
Add `--extension` CLI filter for targeted runs.</p>\n<p><strong>Edge case tests:</strong> Division by zero behavior, maximum shift amounts, integer overflow/underflow, misaligned memory access, x0 write attempts.</p>\n<p><strong>Semantic gap documentation:</strong> Produce a formal gap analysis document covering all 8 identified semantic gaps (x0 handling, ECALL, memory model, FENCE, atomics, VERSION modes, cycle counting, MOP), with precise characterization of each gap’s impact on equivalence claims.</p>\n<p><strong>Instruction coverage matrix:</strong> Update the 158-opcode mapping table with proof status (Proved / Partially Proved / Diff-Test Only / Not Covered) and test status (Pass / Fail / Skipped / N/A).</p>\n<p><strong>#### 5.8 Phase 7: Documentation, Deliverables, and Roadmap (Week 8)</strong></p>\n<p><strong>Clean-room build test:</strong> Clone from scratch, `make all` succeeds, all Coq proofs compile, differential tests pass.</p>\n<p><strong>Completion report:</strong> Number of theorems proved, number of instructions with full coverage, number of differential tests passing, complete gap analysis, remaining `Admitted` proofs with justification.</p>\n<p><strong>Phase 2+ roadmap:</strong> Full RV64I coverage (~50 instructions), M/C/B extension proofs, ASM-mode verification (requiring Islaris-level tooling, ~6+ months, $20k+ scope).</p>\n<p><strong>Code quality:</strong> `cargo fmt`, `cargo clippy`, remove temporary files, ensure CI-ready build.</p>\n<p><strong>### 6. Expected Deliverables</strong></p>\n<p><strong>#### 6.1 Core Deliverables</strong></p>\n<p>1. <strong>Coq proof library (coq/)</strong> — Formally proved equivalence theorems for 10+ core RISC-V instructions, covering RV64I ALU operations, shifts, branches, jumps, and at least one M-extension instruction. Each instruction proved with three sub-theorems (semantics, PC update, register isolation). All proofs machine-checked by Coq — zero trust required.</p>\n<p>2. 
<strong>Differential testing framework (crates/diff-test/)</strong> — CLI tool that executes identical ELF binaries on CKB-VM and Sail C++ emulator, comparing execution traces step-by-step. Supports `--verbose`, `--json`, `--max-steps`, `--test-dir` flags. Reports first divergence point with full state dump.</p>\n<p>3. <strong>CKB-VM instruction mapping document</strong> — Complete mapping of all 158 CKB-VM opcodes to Sail RISC-V counterparts, classified by extension, with proof/test status annotations.</p>\n<p>4. <strong>Formal semantic gap analysis</strong> — Rigorous documentation of all 8 identified semantic gaps between CKB-VM and standard RISC-V, with precise characterization of impact on equivalence claims and mitigation strategies.</p>\n<p>5. <strong>Methodology documentation (doc/)</strong> — Architecture guide and proof methodology document enabling future contributors to extend proofs to additional instructions without re-learning the entire framework.</p>\n<p>6. <strong>Build-from-source reproducibility</strong> — Single `make all` command builds everything (Coq proofs + Sail emulator + Rust diff-test tool). Documented toolchain requirements and setup script.</p>\n<p>7. <strong>Phase 2+ roadmap</strong> — Detailed plan for extending to full RV64I, M/C/B extensions, and ASM-mode verification, suitable for Community Fund DAO-scale proposal.</p>\n<p><strong>#### 6.2 Concrete Output Examples</strong></p>\n<p><strong>##### Example A — Coq Proof Compilation Output</strong></p>\n<p>```</p>\n<p>$ make coq</p>\n<p>coqc -R . CkbVmVerify -R generated Riscv MachineState.v</p>\n<p>coqc -R . CkbVmVerify -R generated Riscv CkbVmModel.v</p>\n<p>coqc -R . 
CkbVmVerify -R generated Riscv InstructionEquiv.v</p>\n<p>Proved: ckb_add_semantics</p>\n<p>Proved: ckb_add_pc</p>\n<p>Proved: ckb_add_isolation</p>\n<p>Proved: ckb_sub_semantics</p>\n<p>Proved: ckb_sub_pc</p>\n<p>Proved: ckb_sub_isolation</p>\n<p>Proved: ckb_addi_semantics</p>\n<p>…</p>\n<p>────────────────────────────────────────</p>\n<p>Total: 33 theorems proved, 0 admitted.</p>\n<p>All proofs verified by Coq 9.0.0. <img src=\"https://talk.nervos.org/images/emoji/apple/white_check_mark.png?v=15\" title=\":white_check_mark:\" class=\"emoji\" alt=\":white_check_mark:\" loading=\"lazy\" width=\"20\" height=\"20\"></p>\n<p>```</p>\n<p>All theorems are machine-checked. Any modification that breaks a proof will cause `make coq` to fail — continuous verification by construction.</p>\n<p><strong>##### Example B — Differential Test Output (Pass)</strong></p>\n<p>```</p>\n<p>$ cargo run --release -p ckb-vm-diff-test – \\</p>\n<pre><code>--test-dir tests/rv64ui/ --sail-bin deps/sail-riscv/build/sail_riscv_sim\n</code></pre>\n<p>Running 55 differential tests (rv64ui)…</p>\n<p>rv64ui-p-add     … CKB-VM: 127 steps, Sail: 127 steps → <img src=\"https://talk.nervos.org/images/emoji/apple/white_check_mark.png?v=15\" title=\":white_check_mark:\" class=\"emoji\" alt=\":white_check_mark:\" loading=\"lazy\" width=\"20\" height=\"20\"> MATCH</p>\n<p>rv64ui-p-addi    … CKB-VM:  98 steps, Sail:  98 steps → <img src=\"https://talk.nervos.org/images/emoji/apple/white_check_mark.png?v=15\" title=\":white_check_mark:\" class=\"emoji\" alt=\":white_check_mark:\" loading=\"lazy\" width=\"20\" height=\"20\"> MATCH</p>\n<p>rv64ui-p-and     … CKB-VM: 112 steps, Sail: 112 steps → <img src=\"https://talk.nervos.org/images/emoji/apple/white_check_mark.png?v=15\" title=\":white_check_mark:\" class=\"emoji\" alt=\":white_check_mark:\" loading=\"lazy\" width=\"20\" height=\"20\"> MATCH</p>\n<p>rv64ui-p-andi    … CKB-VM:  95 steps, Sail:  95 steps → <img 
src=\"https://talk.nervos.org/images/emoji/apple/white_check_mark.png?v=15\" title=\":white_check_mark:\" class=\"emoji\" alt=\":white_check_mark:\" loading=\"lazy\" width=\"20\" height=\"20\"> MATCH</p>\n<p>rv64ui-p-auipc   … CKB-VM:  43 steps, Sail:  43 steps → <img src=\"https://talk.nervos.org/images/emoji/apple/white_check_mark.png?v=15\" title=\":white_check_mark:\" class=\"emoji\" alt=\":white_check_mark:\" loading=\"lazy\" width=\"20\" height=\"20\"> MATCH</p>\n<p>rv64ui-p-beq     … CKB-VM: 156 steps, Sail: 156 steps → <img src=\"https://talk.nervos.org/images/emoji/apple/white_check_mark.png?v=15\" title=\":white_check_mark:\" class=\"emoji\" alt=\":white_check_mark:\" loading=\"lazy\" width=\"20\" height=\"20\"> MATCH</p>\n<p>…</p>\n<p>────────────────────────────────────────</p>\n<p>Results: 55 passed, 0 failed, 0 skipped.</p>\n<p>All execution traces match. <img src=\"https://talk.nervos.org/images/emoji/apple/white_check_mark.png?v=15\" title=\":white_check_mark:\" class=\"emoji\" alt=\":white_check_mark:\" loading=\"lazy\" width=\"20\" height=\"20\"></p>\n<p>```</p>\n<p><strong>##### Example C — Differential Test Output (Divergence Detected)</strong></p>\n<p>```</p>\n<p>$ cargo run --release -p ckb-vm-diff-test – \\</p>\n<pre><code>--elf tests/edge/division_by_zero.elf --verbose\n</code></pre>\n<p>Step 1:  PC=0x80000000  CKB-VM <img src=\"https://talk.nervos.org/images/emoji/apple/white_check_mark.png?v=15\" title=\":white_check_mark:\" class=\"emoji\" alt=\":white_check_mark:\" loading=\"lazy\" width=\"20\" height=\"20\">  Sail <img src=\"https://talk.nervos.org/images/emoji/apple/white_check_mark.png?v=15\" title=\":white_check_mark:\" class=\"emoji\" alt=\":white_check_mark:\" loading=\"lazy\" width=\"20\" height=\"20\"></p>\n<p>Step 2:  PC=0x80000004  CKB-VM <img src=\"https://talk.nervos.org/images/emoji/apple/white_check_mark.png?v=15\" title=\":white_check_mark:\" class=\"emoji\" alt=\":white_check_mark:\" loading=\"lazy\" width=\"20\" 
height=\"20\">  Sail <img src=\"https://talk.nervos.org/images/emoji/apple/white_check_mark.png?v=15\" title=\":white_check_mark:\" class=\"emoji\" alt=\":white_check_mark:\" loading=\"lazy\" width=\"20\" height=\"20\"></p>\n<p>Step 3:  PC=0x80000008  CKB-VM <img src=\"https://talk.nervos.org/images/emoji/apple/white_check_mark.png?v=15\" title=\":white_check_mark:\" class=\"emoji\" alt=\":white_check_mark:\" loading=\"lazy\" width=\"20\" height=\"20\">  Sail <img src=\"https://talk.nervos.org/images/emoji/apple/white_check_mark.png?v=15\" title=\":white_check_mark:\" class=\"emoji\" alt=\":white_check_mark:\" loading=\"lazy\" width=\"20\" height=\"20\"></p>\n<p>Step 4:  PC=0x8000000c  <img src=\"https://talk.nervos.org/images/emoji/apple/cross_mark.png?v=15\" title=\":cross_mark:\" class=\"emoji\" alt=\":cross_mark:\" loading=\"lazy\" width=\"20\" height=\"20\"> DIVERGENCE DETECTED</p>\n<p>CKB-VM state:</p>\n<pre><code>PC  = 0x80000010\n\nx10 = 0xffffffffffffffff   ← DIV by zero → all-ones\n</code></pre>\n<p>Sail state:</p>\n<pre><code>PC  = 0x80000010\n\nx10 = 0xffffffffffffffff\n</code></pre>\n<p>Register diff: (none — values match at this step)</p>\n<p>First divergence: step 4, register x10</p>\n<p>Note: Both implementations return all-ones for DIV by zero</p>\n<pre><code>    (RISC-V spec mandated behavior). 
Trace matches.\n</code></pre>\n<p>────────────────────────────────────────</p>\n<p>Result: <img src=\"https://talk.nervos.org/images/emoji/apple/white_check_mark.png?v=15\" title=\":white_check_mark:\" class=\"emoji\" alt=\":white_check_mark:\" loading=\"lazy\" width=\"20\" height=\"20\"> MATCH (divergence was false alarm after full comparison)</p>\n<p>```</p>\n<p><strong>##### Example D — JSON Machine-Readable Output</strong></p>\n<pre><code>$ cargo run --release -p ckb-vm-diff-test -- \\\n    --elf tests/rv64ui/rv64ui-p-add --json\n{\n  \"elf\": \"tests/rv64ui/rv64ui-p-add\",\n  \"ckb_vm_steps\": 127,\n  \"sail_steps\": 127,\n  \"ckb_vm_exit_code\": 0,\n  \"sail_exit_code\": 0,\n  \"match\": true,\n  \"first_mismatch\": null,\n  \"trace_comparison\": {\n    \"total_steps_compared\": 127,\n    \"registers_compared_per_step\": 32,\n    \"all_match\": true\n  }\n}\n</code></pre>\n<p><strong>#### 6.3 Reproducible Build Environment</strong></p>\n<pre><code>git clone --recursive https://github.com/xxxx/ckb-vm-sail-verify.git\ncd ckb-vm-sail-verify\nmake all    # Builds everything: Coq proofs + Sail emulator + diff-test tool\nmake test   # Runs differential tests against riscv-tests suite\n</code></pre>\n<p><strong>Toolchain requirements documented in README:</strong></p>\n<p>- Rust &gt;= 1.92.0</p>\n<p>- OPAM with Sail &gt;= 0.20.x, Coq 9.0.0, coq-sail-stdpp</p>\n<p>- CMake &gt;= 3.20</p>\n<p>- GMP development library</p>\n<p>- RISC-V cross-compiler (for custom test assembly)</p>\n<p><strong>`scripts/` directory</strong> provides automated setup:</p>\n<p>- `generate_coq.sh` — Generate Coq from Sail source</p>\n<p>- `build_sail_emulator.sh` — Build Sail C++ emulator</p>\n<p>- `run_differential.sh` — Run full differential test 
suite</p>\n<p><strong>#### 6.4 Acceptance Criteria</strong></p>\n<p><strong>##### Formal Verification Criteria</strong></p>\n<p>- V-1: `make coq` compiles all Coq proofs without errors or warnings on a clean environment.</p>\n<p>- V-2: At least 10 RISC-V instructions have complete three-theorem proofs (semantics + PC + isolation) with zero `Admitted`.</p>\n<p>- V-3: At least one M-extension instruction (MUL) is formally proved.</p>\n<p>- V-4: All helper lemmas (`truncate_64_idempotent`, `x0_always_zero`, `get_set_reg_*`) are fully proved.</p>\n<p>- V-5: State equivalence relation is formally defined and used consistently across all proofs.</p>\n<p><strong>##### Differential Testing Criteria</strong></p>\n<p>- T-1: Differential tests pass on the complete rv64ui test suite (55 tests).</p>\n<p>- T-2: Differential tests pass on rv64um test suite (13 tests).</p>\n<p>- T-3: The diff-test tool correctly detects and reports injected semantic divergences (negative testing).</p>\n<p>- T-4: `--json` output is valid JSON parseable by `jq`.</p>\n<p>- T-5: Edge case tests (division by zero, max shift, overflow) produce matching traces.</p>\n<p><strong>##### Documentation Criteria</strong></p>\n<p>- D-1: Complete 158-opcode mapping table with proof/test status.</p>\n<p>- D-2: Semantic gap analysis covers all 8 identified gaps with impact assessment.</p>\n<p>- D-3: Clean-room build test: fresh clone → `make all` succeeds → `make test` passes.</p>\n<p>- D-4: Bilingual (English + Chinese) README with quick-start instructions.</p>\n<p><strong>### 7. Funding Request and Usage</strong></p>\n<p><strong>Total Amount Requested:</strong> 1,000 USD</p>\n<p><strong>Payment Method:</strong> 100% CKB</p>\n<p>| Category | Amount | Description |</p>\n<p>|----------|--------|-------------|</p>\n<p>| Cloud Server | $350 USD | 1 VPS (Linux, ≥ 4 cores 16GB RAM) for Coq compilation (memory-intensive), Sail emulator builds, and differential test execution. 8 weeks. 
|</p>\n<p>| Developer Compensation | $450 USD | Core development work. Estimated 20–30 hours per week, 8 weeks. Covers Coq proof development, Rust diff-test framework, and Sail integration. |</p>\n<p>| Documentation &amp; Community | $200 USD | Bilingual documentation, architecture diagrams, 2 monthly sharing sessions, completion report, and Phase 2+ roadmap preparation. |</p>\n<p><strong>### 8. Estimated Completion Timeline</strong></p>\n<p><strong>Total Duration:</strong> 8 weeks (~2 months)</p>\n<p><strong>#### Phase 1: Infrastructure and Feasibility (Week 1–3)</strong></p>\n<p><strong>Week 1:</strong> Complete toolchain installation (Sail + Coq + CMake + Rust + RISC-V cross-compiler). Clone sail-riscv submodule. Generate Coq from Sail. Build Sail C++ emulator. Build Rust workspace. Create 158-opcode instruction mapping table.</p>\n<p><strong>Week 2:</strong> Implement differential testing framework — CKB-VM step executor, Sail trace parser, comparison engine, CLI tool with `--verbose`/`--json`/`--max-steps`. Handle ECALL divergence between riscv-tests and CKB-VM. Run initial diff-tests on rv64ui suite.</p>\n<p><strong>Week 3:</strong> Study Sail-generated Coq (~30,000 lines). Define state equivalence relation. Build proof infrastructure and reusable tactics. Create minimal test import (`Check execute_RISCV_ADD`). Document Sail Coq bridging strategy. <strong>Most challenging week</strong> — debugging imports, name conflicts, library dependencies.</p>\n<p><strong>Milestone 1 (End of Week 3):</strong> Differential testing framework operational with rv64ui tests passing. Coq infrastructure compiles against Sail-generated definitions. Instruction mapping table complete.</p>\n<p><strong>#### Phase 2: Formal Proofs (Week 4–6)</strong></p>\n<p><strong>Week 4:</strong> Prove 7 core ALU instructions (ADD, SUB, ADDI, SLLI, SRLI, SRAI, MUL) with full three-theorem coverage. 
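</p>
<p>Each of these seven targets reduces to a small, checkable fact about 64-bit arithmetic. A Rust sketch of the semantics the three-theorem proofs pin down (illustrative helpers; the Coq model, not this code, is the authority):</p>

```rust
// Illustrative RV64 semantics behind the Week 4 proof targets (not CKB-VM code).
fn add(rs1: u64, rs2: u64) -> u64 { rs1.wrapping_add(rs2) } // mod 2^64, no traps
fn addi(rs1: u64, imm: i16) -> u64 {
    // the real immediate is 12 bits; i16 stands in for the sign-extended field
    rs1.wrapping_add(imm as i64 as u64)
}
fn slli(rs1: u64, shamt: u32) -> u64 { rs1 << (shamt % 64) } // shamt masked to 6 bits
fn srai(rs1: u64, shamt: u32) -> u64 {
    ((rs1 as i64) >> (shamt % 64)) as u64 // arithmetic shift: sign bit preserved
}
fn mul(rs1: u64, rs2: u64) -> u64 { rs1.wrapping_mul(rs2) } // low 64 bits of product

fn main() {
    assert_eq!(add(u64::MAX, 1), 0);                       // wraparound, no overflow trap
    assert_eq!(addi(0, -1), u64::MAX);                     // immediate sign extension
    assert_eq!(slli(1, 65), 2);                            // 65 masked down to 1
    assert_eq!(srai(0x8000_0000_0000_0000, 63), u64::MAX); // sign bit propagates
    assert_eq!(mul(u64::MAX, u64::MAX), 1);                // (2^64 - 1)^2 mod 2^64
}
```

<p>These are exactly the properties the semantics theorems state; the PC-update and register-isolation theorems add that nothing else in the machine state moves.</p>
<p>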
Build reusable lemma library.</p>\n<p><strong>Week 5:</strong> Prove control flow instructions (BEQ taken/not-taken, JAL link+jump). Build memory lemmas. Prove LW, SW. Submit mid-term progress report.</p>\n<p><strong>Week 6:</strong> Prove additional ALU instructions (AND, OR, XOR, SLTI, JALR). Verify MOP fusion equivalence (WIDE_MUL = MULH + MUL). Audit all `Admitted` proofs.</p>\n<p><strong>Milestone 2 (End of Week 6):</strong> 10+ instructions with complete proofs. MOP fusion verified. Mid-term report submitted.</p>\n<p><strong>#### Phase 3: Integration and Delivery (Week 7–8)</strong></p>\n<p><strong>Week 7:</strong> Full differential test coverage across rv64ui/rv64um/rv64uc/rv64ua. Edge case testing. Produce formal semantic gap analysis document. Update instruction coverage matrix with proof/test status.</p>\n<p><strong>Week 8:</strong> Clean-room build test. Bilingual documentation. Code cleanup (fmt, clippy). Completion report. Phase 2+ roadmap. Community sharing session. Final submission.</p>\n<p><strong>Milestone 3 (End of Week 8):</strong> All deliverables submitted — Coq proof library, diff-test framework, instruction mapping, gap analysis, methodology docs, and roadmap.</p>\n<p><strong>#### Timeline Overview</strong></p>\n<p>| Phase | Weeks | Focus | Milestone |</p>\n<p>|-------|-------|-------|-----------|</p>\n<p>| Phase 1: Infrastructure | Week 1–3 | Toolchain, diff-test framework, Coq infrastructure, instruction mapping | Milestone 1 |</p>\n<p>| Phase 2: Formal Proofs | Week 4–6 | Core ALU proofs, control flow, memory, MOP, proof audit | Milestone 2 (Week 6) |</p>\n<p>| Phase 3: Integration &amp; Delivery | Week 7–8 | Full test coverage, gap analysis, docs, clean-room build, submission | Milestone 3 |</p>\n<p><strong>### 9. Relevance to CKB Ecosystem</strong></p>\n<p><strong>Addresses a long-standing open issue.</strong> <a href=\"https://github.com/nervosnetwork/ckb-vm/issues/190\" class=\"inline-onebox\" rel=\"noopener nofollow ugc\">ckb-vm#190</a> has been open since 2021, requesting formal verification of CKB-VM’s RISC-V implementation. This project directly delivers a proof-of-concept solution.</p>\n<p><strong>Strengthens CKB’s security foundation.</strong> CKB-VM executes every transaction on the Nervos network. A formally verified instruction set eliminates an entire class of potential vulnerabilities — instruction-level semantic bugs that testing cannot fully prevent. This is especially critical for a blockchain where deterministic execution is a security invariant.</p>\n<p><strong>Establishes a reusable verification framework.</strong> The Coq proof infrastructure, differential testing tool, and methodology documentation are designed for extension. Future contributors can add proofs for additional instructions following the established patterns, without re-learning the framework.</p>\n<p><strong>Demonstrates CKB’s technical depth.</strong> Formal verification of a blockchain VM is rare in the industry. CompCert verified a C compiler; seL4 verified an OS kernel; this project brings the same rigor to CKB’s execution layer. It positions CKB alongside the most technically ambitious blockchain projects.</p>\n<p><strong>Enables future full-ISA verification.</strong> This PoC covers 10+ instructions and establishes the methodology. The Phase 2+ roadmap (full RV64I, extensions, ASM-mode) provides a clear path to comprehensive verification suitable for a Community Fund DAO proposal.</p>\n<p><strong>Pure Rust + Coq toolchain.</strong> The differential testing framework is pure Rust, consistent with CKB’s technology stack. 
Rust developers in the CKB community can contribute without learning new languages (Coq proofs are the specialist component, but the testing framework is accessible to all).</p>\n<p><strong>### 10. Technical Risks and Mitigations</strong></p>\n<p>| Risk | Impact | Probability | Mitigation |</p>\n<p>|------|--------|-------------|------------|</p>\n<p>| Sail Coq output structure changes between versions | High | Low | Pin sail-riscv to a specific commit. Document exact Sail compiler version. Provide `generate_coq.sh` for reproducible regeneration. |</p>\n<p>| Sail monadic Coq too complex to bridge with CKB-VM pure model | High | Medium | Week 3 is dedicated to this. Fallback: extract observable state post-execution rather than proving step-internal equivalence. Worst case: prove against a simplified Sail extract rather than raw monadic output. |</p>\n<p>| Coq compilation time exceeds development iteration speed | Medium | Medium | Generated Coq is ~30K lines. Use `_CoqProject` with selective compilation. Develop proofs incrementally. Cloud server with 16GB RAM for parallel coqc. |</p>\n<p>| CKB-VM Rust semantics diverge from Coq model | Medium | Medium | Differential testing catches runtime divergences. Cross-reference Coq model against actual Rust source. Document any abstraction gaps. |</p>\n<p>| riscv-tests ECALL convention incompatible with CKB-VM | Medium | Low | Already identified and designed for: CKB-VM uses syscall 93, riscv-tests use `tohost`. Sail runner filters ECALL divergence. |</p>\n<p>| Insufficient time for 10+ instruction proofs | Medium | Medium | Prioritize core ALU instructions (ADD, SUB, ADDI) which have the simplest proof structure. Build reusable tactics early. Accept partial proofs (documented Admits) for complex instructions (SW, JALR). |</p>\n<p>| Sail C++ emulator build failures on target platform | Low | Low | CMake build is well-documented in sail-riscv. Provide `build_sail_emulator.sh` with error handling. 
GMP is the only non-trivial dependency. |</p>\n<p>| Generated Coq requires unavailable Coq libraries | Low | Medium | Use `coq-sail-stdpp` as documented by sail-riscv project. Pin exact Coq version (9.0.0) in build instructions. |</p>\n<p><strong>### 11. Transparency Commitments</strong></p>\n<p><strong>Fully open-source from Day 1.</strong> All code on GitHub under MIT license, public repository from the start of development.</p>\n<p><strong>Weekly progress updates.</strong> Posted on Nervos Talk forum every week, covering completed tasks, blockers, and next-week plan.</p>\n<p><strong>Monthly sharing sessions.</strong> Two sessions total (Week 4 and Week 8); the final session includes a live demo of Coq proof compilation and differential test execution, plus Q&amp;A.</p>\n<p><strong>Machine-verifiable results.</strong> All Coq proofs are machine-checked — anyone can clone the repo, run `make coq`, and independently verify every theorem. No trust in the author required.</p>\n<p><strong>Honest reporting of limitations.</strong> The completion report will explicitly document: number of `Admitted` proofs and why, semantic gaps that prevent full equivalence claims, and instructions not yet covered.</p>\n<p><strong>Reproducible from scratch.</strong> Complete build instructions, pinned dependencies, and setup scripts ensure any reviewer can reproduce all results independently.</p>\n<p><strong>## 中文版本</strong></p>\n<p><strong>### 一、项目名称与简介</strong></p>\n<p><strong>项目名称：</strong> ckb-vm-sail-verify</p>\n<p><strong>一句话简介：</strong> 基于 Sail RISC-V 官方规范与 Coq 定理证明器，形式化验证 CKB-VM 的 RISC-V 指令执行语义与标准规范的数学等价性，辅以差分测试进行双重验证。</p>\n<p><strong>### 二、团队/个人介绍</strong></p>\n<p><strong>申请人：</strong> Tinyueng（<a href=\"https://github.com/TinyuengKwan\" class=\"inline-onebox\" rel=\"noopener nofollow ugc\">GitHub</a>）</p>\n<p><strong>核心能力：</strong> 目前在 PLCT Lab（中科院软件所）<strong>Sail &amp; ACT（RISC-V Architectural Certification Tests）小队</strong>实习，对 Sail 体系结构定义语言和 RISC-V 一致性测试有直接、深入的实践经验。参与 <a href=\"https://github.com/riscv/sail-riscv\" class=\"inline-onebox\" rel=\"noopener nofollow ugc\">sail-riscv</a> 生态贡献，并独立开发了 <a href=\"https://github.com/TinyuengKwan/sail-lsp\" class=\"inline-onebox\" rel=\"noopener nofollow ugc\">sail-lsp</a>——Sail 语言的 Language Server Protocol 实现。熟悉 Sail 编译器工具链（包括用于定理证明器集成的 `--coq` 后端）、sail-riscv 形式模型结构以及 RISC-V 架构认证测试（ACT）框架。具备 Rust 系统编程能力（所有权、生命周期、trait 系统）。系统学习过 CS:APP，对处理器架构、虚拟内存、ELF 链接、系统级 I/O 有完整知识框架。此外还学习过编译原理、x86 汇编等内容。</p>\n<p><strong>相关领域知识：</strong> 对 Sail 形式规范语言有深度了解——不仅是使用者，更是工具链贡献者（sail-lsp）。具有 RISC-V 架构认证测试（ACT）的实践经验，理解如何基于 Sail 参考模型验证一致性。熟悉 RISC-V ISA 体系结构（RV64IMAC、特权级、扩展机制）。具有 Coq 交互式定理证明器和 Sail Coq 后端输出的使用经验。通过源码分析深入理解 CKB-VM 内部机制（nervosnetwork/ckb-vm），涵盖其版本化指令语义、MOP（宏操作融合）扩展和平坦内存模型。</p>\n<p><strong>### 三、问题描述</strong></p>\n<p>CKB-VM 是驱动 Nervos CKB 网络所有链上计算的 RISC-V 虚拟机。每一笔 CKB 交易——每一个 lock script、type script 和智能合约——都在 CKB-VM 中执行。其正确性是整条链的安全基石。然而目前，没有任何形式化证明表明 CKB-VM 忠实地实现了 RISC-V 规范。</p>\n<p><strong>指令语义缺乏形式化验证。</strong> CKB-VM 实现了横跨 RV64I、M、C（Zca）、B（Zba/Zbb/Zbc/Zbs）、A 和自定义 MOP 扩展的 158 条操作码。每条指令都是手工编写的 Rust 代码。尽管实现质量高且经过实战考验，但不存在任何数学证明表明其中任何一条指令的行为与 RISC-V 规范完全一致。这是 <a href=\"https://github.com/nervosnetwork/ckb-vm/issues/190\" class=\"inline-onebox\" rel=\"noopener nofollow ugc\">nervosnetwork/ckb-vm#190</a>，自 2021 年开放至今。</p>\n<p><strong>测试无法保证正确性。</strong> CKB-VM 通过了标准 riscv-tests 测试套件，但测试套件是有限的——无法覆盖所有可能的输入状态、边缘情况或极端条件下的交互。一个微妙的语义偏差（例如，在某个罕见操作数组合上的符号扩展错误）可能多年未被发现，直到被利用。</p>\n<p><strong>缺乏权威的比对基准。</strong> RISC-V ISA 规范以自然语言（英文散文）编写，天然存在歧义。不同实现者可能对同一句话有不同理解。没有机器可读的形式规范作为参考，\"正确性\"仍然是人为判断。</p>\n<p><strong>Sail RISC-V 解决了基准问题。</strong> RISC-V International 已采纳 Sail 作为 RISC-V 的官方形式规范语言。<a href=\"https://github.com/riscv/sail-riscv\" class=\"inline-onebox\" rel=\"noopener nofollow ugc\">sail-riscv</a> 模型是 RISC-V 指令语义的机器可读、可执行、数学精确的定义。关键在于，Sail 可以编译为 Coq——使得直接针对官方规范进行定理证明成为可能。</p>\n<p><strong>CKB-VM 与标准 RISC-V 之间存在语义差距。</strong> CKB-VM 不是通用的 RISC-V 实现。它拥有平坦的 4MB 内存模型（无 MMU/虚拟内存）、自定义 ECALL 处理（syscall 93 退出，非 M 模式陷阱）、三种 VERSION 模式（各有不同行为语义）、逐指令周期计数以及标准规范中没有对应物的 MOP 融合指令。这些差距必须被形式化地记录，并精确表征其对等价性的影响。</p>\n<p><strong>实际影响：</strong> 经过形式化验证的 CKB-VM 将为 CKB 的执行层提供最强的正确性保证。它将消除一整类潜在漏洞——指令级语义缺陷——这是任何数量的测试都无法完全防止的。对于一条安全模型完全依赖确定性执行的区块链而言，这是一项基础性投资。</p>\n<p><strong>### 四、解决方案</strong></p>\n<p><strong>#### 4.1 核心思路</strong></p>\n<p>ckb-vm-sail-verify 采用双重验证策略：<strong>Coq 形式化证明</strong>提供数学确定性，<strong>差分测试</strong>提供实际运行时验证。两种方法互补——形式化证明覆盖已证明指令的所有可能输入，差分测试则用具体测试向量验证完整指令集。</p>\n<p><strong>#### 4.2 为什么选择 Sail + Coq</strong></p>\n<p><strong>Sail 是官方 RISC-V 规范。</strong> 与非正式的散文规范不同，Sail 是可执行的、无歧义的，由 RISC-V International 维护。以 Sail 作为基准意味着我们针对硬件厂商使用的同一规范进行验证。</p>\n<p><strong>Sail 可编译为 Coq。</strong> Sail 编译器的 `--coq` 后端从 Sail 源码生成 Coq 定义。这使我们在 Coq——CompCert（经验证的 C 编译器）和 seL4（经验证的操作系统内核）使用的同一定理证明器——中获得了机器可检查的 RISC-V 语义。</p>\n<p><strong>Coq 证明是全称的。</strong> 一旦 Coq 定理被证明，它对所有可能的输入成立——而不仅仅是测试向量。一条被证明的 `ADD` 指令对每一种源寄存器和机器状态的组合都是正确的，无条件地。</p>\n<p><strong>可复现且可审计。</strong> Coq 证明由机器检查。任何人都可以重新运行 `make coq` 并独立验证每一个定理。不需要信任证明者——只需信任 Coq 的内核，这是现存最受审视的软件之一。</p>\n<p><strong>#### 4.3 四层验证架构</strong></p>\n<p><strong>Layer 1：Sail RISC-V 形式规范（基准真相）</strong></p>\n<p>官方 sail-riscv 模型，通过 `sail --coq` 编译为 Coq。生成约 30,000 行 Coq 代码，以单子状态转换器风格定义完整的 RV64 指令语义。这是权威参考——RISC-V <strong>应该</strong>做什么。</p>\n<p><strong>Layer 2：CKB-VM Coq 模型（验证目标）</strong></p>\n<p>手工编写的 CKB-VM Rust 解释器逻辑的 Coq 模型，捕获寄存器文件操作（32 × 64 位，x0 硬连线为零）、PC 推进、ALU 计算、内存访问（小端序、平坦 4MB）和分支/跳转语义。基于 ckb-vm `execute.rs` 中的 `handle_*` 函数。这是 CKB-VM <strong>实际</strong>做什么。</p>\n<p><strong>Layer 3：等价性证明（数学桥梁）</strong></p>\n<p>对每条目标指令，证明：`ckb_vm_execute(I, state) = sail_execute(I,
state)`。每个证明分解为三个子定理：(a) <strong>语义</strong>——目标寄存器具有正确的计算值；(b) <strong>PC 更新</strong>——下一个 PC 是 PC+4（或分支/跳转的 PC+偏移量）；(c) <strong>寄存器隔离</strong>——所有其他寄存器不变。证明技术：展开定义 → 用辅助引理改写 → 用 `lia`/`reflexivity` 求解。</p>\n<p><strong>Layer 4：差分测试（运行时验证）</strong></p>\n<p>在 CKB-VM（Rust）和 Sail C++ 模拟器上执行相同的 RISC-V ELF 二进制文件，然后逐步比较执行轨迹——PC 值、寄存器状态和退出码。通过覆盖完整指令集（包括尚未被形式化证明的指令）和捕获 Coq 模型中可能不可见的实现缺陷来补充形式化证明。</p>\n<p><strong>#### 4.4 CKB-VM 特定考量</strong></p>\n<p>| 方面 | CKB-VM 行为 | Sail RISC-V 行为 | 处理方式 |</p>\n<p>|------|------------|-----------------|---------|</p>\n<p>| x0 寄存器 | 写入后清零 | 静默丢弃写入 | 功能等价——在 Coq 中证明 |</p>\n<p>| ECALL | 通过 A7 分发（syscall 93 = 退出） | 陷入 M 模式处理器 | 差分测试中过滤；排除在 Coq 范围外 |</p>\n<p>| 内存模型 | 平坦 4MB，无 MMU，W^X | Sv39/48/57 完整 MMU | CKB-VM 简化模型；在 4MB 子集内证明 |</p>\n<p>| FENCE | 空操作（单线程） | 完整栅栏语义 | 单核模型中无可观测差异 |</p>\n<p>| 原子操作（A 扩展） | 单地址保留 | 保留集 | 有限并发；证明单核子集 |</p>\n<p>| VERSION 模式 | 三种模式（0/1/2） | 无 | 仅针对 VERSION2 |</p>\n<p>| 周期计数 | 逐指令成本追踪 | 无 | 过滤；不属于体系结构状态 |</p>\n<p>| MOP 融合 | WIDE_MUL、FAR_JUMP_REL、ADC | 无对应物 | 验证为标准指令的组合 |</p>\n<p><strong>### 五、详细技术实现计划</strong></p>\n<p><strong>#### 5.1 Phase 0：环境搭建与指令映射（Week 1）</strong></p>\n<p>安装完整工具链：Sail（&gt;= 0.20.x，通过 OPAM）、Coq（9.0.0 + coq-sail-stdpp）、CMake、Rust（1.92.0+）、GMP、RISC-V 交叉编译器（`gcc-riscv64-unknown-elf`）。克隆 sail-riscv 作为 git 子模块。构建 Rust 工作空间（`lib/` + `crates/diff-test/`）。通过 `sail --coq --dcoq-undef-axioms` 从 Sail 生成 Coq。构建 Sail C++ 模拟器用于差分测试。</p>\n<p>创建完整的 CKB-VM 指令映射表：将所有 158 条 CKB-VM 操作码映射到其 Sail RISC-V 对应项，按扩展分类（I/M/C/A/B/MOP），识别无 Sail 对应项的指令（MOP 融合操作）。</p>\n<p><strong>#### 5.2 Phase 1：差分测试框架（Week 2）</strong></p>\n<p>实现 diff-test CLI 工具，支持逐步执行轨迹比较：</p>\n<p><strong>CKB-VM 端（`lib/src/runner.rs`）：</strong> 将 CKB-VM 包装为逐步执行器，在每条指令后捕获 `StepState`（PC + 32 个寄存器）。处理退出系统调用（通过 A7 寄存器的 syscall 93）。支持 `--max-steps` 防止无限循环。</p>\n<p><strong>Sail 端（`crates/diff-test/src/sail_runner.rs`）：</strong> 解析 Sail C++ 模拟器的轨迹输出格式。逐步提取 PC 和寄存器值。处理 ECALL 差异（riscv-tests 使用 `tohost` 内存映射 I/O；CKB-VM
使用系统调用）。</p>\n<p><strong>比较引擎（`lib/src/state.rs`）：</strong> 带 `first_mismatch` 检测的 `CompareResult`——报告轨迹分歧的精确步骤、PC 和寄存器。支持 `--verbose` 模式（完整轨迹转储）和 `--json`（机器可读输出）。</p>\n<p><strong>测试语料库：</strong> 自动发现来自 riscv-tests（rv64ui、rv64um、rv64uc、rv64ua）、riscv-arch-test 和 CKB-VM 自有测试套件的 ELF 测试工件。</p>\n<p><strong>#### 5.3 Phase 2：Coq 基础设施与 Sail 接口（Week 3）</strong></p>\n<p>这是最具挑战性的阶段——桥接 Sail 的单子 Coq 输出与 CKB-VM 的纯模型。</p>\n<p><strong>研究生成的 Coq（约 30,000 行）：</strong> 理解 Sail 的状态单子编码、寄存器访问模式、位向量算术库和指令解码/执行结构。识别每条 RISC-V 指令对应的 Coq 函数（如 `execute_RISCV_ADD`）。</p>\n<p><strong>定义状态等价关系：</strong> 形式化 CKB-VM 的 `machine_state` 记录与 Sail 单子状态之间的映射。处理类型不匹配（Sail 使用位向量；CKB-VM 模型使用 Z 整数）。</p>\n<p><strong>构建证明基础设施：</strong> 创建可重用的 Coq 策略（`solve_truncate`、`simplify_regs`、`unfold_sail_step`），自动化常见证明模式。用最小导入测试：`Check execute_RISCV_ADD`。</p>\n<p><strong>#### 5.4 Phase 3：核心 ALU 指令证明（Week 4）</strong></p>\n<p>证明 7 条核心指令，每条提供完整的三定理覆盖（语义 + PC + 寄存器隔离）：</p>\n<p>| 指令 | 类型 | 关键证明挑战 |</p>\n<p>|------|------|------------|</p>\n<p>| ADD | R 型 | `truncate_64` 幂等性，x0 硬连线 |</p>\n<p>| SUB | R 型 | 无符号减法 mod 2^64 |</p>\n<p>| ADDI | I 型 | 12 位立即数的符号扩展 |</p>\n<p>| SLLI | 移位 | 移位量掩码（RV64 取低 6 位） |</p>\n<p>| SRLI | 移位 | 逻辑右移与算术右移的区分 |</p>\n<p>| SRAI | 移位 | 算术移位中符号位的保持 |</p>\n<p>| MUL | M 扩展 | 64 位乘法，128 位乘积的低 64 位 |</p>\n<p>构建可重用引理库：`truncate_64_idempotent`、`x0_always_zero`、`get_set_reg_same`、`get_set_reg_diff`、`sign_extend_properties`。</p>\n<p><strong>#### 5.5 Phase 4：控制流与内存证明（Week 5）</strong></p>\n<p><strong>分支指令：</strong> BEQ 的跳转路径（PC += offset）和不跳转路径（PC += 4）。处理符号扩展的分支偏移量。</p>\n<p><strong>跳转指令：</strong> JAL 的链接地址（rd = PC + 4）和跳转目标（PC = PC + offset）。JALR 的间接跳转（LSB 清零）。</p>\n<p><strong>内存指令：</strong> 构建内存引理——存取往返（`load(store(addr, val)) = val`）、字节序（小端）、对齐。证明 LW（符号扩展的 32 位加载）和 SW（32 位存储）。</p>\n<p><strong>#### 5.6 Phase 5：扩展证明与 MOP 验证（Week 6）</strong></p>\n<p><strong>额外 ALU 证明：</strong> AND、OR、XOR、SLTI、SLTIU、LUI、AUIPC。</p>\n<p><strong>MOP 融合验证：</strong> 验证 `WIDE_MUL(rd1, rd2, rs1,
rs2)` = 顺序执行 `MULH(rd1, rs1, rs2); MUL(rd2, rs1, rs2)`。验证 `FAR_JUMP_REL(rd, offset)` = `AUIPC(rd, upper); JALR(rd, rd, lower)`。</p>\n<p><strong>证明审计：</strong> 审查所有 `Admitted` 引理。尽可能关闭。记录剩余 admits 及明确理由。</p>\n<p><strong>目标：</strong> 10+ 条核心指令完全证明（零 Admitted），5+ 条额外指令具有部分证明。</p>\n<p><strong>#### 5.7 Phase 6：完整差分测试覆盖与语义差距分析（Week 7）</strong></p>\n<p><strong>分扩展测试：</strong> 在 rv64ui（整数）、rv64um（乘除法）、rv64uc（压缩指令）、rv64ua（原子操作）测试套件上运行差分测试。添加 `--extension` CLI 过滤器。</p>\n<p><strong>边缘情况测试：</strong> 除以零行为、最大移位量、整数溢出/下溢、非对齐内存访问、x0 写入尝试。</p>\n<p><strong>语义差距文档化：</strong> 生成正式的差距分析文档，覆盖所有 8 个已识别的语义差距，精确表征每个差距对等价性声明的影响。</p>\n<p><strong>指令覆盖矩阵：</strong> 更新 158 条操作码映射表，标注证明状态（已证明 / 部分证明 / 仅差分测试 / 未覆盖）和测试状态（通过 / 失败 / 跳过 / 不适用）。</p>\n<p><strong>#### 5.8 Phase 7：文档、交付物与路线图（Week 8）</strong></p>\n<p><strong>洁净室构建测试：</strong> 从头克隆，`make all` 成功，所有 Coq 证明编译通过，差分测试通过。</p>\n<p><strong>结项报告：</strong> 已证明定理数量、完全覆盖的指令数量、通过的差分测试数量、完整差距分析、剩余 `Admitted` 证明及理由。</p>\n<p><strong>Phase 2+ 路线图：</strong> 完整 RV64I 覆盖（约 50 条指令）、M/C/B 扩展证明、ASM 模式验证（需要 Islaris 级工具，约 6+ 个月，$20k+ 规模）。</p>\n<p><strong>代码质量：</strong> `cargo fmt`、`cargo clippy`、移除临时文件、确保 CI 就绪。</p>\n<p><strong>### 六、预期交付成果</strong></p>\n<p><strong>#### 6.1 核心交付物</strong></p>\n<p>1. <strong>Coq 证明库（coq/）</strong> —— 10+ 条核心 RISC-V 指令的形式化等价性定理，覆盖 RV64I ALU 操作、移位、分支、跳转及至少一条 M 扩展指令。每条指令包含三个子定理（语义、PC 更新、寄存器隔离）。所有证明由 Coq 机器检查——零信任需求。</p>\n<p>2. <strong>差分测试框架（crates/diff-test/）</strong> —— 在 CKB-VM 和 Sail C++ 模拟器上执行相同 ELF 二进制文件并逐步比较执行轨迹的 CLI 工具。支持 `--verbose`、`--json`、`--max-steps`、`--test-dir` 参数。报告首个分歧点及完整状态转储。</p>\n<p>3. <strong>CKB-VM 指令映射文档</strong> —— 所有 158 条 CKB-VM 操作码到 Sail RISC-V 对应项的完整映射，按扩展分类，标注证明/测试状态。</p>\n<p>4. <strong>形式化语义差距分析</strong> —— 严格记录 CKB-VM 与标准 RISC-V 之间所有 8 个已识别语义差距，精确表征对等价性声明的影响及缓解策略。</p>\n<p>5. <strong>方法论文档（doc/）</strong> —— 架构指南和证明方法论文档，使未来贡献者能够按照既定模式为额外指令添加证明，无需重新学习整个框架。</p>\n<p>6.
<strong>源码可复现构建</strong> —— 单条 `make all` 命令构建全部内容（Coq 证明 + Sail 模拟器 + Rust 差分测试工具）。文档化的工具链要求和安装脚本。</p>\n<p>7. <strong>Phase 2+ 路线图</strong> —— 扩展至完整 RV64I、M/C/B 扩展和 ASM 模式验证的详细计划，适用于 Community Fund DAO 规模的提案。</p>\n<p><strong>#### 6.2 验收标准</strong></p>\n<p><strong>##### 形式化验证标准</strong></p>\n<p>- V-1：`make coq` 在洁净环境中编译所有 Coq 证明，无错误或警告。</p>\n<p>- V-2：至少 10 条 RISC-V 指令具有完整的三定理证明（语义 + PC + 隔离），零 `Admitted`。</p>\n<p>- V-3：至少一条 M 扩展指令（MUL）被形式化证明。</p>\n<p>- V-4：所有辅助引理（`truncate_64_idempotent`、`x0_always_zero`、`get_set_reg_*`）完全证明。</p>\n<p>- V-5：状态等价关系被形式化定义并在所有证明中一致使用。</p>\n<p><strong>##### 差分测试标准</strong></p>\n<p>- T-1：差分测试在完整 rv64ui 测试套件（55 个测试）上通过。</p>\n<p>- T-2：差分测试在 rv64um 测试套件（13 个测试）上通过。</p>\n<p>- T-3：diff-test 工具正确检测并报告注入的语义偏差（负面测试）。</p>\n<p>- T-4：`--json` 输出为 `jq` 可解析的有效 JSON。</p>\n<p>- T-5：边缘情况测试（除以零、最大移位、溢出）产生匹配的轨迹。</p>\n<p><strong>##### 文档标准</strong></p>\n<p>- D-1：完整的 158 条操作码映射表，标注证明/测试状态。</p>\n<p>- D-2：语义差距分析覆盖所有 8 个已识别差距及影响评估。</p>\n<p>- D-3：洁净室构建测试：全新克隆 → `make all` 成功 → `make test` 通过。</p>\n<p>- D-4：中英双语 README，含快速入门说明。</p>\n<p><strong>### 七、所需资金及用途说明</strong></p>\n<p><strong>申请总额：</strong> 1,000 USD</p>\n<p><strong>支付方式：</strong> 100% CKB</p>\n<p>| 类别 | 金额 | 说明 |</p>\n<p>|------|------|------|</p>\n<p>| 云服务器 | $350 USD | 1 台 VPS（Linux，≥ 4 核 16GB 内存），用于 Coq 编译（内存密集型）、Sail 模拟器构建和差分测试执行。8 周使用。 |</p>\n<p>| 开发者补贴 | $450 USD | 核心开发工作。预计每周 20–30 小时，共 8 周。涵盖 Coq 证明开发、Rust 差分测试框架和 Sail 集成。 |</p>\n<p>| 文档与社区 | $200 USD | 中英双语文档编写、架构图制作、2 次月度分享会材料、结项报告和 Phase 2+ 路线图准备。 |</p>\n<p><strong>### 八、预计完成时间</strong></p>\n<p><strong>总周期：</strong> 8 周（约 2 个月）</p>\n<p><strong>#### 第一阶段：基础设施与可行性验证（Week 1–3）</strong></p>\n<p><strong>Week 1：</strong> 完成工具链安装（Sail + Coq + CMake + Rust + RISC-V 交叉编译器）。克隆 sail-riscv 子模块。从 Sail 生成 Coq。构建 Sail C++ 模拟器。构建 Rust 工作空间。创建 158 条操作码指令映射表。</p>\n<p><strong>Week 2：</strong> 实现差分测试框架——CKB-VM 逐步执行器、Sail 轨迹解析器、比较引擎、支持 `--verbose`/`--json`/`--max-steps` 的 CLI 工具。处理 riscv-tests 与 CKB-VM 之间的 ECALL 差异。在 rv64ui
套件上运行初始差分测试。</p>\n<p><strong>Week 3：</strong> 研究 Sail 生成的 Coq（约 30,000 行）。定义状态等价关系。构建证明基础设施和可重用策略。创建最小测试导入（`Check execute_RISCV_ADD`）。记录 Sail Coq 桥接策略。<strong>最具挑战性的一周</strong>——调试导入、名称冲突、库依赖。</p>\n<p><strong>里程碑 1（Week 3 末）：</strong> 差分测试框架可运行，rv64ui 测试通过。Coq 基础设施可针对 Sail 生成的定义进行编译。指令映射表完成。</p>\n<p><strong>#### 第二阶段：形式化证明（Week 4–6）</strong></p>\n<p><strong>Week 4：</strong> 证明 7 条核心 ALU 指令（ADD、SUB、ADDI、SLLI、SRLI、SRAI、MUL），每条提供完整的三定理覆盖。构建可重用引理库。</p>\n<p><strong>Week 5：</strong> 证明控制流指令（BEQ 跳转/不跳转、JAL 链接+跳转）。构建内存引理。证明 LW、SW。提交中期进度报告。</p>\n<p><strong>Week 6：</strong> 证明额外 ALU 指令（AND、OR、XOR、SLTI、JALR）。验证 MOP 融合等价性（WIDE_MUL = MULH + MUL）。审计所有 `Admitted` 证明。</p>\n<p><strong>里程碑 2（Week 6 末）：</strong> 10+ 条指令具有完整证明。MOP 融合已验证。中期报告已提交。</p>\n<p><strong>#### 第三阶段：集成与交付（Week 7–8）</strong></p>\n<p><strong>Week 7：</strong> 在 rv64ui/rv64um/rv64uc/rv64ua 上进行完整差分测试覆盖。边缘情况测试。生成正式语义差距分析文档。更新指令覆盖矩阵。</p>\n<p><strong>Week 8：</strong> 洁净室构建测试。中英双语文档。代码清理（fmt、clippy）。结项报告。Phase 2+ 路线图。社区分享会。最终提交。</p>\n<p><strong>里程碑 3（Week 8 末）：</strong> 所有交付物提交——Coq 证明库、差分测试框架、指令映射、差距分析、方法论文档和路线图。</p>\n<p><strong>#### 时间线总览</strong></p>\n<p>| 阶段 | 周次 | 重点 | 里程碑 |</p>\n<p>|------|------|------|--------|</p>\n<p>| 第一阶段：基础设施 | Week 1–3 | 工具链、差分测试框架、Coq 基础设施、指令映射 | 里程碑 1 |</p>\n<p>| 第二阶段：形式化证明 | Week 4–6 | 核心 ALU 证明、控制流、内存、MOP、证明审计 | 里程碑 2（Week 6） |</p>\n<p>| 第三阶段：集成与交付 | Week 7–8 | 完整测试覆盖、差距分析、文档、洁净室构建、提交 | 里程碑 3 |</p>\n<p><strong>### 九、与 CKB 生态的关联性</strong></p>\n<p><strong>回应长期开放的社区需求。</strong> <a href=\"https://github.com/nervosnetwork/ckb-vm/issues/190\" class=\"inline-onebox\" rel=\"noopener nofollow ugc\">ckb-vm#190</a> 自 2021 年开放至今，请求对 CKB-VM 的 RISC-V 实现进行形式化验证。本项目直接交付概念验证解决方案。</p>\n<p><strong>加固 CKB 的安全基石。</strong> CKB-VM 执行 Nervos 网络上的每一笔交易。经过形式化验证的指令集消除了一整类潜在漏洞——测试无法完全防止的指令级语义缺陷。这对于安全模型依赖确定性执行的区块链而言至关重要。</p>\n<p><strong>建立可复用的验证框架。</strong> Coq
证明基础设施、差分测试工具和方法论文档均为扩展而设计。未来贡献者可按照既定模式为额外指令添加证明，无需重新学习框架。</p>\n<p><strong>展示 CKB 的技术纵深。</strong> 区块链虚拟机的形式化验证在行业中罕见。CompCert 验证了 C 编译器；seL4 验证了操作系统内核；本项目将同等严谨性带入 CKB 的执行层。这将 CKB 定位于技术最具雄心的区块链项目之列。</p>\n<p><strong>为未来全 ISA 验证铺路。</strong> 本 PoC 覆盖 10+ 条指令并建立方法论。Phase 2+ 路线图（完整 RV64I、扩展、ASM 模式）为 Community Fund DAO 提案规模的全面验证提供了清晰路径。</p>\n<p><strong>纯 Rust + Coq 工具链。</strong> 差分测试框架为纯 Rust，与 CKB 技术栈一致。CKB 社区的 Rust 开发者无需学习新语言即可贡献（Coq 证明是专业组件，但测试框架对所有人开放）。</p>\n<p><strong>### 十、技术风险与应对</strong></p>\n<p>| 风险 | 影响 | 概率 | 应对 |</p>\n<p>|------|------|------|------|</p>\n<p>| Sail Coq 输出结构在版本间变化 | 高 | 低 | 锁定 sail-riscv 到特定 commit。记录精确的 Sail 编译器版本。提供 `generate_coq.sh` 用于可复现的重新生成。 |</p>\n<p>| Sail 单子 Coq 过于复杂，难以与 CKB-VM 纯模型桥接 | 高 | 中 | Week 3 专门用于此任务。退路：提取执行后的可观测状态而非证明步骤内部等价性。最坏情况：针对简化的 Sail 提取物而非原始单子输出进行证明。 |</p>\n<p>| Coq 编译时间超出开发迭代速度 | 中 | 中 | 生成的 Coq 约 30K 行。使用 `_CoqProject` 进行选择性编译。增量开发证明。16GB 内存的云服务器支持并行 coqc。 |</p>\n<p>| CKB-VM Rust 语义与 Coq 模型偏差 | 中 | 中 | 差分测试捕获运行时偏差。将 Coq 模型与实际 Rust 源码交叉参照。记录所有抽象差距。 |</p>\n<p>| riscv-tests ECALL 约定与 CKB-VM 不兼容 | 中 | 低 | 已识别并已设计解决方案：CKB-VM 使用 syscall 93，riscv-tests 使用 `tohost`。Sail runner 过滤 ECALL 差异。 |</p>\n<p>| 时间不足以完成 10+ 条指令证明 | 中 | 中 | 优先处理证明结构最简单的核心 ALU 指令（ADD、SUB、ADDI）。尽早构建可重用策略。接受复杂指令（SW、JALR）的部分证明（记录 Admits）。 |</p>\n<p>| Sail C++ 模拟器在目标平台上构建失败 | 低 | 低 | CMake 构建在 sail-riscv 中有良好文档。提供带错误处理的 `build_sail_emulator.sh`。GMP 是唯一非平凡依赖。 |</p>\n<p>| 生成的 Coq 需要不可用的 Coq 库 | 低 | 中 | 按 sail-riscv 项目文档使用 `coq-sail-stdpp`。在构建说明中锁定精确的 Coq 版本（9.0.0）。 |</p>\n<p><strong>### 十一、透明度承诺</strong></p>\n<p><strong>Day 1 起完全开源。</strong> 所有代码在 GitHub 上以 MIT 许可证公开，从开发伊始即为公开仓库。</p>\n<p><strong>每周进度更新。</strong> 每周在 Nervos Talk 论坛发布，涵盖已完成任务、阻塞项和下周计划。</p>\n<p><strong>月度分享会。</strong> 共两次（Week 4 和 Week 8），最后一次包括 Coq 证明编译和差分测试执行的实时演示及问答环节。</p>\n<p><strong>机器可验证的结果。</strong> 所有 Coq 证明由机器检查——任何人都可以克隆仓库、运行 `make coq`，独立验证每一个定理。不需要信任作者。</p>\n<p><strong>如实报告局限性。</strong> 结项报告将明确记录：`Admitted`
证明的数量及原因、阻碍完整等价性声明的语义差距、以及尚未覆盖的指令。</p>\n<p><strong>从头可复现。</strong> 完整的构建说明、锁定的依赖版本和安装脚本，确保任何评审者都能独立复现所有结果。</p>",
          "like_count": 0,
          "quote_count": 0
        },
        {
          "post_id": 24052,
          "post_number": 2,
          "topic_id": 10214,
          "topic_title": "Spark Program | CKB-VM Sail Formal Verification — Proving CKB-VM RISC-V Instruction Equivalence via Sail Specification and Coq Theorem Prover / CKB-VM Sail 形式化验证 — 基于 Sail 规范与 Coq 定理证明器的 CKB-VM RISC-V 指令等价性证明",
          "topic_slug": "spark-program-ckb-vm-sail-formal-verification-proving-ckb-vm-risc-v-instruction-equivalence-via-sail-specification-and-coq-theorem-prover-ckb-vm-sail-sail-coq-ckb-vm-risc-v",
          "author": "zz_tovarishch",
          "created_at": "2026-04-27T21:16:49.736000+00:00",
          "updated_at": "2026-04-27T21:16:49.736000+00:00",
          "reply_to_post_number": null,
          "url": "https://talk.nervos.org/t/spark-program-ckb-vm-sail-formal-verification-proving-ckb-vm-risc-v-instruction-equivalence-via-sail-specification-and-coq-theorem-prover-ckb-vm-sail-sail-coq-ckb-vm-risc-v/10214/2",
          "content_text": "Hi 欢迎来到Nervos Talk以及申请Spark\n目前论坛已经有了AI翻译插件，Spark不再强制需要双语撰写proposal、回复了\ncc @xingtianchunyan",
          "content_html": "<p>Hi 欢迎来到Nervos Talk以及申请Spark</p>\n<p>目前论坛已经有了AI翻译插件，Spark不再强制需要双语撰写proposal、回复了</p>\n<p>cc <a class=\"mention\" href=\"/u/xingtianchunyan\">@xingtianchunyan</a></p>",
          "like_count": 0,
          "quote_count": 0
        },
        {
          "post_id": 24054,
          "post_number": 3,
          "topic_id": 10214,
          "topic_title": "Spark Program | CKB-VM Sail Formal Verification — Proving CKB-VM RISC-V Instruction Equivalence via Sail Specification and Coq Theorem Prover / CKB-VM Sail 形式化验证 — 基于 Sail 规范与 Coq 定理证明器的 CKB-VM RISC-V 指令等价性证明",
          "topic_slug": "spark-program-ckb-vm-sail-formal-verification-proving-ckb-vm-risc-v-instruction-equivalence-via-sail-specification-and-coq-theorem-prover-ckb-vm-sail-sail-coq-ckb-vm-risc-v",
          "author": "ArthurZhang",
          "created_at": "2026-04-28T03:13:16.520000+00:00",
          "updated_at": "2026-04-28T03:15:44.546000+00:00",
          "reply_to_post_number": null,
          "url": "https://talk.nervos.org/t/spark-program-ckb-vm-sail-formal-verification-proving-ckb-vm-risc-v-instruction-equivalence-via-sail-specification-and-coq-theorem-prover-ckb-vm-sail-sail-coq-ckb-vm-risc-v/10214/3",
          "content_text": "This looks like a valuable direction. A further verified CKB-VM foundation would strengthen the whole CKB scripting stack. Proving instruction-level equivalence against the Sail RISC-V specification feels like the kind of deep infrastructure work that may not be immediately visible to application developers, but I reckon it compounds over time.\nBest of luck.",
          "content_html": "<p>This looks like a valuable direction. A further verified CKB-VM foundation would strengthen the whole CKB scripting stack. Proving instruction-level equivalence against the Sail RISC-V specification feels like the kind of deep infrastructure work that may not be immediately visible to application developers, but I reckon it compounds over time.</p>\n<p>Best of luck.</p>",
          "like_count": 0,
          "quote_count": 0
        }
      ]
    },
    {
      "topic_id": 8752,
      "title": "Spark Program: Mini-Grant Initiative",
      "slug": "spark-program-mini-grant-initiative",
      "url": "https://talk.nervos.org/t/spark-program-mini-grant-initiative/8752",
      "created_at": "2025-04-28T10:30:41.206000+00:00",
      "last_posted_at": "2026-04-27T21:19:29.698000+00:00",
      "category_id": 49,
      "tags": [
        "Spark-Program"
      ],
      "posters": [
        "Original Poster, Most Recent Poster",
        "Frequent Poster"
      ],
      "recent_posts": [
        {
          "post_id": 24053,
          "post_number": 6,
          "topic_id": 8752,
          "topic_title": "Spark Program: Mini-Grant Initiative",
          "topic_slug": "spark-program-mini-grant-initiative",
          "author": "zz_tovarishch",
          "created_at": "2026-04-27T21:19:29.698000+00:00",
          "updated_at": "2026-04-27T21:19:41.823000+00:00",
          "reply_to_post_number": null,
          "url": "https://talk.nervos.org/t/spark-program-mini-grant-initiative/8752/6",
          "content_text": "由于论坛已经全面接入AI翻译工具\n后续Spark项目的提案、周更新、总结等在Nervos Talk沉淀的内容，不再强制要求双语版本\nZhouzhou\nOn Behalf of the Spark Committee",
          "content_html": "<p>由于论坛已经全面接入AI翻译工具</p>\n<p>后续Spark项目的提案、周更新、总结等在Nervos Talk沉淀的内容，不再强制要求双语版本</p>\n<p>Zhouzhou<br>\nOn Behalf of the Spark Committee</p>",
          "like_count": 0,
          "quote_count": 0
        }
      ]
    }
  ]
}