# pccx-lab research lineage

Auto-generated from `pccx_core::research::CITATIONS`. Every entry grounds a specific analyzer or UVM strategy in a published paper; update `core/src/research.rs` when adding a new probe.
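For orientation, here is a minimal sketch of what one `CITATIONS` entry could look like. The `Citation` struct, its field names and types, and the `example_probe` name are assumptions for illustration only; the authoritative definition lives in `core/src/research.rs`.

```rust
// Hypothetical sketch of core/src/research.rs; the real field names, types,
// and module layout of pccx_core::research may differ.

/// One research citation backing an analyzer or UVM strategy (sketch).
pub struct Citation {
    pub used_by: &'static str, // analyzer or strategy name (assumed field)
    pub title: &'static str,   // paper title
    pub year: u16,             // publication year
    pub arxiv: &'static str,   // arXiv identifier (placeholder format below)
}

/// Table the docs page is generated from (sketch only).
pub const CITATIONS: &[Citation] = &[Citation {
    used_by: "example_probe",                    // placeholder probe name
    title: "Matryoshka Representation Learning", // title taken from the table below
    year: 2022,
    arxiv: "XXXX.XXXXX",                         // placeholder identifier
}];

fn main() {
    // Print each entry roughly the way the generated tables list them.
    for c in CITATIONS {
        println!("{} ({}), used by {}", c.title, c.year, c.used_by);
    }
}
```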
## Analyzers

| used by | title | year | arxiv |
|---|---|---|---|
|  | QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | 2024 |  |
|  | Prefill vs Decode Bottlenecks: SRAM-Frequency Tradeoffs | 2025 |  |
|  | Prefill vs Decode Bottlenecks: SRAM-Frequency Tradeoffs | 2025 |  |
|  | LLM Inference Unveiled: Survey and Roofline Model Insights | 2024 |  |
|  | Hybrid Systolic Array Accelerator with Optimized Dataflow for Edge LLM Inference | 2025 |  |
|  | HERMES: Understanding and Optimizing Multi-Stage AI Inference Pipelines | 2025 |  |
|  | Matryoshka Representation Learning | 2022 |  |
|  | LLMCompass: Enabling Efficient Hardware Design for LLMs | 2024 |  |
|  | Switch Transformer: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | 2021 |  |
|  | FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | 2023 |  |
|  | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-Precision | 2024 |  |
## UVM Strategies

| used by | title | year | arxiv |
|---|---|---|---|
|  | Accelerating OpenPangu Inference on NPU via Speculative Decoding | 2026 |  |
|  | Fast and Cost-effective Speculative Edge-Cloud Decoding with Early Exits | 2025 |  |
|  | A Survey on LLM Acceleration based on KV Cache Management | 2024 |  |
|  | EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving | 2025 |  |
|  | QServe: W4A8KV4 Quantization and System Co-design | 2024 |  |
|  | QQQ: Quality Quattuor-Bit Quantization for LLMs | 2024 |  |
|  | Architecting Long-Context LLM Acceleration with Packing-Prefetch Scheduler | 2025 |  |
|  | Matryoshka Representation Learning | 2022 |  |
|  | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-Precision | 2024 |  |
|  | Wavelet-Enhanced Linear Attention | 2023 |  |