# pccx-lab research lineage

Auto-generated from `pccx_core::research::CITATIONS`. Every entry grounds a specific analyzer or UVM strategy in a published paper; update `core/src/research.rs` when adding a new probe.
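
For orientation, a minimal sketch of what adding an entry might look like, assuming a flat `Citation` struct whose field names mirror the table columns below; the actual types and field names in `pccx_core` may differ:

```rust
// Hypothetical sketch of a CITATIONS entry in core/src/research.rs.
// The `Citation` struct and its field names are assumptions inferred
// from the table columns below, not the actual pccx_core definitions.
pub struct Citation {
    pub used_by: &'static str, // analyzer or UVM strategy the paper grounds
    pub title: &'static str,   // paper title
    pub year: u16,             // publication year
    pub arxiv: &'static str,   // arXiv id, or a slug when no id exists
}

pub const CITATIONS: &[Citation] = &[
    Citation {
        used_by: "kv_cache_pressure",
        title: "QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving",
        year: 2024,
        arxiv: "2405.04532",
    },
    // ...one entry per row in the tables below
];
```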

## Analyzers

| used by | title | year | arxiv |
| --- | --- | --- | --- |
| `kv_cache_pressure` | QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | 2024 | 2405.04532 |
| `kv_cache_pressure` | Prefill vs Decode Bottlenecks: SRAM-Frequency Tradeoffs | 2025 | 2512.22066 |
| `phase_classifier` | Prefill vs Decode Bottlenecks: SRAM-Frequency Tradeoffs | 2025 | 2512.22066 |
| `ai_trend` | LLM Inference Unveiled: Survey and Roofline Model Insights | 2024 | 2402.16363 |
| `power_estimate` | Hybrid Systolic Array Accelerator with Optimized Dataflow for Edge LLM Inference | 2025 | 2507.09010 |
| `latency_distribution` | HERMES: Understanding and Optimizing Multi-Stage AI Inference Pipelines | 2025 | hermes-2025 |
| `matryoshka_footprint` | Matryoshka Representation Learning | 2022 | 2205.13147 |
| `dma_burst_efficiency` | LLMCompass: Enabling Efficient Hardware Design for LLMs | 2024 | 2410-llmcompass-isca-2024 |
| `moe_sparsity` | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | 2021 | 2101.03961 |
| `flash_attention_tile` | FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | 2023 | 2307.08691 |
| `flash_attention_tile` | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-Precision | 2024 | 2407.08608 |

## UVM Strategies

| used by | title | year | arxiv |
| --- | --- | --- | --- |
| `speculative_draft_probe` | Accelerating OpenPangu Inference on NPU via Speculative Decoding | 2026 | 2603.03383 |
| `early_exit_decoder` | Fast and Cost-effective Speculative Edge-Cloud Decoding with Early Exits | 2025 | 2505.21594 |
| `sparsified_kv_eviction` | A Survey on LLM Acceleration based on KV Cache Management | 2024 | 2412.19442 |
| `sparsified_kv_eviction` | EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving | 2025 | 2512.14946 |
| `qoq_kv4_quantize` | QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | 2024 | 2405.04532 |
| `qoq_kv4_quantize` | QQQ: Quality Quattuor-Bit Quantization for LLMs | 2024 | 2406.09904 |
| `l2_prefetch` | Architecting Long-Context LLM Acceleration with Packing-Prefetch Scheduler | 2025 | 2508.08457 |
| `matryoshka_subnet_switch` | Matryoshka Representation Learning | 2022 | 2205.13147 |
| `flash_attention_tile_probe` | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-Precision | 2024 | 2407.08608 |
| `wavelet_attention_probe` | Wavelet-Enhanced Linear Attention | 2023 | 2312.07590 |