# pccx-lab research lineage

Auto-generated from `pccx_core::research::CITATIONS`. Every entry grounds a specific analyzer or UVM strategy in a published paper; update `core/src/research.rs` when adding a new probe.
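For orientation, here is a minimal sketch of what one `CITATIONS` entry could look like. The `Citation` struct, its field names and types, and the `example_probe` name are assumptions for illustration only; the authoritative definition lives in `core/src/research.rs`.

```rust
// Hypothetical sketch of core/src/research.rs; the real field names, types,
// and module layout of pccx_core::research may differ.

/// One research citation backing an analyzer or UVM strategy (sketch).
pub struct Citation {
    pub used_by: &'static str, // analyzer or strategy name (assumed field)
    pub title: &'static str,   // paper title
    pub year: u16,             // publication year
    pub arxiv: &'static str,   // arXiv identifier (placeholder format below)
}

/// Table the docs page is generated from (sketch only).
pub const CITATIONS: &[Citation] = &[Citation {
    used_by: "example_probe",                    // placeholder probe name
    title: "Matryoshka Representation Learning", // title taken from the table below
    year: 2022,
    arxiv: "XXXX.XXXXX",                         // placeholder identifier
}];

fn main() {
    // Print each entry roughly the way the generated tables list them.
    for c in CITATIONS {
        println!("{} ({}), used by {}", c.title, c.year, c.used_by);
    }
}
```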
## Analyzers

| used by | title | year | arxiv |
|---|---|---|---|
|  | QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | 2024 |  |
|  | Prefill vs Decode Bottlenecks: SRAM-Frequency Tradeoffs | 2025 |  |
|  | Prefill vs Decode Bottlenecks: SRAM-Frequency Tradeoffs | 2025 |  |
|  | LLM Inference Unveiled: Survey and Roofline Model Insights | 2024 |  |
|  | Hybrid Systolic Array Accelerator with Optimized Dataflow for Edge LLM Inference | 2025 |  |
|  | HERMES: Understanding and Optimizing Multi-Stage AI Inference Pipelines | 2025 |  |
|  | Matryoshka Representation Learning | 2022 |  |
|  | LLMCompass: Enabling Efficient Hardware Design for LLMs | 2024 |  |
|  | Switch Transformer: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | 2021 |  |
|  | FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | 2023 |  |
|  | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-Precision | 2024 |  |
## UVM Strategies

| used by | title | year | arxiv |
|---|---|---|---|
|  | Accelerating OpenPangu Inference on NPU via Speculative Decoding | 2026 |  |
|  | Fast and Cost-effective Speculative Edge-Cloud Decoding with Early Exits | 2025 |  |
|  | A Survey on LLM Acceleration based on KV Cache Management | 2024 |  |
|  | EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving | 2025 |  |
|  | QServe: W4A8KV4 Quantization and System Co-design | 2024 |  |
|  | QQQ: Quality Quattuor-Bit Quantization for LLMs | 2024 |  |
|  | Architecting Long-Context LLM Acceleration with Packing-Prefetch Scheduler | 2025 |  |
|  | Matryoshka Representation Learning | 2022 |  |
|  | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-Precision | 2024 |  |
|  | Wavelet-Enhanced Linear Attention | 2023 |  |