Blog | hwkim-dev

[Project] llm-lite: Lightweight Inference Engine for Gemma 3N E4B

April 19, 2026 · 2 min read

Developer

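llm-lite is a multi-backend inference engine built to run Gemma 3N E4B in low-spec local environments without the cloud. It leaves the model architecture unchanged and instead draws out performance through aggressive quantization (INT4 weights + MMAP) and low-level hardware acceleration.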

[Paper] Attention Is All You Need

April 17, 2026 · 14 min read

hwkim-dev

Developer

These notes cover the core concepts and mathematical principles of the Transformer architecture.

[Paper] GPT-1 — Improving Language Understanding by Generative Pre-Training

April 17, 2026 · 13 min read

hwkim-dev

Developer

Notes on the architecture and training process described in the GPT-1 paper, pairing mathematical definitions with intuitive interpretations.

[Daily] Spring, and a Fresh Start

April 14, 2026 · One min read

hwkim-dev

Developer

Cherry blossoms are starting to bloom — and so is this new site.

[Misc] Rebuilt My Personal Homepage with Docusaurus

April 14, 2026 · One min read

hwkim-dev

Developer

I finally set up a proper personal homepage. What used to be just a GitHub Profile README is now a full static site powered by Docusaurus.

[News] AI / HPC Weekly Clips — 2026.04.14

April 14, 2026 · One min read

hwkim-dev

Developer

A quick summary of notable updates in deep learning inference, GPU architecture, and HPC this week.

[Review] CUDA by Example — Best Book for GPU Beginners

April 14, 2026 · One min read

hwkim-dev

Developer

This is the book that helped me most when I first started learning CUDA programming.

[Study] CUDA Kernel Optimization — Memory Access Patterns

April 14, 2026 · One min read

hwkim-dev

Developer

While studying deep learning inference optimization, I explored how memory access patterns in CUDA kernels dramatically affect performance.