<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>hwkim-dev Blog</title>
        <link>https://hwkim-dev.github.io/hwkim-dev/ko/blog</link>
        <description>hwkim-dev Blog</description>
        <lastBuildDate>Sun, 19 Apr 2026 00:00:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>ko</language>
        <item>
            <title><![CDATA[[Project] llm-lite: A Lightweight Inference Engine for Gemma 3N E4B]]></title>
            <link>https://hwkim-dev.github.io/hwkim-dev/ko/blog/llm-lite-intro</link>
            <guid>https://hwkim-dev.github.io/hwkim-dev/ko/blog/llm-lite-intro</guid>
            <pubDate>Sun, 19 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[llm-lite is a multi-backend inference engine built to run Gemma 3N E4B on low-end local machines without the cloud.]]></description>
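            <content:encoded><![CDATA[<p><strong>llm-lite</strong> is a multi-backend inference engine built to run Gemma 3N E4B on low-end local hardware <strong>without the cloud</strong>.
It leaves the model architecture unchanged and gets its performance from aggressive quantization (INT4 weights + MMAP)
and low-level hardware acceleration.</p>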
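<p>The following is a minimal pure-Python sketch of symmetric per-group INT4 quantization. It illustrates what "INT4 weights" means: each group of weights shares one float scale, and every weight is stored as a signed 4-bit code, two codes per byte. The group size of 32, the [-7, 7] code range, the nibble order and the function names are assumptions chosen for illustration. This is not llm-lite's actual format. At 4 bits per weight the payload is 8x smaller than FP32, plus a small overhead for the per-group scales.</p>

```python
# Minimal sketch of symmetric per-group INT4 weight quantization.
# Group size, the [-7, 7] code range and the low-nibble-first packing are
# illustrative assumptions, not llm-lite's actual on-disk format.


def quantize_int4(weights, group_size=32):
    """Quantize a flat list of floats; return (packed bytes, per-group scales)."""
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / 7.0 if max_abs > 0 else 1.0  # map [-max, max] onto [-7, 7]
        scales.append(scale)
        for w in group:
            q = max(-7, min(7, round(w / scale)))
            codes.append(q & 0x0F)  # keep the 4-bit two's-complement pattern
    if len(codes) % 2:
        codes.append(0)  # pad so that nibbles pair up into whole bytes
    packed = bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))
    return packed, scales


def dequantize_int4(packed, scales, n, group_size=32):
    """Unpack n nibbles, sign-extend them and rescale by their group's scale."""
    out = []
    for i in range(n):
        byte = packed[i // 2]
        nibble = (byte >> 4) if i % 2 else (byte & 0x0F)
        q = nibble - 16 if nibble >= 8 else nibble  # sign-extend the 4-bit value
        out.append(q * scales[i // group_size])
    return out


weights = [0.70, -0.35, 0.10, 0.0, -0.70, 0.21]
packed, scales = quantize_int4(weights)
restored = dequantize_int4(packed, scales, len(weights))
print(len(packed), "bytes for", len(weights), "weights")  # 3 bytes for 6 weights
```

<p>The reconstruction error per weight is at most half a quantization step (scale / 2). This is the accuracy cost traded for the 8x smaller weights.</p>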
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="타겟-하드웨어">Target Hardware<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/llm-lite-intro#%ED%83%80%EA%B2%9F-%ED%95%98%EB%93%9C%EC%9B%A8%EC%96%B4" class="hash-link" aria-label="타겟 하드웨어에 대한 직접 링크" title="타겟 하드웨어에 대한 직접 링크" translate="no">​</a></h2>
<p>The primary target is a Linux machine with an <strong>AMD Ryzen 5 4500U APU</strong> (Renoir, 6C/6T, Radeon RX Vega 6 iGPU).
Secondary targets are macOS (Apple Silicon / Intel, via MoltenVK), Raspberry Pi 4/5
(aarch64), and the <strong>Xilinx KV260 FPGA</strong>. The KV260 uses a separate NPU backend,
uCA (micro Compute Architecture).</p>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="아키텍처-요약">Architecture Overview<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/llm-lite-intro#%EC%95%84%ED%82%A4%ED%85%8D%EC%B2%98-%EC%9A%94%EC%95%BD" class="hash-link" aria-label="아키텍처 요약에 대한 직접 링크" title="아키텍처 요약에 대한 직접 링크" translate="no">​</a></h2>
<table><thead><tr><th>Component</th><th>Technology</th></tr></thead><tbody><tr><td>Inference engine</td><td>Python 3.12 + NumPy</td></tr><tr><td>CPU kernels</td><td>C++17 + SIMD / OpenMP</td></tr><tr><td>GPU kernels</td><td>Vulkan 1.2 Compute + GLSL</td></tr><tr><td>Web GUI</td><td>Flask 3 + SSE streaming</td></tr><tr><td>Native GUI</td><td>Dear ImGui 1.91 + Vulkan</td></tr><tr><td>Quantization</td><td>W4A32 by default (INT4 weights, FP32 activations)</td></tr><tr><td>Weight loading</td><td>safetensors + MMAP (zero-copy)</td></tr></tbody></table>
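<p>The "safetensors + MMAP (zero-copy)" row can be illustrated with a small stdlib-only sketch. A safetensors file has three parts: an 8-byte little-endian header length, a JSON header giving each tensor's dtype, shape and byte offsets, and the raw tensor data. If you map the file and slice a <code>memoryview</code>, you can read a tensor without copying it into Python memory. The helper names, the file and the tensor name <code>layer0.weight</code> are hypothetical. This is not llm-lite's loader.</p>

```python
# Minimal sketch of zero-copy tensor access in a .safetensors file via mmap.
# Stdlib only; the file, the tensor name "layer0.weight" and the helper names
# are made up for this demo and are not llm-lite's loader.
import json
import mmap
import os
import struct
import tempfile


def write_safetensors(path, name, floats):
    """Write a single F32 tensor: 8-byte header length, JSON header, raw data."""
    data = struct.pack(f"<{len(floats)}f", *floats)
    header = {name: {"dtype": "F32", "shape": [len(floats)],
                     "data_offsets": [0, len(data)]}}
    header_bytes = json.dumps(header).encode("utf-8")
    header_bytes += b" " * (-len(header_bytes) % 8)  # pad so data starts 8-byte aligned
    with open(path, "wb") as f:
        f.write(len(header_bytes).to_bytes(8, "little"))  # little-endian u64 length
        f.write(header_bytes)
        f.write(data)


def open_tensor(path, name):
    """Map the file read-only and return a float view of one tensor (no copy)."""
    f = open(path, "rb")
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    header_len = int.from_bytes(mm[0:8], "little")
    header = json.loads(mm[8:8 + header_len])
    begin, end = header[name]["data_offsets"]  # offsets are relative to the data section
    base = 8 + header_len
    # Slicing a memoryview copies nothing; the OS pages data in on first access.
    # cast("f") uses native byte order, assumed little-endian like the file.
    view = memoryview(mm)[base + begin:base + end].cast("f")
    return view, mm, f


path = os.path.join(tempfile.mkdtemp(), "demo.safetensors")
write_safetensors(path, "layer0.weight", [0.5, -1.0, 2.0])
view, mm, f = open_tensor(path, "layer0.weight")
print(view.tolist())  # [0.5, -1.0, 2.0]
view.release()  # drop the view before closing the mapping
mm.close()
f.close()
```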
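<p>Prefill runs at roughly 35 tokens/sec and decode at roughly 8-12 tokens/sec. That is fast enough for everyday
conversation even on the Ryzen 4500U. With INT4 weights loaded via MMAP, the model takes about 2.8 GB of RAM.</p>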
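<p>The arithmetic below converts these rates into per-token latency and end-to-end response time, to show what they mean in practice. The 200-token prompt and 150-token reply are arbitrary example sizes.</p>

```python
# Back-of-the-envelope latency from the quoted throughput figures.
# Prompt/reply lengths are arbitrary example values.
PREFILL_TPS = 35.0              # ~35 tokens/sec prefill
DECODE_TPS_RANGE = (8.0, 12.0)  # ~8-12 tokens/sec decode


def response_time(prompt_tokens, reply_tokens, decode_tps):
    """Seconds to prefill the prompt, then generate the reply token by token."""
    return prompt_tokens / PREFILL_TPS + reply_tokens / decode_tps


for tps in DECODE_TPS_RANGE:
    per_token_ms = 1000.0 / tps
    total = response_time(200, 150, tps)
    print(f"decode {tps:.0f} tok/s: {per_token_ms:.0f} ms/token, ~{total:.1f} s total")
# decode 8 tok/s: 125 ms/token, ~24.5 s total
# decode 12 tok/s: 83 ms/token, ~18.2 s total
```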
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="최근-업데이트">Recent Updates<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/llm-lite-intro#%EC%B5%9C%EA%B7%BC-%EC%97%85%EB%8D%B0%EC%9D%B4%ED%8A%B8" class="hash-link" aria-label="최근 업데이트에 대한 직접 링크" title="최근 업데이트에 대한 직접 링크" translate="no">​</a></h2>
<ul>
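<li class=""><strong>More quantization modes</strong>: INT8 / FP16 / FP32 weight modes were added alongside the existing INT4.
The choice matters because older iGPUs (such as the Vega 6) can run floating-point arithmetic faster than integer arithmetic.</li>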
<li class=""><strong>Model manager</strong>: the GUI covers the whole flow, from downloading a model from HuggingFace to quantizing it
and deleting old variants.</li>
<li class=""><strong>Groundwork for speculative decoding</strong>: research has started on building a draft model by slicing E2B out of E4B,
using Gemma 3N's MatFormer architecture. It is currently a scaffold. The actual implementation
is tracked in a separate issue.</li>
</ul>
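<p>The speculative decoding work is still a scaffold, so the following is only a generic sketch of the technique's greedy variant. A cheap draft model (standing in for the sliced E2B) proposes a few tokens. The target model (E4B) then keeps the longest prefix that matches its own greedy choices and adds one token of its own. Because verification is greedy, the output is identical to decoding with the target alone. In a real engine the speed-up comes from checking all proposed tokens in one batched target pass. The toy models and all function names are hypothetical.</p>

```python
# Minimal sketch of greedy speculative decoding. The toy draft/target functions
# stand in for the E2B (draft) and E4B (target) models; all names here are
# hypothetical and not part of llm-lite.
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # prefix -> greedy next token


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 4) -> List[int]:
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1) The cheap draft model proposes k tokens one by one.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2) The target model verifies them. A real engine scores all k
        #    positions in one batched forward pass; a loop keeps this readable.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            accepted.append(expected)
            if expected != tok:
                break  # first mismatch: keep the target's token, drop the rest
        else:
            accepted.append(target(tokens + accepted))  # all matched: one bonus token
        tokens.extend(accepted)
    return tokens[:len(prompt) + max_new]


def target(prefix: List[int]) -> int:  # toy target model: counts upward
    return prefix[-1] + 1


def draft(prefix: List[int]) -> int:  # toy draft model: wrong after multiples of 5
    return prefix[-1] + (2 if prefix[-1] % 5 == 0 else 1)


print(speculative_decode(target, draft, [0], max_new=10))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

<p>Sampling-based speculative decoding replaces the exact-match check with a probabilistic accept/reject rule. That rule keeps the output distribution identical to the target model's.</p>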
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="관련-링크">Related Links<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/llm-lite-intro#%EA%B4%80%EB%A0%A8-%EB%A7%81%ED%81%AC" class="hash-link" aria-label="관련 링크에 대한 직접 링크" title="관련 링크에 대한 직접 링크" translate="no">​</a></h2>
<ul>
<li class="">GitHub: <a href="https://github.com/hwkim-dev/llm-lite" target="_blank" rel="noopener noreferrer" class="">hwkim-dev/llm-lite</a></li>
<li class="">Reference manual: <a href="https://github.com/hwkim-dev/llm-lite/blob/main/docs/Gemma3N_Reference_Manual.md" target="_blank" rel="noopener noreferrer" class="">Gemma3N_Reference_Manual.md</a></li>
<li class="">Related paper post: <a class="" href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/gemma-3n-e4b">How Gemma 3 4B Works Internally</a></li>
</ul>]]></content:encoded>
            <category>llm-lite</category>
            <category>gemma</category>
            <category>llm</category>
            <category>inference</category>
            <category>vulkan</category>
            <category>cpp</category>
        </item>
        <item>
            <title><![CDATA[[Paper] Attention Is All You Need]]></title>
            <link>https://hwkim-dev.github.io/hwkim-dev/ko/blog/attention-is-all-you-need</link>
            <guid>https://hwkim-dev.github.io/hwkim-dev/ko/blog/attention-is-all-you-need</guid>
            <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>
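            <description><![CDATA[This post covers the core concepts and mathematical principles of the Transformer architecture.]]></description>
            <content:encoded><![CDATA[<p>This post covers the core concepts and mathematical principles of the Transformer architecture.</p>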
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="1-transformer의-등장-배경">1. Why the Transformer Emerged<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/attention-is-all-you-need#1-transformer%EC%9D%98-%EB%93%B1%EC%9E%A5-%EB%B0%B0%EA%B2%BD" class="hash-link" aria-label="1. Transformer의 등장 배경에 대한 직접 링크" title="1. Transformer의 등장 배경에 대한 직접 링크" translate="no">​</a></h2>
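<p>Before the Transformer, the dominant models in NLP were RNNs (Recurrent Neural Networks) and LSTMs (Long Short-Term Memory networks). These models process data sequentially. Take the sentence "I go to school". The model first processes "I". It then processes "go" using the result for "I", and then processes "to school" using the result of the previous step.</p>
<p>This sequential processing has two critical limitations.</p>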
<ol>
<li class="">
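<p><strong>No parallel processing:</strong> The computation for a word can start only after the previous word's computation has finished. As a result, the time steps of a sequence cannot be computed in parallel, and the hardware's parallel compute capacity goes unused.</p>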
</li>
<li class="">
<p><strong>Long-term dependency problem:</strong> The longer the sentence, the more the information from early words fades by the time the model reaches the later ones.</p>
</li>
</ol>
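<p>Both limitations show up in a toy scalar RNN, <code>h_t = tanh(w_x * x_t + w_h * h_{t-1})</code>, with made-up weights. The loop below cannot be parallelized across time steps, because each step needs the previous hidden state. It also uses the chain rule to track how sensitive the final state is to the first input. In this configuration that sensitivity shrinks geometrically as the sequence grows. This is a simplified picture of why information from early words fades.</p>

```python
# Toy scalar RNN, h_t = tanh(w_x * x_t + w_h * h_{t-1}), with made-up weights.
# It shows (1) each step needs the previous hidden state, so time steps cannot
# run in parallel, and (2) the final state's sensitivity to the first input
# shrinks as the sequence gets longer.
import math


def rnn_final_state_and_grad(xs, w_x=0.5, w_h=0.5):
    """Return the final hidden state h_T and dh_T/dx_1 (chain rule, forward mode)."""
    h, grad = 0.0, 0.0
    for t, x in enumerate(xs):  # strictly sequential: step t uses h from step t-1
        h = math.tanh(w_x * x + w_h * h)
        local = 1.0 - h * h  # derivative of tanh at this step
        grad = w_x * local if t == 0 else grad * w_h * local
    return h, grad


for n in (1, 5, 20):
    _, g = rnn_final_state_and_grad([1.0] * n)
    print(f"length {n:2d}: dh_T/dx_1 = {g:.1e}")
```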
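<p>The Transformer starts from one idea: "<strong>instead of feeding in words one at a time, feed in the whole sentence at once and compute the relationships between all the words simultaneously</strong>". The key technique that makes this possible is the <strong>Attention</strong> mechanism.</p>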
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="2-model-architecture">2. Model Architecture<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/attention-is-all-you-need#2-model-architecture" class="hash-link" aria-label="2. Model Architecture에 대한 직접 링크" title="2. Model Architecture에 대한 직접 링크" translate="no">​</a></h2>
<p>The Transformer adopts an <strong>Encoder-Decoder</strong> architecture designed for sequence transduction tasks such as machine translation.</p>
<ul>
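<li class=""><strong>Auto-regressive:</strong> When generating output, the model feeds the symbols it has already generated back in as additional input for the next step. For example, it predicts the 1st word, then predicts the 2nd word with the 1st word included in its input, and so on.</li>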
</ul>
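<p>The auto-regressive loop takes only a few lines. Here the "model" is a hypothetical lookup table keyed on the last token. A real Transformer decoder conditions on the entire prefix and on the encoder output, but the feedback loop is the same.</p>

```python
# Minimal sketch of autoregressive (greedy) generation. The "model" is a toy
# lookup table keyed on the last token; a real Transformer decoder would
# condition on the whole prefix and on the encoder output.
from typing import Callable, List


def generate(next_token: Callable[[List[str]], str], bos: str = "BOS",
             eos: str = "EOS", max_len: int = 10) -> List[str]:
    generated = [bos]
    for _ in range(max_len):
        token = next_token(generated)  # conditioned on everything produced so far
        if token == eos:
            break
        generated.append(token)  # the prediction becomes input for the next step
    return generated[1:]  # drop the start symbol


TOY_TABLE = {"BOS": "I", "I": "go", "go": "to", "to": "school", "school": "EOS"}
print(generate(lambda prefix: TOY_TABLE[prefix[-1]]))  # ['I', 'go', 'to', 'school']
```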
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="21-encoder">2.1 Encoder<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/attention-is-all-you-need#21-encoder" class="hash-link" aria-label="2.1 Encoder에 대한 직접 링크" title="2.1 Encoder에 대한 직접 링크" translate="no">​</a></h3>
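<p>The Encoder reads the source sentence (e.g., a Korean sentence) and works out the meaning and context of each word. It turns the sentence into a sequence of continuous representations, one vector per input token.</p>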
<ul>
<li class="">
<p><strong>Layer structure:</strong> A stack of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>N</mi><mo>=</mo><mn>6</mn></mrow><annotation encoding="application/x-tex">N = 6</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.109em">N</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">6</span></span></span></span> identical layers.</p>
</li>
<li class="">
<p><strong>Sub-layers:</strong> Each layer contains two sub-layers.</p>
<ol>
<li class="">
<p><strong>Multi-Head Self-Attention:</strong> Works out how the words within the sentence relate to one another.</p>
</li>
<li class="">
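<p><strong>Position-wise Feed-Forward Network (FFN):</strong> A fully connected network applied to each position separately and identically. It further transforms each word's features using the relationships that attention has found.</p>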
</li>
</ol>
</li>
<li class="">
<p><strong>Residual Connection and Layer Normalization:</strong>
The output of each sub-layer is computed as follows.</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>O</mi><mi>u</mi><mi>t</mi><mi>p</mi><mi>u</mi><mi>t</mi><mo>=</mo><mi>L</mi><mi>a</mi><mi>y</mi><mi>e</mi><mi>r</mi><mi>N</mi><mi>o</mi><mi>r</mi><mi>m</mi><mo stretchy="false">(</mo><mi>x</mi><mo>+</mo><mi>S</mi><mi>u</mi><mi>b</mi><mi>l</mi><mi>a</mi><mi>y</mi><mi>e</mi><mi>r</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">Output = LayerNorm(x + Sublayer(x))</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0278em">O</span><span class="mord mathnormal">u</span><span class="mord mathnormal">tp</span><span class="mord mathnormal">u</span><span class="mord mathnormal">t</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal">L</span><span class="mord mathnormal">a</span><span class="mord mathnormal" style="margin-right:0.0359em">y</span><span class="mord mathnormal" style="margin-right:0.0278em">er</span><span class="mord mathnormal" style="margin-right:0.109em">N</span><span class="mord mathnormal" style="margin-right:0.0278em">or</span><span class="mord mathnormal">m</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" 
style="margin-right:0.0576em">S</span><span class="mord mathnormal">u</span><span class="mord mathnormal">b</span><span class="mord mathnormal" style="margin-right:0.0197em">l</span><span class="mord mathnormal">a</span><span class="mord mathnormal" style="margin-right:0.0359em">y</span><span class="mord mathnormal" style="margin-right:0.0278em">er</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">))</span></span></span></span></span>
<ul>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi></mrow><annotation encoding="application/x-tex">x</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em"></span><span class="mord mathnormal">x</span></span></span></span><strong>:</strong> The original input fed into the sub-layer.</p>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>S</mi><mi>u</mi><mi>b</mi><mi>l</mi><mi>a</mi><mi>y</mi><mi>e</mi><mi>r</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">Sublayer(x)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.0576em">S</span><span class="mord mathnormal">u</span><span class="mord mathnormal">b</span><span class="mord mathnormal" style="margin-right:0.0197em">l</span><span class="mord mathnormal">a</span><span class="mord mathnormal" style="margin-right:0.0359em">y</span><span class="mord mathnormal" style="margin-right:0.0278em">er</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span></span></span></span><strong>:</strong> The output of the attention or FFN computation.</p>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi><mo>+</mo><mi>S</mi><mi>u</mi><mi>b</mi><mi>l</mi><mi>a</mi><mi>y</mi><mi>e</mi><mi>r</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">x + Sublayer(x)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6667em;vertical-align:-0.0833em"></span><span class="mord mathnormal">x</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.0576em">S</span><span class="mord mathnormal">u</span><span class="mord mathnormal">b</span><span class="mord mathnormal" style="margin-right:0.0197em">l</span><span class="mord mathnormal">a</span><span class="mord mathnormal" style="margin-right:0.0359em">y</span><span class="mord mathnormal" style="margin-right:0.0278em">er</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span></span></span></span> <strong>(Residual Connection):</strong> Adds the original input back to the computed result. The input, and the gradient in the backward pass, can pass straight through this shortcut, so early information is not lost as layers are stacked deeper. This makes training more stable.</p>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>L</mi><mi>a</mi><mi>y</mi><mi>e</mi><mi>r</mi><mi>N</mi><mi>o</mi><mi>r</mi><mi>m</mi><mo stretchy="false">(</mo><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">LayerNorm(...)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal">L</span><span class="mord mathnormal">a</span><span class="mord mathnormal" style="margin-right:0.0359em">y</span><span class="mord mathnormal" style="margin-right:0.0278em">er</span><span class="mord mathnormal" style="margin-right:0.109em">N</span><span class="mord mathnormal" style="margin-right:0.0278em">or</span><span class="mord mathnormal">m</span><span class="mopen">(</span><span class="mord">...</span><span class="mclose">)</span></span></span></span><strong>:</strong> Computes the mean and variance of the summed vector over its features (separately for each token), normalizes it to a consistent range, and then applies a learned scale and shift.</p>
</li>
</ul>
</li>
<li class="">
<p><strong>Uniform dimension:</strong> The residual additions require matching sizes. For that reason, every sub-layer in the model, as well as the embedding layers, produces outputs of the same dimension, fixed at <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>d</mi><mrow><mi>m</mi><mi>o</mi><mi>d</mi><mi>e</mi><mi>l</mi></mrow></msub><mo>=</mo><mn>512</mn></mrow><annotation encoding="application/x-tex">d_{model} = 512</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">m</span><span class="mord mathnormal mtight">o</span><span class="mord mathnormal mtight">d</span><span class="mord mathnormal mtight">e</span><span class="mord mathnormal mtight" style="margin-right:0.0197em">l</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">512</span></span></span></span>.</p>
</li>
</ul>
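<p>The residual-plus-normalization formula can be sketched for a single token vector in plain Python. The sub-layer here is the position-wise FFN from the paper, <code>FFN(x) = max(0, xW1 + b1)W2 + b2</code>. The tiny sizes (d_model = 4, d_ff = 8) and the weights are made up. The LayerNorm scale and shift are fixed to 1 and 0 instead of being learned.</p>

```python
# Minimal pure-Python sketch of one post-LN sub-layer block,
# Output = LayerNorm(x + Sublayer(x)), using the position-wise FFN
# max(0, x W1 + b1) W2 + b2 as the sub-layer. Sizes and weights are made up;
# gamma/beta are fixed to 1/0 here instead of being learned.
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def layer_norm(x: Vector, gamma: Vector, beta: Vector, eps: float = 1e-6) -> Vector:
    """Normalize one token vector over its features, then scale and shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [g * (v - mean) / math.sqrt(var + eps) + b for v, g, b in zip(x, gamma, beta)]


def ffn(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    """Position-wise FFN applied to a single position: d_model -> d_ff -> d_model."""
    hidden = [max(0.0, sum(xi * w1[i][j] for i, xi in enumerate(x)) + b1[j])
              for j in range(len(b1))]
    return [sum(h * w2[j][k] for j, h in enumerate(hidden)) + b2[k]
            for k in range(len(b2))]


def sublayer_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    """LayerNorm(x + Sublayer(x)); the residual add needs equal dimensions."""
    y = sublayer(x)
    if len(y) != len(x):
        raise ValueError("Sublayer output must have the same size (d_model) as x")
    d = len(x)
    return layer_norm([a + b for a, b in zip(x, y)], [1.0] * d, [0.0] * d)


d_model, d_ff = 4, 8  # the paper uses 512 and 2048
w1 = [[0.1 * (i + j) for j in range(d_ff)] for i in range(d_model)]
b1 = [0.0] * d_ff
w2 = [[0.05 * (j - k) for k in range(d_model)] for j in range(d_ff)]
b2 = [0.0] * d_model

x = [1.0, 2.0, 3.0, 4.0]
out = sublayer_block(x, lambda v: ffn(v, w1, b1, w2, b2))
print([round(v, 3) for v in out])
```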
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="22-decoder">2.2 Decoder<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/attention-is-all-you-need#22-decoder" class="hash-link" aria-label="2.2 Decoder에 대한 직접 링크" title="2.2 Decoder에 대한 직접 링크" translate="no">​</a></h3>
<p>The Decoder uses the representations produced by the Encoder to generate the target sentence (e.g., the translated English sentence) one word at a time. Like the Encoder, it consists of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>N</mi><mo>=</mo><mn>6</mn></mrow><annotation encoding="application/x-tex">N = 6</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.109em">N</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">6</span></span></span></span> identical layers, but each layer has three sub-layers instead of two.</p>
<ol>
<li class="">
<p><strong>Masked Multi-Head Self-Attention:</strong></p>
<ul>
<li class="">
<p>When the Decoder generates an output word, this sub-layer hides (masks) the words at later positions, i.e., the future words, so the model cannot look ahead at them.</p>
</li>
<li class="">
<p>For example, when predicting the 3rd word, the model may refer only to the 1st and 2nd words. The similarity scores of the future words are masked to <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>−</mo><mi mathvariant="normal">∞</mi></mrow><annotation encoding="application/x-tex">-\infty</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6667em;vertical-align:-0.0833em"></span><span class="mord">−</span><span class="mord">∞</span></span></span></span>, so their attention weights become 0 after the softmax.</p>
</li>
</ul>
</li>
<li class="">
<p><strong>Multi-Head Attention (Encoder-Decoder Attention):</strong></p>
<ul>
<li class="">
<p>This is where the Decoder decides "which parts of the source sentence to focus on" in order to generate the next word.</p>
</li>
<li class="">
<p>Here the Decoder uses its own representations as the Query and attends to the Encoder's final output, which supplies the Key and Value.</p>
</li>
</ul>
</li>
<li class="">
<p><strong>Position-wise Feed-Forward Network:</strong> Same structure as in the Encoder.</p>
</li>
</ol>
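<p>The masking step of the first Decoder sub-layer can be sketched as follows. A causal mask adds 0 at allowed positions and minus infinity at future ones before the softmax, so the future weights come out as exactly 0. The score values are made up.</p>

```python
# Minimal sketch of causal (look-ahead) masking before a row-wise softmax.
import math
from typing import List


def causal_mask(n: int) -> List[List[float]]:
    """0 where position i may attend to j (j <= i), -inf for future positions."""
    return [[0.0 if j <= i else -math.inf for j in range(n)] for i in range(n)]


def softmax(row: List[float]) -> List[float]:
    top = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in row]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


scores = [[1.0, 2.0, 3.0],   # made-up query-key scores, one row per position
          [0.5, 0.5, 0.5],
          [2.0, 0.0, 1.0]]
mask = causal_mask(3)
weights = [softmax([s + m for s, m in zip(srow, mrow)]) for srow, mrow in zip(scores, mask)]
for row in weights:
    print([round(w, 3) for w in row])
# [1.0, 0.0, 0.0]
# [0.5, 0.5, 0.0]
# [0.665, 0.09, 0.245]
```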
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="3-attention-메커니즘">3. The Attention Mechanism<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/attention-is-all-you-need#3-attention-%EB%A9%94%EC%BB%A4%EB%8B%88%EC%A6%98" class="hash-link" aria-label="3. Attention 메커니즘에 대한 직접 링크" title="3. Attention 메커니즘에 대한 직접 링크" translate="no">​</a></h2>
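<p>The attention mechanism is the core of the Transformer. An attention function maps a Query and a set of Key-Value pairs to an output. The output is a weighted sum of the Values, and each weight reflects how well the Query matches the corresponding Key.</p>
<p>As an analogy, it is like searching for information in a library.</p>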
<ul>
<li class="">
<p><strong>Query (Q):</strong> The "search term" typed into the search box (the word we currently want to understand)</p>
</li>
<li class="">
<p><strong>Key (K):</strong> The "index entry" or "label" attached to each book (the features of the other words)</p>
</li>
<li class="">
<p><strong>Value (V):</strong> The actual "content" of the book (the actual information carried by the other words)</p>
</li>
</ul>
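<p>The library analogy can be written as attention for a single query. A real catalog lookup returns exactly one book. Attention instead returns a blend of all values, weighted by how well each key matches the query. The vectors are made up, and the scaling factor and learned projections described in this section are left out.</p>

```python
# Minimal sketch: attention for a single query as a "soft" library lookup.
# Keys, values and the query are made-up 2-D vectors; no scaling or learned
# projections yet.
import math
from typing import List, Tuple

Vector = List[float]


def soft_lookup(query: Vector, pairs: List[Tuple[Vector, Vector]]) -> Vector:
    """Score every key against the query, softmax the scores, and return
    the weighted sum of the values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key, _ in pairs]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(pairs[0][1])
    return [sum(w * value[i] for w, (_, value) in zip(weights, pairs)) for i in range(dim)]


library = [([1.0, 0.0], [10.0, 0.0]),  # (key = label, value = content) of book 1
           ([0.0, 1.0], [0.0, 10.0])]  # book 2
print([round(v, 3) for v in soft_lookup([5.0, 0.0], library)])  # [9.933, 0.067]
print(soft_lookup([1.0, 1.0], library))  # [5.0, 5.0]: an even blend of both books
```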
<p>(* In self-attention, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi><mo separator="true">,</mo><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">Q, K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> are all derived from the same input sentence. Each is obtained by multiplying the input by a different learned weight matrix, which shapes it for its role.)</p>
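<p>Concretely, in self-attention the same input matrix X is multiplied by three different weight matrices. The sketch below uses made-up numbers with 3 tokens, d_model = 4 and d_k = d_v = 2. In the paper, d_model = 512 and d_k = d_v = 64.</p>

```python
# Minimal sketch: self-attention projects the same input X three ways.
# All numbers are made up; the paper learns W_Q, W_K, W_V during training.
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (n x m) @ (m x p) matrix product."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# X: 3 tokens, each a d_model = 4 vector.
X = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 2.0, 0.0, 1.0],
     [1.0, 1.0, 1.0, 1.0]]

# Three different projection matrices, d_model = 4 -> d_k = d_v = 2.
W_Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, 0.5], [0.0, 1.0], [0.5, 0.0], [1.0, 0.0]]
W_V = [[1.0, 1.0], [1.0, -1.0], [0.0, 0.0], [0.0, 2.0]]

Q, K, V = matmul(X, W_Q), matmul(X, W_K), matmul(X, W_V)
print(Q)  # [[2.0, 0.0], [0.0, 3.0], [2.0, 2.0]]
```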
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="31-scaled-dot-product-attention">3.1 Scaled Dot-Product Attention<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/attention-is-all-you-need#31-scaled-dot-product-attention" class="hash-link" aria-label="3.1 Scaled Dot-Product Attention에 대한 직접 링크" title="3.1 Scaled Dot-Product Attention에 대한 직접 링크" translate="no">​</a></h3>
<p>The paper computes attention with a method called "Scaled Dot-Product Attention", defined as follows.</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>A</mi><mi>t</mi><mi>t</mi><mi>e</mi><mi>n</mi><mi>t</mi><mi>i</mi><mi>o</mi><mi>n</mi><mo stretchy="false">(</mo><mi>Q</mi><mo separator="true">,</mo><mi>K</mi><mo separator="true">,</mo><mi>V</mi><mo stretchy="false">)</mo><mo>=</mo><mi>s</mi><mi>o</mi><mi>f</mi><mi>t</mi><mi>m</mi><mi>a</mi><mi>x</mi><mo stretchy="false">(</mo><mfrac><mrow><mi>Q</mi><msup><mi>K</mi><mi>T</mi></msup></mrow><msqrt><msub><mi>d</mi><mi>k</mi></msub></msqrt></mfrac><mo stretchy="false">)</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">Attention(Q, K, V) = softmax(\frac{QK^T}{\sqrt{d_k}})V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal">A</span><span class="mord mathnormal">tt</span><span class="mord mathnormal">e</span><span class="mord mathnormal">n</span><span class="mord mathnormal">t</span><span class="mord mathnormal">i</span><span class="mord mathnormal">o</span><span class="mord mathnormal">n</span><span class="mopen">(</span><span class="mord mathnormal">Q</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:2.4483em;vertical-align:-0.93em"></span><span class="mord mathnormal">so</span><span class="mord mathnormal" style="margin-right:0.1076em">f</span><span class="mord 
mathnormal">t</span><span class="mord mathnormal">ma</span><span class="mord mathnormal">x</span><span class="mopen">(</span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.5183em"><span style="top:-2.2528em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8572em"><span class="svg-align" style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord" style="padding-left:0.833em"><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span><span style="top:-2.8172em"><span class="pstrut" style="height:3em"></span><span class="hide-tail" style="min-width:0.853em;height:1.08em"><svg xmlns="http://www.w3.org/2000/svg" width="400em" height="1.08em" viewBox="0 0 400000 1080" preserveAspectRatio="xMinYMin slice"><path d="M95,702
c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14
c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54
c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10
s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429
c69,-144,104.5,-217.7,106.5,-221
l0 -0
c5.3,-9.3,12,-14,20,-14
H400000v40H845.2724
s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7
c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z
M834 80h400000v40h-400000z"></path></svg></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.1828em"><span></span></span></span></span></span></span></span><span style="top:-3.23em"><span class="pstrut" style="height:3em"></span><span class="frac-line" style="border-bottom-width:0.04em"></span></span><span style="top:-3.677em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal">Q</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.1389em">T</span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.93em"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mclose">)</span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span></span>
<ul>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> <strong>(Query Matrix):</strong> [question] A matrix that stacks the query vectors of the target words.</p>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> <strong>(Key Matrix):</strong> [location] A matrix that stacks the key vectors of the words being referred to.</p>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> <strong>(Value Matrix):</strong> [content] A matrix that stacks the actual information (value) vectors of the words being referred to.</p>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mi>K</mi><mi>T</mi></msup></mrow><annotation encoding="application/x-tex">K^T</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8413em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.1389em">T</span></span></span></span></span></span></span></span></span></span></span><strong>:</strong> The transpose of the Key matrix. Its rows and columns are swapped so that it can be multiplied with Q.</p>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>d</mi><mi>k</mi></msub></mrow><annotation encoding="application/x-tex">d_k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span><strong>:</strong> The dimension of the Query and Key vectors. 
(The paper uses <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>d</mi><mi>k</mi></msub><mo>=</mo><mn>64</mn></mrow><annotation encoding="application/x-tex">d_k = 64</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">64</span></span></span></span>.)</p>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msqrt><msub><mi>d</mi><mi>k</mi></msub></msqrt></mrow><annotation encoding="application/x-tex">\sqrt{d_k}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.04em;vertical-align:-0.1828em"></span><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8572em"><span class="svg-align" style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord" style="padding-left:0.833em"><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span><span style="top:-2.8172em"><span class="pstrut" style="height:3em"></span><span class="hide-tail" style="min-width:0.853em;height:1.08em"><svg xmlns="http://www.w3.org/2000/svg" width="400em" height="1.08em" viewBox="0 0 400000 1080" preserveAspectRatio="xMinYMin slice"><path d="M95,702
c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14
c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54
c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10
s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429
c69,-144,104.5,-217.7,106.5,-221
l0 -0
c5.3,-9.3,12,-14,20,-14
H400000v40H845.2724
s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7
c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z
M834 80h400000v40h-400000z"></path></svg></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.1828em"><span></span></span></span></span></span></span></span></span><strong>:</strong> the square root of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>d</mi><mi>k</mi></msub></mrow><annotation encoding="application/x-tex">d_k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span>.
(In the paper, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msqrt><mn>64</mn></msqrt><mo>=</mo><mn>8</mn></mrow><annotation encoding="application/x-tex">\sqrt{64} = 8</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.04em;vertical-align:-0.1328em"></span><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9072em"><span class="svg-align" style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord" style="padding-left:0.833em"><span class="mord">64</span></span></span><span style="top:-2.8672em"><span class="pstrut" style="height:3em"></span><span class="hide-tail" style="min-width:0.853em;height:1.08em"><svg xmlns="http://www.w3.org/2000/svg" width="400em" height="1.08em" viewBox="0 0 400000 1080" preserveAspectRatio="xMinYMin slice"><path d="M95,702
c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14
c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54
c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10
s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429
c69,-144,104.5,-217.7,106.5,-221
l0 -0
c5.3,-9.3,12,-14,20,-14
H400000v40H845.2724
s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7
c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z
M834 80h400000v40h-400000z"></path></svg></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.1328em"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">8</span></span></span></span>.)</p>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>s</mi><mi>o</mi><mi>f</mi><mi>t</mi><mi>m</mi><mi>a</mi><mi>x</mi></mrow><annotation encoding="application/x-tex">softmax</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em"></span><span class="mord mathnormal">so</span><span class="mord mathnormal" style="margin-right:0.1076em">f</span><span class="mord mathnormal">t</span><span class="mord mathnormal">ma</span><span class="mord mathnormal">x</span></span></span></span><strong>:</strong> a function that turns a vector of input values into values between 0 and 1 that sum to 1, so they can be read as probabilities. (Formula: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><msup><mi>e</mi><msub><mi>x</mi><mi>i</mi></msub></msup><mrow><mo>∑</mo><msup><mi>e</mi><msub><mi>x</mi><mi>j</mi></msub></msup></mrow></mfrac></mrow><annotation encoding="application/x-tex">\frac{e^{x_i}}{\sum e^{x_j}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.4413em;vertical-align:-0.5303em"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.911em"><span style="top:-2.6447em"><span class="pstrut" style="height:3em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mop op-symbol small-op mtight" style="position:relative;top:0em">∑</span><span class="mspace mtight" style="margin-right:0.1952em"></span><span class="mord mtight"><span class="mord mathnormal mtight">e</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.779em"><span style="top:-2.9714em;margin-right:0.0714em"><span class="pstrut" 
style="height:2.5em"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord mathnormal mtight">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3448em"><span style="top:-2.3448em;margin-left:0em;margin-right:0.1em"><span class="pstrut" style="height:2.6595em"></span><span class="mord mathnormal mtight" style="margin-right:0.0572em">j</span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.5092em"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em"><span class="pstrut" style="height:3em"></span><span class="frac-line" style="border-bottom-width:0.04em"></span></span><span style="top:-3.394em"><span class="pstrut" style="height:3em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord mathnormal mtight">e</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7385em"><span style="top:-2.931em;margin-right:0.0714em"><span class="pstrut" style="height:2.5em"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord mathnormal mtight">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3448em"><span style="top:-2.3448em;margin-left:0em;margin-right:0.1em"><span class="pstrut" style="height:2.6595em"></span><span class="mord mathnormal mtight">i</span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.3147em"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><span 
class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.5303em"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span>)</p>
</li>
</ul>
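<p>Below is a minimal Python sketch of the softmax definition above. It was written for this post for illustration and is not code from the paper. Before exponentiating, it subtracts the largest score. This is a common numerical-stability trick that cancels out in the ratio, so the result is unchanged.</p>

```python
import math

def softmax(xs):
    """softmax(x)_i = exp(x_i) / sum_j exp(x_j) over a 1-D list of scores."""
    # Subtracting the max cancels out in the ratio, so the result is unchanged,
    # but it keeps math.exp from overflowing when scores are large.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # → [0.659, 0.242, 0.099]
print(softmax([1000.0, 1000.0]))     # → [0.5, 0.5]
```

<p>Without the max subtraction, <code>math.exp(1000.0)</code> would raise an <code>OverflowError</code>, even though the correct answer is simply 0.5 for each entry.</p>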
<hr>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>A</mi><mi>t</mi><mi>t</mi><mi>e</mi><mi>n</mi><mi>t</mi><mi>i</mi><mi>o</mi><mi>n</mi><mo stretchy="false">(</mo><mi>Q</mi><mo separator="true">,</mo><mi>K</mi><mo separator="true">,</mo><mi>V</mi><mo stretchy="false">)</mo><mo>=</mo><mi>s</mi><mi>o</mi><mi>f</mi><mi>t</mi><mi>m</mi><mi>a</mi><mi>x</mi><mo stretchy="false">(</mo><mfrac><mrow><mi>Q</mi><msup><mi>K</mi><mi>T</mi></msup></mrow><msqrt><msub><mi>d</mi><mi>k</mi></msub></msqrt></mfrac><mo stretchy="false">)</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">Attention(Q, K, V) = softmax(\frac{QK^T}{\sqrt{d_k}})V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal">A</span><span class="mord mathnormal">tt</span><span class="mord mathnormal">e</span><span class="mord mathnormal">n</span><span class="mord mathnormal">t</span><span class="mord mathnormal">i</span><span class="mord mathnormal">o</span><span class="mord mathnormal">n</span><span class="mopen">(</span><span class="mord mathnormal">Q</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:2.4483em;vertical-align:-0.93em"></span><span class="mord mathnormal">so</span><span class="mord mathnormal" style="margin-right:0.1076em">f</span><span class="mord 
mathnormal">t</span><span class="mord mathnormal">ma</span><span class="mord mathnormal">x</span><span class="mopen">(</span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.5183em"><span style="top:-2.2528em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8572em"><span class="svg-align" style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord" style="padding-left:0.833em"><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span><span style="top:-2.8172em"><span class="pstrut" style="height:3em"></span><span class="hide-tail" style="min-width:0.853em;height:1.08em"><svg xmlns="http://www.w3.org/2000/svg" width="400em" height="1.08em" viewBox="0 0 400000 1080" preserveAspectRatio="xMinYMin slice"><path d="M95,702
c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14
c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54
c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10
s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429
c69,-144,104.5,-217.7,106.5,-221
l0 -0
c5.3,-9.3,12,-14,20,-14
H400000v40H845.2724
s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7
c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z
M834 80h400000v40h-400000z"></path></svg></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.1828em"><span></span></span></span></span></span></span></span><span style="top:-3.23em"><span class="pstrut" style="height:3em"></span><span class="frac-line" style="border-bottom-width:0.04em"></span></span><span style="top:-3.677em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal">Q</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.1389em">T</span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.93em"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mclose">)</span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span></span>
<ol>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi><msup><mi>K</mi><mi>T</mi></msup></mrow><annotation encoding="application/x-tex">QK^T</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0358em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.1389em">T</span></span></span></span></span></span></span></span></span></span></span> <strong>(Similarity scores):</strong> The Query matrix is multiplied (matrix multiplication) by the transpose of the Key matrix. This computes, in a single operation, the dot product between every query word vector and every key word vector. The result is a numeric score for how strongly each query word relates to (is similar to) each key word. A larger value means the two words are more closely related.</p>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><mrow><mi>Q</mi><msup><mi>K</mi><mi>T</mi></msup></mrow><msqrt><msub><mi>d</mi><mi>k</mi></msub></msqrt></mfrac></mrow><annotation encoding="application/x-tex">\frac{QK^T}{\sqrt{d_k}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.6275em;vertical-align:-0.538em"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.0895em"><span style="top:-2.5864em"><span class="pstrut" style="height:3em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord sqrt mtight"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8622em"><span class="svg-align" style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord mtight" style="padding-left:0.833em"><span class="mord mtight"><span class="mord mathnormal mtight">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3448em"><span style="top:-2.3488em;margin-left:0em;margin-right:0.0714em"><span class="pstrut" style="height:2.5em"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.1512em"><span></span></span></span></span></span></span></span></span><span style="top:-2.8222em"><span class="pstrut" style="height:3em"></span><span class="hide-tail mtight" style="min-width:0.853em;height:1.08em"><svg xmlns="http://www.w3.org/2000/svg" width="400em" height="1.08em" viewBox="0 0 400000 1080" preserveAspectRatio="xMinYMin slice"><path d="M95,702
c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14
c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54
c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10
s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429
c69,-144,104.5,-217.7,106.5,-221
l0 -0
c5.3,-9.3,12,-14,20,-14
H400000v40H845.2724
s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7
c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z
M834 80h400000v40h-400000z"></path></svg></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.1778em"><span></span></span></span></span></span></span></span></span><span style="top:-3.23em"><span class="pstrut" style="height:3em"></span><span class="frac-line" style="border-bottom-width:0.04em"></span></span><span style="top:-3.4461em"><span class="pstrut" style="height:3em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">Q</span><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.0715em">K</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.9191em"><span style="top:-2.931em;margin-right:0.0714em"><span class="pstrut" style="height:2.5em"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathnormal mtight" style="margin-right:0.1389em">T</span></span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.538em"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span> <strong>(Scaling):</strong> The larger the dimension (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>d</mi><mi>k</mi></msub></mrow><annotation encoding="application/x-tex">d_k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" 
style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span>), the larger the dot products tend to become in magnitude. If the components of a query and a key are independent with mean 0 and variance 1, their dot product has mean 0 and a variance equal to that dimension. When the scores grow too large, the softmax in the next step saturates into regions where its gradient is extremely small, so training slows down or stalls. To prevent this, the scores are divided by <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msqrt><msub><mi>d</mi><mi>k</mi></msub></msqrt></mrow><annotation encoding="application/x-tex">\sqrt{d_k}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.04em;vertical-align:-0.1828em"></span><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8572em"><span class="svg-align" style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord" style="padding-left:0.833em"><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span><span style="top:-2.8172em"><span class="pstrut" style="height:3em"></span><span class="hide-tail" style="min-width:0.853em;height:1.08em"><svg xmlns="http://www.w3.org/2000/svg" width="400em" height="1.08em" viewBox="0 0 400000 1080" preserveAspectRatio="xMinYMin slice"><path d="M95,702
c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14
c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54
c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10
s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429
c69,-144,104.5,-217.7,106.5,-221
l0 -0
c5.3,-9.3,12,-14,20,-14
H400000v40H845.2724
s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7
c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z
M834 80h400000v40h-400000z"></path></svg></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.1828em"><span></span></span></span></span></span></span></span></span> to bring them back to an appropriate scale.</p>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>s</mi><mi>o</mi><mi>f</mi><mi>t</mi><mi>m</mi><mi>a</mi><mi>x</mi><mo stretchy="false">(</mo><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">softmax(...)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal">so</span><span class="mord mathnormal" style="margin-right:0.1076em">f</span><span class="mord mathnormal">t</span><span class="mord mathnormal">ma</span><span class="mord mathnormal">x</span><span class="mopen">(</span><span class="mord">...</span><span class="mclose">)</span></span></span></span> <strong>(Turning scores into weights):</strong> The scaled scores are passed through the softmax function, which turns each query's row of scores into weights between 0 and 1 that sum to 1. For example, a weight of 0.9 means the query word is strongly related to that word, while a weight of 0.01 means that word can be almost ignored.</p>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>×</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">\times V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7667em;vertical-align:-0.0833em"></span><span class="mord">×</span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> <strong>(Combining the information):</strong> The softmax weights are multiplied by the Value matrix, which holds the actual information. Each output is therefore a weighted sum of the Values. Strongly related words contribute much of their information (Value), and weakly related words contribute little. This weighted sum is the final output of attention.</p>
</li>
</ol>
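<p>The four steps above can be sketched in plain Python, with matrices stored as lists of rows. This is an illustrative single-head version written for this post, not the paper's code or any library's API. It assumes no masking, and the helper names (<code>matmul</code>, <code>transpose</code>, <code>attention</code>) are hypothetical.</p>

```python
import math

def matmul(a, b):
    """(n x m) @ (m x p) for matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V for a single head, without masking."""
    d_k = len(k[0])
    # Step 1: similarity scores, one row per query word, one column per key word.
    scores = matmul(q, transpose(k))
    # Step 2: divide by sqrt(d_k) so the scores do not grow with the dimension.
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    # Step 3: row-wise softmax turns each row into weights that sum to 1.
    weights = [softmax(row) for row in scaled]
    # Step 4: weighted sum of the Value rows.
    return matmul(weights, v), weights

q = [[1.0, 0.0]]                # one query word
k = [[1.0, 0.0], [0.0, 1.0]]    # two key words
v = [[10.0, 0.0], [0.0, 10.0]]  # their values
out, weights = attention(q, k, v)
print([round(w, 3) for w in weights[0]])  # → [0.67, 0.33]
print([round(o, 2) for o in out[0]])      # → [6.7, 3.3]
```

<p>The query matches the first key, so the first Value gets about two thirds of the weight. The output is a blend dominated by that Value.</p>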
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="32-multi-head-attention">3.2 Multi-Head Attention<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/attention-is-all-you-need#32-multi-head-attention" class="hash-link" aria-label="3.2 Multi-Head Attention에 대한 직접 링크" title="3.2 Multi-Head Attention에 대한 직접 링크" translate="no">​</a></h3>
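<p>Instead of running the single attention above only once, the Transformer uses different learned linear projections to map Q, K, and V into several lower-dimensional subspaces. It then runs attention in each of them in parallel. This is called Multi-Head Attention.</p>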
<p>In the paper, the model dimension <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>d</mi><mrow><mi>m</mi><mi>o</mi><mi>d</mi><mi>e</mi><mi>l</mi></mrow></msub><mo>=</mo><mn>512</mn></mrow><annotation encoding="application/x-tex">d_{model} = 512</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">m</span><span class="mord mathnormal mtight">o</span><span class="mord mathnormal mtight">d</span><span class="mord mathnormal mtight">e</span><span class="mord mathnormal mtight" style="margin-right:0.0197em">l</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">512</span></span></span></span> is split across <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>h</mi><mo>=</mo><mn>8</mn></mrow><annotation encoding="application/x-tex">h = 8</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em"></span><span class="mord mathnormal">h</span><span class="mspace" style="margin-right:0.2778em"></span><span 
class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">8</span></span></span></span> heads. Each head therefore works with vectors of dimension <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>d</mi><mi>k</mi></msub><mo>=</mo><msub><mi>d</mi><mi>v</mi></msub><mo>=</mo><mn>512</mn><mi mathvariant="normal">/</mi><mn>8</mn><mo>=</mo><mn>64</mn></mrow><annotation encoding="application/x-tex">d_k = d_v = 512 / 8 = 64</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" 
style="margin-right:0.0359em">v</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord">512/8</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">64</span></span></span></span>.</p>
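<p>The dimension bookkeeping above can be checked with a small sketch. It assumes a layout used by many implementations, not necessarily the paper's: each token is projected once to 512 dimensions, and the result is then cut into 8 contiguous chunks of 64. This is equivalent to using 8 separate 512×64 projection matrices, one per head. The names <code>split_heads</code> and <code>concat_heads</code> are hypothetical.</p>

```python
D_MODEL = 512
NUM_HEADS = 8
assert D_MODEL % NUM_HEADS == 0, "d_model must be divisible by the number of heads"
D_K = D_MODEL // NUM_HEADS  # 512 / 8 = 64 dimensions per head (d_k = d_v)

def split_heads(vec, num_heads):
    """Cut one d_model-dimensional vector into num_heads contiguous chunks."""
    size = len(vec) // num_heads
    return [vec[h * size:(h + 1) * size] for h in range(num_heads)]

def concat_heads(chunks):
    """Inverse of split_heads: join the per-head chunks back into one vector."""
    return [x for chunk in chunks for x in chunk]

projected = [float(i) for i in range(D_MODEL)]  # stand-in for a projected token vector
heads = split_heads(projected, NUM_HEADS)
print(len(heads), len(heads[0]))         # → 8 64
print(concat_heads(heads) == projected)  # → True
```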
<p><strong>Why use multiple heads?</strong></p>
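<p>The relationships between words in a sentence can be read from several angles.
For example, in the sentence "He kicked the ball hard", the word 'kicked' can be linked to 'He' (the subject: who did it?) or to 'the ball' (the object: what was kicked?).
A single attention head averages these different relationships into one view. With 8 heads, each head can learn to focus on a different kind of relationship, such as the link to the subject, to the object, or to tense. The model therefore captures several contextual features (representation subspaces) at the same time.</p>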
<p>The 8 output matrices computed by the individual heads are concatenated and then multiplied by a linear projection matrix to produce the final output matrix.</p>
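<p>Below is a rough end-to-end sketch of the Multi-Head Attention block described above. It runs self-attention (Q = K = V = x) with a separate projection triple per head, concatenates the head outputs, and applies the output projection. Assumptions: toy sizes (d_model = 8, h = 2) instead of 512 and 8, random Gaussian weights instead of trained ones, and no masking or bias terms. The helper names are hypothetical, and this is not the paper's or any library's implementation.</p>

```python
import math
import random

def matmul(a, b):
    """(n x m) @ (m x p) for matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def softmax(xs):
    m = max(xs)  # max subtraction for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention for one head."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v)

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def multi_head_attention(x, head_weights, w_o):
    """Self-attention (Q = K = V = x) with one (W_q, W_k, W_v) triple per head."""
    head_outputs = []
    for w_q, w_k, w_v in head_weights:
        # Each head projects x into its own d_k-dimensional subspace.
        head_outputs.append(attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)))
    # Concatenate the heads position by position: seq_len x (h * d_v).
    concat = [[val for head in head_outputs for val in head[i]] for i in range(len(x))]
    # Final linear projection back to d_model.
    return matmul(concat, w_o)

# Toy sizes instead of the paper's d_model = 512, h = 8.
d_model, h = 8, 2
d_k = d_model // h
rng = random.Random(0)
head_weights = [tuple(random_matrix(d_model, d_k, rng) for _ in range(3)) for _ in range(h)]
w_o = random_matrix(h * d_k, d_model, rng)
x = random_matrix(3, d_model, rng)  # 3 token vectors
out = multi_head_attention(x, head_weights, w_o)
print(len(out), len(out[0]))  # → 3 8
```

<p>The output has one row per token and d_model columns, the same shape as the input. That shape match is what lets the residual connection add the output back to x.</p>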
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="4-position-wise-feed-forward-network">4. Position-wise Feed-Forward Network<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/attention-is-all-you-need#4-position-wise-feed-forward-network" class="hash-link" aria-label="4. Position-wise Feed-Forward Network에 대한 직접 링크" title="4. Position-wise Feed-Forward Network에 대한 직접 링크" translate="no">​</a></h2>
<p>After passing through the attention sub-layer, the data goes through a fully connected feed-forward network (FFN), which every layer contains.</p>
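<p>"Position-wise" means that the same neural network is applied to each word position in the sentence separately and independently. The weights are shared across positions within a layer but differ from one layer to the next.</p>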
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>F</mi><mi>F</mi><mi>N</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mo>=</mo><mi>max</mi><mo>⁡</mo><mo stretchy="false">(</mo><mn>0</mn><mo separator="true">,</mo><mi>x</mi><msub><mi>W</mi><mn>1</mn></msub><mo>+</mo><msub><mi>b</mi><mn>1</mn></msub><mo stretchy="false">)</mo><msub><mi>W</mi><mn>2</mn></msub><mo>+</mo><msub><mi>b</mi><mn>2</mn></msub></mrow><annotation encoding="application/x-tex">FFN(x) = \max(0, xW_1 + b_1)W_2 + b_2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.1389em">F</span><span class="mord mathnormal" style="margin-right:0.1389em">F</span><span class="mord mathnormal" style="margin-right:0.109em">N</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mop">max</span><span class="mopen">(</span><span class="mord">0</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal">x</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span 
class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span 
style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span></span>
<ul>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi></mrow><annotation encoding="application/x-tex">x</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em"></span><span class="mord mathnormal">x</span></span></span></span><strong>:</strong> The input vector coming out of the attention sub-layer. Its dimension is <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>d</mi><mrow><mi>m</mi><mi>o</mi><mi>d</mi><mi>e</mi><mi>l</mi></mrow></msub><mo>=</mo><mn>512</mn></mrow><annotation encoding="application/x-tex">d_{model} = 512</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">m</span><span class="mord mathnormal mtight">o</span><span class="mord mathnormal mtight">d</span><span class="mord mathnormal mtight">e</span><span class="mord mathnormal mtight" style="margin-right:0.0197em">l</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">512</span></span></span></span>.</p>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mn>1</mn></msub><mo separator="true">,</mo><msub><mi>b</mi><mn>1</mn></msub></mrow><annotation encoding="application/x-tex">W_1, b_1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span><strong>:</strong> The weight matrix and bias vector of the first linear transformation.</p>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>max</mi><mo>⁡</mo><mo stretchy="false">(</mo><mn>0</mn><mo separator="true">,</mo><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mi mathvariant="normal">.</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\max(0, ...)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mop">max</span><span class="mopen">(</span><span class="mord">0</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord">...</span><span class="mclose">)</span></span></span></span><strong>:</strong> The ReLU (Rectified Linear Unit) activation function. If the expression inside the parentheses is negative, ReLU outputs 0. If it is positive, ReLU passes the value through unchanged. This is the component that gives the network its non-linearity.</p>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mn>2</mn></msub><mo separator="true">,</mo><msub><mi>b</mi><mn>2</mn></msub></mrow><annotation encoding="application/x-tex">W_2, b_2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span><strong>:</strong> The weight matrix and bias vector of the second linear transformation.</p>
</li>
</ul>
<p>The network has a sandwich structure: expand, activate, then compress.</p>
<ol>
<li class="">
<p><strong>Dimension expansion:</strong> The input vector <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi></mrow><annotation encoding="application/x-tex">x</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em"></span><span class="mord mathnormal">x</span></span></span></span> (512-dimensional) is multiplied by the weight <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mn>1</mn></msub></mrow><annotation encoding="application/x-tex">W_1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span>, which expands it to <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>d</mi><mrow><mi>f</mi><mi>f</mi></mrow></msub><mo>=</mo><mn>2048</mn></mrow><annotation encoding="application/x-tex">d_{ff} = 2048</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9805em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.1076em">f</span><span class="mord mathnormal mtight" style="margin-right:0.1076em">f</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">2048</span></span></span></span> dimensions, four times the original width.</p>
</li>
<li class="">
<p><strong>Activation:</strong> ReLU is applied in the expanded space to extract non-linear features of the data. In this step, every negative entry is set to 0.</p>
</li>
<li class="">
<p><strong>Dimension compression:</strong> The result is then multiplied by the weight <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mn>2</mn></msub></mrow><annotation encoding="application/x-tex">W_2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span>. This compresses it back to the original <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>d</mi><mrow><mi>m</mi><mi>o</mi><mi>d</mi><mi>e</mi><mi>l</mi></mrow></msub><mo>=</mo><mn>512</mn></mrow><annotation encoding="application/x-tex">d_{model} = 512</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">m</span><span class="mord mathnormal mtight">o</span><span class="mord mathnormal mtight">d</span><span class="mord mathnormal mtight">e</span><span class="mord mathnormal mtight" style="margin-right:0.0197em">l</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">512</span></span></span></span> dimensions, which become the layer's output.</p>
</li>
</ol>
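<p>The expand, activate and compress steps above fit in a few lines of NumPy. This is a minimal sketch, not the paper's code. The weights are random placeholders rather than trained values, and the names <code>ffn</code>, <code>D_MODEL</code> and <code>D_FF</code> are hypothetical. The sketch also shows that the FFN is applied to every token position independently, using the same weights.</p>

```python
import numpy as np

# Dimensions of the base Transformer described above.
D_MODEL = 512
D_FF = 2048

rng = np.random.default_rng(0)

# Random values stand in for trained parameters (illustration only).
W1 = rng.standard_normal((D_MODEL, D_FF)) * 0.02
b1 = np.zeros(D_FF)
W2 = rng.standard_normal((D_FF, D_MODEL)) * 0.02
b2 = np.zeros(D_MODEL)


def ffn(x):
    """Position-wise FFN: max(0, x W1 + b1) W2 + b2.

    x has shape (seq_len, D_MODEL); each row (token) is transformed
    independently with the same weights.
    """
    hidden = np.maximum(0.0, x @ W1 + b1)  # expand to D_FF, then ReLU
    return hidden @ W2 + b2                # compress back to D_MODEL


x = rng.standard_normal((10, D_MODEL))  # 10 tokens
y = ffn(x)
print(y.shape)  # (10, 512)
```

<p>Each row is transformed on its own. Running a single token through <code>ffn</code> therefore gives the same result as the matching row of the batched call.</p>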
<p>If attention gathers the 'relationships' between words, the FFN layer builds on that gathered information. It refines the 'meaning' carried by each word into a richer, more complex representation and stores it. A large share of the learned parameters (weights) sits in the FFN's <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mn>1</mn></msub><mo separator="true">,</mo><msub><mi>W</mi><mn>2</mn></msub></mrow><annotation encoding="application/x-tex">W_1, W_2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> matrices. In the base model they hold roughly two-thirds of the weights in each encoder layer.</p>
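<p>A rough count for the base model (d_model = 512, d_ff = 2048) makes the parameter claim concrete. This sketch counts only the weight matrices of one encoder layer. It ignores biases, LayerNorm and the embedding table. Decoder layers also contain a cross-attention block, so the FFN share there is lower, about half.</p>

```python
D_MODEL = 512
D_FF = 2048

# Multi-head self-attention: W_Q, W_K, W_V and the output projection W_O,
# together equivalent to four D_MODEL x D_MODEL matrices (biases ignored).
attn_params = 4 * D_MODEL * D_MODEL

# FFN: W1 is D_MODEL x D_FF and W2 is D_FF x D_MODEL (biases ignored).
ffn_params = 2 * D_MODEL * D_FF

share = ffn_params / (attn_params + ffn_params)
print(attn_params, ffn_params, round(share, 3))  # 1048576 2097152 0.667
```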
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="5-positional-encoding">5. Positional Encoding<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/attention-is-all-you-need#5-positional-encoding" class="hash-link" aria-label="5. Positional Encoding에 대한 직접 링크" title="5. Positional Encoding에 대한 직접 링크" translate="no">​</a></h2>
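<p>The Transformer abandons the RNN structure in favor of parallel processing through matrix multiplication. That choice has a critical downside. Attention treats the set of input words like an unordered 'bag of words'. On its own, it therefore cannot distinguish "나는 밥을 먹는다" ("I eat rice") from the reordered "밥을 나는 먹는다". Both sentences produce the same word representations, only listed in a different order.</p>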
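<p>This order-blindness can be checked directly. The sketch below uses a toy single-head self-attention with random projection matrices and no positional encoding. All names in it are hypothetical. It feeds in a sequence and then a shuffled copy of it. The outputs are the same vectors, shuffled in the same way, so nothing in the result depends on where a word sits.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 5, 8

# Toy single-head self-attention with random projections (no positional encoding).
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def self_attention(x):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    return softmax(scores) @ v


x = rng.standard_normal((seq_len, d))  # embeddings of 5 tokens
perm = rng.permutation(seq_len)        # shuffle the word order

out = self_attention(x)
out_shuffled = self_attention(x[perm])

# Shuffling the input only shuffles the output rows the same way:
# each token's output vector is unchanged, so word order carries no signal.
print(np.allclose(out_shuffled, out[perm]))  # True
```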
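<p>To fix this, the model needs to know the relative or absolute position (order) of each word in the sequence. <strong>Positional Encoding</strong> provides it by adding a vector that carries position information to each input word's embedding vector.</p>
<p>The paper generates this position information with sine and cosine functions of different frequencies:</p>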
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>P</mi><msub><mi>E</mi><mrow><mo stretchy="false">(</mo><mi>p</mi><mi>o</mi><mi>s</mi><mo separator="true">,</mo><mn>2</mn><mi>i</mi><mo stretchy="false">)</mo></mrow></msub><mo>=</mo><mi>sin</mi><mo>⁡</mo><mo stretchy="false">(</mo><mi>p</mi><mi>o</mi><mi>s</mi><mi mathvariant="normal">/</mi><msup><mn>10000</mn><mrow><mn>2</mn><mi>i</mi><mi mathvariant="normal">/</mi><msub><mi>d</mi><mrow><mi>m</mi><mi>o</mi><mi>d</mi><mi>e</mi><mi>l</mi></mrow></msub></mrow></msup><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">PE_{(pos, 2i)} = \sin(pos / 10000^{2i/d_{model}})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0385em;vertical-align:-0.3552em"></span><span class="mord mathnormal" style="margin-right:0.1389em">P</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0576em">E</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3448em"><span style="top:-2.5198em;margin-left:-0.0576em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mopen mtight">(</span><span class="mord mathnormal mtight">p</span><span class="mord mathnormal mtight">os</span><span class="mpunct mtight">,</span><span class="mord mtight">2</span><span class="mord mathnormal mtight">i</span><span class="mclose mtight">)</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.3552em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" 
style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.188em;vertical-align:-0.25em"></span><span class="mop">sin</span><span class="mopen">(</span><span class="mord mathnormal">p</span><span class="mord mathnormal">os</span><span class="mord">/1000</span><span class="mord"><span class="mord">0</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.938em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span><span class="mord mathnormal mtight">i</span><span class="mord mtight">/</span><span class="mord mtight"><span class="mord mathnormal mtight">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3448em"><span style="top:-2.3488em;margin-left:0em;margin-right:0.0714em"><span class="pstrut" style="height:2.5em"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">m</span><span class="mord mathnormal mtight">o</span><span class="mord mathnormal mtight">d</span><span class="mord mathnormal mtight">e</span><span class="mord mathnormal mtight" style="margin-right:0.0197em">l</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.1512em"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>P</mi><msub><mi>E</mi><mrow><mo stretchy="false">(</mo><mi>p</mi><mi>o</mi><mi>s</mi><mo separator="true">,</mo><mn>2</mn><mi>i</mi><mo>+</mo><mn>1</mn><mo stretchy="false">)</mo></mrow></msub><mo>=</mo><mi>cos</mi><mo>⁡</mo><mo stretchy="false">(</mo><mi>p</mi><mi>o</mi><mi>s</mi><mi mathvariant="normal">/</mi><msup><mn>10000</mn><mrow><mn>2</mn><mi>i</mi><mi mathvariant="normal">/</mi><msub><mi>d</mi><mrow><mi>m</mi><mi>o</mi><mi>d</mi><mi>e</mi><mi>l</mi></mrow></msub></mrow></msup><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">PE_{(pos, 2i+1)} = \cos(pos / 10000^{2i/d_{model}})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0385em;vertical-align:-0.3552em"></span><span class="mord mathnormal" style="margin-right:0.1389em">P</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0576em">E</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3448em"><span style="top:-2.5198em;margin-left:-0.0576em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mopen mtight">(</span><span class="mord mathnormal mtight">p</span><span class="mord mathnormal mtight">os</span><span class="mpunct mtight">,</span><span class="mord mtight">2</span><span class="mord mathnormal mtight">i</span><span class="mbin mtight">+</span><span class="mord mtight">1</span><span class="mclose mtight">)</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.3552em"><span></span></span></span></span></span></span><span class="mspace" 
style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.188em;vertical-align:-0.25em"></span><span class="mop">cos</span><span class="mopen">(</span><span class="mord mathnormal">p</span><span class="mord mathnormal">os</span><span class="mord">/1000</span><span class="mord"><span class="mord">0</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.938em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span><span class="mord mathnormal mtight">i</span><span class="mord mtight">/</span><span class="mord mtight"><span class="mord mathnormal mtight">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3448em"><span style="top:-2.3488em;margin-left:0em;margin-right:0.0714em"><span class="pstrut" style="height:2.5em"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">m</span><span class="mord mathnormal mtight">o</span><span class="mord mathnormal mtight">d</span><span class="mord mathnormal mtight">e</span><span class="mord mathnormal mtight" style="margin-right:0.0197em">l</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.1512em"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span>
<ul>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>p</mi><mi>o</mi><mi>s</mi></mrow><annotation encoding="application/x-tex">pos</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em"></span><span class="mord mathnormal">p</span><span class="mord mathnormal">os</span></span></span></span><strong>:</strong> The position index of the word in the sentence (e.g. 0 for the first word, 1 for the second).</p>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>i</mi></mrow><annotation encoding="application/x-tex">i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6595em"></span><span class="mord mathnormal">i</span></span></span></span><strong>:</strong> The dimension index, which identifies the entries of the embedding vector being computed.<br>
<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>i</mi></mrow><annotation encoding="application/x-tex">i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6595em"></span><span class="mord mathnormal">i</span></span></span></span> ranges from <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>0</mn></mrow><annotation encoding="application/x-tex">0</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">0</span></span></span></span> to <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>d</mi><mrow><mi>m</mi><mi>o</mi><mi>d</mi><mi>e</mi><mi>l</mi></mrow></msub><mi mathvariant="normal">/</mi><mn>2</mn><mo>−</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">d_{model}/2 - 1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">m</span><span class="mord mathnormal mtight">o</span><span class="mord mathnormal mtight">d</span><span class="mord mathnormal mtight">e</span><span class="mord mathnormal mtight" style="margin-right:0.0197em">l</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mord">/2</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">1</span></span></span></span>. Each value of i covers one pair of entries: the even index (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>2</mn><mi>i</mi></mrow><annotation encoding="application/x-tex">2i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6595em"></span><span class="mord">2</span><span class="mord mathnormal">i</span></span></span></span>) and the odd index (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>2</mn><mi>i</mi><mo>+</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">2i+1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7429em;vertical-align:-0.0833em"></span><span class="mord">2</span><span class="mord mathnormal">i</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">1</span></span></span></span>). Both entries of a pair share the same frequency, but a different trigonometric function is applied to each.</p>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>2</mn><mi>i</mi><mo separator="true">,</mo><mn>2</mn><mi>i</mi><mo>+</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">2i, 2i+1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8539em;vertical-align:-0.1944em"></span><span class="mord">2</span><span class="mord mathnormal">i</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord">2</span><span class="mord mathnormal">i</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">1</span></span></span></span><strong>:</strong> Even vector indices (2i) use the sine function, and odd indices (2i+1) use the cosine function.</p>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>d</mi><mrow><mi>m</mi><mi>o</mi><mi>d</mi><mi>e</mi><mi>l</mi></mrow></msub></mrow><annotation encoding="application/x-tex">d_{model}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">m</span><span class="mord mathnormal mtight">o</span><span class="mord mathnormal mtight">d</span><span class="mord mathnormal mtight">e</span><span class="mord mathnormal mtight" style="margin-right:0.0197em">l</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span><strong>:</strong> The total number of dimensions of the embedding vector (512).</p>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mn>10000</mn><mrow><mn>2</mn><mi>i</mi><mi mathvariant="normal">/</mi><msub><mi>d</mi><mrow><mi>m</mi><mi>o</mi><mi>d</mi><mi>e</mi><mi>l</mi></mrow></msub></mrow></msup></mrow><annotation encoding="application/x-tex">10000^{2i/d_{model}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.888em"></span><span class="mord">1000</span><span class="mord"><span class="mord">0</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.888em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">2</span><span class="mord mathnormal mtight">i</span><span class="mord mtight">/</span><span class="mord mtight"><span class="mord mathnormal mtight">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3448em"><span style="top:-2.3488em;margin-left:0em;margin-right:0.0714em"><span class="pstrut" style="height:2.5em"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">m</span><span class="mord mathnormal mtight">o</span><span class="mord mathnormal mtight">d</span><span class="mord mathnormal mtight">e</span><span class="mord mathnormal mtight" style="margin-right:0.0197em">l</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.1512em"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><strong>:</strong> The denominator that sets the frequency. 
As the index <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>i</mi></mrow><annotation encoding="application/x-tex">i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6595em"></span><span class="mord mathnormal">i</span></span></span></span> grows, the denominator grows and the frequency drops. Higher dimensions therefore change more and more slowly as the position increases. The wavelengths form a geometric progression from 2π to 10000·2π.</p>
</li>
</ul>
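<p>With this formula, every position (pos) in the sentence and every dimension (i) of the vector gets its own distinctive pattern of continuous real values. Because sine and cosine are used, every component of the position vector oscillates smoothly between -1 and 1.</p>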
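<p>The sinusoidal formula described above can be sketched in plain Python. The sketch also covers the element-wise addition to the embeddings described in the next paragraph. It is a minimal, unoptimized illustration: the function names <code>positional_encoding</code> and <code>add_positional_encoding</code> are hypothetical and do not come from any library. It assumes an even <code>d_model</code> and interleaves sin (even dimensions) and cos (odd dimensions) as in the original paper.</p>

```python
import math


def positional_encoding(seq_len, d_model):
    """Sinusoidal position encoding (assumes d_model is even).

    PE[pos][2i]   = sin(pos / 10000 ** (2i / d_model))
    PE[pos][2i+1] = cos(pos / 10000 ** (2i / d_model))
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(d_model // 2):
            # Larger i -> larger denominator -> lower frequency (longer wavelength).
            angle = pos / (10000 ** (2 * i / d_model))
            pe[pos][2 * i] = math.sin(angle)
            pe[pos][2 * i + 1] = math.cos(angle)
    return pe


def add_positional_encoding(embeddings):
    """Element-wise sum of token embeddings (seq_len x d_model) and their PE."""
    pe = positional_encoding(len(embeddings), len(embeddings[0]))
    return [
        [e + p for e, p in zip(emb_row, pe_row)]
        for emb_row, pe_row in zip(embeddings, pe)
    ]


pe = positional_encoding(seq_len=4, d_model=8)
print([round(v, 3) for v in pe[1]])
```

<p>Every value stays within [-1, 1]. The later dimension pairs (larger i) barely change between neighboring positions, which matches the slow-frequency behavior described above. Calling the same function with <code>d_model = 512</code> gives vectors of the size used in the base model.</p>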
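<p>This 512-dimensional 'position vector', generated by a fixed mathematical rule, is simply added (element-wise +) to each word's original 'Embedding vector'. The addition happens right before the data enters the first layer of the Encoder or Decoder. The encoding is fixed rather than learned, so it adds no parameters. During training, the model therefore learns more than each word's meaning. By reading these sinusoidal wave patterns, it can also work out the relative order (relative position) of words, such as "this word is near the start of the sentence" or "that word is in the very next position." The authors chose sinusoids because, for any fixed offset k, PE(pos + k) can be written as a linear function of PE(pos). They expected this property to make relative positions easy to learn.</p>]]></content:encoded>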
            <category>논문</category>
            <category>transformer</category>
            <category>nlp</category>
            <category>딥러닝</category>
        </item>
        <item>
            <title><![CDATA[[Paper] Gemma 3 4B: Inside the Processing Pipeline]]></title>
            <link>https://hwkim-dev.github.io/hwkim-dev/ko/blog/gemma-3n-e4b</link>
            <guid>https://hwkim-dev.github.io/hwkim-dev/ko/blog/gemma-3n-e4b</guid>
            <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Phase 1: Preparing the input so the model can understand it]]></description>
            <content:encoded><![CDATA[<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="phase-1-모델이-알아들을-수-있게-준비하기">Phase 1: Preparing the input so the model can understand it<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/gemma-3n-e4b#phase-1-%EB%AA%A8%EB%8D%B8%EC%9D%B4-%EC%95%8C%EC%95%84%EB%93%A4%EC%9D%84-%EC%88%98-%EC%9E%88%EA%B2%8C-%EC%A4%80%EB%B9%84%ED%95%98%EA%B8%B0" class="hash-link" aria-label="Phase 1: 모델이 알아들을 수 있게 준비하기에 대한 직접 링크" title="Phase 1: 모델이 알아들을 수 있게 준비하기에 대한 직접 링크" translate="no">​</a></h2>
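<p><strong>Step 1: Tokenization - "Chopping text into pieces"</strong>
When we type "안녕하세요" ("hello"), the model can't read Korean text as-is. It has to turn the text into numbered tickets (IDs) that it already knows.
To do that, it looks the pieces up in Gemma's vocabulary (roughly 262K entries) and chops the text accordingly.</p>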
<ul>
<li class="">"안녕" -&gt; 4512번</li>
<li class="">"하세요" -&gt; 8931번</li>
</ul>
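<p>That's how the text gets chopped into numbers. The input is now just two numbers: <code>[4512, 8931]</code>. (These IDs are only for illustration; they're not Gemma's real token IDs.)</p>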
<p>Mathematically:
The string <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>S</mi></mrow><annotation encoding="application/x-tex">S</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0576em">S</span></span></span></span> is mapped to the token sequence <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>X</mi><mo>=</mo><mo stretchy="false">{</mo><msub><mi>x</mi><mn>1</mn></msub><mo separator="true">,</mo><msub><mi>x</mi><mn>2</mn></msub><mo separator="true">,</mo><mo>…</mo><mo separator="true">,</mo><msub><mi>x</mi><mi>N</mi></msub><mo stretchy="false">}</mo></mrow><annotation encoding="application/x-tex">X = \{x_1, x_2, \dots, x_N\}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0785em">X</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">{</span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span 
class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="minner">…</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3283em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.109em">N</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mclose">}</span></span></span></span> by the function <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>f</mi><mrow><mi>t</mi><mi>o</mi><mi>k</mi><mi>e</mi><mi>n</mi><mi>i</mi><mi>z</mi><mi>e</mi></mrow></msub></mrow><annotation encoding="application/x-tex">f_{tokenize}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1076em">f</span><span 
class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:-0.1076em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">t</span><span class="mord mathnormal mtight">o</span><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span><span class="mord mathnormal mtight">e</span><span class="mord mathnormal mtight">ni</span><span class="mord mathnormal mtight" style="margin-right:0.044em">z</span><span class="mord mathnormal mtight">e</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span>. Here, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>x</mi><mi>i</mi></msub><mo>∈</mo><mo stretchy="false">{</mo><mn>1</mn><mo separator="true">,</mo><mn>2</mn><mo separator="true">,</mo><mo>…</mo><mo separator="true">,</mo><mi>V</mi><mo stretchy="false">}</mo></mrow><annotation encoding="application/x-tex">x_i \in \{1, 2, \dots, V\}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6891em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" 
style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">{</span><span class="mord">1</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord">2</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="minner">…</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span><span class="mclose">}</span></span></span></span> (where <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> is the vocabulary size, roughly 262,000 for Gemma 3).</p>
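<p>Here's a toy sketch of that mapping in Python. It's a deliberate simplification, not Gemma's real algorithm: Gemma actually uses a SentencePiece tokenizer, while this sketch does plain greedy longest-match over a tiny vocabulary. <code>TOY_VOCAB</code>, <code>UNK_ID</code> and <code>tokenize</code> are made up for illustration, and the IDs are the illustrative ones from above.</p>

```python
# Hypothetical toy vocabulary. The IDs reuse the illustrative numbers above;
# they are NOT real Gemma token IDs (the real vocabulary has ~262K entries).
TOY_VOCAB = {"안녕": 4512, "하세요": 8931, "하": 100, "세": 101, "요": 102}
UNK_ID = 0  # hypothetical ID for characters not in the vocabulary


def tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match: at each position, take the longest vocabulary
    piece that matches; fall back to UNK_ID for an unknown character."""
    max_len = max(len(piece) for piece in vocab)
    ids, pos = [], 0
    while pos < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + size]
            if piece in vocab:
                ids.append(vocab[piece])
                pos += size
                break
        else:  # no vocabulary piece matched at this position
            ids.append(UNK_ID)
            pos += 1
    return ids


print(tokenize("안녕하세요"))  # [4512, 8931]
```

<p>The output is exactly the token sequence X from the formula: a list of integer IDs, one per piece.</p>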
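<p><strong>Step 2: Embedding - "Turning numbers into a character stat sheet"</strong>
A bare number doesn't carry any meaning on its own, right? So ID 4512 gets turned into a very long <strong>array of numbers (a vector)</strong>.
It's just like building a game character's stat sheet (strength, agility, intelligence...). For Gemma 3 4B, this stat sheet has 2,560 slots (dimensions).</p>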
<ul>
<li class="">"안녕" -&gt; <code>[0.1, -0.4, 0.8, ... (3072개)]</code></li>
<li class="">"하세요" -&gt; <code>[-0.2, 0.5, 0.1, ... (3072개)]</code></li>
</ul>
<p>Now each word has its own spot in a mathematical space.</p>
<p>Mathematically:
Embedding matrix: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>E</mi><mo>∈</mo><msup><mi mathvariant="double-struck">R</mi><mrow><mi>V</mi><mo>×</mo><msub><mi>d</mi><mrow><mi>m</mi><mi>o</mi><mi>d</mi><mi>e</mi><mi>l</mi></mrow></msub></mrow></msup></mrow><annotation encoding="application/x-tex">E \in \mathbb{R}^{V \times d_{model}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7224em;vertical-align:-0.0391em"></span><span class="mord mathnormal" style="margin-right:0.0576em">E</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8491em"></span><span class="mord"><span class="mord mathbb">R</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8491em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.2222em">V</span><span class="mbin mtight">×</span><span class="mord mtight"><span class="mord mathnormal mtight">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3448em"><span style="top:-2.3488em;margin-left:0em;margin-right:0.0714em"><span class="pstrut" style="height:2.5em"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">m</span><span class="mord mathnormal mtight">o</span><span class="mord mathnormal mtight">d</span><span class="mord mathnormal mtight">e</span><span class="mord mathnormal mtight" style="margin-right:0.0197em">l</span></span></span></span></span><span 
class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.1512em"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span> (where <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>d</mi><mrow><mi>m</mi><mi>o</mi><mi>d</mi><mi>e</mi><mi>l</mi></mrow></msub><mo>=</mo><mn>2560</mn></mrow><annotation encoding="application/x-tex">d_{model} = 2560</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">m</span><span class="mord mathnormal mtight">o</span><span class="mord mathnormal mtight">d</span><span class="mord mathnormal mtight">e</span><span class="mord mathnormal mtight" style="margin-right:0.0197em">l</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">2560</span></span></span></span> for Gemma 3 4B).
For token <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>x</mi><mi>i</mi></msub></mrow><annotation encoding="application/x-tex">x_i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span>, the embedding vector <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi mathvariant="bold">e</mi><mi>i</mi></msub></mrow><annotation encoding="application/x-tex">\mathbf{e}_i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5944em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathbf">e</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> is obtained by taking, from the matrix <span class="katex"><span class="katex-mathml"><math 
xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>E</mi></mrow><annotation encoding="application/x-tex">E</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0576em">E</span></span></span></span>, its <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>x</mi><mi>i</mi></msub></mrow><annotation encoding="application/x-tex">x_i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span>-th row. 
(Equivalently, it is the matrix product with the one-hot vector <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi mathvariant="bold">v</mi><msub><mi>x</mi><mi>i</mi></msub></msub></mrow><annotation encoding="application/x-tex">\mathbf{v}_{x_i}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6945em;vertical-align:-0.2501em"></span><span class="mord"><span class="mord mathbf" style="margin-right:0.016em">v</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.016em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord mathnormal mtight">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3281em"><span style="top:-2.357em;margin-left:0em;margin-right:0.0714em"><span class="pstrut" style="height:2.5em"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.143em"><span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2501em"><span></span></span></span></span></span></span></span></span></span>: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi mathvariant="bold">e</mi><mi>i</mi></msub><mo>=</mo><msub><mi mathvariant="bold">v</mi><msub><mi>x</mi><mi>i</mi></msub></msub><mi>E</mi></mrow><annotation encoding="application/x-tex">\mathbf{e}_i = \mathbf{v}_{x_i} E</annotation></semantics></math></span><span 
class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5944em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathbf">e</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.9334em;vertical-align:-0.2501em"></span><span class="mord"><span class="mord mathbf" style="margin-right:0.016em">v</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.016em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord mathnormal mtight">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3281em"><span style="top:-2.357em;margin-left:0em;margin-right:0.0714em"><span class="pstrut" style="height:2.5em"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.143em"><span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" 
style="height:0.2501em"><span></span></span></span></span></span></span><span class="mord mathnormal" style="margin-right:0.0576em">E</span></span></span></span> ).
As a result, the input sequence becomes a matrix <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="bold">X</mi><mo>∈</mo><msup><mi mathvariant="double-struck">R</mi><mrow><mi>N</mi><mo>×</mo><msub><mi>d</mi><mrow><mi>m</mi><mi>o</mi><mi>d</mi><mi>e</mi><mi>l</mi></mrow></msub></mrow></msup></mrow><annotation encoding="application/x-tex">\mathbf{X} \in \mathbb{R}^{N \times d_{model}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7252em;vertical-align:-0.0391em"></span><span class="mord mathbf">X</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8491em"></span><span class="mord"><span class="mord mathbb">R</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8491em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.109em">N</span><span class="mbin mtight">×</span><span class="mord mtight"><span class="mord mathnormal mtight">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3448em"><span style="top:-2.3488em;margin-left:0em;margin-right:0.0714em"><span class="pstrut" style="height:2.5em"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">m</span><span class="mord mathnormal mtight">o</span><span class="mord mathnormal mtight">d</span><span class="mord mathnormal mtight">e</span><span class="mord mathnormal mtight" style="margin-right:0.0197em">l</span></span></span></span></span><span 
class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.1512em"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span>.</p>
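<p>A small pure-Python sketch makes the "row lookup = one-hot product" point concrete. It uses a random matrix as a stand-in for Gemma's learned weights. The sizes and the function names (<code>make_embedding_matrix</code>, <code>embed_by_lookup</code>, <code>embed_by_one_hot</code>) are hypothetical. Python indices are 0-based, while the formula above counts rows from 1.</p>

```python
import random


def make_embedding_matrix(vocab_size, d_model, seed=0):
    """Random V x d_model matrix standing in for learned embedding weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(d_model)] for _ in range(vocab_size)]


def embed_by_lookup(token_ids, E):
    """e_i = row x_i of E (0-based indices here, 1-based in the formula)."""
    return [E[t] for t in token_ids]


def embed_by_one_hot(token_ids, E):
    """Same result computed as a one-hot row vector times E: e_i = v_{x_i} E."""
    vocab_size, d_model = len(E), len(E[0])
    rows = []
    for t in token_ids:
        one_hot = [1.0 if j == t else 0.0 for j in range(vocab_size)]
        rows.append([sum(one_hot[j] * E[j][k] for j in range(vocab_size)) for k in range(d_model)])
    return rows


E = make_embedding_matrix(vocab_size=10, d_model=4)
X = embed_by_lookup([3, 7], E)  # shape N x d_model = 2 x 4
print(len(X), len(X[0]))  # 2 4
print(X == embed_by_one_hot([3, 7], E))  # True
```

<p>Real engines use the lookup form. The one-hot product does V × d_model multiplications per token just to select a single row, which is wasteful with V in the hundreds of thousands.</p>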
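<p><strong>Step 3: Adding position information (RoPE) - "Attaching position tags"</strong>
Two words have come in, but the model can't tell whether it's "안녕 하세요" or "하세요 안녕". It processes all the tokens at once.
So each word's stat sheet (vector) gets rotated a little, mathematically. (Strictly speaking, RoPE doesn't touch the embedding itself. Inside every attention layer, it rotates the query and key vectors computed from each token.)</p>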
<ul>
<li class="">"안녕" at position 1: rotated by 10 degrees</li>
<li class="">"하세요" at position 2: rotated by 20 degrees</li>
</ul>
<p>This is called <strong>RoPE (Rotary Position Embedding)</strong>. Now the model knows the order of the words.
Word 1 is rotated by <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>1</mn><mo>×</mo><mi>θ</mi></mrow><annotation encoding="application/x-tex">1 \times \theta</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7278em;vertical-align:-0.0833em"></span><span class="mord">1</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.6944em"></span><span class="mord mathnormal" style="margin-right:0.0278em">θ</span></span></span></span>, word 2 by <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>2</mn><mo>×</mo><mi>θ</mi></mrow><annotation encoding="application/x-tex">2 \times \theta</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7278em;vertical-align:-0.0833em"></span><span class="mord">2</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.6944em"></span><span class="mord mathnormal" style="margin-right:0.0278em">θ</span></span></span></span>, and in general
word <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>n</mi></mrow><annotation encoding="application/x-tex">n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em"></span><span class="mord mathnormal">n</span></span></span></span> is rotated by <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>n</mi><mo>×</mo><mi>θ</mi></mrow><annotation encoding="application/x-tex">n \times \theta</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6667em;vertical-align:-0.0833em"></span><span class="mord mathnormal">n</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.6944em"></span><span class="mord mathnormal" style="margin-right:0.0278em">θ</span></span></span></span>. (In practice, the vector is split into 2-D pairs of dimensions and each pair has its own base angle θ, so some pairs rotate quickly and others slowly.)</p>
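<p>To make "rotate word n by nθ" concrete for a whole vector, here's a minimal RoPE sketch, built on a few assumptions. It uses the common frequency schedule theta_i = base ** (-2i / d) with base = 10000, as in the original RoPE paper; Gemma's actual configuration may use a different base. It also rotates adjacent dimension pairs (0, 1), (2, 3), ...; some implementations pair dimension i with i + d/2 instead. The function name <code>apply_rope</code> is hypothetical.</p>

```python
import math


def apply_rope(vec, position, base=10000.0):
    """Rotate each adjacent pair (vec[2i], vec[2i+1]) by position * theta_i,
    where theta_i = base ** (-2i / d). Assumes len(vec) is even."""
    d = len(vec)
    out = list(vec)
    for i in range(d // 2):
        theta_i = base ** (-2 * i / d)  # pair 0 rotates fastest, later pairs slower
        angle = position * theta_i
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x * c - y * s  # standard 2-D rotation of the pair (x, y)
        out[2 * i + 1] = x * s + y * c
    return out


v = [1.0, 0.0, 1.0, 0.0]
for pos in range(3):
    print(pos, [round(a, 3) for a in apply_rope(v, pos)])
```

<p>Position 0 leaves the vector unchanged. The rotation never changes the vector's length, only its direction, so the token's "stats" are preserved while its position is encoded in the angles.</p>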
<p>The tool for this is the well-known rotation matrix.</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mi mathvariant="bold">R</mi><mrow><mi>n</mi><mi>θ</mi></mrow></msub><mo>=</mo><mrow><mo fence="true">[</mo><mtable rowspacing="0.16em" columnalign="center center" columnspacing="1em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mi>cos</mi><mo>⁡</mo><mo stretchy="false">(</mo><mi>n</mi><mi>θ</mi><mo stretchy="false">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mo>−</mo><mi>sin</mi><mo>⁡</mo><mo stretchy="false">(</mo><mi>n</mi><mi>θ</mi><mo stretchy="false">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mi>sin</mi><mo>⁡</mo><mo stretchy="false">(</mo><mi>n</mi><mi>θ</mi><mo stretchy="false">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mi>cos</mi><mo>⁡</mo><mo stretchy="false">(</mo><mi>n</mi><mi>θ</mi><mo stretchy="false">)</mo></mrow></mstyle></mtd></mtr></mtable><mo fence="true">]</mo></mrow></mrow><annotation encoding="application/x-tex">\mathbf{R}_{n\theta} = \begin{bmatrix} \cos(n\theta) &amp; -\sin(n\theta) \\ \sin(n\theta) &amp; \cos(n\theta) \end{bmatrix}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8361em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathbf">R</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">n</span><span class="mord mathnormal mtight" style="margin-right:0.0278em">θ</span></span></span></span></span><span 
class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:2.4em;vertical-align:-0.95em"></span><span class="minner"><span class="mopen delimcenter" style="top:0em"><span class="delimsizing size3">[</span></span><span class="mord"><span class="mtable"><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.45em"><span style="top:-3.61em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mop">cos</span><span class="mopen">(</span><span class="mord mathnormal">n</span><span class="mord mathnormal" style="margin-right:0.0278em">θ</span><span class="mclose">)</span></span></span><span style="top:-2.41em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mop">sin</span><span class="mopen">(</span><span class="mord mathnormal">n</span><span class="mord mathnormal" style="margin-right:0.0278em">θ</span><span class="mclose">)</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.95em"><span></span></span></span></span></span><span class="arraycolsep" style="width:0.5em"></span><span class="arraycolsep" style="width:0.5em"></span><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.45em"><span style="top:-3.61em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord">−</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mop">sin</span><span class="mopen">(</span><span class="mord mathnormal">n</span><span class="mord mathnormal" style="margin-right:0.0278em">θ</span><span 
class="mclose">)</span></span></span><span style="top:-2.41em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mop">cos</span><span class="mopen">(</span><span class="mord mathnormal">n</span><span class="mord mathnormal" style="margin-right:0.0278em">θ</span><span class="mclose">)</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.95em"><span></span></span></span></span></span></span></span><span class="mclose delimcenter" style="top:0em"><span class="delimsizing size3">]</span></span></span></span></span></span></span>
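<p>Here's a minimal sketch of that 2×2 rotation in plain Python. It also numerically checks the relative-position property discussed next: shifting both positions m and n by the same amount leaves the dot product unchanged. The vectors <code>q</code> and <code>k</code> and the angle are arbitrary example values, and the helper names are hypothetical.</p>

```python
import math


def rotation_matrix(angle):
    """2x2 rotation matrix [[cos, -sin], [sin, cos]]."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]


def matvec(M, v):
    """Multiply a 2x2 matrix by a 2-D column vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def rope_score(q, k, m, n, theta):
    """Dot product of q rotated by m*theta and k rotated by n*theta."""
    return dot(matvec(rotation_matrix(m * theta), q), matvec(rotation_matrix(n * theta), k))


q, k, theta = [0.3, -1.2], [0.8, 0.5], 0.1  # arbitrary example values
print(rope_score(q, k, m=2, n=5, theta=theta))
print(rope_score(q, k, m=12, n=15, theta=theta))  # same offset -> same score, up to rounding
```

<p>The two printed scores agree (up to floating-point rounding) even though the absolute positions differ. The reason is that the offset between the two positions is 3 in both calls.</p>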
<p>Multiplying a 2-D coordinate <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false">(</mo><mi>x</mi><mo separator="true">,</mo><mi>y</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">(x,y)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.0359em">y</span><span class="mclose">)</span></span></span></span> by this matrix rotates it by an angle of nθ, moving it to a new position.
Now take the dot product of the rotated query vector of word <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>m</mi></mrow><annotation encoding="application/x-tex">m</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em"></span><span class="mord mathnormal">m</span></span></span></span> and the rotated key vector of word <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>n</mi></mrow><annotation encoding="application/x-tex">n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em"></span><span class="mord mathnormal">n</span></span></span></span> (this dot product is what the attention score is built from). Remarkably, the absolute positions cancel out, and only information about the offset between the two words, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false">(</mo><mi>m</mi><mo>−</mo><mi>n</mi><mo stretchy="false">)</mo><mi>θ</mi></mrow><annotation encoding="application/x-tex">(m-n)\theta</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">(</span><span class="mord mathnormal">m</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal">n</span><span class="mclose">)</span><span class="mord mathnormal" style="margin-right:0.0278em">θ</span></span></span></span>, remains.</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mo stretchy="false">(</mo><msub><mi mathvariant="bold">R</mi><mrow><mi>m</mi><mi>θ</mi></mrow></msub><mi mathvariant="bold">q</mi><msup><mo stretchy="false">)</mo><mi>T</mi></msup><mo stretchy="false">(</mo><msub><mi mathvariant="bold">R</mi><mrow><mi>n</mi><mi>θ</mi></mrow></msub><mi mathvariant="bold">k</mi><mo stretchy="false">)</mo><mo>=</mo><msup><mi mathvariant="bold">q</mi><mi>T</mi></msup><msubsup><mi mathvariant="bold">R</mi><mrow><mi>m</mi><mi>θ</mi></mrow><mi>T</mi></msubsup><msub><mi mathvariant="bold">R</mi><mrow><mi>n</mi><mi>θ</mi></mrow></msub><mi mathvariant="bold">k</mi><mo>=</mo><msup><mi mathvariant="bold">q</mi><mi>T</mi></msup><msub><mi mathvariant="bold">R</mi><mrow><mo stretchy="false">(</mo><mi>n</mi><mo>−</mo><mi>m</mi><mo stretchy="false">)</mo><mi>θ</mi></mrow></msub><mi mathvariant="bold">k</mi></mrow><annotation encoding="application/x-tex">(\mathbf{R}_{m\theta} \mathbf{q})^T (\mathbf{R}_{n\theta} \mathbf{k}) = \mathbf{q}^T \mathbf{R}_{m\theta}^T \mathbf{R}_{n\theta} \mathbf{k} = \mathbf{q}^T \mathbf{R}_{(n-m)\theta} \mathbf{k}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.1413em;vertical-align:-0.25em"></span><span class="mopen">(</span><span class="mord"><span class="mord mathbf">R</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">m</span><span class="mord mathnormal mtight" style="margin-right:0.0278em">θ</span></span></span></span></span><span class="vlist-s">​</span></span><span 
class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mord mathbf">q</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8913em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.1389em">T</span></span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord mathbf">R</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">n</span><span class="mord mathnormal mtight" style="margin-right:0.0278em">θ</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mord mathbf">k</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.1383em;vertical-align:-0.247em"></span><span class="mord"><span class="mord mathbf">q</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8913em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.1389em">T</span></span></span></span></span></span></span></span><span 
class="mord"><span class="mord mathbf">R</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8913em"><span style="top:-2.453em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">m</span><span class="mord mathnormal mtight" style="margin-right:0.0278em">θ</span></span></span></span><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.1389em">T</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.247em"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathbf">R</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">n</span><span class="mord mathnormal mtight" style="margin-right:0.0278em">θ</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mord mathbf">k</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.2465em;vertical-align:-0.3552em"></span><span class="mord"><span class="mord mathbf">q</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8913em"><span 
style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.1389em">T</span></span></span></span></span></span></span></span><span class="mord"><span class="mord mathbf">R</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3448em"><span style="top:-2.5198em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mopen mtight">(</span><span class="mord mathnormal mtight">n</span><span class="mbin mtight">−</span><span class="mord mathnormal mtight">m</span><span class="mclose mtight">)</span><span class="mord mathnormal mtight" style="margin-right:0.0278em">θ</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.3552em"><span></span></span></span></span></span></span><span class="mord mathbf">k</span></span></span></span></span>
<ul>
<li class="">Nearby words: small angle difference -&gt; tend to be scored as more related</li>
<li class="">Distant words: large angle difference -&gt; tend to be scored as less related</li>
</ul>
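<p>The rotation identity above can be checked numerically. Below is a minimal Python sketch. It assumes the standard counter-clockwise 2D rotation matrix, R(a) = [[cos a, -sin a], [sin a, cos a]]. The helper names <code>rot</code>, <code>apply</code>, <code>dot</code> and <code>rope_score</code> are made up for this example. Under that convention the two rotations combine into R((n - m)θ). The score therefore depends only on the offset between the two positions and not on where the pair sits in the sentence.</p>

```python
import math

def rot(angle):
    """2x2 counter-clockwise rotation matrix R(angle), as nested lists."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]

def apply(mat, vec):
    """Multiply a 2x2 matrix by a 2D vector."""
    return [mat[0][0] * vec[0] + mat[0][1] * vec[1],
            mat[1][0] * vec[0] + mat[1][1] * vec[1]]

def dot(a, b):
    """Dot product of two 2D vectors."""
    return a[0] * b[0] + a[1] * b[1]

def rope_score(q, k, m, n, theta):
    """Score between q rotated to position m and k rotated to position n."""
    return dot(apply(rot(m * theta), q), apply(rot(n * theta), k))

theta = 0.1
q, k = [0.3, -1.2], [0.8, 0.5]

# Same offset (n - m = 3) at two very different absolute positions.
s1 = rope_score(q, k, m=2, n=5, theta=theta)
s2 = rope_score(q, k, m=40, n=43, theta=theta)

# Closed form: q^T R((n - m) * theta) k.
direct = dot(q, apply(rot((5 - 2) * theta), k))
print(s1, s2, direct)  # all three agree up to floating-point rounding
```

<p>Swapping the sign of the offset (rotating by (m - n)θ instead) generally gives a different number. This is why the order of m and n in the closed form matters.</p>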
<p><strong>Easier length extension</strong>: an absolute (numbered-ticket) scheme gets lost once the input is longer than anything it saw during training. RoPE just keeps rotating by larger angles, so it copes better with longer inputs (a larger context window). In practice, going far beyond the training length still usually requires RoPE scaling tricks such as position interpolation.</p>
<p><strong>Complex numbers</strong>: implementations can treat each pair of dimensions as one complex number and use Euler's formula ( <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mi>e</mi><mrow><mi>i</mi><mi>θ</mi></mrow></msup></mrow><annotation encoding="application/x-tex">e^{i\theta}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8491em"></span><span class="mord"><span class="mord mathnormal">e</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8491em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mord mathnormal mtight" style="margin-right:0.0278em">θ</span></span></span></span></span></span></span></span></span></span></span></span> ). The rotation then becomes a single multiplication in the complex plane, which is very cheap.</p>
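<p>The complex-number trick can be sketched with Python's standard <code>cmath</code> module. This is an illustration under assumptions, not Gemma's or llm-lite's code:</p>
<ul>
<li class="">The hypothetical <code>rope_complex</code> function treats each consecutive pair (x[2i], x[2i+1]) as one complex number.</li>
<li class="">Pair i uses the frequency theta_i = base^(-2i/d), with the commonly used base = 10000.</li>
<li class="">Real implementations differ in how they pair dimensions and in which base they use.</li>
</ul>

```python
import cmath

def rope_complex(x, pos, base=10000.0):
    """Rotate an even-length vector x to position pos (sketch, assumed pairing).

    Pair i, viewed as x[2i] + 1j * x[2i+1], is multiplied by
    e^{i * pos * theta_i} with theta_i = base ** (-2i / d).
    """
    d = len(x)
    out = []
    for i in range(d // 2):
        theta_i = base ** (-2 * i / d)
        z = complex(x[2 * i], x[2 * i + 1])
        z *= cmath.exp(1j * pos * theta_i)  # one complex multiply = one 2D rotation
        out.extend([z.real, z.imag])
    return out

def dot(a, b):
    """Plain dot product of two real vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

q = [0.5, -0.1, 0.9, 0.3]
k = [0.2, 0.7, -0.4, 0.6]

# Query at position m, key at position n: only the offset m - n matters.
score_a = dot(rope_complex(q, 7), rope_complex(k, 3))      # offset 4
score_b = dot(rope_complex(q, 107), rope_complex(k, 103))  # offset 4 again
print(score_a, score_b)  # equal up to floating-point rounding
```

<p>Each pair has its own frequency, so the total score is a sum of terms that each depend only on (m - n) times that pair's frequency. Every pair cancels the absolute position in the same way as the 2D case.</p>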
<hr>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="phase-2-진짜-생각하기-transformer-block-40번-반복">Phase 2: Actually Thinking (Transformer Block, Repeated 40 Times)<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/gemma-3n-e4b#phase-2-%EC%A7%84%EC%A7%9C-%EC%83%9D%EA%B0%81%ED%95%98%EA%B8%B0-transformer-block-40%EB%B2%88-%EB%B0%98%EB%B3%B5" class="hash-link" aria-label="Phase 2: 진짜 생각하기 (Transformer Block 40번 반복)에 대한 직접 링크" title="Phase 2: 진짜 생각하기 (Transformer Block 40번 반복)에 대한 직접 링크" translate="no">​</a></h2>
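<p>Now these stat sheets (the token vectors) pass through the Transformer layers, which act as Gemma's 'brain'. There are usually about 40 of these layers stacked on top of each other, and the steps below repeat in the same way at every layer.</p>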
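<p><strong>Step 4: RMSNorm - "calming the numbers down"</strong>
As the computation piles up, values can grow too large or shrink too small, which leads to overflow or precision errors. So each vector goes through a normalization step that rescales it back to a standard size (a root mean square of about 1).</p>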
<blockquote>
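<p><strong>When an AI judge scores a singing audition</strong>
Singer A has a powerful voice and sounds very loud (value: 100).
Singer B's voice is so small it sounds like a mosquito (value: 1).
Enter RMS Norm -&gt; measure each singer's own average energy (the 'effective value': square, average, then take the square root).
Divide each singer's volume (value) by that singer's own average energy.
Singer A's value (volume) goes down, and singer B's value (volume) goes up relatively.
= Normalized = the singers' voices now come through at a similar level (a standard range).</p>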
</blockquote>
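<p>So the AI can focus on singing ability regardless of volume. (LayerNorm computes both the mean and the variance and subtracts the mean. RMSNorm skips the mean and uses only the root mean square, which makes it a faster, lighter operation.)</p>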
<p>Mathematical form:
For an input vector <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="bold">x</mi><mo>∈</mo><msup><mi mathvariant="double-struck">R</mi><mi>d</mi></msup></mrow><annotation encoding="application/x-tex">\mathbf{x} \in \mathbb{R}^d</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5782em;vertical-align:-0.0391em"></span><span class="mord mathbf">x</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8491em"></span><span class="mord"><span class="mord mathbb">R</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8491em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">d</span></span></span></span></span></span></span></span></span></span></span>,</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mtext>RMS</mtext><mo stretchy="false">(</mo><mi mathvariant="bold">x</mi><mo stretchy="false">)</mo><mo>=</mo><msqrt><mrow><mfrac><mn>1</mn><mi>d</mi></mfrac><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi>d</mi></munderover><msubsup><mi>x</mi><mi>i</mi><mn>2</mn></msubsup><mo>+</mo><mi>ϵ</mi></mrow></msqrt></mrow><annotation encoding="application/x-tex">\text{RMS}(\mathbf{x}) = \sqrt{\frac{1}{d} \sum_{i=1}^{d} x_i^2 + \epsilon}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord text"><span class="mord">RMS</span></span><span class="mopen">(</span><span class="mord mathbf">x</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:3.3415em;vertical-align:-1.2777em"></span><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.0639em"><span class="svg-align" style="top:-5.3015em"><span class="pstrut" style="height:5.3015em"></span><span class="mord" style="padding-left:1.056em"><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.3214em"><span style="top:-2.314em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal">d</span></span></span><span style="top:-3.23em"><span class="pstrut" style="height:3em"></span><span class="frac-line" style="border-bottom-width:0.04em"></span></span><span style="top:-3.677em"><span class="pstrut" style="height:3em"></span><span 
class="mord"><span class="mord">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.686em"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.8361em"><span style="top:-1.8723em;margin-left:0em"><span class="pstrut" style="height:3.05em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mrel mtight">=</span><span class="mord mtight">1</span></span></span></span><span style="top:-3.05em"><span class="pstrut" style="height:3.05em"></span><span><span class="mop op-symbol large-op">∑</span></span></span><span style="top:-4.3em;margin-left:0em"><span class="pstrut" style="height:3.05em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">d</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.2777em"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.7959em"><span style="top:-2.4231em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span><span style="top:-3.0448em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" 
style="height:0.2769em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mord mathnormal">ϵ</span></span></span><span style="top:-4.0239em"><span class="pstrut" style="height:5.3015em"></span><span class="hide-tail" style="min-width:0.742em;height:3.3815em"><svg xmlns="http://www.w3.org/2000/svg" width="400em" height="3.3815em" viewBox="0 0 400000 3381" preserveAspectRatio="xMinYMin slice"><path d="M702 80H40000040
H742v3247l-4 4-4 4c-.667.7 -2 1.5-4 2.5s-4.167 1.833-6.5 2.5-5.5 1-9.5 1
h-12l-28-84c-16.667-52-96.667 -294.333-240-727l-212 -643 -85 170
c-4-3.333-8.333-7.667-13 -13l-13-13l77-155 77-156c66 199.333 139 419.667
219 661 l218 661zM702 80H400000v40H742z"></path></svg></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.2777em"><span></span></span></span></span></span></span></span></span></span>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mover accent="true"><mi mathvariant="bold">x</mi><mo>ˉ</mo></mover><mo>=</mo><mfrac><mi mathvariant="bold">x</mi><mrow><mtext>RMS</mtext><mo stretchy="false">(</mo><mi mathvariant="bold">x</mi><mo stretchy="false">)</mo></mrow></mfrac><mo>⊙</mo><mi mathvariant="bold">γ</mi></mrow><annotation encoding="application/x-tex">\bar{\mathbf{x}} = \frac{\mathbf{x}}{\text{RMS}(\mathbf{x})} \odot \mathbf{\gamma}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5812em"></span><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.5812em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord mathbf">x</span></span><span style="top:-3.0134em"><span class="pstrut" style="height:3em"></span><span class="accent-body" style="left:-0.25em"><span class="mord">ˉ</span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:2.0574em;vertical-align:-0.936em"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.1214em"><span style="top:-2.314em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord text"><span class="mord">RMS</span></span><span class="mopen">(</span><span class="mord mathbf">x</span><span class="mclose">)</span></span></span><span style="top:-3.23em"><span class="pstrut" style="height:3em"></span><span class="frac-line" style="border-bottom-width:0.04em"></span></span><span style="top:-3.677em"><span 
class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathbf">x</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.936em"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">⊙</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0556em">γ</span></span></span></span></span>
<p>(Here <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>ϵ</mi></mrow><annotation encoding="application/x-tex">\epsilon</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em"></span><span class="mord mathnormal">ϵ</span></span></span></span> is a tiny constant that prevents division by zero, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>γ</mi></mrow><annotation encoding="application/x-tex">\gamma</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0556em">γ</span></span></span></span> is a learnable scaling parameter (one value per dimension), and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>⊙</mo></mrow><annotation encoding="application/x-tex">\odot</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6667em;vertical-align:-0.0833em"></span><span class="mord">⊙</span></span></span></span> denotes element-wise multiplication.)</p>
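<p>The two RMSNorm formulas above map almost line for line onto code. Here is a minimal pure-Python sketch. The eps value of 1e-6 and the all-ones gamma are illustrative assumptions, not Gemma's actual settings. The usage lines replay the audition analogy: a 'loud' vector and a 100x 'quiet' copy of it come out on the same scale.</p>

```python
import math

def rms_norm(x, gamma, eps=1e-6):
    """RMSNorm: x / sqrt(mean(x_i^2) + eps), then element-wise scale by gamma.

    Unlike LayerNorm, no mean is subtracted; the vector is only rescaled.
    """
    d = len(x)
    rms = math.sqrt(sum(v * v for v in x) / d + eps)
    return [v / rms * g for v, g in zip(x, gamma)]

gamma = [1.0, 1.0, 1.0, 1.0]         # learnable in a real model; ones here
loud = [100.0, -80.0, 120.0, 90.0]   # "singer A": large values
quiet = [1.0, -0.8, 1.2, 0.9]        # "singer B": same pattern, 100x smaller

print(rms_norm(loud, gamma))
print(rms_norm(quiet, gamma))  # nearly identical to the line above
```

<p>After normalization, both outputs have a root mean square of about 1. The tiny remaining difference comes only from eps, which matters relatively more for the quiet vector.</p>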
<p><strong>Step 5: Making Q, K, V - "question, hint, answer sheet"</strong>
Now each token ("안녕", "하세요") spawns three alter egos, each a different linear projection of the same vector.</p>
<ul>
<li class=""><strong>Q (Query, question)</strong>: "Who should I be looking for so the context makes sense?"</li>
<li class=""><strong>K (Key, hint)</strong>: "I'm a word with these features!"</li>
<li class=""><strong>V (Value, content)</strong>: "If you connect with me, take this information!"</li>
</ul>
<p>Mathematical form:
Multiply the normalized input <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mover accent="true"><mi mathvariant="bold">X</mi><mo>ˉ</mo></mover></mrow><annotation encoding="application/x-tex">\bar{\mathbf{X}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8229em"></span><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8229em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord mathbf">X</span></span><span style="top:-3.2551em"><span class="pstrut" style="height:3em"></span><span class="accent-body" style="left:-0.25em"><span class="mord">ˉ</span></span></span></span></span></span></span></span></span></span> by three weight matrices.</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi mathvariant="bold">Q</mi><mo>=</mo><mover accent="true"><mi mathvariant="bold">X</mi><mo>ˉ</mo></mover><msub><mi mathvariant="bold">W</mi><mi>Q</mi></msub><mo separator="true">,</mo><mspace width="1em"></mspace><mi mathvariant="bold">K</mi><mo>=</mo><mover accent="true"><mi mathvariant="bold">X</mi><mo>ˉ</mo></mover><msub><mi mathvariant="bold">W</mi><mi>K</mi></msub><mo separator="true">,</mo><mspace width="1em"></mspace><mi mathvariant="bold">V</mi><mo>=</mo><mover accent="true"><mi mathvariant="bold">X</mi><mo>ˉ</mo></mover><msub><mi mathvariant="bold">W</mi><mi>V</mi></msub></mrow><annotation encoding="application/x-tex">\mathbf{Q} = \bar{\mathbf{X}} \mathbf{W}_Q, \quad \mathbf{K} = \bar{\mathbf{X}} \mathbf{W}_K, \quad \mathbf{V} = \bar{\mathbf{X}} \mathbf{W}_V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8805em;vertical-align:-0.1944em"></span><span class="mord mathbf">Q</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.109em;vertical-align:-0.2861em"></span><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8229em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord mathbf">X</span></span><span style="top:-3.2551em"><span class="pstrut" style="height:3em"></span><span class="accent-body" style="left:-0.25em"><span class="mord">ˉ</span></span></span></span></span></span></span><span class="mord"><span class="mord mathbf" style="margin-right:0.016em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" 
style="height:0.3283em"><span style="top:-2.55em;margin-left:-0.016em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">Q</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:1em"></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathbf">K</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.0173em;vertical-align:-0.1944em"></span><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8229em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord mathbf">X</span></span><span style="top:-3.2551em"><span class="pstrut" style="height:3em"></span><span class="accent-body" style="left:-0.25em"><span class="mord">ˉ</span></span></span></span></span></span></span><span class="mord"><span class="mord mathbf" style="margin-right:0.016em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3283em"><span style="top:-2.55em;margin-left:-0.016em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0715em">K</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:1em"></span><span class="mspace" style="margin-right:0.1667em"></span><span 
class="mord mathbf" style="margin-right:0.016em">V</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.9729em;vertical-align:-0.15em"></span><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8229em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord mathbf">X</span></span><span style="top:-3.2551em"><span class="pstrut" style="height:3em"></span><span class="accent-body" style="left:-0.25em"><span class="mord">ˉ</span></span></span></span></span></span></span><span class="mord"><span class="mord mathbf" style="margin-right:0.016em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3283em"><span style="top:-2.55em;margin-left:-0.016em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.2222em">V</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span></span>
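<p>(In hardware, this part is handled by a large matrix multiplication (GEMM) engine. During token-by-token generation only one new row is projected at a time, so each projection effectively shrinks to a matrix-vector product (GEMV).)</p>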
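<p>Here is a toy version of the Q, K, V projection as a minimal pure-Python sketch. The sizes (2 tokens, model width 4, head width 2) and the weight values are made up purely for illustration. A real model uses far larger, trained (and often quantized) matrices, but the operation is the same plain matrix multiplication.</p>

```python
def matmul(a, b):
    """Plain matrix multiply: (rows x inner) times (inner x cols)."""
    inner, cols = len(b), len(b[0])
    assert all(len(row) == inner for row in a), "shape mismatch"
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for row in a]

# Normalized input X_bar: one row per token (2 tokens), width d_model = 4.
x_bar = [[0.5, -1.0, 0.25, 2.0],
         [1.5, 0.0, -0.5, 1.0]]

# Hypothetical projection weights, each d_model x d_head = 4 x 2.
w_q = [[1, 0], [0, 1], [1, 1], [0, 0]]
w_k = [[0, 1], [1, 0], [0, 1], [1, 0]]
w_v = [[1, 1], [0, 0], [1, 0], [0, 1]]

q = matmul(x_bar, w_q)  # 2 x 2: one query per token
k = matmul(x_bar, w_k)  # 2 x 2: one key per token
v = matmul(x_bar, w_v)  # 2 x 2: one value per token
print(q, k, v)
```

<p>Each token's row is multiplied by the same three matrices. The three "alter egos" differ only because the weights differ.</p>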
<p><strong>Step 6: GQA and the KV Cache (this is where the NPU sweats blood)</strong>
This is the most critical part when designing a hardware accelerator (NPU).</p>
<p><strong>6-1) KV Cache (remembering)</strong>: when the AI generates text one token at a time, recomputing everything from the beginning at every step is wasteful.</p>
<ul>
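<li class=""><strong>The problem</strong>: say we have already generated "안녕", "하", "세" and it is time to generate "요". Done naively, we would have to recompute Q, K, V for all the earlier tokens from scratch.</li>
<li class=""><strong>The fix (KV Cache)</strong>: "The earlier tokens never change anyway, right?" So the already computed <strong>K (hints)</strong> and <strong>V (contents)</strong> are stored in <strong>memory (VRAM)</strong> and reused.</li>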
</ul>
<p><strong>Why the NPU sweats blood</strong>:</p>
<ul>
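<li class=""><strong>Memory footprint</strong>: the number of K and V values to store grows in proportion to the sequence length, in every layer, so long contexts eat memory fast.</li>
<li class=""><strong>Data movement</strong>: for every generated token, the large cache has to be streamed from external memory into the NPU cores again. This traffic makes the workload <strong>memory-bound</strong> (a memory-bandwidth bottleneck).</li>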
</ul>
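<p>How fast does the cache grow? The back-of-the-envelope arithmetic fits in a small Python function. The configuration in the usage lines is a hypothetical placeholder, not Gemma 3n E4B's real numbers. Plug in the layer count, number of KV heads, head dimension and bytes per element of the model you actually run. What matters is the shape of the formula: the cache grows linearly with sequence length.</p>

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem):
    """Size of the K and V caches for one sequence, in bytes.

    Factor 2 = one K tensor plus one V tensor per layer, each holding
    n_kv_heads * seq_len * head_dim elements.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical example configuration (NOT Gemma 3n's real numbers), fp16 cache.
cfg = dict(n_layers=40, n_kv_heads=8, head_dim=256, bytes_per_elem=2)

for seq_len in (1024, 2048, 4096):
    mib = kv_cache_bytes(seq_len=seq_len, **cfg) / 2**20
    print(f"{seq_len:5d} tokens: {mib:.0f} MiB")
```

<p>Doubling the context doubles the cache, and every decoding step has to read all of it. With GQA, n_kv_heads is smaller than the number of query heads, which shrinks this number by the same factor.</p>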
<p>Mathematical form:
At time step <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>t</mi></mrow><annotation encoding="application/x-tex">t</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6151em"></span><span class="mord mathnormal">t</span></span></span></span>, the new token's <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi mathvariant="bold">k</mi><mi>t</mi></msub><mo separator="true">,</mo><msub><mi mathvariant="bold">v</mi><mi>t</mi></msub></mrow><annotation encoding="application/x-tex">\mathbf{k}_t, \mathbf{v}_t</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em"></span><span class="mord"><span class="mord mathbf">k</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathbf" style="margin-right:0.016em">v</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:-0.016em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> are concatenated onto the existing cache.</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msup><mi mathvariant="bold">K</mi><mrow><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo></mrow></msup><mo>=</mo><mo stretchy="false">[</mo><msup><mi mathvariant="bold">K</mi><mrow><mo stretchy="false">(</mo><mi>t</mi><mo>−</mo><mn>1</mn><mo stretchy="false">)</mo></mrow></msup><mo separator="true">,</mo><msub><mi mathvariant="bold">k</mi><mi>t</mi></msub><mo stretchy="false">]</mo><mo separator="true">,</mo><mspace width="1em"></mspace><msup><mi mathvariant="bold">V</mi><mrow><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo></mrow></msup><mo>=</mo><mo stretchy="false">[</mo><msup><mi mathvariant="bold">V</mi><mrow><mo stretchy="false">(</mo><mi>t</mi><mo>−</mo><mn>1</mn><mo stretchy="false">)</mo></mrow></msup><mo separator="true">,</mo><msub><mi mathvariant="bold">v</mi><mi>t</mi></msub><mo stretchy="false">]</mo></mrow><annotation encoding="application/x-tex">\mathbf{K}^{(t)} = [\mathbf{K}^{(t-1)}, \mathbf{k}_t], \quad \mathbf{V}^{(t)} = [\mathbf{V}^{(t-1)}, \mathbf{v}_t]</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.938em"></span><span class="mord"><span class="mord mathbf">K</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.938em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mopen mtight">(</span><span class="mord mathnormal mtight">t</span><span class="mclose mtight">)</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span 
class="base"><span class="strut" style="height:1.188em;vertical-align:-0.25em"></span><span class="mopen">[</span><span class="mord"><span class="mord mathbf">K</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.938em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mopen mtight">(</span><span class="mord mathnormal mtight">t</span><span class="mbin mtight">−</span><span class="mord mtight">1</span><span class="mclose mtight">)</span></span></span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathbf">k</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mclose">]</span><span class="mpunct">,</span><span class="mspace" style="margin-right:1em"></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathbf" style="margin-right:0.016em">V</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.938em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mopen mtight">(</span><span class="mord mathnormal mtight">t</span><span class="mclose 
mtight">)</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.188em;vertical-align:-0.25em"></span><span class="mopen">[</span><span class="mord"><span class="mord mathbf" style="margin-right:0.016em">V</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.938em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mopen mtight">(</span><span class="mord mathnormal mtight">t</span><span class="mbin mtight">−</span><span class="mord mtight">1</span><span class="mclose mtight">)</span></span></span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathbf" style="margin-right:0.016em">v</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:-0.016em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mclose">]</span></span></span></span></span>
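<p>The append-only update above can be sketched as a preallocated per-layer buffer. This is a minimal pure-Python illustration, not llm-lite's actual data structure. The class name <code>KVCache</code>, the <code>max_seq</code> capacity and the nested-list layout are all assumptions made for the example.</p>
<pre><code class="language-python">class KVCache:
    """Hypothetical per-layer KV cache with one preallocated slot per position."""

    def __init__(self, max_seq, n_kv_heads, head_dim):
        self.max_seq = max_seq
        self.length = 0  # positions filled so far, i.e. t
        # keys[pos][kv_head] is a head_dim vector, preallocated up front
        self.keys = [[[0.0] * head_dim for _ in range(n_kv_heads)] for _ in range(max_seq)]
        self.values = [[[0.0] * head_dim for _ in range(n_kv_heads)] for _ in range(max_seq)]

    def append(self, k_t, v_t):
        """K(t) = [K(t-1), k_t] and V(t) = [V(t-1), v_t]: fill slot t in place."""
        if self.length == self.max_seq:
            raise IndexError("KV cache is full")
        for h, (k, v) in enumerate(zip(k_t, v_t)):
            self.keys[self.length][h][:] = k
            self.values[self.length][h][:] = v
        self.length += 1

    def view(self, kv_head):
        """Return the cached K and V rows of one KV head for positions 0..t-1."""
        ks = [self.keys[p][kv_head] for p in range(self.length)]
        vs = [self.values[p][kv_head] for p in range(self.length)]
        return ks, vs


cache = KVCache(max_seq=8, n_kv_heads=1, head_dim=2)
cache.append([[1.0, 0.0]], [[0.5, 0.5]])  # k_1, v_1 for the first token
cache.append([[0.0, 1.0]], [[0.1, 0.9]])  # k_2, v_2 for the second token
ks, vs = cache.view(0)
print(len(ks))  # 2
</code></pre>
<p>Old rows are never rewritten. Each step writes only position t, which is why the cache grows linearly with context length.</p>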
<p><strong>6-2) GQA (searching in groups)</strong>:
The KV cache eats up so much memory that a clever design was introduced to fix it: <strong>GQA (Grouped-Query Attention)</strong>.</p>
<ul>
<li class=""><strong>MHA (과거)</strong>: 질문자(Q), 힌트(K), 내용(V)을 1:1:1로 가짐. 메모리가 터져나감.</li>
<li class=""><strong>GQA (현재 - Gemma 3 등)</strong>: 질문자(Q)는 많지만, 힌트(K)와 내용(V)은 그룹을 지어 적게 만듦. (예: 4:1:1 대응)</li>
<li class=""><strong>효과</strong>: "너네 질문자 4명은 이 힌트(K)랑 내용(V) 하나를 같이 써!"라고 지정해서 메모리에 저장해야 할 양을 확 줄여줘. 데이터 이동량이 줄어드니 추론 속도가 비약적으로 빨라지지.</li>
</ul>
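<p>To make the saving concrete, here is some back-of-the-envelope arithmetic for the KV cache size, plus the query-head-to-KV-head mapping that a 4:1 grouping implies. The layer count, head counts, head dimension, context length and FP16 cache precision below are made-up illustrative values. They are not Gemma 3N E4B's real configuration.</p>
<pre><code class="language-python">def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem):
    # Factor 2: one K tensor and one V tensor per layer
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def kv_head_for(q_head, n_q_heads, n_kv_heads):
    # Consecutive query heads share one KV head
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


# Hypothetical model shape (illustrative only)
N_LAYERS, N_Q_HEADS, HEAD_DIM, SEQ_LEN, FP16 = 32, 8, 256, 4096, 2
N_KV_HEADS = N_Q_HEADS // 4  # 4:1:1 grouping

mha = kv_cache_bytes(N_LAYERS, N_Q_HEADS, HEAD_DIM, SEQ_LEN, FP16)   # 1:1:1
gqa = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, SEQ_LEN, FP16)  # 4:1:1

print(f"MHA: {mha / 2**20:.0f} MiB, GQA: {gqa / 2**20:.0f} MiB")
print([kv_head_for(h, N_Q_HEADS, N_KV_HEADS) for h in range(N_Q_HEADS)])
</code></pre>
<p>For these made-up numbers the cache drops from 1024 MiB to 256 MiB. Query heads 0-3 read KV head 0, and heads 4-7 read KV head 1.</p>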
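<p>The Q of "하세요" scans the K of "안녕", which was just stored, and computes how related the two are (the attention score). Mixing the V vectors according to these scores lets the context of "안녕" seep into the "하세요" vector.</p>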
<p>Mathematical form (scaled dot-product attention):
for group <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>g</mi></mrow><annotation encoding="application/x-tex">g</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0359em">g</span></span></span></span>, and each query head <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi mathvariant="bold">Q</mi><mi>i</mi></msub></mrow><annotation encoding="application/x-tex">\mathbf{Q}_{i}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8805em;vertical-align:-0.1944em"></span><span class="mord"><span class="mord mathbf">Q</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> belonging to it,</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mtext>Score</mtext><mi>i</mi></msub><mo>=</mo><mfrac><mrow><msub><mi mathvariant="bold">Q</mi><mi>i</mi></msub><mo stretchy="false">(</mo><msubsup><mi mathvariant="bold">K</mi><mi>g</mi><mrow><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo></mrow></msubsup><msup><mo stretchy="false">)</mo><mi>T</mi></msup></mrow><msqrt><msub><mi>d</mi><mi>k</mi></msub></msqrt></mfrac></mrow><annotation encoding="application/x-tex">\text{Score}_i = \frac{\mathbf{Q}_i (\mathbf{K}_g^{(t)})^T}{\sqrt{d_k}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord text"><span class="mord">Score</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:2.6518em;vertical-align:-0.93em"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.7218em"><span style="top:-2.2976em"><span class="pstrut" style="height:3.0448em"></span><span class="mord"><span class="mord sqrt"><span class="vlist-t vlist-t2"><span 
class="vlist-r"><span class="vlist" style="height:0.8572em"><span class="svg-align" style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord" style="padding-left:0.833em"><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span><span style="top:-2.8172em"><span class="pstrut" style="height:3em"></span><span class="hide-tail" style="min-width:0.853em;height:1.08em"><svg xmlns="http://www.w3.org/2000/svg" width="400em" height="1.08em" viewBox="0 0 400000 1080" preserveAspectRatio="xMinYMin slice"><path d="M95,702
c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14
c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54
c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10
s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429
c69,-144,104.5,-217.7,106.5,-221
l0 -0
c5.3,-9.3,12,-14,20,-14
H400000v40H845.2724
s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7
c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z
M834 80h400000v40h-400000z"></path></svg></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.1828em"><span></span></span></span></span></span></span></span><span style="top:-3.2748em"><span class="pstrut" style="height:3.0448em"></span><span class="frac-line" style="border-bottom-width:0.04em"></span></span><span style="top:-3.7218em"><span class="pstrut" style="height:3.0448em"></span><span class="mord"><span class="mord"><span class="mord mathbf">Q</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord mathbf">K</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.0448em"><span style="top:-2.5834em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">g</span></span></span><span style="top:-3.2198em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mopen mtight">(</span><span class="mord mathnormal mtight">t</span><span class="mclose mtight">)</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2527em"><span></span></span></span></span></span></span><span class="mclose"><span class="mclose">)</span><span 
class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.1389em">T</span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.93em"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mtext>Attention</mtext><mi>i</mi></msub><mo>=</mo><mtext>Softmax</mtext><mo stretchy="false">(</mo><msub><mtext>Score</mtext><mi>i</mi></msub><mo>+</mo><mtext>Mask</mtext><mo stretchy="false">)</mo><msubsup><mi mathvariant="bold">V</mi><mi>g</mi><mrow><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo></mrow></msubsup></mrow><annotation encoding="application/x-tex">\text{Attention}_i = \text{Softmax}(\text{Score}_i + \text{Mask}) \mathbf{V}_g^{(t)}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord text"><span class="mord">Attention</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord text"><span class="mord">Softmax</span></span><span class="mopen">(</span><span class="mord"><span class="mord text"><span class="mord">Score</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span 
class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1.3211em;vertical-align:-0.3831em"></span><span class="mord text"><span class="mord">Mask</span></span><span class="mclose">)</span><span class="mord"><span class="mord mathbf" style="margin-right:0.016em">V</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.938em"><span style="top:-2.453em;margin-left:-0.016em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">g</span></span></span><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mopen mtight">(</span><span class="mord mathnormal mtight">t</span><span class="mclose mtight">)</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.3831em"><span></span></span></span></span></span></span></span></span></span></span>
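<p>Here is a pure-Python sketch of the two formulas for a single decode step, in which every query head in group g reads the same cached K and V. The function names and the toy 2-dimensional vectors are assumptions for illustration. When only the newest token is decoded it may attend to every cached position, so the mask is optional here. It matters when several query positions are processed at once (prefill).</p>
<pre><code class="language-python">import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def gqa_decode_step(q_heads, keys, values, n_kv_heads, mask=None):
    """One decode step of grouped-query attention.

    q_heads: n_q_heads query vectors for the current token
    keys, values: for each KV head g, the cached rows of K_g(t) and V_g(t)
    mask: optional additive mask over cached positions (0.0 or -inf)
    """
    group_size = len(q_heads) // n_kv_heads
    d_k = len(q_heads[0])
    out = []
    for i, q in enumerate(q_heads):
        g = i // group_size  # query head i reads KV head g
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys[g]]
        if mask is not None:
            scores = [s + m for s, m in zip(scores, mask)]
        w = softmax(scores)
        d_v = len(values[g][0])
        # Softmax(Score + Mask) V: weighted sum of the cached value rows
        out.append([sum(wj * v[c] for wj, v in zip(w, values[g])) for c in range(d_v)])
    return out


# Toy example: 2 query heads sharing 1 KV head, 2 cached positions
q_heads = [[1.0, 0.0], [0.0, 1.0]]
keys = [[[1.0, 0.0], [0.0, 1.0]]]
values = [[[1.0, 0.0], [0.0, 1.0]]]
out = gqa_decode_step(q_heads, keys, values, n_kv_heads=1)
print(out)
</code></pre>
<p>Both query heads read the same <code>keys[0]</code> and <code>values[0]</code>. Head 0 leans toward the first cached position and head 1 toward the second.</p>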
<p>(Hardware view: softmax involves an exponential ( <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mi>e</mi><mi>x</mi></msup></mrow><annotation encoding="application/x-tex">e^x</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6644em"></span><span class="mord"><span class="mord mathnormal">e</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6644em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">x</span></span></span></span></span></span></span></span></span></span></span> ) and a division. An NPU therefore typically needs approximation logic for it, such as a LUT (look-up table) or a Taylor expansion.)</p>
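<p>One common way to avoid a full exponential unit is to rewrite e^x as 2^(x·log2 e). The integer part then becomes a shift of the exponent, and 2^f for the fractional part is read from a small table. The sketch below models that idea in Python. The 256-entry table size and the function names are assumptions, and real NPU designs differ in table size, interpolation and fixed-point format.</p>
<pre><code class="language-python">import math

LUT_SIZE = 256                    # hypothetical table size
LOG2_E = 1.0 / math.log(2.0)      # e**x == 2**(x * log2(e))
# POW2_LUT[k] = 2**(k / LUT_SIZE): covers the fractional part f in [0, 1)
POW2_LUT = [2.0 ** (k / LUT_SIZE) for k in range(LUT_SIZE)]


def exp_lut(x):
    """Approximate e**x as 2**n * 2**f (n integer, f fractional); x must be finite."""
    y = x * LOG2_E
    n = math.floor(y)
    f = y - n
    # Truncate f to the table entry at or below it; clamp in case rounding gives f == 1.0
    idx = min(int(f * LUT_SIZE), LUT_SIZE - 1)
    return math.ldexp(POW2_LUT[idx], n)  # ldexp(m, n) = m * 2**n, i.e. an exponent shift


def softmax_lut(xs):
    m = max(xs)  # after subtracting the max, every input is 0 or negative
    es = [exp_lut(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]  # the division is the other expensive step


print(softmax_lut([1.0, 2.0, 3.0]))
</code></pre>
<p>Truncating to the entry below f makes each exponential at most about 0.27% too small (2^(1/256) ≈ 1.0027). A design can add linear interpolation or a larger table if that is not accurate enough.</p>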
<p><strong>Step 7: Residual Connection (Add) - "Don't forget the original"</strong>
If step 6 churns too hard, the word can lose its original meaning. So the original input data from step 3 is added, unchanged, to the output of step 6.</p>
<p>Mathematical form:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mi mathvariant="bold">X</mi><mrow><mi>o</mi><mi>u</mi><mi>t</mi><mn>1</mn></mrow></msub><mo>=</mo><msub><mi mathvariant="bold">X</mi><mrow><mi>i</mi><mi>n</mi></mrow></msub><mo>+</mo><mtext>Attention</mtext><mo stretchy="false">(</mo><mtext>RMSNorm</mtext><mo stretchy="false">(</mo><msub><mi mathvariant="bold">X</mi><mrow><mi>i</mi><mi>n</mi></mrow></msub><mo stretchy="false">)</mo><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\mathbf{X}_{out1} = \mathbf{X}_{in} + \text{Attention}(\text{RMSNorm}(\mathbf{X}_{in}))</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8361em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathbf">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">o</span><span class="mord mathnormal mtight">u</span><span class="mord mathnormal mtight">t</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8361em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathbf">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" 
style="height:0.3117em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">in</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord text"><span class="mord">Attention</span></span><span class="mopen">(</span><span class="mord text"><span class="mord">RMSNorm</span></span><span class="mopen">(</span><span class="mord"><span class="mord mathbf">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">in</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mclose">))</span></span></span></span></span>
<p>(Hardware view: a matrix addition. Because it is element-wise, the arithmetic is cheap, but the original <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi mathvariant="bold">X</mi><mrow><mi>i</mi><mi>n</mi></mrow></msub></mrow><annotation encoding="application/x-tex">\mathbf{X}_{in}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8361em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathbf">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">in</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> has to stay in memory until the addition.)</p>
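<p>Here is a minimal sketch of the pre-norm residual pattern in the formula above. RMSNorm divides by the root-mean-square of the vector and applies a learned per-channel scale. The eps value, the plain <code>weight</code> scaling and the stand-in <code>fake_attention</code> sub-layer are assumptions, and real models differ in details such as how the scale is parameterized. Note that <code>x_in</code> stays alive until the final addition, which is the buffer-retention cost from the hardware note.</p>
<pre><code class="language-python">import math


def rms_norm(x, weight, eps=1e-6):
    # x / sqrt(mean(x^2) + eps), then scale each channel
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def residual_block(x_in, weight, sublayer):
    # X_out = X_in + Sublayer(RMSNorm(X_in)); x_in is kept for the final add
    h = sublayer(rms_norm(x_in, weight))
    return [a + b for a, b in zip(x_in, h)]


def fake_attention(v):
    # Hypothetical stand-in for the attention sub-layer: just halves its input
    return [0.5 * e for e in v]


x_in = [3.0, 4.0]
x_out1 = residual_block(x_in, weight=[1.0, 1.0], sublayer=fake_attention)
print(x_out1)
</code></pre>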
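<p><strong>Step 8: MLP (multilayer perceptron), GELU-gated MLP - "Inflating the meaning"</strong>
Now that the context is known, the model uses it to infer deeper meaning:
"Ah, '안녕하세요' is a greeting people say when they meet! So some kind of reply should come next!"</p>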
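<p>The data is expanded to a much larger dimension and then squeezed back to its original size. Along the way, a GeGLU (GELU-gated linear unit) operation filters the information: one projection goes through GELU and acts as a gate on the other.</p>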
<p>Mathematical form (GeGLU):
First, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi mathvariant="bold">X</mi><mrow><mi>o</mi><mi>u</mi><mi>t</mi><mn>1</mn></mrow></msub></mrow><annotation encoding="application/x-tex">\mathbf{X}_{out1}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8361em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathbf">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">o</span><span class="mord mathnormal mtight">u</span><span class="mord mathnormal mtight">t</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> goes through RMSNorm again (the normalized result is written with an overbar below). It then goes through two linear projections.</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mi mathvariant="bold">H</mi><mrow><mi>g</mi><mi>a</mi><mi>t</mi><mi>e</mi></mrow></msub><mo>=</mo><mtext>GELU</mtext><mo stretchy="false">(</mo><msub><mover accent="true"><mi mathvariant="bold">X</mi><mo>ˉ</mo></mover><mrow><mi>o</mi><mi>u</mi><mi>t</mi><mn>1</mn></mrow></msub><msub><mi mathvariant="bold">W</mi><mrow><mi>g</mi><mi>a</mi><mi>t</mi><mi>e</mi></mrow></msub><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\mathbf{H}_{gate} = \text{GELU}(\bar{\mathbf{X}}_{out1} \mathbf{W}_{gate})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9722em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord mathbf">H</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">g</span><span class="mord mathnormal mtight">a</span><span class="mord mathnormal mtight">t</span><span class="mord mathnormal mtight">e</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.109em;vertical-align:-0.2861em"></span><span class="mord text"><span class="mord">GELU</span></span><span class="mopen">(</span><span class="mord"><span class="mord accent"><span 
class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8229em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord mathbf">X</span></span><span style="top:-3.2551em"><span class="pstrut" style="height:3em"></span><span class="accent-body" style="left:-0.25em"><span class="mord">ˉ</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">o</span><span class="mord mathnormal mtight">u</span><span class="mord mathnormal mtight">t</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathbf" style="margin-right:0.016em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:-0.016em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">g</span><span class="mord mathnormal mtight">a</span><span class="mord mathnormal mtight">t</span><span class="mord mathnormal mtight">e</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mi mathvariant="bold">H</mi><mrow><mi>u</mi><mi>p</mi></mrow></msub><mo>=</mo><msub><mover accent="true"><mi mathvariant="bold">X</mi><mo>ˉ</mo></mover><mrow><mi>o</mi><mi>u</mi><mi>t</mi><mn>1</mn></mrow></msub><msub><mi mathvariant="bold">W</mi><mrow><mi>u</mi><mi>p</mi></mrow></msub></mrow><annotation encoding="application/x-tex">\mathbf{H}_{up} = \bar{\mathbf{X}}_{out1} \mathbf{W}_{up}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9722em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord mathbf">H</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">u</span><span class="mord mathnormal mtight">p</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.109em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8229em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord mathbf">X</span></span><span style="top:-3.2551em"><span class="pstrut" style="height:3em"></span><span class="accent-body" style="left:-0.25em"><span 
class="mord">ˉ</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">o</span><span class="mord mathnormal mtight">u</span><span class="mord mathnormal mtight">t</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathbf" style="margin-right:0.016em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.016em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">u</span><span class="mord mathnormal mtight">p</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span></span></span></span></span>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mi mathvariant="bold">H</mi><mrow><mi>h</mi><mi>i</mi><mi>d</mi><mi>d</mi><mi>e</mi><mi>n</mi></mrow></msub><mo>=</mo><msub><mi mathvariant="bold">H</mi><mrow><mi>g</mi><mi>a</mi><mi>t</mi><mi>e</mi></mrow></msub><mo>⊙</mo><msub><mi mathvariant="bold">H</mi><mrow><mi>u</mi><mi>p</mi></mrow></msub></mrow><annotation encoding="application/x-tex">\mathbf{H}_{hidden} = \mathbf{H}_{gate} \odot \mathbf{H}_{up}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8361em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathbf">H</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">hi</span><span class="mord mathnormal mtight">dd</span><span class="mord mathnormal mtight">e</span><span class="mord mathnormal mtight">n</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.9722em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord mathbf">H</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span 
class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">g</span><span class="mord mathnormal mtight">a</span><span class="mord mathnormal mtight">t</span><span class="mord mathnormal mtight">e</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">⊙</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.9722em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord mathbf">H</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">u</span><span class="mord mathnormal mtight">p</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span></span></span></span></span>
<p>Finally, project back to the original dimension:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mtext>MLP</mtext><mtext>out</mtext></msub><mo>=</mo><msub><mi mathvariant="bold">H</mi><mtext>hidden</mtext></msub><msub><mi mathvariant="bold">W</mi><mtext>down</mtext></msub></mrow><annotation encoding="application/x-tex">\text{MLP}_{\text{out}} = \mathbf{H}_{\text{hidden}} \mathbf{W}_{\text{down}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord text"><span class="mord">MLP</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord text mtight"><span class="mord mtight">out</span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8361em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathbf">H</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord text mtight"><span class="mord mtight">hidden</span></span></span></span></span></span><span 
class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathbf" style="margin-right:0.016em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:-0.016em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord text mtight"><span class="mord mtight">down</span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span></span>
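<p>(Hardware perspective: the MLP is the part of the layer with the largest weight matrices. When many tokens are processed together, as in prefill, this stage is heavily compute-bound, so this is where Systolic Array utilization has to be maximized. When decoding one token at a time, the same matmul tends to be limited instead by the memory bandwidth needed to stream the weights.)</p>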
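<p>Here is a minimal pure-Python sketch of the whole MLP block. It assumes the gated form from the previous step is H_hidden = GELU(X W_gate) ⊙ (X W_up), using the tanh approximation of GELU. The helper names (<code>matmul</code>, <code>gelu_tanh</code>, <code>gated_mlp</code>) and the toy sizes are made up for illustration. The point is that W_down maps the widened hidden vector back to the model dimension.</p>

```python
import math

def matmul(x, w):
    """Multiply a row vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def gelu_tanh(v):
    """Tanh approximation of GELU (assumed activation for this sketch)."""
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))

def gated_mlp(x, w_gate, w_up, w_down):
    # Expand d_model -> d_ff with two parallel projections
    h_gate = [gelu_tanh(v) for v in matmul(x, w_gate)]
    h_up = matmul(x, w_up)
    # Element-wise gating (the ⊙ in the equation)
    h_hidden = [g * u for g, u in zip(h_gate, h_up)]
    # Compress d_ff -> d_model: W_down restores the original dimension
    return matmul(h_hidden, w_down)

# Toy sizes: d_model = 2, d_ff = 4 (real models are far larger)
x = [1.0, -0.5]
w_gate = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
w_up = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
w_down = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
mlp_out = gated_mlp(x, w_gate, w_up, w_down)
print(len(mlp_out))  # 2: back to d_model
```

<p>W_gate, W_up, and W_down each hold d_model × d_ff entries. This is why the MLP typically accounts for most of a layer's weights and most of its matmul work.</p>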
<p><strong>Step 9: Residual Connection (Add)</strong>
Same as before: the result from step 7 (X_out1) is added back onto the step-8 output.</p>
<p>Mathematical form:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mi mathvariant="bold">X</mi><mrow><mi>o</mi><mi>u</mi><mi>t</mi><mn>2</mn></mrow></msub><mo>=</mo><msub><mi mathvariant="bold">X</mi><mrow><mi>o</mi><mi>u</mi><mi>t</mi><mn>1</mn></mrow></msub><mo>+</mo><msub><mtext>MLP</mtext><mtext>out</mtext></msub></mrow><annotation encoding="application/x-tex">\mathbf{X}_{out2} = \mathbf{X}_{out1} + \text{MLP}_{\text{out}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8361em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathbf">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">o</span><span class="mord mathnormal mtight">u</span><span class="mord mathnormal mtight">t</span><span class="mord mtight">2</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8361em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathbf">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord 
mtight"><span class="mord mathnormal mtight">o</span><span class="mord mathnormal mtight">u</span><span class="mord mathnormal mtight">t</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord text"><span class="mord">MLP</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord text mtight"><span class="mord mtight">out</span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span></span>
<p>(That is one full layer. Steps 4 through 9 repeat roughly 40 times, and the representation gets more refined with each pass.)</p>
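<p>You can picture the repetition as a loop over layers. This is a rough sketch that assumes a pre-norm layout: each sub-block reads a normalized copy of the stream and adds its output back. Gemma's exact norm placement may differ. <code>attention</code> and <code>mlp</code> are hypothetical placeholders, not the real sub-blocks, and a real model gives every layer its own weights.</p>

```python
import math

def rms_norm(x, eps=1e-6):
    scale = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * scale for v in x]

def add(a, b):
    return [u + v for u, v in zip(a, b)]

# Hypothetical placeholders for the real attention and MLP sub-blocks
def attention(x):
    return [0.1 * v for v in x]

def mlp(x):
    return [0.01 * v for v in x]

def decoder_layer(x):
    x_out1 = add(x, attention(rms_norm(x)))      # residual around attention
    x_out2 = add(x_out1, mlp(rms_norm(x_out1)))  # residual around the MLP (step 9)
    return x_out2

def run_layers(x, num_layers=40):
    # num_layers follows the article's "roughly 40"
    for _ in range(num_layers):
        x = decoder_layer(x)
    return x

x = [1.0, 2.0, -1.0, 0.5]
y = run_layers(x)
```

<p>Each residual add keeps the incoming stream intact and only adds a correction on top. This is what lets the same vector flow through dozens of layers.</p>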
<hr>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="phase-3-대답-내놓기">Phase 3: 대답 내놓기<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/gemma-3n-e4b#phase-3-%EB%8C%80%EB%8B%B5-%EB%82%B4%EB%86%93%EA%B8%B0" class="hash-link" aria-label="Phase 3: 대답 내놓기에 대한 직접 링크" title="Phase 3: 대답 내놓기에 대한 직접 링크" translate="no">​</a></h2>
<p><strong>Step 10: Final RMSNorm</strong>
The vector coming out of the last of the ~40 layers gets normalized one final time.</p>
<p>Mathematical form:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mi mathvariant="bold">X</mi><mrow><mi>f</mi><mi>i</mi><mi>n</mi><mi>a</mi><mi>l</mi></mrow></msub><mo>=</mo><mtext>RMSNorm</mtext><mo stretchy="false">(</mo><msub><mi mathvariant="bold">X</mi><mrow><mi>l</mi><mi>a</mi><mi>y</mi><mi>e</mi><mi>r</mi><mn>40</mn></mrow></msub><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\mathbf{X}_{final} = \text{RMSNorm}(\mathbf{X}_{layer40})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9722em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord mathbf">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.1076em">f</span><span class="mord mathnormal mtight">ina</span><span class="mord mathnormal mtight" style="margin-right:0.0197em">l</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.0361em;vertical-align:-0.2861em"></span><span class="mord text"><span class="mord">RMSNorm</span></span><span class="mopen">(</span><span class="mord"><span class="mord mathbf">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span 
style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.0197em">l</span><span class="mord mathnormal mtight">a</span><span class="mord mathnormal mtight" style="margin-right:0.0359em">y</span><span class="mord mathnormal mtight" style="margin-right:0.0278em">er</span><span class="mord mtight">40</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span>
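<p>RMSNorm divides the vector by its root mean square, then multiplies each dimension by a learned scale γ. Unlike LayerNorm, it does not subtract the mean. Below is a minimal sketch of the textbook form, assuming ε = 1e-6 and a plain γ multiplier. Some Gemma implementations store the scale as 1 + γ instead.</p>

```python
import math

def rms_norm(x, gamma, eps=1e-6):
    """RMSNorm: x / sqrt(mean(x^2) + eps), then scale each dimension by gamma."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * g for v, g in zip(x, gamma)]

x_layer40 = [3.0, -4.0, 0.0, 5.0]   # toy output of the last layer
gamma = [1.0] * len(x_layer40)      # learned scale (identity here)
x_final = rms_norm(x_layer40, gamma)

# With gamma = 1 the output has a root mean square of about 1, whatever the input scale
out_rms = math.sqrt(sum(v * v for v in x_final) / len(x_final))
print(round(out_rms, 4))  # 1.0
```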
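<p><strong>Step 11: LM Head - "Looking it up in the dictionary" (Output Projection)</strong>
The final vector is compared (via matrix multiplication) against every entry of the roughly 250,000-token vocabulary. This gives each token a score (logit) for how well it fits as the next token. Here V is the vocabulary size.</p>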
<p>Mathematical form:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mrow><mi mathvariant="bold">L</mi><mi mathvariant="bold">o</mi><mi mathvariant="bold">g</mi><mi mathvariant="bold">i</mi><mi mathvariant="bold">t</mi><mi mathvariant="bold">s</mi></mrow><mo>=</mo><msub><mi mathvariant="bold">X</mi><mrow><mi>f</mi><mi>i</mi><mi>n</mi><mi>a</mi><mi>l</mi></mrow></msub><msubsup><mi mathvariant="bold">W</mi><mrow><mi>v</mi><mi>o</mi><mi>c</mi><mi>a</mi><mi>b</mi></mrow><mi>T</mi></msubsup><mo>∈</mo><msup><mi mathvariant="double-struck">R</mi><mi>V</mi></msup></mrow><annotation encoding="application/x-tex">\mathbf{Logits} = \mathbf{X}_{final} \mathbf{W}_{vocab}^T \in \mathbb{R}^{V}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em"></span><span class="mord"><span class="mord mathbf">Logits</span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.1774em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord mathbf">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.1076em">f</span><span class="mord mathnormal mtight">ina</span><span class="mord mathnormal mtight" style="margin-right:0.0197em">l</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span><span 
class="mord"><span class="mord mathbf" style="margin-right:0.016em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8913em"><span style="top:-2.453em;margin-left:-0.016em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">v</span><span class="mord mathnormal mtight">oc</span><span class="mord mathnormal mtight">ab</span></span></span></span><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.1389em">T</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.247em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8913em"></span><span class="mord"><span class="mord mathbb">R</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8913em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.2222em">V</span></span></span></span></span></span></span></span></span></span></span></span></span>
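<p>Here is a toy sketch of the LM Head. Each row of W_vocab stands for one token, so each logit is just the dot product of X_final with that row. The 4-token vocabulary and 3-dimensional vectors are invented for illustration. With a vocabulary of about 250,000 tokens, the same computation means 250,000 dot products per generated token.</p>

```python
def lm_head(x_final, w_vocab):
    """Logits = X_final · W_vocab^T: one dot product per vocabulary row."""
    return [sum(a * b for a, b in zip(x_final, row)) for row in w_vocab]

vocab = ["반갑습니다", "네", "누구세요", "<eos>"]  # toy vocabulary, V = 4
w_vocab = [                                      # V x d matrix with d = 3
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.0],
    [0.0, 0.3, 0.7],
    [0.1, 0.1, 0.1],
]
x_final = [1.0, 0.2, -0.1]
logits = lm_head(x_final, w_vocab)
best = max(range(len(logits)), key=lambda i: logits[i])
print(vocab[best])  # 반갑습니다
```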
<p><strong>Step 12: Softmax and Sampling - "Rolling the dice to pick a word"</strong>
The scores are turned into probabilities (0 to 100%):</p>
<ul>
<li class="">"반갑습니다" -&gt; 85%</li>
<li class="">"네" -&gt; 10%</li>
<li class="">"누구세요" -&gt; 4%</li>
</ul>
<p>Sampling according to these probabilities, the token "반" (the first syllable of "반갑습니다") gets picked.</p>
<p>Mathematical form:
Softmax with temperature <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>T</mi></mrow><annotation encoding="application/x-tex">T</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.1389em">T</span></span></span></span> applied (a lower T sharpens the distribution, a higher T flattens it):</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>P</mi><mo stretchy="false">(</mo><msub><mi>x</mi><mi>i</mi></msub><mo stretchy="false">)</mo><mo>=</mo><mfrac><mrow><mi>exp</mi><mo>⁡</mo><mo stretchy="false">(</mo><msub><mtext>logit</mtext><mi>i</mi></msub><mi mathvariant="normal">/</mi><mi>T</mi><mo stretchy="false">)</mo></mrow><mrow><munderover><mo>∑</mo><mrow><mi>j</mi><mo>=</mo><mn>1</mn></mrow><mi>V</mi></munderover><mi>exp</mi><mo>⁡</mo><mo stretchy="false">(</mo><msub><mtext>logit</mtext><mi>j</mi></msub><mi mathvariant="normal">/</mi><mi>T</mi><mo stretchy="false">)</mo></mrow></mfrac></mrow><annotation encoding="application/x-tex">P(x_i) = \frac{\exp(\text{logit}_i / T)}{\sum_{j=1}^{V} \exp(\text{logit}_j / T)}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.1389em">P</span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:2.734em;vertical-align:-1.307em"></span><span class="mord"><span class="mopen nulldelimiter"></span><span 
class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.427em"><span style="top:-2.1288em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:0em">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9812em"><span style="top:-2.4003em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.0572em">j</span><span class="mrel mtight">=</span><span class="mord mtight">1</span></span></span></span><span style="top:-3.2029em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.2222em">V</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.4358em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mop">exp</span><span class="mopen">(</span><span class="mord"><span class="mord text"><span class="mord">logit</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2175em"><span style="top:-2.4559em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0572em">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.3802em"><span></span></span></span></span></span></span><span class="mord">/</span><span class="mord mathnormal" style="margin-right:0.1389em">T</span><span 
class="mclose">)</span></span></span><span style="top:-3.23em"><span class="pstrut" style="height:3em"></span><span class="frac-line" style="border-bottom-width:0.04em"></span></span><span style="top:-3.677em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mop">exp</span><span class="mopen">(</span><span class="mord"><span class="mord text"><span class="mord">logit</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2175em"><span style="top:-2.4559em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2441em"><span></span></span></span></span></span></span><span class="mord">/</span><span class="mord mathnormal" style="margin-right:0.1389em">T</span><span class="mclose">)</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.307em"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span>
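<p>Here is a minimal sketch of the formula above plus the sampling step, using only the Python standard library. The logits are made up. At T = 1 they happen to give roughly 85%, 10%, and 4%, close to the example list. Subtracting the maximum before <code>exp</code> is a standard overflow guard and does not change the result.</p>

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """P(x_i) = exp(logit_i / T) / sum_j exp(logit_j / T)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                       # subtract the max: avoids overflow, same result
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(probs, rng):
    """Pick one index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [4.0, 1.9, 1.0]                  # toy scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])

rng = random.Random(0)                    # fixed seed so the run is reproducible
token_id = sample_token(softmax_with_temperature(logits), rng)
```

<p>The loop shows the effect of T. At 0.5, the top token takes almost all of the probability. At 2.0, the distribution flattens out.</p>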
<hr>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="phase-4-무한-반복-autoregressive">Phase 4: The Endless Loop (Autoregressive)<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/gemma-3n-e4b#phase-4-%EB%AC%B4%ED%95%9C-%EB%B0%98%EB%B3%B5-autoregressive" class="hash-link" aria-label="Direct link to Phase 4: The Endless Loop (Autoregressive)" title="Direct link to Phase 4: The Endless Loop (Autoregressive)" translate="no">​</a></h2>
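<p><strong>Step 13: Feeding the output back in (where the KV Cache pays off)</strong>
The answer isn't done yet. The model takes the "반" it just produced and feeds it back in as input. (Input: "안녕하세요" ("hello") + "반")</p>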
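<p>Here's the key point: the K and V for "안녕하세요" were already stored in the KV Cache back in step 6. So the model only computes Q, K, and V for the newly added "반". It appends that K and V to the cache and compares the new Q against all cached keys, which lets it produce the next syllable <strong>"갑"</strong> very quickly.</p>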
<p>Mathematical form:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>P</mi><mo stretchy="false">(</mo><mtext>"갑"</mtext><mi mathvariant="normal">∣</mi><mtext>"안녕",&nbsp;"하세요",&nbsp;"반"</mtext><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">P(\text{"갑"} | \text{"안녕", "하세요", "반"})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.1389em">P</span><span class="mopen">(</span><span class="mord text"><span class="mord">"</span><span class="mord hangul_fallback">갑</span><span class="mord">"</span></span><span class="mord">∣</span><span class="mord text"><span class="mord">"</span><span class="mord hangul_fallback">안녕</span><span class="mord">",&nbsp;"</span><span class="mord hangul_fallback">하세요</span><span class="mord">",&nbsp;"</span><span class="mord hangul_fallback">반</span><span class="mord">"</span></span><span class="mclose">)</span></span></span></span></span>
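<p>As a result, with N tokens already in the sequence, the per-step cost of the projections and the MLP drops from <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>O</mi><mo stretchy="false">(</mo><mi>N</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">O(N)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.0278em">O</span><span class="mopen">(</span><span class="mord mathnormal" style="margin-right:0.109em">N</span><span class="mclose">)</span></span></span></span> (reprocessing all N tokens) to <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>O</mi><mo stretchy="false">(</mo><mn>1</mn><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">O(1)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.0278em">O</span><span class="mopen">(</span><span class="mord">1</span><span class="mclose">)</span></span></span></span> (only the new token). Attention still grows as <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>O</mi><mo stretchy="false">(</mo><mi>N</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">O(N)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.0278em">O</span><span class="mopen">(</span><span class="mord mathnormal" style="margin-right:0.109em">N</span><span class="mclose">)</span></span></span></span>, because the new query has to be compared with all N cached keys.</p>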
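<p>Here is a single-head, single-layer sketch of a cached decoding step. A real model keeps one cache per layer and per KV head. The <code>KVCache</code> class and the toy vectors are hypothetical, and the vectors stand in for tokens already multiplied by W_Q, W_K, and W_V. Each step appends one K/V pair and compares the new query with every cached key. As a result, the final output matches recomputing attention for the last token from scratch.</p>

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]

class KVCache:
    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, q, k, v):
        # Only the new token's K and V are added; earlier ones are reused, not recomputed
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)  # one score per cached key: O(N)

# Toy per-token (q, k, v) vectors, as if already projected
tokens = [
    ([1.0, 0.0], [1.0, 0.0], [1.0, 2.0]),
    ([0.0, 1.0], [0.5, 0.5], [0.0, 1.0]),
    ([1.0, 1.0], [0.0, 1.0], [3.0, -1.0]),
]
cache = KVCache()
outputs = [cache.step(q, k, v) for q, k, v in tokens]

# Recomputing attention for the last token from scratch gives the same result
qs, ks, vs = zip(*tokens)
full_last = attend(qs[-1], list(ks), list(vs))
```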
<p><strong>Step 14: Repeat until done</strong>
This process keeps repeating:</p>
<ul>
<li class="">"안녕하세요 반 갑" -&gt; "습"</li>
<li class="">"안녕하세요 반갑 습" -&gt; "니다"</li>
<li class="">"안녕하세요 반갑습니다" -&gt; "."</li>
<li class="">"안녕하세요 반갑습니다." -&gt; <code>&lt;eos&gt;</code> (End of Sequence, 대화 끝 토큰)</li>
</ul>
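<p>The whole loop fits in a few lines. In this sketch, <code>next_token</code> is a hypothetical callback standing in for steps 10 to 12. <code>max_new_tokens</code> is an assumed safety cap so generation also stops if <code>&lt;eos&gt;</code> never appears. The scripted lookup table simply replays the example above.</p>

```python
def generate(prompt_ids, next_token, eos_id, max_new_tokens=32):
    """Autoregressive loop: append each new token, stop at <eos> or the length cap."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        tok = next_token(ids)       # hypothetical stand-in for a full forward pass
        if tok == eos_id:
            break                   # <eos> ends generation and is not emitted
        ids.append(tok)
    return ids[len(prompt_ids):]

# Hypothetical scripted "model" that replays the article's example
EOS = "<eos>"
script = {"안녕하세요": "반", "반": "갑", "갑": "습", "습": "니다", "니다": ".", ".": EOS}
reply = generate(["안녕하세요"], lambda ids: script[ids[-1]], EOS)
print("".join(reply))  # 반갑습니다.
```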
<p>The moment the <code>&lt;eos&gt;</code> token is sampled, the model stops generating. That is the entire generation process of a chatbot.</p>]]></content:encoded>
            <category>논문</category>
            <category>gemma</category>
            <category>llm</category>
            <category>딥러닝</category>
        </item>
        <item>
            <title><![CDATA[[Paper] GPT-1 Key Takeaways]]></title>
            <link>https://hwkim-dev.github.io/hwkim-dev/ko/blog/gpt-1</link>
            <guid>https://hwkim-dev.github.io/hwkim-dev/ko/blog/gpt-1</guid>
            <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[This note summarizes the architecture and training procedure of the GPT-1 paper, combining mathematical definitions with intuitive explanations.]]></description>
            <content:encoded><![CDATA[<p>This note summarizes the architecture and training procedure of the GPT-1 paper, combining mathematical definitions with intuitive explanations.</p>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="1-언어-모델의-핵심-기초-개념">1. Core Concepts of Language Models<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/gpt-1#1-%EC%96%B8%EC%96%B4-%EB%AA%A8%EB%8D%B8%EC%9D%98-%ED%95%B5%EC%8B%AC-%EA%B8%B0%EC%B4%88-%EA%B0%9C%EB%85%90" class="hash-link" aria-label="Direct link to 1. Core Concepts of Language Models" title="Direct link to 1. Core Concepts of Language Models" translate="no">​</a></h2>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="1-context-window">1) Context Window<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/gpt-1#1-context-window" class="hash-link" aria-label="Direct link to 1) Context Window" title="Direct link to 1) Context Window" translate="no">​</a></h3>
<ul>
<li class="">
<p><strong>Definition</strong>: The maximum number of tokens the model can process at once, i.e. the sequence length <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>k</mi></mrow><annotation encoding="application/x-tex">k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em"></span><span class="mord mathnormal" style="margin-right:0.0315em">k</span></span></span></span>. The cost of Transformer self-attention grows as <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>O</mi><mo stretchy="false">(</mo><msup><mi>k</mi><mn>2</mn></msup><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">O(k^2)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0641em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.0278em">O</span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0315em">k</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span> in this length.</p>
</li>
<li class="">
<p><strong>Intuition</strong>:</p>
<ul>
<li class="">
<p><strong>Advantage</strong>: The larger the context window (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>k</mi></mrow><annotation encoding="application/x-tex">k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em"></span><span class="mord mathnormal" style="margin-right:0.0315em">k</span></span></span></span>), the further back the model can see. More clues let it grasp the context more precisely, so its next-word predictions become more accurate.</p>
</li>
<li class="">
<p><strong>Disadvantage</strong>: A Transformer has to compute the relationship (attention) between every pair of tokens. So if the context window becomes 10× longer, the attention computation grows by the square of that factor, to 100×. In other words, increasing <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>k</mi></mrow><annotation encoding="application/x-tex">k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em"></span><span class="mord mathnormal" style="margin-right:0.0315em">k</span></span></span></span> runs straight into practical limits on hardware memory and training cost.</p>
</li>
</ul>
</li>
</ul>
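<p>Counting the query-key scores of one attention head makes the quadratic growth concrete. This is a minimal sketch: 512 is GPT-1's context length, and 5120 is simply 10× that. With a causal mask only about half of the pairs are needed, but the ratio between the two lengths stays close to 100×.</p>

```python
def attention_score_count(k, causal=False):
    """Number of query-key scores one attention head computes for k tokens."""
    if causal:
        return k * (k + 1) // 2   # each token scores only itself and earlier tokens
    return k * k

for k in (512, 5120):
    print(k, attention_score_count(k), attention_score_count(k, causal=True))

print(attention_score_count(5120) // attention_score_count(512))  # 100
```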
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="2-maximize-likelihood-최대-우도-추정">2) Maximize Likelihood (Maximum Likelihood Estimation)<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/gpt-1#2-maximize-likelihood-%EC%B5%9C%EB%8C%80-%EC%9A%B0%EB%8F%84-%EC%B6%94%EC%A0%95" class="hash-link" aria-label="Direct link to 2) Maximize Likelihood (Maximum Likelihood Estimation)" title="Direct link to 2) Maximize Likelihood (Maximum Likelihood Estimation)" translate="no">​</a></h3>
<ul>
<li class="">
<p><strong>Definition</strong>: A mathematical objective function. The model's internal parameters <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">Θ</mi></mrow><annotation encoding="application/x-tex">\Theta</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord">Θ</span></span></span></span> are optimized to maximize the conditional probability (likelihood) of the actual next token that follows a given context.</p>
</li>
<li class="">
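<p><strong>Intuition</strong>: Put simply, this is the most fundamental 'goal' of language model training. As the model reads huge amounts of text, it keeps adjusting its internal circuitry (parameters) so that the token it predicts matches the token actually written in the text.</p>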
</li>
</ul>
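<p>For a simple count-based bigram model, maximizing likelihood has a closed-form answer: use relative frequencies. This toy sketch (corpus and helper names are hypothetical) checks that the count-based estimate scores a higher log-likelihood than a uniform guess. GPT-1 instead maximizes the same kind of sum by gradient-based training of a neural network, since no closed form exists there.</p>

```python
import math
from collections import Counter

def bigram_mle(tokens):
    """Relative-frequency bigram probabilities: the MLE for this model family."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    prev_counts = Counter(tokens[:-1])
    return {(a, b): c / prev_counts[a] for (a, b), c in pair_counts.items()}

def log_likelihood(tokens, prob):
    """Sum of log P(next | previous) over the sequence."""
    return sum(math.log(prob(a, b)) for a, b in zip(tokens, tokens[1:]))

corpus = "the cat sat on the mat the cat ran".split()
vocab = sorted(set(corpus))
table = bigram_mle(corpus)

ll_mle = log_likelihood(corpus, lambda a, b: table[(a, b)])
ll_uniform = log_likelihood(corpus, lambda a, b: 1 / len(vocab))
print(round(ll_mle, 3), round(ll_uniform, 3))
```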
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="2-gpt의-뼈대-트랜스포머-디코더-transformer-decoder">2. The Backbone of GPT: The Transformer Decoder<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/gpt-1#2-gpt%EC%9D%98-%EB%BC%88%EB%8C%80-%ED%8A%B8%EB%9E%9C%EC%8A%A4%ED%8F%AC%EB%A8%B8-%EB%94%94%EC%BD%94%EB%8D%94-transformer-decoder" class="hash-link" aria-label="Direct link to 2. The Backbone of GPT: The Transformer Decoder" title="Direct link to 2. The Backbone of GPT: The Transformer Decoder" translate="no">​</a></h2>
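<p>The original Transformer published by Google was built for machine translation and consisted of an encoder (which reads the input) and a decoder (which generates the output). GPT drops the encoder entirely and uses <strong>only the decoder, stacked 12 layers deep</strong>. With no encoder, the encoder-decoder cross-attention sublayer is dropped as well.</p>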
<ul>
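<li class=""><strong>Why only the decoder?</strong>
Because GPT is fundamentally about <strong>next-token prediction (auto-regressive)</strong>. The decoder contains a key mechanism called <strong>Masked Self-Attention</strong>. While processing the current token, it masks out all future tokens so the model cannot "cheat" by looking ahead. GPT's approach is to infer what comes next from past and present context only, and this structure fits that approach perfectly.</li>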
</ul>
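<p>Here is a minimal sketch of how the mask works on a k × k score matrix (the scores are invented). Entries above the diagonal are set to −∞ before the softmax, so each position puts zero weight on future tokens.</p>

```python
import math

def causal_attention_weights(scores):
    """Apply a causal mask to a k x k score matrix, then softmax each row."""
    k = len(scores)
    weights = []
    for i in range(k):
        # Position i may look at positions 0..i only; future positions get -inf
        masked = [scores[i][j] if j <= i else -math.inf for j in range(k)]
        m = max(masked)                              # position 0 is never masked, so m is finite
        exps = [math.exp(s - m) for s in masked]     # exp(-inf) == 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

scores = [[0.5, 1.0, 2.0],
          [0.3, 0.2, 0.9],
          [1.0, 1.0, 1.0]]
w = causal_attention_weights(scores)
```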
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="3-gpt-1의-2단계-학습-파이프라인">3. GPT-1's Two-Stage Training Pipeline<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/gpt-1#3-gpt-1%EC%9D%98-2%EB%8B%A8%EA%B3%84-%ED%95%99%EC%8A%B5-%ED%8C%8C%EC%9D%B4%ED%94%84%EB%9D%BC%EC%9D%B8" class="hash-link" aria-label="Direct link to 3. GPT-1's Two-Stage Training Pipeline" title="Direct link to 3. GPT-1's Two-Stage Training Pipeline" translate="no">​</a></h2>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="1단계-unsupervised-pre-training-비지도-사전-학습">Stage 1: Unsupervised Pre-training<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/gpt-1#1%EB%8B%A8%EA%B3%84-unsupervised-pre-training-%EB%B9%84%EC%A7%80%EB%8F%84-%EC%82%AC%EC%A0%84-%ED%95%99%EC%8A%B5" class="hash-link" aria-label="Direct link to Stage 1: Unsupervised Pre-training" title="Direct link to Stage 1: Unsupervised Pre-training" translate="no">​</a></h3>
<p>In this stage, the model learns the general patterns of language on its own from a large body of unlabeled text.</p>
<ul>
<li class=""><strong>Definition (Objective Function)</strong>:
Given a large unlabeled corpus <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="script">U</mi><mo>=</mo><mo stretchy="false">{</mo><msub><mi>u</mi><mn>1</mn></msub><mo separator="true">,</mo><mo>…</mo><mo separator="true">,</mo><msub><mi>u</mi><mi>n</mi></msub><mo stretchy="false">}</mo></mrow><annotation encoding="application/x-tex">\mathcal{U} = \{u_1, \dots, u_n\}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathcal" style="margin-right:0.0993em">U</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">{</span><span class="mord"><span class="mord mathnormal">u</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="minner">…</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">u</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" 
style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">n</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mclose">}</span></span></span></span>, the model is trained to maximize the following log-likelihood. Here <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>k</mi></mrow><annotation encoding="application/x-tex">k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em"></span><span class="mord mathnormal" style="margin-right:0.0315em">k</span></span></span></span> is the context window size and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">Θ</mi></mrow><annotation encoding="application/x-tex">\Theta</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord">Θ</span></span></span></span> the model parameters.</li>
</ul>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mi>L</mi><mn>1</mn></msub><mo stretchy="false">(</mo><mi mathvariant="script">U</mi><mo stretchy="false">)</mo><mo>=</mo><munder><mo>∑</mo><mi>i</mi></munder><mi>log</mi><mo>⁡</mo><mi>P</mi><mo stretchy="false">(</mo><msub><mi>u</mi><mi>i</mi></msub><mi mathvariant="normal">∣</mi><msub><mi>u</mi><mrow><mi>i</mi><mo>−</mo><mi>k</mi></mrow></msub><mo separator="true">,</mo><mo>…</mo><mo separator="true">,</mo><msub><mi>u</mi><mrow><mi>i</mi><mo>−</mo><mn>1</mn></mrow></msub><mo separator="true">;</mo><mi mathvariant="normal">Θ</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">L_1(\mathcal{U}) = \sum_i \log P(u_i | u_{i-k}, \dots, u_{i-1}; \Theta)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathcal" style="margin-right:0.0993em">U</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:2.3277em;vertical-align:-1.2777em"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span 
class="vlist-r"><span class="vlist" style="height:1.05em"><span style="top:-1.8723em;margin-left:0em"><span class="pstrut" style="height:3.05em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span><span style="top:-3.05em"><span class="pstrut" style="height:3.05em"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.2777em"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mop">lo<span style="margin-right:0.0139em">g</span></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.1389em">P</span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">u</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mord">∣</span><span class="mord"><span class="mord mathnormal">u</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mbin mtight">−</span><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span></span><span class="vlist-s">​</span></span><span 
class="vlist-r"><span class="vlist" style="height:0.2083em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="minner">…</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">u</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2083em"><span></span></span></span></span></span></span><span class="mpunct">;</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord">Θ</span><span class="mclose">)</span></span></span></span></span>
<blockquote>
<p>Show the model (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">Θ</mi></mrow><annotation encoding="application/x-tex">Θ</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord">Θ</span></span></span></span>) the previous words (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>u</mi><mrow><mi>i</mi><mo>−</mo><mi>k</mi></mrow></msub><mo separator="true">,</mo><mo>…</mo><mo separator="true">,</mo><msub><mi>u</mi><mrow><mi>i</mi><mo>−</mo><mn>1</mn></mrow></msub></mrow><annotation encoding="application/x-tex">u_{i-k} ,…,u_{i−1}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6389em;vertical-align:-0.2083em"></span><span class="mord"><span class="mord mathnormal">u</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mbin mtight">−</span><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2083em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="minner">…</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">u</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2083em"><span></span></span></span></span></span></span></span></span></span>), and compute the probability <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>P</mi><mo stretchy="false">(</mo><mo>⋯</mo><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">P(⋯)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.1389em">P</span><span class="mopen">(</span><span class="minner">⋯</span><span class="mclose">)</span></span></span></span> that it predicts the actual next word (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>u</mi><mi>i</mi></msub></mrow><annotation encoding="application/x-tex">u_i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">u</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span>). Taking the log of that probability at every position and summing over all the text data (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mo>∑</mo><mi>i</mi></msub></mrow><annotation encoding="application/x-tex">∑_i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0497em;vertical-align:-0.2997em"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:0em">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.162em"><span style="top:-2.4003em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2997em"><span></span></span></span></span></span></span></span></span></span>) gives <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>L</mi><mn>1</mn></msub><mo stretchy="false">(</mo><mi mathvariant="script">U</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">L_1(\mathcal{U})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathcal" style="margin-right:0.0993em">U</span><span class="mclose">)</span></span></span></span>.</p>
</blockquote>
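<p>To make the formula concrete, here is a minimal Python sketch of computing <code>L_1(U)</code> over a token list. The function <code>prob</code> is a hypothetical stand-in for the Transformer decoder with parameters Θ (here only a uniform placeholder), and the first few tokens are assumed to simply see a shorter context than k.</p>
<pre><code class="language-python">import math


def log_likelihood(tokens, k, prob):
    """Compute L1(U) = sum_i log P(u_i | u_{i-k}, ..., u_{i-1}; Theta).

    prob(target, context) is a stand-in for the Transformer decoder with
    parameters Theta: it returns the probability of target given context.
    """
    total = 0.0
    for i, target in enumerate(tokens):
        # Only the k most recent tokens are visible (the context window).
        context = tuple(tokens[max(0, i - k):i])
        total += math.log(prob(target, context))
    return total


def uniform_model(vocab):
    # Hypothetical placeholder "model": every token in the vocabulary is
    # equally likely whatever the context. A real model would use Theta.
    p = 1.0 / len(vocab)
    return lambda target, context: p


corpus = ["the", "cat", "sat", "on", "the", "mat"]
vocab = sorted(set(corpus))
print(log_likelihood(corpus, k=3, prob=uniform_model(vocab)))
</code></pre>
<p>Training adjusts Θ so that this number grows (gets closer to 0). A uniform model that ignores the context is the baseline any real model should beat.</p>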
<ul>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>L</mi><mn>1</mn></msub><mo stretchy="false">(</mo><mi mathvariant="script">U</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">L_1(\mathcal{U})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathcal" style="margin-right:0.0993em">U</span><span class="mclose">)</span></span></span></span>  :</p>
<ul>
<li class="">The objective function.<br>
Here <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="script">U</mi></mrow><annotation encoding="application/x-tex">\mathcal{U}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathcal" style="margin-right:0.0993em">U</span></span></span></span> is the huge unlabeled text corpus used as training data.<br>
In other words, it is a score for "how well the model understands (predicts) the data <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="script">U</mi></mrow><annotation encoding="application/x-tex">\mathcal{U}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathcal" style="margin-right:0.0993em">U</span></span></span></span>."</li>
</ul>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mo>∑</mo><mi>i</mi></msub></mrow><annotation encoding="application/x-tex">∑_i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0497em;vertical-align:-0.2997em"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:0em">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.162em"><span style="top:-2.4003em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2997em"><span></span></span></span></span></span></span></span></span></span> :</p>
<ul>
<li class="">Add up the log-probability terms below over every position <em>i</em> of the words (tokens) in the sentence (data).</li>
</ul>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>log</mi><mo>⁡</mo></mrow><annotation encoding="application/x-tex">\log</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em"></span><span class="mop">lo<span style="margin-right:0.0139em">g</span></span></span></span></span>:</p>
<ul>
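<li class="">The logarithm. Probabilities are numbers between 0 and 1, so multiplying the probabilities of many words one after another drives the product toward 0 until it can no longer be represented (underflow). Because log(ab) = log a + log b, taking the log turns the product into a sum (∑), which is far easier for a computer to handle.</li>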
</ul>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>P</mi><mo stretchy="false">(</mo><mo>⋯</mo><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">P(⋯)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.1389em">P</span><span class="mopen">(</span><span class="minner">⋯</span><span class="mclose">)</span></span></span></span>:</p>
<ul>
<li class="">Probability: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>P</mi></mrow><annotation encoding="application/x-tex">P</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.1389em">P</span></span></span></span> is the conditional probability computed by a Transformer decoder with parameters <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">Θ</mi></mrow><annotation encoding="application/x-tex">\Theta</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord">Θ</span></span></span></span>.</li>
</ul>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>u</mi><mi>i</mi></msub></mrow><annotation encoding="application/x-tex">u_i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">u</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> :</p>
<ul>
<li class="">The <strong>current (next) word</strong> the model has to predict.</li>
</ul>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>u</mi><mrow><mi>i</mi><mo>−</mo><mi>k</mi></mrow></msub><mo separator="true">,</mo><mo>…</mo><mo separator="true">,</mo><msub><mi>u</mi><mrow><mi>i</mi><mo>−</mo><mn>1</mn></mrow></msub></mrow><annotation encoding="application/x-tex">u_{i-k} ,…,u_{i−1}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6389em;vertical-align:-0.2083em"></span><span class="mord"><span class="mord mathnormal">u</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mbin mtight">−</span><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2083em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="minner">…</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">u</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mbin mtight">−</span><span class="mord 
mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2083em"><span></span></span></span></span></span></span></span></span></span>:</p>
<ul>
<li class="">The words that appear before <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>u</mi><mi>i</mi></msub></mrow><annotation encoding="application/x-tex">u_i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">u</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span>. <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>k</mi></mrow><annotation encoding="application/x-tex">k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em"></span><span class="mord mathnormal" style="margin-right:0.0315em">k</span></span></span></span> is the length of the context the model can see at once (the context window size). In short, this is <strong>the context up to this point</strong>.</li>
</ul>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">Θ</mi></mrow><annotation encoding="application/x-tex">Θ</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord">Θ</span></span></span></span> (theta):</p>
<ul>
<li class="">The <strong>parameters (weights) of the model</strong> we want to train.</li>
</ul>
</li>
</ul>
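<p>The underflow problem described above for the log is easy to reproduce. In this small Python sketch (the probability values are invented for illustration), multiplying 400 probabilities of 0.01 collapses to exactly 0 in 64-bit floating point, while the sum of their logs remains a perfectly usable number.</p>
<pre><code class="language-python">import math

# 400 tokens, each predicted with probability 0.01 (made-up numbers).
probs = [0.01] * 400

product = 1.0
for p in probs:
    product *= p  # the true value 1e-800 is far below the smallest float64

log_sum = sum(math.log(p) for p in probs)  # stays an ordinary finite number

print(product)  # 0.0: the product has underflowed
print(log_sum)  # about -1842.07

# log(a * b) == log(a) + log(b): the identity that turns the product into a sum.
a, b = 0.5, 0.25
print(math.isclose(math.log(a * b), math.log(a) + math.log(b)))  # True
</code></pre>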
<hr>
<ul>
<li class="">
<p><strong>Intuitive explanation</strong>:</p>
<ul>
<li class="">
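<p><strong>Method</strong>: The model reads the huge amount of text readily available on the internet (news, books, wikis, and so on) in order and is asked to fill in the blank, i.e. the next word.<br>
(* <em>In practice, GPT-1's main pretraining corpus was BooksCorpus, a collection of about 7,000 unpublished books. Because books contain long stretches of contiguous text, this reportedly helped a great deal in learning long-range dependencies.</em>)</p>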
</li>
<li class="">
<p><strong>Why it is unsupervised</strong>: No one has to attach answer labels by hand. In the sentence "The capital of South Korea is [Seoul]," the sentence itself is both the question and the answer.</p>
</li>
<li class="">
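<p><strong>Result</strong>: Through this huge yet simple "guess the next word" game, the model learns grammar, common-sense knowledge about the world, and the logic of context entirely on its own.</p>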
</li>
</ul>
</li>
</ul>
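<p>The idea that the sentence is both the question and the answer fits in a few lines of code. This Python sketch assumes a simple whitespace split instead of GPT-1's real tokenizer, and the function name is hypothetical; it slides over one raw sentence and produces (context, next word) pairs without any hand-made labels.</p>
<pre><code class="language-python">def next_token_examples(tokens, k):
    """Turn one unlabeled token sequence into (context, target) training pairs.

    The target is just the token that actually comes next, so the text
    labels itself and no human annotation is needed.
    """
    examples = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - k):i]  # at most k preceding tokens
        examples.append((context, tokens[i]))
    return examples


# Hypothetical word-level tokenization, for illustration only.
sentence = "The capital of South Korea is Seoul".split()
for context, target in next_token_examples(sentence, k=4):
    print(" ".join(context), "|", target)
</code></pre>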
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="2단계-supervised-fine-tuning-지도-미세-조정">Stage 2: Supervised Fine-tuning<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/gpt-1#2%EB%8B%A8%EA%B3%84-supervised-fine-tuning-%EC%A7%80%EB%8F%84-%EB%AF%B8%EC%84%B8-%EC%A1%B0%EC%A0%95" class="hash-link" aria-label="2단계: Supervised Fine-tuning (지도 미세 조정)에 대한 직접 링크" title="2단계: Supervised Fine-tuning (지도 미세 조정)에 대한 직접 링크" translate="no">​</a></h3>
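<p>Once pretraining is complete, this stage tunes the model for the specific task we actually want to solve (sentiment analysis, multiple-choice questions, and so on). Because it uses data with correct answers (labels), it is supervised learning.</p>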
<ul>
<li class=""><strong>Definition (Objective Function)</strong>:
Given an input sequence <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mi>x</mi><mn>1</mn></msup><mo separator="true">,</mo><mo>…</mo><mo separator="true">,</mo><msup><mi>x</mi><mi>m</mi></msup></mrow><annotation encoding="application/x-tex">x^1, \dots, x^m</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0085em;vertical-align:-0.1944em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="minner">…</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6644em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">m</span></span></span></span></span></span></span></span></span></span></span> and a label <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>y</mi></mrow><annotation encoding="application/x-tex">y</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0359em">y</span></span></span></span> from a labeled dataset <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="script">C</mi></mrow><annotation encoding="application/x-tex">\mathcal{C}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathcal" style="margin-right:0.0583em">C</span></span></span></span>, the prediction probability and the objective function are as follows.</li>
</ul>
<hr>
<h3>Label (ground truth) prediction probability</h3>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>P</mi><mo stretchy="false">(</mo><mi>y</mi><mi mathvariant="normal">∣</mi><msup><mi>x</mi><mn>1</mn></msup><mo separator="true">,</mo><mo>…</mo><mo separator="true">,</mo><msup><mi>x</mi><mi>m</mi></msup><mo stretchy="false">)</mo><mo>=</mo><mtext>softmax</mtext><mo stretchy="false">(</mo><msubsup><mi>h</mi><mi>l</mi><mi>m</mi></msubsup><msub><mi>W</mi><mi>y</mi></msub><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">P(y | x^1, \dots, x^m) = \text{softmax}(h_l^m W_y)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.1141em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.1389em">P</span><span class="mopen">(</span><span class="mord mathnormal" style="margin-right:0.0359em">y</span><span class="mord">∣</span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="minner">…</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7144em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span 
class="mord mathnormal mtight">m</span></span></span></span></span></span></span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.0361em;vertical-align:-0.2861em"></span><span class="mord text"><span class="mord">softmax</span></span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.7144em"><span style="top:-2.453em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0197em">l</span></span></span><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">m</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.247em"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">y</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span>
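<p>As a rough sketch of what this equation computes, the Python snippet below multiplies a last-token hidden state by a weight matrix and applies softmax, using plain lists. The dimensions and numbers are invented for illustration; in GPT-1 the hidden state would come from the pretrained Transformer and <code>W_y</code> would be learned during fine-tuning.</p>
<pre><code class="language-python">import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(h_last, W_y):
    """P(y | x^1, ..., x^m) = softmax(h_l^m W_y).

    h_last: final-layer hidden state at the last token m (length d).
    W_y:    d x num_labels weights of the newly added linear layer.
    """
    d, num_labels = len(W_y), len(W_y[0])
    # Row vector times matrix: one score (logit) per label.
    scores = [sum(h_last[j] * W_y[j][c] for j in range(d)) for c in range(num_labels)]
    return softmax(scores)


# Made-up numbers: d = 3 hidden units, 2 labels (Positive, Negative).
h_last = [0.5, -1.0, 2.0]
W_y = [[1.0, -1.0],
       [0.0, 0.5],
       [0.5, 0.0]]
print(classify(h_last, W_y))  # roughly [0.924, 0.076]
</code></pre>
<p>Only <code>W_y</code> is new here; everything that produces <code>h_last</code> is reused from pretraining, which is why the task-specific head can stay this small.</p>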
<ul>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mi>x</mi><mn>1</mn></msup><mo separator="true">,</mo><mo>…</mo><mo separator="true">,</mo><msup><mi>x</mi><mi>m</mi></msup></mrow><annotation encoding="application/x-tex">x^1, \dots, x^m</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0085em;vertical-align:-0.1944em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="minner">…</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6644em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">m</span></span></span></span></span></span></span></span></span></span></span> :</p>
<ul>
<li class="">The input sentence (data), made up of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>m</mi></mrow><annotation encoding="application/x-tex">m</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em"></span><span class="mord mathnormal">m</span></span></span></span> words (tokens). (e.g. "This movie is really fun")</li>
</ul>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>y</mi></mrow><annotation encoding="application/x-tex">y</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0359em">y</span></span></span></span>:</p>
<ul>
<li class="">The ground-truth label we need to predict (e.g. Positive or Negative).</li>
</ul>
</li>
<li class="">
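<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msubsup><mi>h</mi><mi>l</mi><mi>m</mi></msubsup></mrow><annotation encoding="application/x-tex">h_l^m</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9475em;vertical-align:-0.2831em"></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.6644em"><span style="top:-2.4169em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0197em">l</span></span></span><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">m</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2831em"><span></span></span></span></span></span></span></span></span></span> :</p>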
<ul>
<li class="">The <strong>final output (hidden state)</strong> that the very last layer (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>l</mi></mrow><annotation encoding="application/x-tex">l</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em"></span><span class="mord mathnormal" style="margin-right:0.0197em">l</span></span></span></span>) of the pretrained Transformer produces after processing the very last token (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>m</mi></mrow><annotation encoding="application/x-tex">m</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em"></span><span class="mord mathnormal">m</span></span></span></span>). You can think of it as the <strong>core meaning of the sentence</strong>, summarized by the model after reading the entire sentence from beginning to end.</li>
</ul>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mi>y</mi></msub></mrow><annotation encoding="application/x-tex">W_y</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9694em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">y</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span></span></span></span>​ :</p>
<ul>
<li class="">특정 임무(분류)를 수행하기 위해 새로 추가한 선형 계층(Linear Layer)의 가중치입니다. 모델의 요약본 <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false">(</mo><msubsup><mi>h</mi><mi>l</mi><mi>m</mi></msubsup><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">(h_{l}^m)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0331em;vertical-align:-0.2831em"></span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.6644em"><span style="top:-2.4169em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.0197em">l</span></span></span></span><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">m</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2831em"><span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span> 을 받아서 정답 라벨의 개수만큼 점수를 변환해 줍니다.</li>
</ul>
</li>
<li class="">
<p><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>s</mi><mi>o</mi><mi>f</mi><mi>t</mi><mi>m</mi><mi>a</mi><mi>x</mi></mrow><annotation encoding="application/x-tex">softmax</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em"></span><span class="mord mathnormal">so</span><span class="mord mathnormal" style="margin-right:0.1076em">f</span><span class="mord mathnormal">t</span><span class="mord mathnormal">ma</span><span class="mord mathnormal">x</span></span></span></span>:</p>
<ul>
<li class="">소프트맥스 함수입니다. <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>W</mi><mi>y</mi></mrow><annotation encoding="application/x-tex">Wy</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="mord mathnormal" style="margin-right:0.0359em">y</span></span></span></span>​ 를 통해 나온 단순한 점수들을 총합이 1(100%)이 되는 확률값으로 예쁘게 바꿔줍니다. (예: 긍정일 확률 0.9, 부정일 확률 0.1)</li>
</ul>
</li>
</ul>
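<p>The pure-Python sketch below makes the classification head concrete by computing <code>P(y | x) = softmax(h W_y)</code>. The hidden state <code>h</code>, the weights <code>W_y</code>, and the helper names <code>softmax</code> and <code>classify</code> are hypothetical toy values chosen for illustration. A real model would use a tensor library and a much larger hidden size (768 in GPT-1).</p>

```python
import math


def softmax(z):
    """Numerically stable softmax: subtracting max(z) leaves the result unchanged."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(h, W_y):
    """P(y | x) = softmax(h W_y) for a single example.

    h   : final hidden state h_l^m at the last token, a list of d floats
    W_y : task-head weight matrix, a list of d rows with K floats each
    """
    d, K = len(W_y), len(W_y[0])
    assert len(h) == d, "hidden size must match the number of rows of W_y"
    # Row vector times matrix: one raw score (logit) per label
    logits = [sum(h[i] * W_y[i][k] for i in range(d)) for k in range(K)]
    return softmax(logits)


# Toy numbers: hidden size d = 3, K = 2 labels (0 = positive, 1 = negative)
h = [0.5, -1.0, 2.0]
W_y = [[1.0, -1.0],
       [0.0, 0.5],
       [1.0, 0.0]]
probs = classify(h, W_y)
print(probs)  # two probabilities summing to 1; label 0 gets most of the mass
```

<p>Subtracting the maximum logit before exponentiating prevents overflow for large scores. The resulting probabilities are identical because the common factor cancels in the ratio.</p>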
<hr>
<h3>Fine-Tuning Objective</h3>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mi>L</mi><mn>2</mn></msub><mo stretchy="false">(</mo><mi mathvariant="script">C</mi><mo stretchy="false">)</mo><mo>=</mo><munder><mo>∑</mo><mrow><mo stretchy="false">(</mo><mi>x</mi><mo separator="true">,</mo><mi>y</mi><mo stretchy="false">)</mo></mrow></munder><mi>log</mi><mo>⁡</mo><mi>P</mi><mo stretchy="false">(</mo><mi>y</mi><mi mathvariant="normal">∣</mi><msup><mi>x</mi><mn>1</mn></msup><mo separator="true">,</mo><mo>…</mo><mo separator="true">,</mo><msup><mi>x</mi><mi>m</mi></msup><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">L_2(\mathcal{C}) = \sum_{(x,y)} \log P(y | x^1, \dots, x^m)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathcal" style="margin-right:0.0583em">C</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:2.566em;vertical-align:-1.516em"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.05em"><span 
style="top:-1.809em;margin-left:0em"><span class="pstrut" style="height:3.05em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mopen mtight">(</span><span class="mord mathnormal mtight">x</span><span class="mpunct mtight">,</span><span class="mord mathnormal mtight" style="margin-right:0.0359em">y</span><span class="mclose mtight">)</span></span></span></span><span style="top:-3.05em"><span class="pstrut" style="height:3.05em"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.516em"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mop">lo<span style="margin-right:0.0139em">g</span></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.1389em">P</span><span class="mopen">(</span><span class="mord mathnormal" style="margin-right:0.0359em">y</span><span class="mord">∣</span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="minner">…</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7144em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span 
class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">m</span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span>
<ul>
<li class=""><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>L</mi><mn>2</mn></msub><mo stretchy="false">(</mo><mi mathvariant="script">C</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">L_2(\mathcal{C})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathcal" style="margin-right:0.0583em">C</span><span class="mclose">)</span></span></span></span>
<ul>
<li class="">두 번째 학습 단계(미세 조정)의 목적 함수입니다. <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="script">C</mi></mrow><annotation encoding="application/x-tex">\mathcal{C}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathcal" style="margin-right:0.0583em">C</span></span></span></span>는 사람이 직접 정답(<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>y</mi></mrow><annotation encoding="application/x-tex">y</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0359em">y</span></span></span></span>)을 달아놓은 라벨링 데이터셋(예: 리뷰-별점 데이터)을 의미합니다.</li>
</ul>
</li>
<li class=""><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mo>∑</mo><mo stretchy="false">(</mo></msub><mi>x</mi><mo separator="true">,</mo><mi>y</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">∑_(x,y)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.2247em;vertical-align:-0.4747em"></span><span class="mop"><span class="mop op-symbol small-op" style="position:relative;top:0em">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2253em"><span style="top:-2.4003em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mopen mtight">(</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.4747em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal">x</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.0359em">y</span><span class="mclose">)</span></span></span></span>:<!-- -->
<ul>
<li class="">데이터셋 CC에 있는 모든 (입력 문장 xx, 정답 yy) 쌍에 대해서 아래의 확률을 전부 더하라는 뜻입니다.</li>
</ul>
</li>
<li class=""><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>l</mi><mi>o</mi><mi>g</mi><mi>P</mi><mo stretchy="false">(</mo><mo>…</mo><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">logP(…)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.0197em">l</span><span class="mord mathnormal">o</span><span class="mord mathnormal" style="margin-right:0.0359em">g</span><span class="mord mathnormal" style="margin-right:0.1389em">P</span><span class="mopen">(</span><span class="minner">…</span><span class="mclose">)</span></span></span></span>:<!-- -->
<ul>
<li class="">모델이 진짜 정답 <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>y</mi></mrow><annotation encoding="application/x-tex">y</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0359em">y</span></span></span></span>를 맞출 확률에 로그를 씌운 값입니다.</li>
</ul>
</li>
</ul>
<p>(<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msubsup><mi>h</mi><mi>l</mi><mi>m</mi></msubsup></mrow><annotation encoding="application/x-tex">h_l^m</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9775em;vertical-align:-0.2831em"></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.6644em"><span style="top:-2.4169em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0197em">l</span></span></span><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">m</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2831em"><span></span></span></span></span></span></span></span></span></span>은 Transformer 마지막 블록의 최종 활성화 벡터, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mi>y</mi></msub></mrow><annotation encoding="application/x-tex">W_y</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9694em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord 
mathnormal mtight" style="margin-right:0.0359em">y</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span></span></span></span>는 출력층의 가중치 행렬이다.)</p>
<hr>
<ul>
<li class=""><strong>Auxiliary Objective (보조 목적 함수)의 활용</strong>:
GPT-1은 지도 학습 단계에서도 학습의 안정성과 수렴 속도를 높이기 위해, 1단계의 언어 모델링(다음 단어 예측) 목적 함수를 보조적으로 함께 사용한다.</li>
</ul>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mi>L</mi><mn>3</mn></msub><mo stretchy="false">(</mo><mi mathvariant="script">C</mi><mo stretchy="false">)</mo><mo>=</mo><msub><mi>L</mi><mn>2</mn></msub><mo stretchy="false">(</mo><mi mathvariant="script">C</mi><mo stretchy="false">)</mo><mo>+</mo><mi>λ</mi><mo>⋅</mo><msub><mi>L</mi><mn>1</mn></msub><mo stretchy="false">(</mo><mi mathvariant="script">C</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \cdot L_1(\mathcal{C})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">3</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathcal" style="margin-right:0.0583em">C</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" 
style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathcal" style="margin-right:0.0583em">C</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.6944em"></span><span class="mord mathnormal">λ</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">⋅</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathcal" style="margin-right:0.0583em">C</span><span class="mclose">)</span></span></span></span></span>
<ul>
<li class=""><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>L</mi><mn>3</mn></msub><mo stretchy="false">(</mo><mi mathvariant="script">C</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">L_3(\mathcal{C})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">3</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathcal" style="margin-right:0.0583em">C</span><span class="mclose">)</span></span></span></span>:<!-- -->
<ul>
<li class="">미세 조정(Fine-Tuning) 단계에서 모델이 최종적으로 최대화해야 하는 종합 목표 점수입니다.</li>
</ul>
</li>
<li class=""><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>L</mi><mn>2</mn></msub><mo stretchy="false">(</mo><mi mathvariant="script">C</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">L_2(\mathcal{C})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord mathcal" style="margin-right:0.0583em">C</span><span class="mclose">)</span></span></span></span>:<!-- -->
<ul>
<li class="">이전에 설명해 드린 '정답(라벨) 맞추기' 점수입니다. (지도 학습)</li>
</ul>
</li>
<li class=""><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>L</mi><mn>1</mn></msub><mi mathvariant="script">C</mi></mrow><annotation encoding="application/x-tex">L_1{\mathcal{C}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathcal" style="margin-right:0.0583em">C</span></span></span></span></span>:<!-- -->
<ul>
<li class="">맨 처음에 설명해 드린 '다음 단어 맞추기' 점수입니다. (사전 학습 때 썼던 방식) 단, 여기서는 거대한 인터넷 데이터(<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="script">U</mi></mrow><annotation encoding="application/x-tex">{\mathcal{U}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord"><span class="mord mathcal" style="margin-right:0.0993em">U</span></span></span></span></span>)가 아니라, 현재 훈련 중인 라벨링 데이터셋(<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="script">C</mi></mrow><annotation encoding="application/x-tex">{\mathcal{C}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord"><span class="mord mathcal" style="margin-right:0.0583em">C</span></span></span></span></span>)의 텍스트를 가지고 다음 단어를 맞춥니다.</li>
</ul>
</li>
<li class=""><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>λ</mi></mrow><annotation encoding="application/x-tex">\lambda</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em"></span><span class="mord mathnormal">λ</span></span></span></span> (lamda):<!-- -->
<ul>
<li class="">가중치(Weight)를 조절하는 숫자입니다. "정답 맞추기(<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>L</mi><mn>2</mn></msub></mrow><annotation encoding="application/x-tex">L_2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span>)가 메인 임무이긴 한데, 다음 단어 맞추기(<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>L</mi><mn>1</mn></msub></mrow><annotation encoding="application/x-tex">L_1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span>)를 얼만큼의 비율로 섞어서 학습시킬까?"를 결정하는 조절 다이얼입니다. (보통 0.5 같은 값을 줍니다.)</li>
</ul>
</li>
</ul>
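<p>The sketch below shows how the auxiliary term is mixed in. It assumes next-token scores are available for each position of a text drawn from the labeled dataset <code>C</code>. The four-token vocabulary, the numbers, and the function names are hypothetical. The default <code>lam=0.5</code> follows the λ value reported in the GPT-1 paper.</p>

```python
import math


def log_softmax(z):
    """log softmax via the log-sum-exp trick."""
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]


def lm_objective(next_token_logits, targets):
    """L_1(C) on one text: sum_i log P(u_i | u_1, ..., u_{i-1}).

    next_token_logits[i] are the vocabulary scores predicted at position i,
    targets[i] is the id of the token that actually comes next.
    """
    return sum(log_softmax(z)[t] for z, t in zip(next_token_logits, targets))


def combined_objective(l2, l1, lam=0.5):
    """L_3(C) = L_2(C) + lambda * L_1(C)."""
    return l2 + lam * l1


# Toy numbers: vocabulary of 4 token ids, a 3-token text from dataset C
next_token_logits = [[1.0, 0.2, -0.3, 0.0],
                     [0.0, 2.0, 0.1, -1.0],
                     [0.5, 0.5, 0.5, 3.0]]
targets = [0, 1, 3]
l1 = lm_objective(next_token_logits, targets)
l2 = math.log(0.9)  # hypothetical: the head gives the gold label probability 0.9
l3 = combined_objective(l2, l1)
print(l1, l2, l3)
```

<p>Setting <code>lam=0.0</code> recovers plain supervised fine-tuning. Larger values put more weight on keeping the language-modeling skill.</p>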
<h3><strong>Why bring back <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>L</mi><mn>1</mn></msub></mrow><annotation encoding="application/x-tex">L_1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">L</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> after pre-training is already finished?</strong></h3>
<blockquote>
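<p><strong>Better generalization (less overfitting):</strong><br>
If the model focuses only on predicting labels, it may lose sight of what the text actually means and pick up shallow shortcuts instead, such as answering "positive" whenever a particular word appears. That is overfitting. Making it keep predicting the next word helps it retain a deep understanding of context.</p>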
</blockquote>
<blockquote>
<p><strong>Faster convergence:</strong><br>
Because the model keeps tracking the structure of the language while it learns the task, it tends to reach a good solution in fewer training steps.</p>
</blockquote>
<blockquote>
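<p><strong>Preserving pre-trained knowledge:</strong><br>
It helps keep the knowledge stored in the weights during large-scale pre-training from being overwritten while the model learns a single narrow task. This failure mode is known as catastrophic forgetting.</p>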
</blockquote>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="4-task-aware-input-transformations-작업-인식-입력-변환">4. Task-aware input transformations (작업 인식 입력 변환)<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/gpt-1#4-task-aware-input-transformations-%EC%9E%91%EC%97%85-%EC%9D%B8%EC%8B%9D-%EC%9E%85%EB%A0%A5-%EB%B3%80%ED%99%98" class="hash-link" aria-label="4. Task-aware input transformations (작업 인식 입력 변환)에 대한 직접 링크" title="4. Task-aware input transformations (작업 인식 입력 변환)에 대한 직접 링크" translate="no">​</a></h2>
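<p>The key idea of this technique is to <strong>leave the well-built 12-layer decoder untouched</strong>. Instead of changing the architecture, the model handles a wide range of tasks simply by reshaping the text input with special tokens.</p>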
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="1-특수-token의-역할">1) 특수 token의 역할<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/gpt-1#1-%ED%8A%B9%EC%88%98-token%EC%9D%98-%EC%97%AD%ED%95%A0" class="hash-link" aria-label="1) 특수 token의 역할에 대한 직접 링크" title="1) 특수 token의 역할에 대한 직접 링크" translate="no">​</a></h3>
<ul>
<li class="">
<p><strong><code>&lt;S&gt; (Start)</code> token</strong>: Placed at the very beginning of the sequence, it acts as an <strong>anchor</strong> that signals the start of a new task.</p>
<ul>
<li class=""><em>Positional Encoding과의 차이</em>: 포지셔널 인코딩이 단어의 '물리적 위치'를 알려준다면, <code>&lt;S&gt; (Start)</code> token은 이전 문맥과 단절된 새로운 독립적 문제임을 알리는 '구조적 초기화 신호'다. 이 token이 없다면 첫 단어가 의미적 역할과 구조적 역할을 동시에 수행해야 해 어텐션 연산에 과부하가 온다.</li>
</ul>
</li>
<li class="">
<p><strong><code>$ (Delim)</code> token</strong>: A <strong>delimiter</strong> that separates texts of different kinds, such as a passage and an answer option.</p>
</li>
<li class="">
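<p><strong><code>&lt;E&gt; (Extract)</code> token</strong>: The token appended to the very end of the sequence. By the time the decoder reaches it, all of the preceding context has been processed. This position therefore acts as <strong>the trigger for extracting a single summary vector</strong> that condenses the meaning of the whole sequence. Its final-layer hidden state is what gets fed to the task's linear layer.</p>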
</li>
</ul>
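<p>The input transformations can be illustrated as plain list manipulation. This sketch uses the hypothetical string placeholders <code>[START]</code>, <code>[DELIM]</code>, and <code>[EXTRACT]</code> for the three special tokens and splits text on whitespace. The point is that only the token sequence changes; the decoder that consumes it stays the same.</p>

```python
# Hypothetical surface forms for the special tokens. In GPT-1 they are
# randomly initialized, learned embeddings; these strings are placeholders.
START, DELIM, EXTRACT = "[START]", "[DELIM]", "[EXTRACT]"


def classification_input(text):
    """Single text: START + text + EXTRACT."""
    return [START] + text + [EXTRACT]


def pair_input(first, second):
    """Two texts of different kinds: START + first + DELIM + second + EXTRACT."""
    return [START] + first + [DELIM] + second + [EXTRACT]


def multiple_choice_inputs(passage, options):
    """One independent sequence per option; options are never concatenated."""
    return [pair_input(passage, option) for option in options]


# Usage with whitespace tokenization (a simplification; GPT-1 itself uses BPE)
passage = "the cat sat on the mat".split()
options = [o.split() for o in ["a cat", "a dog", "a bird", "a fish"]]
seqs = multiple_choice_inputs(passage, options)
for s in seqs:
    print(" ".join(s))
```
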
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="2-객관식-문제-multiple-choice-처리-메커니즘">2) 객관식 문제 (Multiple Choice) 처리 메커니즘<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/gpt-1#2-%EA%B0%9D%EA%B4%80%EC%8B%9D-%EB%AC%B8%EC%A0%9C-multiple-choice-%EC%B2%98%EB%A6%AC-%EB%A9%94%EC%BB%A4%EB%8B%88%EC%A6%98" class="hash-link" aria-label="2) 객관식 문제 (Multiple Choice) 처리 메커니즘에 대한 직접 링크" title="2) 객관식 문제 (Multiple Choice) 처리 메커니즘에 대한 직접 링크" translate="no">​</a></h3>
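<p>Suppose the model is solving a multiple-choice reading question like those on Korea's college entrance exam (CSAT), with one passage and four answer options. The processing works as follows.</p>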
<ol>
<li class="">
<p><strong>Batch construction</strong>: The four options are not joined into one long text. Instead, one independent sequence is built per option, as follows:</p>
<ul>
<li class="">
<p><code>&lt;S&gt; (Start)</code> + passage + <code>$ (Delim)</code> + option 1 + <code>&lt;E&gt; (Extract)</code></p>
</li>
<li class="">
<p><code>&lt;S&gt; (Start)</code> + passage + <code>$ (Delim)</code> + option 2 + <code>&lt;E&gt; (Extract)</code> (and so on for the remaining options)</p>
</li>
</ul>
</li>
<li class="">
<p><strong>Parallel computation</strong>: The four independent sequences are grouped into a batch and passed through the model in a single forward pass.</p>
</li>
<li class="">
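<p><strong>Scoring</strong>: The four vectors output at the <code>&lt;E&gt; (Extract)</code> token at the end of each sequence go through the same linear classifier. This yields one unnormalized score (logit) per option, four in total. These scores are then passed through a softmax to obtain the probability that each option is correct.</p>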
</li>
</ol>
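<p>The three steps above can be traced end to end with the following sketch. The encoder is a deliberately fake, hypothetical stand-in for the 12-layer decoder that only sums token lengths. The classifier weights <code>w</code> are also made up. Only the data flow mirrors the description: one sequence per option, a shared linear layer, and a softmax across options.</p>

```python
import math

START, DELIM, EXTRACT = "[START]", "[DELIM]", "[EXTRACT]"  # hypothetical placeholders


def toy_encoder(tokens, d=4):
    """Hypothetical stand-in for the decoder's final state at the EXTRACT position.

    A deterministic feature map over token lengths, NOT a Transformer;
    it only lets the pipeline run end to end.
    """
    vec = [0.0] * d
    for i, tok in enumerate(tokens):
        vec[i % d] += len(tok) / 10.0
    return vec


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def score_options(passage, options, w, encoder=toy_encoder):
    # 1) Batch construction: one independent sequence per option
    batch = [[START] + passage + [DELIM] + opt + [EXTRACT] for opt in options]
    # 2) Encode each sequence (a real model runs the whole batch in one pass)
    states = [encoder(seq) for seq in batch]
    # 3) The same linear classifier w maps each EXTRACT vector to one logit
    logits = [sum(wi * hi for wi, hi in zip(w, h)) for h in states]
    # 4) Softmax across options gives the probability that each one is correct
    return softmax(logits)


passage = "the cat sat on the mat".split()
options = [["cat"], ["dog"], ["bird"], ["mouse"]]
w = [0.3, -0.2, 0.5, 0.1]  # hypothetical classifier weights
probs = score_options(passage, options, w)
best = max(range(len(probs)), key=probs.__getitem__)
print(probs, best)
```

<p>Because the softmax runs across options rather than across labels, the number of options can vary from question to question without changing the classifier.</p>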
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="5-수학적-처리와-오차-계산-학습의-완성">5. 수학적 처리와 오차 계산 (학습의 완성)<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/gpt-1#5-%EC%88%98%ED%95%99%EC%A0%81-%EC%B2%98%EB%A6%AC%EC%99%80-%EC%98%A4%EC%B0%A8-%EA%B3%84%EC%82%B0-%ED%95%99%EC%8A%B5%EC%9D%98-%EC%99%84%EC%84%B1" class="hash-link" aria-label="5. 수학적 처리와 오차 계산 (학습의 완성)에 대한 직접 링크" title="5. 수학적 처리와 오차 계산 (학습의 완성)에 대한 직접 링크" translate="no">​</a></h2>
<p>These are the essential mathematical steps for comparing the model's raw scores with the true answer, so that its parameters can be updated (trained).</p>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="1-softmax-소프트맥스-함수">1) Softmax (소프트맥스 함수)<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/gpt-1#1-softmax-%EC%86%8C%ED%94%84%ED%8A%B8%EB%A7%A5%EC%8A%A4-%ED%95%A8%EC%88%98" class="hash-link" aria-label="1) Softmax (소프트맥스 함수)에 대한 직접 링크" title="1) Softmax (소프트맥스 함수)에 대한 직접 링크" translate="no">​</a></h3>
<ul>
<li class=""><strong>정의</strong>: 선형 분류기를 거쳐 나온 각 클래스의 임의의 점수 <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>z</mi><mi>i</mi></msub></mrow><annotation encoding="application/x-tex">z_i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.044em">z</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:-0.044em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span>를 확률 값으로 변환한다.</li>
</ul>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>σ</mi><mo stretchy="false">(</mo><mi mathvariant="bold">z</mi><msub><mo stretchy="false">)</mo><mi>i</mi></msub><mo>=</mo><mfrac><msup><mi>e</mi><msub><mi>z</mi><mi>i</mi></msub></msup><mrow><munderover><mo>∑</mo><mrow><mi>j</mi><mo>=</mo><mn>1</mn></mrow><mi>K</mi></munderover><msup><mi>e</mi><msub><mi>z</mi><mi>j</mi></msub></msup></mrow></mfrac></mrow><annotation encoding="application/x-tex">\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.0359em">σ</span><span class="mopen">(</span><span class="mord mathbf">z</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:2.6484em;vertical-align:-1.307em"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.3414em"><span style="top:-2.1288em"><span class="pstrut" style="height:3em"></span><span class="mord"><span 
class="mop"><span class="mop op-symbol small-op" style="position:relative;top:0em">∑</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9812em"><span style="top:-2.4003em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.0572em">j</span><span class="mrel mtight">=</span><span class="mord mtight">1</span></span></span></span><span style="top:-3.2029em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.0715em">K</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.4358em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">e</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6065em"><span style="top:-3.0051em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.044em">z</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3281em"><span style="top:-2.357em;margin-left:-0.044em;margin-right:0.0714em"><span class="pstrut" style="height:2.5em"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0572em">j</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" 
style="height:0.2819em"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><span style="top:-3.23em"><span class="pstrut" style="height:3em"></span><span class="frac-line" style="border-bottom-width:0.04em"></span></span><span style="top:-3.677em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord"><span class="mord mathnormal">e</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6644em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.044em">z</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3281em"><span style="top:-2.357em;margin-left:-0.044em;margin-right:0.0714em"><span class="pstrut" style="height:2.5em"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.143em"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.307em"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span>
<ul>
<li class="">
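<p><strong>Intuition</strong>:
The four scores from the linear classifier (e.g., 10, 5, 1, -2) are raw values with no common scale. There are two reasons to pass them through Softmax instead of comparing them directly.</p>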
<ol>
<li class="">
<p><strong>Conversion to a probability distribution</strong>: The scores are rescaled so that each value satisfies <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>0</mn><mo>&lt;</mo><mi>σ</mi><mo>&lt;</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">0 &lt; \sigma &lt; 1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6835em;vertical-align:-0.0391em"></span><span class="mord">0</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">&lt;</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.5782em;vertical-align:-0.0391em"></span><span class="mord mathnormal" style="margin-right:0.0359em">σ</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">&lt;</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">1</span></span></span></span> and all of them sum to exactly 1 (100%), e.g., 70%, 20%, 8%, 2%. Softmax uses the exponential function (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>e</mi></mrow><annotation encoding="application/x-tex">e</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em"></span><span class="mord mathnormal">e</span></span></span></span>), so large scores are amplified and small ones shrink further, which pushes the model toward a confident choice. For the scores 10, 5, 1, -2 above, for instance, Softmax gives roughly 99.3%, 0.67%, 0.01% and 0.0006%.</p>
</li>
<li class="">
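<p><strong>Differentiability</strong>: Backpropagation requires every operation in the computation graph to be differentiable. Softmax is smooth (differentiable everywhere) with respect to its input scores, so gradients can flow back through it.</p>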
</li>
</ol>
</li>
</ul>
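<p>The following is a plain-Python sketch, not any particular framework's implementation. It computes Softmax with the common max-subtraction trick for numerical stability. It also checks the standard analytic derivative ∂σ<sub>i</sub>/∂z<sub>j</sub> = σ<sub>i</sub>(δ<sub>ij</sub> − σ<sub>j</sub>) against central finite differences. The test input <code>z</code> and the step size <code>eps</code> are arbitrary illustrative choices.</p>

```python
import math

def softmax(z):
    # Subtracting the max logit prevents exp() overflow without changing the result
    m = max(z)
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(z):
    # Analytic derivative: d sigma_i / d z_j = sigma_i * (delta_ij - sigma_j)
    s = softmax(z)
    n = len(s)
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)]
            for i in range(n)]

def numeric_jacobian(z, eps=1e-6):
    # Central finite differences, used only to sanity-check the analytic formula
    n = len(z)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        z_plus = list(z)
        z_plus[j] += eps
        z_minus = list(z)
        z_minus[j] -= eps
        s_plus, s_minus = softmax(z_plus), softmax(z_minus)
        for i in range(n):
            jac[i][j] = (s_plus[i] - s_minus[i]) / (2 * eps)
    return jac

scores = [10.0, 5.0, 1.0, -2.0]
probs = softmax(scores)
print([f"{p:.4%}" for p in probs])

z = [1.0, 0.5, -0.3, 2.0]
max_err = max(abs(a - b)
              for row_a, row_b in zip(softmax_jacobian(z), numeric_jacobian(z))
              for a, b in zip(row_a, row_b))
print(max_err)
```

<p>The tiny gap between the two Jacobians is what "differentiable" buys in practice. Frameworks can use the closed-form derivative instead of approximating it.</p>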
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="2-one-hot-encoding-원-핫-인코딩">2) One-hot Encoding<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/gpt-1#2-one-hot-encoding-%EC%9B%90-%ED%95%AB-%EC%9D%B8%EC%BD%94%EB%94%A9" class="hash-link" aria-label="2) One-hot Encoding (원-핫 인코딩)에 대한 직접 링크" title="2) One-hot Encoding (원-핫 인코딩)에 대한 직접 링크" translate="no">​</a></h3>
<ul>
<li class=""><strong>Definition</strong>: When the correct answer is class <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>c</mi></mrow><annotation encoding="application/x-tex">c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em"></span><span class="mord mathnormal">c</span></span></span></span>, the target probability distribution <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>p</mi></mrow><annotation encoding="application/x-tex">p</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em"></span><span class="mord mathnormal">p</span></span></span></span> is defined as follows.</li>
</ul>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>p</mi><mo stretchy="false">(</mo><mi>i</mi><mo stretchy="false">)</mo><mo>=</mo><mrow><mo fence="true">{</mo><mtable rowspacing="0.36em" columnalign="left left" columnspacing="1em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>1</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mtext>if&nbsp;</mtext><mi>i</mi><mo>=</mo><mi>c</mi></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="false"><mn>0</mn></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="false"><mrow><mtext>if&nbsp;</mtext><mi>i</mi><mo mathvariant="normal">≠</mo><mi>c</mi></mrow></mstyle></mtd></mtr></mtable></mrow></mrow><annotation encoding="application/x-tex">p(i) = \begin{cases} 1 &amp; \text{if } i = c \\ 0 &amp; \text{if } i \neq c \end{cases}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal">p</span><span class="mopen">(</span><span class="mord mathnormal">i</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:3em;vertical-align:-1.25em"></span><span class="minner"><span class="mopen delimcenter" style="top:0em"><span class="delimsizing size4">{</span></span><span class="mord"><span class="mtable"><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.69em"><span style="top:-3.69em"><span class="pstrut" style="height:3.008em"></span><span class="mord"><span class="mord">1</span></span></span><span style="top:-2.25em"><span class="pstrut" style="height:3.008em"></span><span 
class="mord"><span class="mord">0</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.19em"><span></span></span></span></span></span><span class="arraycolsep" style="width:1em"></span><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.69em"><span style="top:-3.69em"><span class="pstrut" style="height:3.008em"></span><span class="mord"><span class="mord text"><span class="mord">if&nbsp;</span></span><span class="mord mathnormal">i</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mord mathnormal">c</span></span></span><span style="top:-2.25em"><span class="pstrut" style="height:3.008em"></span><span class="mord"><span class="mord text"><span class="mord">if&nbsp;</span></span><span class="mord mathnormal">i</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel"><span class="mrel"><span class="mord vbox"><span class="thinbox"><span class="rlap"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em"></span><span class="inner"><span class="mord"><span class="mrel"></span></span></span><span class="fix"></span></span></span></span></span><span class="mspace nobreak"></span><span class="mrel">=</span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mord mathnormal">c</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.19em"><span></span></span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span>
<ul>
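<li class=""><strong>Intuition</strong>: To compare its predicted probabilities (70%, 20%, 8%, 2%) with the true answer, the model needs the answer in the same probability-distribution form. If the correct answer is option 2, one-hot encoding assigns 100% (1.0) to position 2 and 0% (0.0) to every other position, producing <code>[0.0, 1.0, 0.0, 0.0]</code>.</li>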
</ul>
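<p>Here is a minimal Python sketch of the encoding. The helper name <code>one_hot</code> is hypothetical. It takes a 1-based option number so that "option 2" maps to the second position, as in the example above.</p>

```python
def one_hot(answer, num_classes):
    # answer is a 1-based option number, matching "option 2" in the text
    if answer not in range(1, num_classes + 1):
        raise ValueError("answer must be between 1 and num_classes")
    return [1.0 if k == answer else 0.0 for k in range(1, num_classes + 1)]

target = one_hot(2, 4)
print(target)  # [0.0, 1.0, 0.0, 0.0]
```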
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="3-cross-entropy-loss-크로스-엔트로피-오차">3) Cross-Entropy Loss<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/gpt-1#3-cross-entropy-loss-%ED%81%AC%EB%A1%9C%EC%8A%A4-%EC%97%94%ED%8A%B8%EB%A1%9C%ED%94%BC-%EC%98%A4%EC%B0%A8" class="hash-link" aria-label="3) Cross-Entropy Loss (크로스 엔트로피 오차)에 대한 직접 링크" title="3) Cross-Entropy Loss (크로스 엔트로피 오차)에 대한 직접 링크" translate="no">​</a></h3>
<ul>
<li class=""><strong>Definition</strong>: Cross-entropy measures the discrepancy (loss) between the model's predicted distribution <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>q</mi></mrow><annotation encoding="application/x-tex">q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0359em">q</span></span></span></span> and the true distribution <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>p</mi></mrow><annotation encoding="application/x-tex">p</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em"></span><span class="mord mathnormal">p</span></span></span></span>.</li>
</ul>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>H</mi><mo stretchy="false">(</mo><mi>p</mi><mo separator="true">,</mo><mi>q</mi><mo stretchy="false">)</mo><mo>=</mo><mo>−</mo><munder><mo>∑</mo><mi>x</mi></munder><mi>p</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo><mi>log</mi><mo>⁡</mo><mi>q</mi><mo stretchy="false">(</mo><mi>x</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">H(p, q) = -\sum_{x} p(x) \log q(x)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.0813em">H</span><span class="mopen">(</span><span class="mord mathnormal">p</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.0359em">q</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:2.3em;vertical-align:-1.25em"></span><span class="mord">−</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.05em"><span style="top:-1.9em;margin-left:0em"><span class="pstrut" style="height:3.05em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">x</span></span></span></span><span style="top:-3.05em"><span class="pstrut" style="height:3.05em"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" 
style="height:1.25em"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal">p</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mop">lo<span style="margin-right:0.0139em">g</span></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.0359em">q</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mclose">)</span></span></span></span></span>
<p>When the target is one-hot encoded, every term of the sum is zero except the one for the correct class <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>c</mi></mrow><annotation encoding="application/x-tex">c</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em"></span><span class="mord mathnormal">c</span></span></span></span>, so the loss reduces to the negative logarithm of the probability the model assigns to that class.
The closer this probability <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>q</mi><mo stretchy="false">(</mo><mi>c</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">q(c)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.0359em">q</span><span class="mopen">(</span><span class="mord mathnormal">c</span><span class="mclose">)</span></span></span></span> is to 1, the closer the loss gets to 0. As the probability approaches 0, the loss diverges to infinity.</p>
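<p>The following is a plain-Python sketch of the loss. It uses the prediction <code>[0.1, 0.7, 0.05, 0.15]</code> and the one-hot target <code>[0, 1, 0, 0]</code>. Only the correct-class term survives, so the loss equals −log 0.7 ≈ 0.357. The <code>eps</code> clamp and the helper names are illustrative choices. The gradient helper relies on a standard result: for Softmax followed by cross-entropy, the gradient with respect to the logits is <code>softmax(z) - p</code>.</p>

```python
import math

def softmax(z):
    # Max-subtraction keeps exp() from overflowing; the result is unchanged
    m = max(z)
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, q, eps=1e-12):
    # H(p, q) = -sum_x p(x) * log q(x); eps guards against log(0)
    return -sum(px * math.log(max(qx, eps)) for px, qx in zip(p, q))

def ce_grad_wrt_logits(z, p):
    # For softmax followed by cross-entropy, dH/dz_i = softmax(z)_i - p_i
    return [qi - pi for qi, pi in zip(softmax(z), p)]

p = [0.0, 1.0, 0.0, 0.0]       # one-hot target: the correct answer is option 2
q = [0.1, 0.7, 0.05, 0.15]     # model's predicted distribution
loss = cross_entropy(p, q)
print(round(loss, 4))  # 0.3567

# With a one-hot target only the correct-class term survives: loss == -log q(c)
print(math.isclose(loss, -math.log(q[1])))  # True
```

<p>The gradient is negative for the correct class and positive for the others. A gradient-descent step therefore raises the correct logit and lowers the rest, which is the "reduce the loss by adjusting parameters" step in miniature.</p>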
<ul>
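<li class=""><strong>Intuition</strong>:
MSE (mean squared error) is used for continuous targets (regression), such as predicting house prices. For multiple-choice or classification problems, <strong>Cross-Entropy, which measures the mismatch between two probability distributions (prediction vs. ground truth)</strong>, is a much better fit.
The model computes the loss between its prediction (e.g., <code>[0.1, 0.7, 0.05, 0.15]</code>) and the target (<code>[0, 1, 0, 0]</code>), which here is -log 0.7 ≈ 0.357. It then adjusts its internal parameters in the direction that reduces this loss, gradually raising its accuracy.</li>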
</ul>]]></content:encoded>
            <category>논문</category>
            <category>gpt</category>
            <category>nlp</category>
            <category>llm</category>
            <category>딥러닝</category>
        </item>
        <item>
            <title><![CDATA[[Paper] Space-Time Approach to Non-Relativistic Quantum Mechanics]]></title>
            <link>https://hwkim-dev.github.io/hwkim-dev/ko/blog/path-integral</link>
            <guid>https://hwkim-dev.github.io/hwkim-dev/ko/blog/path-integral</guid>
            <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Study notes on the abstract of R.P. Feynman's paper 'Space-Time Approach to Non-Relativistic Quantum Mechanics']]></description>
            <content:encoded><![CDATA[<p>Study notes on the abstract of R.P. Feynman's paper 'Space-Time Approach to Non-Relativistic Quantum Mechanics'</p>
<br>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="1-abstract-양자역학의-새로운-공식화-path-integral-formulation">1. Abstract: A New Formulation of Quantum Mechanics (Path Integral Formulation)<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/path-integral#1-abstract-%EC%96%91%EC%9E%90%EC%97%AD%ED%95%99%EC%9D%98-%EC%83%88%EB%A1%9C%EC%9A%B4-%EA%B3%B5%EC%8B%9D%ED%99%94-path-integral-formulation" class="hash-link" aria-label="1. Abstract: 양자역학의 새로운 공식화 (Path Integral Formulation)에 대한 직접 링크" title="1. Abstract: 양자역학의 새로운 공식화 (Path Integral Formulation)에 대한 직접 링크" translate="no">​</a></h2>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="1-확률probability">1) Probability<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/path-integral#1-%ED%99%95%EB%A5%A0probability" class="hash-link" aria-label="1) 확률(Probability)에 대한 직접 링크" title="1) 확률(Probability)에 대한 직접 링크" translate="no">​</a></h3>
<ul>
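<li class="">In quantum mechanics, the probability of an event is not obtained by simply adding up the probabilities of the individual paths, as in classical mechanics.</li>
<li class="">Instead, one adds up the <strong>complex contributions (probability amplitudes)</strong> from <strong>all alternative ways (paths)</strong> by which the event can occur, and then takes the <strong>squared absolute value of that sum</strong>.</li>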
</ul>
<br>
<table width="100%"><tbody><tr><th colspan="2">All alternative ways (paths) by which the event can occur</th></tr><tr><td width="320" align="center"><img src="https://github.com/user-attachments/assets/29a70da5-d0fa-43e0-bafa-53bb2330fba5" width="300" height="300"></td><td>A particle can travel from start point A to end point B along infinitely many paths, <br> so there are infinitely many contributions (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mi>e</mi><mrow><mi>i</mi><mi>S</mi><mi mathvariant="normal">/</mi><mi mathvariant="normal">ℏ</mi></mrow></msup></mrow><annotation encoding="application/x-tex">e^{iS/\hbar}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.888em"></span><span class="mord"><span class="mord mathnormal">e</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.888em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mord mathnormal mtight" style="margin-right:0.0576em">S</span><span class="mord mtight">/ℏ</span></span></span></span></span></span></span></span></span></span></span></span>), each an arrow pointing in its own direction.</td></tr></tbody></table>
<br>
<table width="100%"><tbody><tr><th colspan="2">Graph of the function <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mi>e</mi><mrow><mi>i</mi><mi>S</mi><mi mathvariant="normal">/</mi><mi mathvariant="normal">ℏ</mi></mrow></msup></mrow><annotation encoding="application/x-tex">e^{iS/\hbar}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.888em"></span><span class="mord"><span class="mord mathnormal">e</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.888em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mord mathnormal mtight" style="margin-right:0.0576em">S</span><span class="mord mtight">/ℏ</span></span></span></span></span></span></span></span></span></span></span></span></th></tr><tr><td width="320" align="center"><img src="https://github.com/user-attachments/assets/584465da-f3fa-4bca-9083-e631c69e1a6b" width="300" height="300"></td><td>This is the complex plane, with the real part on the horizontal axis and the imaginary part on the vertical axis.<br><br> The radius of the circle (the length of the arrow) is fixed at 1. The <strong>angle (phase)</strong> the arrow points at is exactly the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>S</mi><mi mathvariant="normal">/</mi><mi mathvariant="normal">ℏ</mi></mrow><annotation encoding="application/x-tex">S/\hbar</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.0576em">S</span><span class="mord">/ℏ</span></span></span></span> in the exponent,<br><br> so as the physical quantity <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>S</mi></mrow><annotation encoding="application/x-tex">S</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0576em">S</span></span></span></span> computed along each path (its action) changes, the arrow rotates around the circle and points in a different direction.</td></tr></tbody></table>
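<p>Here is a small Python sketch of this picture. It assumes natural units (ħ = 1) and arbitrary example values of S. Whatever S is, the contribution e<sup>iS/ħ</sup> has length 1. Only its angle changes, namely S/ħ wrapped into a single turn.</p>

```python
import cmath
import math

HBAR = 1.0  # natural units, an assumption for illustration

def contribution(action, hbar=HBAR):
    # Phasor e^{iS/hbar}: always unit length, with angle S/hbar (mod 2*pi)
    return cmath.exp(1j * action / hbar)

for S in [0.0, math.pi / 2, math.pi, 2 * math.pi + 0.5]:
    c = contribution(S)
    # cmath.phase reports the angle in the range (-pi, pi]
    print(f"S={S:.3f}  |c|={abs(c):.3f}  phase={cmath.phase(c):.3f} rad")
```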
<hr>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>P</mi><mo stretchy="false">(</mo><mi>E</mi><mi>v</mi><mi>e</mi><mi>n</mi><mi>t</mi><mo stretchy="false">)</mo><mo>=</mo><mi mathvariant="normal">∣</mi><msub><mtext>Contribution</mtext><mn>1</mn></msub><mo>+</mo><msub><mtext>Contribution</mtext><mn>2</mn></msub><mo>+</mo><mo>⋯</mo><msup><mi mathvariant="normal">∣</mi><mn>2</mn></msup></mrow><annotation encoding="application/x-tex">P(Event) = |\text{Contribution}_1 + \text{Contribution}_2 + \dotsb|^2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.1389em">P</span><span class="mopen">(</span><span class="mord mathnormal" style="margin-right:0.0576em">E</span><span class="mord mathnormal" style="margin-right:0.0359em">v</span><span class="mord mathnormal">e</span><span class="mord mathnormal">n</span><span class="mord mathnormal">t</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord">∣</span><span class="mord"><span class="mord text"><span class="mord">Contribution</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" 
style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord text"><span class="mord">Contribution</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1.1141em;vertical-align:-0.25em"></span><span class="minner">⋯</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord">∣</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span></span></span>
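<p>Adding amplitudes gives a different answer from adding probabilities. The toy Python sketch below shows this for two paths with an equal, hypothetical magnitude of 1/√2 and a chosen phase difference Δ. The classical rule always gives 1. The quantum rule gives 1 + cos Δ, which runs from 2 (arrows aligned) down to 0 (arrows opposite, cancelling out). These are relative weights for illustration, not normalized probabilities for a real experiment.</p>

```python
import cmath
import math

def classical_probability(amplitudes):
    # Classical rule: square each magnitude, then add the per-path probabilities
    return sum(abs(a) ** 2 for a in amplitudes)

def quantum_probability(amplitudes):
    # Quantum rule: add the complex amplitudes first, then take |sum|^2
    return abs(sum(amplitudes)) ** 2

def two_paths(delta, magnitude=1 / math.sqrt(2)):
    # Two hypothetical paths of equal magnitude whose phases differ by delta
    return [magnitude * cmath.exp(0j), magnitude * cmath.exp(1j * delta)]

for delta in (0.0, math.pi / 2, math.pi):
    paths = two_paths(delta)
    print(f"phase diff {delta:.3f}: "
          f"classical {classical_probability(paths):.3f}, "
          f"quantum {quantum_probability(paths):.3f}")
```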
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="2-시공간에서의-경로-xt-와-sum-over-paths">2) Paths in Space-Time: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">x(t)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal">x</span><span class="mopen">(</span><span class="mord mathnormal">t</span><span class="mclose">)</span></span></span></span> and the Sum over Paths<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/path-integral#2-%EC%8B%9C%EA%B3%B5%EA%B0%84%EC%97%90%EC%84%9C%EC%9D%98-%EA%B2%BD%EB%A1%9C-xt-%EC%99%80-sum-over-paths" class="hash-link" aria-label="2-시공간에서의-경로-xt-와-sum-over-paths에 대한 직접 링크" title="2-시공간에서의-경로-xt-와-sum-over-paths에 대한 직접 링크" translate="no">​</a></h3>
<ul>
<li class=""><strong>Path <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">x(t)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal">x</span><span class="mopen">(</span><span class="mord mathnormal">t</span><span class="mclose">)</span></span></span></span>:</strong> <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi></mrow><annotation encoding="application/x-tex">x</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em"></span><span class="mord mathnormal">x</span></span></span></span> denotes position (space) and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>t</mi></mrow><annotation encoding="application/x-tex">t</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6151em"></span><span class="mord mathnormal">t</span></span></span></span> denotes time, so <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">x(t)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal">x</span><span class="mopen">(</span><span class="mord mathnormal">t</span><span class="mclose">)</span></span></span></span> is the trajectory the particle traces through space-time as time passes.</li>
<li class=""><strong>Sum over Paths:</strong> The probability that the particle's path <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi><mo stretchy="false">(</mo><mi>t</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">x(t)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal">x</span><span class="mopen">(</span><span class="mord mathnormal">t</span><span class="mclose">)</span></span></span></span> lies within a given region of space-time is the squared absolute value of the sum of the contributions from <strong>all possible paths</strong> in that region.</li>
</ul>
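<p>The postulate above fits in a few lines of Python. This is a minimal sketch, not the paper's construction. It assumes a finite, hand-picked list of path phases and ignores normalization, so the returned number is only proportional to a probability. The helper name <code>path_sum_probability</code> is hypothetical.</p>

```python
import cmath


def path_sum_probability(contributions):
    """Return |sum of contributions|^2 for a list of complex contributions (unnormalized)."""
    total = sum(contributions, 0j)  # add the complex contributions (arrows) together
    return abs(total) ** 2          # absolute square of the total


# Hypothetical toy: three paths whose contributions are unit-length complex
# numbers with made-up phases (in radians).
phases = [0.0, 0.5, 1.0]
contribs = [cmath.exp(1j * p) for p in phases]
print(path_sum_probability(contribs))
```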
<hr>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="핵심-심화-기여도contribution의-정체와-복소수complex의-마법">[Deep Dive] What a Contribution Really Is, and the Magic of Complex Numbers<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/path-integral#%ED%95%B5%EC%8B%AC-%EC%8B%AC%ED%99%94-%EA%B8%B0%EC%97%AC%EB%8F%84contribution%EC%9D%98-%EC%A0%95%EC%B2%B4%EC%99%80-%EB%B3%B5%EC%86%8C%EC%88%98complex%EC%9D%98-%EB%A7%88%EB%B2%95" class="hash-link" aria-label="[핵심 심화] 기여도(Contribution)의 정체와 복소수(Complex)의 마법에 대한 직접 링크" title="[핵심 심화] 기여도(Contribution)의 정체와 복소수(Complex)의 마법에 대한 직접 링크" translate="no">​</a></h3>
<p>This part answers the most central questions about the paper: <strong>"What is a contribution, and why is it a complex contribution?"</strong> and <strong>"Why is it called an arrow (vector)?"</strong></p>
<p><strong>① The real name of a contribution: Probability Amplitude</strong></p>
<ul>
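<li class="">The 'contribution' here is the mathematical value that expresses how strongly a given path affects the final result. Its formal name in physics is the <strong>probability amplitude</strong>.</li>
<li class="">Quantum mechanics does not assign this contribution as a plain real-valued probability such as 0.2 or 0.5. It assigns a complex number, which can be pictured as a <strong>mathematical vector with a magnitude and a direction</strong>.
<blockquote>
<p>In classical mechanics, if the probability of taking route A is 30% and the probability of taking route B is 40%, you simply add the two to get 70%.</p>
</blockquote>
</li>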
</ul>
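<p>The classical-versus-quantum contrast in the quote can be made concrete. This sketch is an illustration, not something from the paper. Each route gets an amplitude whose squared magnitude equals that route's classical probability, and route B's amplitude is rotated by a hypothetical phase difference <code>delta_phase</code>. The quantum result is 0.3 + 0.4 + 2√(0.3·0.4)·cos(delta_phase), so it matches the classical 0.7 only when that cross term vanishes. The numbers are not normalized, so values above 1 can appear here.</p>

```python
import cmath
import math

# Classical: probabilities of mutually exclusive routes simply add.
p_a, p_b = 0.30, 0.40
classical = p_a + p_b


def quantum_combined(p_a, p_b, delta_phase):
    """|a + b|^2 where |a|^2 = p_a, |b|^2 = p_b and b is rotated by delta_phase (radians)."""
    a = math.sqrt(p_a)                                # amplitude for route A (phase 0)
    b = math.sqrt(p_b) * cmath.exp(1j * delta_phase)  # amplitude for route B, rotated
    return abs(a + b) ** 2


for dphi in (0.0, math.pi / 2, math.pi):
    print(f"dphi={dphi:.2f}: classical={classical:.3f}, "
          f"quantum={quantum_combined(p_a, p_b, dphi):.3f}")
```

<p>Only at a 90° phase difference does the interference term drop out, leaving the classical sum.</p>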
<p><strong>② Not a physical arrow but a vector on the complex plane (Complex Vector / Phasor)</strong></p>
<ul>
<li class="">The everyday word 'vector' tends to evoke something physical, such as the direction of a ball flying through 3D space.</li>
<li class="">The contribution here, however, is <strong>not</strong> a curve bending through real space. It is a <strong>vector on the complex plane</strong>, an abstract mathematical space whose two axes are the real part and the imaginary part.</li>
<li class="">Mathematically, the expression <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mi>e</mi><mrow><mi>i</mi><mi>θ</mi></mrow></msup></mrow><annotation encoding="application/x-tex">e^{i\theta}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8491em"></span><span class="mord"><span class="mord mathnormal">e</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8491em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mord mathnormal mtight" style="margin-right:0.0278em">θ</span></span></span></span></span></span></span></span></span></span></span></span>, in which the natural constant <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>e</mi></mrow><annotation encoding="application/x-tex">e</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em"></span><span class="mord mathnormal">e</span></span></span></span> is raised to an exponent containing the imaginary unit <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>i</mi></mrow><annotation encoding="application/x-tex">i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6595em"></span><span class="mord mathnormal">i</span></span></span></span>, represents a <strong>rotating arrow whose length is always 1 (a phasor)</strong> on the complex plane.</li>
<li class="">In other words, every path contributes with the same magnitude, 1, but the <strong>direction it points (its phase)</strong> differs from path to path according to the path's properties.</li>
</ul>
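<p>A phasor can be checked numerically with Python's standard <code>cmath</code> module. <code>cmath.exp(1j * theta)</code> always has magnitude 1, and only its direction, reported by <code>cmath.phase</code>, depends on <code>theta</code>. The function name <code>phasor</code> is just an illustrative helper.</p>

```python
import cmath
import math


def phasor(theta):
    """Unit-length arrow on the complex plane: e^{i*theta} = cos(theta) + i*sin(theta)."""
    return cmath.exp(1j * theta)


for theta in (0.0, math.pi / 2, math.pi, 2.0):
    z = phasor(theta)
    # The length |z| stays 1; only the direction (phase) changes with theta.
    print(f"theta={theta:.3f}  z={z.real:+.3f}{z.imag:+.3f}i  "
          f"|z|={abs(z):.3f}  phase={cmath.phase(z):+.3f}")
```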
<p><strong>③ Why complex rather than real contributions? <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>→</mo></mrow><annotation encoding="application/x-tex">\rightarrow</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.3669em"></span><span class="mrel">→</span></span></span></span> Interference</strong></p>
<ul>
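<li class=""><strong>Real contributions (classical mechanics):</strong> the numbers simply pile up. The more routes the particle can take, the higher the probability, without exception.</li>
<li class=""><strong>Complex contributions (quantum mechanics):</strong> because each contribution is an arrow on the complex plane, adding them can make them reinforce or cancel each other depending on their directions.</li>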
</ul>
<table width="100%"><tbody><tr><th colspan="2">Constructive Interference</th></tr><tr><td width="320" align="center"><img src="https://github.com/user-attachments/assets/dd776ee2-35a7-419a-9ff4-e01e869f34dd" width="300" height="300"></td><td>If the arrows of the paths point in the same direction, <br> they add up to a longer arrow, and the probability of the particle arriving there rises sharply.</td></tr></tbody></table>
<br>
<table width="100%"><tbody><tr><th colspan="2">Destructive Interference</th></tr><tr><td width="320" align="center"><img src="https://github.com/user-attachments/assets/84136b9c-6ad5-4d1a-8fa7-cc5ee1dfe7d7" width="300" height="300"></td><td>If two arrows of equal length point in exactly opposite directions, their sum is 0. <br> So even when routes to a point are clearly open, the complex contributions can cancel each other out <br> and the probability of finding the particle there drops to 0%. <br> This gives a precise mathematical account of a phenomenon unique to quantum mechanics.</td></tr></tbody></table>
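<p>Both table rows reduce to adding two unit arrows. The sketch below is a toy illustration with hypothetical phases. Two arrows at the same angle give a resultant of length 2, so the squared length is 4, four times that of a single arrow. Arrows at 0 and π cancel to zero, apart from floating-point rounding.</p>

```python
import cmath
import math


def combine(*thetas):
    """Add unit arrows e^{i*theta} head to tail; return the resultant and its squared length."""
    total = sum((cmath.exp(1j * t) for t in thetas), 0j)
    return total, abs(total) ** 2


same_dir, p_same = combine(0.0, 0.0)     # arrows point the same way: constructive
opposite, p_opp = combine(0.0, math.pi)  # arrows point opposite ways: destructive
print("constructive:", p_same)  # squared length of a length-2 arrow
print("destructive:", p_opp)    # essentially zero (floating-point residue only)
```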
<hr>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="3-화살표의-방향phase을-결정하는-요소-s-와-hbar">3) What Sets the Direction (Phase) of the Arrow: <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>S</mi></mrow><annotation encoding="application/x-tex">S</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0576em">S</span></span></span></span> and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">ℏ</mi></mrow><annotation encoding="application/x-tex">\hbar</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6889em"></span><span class="mord">ℏ</span></span></span></span><a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/path-integral#3-%ED%99%94%EC%82%B4%ED%91%9C%EC%9D%98-%EB%B0%A9%ED%96%A5phase%EC%9D%84-%EA%B2%B0%EC%A0%95%ED%95%98%EB%8A%94-%EC%9A%94%EC%86%8C-s-%EC%99%80-hbar" class="hash-link" aria-label="3-화살표의-방향phase을-결정하는-요소-s-와-hbar에 대한 직접 링크" title="3-화살표의-방향phase을-결정하는-요소-s-와-hbar에 대한 직접 링크" translate="no">​</a></h3>
<p>The angle (phase) of the arrow <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mi>e</mi><mrow><mi>i</mi><mi>S</mi><mi mathvariant="normal">/</mi><mi mathvariant="normal">ℏ</mi></mrow></msup></mrow><annotation encoding="application/x-tex">e^{iS/\hbar}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.888em"></span><span class="mord"><span class="mord mathnormal">e</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.888em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mord mathnormal mtight" style="margin-right:0.0576em">S</span><span class="mord mtight">/ℏ</span></span></span></span></span></span></span></span></span></span></span></span> produced by each path is determined by the following quantities.</p>
<ul>
<li class=""><strong><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>S</mi></mrow><annotation encoding="application/x-tex">S</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0576em">S</span></span></span></span> (Classical Action):</strong> a physical quantity obtained by integrating along the path the particle takes (the time integral of the Lagrangian). Its value depends on the shape of the path: its velocities, trajectory, timing, and so on.</li>
<li class=""><strong><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">ℏ</mi></mrow><annotation encoding="application/x-tex">\hbar</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6889em"></span><span class="mord">ℏ</span></span></span></span> (Reduced Planck Constant):</strong> the tiny fundamental constant of the quantum world, approximately 1.055 × 10<sup>-34</sup> J·s.</li>
<li class=""><strong><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>S</mi><mi mathvariant="normal">/</mi><mi mathvariant="normal">ℏ</mi></mrow><annotation encoding="application/x-tex">S/\hbar</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.0576em">S</span><span class="mord">/ℏ</span></span></span></span> (Phase):</strong> the path's action (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>S</mi></mrow><annotation encoding="application/x-tex">S</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0576em">S</span></span></span></span>) divided by the reference scale (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">ℏ</mi></mrow><annotation encoding="application/x-tex">\hbar</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6889em"></span><span class="mord">ℏ</span></span></span></span>). This ratio is exactly the <strong>angle through which the arrow has turned (the phase angle)</strong>, measured in radians. Because <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">ℏ</mi></mrow><annotation encoding="application/x-tex">\hbar</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6889em"></span><span class="mord">ℏ</span></span></span></span> is so tiny, even a hair's-breadth change in the path, and therefore in its action, makes the arrow swing around wildly.</li>
</ul>
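<p>To get a feel for how wildly the arrow swings, the sketch below converts a tiny difference in action into a phase difference. It assumes SI units and ħ ≈ 1.054571817 × 10<sup>-34</sup> J·s. The action difference of 10<sup>-30</sup> J·s is a hypothetical number chosen only for scale. Even this minute difference turns the arrow by roughly 9,480 radians, which is about 1,500 full turns.</p>

```python
import math

# Reduced Planck constant in J*s (approximate value; SI units assumed).
HBAR = 1.054571817e-34


def phase_angle(action):
    """Phase in radians of the arrow e^{iS/hbar} for an action S given in J*s."""
    return action / HBAR


# Hypothetical: two paths whose actions differ by only 1e-30 J*s.
delta_action = 1.0e-30
delta_phase = phase_angle(delta_action)  # extra rotation of the arrow, in radians
turns = delta_phase / (2 * math.pi)      # the same rotation counted in full turns
print(f"phase difference: {delta_phase:.1f} rad = {turns:.1f} full turns")
```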
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="4-파동함수wavefunction-psix-t-와-중첩superposition">4) The Wavefunction <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>ψ</mi><mo stretchy="false">(</mo><mi>x</mi><mo separator="true">,</mo><mi>t</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\psi(x, t)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.0359em">ψ</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal">t</span><span class="mclose">)</span></span></span></span> and Superposition<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/path-integral#4-%ED%8C%8C%EB%8F%99%ED%95%A8%EC%88%98wavefunction-psix-t-%EC%99%80-%EC%A4%91%EC%B2%A9superposition" class="hash-link" aria-label="4-파동함수wavefunction-psix-t-와-중첩superposition에 대한 직접 링크" title="4-파동함수wavefunction-psix-t-와-중첩superposition에 대한 직접 링크" translate="no">​</a></h3>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>ψ</mi><mo stretchy="false">(</mo><mi>x</mi><mo separator="true">,</mo><mi>t</mi><mo stretchy="false">)</mo><mo>=</mo><munder><mo>∑</mo><mtext>All&nbsp;Paths</mtext></munder><msup><mi>e</mi><mrow><mi>i</mi><mi>S</mi><mi mathvariant="normal">/</mi><mi mathvariant="normal">ℏ</mi></mrow></msup></mrow><annotation encoding="application/x-tex">\psi(x, t) = \sum_{\text{All Paths}} e^{iS/\hbar}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.0359em">ψ</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal">t</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:2.3521em;vertical-align:-1.3021em"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.05em"><span style="top:-1.8479em;margin-left:0em"><span class="pstrut" style="height:3.05em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord text mtight"><span class="mord mtight">All&nbsp;Paths</span></span></span></span></span><span style="top:-3.05em"><span class="pstrut" style="height:3.05em"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.3021em"><span></span></span></span></span></span><span class="mspace" 
style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">e</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.938em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mord mathnormal mtight" style="margin-right:0.0576em">S</span><span class="mord mtight">/ℏ</span></span></span></span></span></span></span></span></span></span></span></span></span>
<ul>
<li class="">The set of paths from the starting point <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>A</mi></mrow><annotation encoding="application/x-tex">A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal">A</span></span></span></span> to the endpoint <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>B</mi></mrow><annotation encoding="application/x-tex">B</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0502em">B</span></span></span></span>, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>P</mi><mi>a</mi><mi>t</mi><mi>h</mi><mi>s</mi><mo stretchy="false">(</mo><mi>A</mi><mo>→</mo><mi>B</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">Paths(A \rightarrow B)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.1389em">P</span><span class="mord mathnormal">a</span><span class="mord mathnormal">t</span><span class="mord mathnormal">h</span><span class="mord mathnormal">s</span><span class="mopen">(</span><span class="mord mathnormal">A</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">→</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.0502em">B</span><span class="mclose">)</span></span></span></span>, contains <strong>infinitely many (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">∞</mi></mrow><annotation encoding="application/x-tex">\infty</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em"></span><span class="mord">∞</span></span></span></span>)</strong> paths.</li>
<li class="">So there are infinitely many arrows <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mi>e</mi><mrow><mi>i</mi><mi>S</mi><mi mathvariant="normal">/</mi><mi mathvariant="normal">ℏ</mi></mrow></msup></mrow><annotation encoding="application/x-tex">e^{iS/\hbar}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.888em"></span><span class="mord"><span class="mord mathnormal">e</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.888em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mord mathnormal mtight" style="margin-right:0.0576em">S</span><span class="mord mtight">/ℏ</span></span></span></span></span></span></span></span></span></span></span></span>, one per path, each pointing at its own angle (phase).</li>
<li class="">Chain all of these probability amplitudes (contributions) head to tail and add them as vectors (<strong>superposition</strong>). The final arrow, drawn from the starting point to where the chain ends, is their <strong>vector sum</strong>, and that sum is the <strong>wavefunction <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>ψ</mi><mo stretchy="false">(</mo><mi>x</mi><mo separator="true">,</mo><mi>t</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\psi(x, t)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.0359em">ψ</span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal">t</span><span class="mclose">)</span></span></span></span></strong> that fully describes the quantum state.</li>
</ul>
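<p>The head-to-tail addition can be tried on a deliberately crude toy model. This is an assumption made for illustration and is not the paper's limiting procedure. A free particle, in units where ħ = m = 1, travels from x = 0 back to x = 0 in time T = 1, and each "path" is two straight segments through one intermediate point y at t = T/2. Adding e<sup>iS/ħ</sup> over values of y near the classical path (y = 0) gives a long resultant arrow. The same number of paths far away (y between 5 and 6) have rapidly turning phases that largely cancel.</p>

```python
import cmath

# Toy model (assumption): free particle, natural units hbar = m = 1,
# start X_A = 0 at t = 0, end X_B = 0 at t = T = 1.
# Each "path" is A -> y -> B, straight segments through one point y at t = T/2.
HBAR, M, T = 1.0, 1.0, 1.0
X_A, X_B = 0.0, 0.0
DT = T / 2


def action(y):
    """Free-particle action of the two straight segments: sum of m*dx^2 / (2*dt)."""
    return M * (y - X_A) ** 2 / (2 * DT) + M * (X_B - y) ** 2 / (2 * DT)


def arrow_sum(y_start, y_stop, step=0.01):
    """Add e^{iS/hbar} head to tail for evenly spaced intermediate points y."""
    n = round((y_stop - y_start) / step)
    return sum((cmath.exp(1j * action(y_start + k * step) / HBAR)
                for k in range(n + 1)), 0j)


near = arrow_sum(-0.5, 0.5)  # paths close to the classical straight line (y = 0)
far = arrow_sum(5.0, 6.0)    # paths far from it: phases spin quickly and cancel
print(f"|near| = {abs(near):.1f}, |far| = {abs(far):.1f}")
```

<p>Both windows contain 101 paths. Only the paths near the classical one line up, which is the arrow-cancellation idea from the previous subsection at work.</p>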
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="1-introduction-서론">1. Introduction<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/path-integral#1-introduction-%EC%84%9C%EB%A1%A0" class="hash-link" aria-label="1. Introduction (서론)에 대한 직접 링크" title="1. Introduction (서론)에 대한 직접 링크" translate="no">​</a></h2>
<p>This section covers the early history of quantum mechanics and the background and purpose of the 'third formulation' that the paper proposes.</p>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="1-양자역학의-두-가지-초기-공식화">1) The Two Early Formulations of Quantum Mechanics<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/path-integral#1-%EC%96%91%EC%9E%90%EC%97%AD%ED%95%99%EC%9D%98-%EB%91%90-%EA%B0%80%EC%A7%80-%EC%B4%88%EA%B8%B0-%EA%B3%B5%EC%8B%9D%ED%99%94" class="hash-link" aria-label="1) 양자역학의 두 가지 초기 공식화에 대한 직접 링크" title="1) 양자역학의 두 가지 초기 공식화에 대한 직접 링크" translate="no">​</a></h3>
<p>Modern quantum mechanics started out in two forms that looked mathematically completely different.</p>
<ul>
<li class=""><strong>Schrödinger</strong>: a differential equation, from the wave point of view</li>
<li class=""><strong>Heisenberg</strong>: matrix mechanics (matrix algebra), from the particle point of view</li>
</ul>
<p>The two approaches looked different on the surface, but they were proved to give mathematically identical results and were later unified by Paul Dirac's transformation theory.</p>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="2-세-번째-공식화-경로적분path-integral의-탄생">2) A Third Formulation: The Birth of the Path Integral<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/path-integral#2-%EC%84%B8-%EB%B2%88%EC%A7%B8-%EA%B3%B5%EC%8B%9D%ED%99%94-%EA%B2%BD%EB%A1%9C%EC%A0%81%EB%B6%84path-integral%EC%9D%98-%ED%83%84%EC%83%9D" class="hash-link" aria-label="2) 세 번째 공식화: 경로적분(Path Integral)의 탄생에 대한 직접 링크" title="2) 세 번째 공식화: 경로적분(Path Integral)의 탄생에 대한 직접 링크" translate="no">​</a></h3>
<p>The paper presents a <strong>third formulation</strong> of non-relativistic quantum mechanics.</p>
<ul>
<li class=""><strong>Dirac's hint</strong>: the formulation was inspired by Dirac's remarks on the relationship between the classical action (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>S</mi></mrow><annotation encoding="application/x-tex">S</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0576em">S</span></span></span></span>) and quantum mechanics.</li>
<li class=""><strong>A wider viewpoint</strong>: instead of only asking for the probability that the particle is at a particular position (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi></mrow><annotation encoding="application/x-tex">x</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em"></span><span class="mord mathnormal">x</span></span></span></span>) at a particular time (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>t</mi></mrow><annotation encoding="application/x-tex">t</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6151em"></span><span class="mord mathnormal">t</span></span></span></span>), it <strong>assigns a probability amplitude to the entire motion</strong>, the whole trajectory the particle follows over time.</li>
</ul>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="3-굳이-새로운-공식을-만든-4가지-이유">3) Four Reasons for a New Formulation<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/path-integral#3-%EA%B5%B3%EC%9D%B4-%EC%83%88%EB%A1%9C%EC%9A%B4-%EA%B3%B5%EC%8B%9D%EC%9D%84-%EB%A7%8C%EB%93%A0-4%EA%B0%80%EC%A7%80-%EC%9D%B4%EC%9C%A0" class="hash-link" aria-label="3) 굳이 새로운 공식을 만든 4가지 이유에 대한 직접 링크" title="3) 굳이 새로운 공식을 만든 4가지 이유에 대한 직접 링크" translate="no">​</a></h3>
<p>Because the new formulation gives results mathematically identical to those of the existing theories (Schrödinger's and Heisenberg's), it produces no fundamentally new results. The paper nonetheless proposes it, for the following reasons.</p>
<ol>
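<li class=""><strong>The pleasure of a new viewpoint</strong>: seeing a familiar truth from an entirely new angle is a pleasure in itself.</li>
<li class=""><strong>Simplifying interacting systems</strong>: when two systems A and B interact, the new approach lets you eliminate the coordinates of B mathematically and account for B by modifying only the equations for A, which makes the calculation much easier.</li>
<li class=""><strong>Application to quantum electrodynamics (QED)</strong>: using this advantage, the oscillator coordinates of the field can be eliminated from the equations of quantum electrodynamics.</li>
<li class=""><strong>A step toward future physics</strong>: the hope that this entirely new viewpoint will one day help break through the limits of current physics and inspire new theories that account for present experimental results.</li>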
</ol>]]></content:encoded>
            <category>논문</category>
            <category>물리</category>
            <category>양자역학</category>
        </item>
        <item>
            <title><![CDATA[[Study] Transformer Language Model Structure]]></title>
            <link>https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure</link>
            <guid>https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure</guid>
            <pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[1. How AI Processes Text Mathematically]]></description>
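            <content:encoded><![CDATA[<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="1-인공지능이-텍스트를-처리하는-수학적-접근">1. How AI Processes Text Mathematically<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#1-%EC%9D%B8%EA%B3%B5%EC%A7%80%EB%8A%A5%EC%9D%B4-%ED%85%8D%EC%8A%A4%ED%8A%B8%EB%A5%BC-%EC%B2%98%EB%A6%AC%ED%95%98%EB%8A%94-%EC%88%98%ED%95%99%EC%A0%81-%EC%A0%91%EA%B7%BC" class="hash-link" aria-label="1. 인공지능이 텍스트를 처리하는 수학적 접근에 대한 직접 링크" title="1. 인공지능이 텍스트를 처리하는 수학적 접근에 대한 직접 링크" translate="no">​</a></h2>
<p>An AI model does not intuitively grasp the shapes of letters or the meaning of sentences the way people do. The processor inside a computer is a physical device that can only compute with numbers. So before an AI can process a sentence, it must first convert every word in that sentence into an array of numbers.</p>
<p>When a user enters a sentence, the computer splits it into small units called 'tokens'. A token may be a whole word or only part of a word. For each token, the computer looks up a long, pre-trained list of numbers, called an 'embedding' or a 'vector'. An embedding usually holds hundreds to thousands of floating-point numbers. Together they act as coordinates that encode the token's grammatical role and the patterns it shows when used alongside other words.</p>
<p>Turning each word into a fixed list of numbers is still not enough to capture the meaning of the whole sentence, because the words in a sentence influence each other and their meanings keep shifting with context. The core computation the model uses to capture these relationships between words is the 'attention' mechanism.</p>
<p>The attention mechanism looks at all the words (tokens) in a sentence at once and computes a 'score' for how strongly the word currently being processed is connected to every other word in the sentence. So that beginners, including elementary and middle school students, can follow along intuitively, this report explains in careful detail, without analogies and using only numbers, arrays, and the computation steps themselves, the structures and mathematical differences of three variants: multi-head attention (MHA), the most basic form; multi-query attention (MQA), which improves speed; and grouped-query attention (GQA), which strikes a balance between the two.</p>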
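<p>The two steps just described, splitting text into tokens and then looking up an embedding for each one, can be sketched as follows. Everything here is hypothetical. The six-word vocabulary, the whitespace tokenizer, the 4-number embeddings and the random values that stand in for trained numbers are all assumptions. Real tokenizers can split a single word into several subword tokens.</p>

```python
import random

# Hypothetical toy vocabulary; real models use learned subword vocabularies
# with tens of thousands of entries.
VOCAB = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4, "[UNK]": 5}
EMBED_DIM = 4  # real models use hundreds to thousands of numbers per token

random.seed(0)
# Embedding table: one row of EMBED_DIM floats per token id.
# The values are random here; in a trained model they are learned.
EMBEDDINGS = [[random.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in VOCAB]


def tokenize(text):
    """Split on whitespace and map each word to its id (unknown words map to [UNK])."""
    return [VOCAB.get(word, VOCAB["[UNK]"]) for word in text.lower().split()]


def embed(token_ids):
    """Look up the embedding vector for each token id."""
    return [EMBEDDINGS[i] for i in token_ids]


ids = tokenize("The cat sat on the mat")
vectors = embed(ids)
print(ids)  # [0, 1, 2, 3, 0, 4]
```

<p>Both occurrences of "the" get the same id and therefore the same embedding. This is exactly why a fixed embedding alone cannot capture context.</p>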
<hr>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="2-어텐션-연산의-세-가지-핵심-숫자-배열-q-k-v">2. The Three Key Number Arrays of Attention: Q, K, V<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#2-%EC%96%B4%ED%85%90%EC%85%98-%EC%97%B0%EC%82%B0%EC%9D%98-%EC%84%B8-%EA%B0%80%EC%A7%80-%ED%95%B5%EC%8B%AC-%EC%88%AB%EC%9E%90-%EB%B0%B0%EC%97%B4-q-k-v" class="hash-link" aria-label="2. 어텐션 연산의 세 가지 핵심 숫자 배열: Q, K, V에 대한 직접 링크" title="2. 어텐션 연산의 세 가지 핵심 숫자 배열: Q, K, V에 대한 직접 링크" translate="no">​</a></h2>
<p>To compute relationship scores between words, the attention mechanism does not use each word's original number array (its embedding) as is. Instead, the model creates three new number arrays for every word, each by multiplying the embedding by its own learned weight matrix. These three arrays are called the Query (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span>), the Key (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span>), and the Value (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span>). Each of the three plays a separate, independent role in the computation.</p>
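<p>Creating three new arrays per word can be sketched as three vector-matrix products, one per weight matrix. The sizes are assumptions made for illustration: a 4-number embedding is mapped to 3-number Q, K and V arrays. The random weights and the example embedding are also assumptions. In a trained model the three weight matrices are learned.</p>

```python
import random

EMBED_DIM, HEAD_DIM = 4, 3  # toy sizes (assumption)
random.seed(1)


def random_matrix(rows, cols):
    """Stand-in for a learned weight matrix (random here, trained in a real model)."""
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def vec_mat(vec, mat):
    """Row vector (length = rows of mat) times matrix -> vector of length = cols of mat."""
    return [sum(vec[r] * mat[r][c] for r in range(len(vec))) for c in range(len(mat[0]))]


# Three separate weight matrices, one each for Q, K and V.
W_Q = random_matrix(EMBED_DIM, HEAD_DIM)
W_K = random_matrix(EMBED_DIM, HEAD_DIM)
W_V = random_matrix(EMBED_DIM, HEAD_DIM)

x = [0.2, -0.5, 0.9, 0.1]  # hypothetical embedding of one token
q, k, v = vec_mat(x, W_Q), vec_mat(x, W_K), vec_mat(x, W_V)
print("Q:", q)
print("K:", k)
print("V:", v)
```

<p>The same embedding <code>x</code> produces three different arrays only because the three weight matrices differ.</p>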
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="21-쿼리query--q--숫자-배열의-의미와-역할">2.1 Meaning and Role of the Query (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span>) Array<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#21-%EC%BF%BC%EB%A6%ACquery--q--%EC%88%AB%EC%9E%90-%EB%B0%B0%EC%97%B4%EC%9D%98-%EC%9D%98%EB%AF%B8%EC%99%80-%EC%97%AD%ED%95%A0" class="hash-link" aria-label="21-쿼리query--q--숫자-배열의-의미와-역할에 대한 직접 링크" title="21-쿼리query--q--숫자-배열의-의미와-역할에 대한 직접 링크" translate="no">​</a></h3>
<p>The query, written with the symbol <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span>, is the number array that expresses what information the 'reference word' (the word whose relationships the model is currently working out) needs to obtain from the other words.
As the model processes a sentence from left to right, there is always one particular word being processed. That word's <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> array contains values that encode a purpose, for example "I am looking for numbers that mark a grammatical object" or "I am looking for numbers that indicate a time or place". In short, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> is a 'search list' of numbers, prepared to be multiplied with the arrays of the other words in the sentence.</p>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="22-키key--k--숫자-배열의-의미와-역할">2.2 키(Key,  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> ) 숫자 배열의 의미와 역할<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#22-%ED%82%A4key--k--%EC%88%AB%EC%9E%90-%EB%B0%B0%EC%97%B4%EC%9D%98-%EC%9D%98%EB%AF%B8%EC%99%80-%EC%97%AD%ED%95%A0" class="hash-link" aria-label="22-키key--k--숫자-배열의-의미와-역할에 대한 직접 링크" title="22-키key--k--숫자-배열의-의미와-역할에 대한 직접 링크" translate="no">​</a></h3>
<p>The key, written with the mathematical symbol <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span>, is the numeric array in which each word of the sentence describes itself: "what grammatical features and information do I carry?"
If <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> is the array used to search, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> is the array being searched. During the attention computation, the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> array of the reference word is combined with the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> array of every word in the sentence through a dot product: matching elements are multiplied and the products are summed into a single number. A large result means the two words are strongly related; a small or negative result means they are weakly related.
In this way, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> works together with <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> to produce the relevance scores.</p>
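<p>The score computation described above fits in a few lines of plain Python. This is a minimal illustration and does not come from any particular model. The vectors are made-up 3-number examples, and the helper names <code>dot</code> and <code>relevance_scores</code> are hypothetical.</p>

```python
# Minimal sketch: relevance scores of one query against several keys.
# All vectors are made-up illustration values, not taken from a real model.


def dot(a, b):
    """Dot product: multiply matching elements and add the products."""
    return sum(x * y for x, y in zip(a, b))


def relevance_scores(q, keys):
    """Score the query q against every key; a larger score means more related."""
    return [dot(q, k) for k in keys]


q = [1.0, 0.0, 2.0]         # query of the reference word (hypothetical)
keys = [
    [1.0, 0.0, 2.0],        # same direction as q: large positive score
    [0.0, 1.0, 0.0],        # unrelated direction: score of 0
    [-1.0, 0.0, -2.0],      # opposite direction: negative score
]
print(relevance_scores(q, keys))  # [5.0, 0.0, -5.0]
```

<p>A key that points in the same direction as the query gets a large positive score. An unrelated key gets 0, and an opposing key gets a negative score. This matches the description above.</p>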
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="23-밸류value--v--숫자-배열의-의미와-역할">2.3 The value (Value, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span>) array: what it means and what it does<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#23-%EB%B0%B8%EB%A5%98value--v--%EC%88%AB%EC%9E%90-%EB%B0%B0%EC%97%B4%EC%9D%98-%EC%9D%98%EB%AF%B8%EC%99%80-%EC%97%AD%ED%95%A0" class="hash-link" aria-label="23-밸류value--v--숫자-배열의-의미와-역할에 대한 직접 링크" title="23-밸류value--v--숫자-배열의-의미와-역할에 대한 직접 링크" translate="no">​</a></h3>
<p>The value, written with the mathematical symbol <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span>, is the numeric array that holds the "actual payload." This is the information the word will pass on to the next step of the computation.
As described above, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> are multiplied to compute a relevance score between two words. Once the scores are computed, they are normalized into weights (ratios) that sum to 1, and each word's <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> array is multiplied by its weight. If a word is highly relevant to the reference word, the numbers in its <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> array are carried over almost unchanged.
If the relevance score is low, the numbers in <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> are instead multiplied by a very small fraction and shrink to a negligible size. So <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> serve only to compute the scores. What moves on to the next step is the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> arrays, weighted by those scores and summed.</p>
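<p>The score-weighted mixing of the value arrays can be sketched as follows. The sketch assumes the usual softmax normalization to turn scores into weights that sum to 1. For brevity, it skips the scaling of scores by the square root of the key dimension that the original Transformer applies. All numbers and the function names <code>softmax</code> and <code>mix_values</code> are illustrative assumptions.</p>

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_values(scores, values):
    """Return the softmax weights and the weighted sum of the value vectors."""
    weights = softmax(scores)
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for i, x in enumerate(v):
            out[i] += w * x                # each V is scaled by its weight, then summed
    return weights, out


scores = [5.0, 0.0, -5.0]                  # e.g. Q-K scores for three words (hypothetical)
values = [[10.0, 0.0], [0.0, 10.0], [-10.0, -10.0]]  # hypothetical V vectors
weights, out = mix_values(scores, values)
print([round(w, 3) for w in weights])      # [0.993, 0.007, 0.0]
print([round(x, 2) for x in out])          # [9.93, 0.07]
```

<p>With scores of 5, 0 and -5, almost all of the weight (about 0.993) goes to the first word. The mixed output therefore ends up close to that word's value vector, and the third word's value is effectively ignored.</p>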
<hr>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="3-기호--w_q-w_k-w_v-의-의미-숫자-배열을-변환하는-가중치-행렬">3. What the symbols <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mi>q</mi></msub><mo separator="true">,</mo><msub><mi>W</mi><mi>k</mi></msub><mo separator="true">,</mo><msub><mi>W</mi><mi>v</mi></msub></mrow><annotation encoding="application/x-tex">W_q, W_k, W_v</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9694em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">q</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">v</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> mean: weight matrices that transform the arrays<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#3-%EA%B8%B0%ED%98%B8--w_q-w_k-w_v-%EC%9D%98-%EC%9D%98%EB%AF%B8-%EC%88%AB%EC%9E%90-%EB%B0%B0%EC%97%B4%EC%9D%84-%EB%B3%80%ED%99%98%ED%95%98%EB%8A%94-%EA%B0%80%EC%A4%91%EC%B9%98-%ED%96%89%EB%A0%AC" class="hash-link" aria-label="3-기호--w_q-w_k-w_v-의-의미-숫자-배열을-변환하는-가중치-행렬에 대한 직접 링크" title="3-기호--w_q-w_k-w_v-의-의미-숫자-배열을-변환하는-가중치-행렬에 대한 직접 링크" translate="no">​</a></h2>
<p>How does the model derive three different arrays, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi><mo separator="true">,</mo><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">Q, K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span>, from a word's original numeric array (its embedding)? The mathematical tool that performs this transformation is the weight matrix, written with the symbol <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>W</mi></mrow><annotation encoding="application/x-tex">W</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.1389em">W</span></span></span></span>.</p>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="31-가중치-행렬-w-이란-무엇인가">3.1 What is a weight matrix (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>W</mi></mrow><annotation encoding="application/x-tex">W</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.1389em">W</span></span></span></span>)?<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#31-%EA%B0%80%EC%A4%91%EC%B9%98-%ED%96%89%EB%A0%AC-w-%EC%9D%B4%EB%9E%80-%EB%AC%B4%EC%97%87%EC%9D%B8%EA%B0%80" class="hash-link" aria-label="31-가중치-행렬-w-이란-무엇인가에 대한 직접 링크" title="31-가중치-행렬-w-이란-무엇인가에 대한 직접 링크" translate="no">​</a></h3>
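<p>A weight matrix is a large grid of numbers laid out in rows and columns. While the model is trained on a large body of text, the numbers in this grid are not fixed. At every training step, the training algorithm nudges each value slightly up or down so that the model's output moves closer to the correct answer. Numbers that are adjusted this way are called learnable parameters. Once training is finished, they stay fixed and are used as-is during inference.</p>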
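<p>To make "nudged slightly up or down" concrete, here is a toy sketch of a single learnable number trained with plain gradient descent. The goal is for <code>w * x</code> to match targets generated by <code>y = 2 * x</code>. Real models typically update a huge number of such values at once, using more elaborate optimizers. The data, learning rate and the function name <code>train_single_weight</code> are illustrative assumptions.</p>

```python
# Toy illustration, not how any specific model is trained: a single learnable
# number w is nudged step by step so that w * x matches the targets y = 2 * x.


def train_single_weight(xs, ys, lr=0.05, steps=100):
    w = 0.0                                # arbitrary starting value
    for _ in range(steps):
        # Gradient of the mean squared error (w*x - y)^2 with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad                     # move w a little against the gradient
    return w


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                       # generated by y = 2 * x
print(round(train_single_weight(xs, ys), 4))  # 2.0
```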
<p>A standard attention layer has three independent weight matrices:</p>
<ul>
<li class=""><strong><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mi>q</mi></msub></mrow><annotation encoding="application/x-tex">W_q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9694em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">q</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span></span></span></span></strong>: the grid used to turn the input word's numbers into the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> (query) array.</li>
<li class=""><strong><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mi>k</mi></msub></mrow><annotation encoding="application/x-tex">W_k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span></strong>: the grid used to turn the input word's numbers into the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> (key) array.</li>
<li class=""><strong><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mi>v</mi></msub></mrow><annotation encoding="application/x-tex">W_v</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">v</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span></strong>: the grid used to turn the input word's numbers into the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> (value) array.</li>
</ul>
<p>In this notation, the capital letter <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>W</mi></mrow><annotation encoding="application/x-tex">W</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.1389em">W</span></span></span></span> marks a weight matrix, that is, a grid of numbers. The subscripts <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>q</mi><mo separator="true">,</mo><mi>k</mi><mo separator="true">,</mo><mi>v</mi></mrow><annotation encoding="application/x-tex">q, k, v</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0359em">q</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.0315em">k</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.0359em">v</span></span></span></span> state whether the output of that matrix becomes the query, the key or the value.</p>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="32-행렬-변환의-수학적-공식">3.2 The formulas for the matrix transformation<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#32-%ED%96%89%EB%A0%AC-%EB%B3%80%ED%99%98%EC%9D%98-%EC%88%98%ED%95%99%EC%A0%81-%EA%B3%B5%EC%8B%9D" class="hash-link" aria-label="3.2 행렬 변환의 수학적 공식에 대한 직접 링크" title="3.2 행렬 변환의 수학적 공식에 대한 직접 링크" translate="no">​</a></h3>
<p>Call the words' initial input arrays <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>X</mi></mrow><annotation encoding="application/x-tex">X</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0785em">X</span></span></span></span>, with one row per word. The formulas that turn <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>X</mi></mrow><annotation encoding="application/x-tex">X</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0785em">X</span></span></span></span> into <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi><mo separator="true">,</mo><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">Q, K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> are:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>Q</mi><mo>=</mo><mi>X</mi><mo>⋅</mo><msub><mi>W</mi><mi>q</mi></msub></mrow><annotation encoding="application/x-tex">Q = X \cdot W_q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0785em">X</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">⋅</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.9694em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">q</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span></span></span></span></span>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>K</mi><mo>=</mo><mi>X</mi><mo>⋅</mo><msub><mi>W</mi><mi>k</mi></msub></mrow><annotation encoding="application/x-tex">K = X \cdot W_k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0785em">X</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">⋅</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span></span>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>V</mi><mo>=</mo><mi>X</mi><mo>⋅</mo><msub><mi>W</mi><mi>v</mi></msub></mrow><annotation encoding="application/x-tex">V = X \cdot W_v</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0785em">X</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">⋅</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">v</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span></span>
<p>Here the centered dot (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>⋅</mo></mrow><annotation encoding="application/x-tex">\cdot</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4445em"></span><span class="mord">⋅</span></span></span></span>) denotes matrix multiplication. Each number in the result comes from one row of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>X</mi></mrow><annotation encoding="application/x-tex">X</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0785em">X</span></span></span></span> and one column of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>W</mi></mrow><annotation encoding="application/x-tex">W</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.1389em">W</span></span></span></span>: their matching elements are multiplied in order and all of the products are added together. Each step is simple arithmetic; the cost comes from the sheer number of these multiply-and-add steps.</p>
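<p>The row-times-column rule can be written out directly. The following is a minimal, unoptimized sketch in plain Python. The function name <code>matmul</code> and the example numbers are hypothetical. Practical inference engines rely on heavily optimized kernels rather than loops like these.</p>

```python
def matmul(A, B):
    """Multiply an n x m matrix A by an m x p matrix B (both stored as lists of rows)."""
    n, m, p = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("columns of A must equal rows of B")
    # Entry (i, j): multiply row i of A with column j of B element by element, then sum.
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


X = [[1, 2, 3]]          # one word with 3 numbers (hypothetical)
W = [[1, 0],
     [0, 1],
     [1, 1]]             # a 3 x 2 weight matrix (hypothetical)
print(matmul(X, W))      # [[4, 5]]
```

<p>Each output number needs one multiply-and-add per column of X. Multiplying an n×m matrix by an m×p matrix therefore takes n·m·p multiplications in total.</p>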
<p>All three start from the same input <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>X</mi></mrow><annotation encoding="application/x-tex">X</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0785em">X</span></span></span></span>. However, the matrices <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mi>q</mi></msub><mo separator="true">,</mo><msub><mi>W</mi><mi>k</mi></msub><mo separator="true">,</mo><msub><mi>W</mi><mi>v</mi></msub></mrow><annotation encoding="application/x-tex">W_q, W_k, W_v</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9694em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">q</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">v</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> contain different values. As a result, the arrays <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi><mo separator="true">,</mo><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">Q, K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> produced by the multiplication also end up with different values.
Through this process, a single word takes on three mathematical roles: what it is searching for (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span>), how it can be identified (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span>) and the actual information it carries (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span>).</p>
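<p>The sketch below applies three different weight matrices to the same input to show that Q, K and V come out different. All numbers are hypothetical and were picked only for readability. This includes the two input words and the three 4×2 matrices, none of which come from a trained model. The helper <code>matmul</code> is likewise an illustrative name.</p>

```python
# Minimal sketch: the same input X projected by three different weight matrices.
# Every number here is a hypothetical illustration value.


def matmul(A, B):
    """Multiply matrices stored as lists of rows: each row of A times each column of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


X = [[1, 2, 0, 1],   # word 1, 4 numbers
     [0, 1, 1, 0]]   # word 2, 4 numbers

# Three different 4 x 2 weight matrices (made-up values).
W_q = [[1, 0], [0, 1], [1, 0], [0, 1]]
W_k = [[0, 1], [1, 0], [0, 1], [1, 0]]
W_v = [[1, 1], [0, 0], [2, 0], [0, 3]]

Q = matmul(X, W_q)
K = matmul(X, W_k)
V = matmul(X, W_v)
print(Q)  # [[1, 3], [1, 1]]
print(K)  # [[3, 1], [1, 1]]
print(V)  # [[1, 4], [2, 0]]
```

<p>The same rows of X yield three different results, one per matrix.</p>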
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="33-변환-과정의-구체적인-숫자-계산-예시">3.3 A worked numerical example of the transformation<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#33-%EB%B3%80%ED%99%98-%EA%B3%BC%EC%A0%95%EC%9D%98-%EA%B5%AC%EC%B2%B4%EC%A0%81%EC%9D%B8-%EC%88%AB%EC%9E%90-%EA%B3%84%EC%82%B0-%EC%98%88%EC%8B%9C" class="hash-link" aria-label="3.3 변환 과정의 구체적인 숫자 계산 예시에 대한 직접 링크" title="3.3 변환 과정의 구체적인 숫자 계산 예시에 대한 직접 링크" translate="no">​</a></h3>
<p>Let's look at how the multiplication above works with concrete numbers.
Suppose a sentence has three words and each word (one row of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>X</mi></mrow><annotation encoding="application/x-tex">X</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0785em">X</span></span></span></span>) is represented by 4 numbers.</p>
<ul>
<li class="">First word (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>x</mi><mn>1</mn></msub></mrow><annotation encoding="application/x-tex">x_1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span>) = <code>[1, 0, 1, 0]</code></li>
<li class="">Second word (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>x</mi><mn>2</mn></msub></mrow><annotation encoding="application/x-tex">x_2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span>) = <code>[0, 1, 0, 1]</code></li>
<li class="">Third word (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>x</mi><mn>3</mn></msub></mrow><annotation encoding="application/x-tex">x_3</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">3</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span>) = <code>[1, 1, 0, 0]</code></li>
</ul>
<p>To turn each of these words into a query (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span>) array of 2 numbers, we prepare a table of weights called <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mi>q</mi></msub></mrow><annotation encoding="application/x-tex">W_q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9694em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">q</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span></span></span></span>. This table has 4 rows (one for each of the 4 numbers in a word) and 2 columns (one for each of the 2 numbers in the resulting query):</p>
<ul>
<li class="">Row 1: <code>[1, 0]</code></li>
<li class="">Row 2: <code>[0, 1]</code></li>
<li class="">Row 3: <code>[1, 0]</code></li>
<li class="">Row 4: <code>[0, 1]</code></li>
</ul>
<p>To compute the query vector of the first word <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>x</mi><mn>1</mn></msub></mrow><annotation encoding="application/x-tex">x_1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span>, we calculate <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>x</mi><mn>1</mn></msub><mo>⋅</mo><msub><mi>W</mi><mi>q</mi></msub></mrow><annotation encoding="application/x-tex">x_1 \cdot W_q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5945em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">⋅</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.9694em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">q</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span></span></span></span>. Following the rule of matrix multiplication, the numbers of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>x</mi><mn>1</mn></msub></mrow><annotation encoding="application/x-tex">x_1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> are multiplied one by one with the numbers in a column of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mi>q</mi></msub></mrow><annotation encoding="application/x-tex">W_q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9694em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">q</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span></span></span></span>, and the products are added up; each of the 2 columns yields one number of the result.</p>
<ul>
<li class=""><strong>First output value</strong> (first column): (1 × 1) + (0 × 0) + (1 × 1) + (0 × 0) = <strong>2</strong></li>
<li class=""><strong>Second output value</strong> (second column): (1 × 0) + (0 × 1) + (1 × 0) + (0 × 1) = <strong>0</strong></li>
</ul>
<p>So the first word, originally <code>[1, 0, 1, 0]</code>, is transformed by the multiplication with <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mi>q</mi></msub></mrow><annotation encoding="application/x-tex">W_q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9694em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">q</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span></span></span></span> into the new query (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span>) array <code>[2, 0]</code>.</p>
<p>The computer applies exactly the same multiply-and-add procedure with the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mi>k</mi></msub></mrow><annotation encoding="application/x-tex">W_k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mi>v</mi></msub></mrow><annotation encoding="application/x-tex">W_v</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">v</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> tables to build the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> arrays for every word; in a real model with long inputs and large matrices, this adds up to an enormous number of multiplications. In this example, the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mi>q</mi></msub></mrow><annotation encoding="application/x-tex">W_q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9694em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">q</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span></span></span></span> and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mi>k</mi></msub></mrow><annotation encoding="application/x-tex">W_k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> matrices also reduce the dimension: each word goes in as 4 numbers and comes out as only 2, which shrinks the total amount of data the computer has to process in the following steps.</p>
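<p>The hand calculation above can be double-checked with a short script. This is a minimal sketch in plain Python (no external libraries) that reuses the word vectors and the <code>W_q</code> table from the example; the helper <code>matmul</code> is our own illustration, not part of any library. It also computes the queries of the second and third words, which the text does not work out by hand.</p>

```python
# Word vectors from the example: 3 words, 4 numbers each (3 x 4)
X = [
    [1, 0, 1, 0],  # x1
    [0, 1, 0, 1],  # x2
    [1, 1, 0, 0],  # x3
]

# Weight table W_q from the example: 4 rows, 2 columns (4 x 2)
W_q = [
    [1, 0],
    [0, 1],
    [1, 0],
    [0, 1],
]


def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix stored as nested lists."""
    cols = list(zip(*b))  # the columns of b
    # Entry (i, j): multiply row i of a with column j of b position by position, then sum
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


Q = matmul(X, W_q)
for i, q in enumerate(Q, start=1):
    print(f"q{i} = {q}")
# q1 = [2, 0]
# q2 = [0, 2]
# q3 = [1, 1]
```

<p>Because every row of <code>X</code> goes through the same <code>W_q</code>, stacking the words into one matrix lets a single matrix product produce all the queries at once; the key and value tables would be applied the same way.</p>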
<hr>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="4-어텐션-점수를-계산하는-수학-공식-단계별-해설">4. A Step-by-Step Walkthrough of the Attention Score Formula<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#4-%EC%96%B4%ED%85%90%EC%85%98-%EC%A0%90%EC%88%98%EB%A5%BC-%EA%B3%84%EC%82%B0%ED%95%98%EB%8A%94-%EC%88%98%ED%95%99-%EA%B3%B5%EC%8B%9D-%EB%8B%A8%EA%B3%84%EB%B3%84-%ED%95%B4%EC%84%A4" class="hash-link" aria-label="Direct link to 4. A Step-by-Step Walkthrough of the Attention Score Formula" title="Direct link to 4. A Step-by-Step Walkthrough of the Attention Score Formula" translate="no">​</a></h2>
<p>Once the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi><mo separator="true">,</mo><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">Q, K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> arrays are ready for every word, the computer finally evaluates the formula that measures how strongly each word is related to every other word and combines their information accordingly. The formula is:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>A</mi><mi>t</mi><mi>t</mi><mi>e</mi><mi>n</mi><mi>t</mi><mi>i</mi><mi>o</mi><mi>n</mi><mo stretchy="false">(</mo><mi>Q</mi><mo separator="true">,</mo><mi>K</mi><mo separator="true">,</mo><mi>V</mi><mo stretchy="false">)</mo><mo>=</mo><mi>s</mi><mi>o</mi><mi>f</mi><mi>t</mi><mi>m</mi><mi>a</mi><mi>x</mi><mrow><mo fence="true">(</mo><mfrac><mrow><mi>Q</mi><msup><mi>K</mi><mi>T</mi></msup></mrow><msqrt><msub><mi>d</mi><mi>k</mi></msub></msqrt></mfrac><mo fence="true">)</mo></mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">Attention(Q, K, V) = softmax\left(\frac{Q K^T}{\sqrt{d_k}}\right) V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal">A</span><span class="mord mathnormal">tt</span><span class="mord mathnormal">e</span><span class="mord mathnormal">n</span><span class="mord mathnormal">t</span><span class="mord mathnormal">i</span><span class="mord mathnormal">o</span><span class="mord mathnormal">n</span><span class="mopen">(</span><span class="mord mathnormal">Q</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:2.4684em;vertical-align:-0.95em"></span><span class="mord mathnormal">so</span><span class="mord mathnormal" 
style="margin-right:0.1076em">f</span><span class="mord mathnormal">t</span><span class="mord mathnormal">ma</span><span class="mord mathnormal">x</span><span class="mspace" style="margin-right:0.1667em"></span><span class="minner"><span class="mopen delimcenter" style="top:0em"><span class="delimsizing size3">(</span></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.5183em"><span style="top:-2.2528em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8572em"><span class="svg-align" style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord" style="padding-left:0.833em"><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span><span style="top:-2.8172em"><span class="pstrut" style="height:3em"></span><span class="hide-tail" style="min-width:0.853em;height:1.08em"><svg xmlns="http://www.w3.org/2000/svg" width="400em" height="1.08em" viewBox="0 0 400000 1080" preserveAspectRatio="xMinYMin slice"><path d="M95,702
c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14
c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54
c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10
s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429
c69,-144,104.5,-217.7,106.5,-221
l0 -0
c5.3,-9.3,12,-14,20,-14
H400000v40H845.2724
s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7
c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z
M834 80h400000v40h-400000z"></path></svg></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.1828em"><span></span></span></span></span></span></span></span><span style="top:-3.23em"><span class="pstrut" style="height:3em"></span><span class="frac-line" style="border-bottom-width:0.04em"></span></span><span style="top:-3.677em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal">Q</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.1389em">T</span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.93em"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mclose delimcenter" style="top:0em"><span class="delimsizing size3">)</span></span></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span></span>
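<p>This complicated-looking formula actually breaks down into four sequential steps: a dot product, scaling, a softmax, and a weighted sum with <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span>. Let's go through each step in order.</p>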
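<p>To see how the four steps chain together, here is a rough sketch in plain Python that follows the formula literally. The queries are the ones from the worked example, while the keys and values are hypothetical numbers chosen only for illustration, and the function names <code>softmax</code> and <code>attention</code> are our own, not an existing library API.</p>

```python
import math


def softmax(row):
    """Turn a list of scores into positive weights that sum to 1."""
    m = max(row)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, computed one step at a time."""
    d_k = len(K[0])  # number of entries in each key vector
    # Step 1: dot product of every query with every key (an n x n table)
    scores = [[sum(a * b for a, b in zip(q, k)) for k in K] for q in Q]
    # Step 2: divide every score by sqrt(d_k)
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    # Step 3: softmax each row so the weights for one query sum to 1
    weights = [softmax(row) for row in scaled]
    # Step 4: weighted sum of the value vectors, one output row per query
    return [
        [sum(w * v[j] for w, v in zip(w_row, V)) for j in range(len(V[0]))]
        for w_row in weights
    ]


Q = [[2, 0], [0, 2], [1, 1]]  # queries from the worked example
K = [[2, 0], [0, 2], [1, 1]]  # hypothetical keys (set equal to Q for simplicity)
V = [[1, 0], [0, 1], [1, 1]]  # hypothetical values

for row in attention(Q, K, V):
    print([round(x, 3) for x in row])
```

<p>Subtracting the row maximum inside <code>softmax</code> does not change the result mathematically; it only keeps <code>math.exp</code> from overflowing when the scores are large.</p>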
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="41-1단계-내적-dot-product--q-cdot-kt--계산">4.1 Step 1: Computing the Dot Product (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi><mo>⋅</mo><msup><mi>K</mi><mi>T</mi></msup></mrow><annotation encoding="application/x-tex">Q \cdot K^T</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">⋅</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.8413em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.1389em">T</span></span></span></span></span></span></span></span></span></span></span>)<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#41-1%EB%8B%A8%EA%B3%84-%EB%82%B4%EC%A0%81-dot-product--q-cdot-kt--%EA%B3%84%EC%82%B0" class="hash-link" aria-label="Direct link to 4.1 Step 1: Computing the Dot Product (Q · K^T)" title="Direct link to 4.1 Step 1: Computing the Dot Product (Q · K^T)" translate="no">​</a></h3>
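<p>First, we compute the numerator inside the parentheses, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi><mo>⋅</mo><msup><mi>K</mi><mi>T</mi></msup></mrow><annotation encoding="application/x-tex">Q \cdot K^T</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">⋅</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.8413em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.1389em">T</span></span></span></span></span></span></span></span></span></span></span>, which means multiplying the query matrix by the key matrix. The capital <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>T</mi></mrow><annotation encoding="application/x-tex">T</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.1389em">T</span></span></span></span> stands for the mathematical operation called transposition (Transpose), which flips a table of numbers so that its rows become columns and its columns become rows.
This step is required so that the multiplication lines up. In our example, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> both have 3 rows (one per word) of 2 numbers each, so they cannot be multiplied as they are. Once <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> is transposed into 2 rows and 3 columns, the product produces a 3 × 3 table holding one score for every pair of words.</p>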
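<p>The shape argument can be made concrete in a few lines. In this sketch, <code>Q</code> is the 3 × 2 query matrix from the worked example and <code>K</code> is a hypothetical 3 × 2 key matrix; <code>transpose</code>, <code>shape</code> and <code>matmul</code> are our own small helpers. Multiplying <code>Q</code> by <code>K</code> directly fails because the inner sizes differ, while multiplying by the transposed <code>K</code> yields a 3 × 3 score table.</p>

```python
def transpose(m):
    """Swap rows and columns: an (n x d) table becomes (d x n)."""
    return [list(col) for col in zip(*m)]


def shape(m):
    """(number of rows, number of columns) of a nested-list matrix."""
    return (len(m), len(m[0]))


def matmul(a, b):
    # A product is only defined when a has as many columns as b has rows
    if shape(a)[1] != shape(b)[0]:
        raise ValueError(f"cannot multiply {shape(a)} by {shape(b)}")
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


Q = [[2, 0], [0, 2], [1, 1]]  # 3 words x 2 numbers, from the worked example
K = [[1, 0], [0, 1], [1, 1]]  # hypothetical keys, also 3 x 2

print(shape(K), "becomes", shape(transpose(K)))  # (3, 2) becomes (2, 3)

try:
    matmul(Q, K)  # (3, 2) by (3, 2): inner sizes 2 and 3 differ
except ValueError as err:
    print("Q K fails:", err)

scores = matmul(Q, transpose(K))  # (3, 2) by (2, 3) gives (3, 3)
print(scores)  # [[2, 0, 2], [0, 2, 2], [1, 1, 2]]
```

<p>Row <code>i</code>, column <code>j</code> of <code>scores</code> is the raw score between word <code>i</code>'s query and word <code>j</code>'s key, so each row lists how one word relates to every word in the sentence.</p>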
<p>With the shapes aligned, the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> array of the current word is multiplied with the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> array of each word in the sentence, including itself: the numbers at the same position are multiplied and the products are added up. This is called the dot product.
If the numbers at matching positions are both large and share the same sign, the result is a large positive number. If they have opposite signs, or one of them is 0, the result is small or even negative. A large value therefore becomes the first raw relevance score (Raw Score), indicating that the model sees a strong grammatical or semantic connection between the two words.</p>
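<p>A tiny example of how the sign and size of the dot product behave. The query is the first word's <code>[2, 0]</code> from the worked example, while the three keys are made-up vectors; <code>dot</code> is our own helper that multiplies matching positions and adds the products.</p>

```python
def dot(a, b):
    """Multiply the numbers at the same position and add up the products."""
    return sum(x * y for x, y in zip(a, b))


q = [2, 0]  # query of the first word from the worked example

print(dot(q, [3, 0]))   # 6: large values with the same sign at the same position
print(dot(q, [0, 5]))   # 0: wherever q is nonzero, this key is 0
print(dot(q, [-1, 0]))  # -2: opposite signs give a negative score
```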
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="42-2단계-스케일링-scaling--sqrtd_k-로-나누기">4.2 Step 2: Scaling (Dividing by <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msqrt><msub><mi>d</mi><mi>k</mi></msub></msqrt></mrow><annotation encoding="application/x-tex">\sqrt{d_k}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.04em;vertical-align:-0.1828em"></span><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8572em"><span class="svg-align" style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord" style="padding-left:0.833em"><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span><span style="top:-2.8172em"><span class="pstrut" style="height:3em"></span><span class="hide-tail" style="min-width:0.853em;height:1.08em"><svg xmlns="http://www.w3.org/2000/svg" width="400em" height="1.08em" viewBox="0 0 400000 1080" preserveAspectRatio="xMinYMin slice"><path d="M95,702
c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14
c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54
c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10
s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429
c69,-144,104.5,-217.7,106.5,-221
l0 -0
c5.3,-9.3,12,-14,20,-14
H400000v40H845.2724
s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7
c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z
M834 80h400000v40h-400000z"></path></svg></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.1828em"><span></span></span></span></span></span></span></span></span>)<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#42-2%EB%8B%A8%EA%B3%84-%EC%8A%A4%EC%BC%80%EC%9D%BC%EB%A7%81-scaling--sqrtd_k-%EB%A1%9C-%EB%82%98%EB%88%84%EA%B8%B0" class="hash-link" aria-label="Direct link to 4.2 Step 2: Scaling (Dividing by √d_k)" title="Direct link to 4.2 Step 2: Scaling (Dividing by √d_k)" translate="no">​</a></h3>
<p>The raw scores produced in Step 1 are then divided by the term beneath them in the formula,  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msqrt><msub><mi>d</mi><mi>k</mi></msub></msqrt></mrow><annotation encoding="application/x-tex">\sqrt{d_k}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.04em;vertical-align:-0.1828em"></span><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8572em"><span class="svg-align" style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord" style="padding-left:0.833em"><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span><span style="top:-2.8172em"><span class="pstrut" style="height:3em"></span><span class="hide-tail" style="min-width:0.853em;height:1.08em"><svg xmlns="http://www.w3.org/2000/svg" width="400em" height="1.08em" viewBox="0 0 400000 1080" preserveAspectRatio="xMinYMin slice"><path d="M95,702
c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14
c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54
c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10
s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429
c69,-144,104.5,-217.7,106.5,-221
l0 -0
c5.3,-9.3,12,-14,20,-14
H400000v40H845.2724
s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7
c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z
M834 80h400000v40h-400000z"></path></svg></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.1828em"><span></span></span></span></span></span></span></span></span>.
Here  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>d</mi><mi>k</mi></msub></mrow><annotation encoding="application/x-tex">d_k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> is the number of values in a key ( <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> ) array, that is, its length. 
If each  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span>  array holds 64 numbers, then  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>d</mi><mi>k</mi></msub></mrow><annotation encoding="application/x-tex">d_k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> is 64. 
The symbol  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msqrt><mrow></mrow></msqrt></mrow><annotation encoding="application/x-tex">\sqrt{}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.04em;vertical-align:-0.2395em"></span><span class="mord sqrt"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8005em"><span class="svg-align" style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord" style="padding-left:0.833em"></span></span><span style="top:-2.7605em"><span class="pstrut" style="height:3em"></span><span class="hide-tail" style="min-width:0.853em;height:1.08em"><svg xmlns="http://www.w3.org/2000/svg" width="400em" height="1.08em" viewBox="0 0 400000 1080" preserveAspectRatio="xMinYMin slice"><path d="M95,702
c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14
c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54
c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10
s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429
c69,-144,104.5,-217.7,106.5,-221
l0 -0
c5.3,-9.3,12,-14,20,-14
H400000v40H845.2724
s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7
c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z
M834 80h400000v40h-400000z"></path></svg></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2395em"><span></span></span></span></span></span></span></span></span> denotes the square root. The square root of 64 is 8, so in that case every score from Step 1 is divided by 8.</p>
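<p>As a minimal sketch (plain Python, no particular framework assumed), Steps 1 and 2 for one query look like this: each raw dot-product score is divided by the square root of the array length d_k. The function name <code>scaled_scores</code> is illustrative, not taken from any library.</p>

```python
import math


def scaled_scores(query: list[float], keys: list[list[float]]) -> list[float]:
    """Step 1 (dot product) followed by Step 2 (division by sqrt(d_k))."""
    d_k = len(query)          # number of values in each query/key array
    scale = math.sqrt(d_k)    # e.g. sqrt(64) = 8.0
    scores = []
    for key in keys:
        raw = sum(q * k for q, k in zip(query, key))  # Step 1: raw score
        scores.append(raw / scale)                     # Step 2: scaled score
    return scores


q = [1.0] * 64
k = [1.0] * 64
print(scaled_scores(q, [k]))  # raw score 64.0 divided by 8.0 gives [8.0]
```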
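<p>This division is there for numerical stability. The dot product in Step 1 multiplies many pairs of numbers and adds up the results, so the longer the arrays are, the larger the raw scores tend to become. If the scores are too large, the next step (softmax) saturates: almost all of the weight piles onto a single word, and during training the gradients flowing through softmax become vanishingly small, which slows learning. In low-precision formats such as 16-bit floating point, the arithmetic inside softmax can even overflow. Dividing every score by the same constant is a safeguard that brings the scores back to a moderate range before softmax is applied.</p>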
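<p>The growth of the raw scores can be checked numerically. The sketch below assumes, as in the original Transformer paper's argument, that the entries of q and k are independent with mean 0 and variance 1; under that assumption the dot product has variance d_k, and dividing by √d_k brings it back to about 1. The helper name is hypothetical, and the printed values are random estimates rather than exact numbers.</p>

```python
import random
import statistics


def dot_product_variance(d_k: int, samples: int = 2000, seed: int = 0) -> tuple[float, float]:
    """Estimate the variance of q.k for random arrays whose entries have mean 0 and
    variance 1, before and after dividing by sqrt(d_k)."""
    rng = random.Random(seed)
    raw = []
    for _ in range(samples):
        q = [rng.gauss(0.0, 1.0) for _ in range(d_k)]
        k = [rng.gauss(0.0, 1.0) for _ in range(d_k)]
        raw.append(sum(a * b for a, b in zip(q, k)))
    scaled = [r / d_k ** 0.5 for r in raw]
    return statistics.pvariance(raw), statistics.pvariance(scaled)


for d_k in (16, 64, 256):
    raw_var, scaled_var = dot_product_variance(d_k)
    print(f"d_k={d_k:3d}  raw variance ~ {raw_var:6.1f}  scaled variance ~ {scaled_var:.2f}")
```

<p>The raw variance grows roughly in step with d_k, while the scaled variance stays near 1 regardless of the array length.</p>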
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="43-3단계-소프트맥스softmax-함수-적용">4.3 Step 3: Applying the Softmax Function<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#43-3%EB%8B%A8%EA%B3%84-%EC%86%8C%ED%94%84%ED%8A%B8%EB%A7%A5%EC%8A%A4softmax-%ED%95%A8%EC%88%98-%EC%A0%81%EC%9A%A9" class="hash-link" aria-label="4.3 3단계: 소프트맥스(Softmax) 함수 적용에 대한 직접 링크" title="4.3 3단계: 소프트맥스(Softmax) 함수 적용에 대한 직접 링크" translate="no">​</a></h3>
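<p>Once the scores have been scaled down, a mathematical function called softmax is applied to them.
Softmax takes a list of arbitrary real numbers (negative, positive or zero) and turns every one of them into a positive number between 0 and 1. Its most important property is that the outputs always add up to exactly 1.0, which corresponds to 100%.</p>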
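<p>Concretely, softmax exponentiates every score and divides by the sum of all the exponentials: softmax(x_i) = e^(x_i) / Σ_j e^(x_j). A minimal Python sketch follows; subtracting the largest score first is a common implementation trick that leaves the result unchanged but keeps <code>math.exp</code> from overflowing.</p>

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn arbitrary real scores into positive weights that sum to 1."""
    m = max(scores)                           # shift so the largest score becomes 0;
    exps = [math.exp(s - m) for s in scores]  # the shift cancels out in the division
    total = sum(exps)
    return [e / total for e in exps]


weights = softmax([-1.0, 0.0, 2.5])
print(weights)       # every weight lies between 0 and 1, largest for the largest score
print(sum(weights))  # 1.0, up to floating-point rounding
```

<p>Without the shift, <code>softmax([1000.0, 1000.0])</code> would call <code>math.exp(1000.0)</code>, which raises <code>OverflowError</code>; with it, the result is simply [0.5, 0.5].</p>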
<p>In other words, this step turns plain scores into percentages. Suppose the current word has three neighbouring words A, B and C; after softmax, A might receive 0.1 (10%), B 0.2 (20%) and C 0.7 (70%). The model can then decide in exact numbers that, to understand the current word, it should take 70% of the information from C, 20% from B and 10% from A.</p>
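<p>The 10% / 20% / 70% split can be reproduced, up to floating-point rounding. Because softmax weights are proportional to e^score, scores of 0, ln 2 and ln 7 give the ratio 1 : 2 : 7, which normalizes to 0.1, 0.2 and 0.7. The scores are hypothetical, chosen only to produce those numbers.</p>

```python
import math


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scaled scores for the neighbouring words A, B and C.
# exp(0) : exp(ln 2) : exp(ln 7) = 1 : 2 : 7, and 1 + 2 + 7 = 10,
# so the weights are 1/10, 2/10 and 7/10.
scores = {"A": 0.0, "B": math.log(2), "C": math.log(7)}
weights = dict(zip(scores, softmax(list(scores.values()))))
for word, w in weights.items():
    print(f"{word}: {w:.1f}")  # A: 0.1, B: 0.2, C: 0.7
```

<p>The example also shows a useful property: raising a score by ln 2 (about 0.69) doubles that word's weight relative to the others.</p>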
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="44-4단계-밸류-v--배열-곱하기">4.4 Step 4: Multiplying by the Value ( <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> ) Arrays<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#44-4%EB%8B%A8%EA%B3%84-%EB%B0%B8%EB%A5%98-v--%EB%B0%B0%EC%97%B4-%EA%B3%B1%ED%95%98%EA%B8%B0" class="hash-link" aria-label="44-4단계-밸류-v--배열-곱하기에 대한 직접 링크" title="44-4단계-밸류-v--배열-곱하기에 대한 직접 링크" translate="no">​</a></h3>
<p>Finally, each percentage weight from Step 3 is multiplied by the  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> (value) array of the corresponding word.
In the running example C's weight is 0.7 (70%), so every number in C's  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span>  array is multiplied by 0.7. B's  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span>  array is multiplied by 0.2, and A's  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span>  array by 0.1. The multiplied arrays are then added together, number by number, into a single array.</p>
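<p>Step 4 is a weighted sum. A minimal sketch with made-up two-number V arrays for A, B and C (real models use arrays with dozens or hundreds of entries); the function name is illustrative:</p>

```python
def weighted_sum(weights: list[float], values: list[list[float]]) -> list[float]:
    """Step 4: scale each V array by its weight, then add the arrays number by number."""
    dim = len(values[0])
    out = [0.0] * dim
    for w, v in zip(weights, values):
        for i in range(dim):
            out[i] += w * v[i]
    return out


# Hypothetical two-number V arrays for words A, B and C.
V_A = [1.0, 0.0]
V_B = [0.0, 1.0]
V_C = [1.0, 1.0]
print(weighted_sum([0.1, 0.2, 0.7], [V_A, V_B, V_C]))  # approximately [0.8, 0.9]
```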
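<p>The result of this addition is a single new array of numbers. It blends the key information gathered from the surrounding words in proportion to how relevant each word is. This final array, the output of the attention formula, is passed on to the model's next computation stage.</p>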
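<p>Putting Steps 1 to 4 together gives the scaled dot-product attention formula softmax(QK<sup>T</sup> / √d_k)·V. The sketch below computes it for a single query in plain Python; it is a teaching version that favors readability over speed, and the function name is illustrative.</p>

```python
import math


def attention(query: list[float], keys: list[list[float]],
              values: list[list[float]]) -> list[float]:
    """softmax(q . K^T / sqrt(d_k)) . V for a single query array."""
    d_k = len(query)
    # Steps 1-2: dot product with every key, divided by sqrt(d_k)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    # Step 3: softmax (largest score subtracted for numerical safety)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Step 4: weighted sum of the value arrays
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


out = attention(query=[1.0, 0.0],
                keys=[[1.0, 0.0], [0.0, 1.0]],
                values=[[10.0, 0.0], [0.0, 10.0]])
print(out)  # leans toward the first value array, because the query matches the first key
```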
<hr>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="5-다중-헤드-어텐션mha-다양한-시각으로-문장-분석하기">5. Multi-Head Attention (MHA): Analyzing a Sentence from Multiple Perspectives<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#5-%EB%8B%A4%EC%A4%91-%ED%97%A4%EB%93%9C-%EC%96%B4%ED%85%90%EC%85%98mha-%EB%8B%A4%EC%96%91%ED%95%9C-%EC%8B%9C%EA%B0%81%EC%9C%BC%EB%A1%9C-%EB%AC%B8%EC%9E%A5-%EB%B6%84%EC%84%9D%ED%95%98%EA%B8%B0" class="hash-link" aria-label="5. 다중 헤드 어텐션(MHA): 다양한 시각으로 문장 분석하기에 대한 직접 링크" title="5. 다중 헤드 어텐션(MHA): 다양한 시각으로 문장 분석하기에 대한 직접 링크" translate="no">​</a></h2>
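<p>Performing the mathematical process described above exactly once is called single-head attention. This approach has a serious weakness: a single word can play several roles in a sentence at the same time. For example, a word may be in a subject-verb relationship with the word before it while also being in a temporal-order relationship with the word after it. A single round of score computation produces only one set of weights for each word, so it struggles to capture several such relationships at once and tends to blur them together.
The standard structure designed to solve this problem is multi-head attention (MHA).</p>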
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="51-다중-헤드-어텐션의-병렬-계산-구조">5.1 Parallel Computation in Multi-Head Attention<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#51-%EB%8B%A4%EC%A4%91-%ED%97%A4%EB%93%9C-%EC%96%B4%ED%85%90%EC%85%98%EC%9D%98-%EB%B3%91%EB%A0%AC-%EA%B3%84%EC%82%B0-%EA%B5%AC%EC%A1%B0" class="hash-link" aria-label="5.1 다중 헤드 어텐션의 병렬 계산 구조에 대한 직접 링크" title="5.1 다중 헤드 어텐션의 병렬 계산 구조에 대한 직접 링크" translate="no">​</a></h3>
<p>MHA performs the  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi><mo separator="true">,</mo><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">Q, K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span>  transformation and the attention formula described above not once but several times at the same time, in parallel. Each of these independent computations is called a head.</p>
<p>If a model is designed to use 32 heads, it contains 32  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mi>q</mi></msub></mrow><annotation encoding="application/x-tex">W_q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9694em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">q</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span></span></span></span>  matrices, 32  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mi>k</mi></msub></mrow><annotation encoding="application/x-tex">W_k</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0315em">k</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span>  matrices and 32  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>W</mi><mi>v</mi></msub></mrow><annotation encoding="application/x-tex">W_v</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:-0.1389em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight" style="margin-right:0.0359em">v</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span>  matrices, each completely independent of the others. 
During training, the numbers in the first head's  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>W</mi></mrow><annotation encoding="application/x-tex">W</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.1389em">W</span></span></span></span>  matrices might, for example, be tuned to specialize in finding subject-object relationships, while those in the second head's  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>W</mi></mrow><annotation encoding="application/x-tex">W</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.1389em">W</span></span></span></span>  matrices specialize in emotional relationships. These roles are not assigned by hand; they emerge from training, and the examples here are only illustrative.</p>
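<p>In many implementations the per-head matrices are stored together as one large matrix and sliced per head, but the parameter count is the same. A quick count under a hypothetical configuration (model width d_model = 4096 split across 32 heads, so each head works with 128 numbers):</p>

```python
d_model = 4096               # hypothetical model width
n_heads = 32
d_head = d_model // n_heads  # 128 numbers per head

# Each head owns its own W_q, W_k and W_v, each of shape (d_model x d_head).
params_per_head = 3 * d_model * d_head
total = n_heads * params_per_head

print(d_head)                          # 128
print(params_per_head)                 # 1572864
print(total)                           # 50331648
print(total == 3 * d_model * d_model)  # True: same as three d_model x d_model matrices
```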
<p>Each of the 32 heads builds its own  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi><mo separator="true">,</mo><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">Q, K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span>  arrays and independently performs the dot product, scaling, softmax and weighted sum of the values. When all the computations are finished, there are 32 output arrays. The model concatenates these 32 short arrays end to end into one long array, then multiplies it by one more weight matrix (the output projection), which mixes the heads' results into the final output. Because MHA captures many aspects of a sentence at the same time, it generally gives the best quality and accuracy of the attention structures discussed here.</p>
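<p>The whole pipeline (per-head projections, per-head attention, concatenation, output projection) can be sketched in plain Python with toy sizes. Everything here is illustrative: the sizes are hypothetical, the weights are random rather than trained, and the causal mask that text-generating models apply is omitted for brevity.</p>

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix, both stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def single_head(Q, K, V):
    """Scaled dot-product attention for one head (no causal mask, for brevity)."""
    d_k = len(Q[0])
    out = []
    for q in Q:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K])
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out


def multi_head_attention(X, W_q, W_k, W_v, W_o):
    """X: (seq_len x d_model). W_q, W_k, W_v: one (d_model x d_head) matrix per head.
    W_o: (n_heads * d_head x d_model) output projection."""
    heads = [single_head(matmul(X, wq), matmul(X, wk), matmul(X, wv))
             for wq, wk, wv in zip(W_q, W_k, W_v)]
    # Concatenate the heads' outputs for each position into one long row.
    concat = [[x for head in heads for x in head[pos]] for pos in range(len(X))]
    return matmul(concat, W_o)


def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy sizes: 5 tokens, d_model = 8, 4 heads of 2 numbers each.
rng = random.Random(0)
seq_len, d_model, n_heads = 5, 8, 4
d_head = d_model // n_heads
X = random_matrix(rng, seq_len, d_model)
W_q = [random_matrix(rng, d_model, d_head) for _ in range(n_heads)]
W_k = [random_matrix(rng, d_model, d_head) for _ in range(n_heads)]
W_v = [random_matrix(rng, d_model, d_head) for _ in range(n_heads)]
W_o = random_matrix(rng, n_heads * d_head, d_model)
Y = multi_head_attention(X, W_q, W_k, W_v, W_o)
print(len(Y), len(Y[0]))  # 5 8: one d_model-sized output row per token
```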
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="52-mha의-치명적-한계-메모리-대역폭memory-bandwidth-병목-현상">5.2 The Critical Limitation of MHA: The Memory Bandwidth Bottleneck<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#52-mha%EC%9D%98-%EC%B9%98%EB%AA%85%EC%A0%81-%ED%95%9C%EA%B3%84-%EB%A9%94%EB%AA%A8%EB%A6%AC-%EB%8C%80%EC%97%AD%ED%8F%ADmemory-bandwidth-%EB%B3%91%EB%AA%A9-%ED%98%84%EC%83%81" class="hash-link" aria-label="5.2 MHA의 치명적 한계: 메모리 대역폭(Memory Bandwidth) 병목 현상에 대한 직접 링크" title="5.2 MHA의 치명적 한계: 메모리 대역폭(Memory Bandwidth) 병목 현상에 대한 직접 링크" translate="no">​</a></h3>
<p>MHA excels at capturing the subtle nuances of context, but during inference, when the model generates text to answer a user, it causes a serious hardware problem: a memory bandwidth bottleneck.</p>
<p>A model does not produce a sentence all at once; it generates it one token (roughly, one word) at a time, in order. To generate the tenth word, it needs the  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span>  and  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span>  arrays of the nine words that came before it. Recomputing the past words'  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span>  and  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span>  from scratch every time a new word is generated would be wasteful, so the model keeps the  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span>  arrays of every generated word in memory, adding to them step by step. This storage area is called the KV cache (key-value cache).</p>
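<p>A minimal sketch of a KV cache for one head of one layer: each generation step computes K and V only for the new token, appends them, and then attends over everything stored so far. The <code>project</code> function is a hypothetical stand-in for the learned weight matrices, and the token ids are arbitrary.</p>

```python
import math


class KVCache:
    """K and V arrays of every token generated so far (one head of one layer)."""

    def __init__(self) -> None:
        self.keys: list[list[float]] = []
        self.values: list[list[float]] = []

    def append(self, k: list[float], v: list[float]) -> None:
        self.keys.append(k)
        self.values.append(v)

    def attend(self, q: list[float]) -> list[float]:
        """Attention of the newest token's query over every cached K/V; nothing is recomputed."""
        d_k = len(q)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in self.keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        dim = len(self.values[0])
        return [sum(e / total * v[i] for e, v in zip(exps, self.values)) for i in range(dim)]


def project(token_id: int, which: int, dim: int = 4) -> list[float]:
    """Hypothetical stand-in for the learned W_q / W_k / W_v projections."""
    return [math.sin(token_id * (i + 1) + which) for i in range(dim)]


cache = KVCache()
for token_id in [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]:  # hypothetical token ids
    q, k, v = (project(token_id, which) for which in range(3))
    cache.append(k, v)     # the new token's K and V are computed once and stored
    out = cache.attend(q)  # attend to this token and all earlier ones
print(len(cache.keys))     # 10: one cached K (and one V) per generated token
```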
<p>The problem is that in MHA every word produces as many  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span>  and  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span>  arrays as there are heads. With 32 heads, each newly generated word adds 32  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span>  arrays and 32  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span>  arrays to the KV cache.</p>
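<p>The cache size is simple arithmetic: 2 (K and V) × tokens × layers × heads × numbers per head × bytes per number. Real models stack many attention layers, each with its own cache. The configuration below (32 layers, 32 heads of 128 numbers, 16-bit values) is hypothetical:</p>

```python
def kv_cache_bytes(n_tokens: int, n_layers: int, n_heads: int, d_head: int,
                   bytes_per_value: int = 2) -> int:
    """K and V (hence the factor 2) for every token, layer and head."""
    return 2 * n_tokens * n_layers * n_heads * d_head * bytes_per_value


# Hypothetical MHA model: 32 layers, 32 heads of 128 numbers each, 16-bit values.
per_token = kv_cache_bytes(1, n_layers=32, n_heads=32, d_head=128)
print(per_token // 1024, "KiB per token")                                 # 512 KiB per token
print(kv_cache_bytes(4096, 32, 32, 128) / 2**30, "GiB for 4096 tokens")  # 2.0 GiB
```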
<p>To compute the next word, the processor (the computing unit) must pull all of these stored numbers from memory into the processor. Memory bandwidth is the physical limit on how much data can be moved from memory to the processor per unit of time. In MHA the KV cache grows with every token in the sequence, so the amount of data that must be moved on each step keeps increasing until it runs into that limit. The processor could finish the arithmetic itself quickly, yet it has to sit idle until the huge  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span>  arrays arrive from memory. As a result, responses become much slower, and for large-scale services the hardware cost of serving the same number of users rises sharply.</p>
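<p>A rough lower bound on the time spent just reading the cache for each new token is cache size divided by memory bandwidth. The bandwidth figures below are hypothetical round numbers, and the estimate ignores the model weights, which also have to be read on every step:</p>

```python
def kv_read_time_ms(cache_bytes: float, bandwidth_gb_per_s: float) -> float:
    """Lower bound on the time needed to stream the KV cache from memory once, in ms."""
    return cache_bytes / (bandwidth_gb_per_s * 1e9) * 1e3


cache_bytes = 2 * 2**30            # hypothetical: 2 GiB of cached K/V arrays
for bandwidth in (50, 400, 2000):  # hypothetical bandwidths in GB/s
    ms = kv_read_time_ms(cache_bytes, bandwidth)
    print(f"{bandwidth:5d} GB/s: at least {ms:5.1f} ms per generated token")
```

<p>However fast the arithmetic units are, no token can be produced faster than this reading time, which is why shrinking the cache matters.</p>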
<hr>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="6-다중-쿼리-어텐션mqa-속도-극대화와-메모리-다이어트-구조">6. Multi-Query Attention (MQA): Maximizing Speed by Putting Memory on a Diet<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#6-%EB%8B%A4%EC%A4%91-%EC%BF%BC%EB%A6%AC-%EC%96%B4%ED%85%90%EC%85%98mqa-%EC%86%8D%EB%8F%84-%EA%B7%B9%EB%8C%80%ED%99%94%EC%99%80-%EB%A9%94%EB%AA%A8%EB%A6%AC-%EB%8B%A4%EC%9D%B4%EC%96%B4%ED%8A%B8-%EA%B5%AC%EC%A1%B0" class="hash-link" aria-label="6. 다중 쿼리 어텐션(MQA): 속도 극대화와 메모리 다이어트 구조에 대한 직접 링크" title="6. 다중 쿼리 어텐션(MQA): 속도 극대화와 메모리 다이어트 구조에 대한 직접 링크" translate="no">​</a></h2>
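<p>To tackle the huge KV cache and the bandwidth overrun caused by MHA, researchers devised a more radical structure: multi-query attention (MQA). The main goal of MQA is to drastically reduce the amount of data stored in the KV cache, and with it the time the processor spends waiting on memory.</p>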
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="61-mqa의-구조-키-k-와-밸류-v-의-단일화-및-공유">6.1 MQA Structure: A Single Shared Key ( <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> ) and Value ( <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> )<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#61-mqa%EC%9D%98-%EA%B5%AC%EC%A1%B0-%ED%82%A4-k-%EC%99%80-%EB%B0%B8%EB%A5%98-v-%EC%9D%98-%EB%8B%A8%EC%9D%BC%ED%99%94-%EB%B0%8F-%EA%B3%B5%EC%9C%A0" class="hash-link" aria-label="61-mqa의-구조-키-k-와-밸류-v-의-단일화-및-공유에 대한 직접 링크" title="61-mqa의-구조-키-k-와-밸류-v-의-단일화-및-공유에 대한 직접 링크" translate="no">​</a></h3>
<p>In MHA, 32 query ( <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> ) heads are matched by 32 independent key ( <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> ) heads and 32 independent value ( <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> ) heads.
MQA, by contrast, keeps all 32 query ( <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> ) heads but cuts the number of key ( <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> ) and value ( <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> ) heads down to exactly one. 
In other words, the 32  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span>  heads, each searching for something different, all share the same single  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span>  array and the same single  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span>  array when computing their scores.</p>
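<p>The saving already shows up in the projection weights. Under a hypothetical configuration (d_model = 4096, 32 query heads of 128 numbers), the K and V projections shrink from one pair per head to a single shared pair, while the query side is unchanged:</p>

```python
d_model, n_heads, d_head = 4096, 32, 128  # hypothetical configuration

# MHA: each of the 32 heads has its own W_k and W_v, each (d_model x d_head).
mha_kv_params = 2 * n_heads * d_model * d_head
# MQA: one W_k and one W_v shared by all 32 query heads.
mqa_kv_params = 2 * 1 * d_model * d_head

# Which K/V head each query head reads from.
mha_mapping = list(range(n_heads))  # head h uses its own K/V head h
mqa_mapping = [0] * n_heads         # every head uses the single shared K/V head 0

print(mha_kv_params, mqa_kv_params, mha_kv_params // mqa_kv_params)  # 33554432 1048576 32
```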
<p>In the dot-product step, the first  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span>  array is multiplied with this single  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span>  array, the second  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span>  array is multiplied with the same  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span>  array, and so on, up to the 32nd  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span>  array, which is multiplied with exactly the same  <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span>  array.</p>
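<p>The shared-K/V dot product can be sketched as follows, with toy numbers chosen for illustration; the function name is hypothetical. The loop over query heads reuses the same <code>shared_keys</code> and <code>shared_values</code>, and only the query differs from head to head, so only one K and one V per token has to be cached.</p>

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mqa_attend(queries, shared_keys, shared_values):
    """queries: one array per query head for the current token.
    shared_keys / shared_values: one K and one V array per cached token,
    shared by every query head."""
    d_k = len(shared_keys[0])
    outputs = []
    for q in queries:  # every head reads the same K and V; only q differs
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in shared_keys])
        outputs.append([sum(wi * v[j] for wi, v in zip(w, shared_values))
                        for j in range(len(shared_values[0]))])
    return outputs


# Hypothetical tiny example: 3 query heads, 2 cached tokens, d_k = 2.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
queries = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
for head, out in enumerate(mqa_attend(queries, keys, values)):
    print(head, [round(x, 2) for x in out])
```

<p>Head 0's query matches the first key, so its output leans toward the first value array (about [9.44, 0.56]); head 1 leans toward the second; head 2 has an all-zero query, gives both tokens equal weight, and returns the average [5.0, 5.0].</p>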
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="62-kv-캐시-축소가-가져오는-속도-향상">6.2 How a Smaller KV Cache Speeds Things Up<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#62-kv-%EC%BA%90%EC%8B%9C-%EC%B6%95%EC%86%8C%EA%B0%80-%EA%B0%80%EC%A0%B8%EC%98%A4%EB%8A%94-%EC%86%8D%EB%8F%84-%ED%96%A5%EC%83%81" class="hash-link" aria-label="6.2 KV 캐시 축소가 가져오는 속도 향상에 대한 직접 링크" title="6.2 KV 캐시 축소가 가져오는 속도 향상에 대한 직접 링크" translate="no">​</a></h3>
<p>Sharing a single <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> and a single <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> across all heads greatly reduces the amount of data the KV cache has to hold. For a model with 32 heads, the cache shrinks to 1/32 of its MHA size.</p>
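<p>The 1/32 figure can be checked with a little arithmetic. The sketch below computes the per-sequence KV cache size from the number of K/V heads. The model configuration (32 layers, head dimension 128, a 4096-token context, FP16 storage) is hypothetical and chosen only to make the numbers concrete.</p>

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Bytes needed to cache K and V for one sequence.

    The leading factor 2 counts one K tensor plus one V tensor per layer.
    bytes_per_elem=2 assumes FP16/BF16 storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical config: 32 layers, 32 query heads, head_dim 128, 4096-token context.
mha = kv_cache_bytes(32, num_kv_heads=32, head_dim=128, seq_len=4096)  # one K/V per head
mqa = kv_cache_bytes(32, num_kv_heads=1, head_dim=128, seq_len=4096)   # one shared K/V
print(mha / 2**30, "GiB vs", mqa / 2**20, "MiB")  # prints: 2.0 GiB vs 64.0 MiB
```

Only `num_kv_heads` differs between the two calls, so the ratio is exactly 32 regardless of the other (assumed) parameters.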
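<p>Because the cached data is so much smaller, it can be moved from memory to the processor much faster. In autoregressive decoding, reading the KV cache is a large part of the memory traffic at every step, and that part grows with the context length. Shrinking the cache therefore eases the memory-bandwidth bottleneck. The processor spends less time waiting for data and can generate tokens considerably faster. The smaller memory footprint also leaves room to process more users' requests together (a larger batch size), which is a major advantage for large-scale serving. As a result, MQA is regarded as the fastest of the three structures compared here.</p>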
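<p>The batch-size argument is simple division. The sketch below asks how many sequences' KV caches fit in a fixed memory budget. The 8 GiB budget and the 2 GiB per-sequence MHA cache are hypothetical figures. The calculation also ignores weights and activations, which in practice share the same memory.</p>

```python
GiB = 2**30

def max_batch(memory_budget_bytes, per_sequence_cache_bytes):
    """Number of sequences whose KV caches fit in the budget (weights/activations ignored)."""
    return memory_budget_bytes // per_sequence_cache_bytes

budget = 8 * GiB                  # hypothetical memory left over for the KV cache
mha_per_seq = 2 * GiB             # hypothetical MHA cache per sequence (32 K/V heads)
mqa_per_seq = mha_per_seq // 32   # MQA keeps one shared K/V head -> 1/32 of MHA
print(max_batch(budget, mha_per_seq), max_batch(budget, mqa_per_seq))  # prints: 4 128
```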
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="63-속도를-얻기-위해-지불하는-대가-품질-저하-현상">6.3 The price of speed: degraded quality<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#63-%EC%86%8D%EB%8F%84%EB%A5%BC-%EC%96%BB%EA%B8%B0-%EC%9C%84%ED%95%B4-%EC%A7%80%EB%B6%88%ED%95%98%EB%8A%94-%EB%8C%80%EA%B0%80-%ED%92%88%EC%A7%88-%EC%A0%80%ED%95%98-%ED%98%84%EC%83%81" class="hash-link" aria-label="6.3 속도를 얻기 위해 지불하는 대가: 품질 저하 현상에 대한 직접 링크" title="6.3 속도를 얻기 위해 지불하는 대가: 품질 저하 현상에 대한 직접 링크" translate="no">​</a></h3>
<p>However, while MQA largely solves the memory-transfer problem, it comes with serious side effects: the quality of the generated text can drop, and training can become unstable.</p>
<p>The reason for having multiple <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> heads is to examine a complex context from several different angles. In MQA, however, the model asks 32 different questions (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span>), yet all of them must find their answers in just one <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> array and one <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> array. A single shared array has limited capacity, so it cannot hold every nuance and property of a word without blurring them together.</p>
<p>Because the diversity of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> is restricted, the attention scores of the different heads also become more uniform. The model then struggles to capture subtle grammatical distinctions and relationships that span a long context. As a result, MQA models tend to generate less accurate text and lose some of the model's capacity.</p>
<hr>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="7-그룹화-쿼리-어텐션gqa-구조적-타협을-통한-수학적-최적화">7. Grouped-Query Attention (GQA): optimization through a structural compromise<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#7-%EA%B7%B8%EB%A3%B9%ED%99%94-%EC%BF%BC%EB%A6%AC-%EC%96%B4%ED%85%90%EC%85%98gqa-%EA%B5%AC%EC%A1%B0%EC%A0%81-%ED%83%80%ED%98%91%EC%9D%84-%ED%86%B5%ED%95%9C-%EC%88%98%ED%95%99%EC%A0%81-%EC%B5%9C%EC%A0%81%ED%99%94" class="hash-link" aria-label="7. 그룹화 쿼리 어텐션(GQA): 구조적 타협을 통한 수학적 최적화에 대한 직접 링크" title="7. 그룹화 쿼리 어텐션(GQA): 구조적 타협을 통한 수학적 최적화에 대한 직접 링크" translate="no">​</a></h2>
<p>MHA has the best quality, but it consumes a lot of memory and is slow. MQA is the fastest, but it loses quality. To combine the strengths of these two extremes, researchers developed an intermediate form (an interpolation) between them, called Grouped-Query Attention (GQA).</p>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="71-gqa의-작동-원리-쿼리를-묶어--k-v--공유하기">7.1 How GQA works: grouping queries to share <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span><a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#71-gqa%EC%9D%98-%EC%9E%91%EB%8F%99-%EC%9B%90%EB%A6%AC-%EC%BF%BC%EB%A6%AC%EB%A5%BC-%EB%AC%B6%EC%96%B4--k-v--%EA%B3%B5%EC%9C%A0%ED%95%98%EA%B8%B0" class="hash-link" aria-label="71-gqa의-작동-원리-쿼리를-묶어--k-v--공유하기에 대한 직접 링크" title="71-gqa의-작동-원리-쿼리를-묶어--k-v--공유하기에 대한 직접 링크" translate="no">​</a></h3>
<p>GQA neither cuts the number of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> heads down to one (as in MQA) nor keeps as many of them as there are <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> heads (as in MHA). Instead, it picks a suitable number in between and splits the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> heads into several groups.</p>
<p>The computation proceeds as follows:</p>
<ol>
<li class=""><strong>Projection:</strong> As before, the input token's vector is multiplied by the projection weights to produce the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi><mo separator="true">,</mo><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">Q, K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> head arrays, with fewer K and V heads than Q heads.</li>
<li class=""><strong>Grouping of queries:</strong> The <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> heads are split into groups of equal size.</li>
<li class=""><strong>Grouped key/value:</strong> Each group is assigned exactly one <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> head and one <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> head.</li>
<li class=""><strong>Within-group attention:</strong> The <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> heads in each group ignore the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> of other groups. They compute their dot products and softmax scores using only the single <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> pair assigned to their own group.</li>
<li class=""><strong>Concatenation:</strong> Once every group has finished, the per-head outputs are concatenated into one long vector to form the final result. As in MHA, this vector then passes through the output projection.</li>
</ol>
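<p>The five steps can be sketched for a single decoding step as follows. This is an illustrative plain-Python version, not llm-lite's kernel. It assumes the projections have already been computed (step 1), that query heads are grouped contiguously (head <code>h</code> uses K/V head <code>h // (H/G)</code>), and that causal masking and the output projection are omitted. All names are hypothetical.</p>

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def gqa_decode_step(q_heads, k_cache, v_cache):
    """One decoding step of grouped-query attention (no mask, no output projection).

    q_heads: H query vectors (length d) for the current token
    k_cache: G lists; k_cache[g][t] is the key of K/V head g at position t
    v_cache: G lists; v_cache[g][t] is the value of K/V head g at position t
    Returns the H per-head outputs concatenated into one list of length H*d.
    """
    H, G = len(q_heads), len(k_cache)
    assert H % G == 0, "H must be divisible by G"
    group_size = H // G
    d = len(q_heads[0])
    out = []
    for h, q in enumerate(q_heads):
        g = h // group_size  # steps 2-3: query head h belongs to group g
        # Step 4: score only against this group's shared keys.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_cache[g]]
        weights = softmax(scores)
        # Weighted sum of this group's shared values.
        head_out = [sum(w * v[i] for w, v in zip(weights, v_cache[g])) for i in range(d)]
        out.extend(head_out)  # step 5: concatenate head outputs
    return out

# H = 4 query heads, G = 2 groups, one cached position per group.
q = [[1.0, 0.0]] * 4
k = [[[1.0, 0.0]], [[0.0, 1.0]]]
v = [[[1.0, 2.0]], [[3.0, 4.0]]]
print(gqa_decode_step(q, k, v))  # [1.0, 2.0, 1.0, 2.0, 3.0, 4.0, 3.0, 4.0]
```

With only one cached position, every softmax weight is 1, so heads 0 and 1 return group 0's value and heads 2 and 3 return group 1's value. This makes the sharing easy to see.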
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="72-그룹을-나누는-수학적-규칙과-구조적-확장성">7.2 The grouping rule and how it generalizes<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#72-%EA%B7%B8%EB%A3%B9%EC%9D%84-%EB%82%98%EB%88%84%EB%8A%94-%EC%88%98%ED%95%99%EC%A0%81-%EA%B7%9C%EC%B9%99%EA%B3%BC-%EA%B5%AC%EC%A1%B0%EC%A0%81-%ED%99%95%EC%9E%A5%EC%84%B1" class="hash-link" aria-label="7.2 그룹을 나누는 수학적 규칙과 구조적 확장성에 대한 직접 링크" title="7.2 그룹을 나누는 수학적 규칙과 구조적 확장성에 대한 직접 링크" translate="no">​</a></h3>
<p>How many query heads share each K/V head is determined by a simple division.</p>
<ul>
<li class="">The total number of query (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span>) heads is denoted <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>H</mi></mrow><annotation encoding="application/x-tex">H</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0813em">H</span></span></span></span>.</li>
<li class="">The number of groups is denoted <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>G</mi></mrow><annotation encoding="application/x-tex">G</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal">G</span></span></span></span>.</li>
</ul>
<p>The number of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> heads in each group is the total number of query heads (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>H</mi></mrow><annotation encoding="application/x-tex">H</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0813em">H</span></span></span></span>) divided by the number of groups (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>G</mi></mrow><annotation encoding="application/x-tex">G</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal">G</span></span></span></span>), that is, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><mi>H</mi><mi>G</mi></mfrac></mrow><annotation encoding="application/x-tex">\frac{H}{G}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.2173em;vertical-align:-0.345em"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8723em"><span style="top:-2.655em"><span class="pstrut" style="height:3em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">G</span></span></span></span><span style="top:-3.23em"><span class="pstrut" style="height:3em"></span><span class="frac-line" style="border-bottom-width:0.04em"></span></span><span style="top:-3.394em"><span class="pstrut" style="height:3em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.0813em">H</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.345em"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span>, so <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>H</mi></mrow><annotation encoding="application/x-tex">H</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0813em">H</span></span></span></span> must be divisible by <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>G</mi></mrow><annotation encoding="application/x-tex">G</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal">G</span></span></span></span>. For example, with 32 query heads (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>H</mi><mo>=</mo><mn>32</mn></mrow><annotation encoding="application/x-tex">H=32</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0813em">H</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">32</span></span></span></span>) and 8 groups (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>G</mi><mo>=</mo><mn>8</mn></mrow><annotation encoding="application/x-tex">G=8</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal">G</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">8</span></span></span></span>), each group contains exactly 4 <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> heads. Those 4 query heads share a single <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> pair and compute with it together.</p>
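<p>In code, the division rule becomes an index mapping from each query head to the K/V head it reads. The sketch below assumes contiguous grouping, where query heads 0 to 3 share K/V head 0, heads 4 to 7 share K/V head 1, and so on. This is a common layout, but the function name is hypothetical, and a given model could order its heads differently.</p>

```python
def kv_head_for_query(h, num_q_heads, num_groups):
    """Index of the shared K/V head used by query head h (contiguous grouping assumed)."""
    if num_q_heads % num_groups != 0:
        raise ValueError("H must be divisible by G")
    group_size = num_q_heads // num_groups  # H / G query heads per group
    return h // group_size

H, G = 32, 8
groups = [kv_head_for_query(h, H, G) for h in range(H)]
print(groups[:8])  # [0, 0, 0, 0, 1, 1, 1, 1]
```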
<p>This formula (<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mfrac><mi>H</mi><mi>G</mi></mfrac></mrow><annotation encoding="application/x-tex">\frac{H}{G}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.2173em;vertical-align:-0.345em"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8723em"><span style="top:-2.655em"><span class="pstrut" style="height:3em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">G</span></span></span></span><span style="top:-3.23em"><span class="pstrut" style="height:3em"></span><span class="frac-line" style="border-bottom-width:0.04em"></span></span><span style="top:-3.394em"><span class="pstrut" style="height:3em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.0813em">H</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.345em"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span>) has a useful property: depending on the value chosen for <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>G</mi></mrow><annotation encoding="application/x-tex">G</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal">G</span></span></span></span>, GQA reduces exactly to MHA or to MQA.</p>
<ul>
<li class=""><strong><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>G</mi><mo>=</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">G = 1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal">G</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">1</span></span></span></span>:</strong> All <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> heads are placed in a single group. All 32 queries share one <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> pair, so the structure is exactly multi-query attention (MQA).</li>
<li class=""><strong><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>G</mi><mo>=</mo><mi>H</mi></mrow><annotation encoding="application/x-tex">G = H</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal">G</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0813em">H</span></span></span></span>:</strong> There are as many groups (32) as query heads (32), so each group holds a single query. Every query then has its own independent <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span>, which is exactly multi-head attention (MHA).</li>
</ul>
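<p>The two limiting cases can be checked directly. The sketch below reuses the contiguous-grouping assumption: query head <code>h</code> maps to K/V head <code>h // (H/G)</code>. It shows that <code>G = 1</code> sends every head to one shared K/V head (MQA), while <code>G = H</code> gives each head its own (MHA). The helper name is hypothetical.</p>

```python
def group_assignment(H, G):
    """For each query head, the index of the K/V head it shares (contiguous groups assumed)."""
    assert H % G == 0, "H must be divisible by G"
    return [h // (H // G) for h in range(H)]

H = 32
mqa = group_assignment(H, 1)   # G = 1: every query head maps to K/V head 0
mha = group_assignment(H, H)   # G = H: query head h maps to its own K/V head h
gqa = group_assignment(H, 8)   # in between: 4 query heads per K/V head
assert set(mqa) == {0}
assert mha == list(range(H))
assert [gqa.count(g) for g in range(8)] == [4] * 8
```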
<p>By choosing a value between 1 and the total number of query heads (such as <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>G</mi><mo>=</mo><mn>8</mn></mrow><annotation encoding="application/x-tex">G=8</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal">G</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">8</span></span></span></span>), GQA reaches a middle ground that keeps the advantages of both extremes.</p>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="73-gqa가-메모리와-품질의-균형을-잡는-원리">7.3 How GQA balances memory and quality<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#73-gqa%EA%B0%80-%EB%A9%94%EB%AA%A8%EB%A6%AC%EC%99%80-%ED%92%88%EC%A7%88%EC%9D%98-%EA%B7%A0%ED%98%95%EC%9D%84-%EC%9E%A1%EB%8A%94-%EC%9B%90%EB%A6%AC" class="hash-link" aria-label="7.3 GQA가 메모리와 품질의 균형을 잡는 원리에 대한 직접 링크" title="7.3 GQA가 메모리와 품질의 균형을 잡는 원리에 대한 직접 링크" translate="no">​</a></h3>
<p>GQA's main strength is that it produces output of nearly the same quality as MHA without demanding MHA's memory bandwidth.</p>
<p>MQA has only a single <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> pair. GQA with 8 groups, by contrast, keeps 8 distinct <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> arrays. These 8 arrays offer enough diversity to capture different aspects of a sentence (for example grammar, sentiment, or tense) separately. As a result, GQA largely avoids the quality drop seen with MQA.</p>
<p>At the same time, reducing the number of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> heads from 32 (MHA) to 8 (GQA) shrinks the KV cache to one quarter of its MHA size. This cuts the memory traffic needed to read the cache at each decoding step by the same factor. The memory-bandwidth bottleneck eases, and the processor spends far less time waiting for data. Response speed therefore stays close to that of MQA, the speed-optimized design.</p>
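<p>The 1/4 factor follows from the number of K/V heads alone. The sketch below compares the KV cache bytes that one decoding step must read for MHA, GQA-8, and MQA. The model shape (32 layers, head dimension 128, 4096-token context, FP16) is hypothetical. The estimate counts only KV-cache traffic, although the weights are also read at every step.</p>

```python
def kv_bytes_read_per_step(num_kv_heads, head_dim, context_len, num_layers, bytes_per_elem=2):
    """Bytes of cached K and V that one decoding step reads (attention only, FP16 assumed)."""
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_elem

H = 32
cfg = dict(head_dim=128, context_len=4096, num_layers=32)  # hypothetical model shape
for name, g in [("MHA", H), ("GQA-8", 8), ("MQA", 1)]:
    b = kv_bytes_read_per_step(g, **cfg)
    print(f"{name:6s} {b / 2**20:6.0f} MiB  (1/{H // g} of MHA)")
# MHA 2048 MiB (1/1), GQA-8 512 MiB (1/4), MQA 64 MiB (1/32)
```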
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="74-업트레이닝uptraining-기존-mha-모델을-gqa-모델로-개조하는-방법">7.4 Uptraining: converting an existing MHA model into a GQA model<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#74-%EC%97%85%ED%8A%B8%EB%A0%88%EC%9D%B4%EB%8B%9Duptraining-%EA%B8%B0%EC%A1%B4-mha-%EB%AA%A8%EB%8D%B8%EC%9D%84-gqa-%EB%AA%A8%EB%8D%B8%EB%A1%9C-%EA%B0%9C%EC%A1%B0%ED%95%98%EB%8A%94-%EB%B0%A9%EB%B2%95" class="hash-link" aria-label="7.4 업트레이닝(Uptraining): 기존 MHA 모델을 GQA 모델로 개조하는 방법에 대한 직접 링크" title="7.4 업트레이닝(Uptraining): 기존 MHA 모델을 GQA 모델로 개조하는 방법에 대한 직접 링크" translate="no">​</a></h3>
<p>Once GQA had proven its value, researchers developed a cheap way to convert existing MHA models into GQA models. These models had already been trained at great cost in compute and power, so reusing them was preferable to discarding them. This conversion and retraining process is called uptraining.</p>
<p>To convert an MHA model with 32 heads into a GQA model with 8 groups, the 32 original <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> projection matrices are not simply deleted at random. Instead, the 4 <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span> matrices that will form one group are averaged element-wise (mean-pooling). In other words, the features that were spread across 4 heads are compressed into one averaged matrix, which becomes the group's single shared <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span></span></span></span>. The <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>V</mi></mrow><annotation encoding="application/x-tex">V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> projection matrices are converted the same way.</p>
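<p>The mean-pooling step can be written as a short function. This is an illustrative plain-Python sketch that treats each head's K (or V) projection slice as a small matrix stored as a list of rows. It assumes contiguous grouping, so heads 0 to 3 become group 0, and so on. The function name and toy weights are hypothetical, and real checkpoints store these slices inside larger tensors.</p>

```python
def mean_pool_heads(head_weights, num_groups):
    """Average each group's per-head projection matrices (the uptraining conversion step).

    head_weights: H matrices of identical shape (e.g. per-head W_K slices), as lists of rows
    Returns G matrices; matrix g is the element-wise mean of heads g*(H/G) .. (g+1)*(H/G)-1.
    """
    H = len(head_weights)
    assert H % num_groups == 0, "H must be divisible by G"
    size = H // num_groups
    pooled = []
    for g in range(num_groups):
        members = head_weights[g * size:(g + 1) * size]
        rows, cols = len(members[0]), len(members[0][0])
        pooled.append([[sum(m[r][c] for m in members) / size for c in range(cols)]
                       for r in range(rows)])
    return pooled

# Toy check: H = 4 one-row "matrices", pooled into G = 2 groups of 2 heads.
w_k = [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]], [[7.0, 8.0]]]
print(mean_pool_heads(w_k, 2))  # [[[2.0, 3.0]], [[6.0, 7.0]]]
```

Calling it with <code>num_groups=1</code> gives the MQA-style single average, and <code>num_groups=H</code> returns the heads unchanged.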
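<p>After mean-pooling, the model has the GQA structure. Because the weights were forcibly averaged, however, its performance is temporarily degraded. To correct this, the model is trained for a short additional period. This extra training costs only about 5% of the compute used to pre-train the original model from scratch, and that small budget is enough for the weights to adapt to the new shared structure. MQA models trained from scratch were also prone to instability during training (sudden loss spikes). By contrast, GQA models produced by uptraining were reported in experiments to train stably and reach high quality.</p>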
<hr>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="8-mha-mqa-gqa의-종합-비교-분석">8. MHA, MQA, GQA의 종합 비교 분석<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#8-mha-mqa-gqa%EC%9D%98-%EC%A2%85%ED%95%A9-%EB%B9%84%EA%B5%90-%EB%B6%84%EC%84%9D" class="hash-link" aria-label="8. MHA, MQA, GQA의 종합 비교 분석에 대한 직접 링크" title="8. MHA, MQA, GQA의 종합 비교 분석에 대한 직접 링크" translate="no">​</a></h2>
<p>To make the principles and structural features above easy to compare at a glance, the three attention mechanisms are contrasted side by side.</p>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="81-필요-숫자-행렬-개수-비교-구조적-차이">8.1 Number of Weight Matrices Required (Structural Difference)<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#81-%ED%95%84%EC%9A%94-%EC%88%AB%EC%9E%90-%ED%96%89%EB%A0%AC-%EA%B0%9C%EC%88%98-%EB%B9%84%EA%B5%90-%EA%B5%AC%EC%A1%B0%EC%A0%81-%EC%B0%A8%EC%9D%B4" class="hash-link" aria-label="Direct link to 8.1 Number of Weight Matrices Required (Structural Difference)" title="Direct link to 8.1 Number of Weight Matrices Required (Structural Difference)" translate="no">​</a></h3>
<p>The table below shows how many independent weight matrices a single layer needs under each mechanism, for a model whose number of query ( <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">Q</span></span></span></span> ) heads is fixed at 32. Fewer matrices means a smaller memory footprint. More precisely, the KV cache size depends only on the K and V heads, because queries are never cached during generation: 64 cached heads per layer for MHA, 2 for MQA, and 16 for GQA-8.</p>
<table><thead><tr><th style="text-align:left">Attention mechanism</th><th style="text-align:left">Query (Q) heads</th><th style="text-align:left">Key (K) heads</th><th style="text-align:left">Value (V) heads</th><th style="text-align:left">Total matrices per layer</th></tr></thead><tbody><tr><td style="text-align:left"><strong>Multi-Head Attention (MHA)</strong></td><td style="text-align:left">32</td><td style="text-align:left">32</td><td style="text-align:left">32</td><td style="text-align:left">96</td></tr><tr><td style="text-align:left"><strong>Multi-Query Attention (MQA)</strong></td><td style="text-align:left">32</td><td style="text-align:left">1 (shared)</td><td style="text-align:left">1 (shared)</td><td style="text-align:left">34</td></tr><tr><td style="text-align:left"><strong>Grouped-Query Attention (GQA-8)</strong></td><td style="text-align:left">32</td><td style="text-align:left">8 (shared per group)</td><td style="text-align:left">8 (shared per group)</td><td style="text-align:left">48</td></tr></tbody></table>
<blockquote>
<p><strong>Note:</strong> GQA-8 means the query heads are divided into 8 groups; with 32 query heads, each group holds 4 query heads that share one K/V pair.</p>
</blockquote>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="82-컴퓨터-성능-및-작동-효율성에-미치는-영향-비교">8.2 Impact on Performance and Efficiency<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#82-%EC%BB%B4%ED%93%A8%ED%84%B0-%EC%84%B1%EB%8A%A5-%EB%B0%8F-%EC%9E%91%EB%8F%99-%ED%9A%A8%EC%9C%A8%EC%84%B1%EC%97%90-%EB%AF%B8%EC%B9%98%EB%8A%94-%EC%98%81%ED%96%A5-%EB%B9%84%EA%B5%90" class="hash-link" aria-label="Direct link to 8.2 Impact on Performance and Efficiency" title="Direct link to 8.2 Impact on Performance and Efficiency" translate="no">​</a></h3>
<p>These structural differences have the following direct consequences for memory use, speed, and output quality when the model runs.</p>
<ul>
<li class="">
<p><strong>Multi-Head Attention (MHA):</strong></p>
<ul>
<li class=""><strong>KV cache memory:</strong> The highest of the three. Every token must store separate key and value vectors for all 32 heads in every layer.</li>
<li class=""><strong>Speed bottleneck:</strong> Memory bandwidth saturates. At every decoding step the large <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> cache has to be streamed from memory, so token generation is the slowest of the three.</li>
<li class=""><strong>Output quality:</strong> The best of the three. Because each of the 32 heads has its own independent keys and values, each head can capture different grammatical and logical relationships between tokens.</li>
</ul>
</li>
<li class="">
<p><strong>Multi-Query Attention (MQA):</strong></p>
<ul>
<li class=""><strong>KV cache memory:</strong> Very small. Only one K/V pair is stored per layer.</li>
<li class=""><strong>Speed bottleneck:</strong> The KV cache is 32 times smaller than with MHA, so attention's memory traffic drops sharply and the bandwidth bottleneck is largely relieved. MQA is the fastest of the three.</li>
<li class=""><strong>Output quality:</strong> Noticeably lower. Thirty-two query heads with different roles must all find their answers in a single shared <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span>, which forces compromises and weakens the model's ability to capture relationships between tokens.</li>
</ul>
</li>
<li class="">
<p><strong>Grouped-Query Attention (GQA):</strong></p>
<ul>
<li class=""><strong>KV cache memory:</strong> Moderate and tunable, since the number of groups sets the trade-off directly. GQA-8 stores one quarter of what MHA stores.</li>
<li class=""><strong>Speed bottleneck:</strong> Limiting the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> heads to 8 cuts the data moved per decoding step to one quarter of MHA's. Memory stalls stay low, and generation speed comes close to MQA.</li>
<li class=""><strong>Output quality:</strong> Good. With 8 distinct <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi><mo separator="true">,</mo><mi>V</mi></mrow><annotation encoding="application/x-tex">K, V</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0715em">K</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.2222em">V</span></span></span></span> pairs, GQA avoids the severe information blurring of MQA and produces answers close in quality to an MHA model.</li>
</ul>
</li>
</ul>
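<p>The memory differences above follow from simple arithmetic. The C sketch below computes the KV cache size added per token and how many cached tokens fit in a given memory budget. The function names and the example configuration in the note that follows (32 layers, head dimension 128, FP16 values) are hypothetical and chosen only for illustration.</p>

```c
#include <stdint.h>

/* KV cache bytes added per token:
 * 2 (one K and one V vector) x layers x K/V heads x head dim x bytes per element.
 * The query head count does not appear: queries are recomputed each step, never cached. */
uint64_t kv_bytes_per_token(uint64_t n_layers, uint64_t n_kv_heads,
                            uint64_t d_head, uint64_t bytes_per_elem)
{
    return 2 * n_layers * n_kv_heads * d_head * bytes_per_elem;
}

/* How many cached tokens (summed over all concurrent sequences) fit in a memory budget. */
uint64_t max_cached_tokens(uint64_t budget_bytes, uint64_t bytes_per_token)
{
    return bytes_per_token ? budget_bytes / bytes_per_token : 0;
}
```

<p>With that configuration, MHA (32 K/V heads) needs 524,288 bytes (512 KiB) per token, GQA-8 needs 131,072 bytes (128 KiB), and MQA needs 16,384 bytes (16 KiB). A 1 GiB budget therefore holds 2,048, 8,192, or 65,536 cached tokens respectively, shared across all concurrent sequences, which is why the gap widens with long inputs and many simultaneous users.</p>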
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="83-실제-산업-환경에서의-적용-및-평가-실험-결과">8.3 Real-World Adoption and Evaluation Results<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/transformer-structure#83-%EC%8B%A4%EC%A0%9C-%EC%82%B0%EC%97%85-%ED%99%98%EA%B2%BD%EC%97%90%EC%84%9C%EC%9D%98-%EC%A0%81%EC%9A%A9-%EB%B0%8F-%ED%8F%89%EA%B0%80-%EC%8B%A4%ED%97%98-%EA%B2%B0%EA%B3%BC" class="hash-link" aria-label="Direct link to 8.3 Real-World Adoption and Evaluation Results" title="Direct link to 8.3 Real-World Adoption and Evaluation Results" translate="no">​</a></h3>
<p>These theoretical differences show up clearly in performance evaluations of recent models.</p>
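<p>For light workloads with short inputs and infrequent requests, an MHA model runs without trouble. Under heavy workloads, however, such as very long inputs or thousands of users sending requests at the same time, MHA's memory bottleneck becomes the decisive limitation, because the KV cache grows with both context length and the number of concurrent sequences.</p>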
<p>In one reported stress test, Llama 2, a previous-generation model that uses MHA in its 7B and 13B variants, and the GQA-based Mistral were put under the same heavy load. As the load increased, the MHA-based Llama 2 could not move its large KV cache through the available memory bandwidth fast enough and failed to generate responses within the time limit. The GQA-based Mistral, whose smaller KV cache left bandwidth headroom, kept producing high-quality tokens steadily even under overload.</p>
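<p>Thanks to this clear performance gap and the cheap uptraining path (about 5% of the original pre-training compute), grouped-query attention (GQA) has become the standard attention architecture of leading foundation models such as Llama 2 70B and Mistral 7B. Commonly cited best-practice guidance recommends GQA-8 as the default for most general-purpose models, cutting KV cache memory to one quarter of a 32-head MHA model with little loss in quality. Under the same guidance, MHA is reserved for small-scale research settings where compute is not a constraint and maximum accuracy is the only goal, while MQA is considered only for tightly constrained settings, such as when memory is below about 40 GB or when very long texts (16,000+ characters) must be processed at high speed.</p>]]></content:encoded>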
            <category>Study</category>
            <category>transformer</category>
            <category>nlp</category>
            <category>Deep Learning</category>
        </item>
        <item>
            <title><![CDATA[[Daily] Spring, and a New Start]]></title>
            <link>https://hwkim-dev.github.io/hwkim-dev/ko/blog/daily-first</link>
            <guid>https://hwkim-dev.github.io/hwkim-dev/ko/blog/daily-first</guid>
            <pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[As the cherry blossoms begin to bloom, I'm starting my personal homepage anew.]]></description>
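            <content:encoded><![CDATA[<p>As the cherry blossoms begin to bloom, I'm starting my personal homepage anew.</p>
<p>These days I'm working on a GPU programming project in the lab, and once I start coding I lose track of time.<br>
The thrill when a CUDA kernel first works the way I expected... it still gives me a rush 😄</p>
<p>My goal is to blog consistently, and I plan to keep not only study notes but also light everyday stories like this one.</p>
<p>Today I finished setting up the site over a cup of coffee.<br>
It was a good day, just like spring.</p>]]></content:encoded>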
            <category>Daily</category>
        </item>
        <item>
            <title><![CDATA[[Jabdori] I Rebuilt My Personal Homepage with Docusaurus]]></title>
            <link>https://hwkim-dev.github.io/hwkim-dev/ko/blog/jabdori-first</link>
            <guid>https://hwkim-dev.github.io/hwkim-dev/ko/blog/jabdori-first</guid>
            <pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[I finally set up a proper personal homepage. What I had been maintaining only as a GitHub Profile README has moved to a static site built on Docusaurus.]]></description>
            <content:encoded><![CDATA[<p>I finally set up a proper personal homepage. What I had been maintaining only as a GitHub Profile README has moved to a static site built on Docusaurus.</p>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="왜-docusaurus인가">Why Docusaurus<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/jabdori-first#%EC%99%9C-docusaurus%EC%9D%B8%EA%B0%80" class="hash-link" aria-label="Direct link to Why Docusaurus" title="Direct link to Why Docusaurus" translate="no">​</a></h2>
<ul>
<li class=""><strong>Markdown first</strong>: Managing blog posts as <code>.md</code> files is all it takes.</li>
<li class=""><strong>React extensibility</strong>: Custom pages such as papers, projects, and the chatbot can be built freely as React components.</li>
<li class=""><strong>GitHub Pages deployment</strong>: A single push to the <code>gh-pages</code> branch completes a deployment.</li>
<li class=""><strong>Built-in dark mode</strong>: No need to implement it yourself 😄</li>
</ul>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="이-사이트의-구성">How This Site Is Organized<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/jabdori-first#%EC%9D%B4-%EC%82%AC%EC%9D%B4%ED%8A%B8%EC%9D%98-%EA%B5%AC%EC%84%B1" class="hash-link" aria-label="Direct link to How This Site Is Organized" title="Direct link to How This Site Is Organized" translate="no">​</a></h2>
<table><thead><tr><th>Section</th><th>Contents</th></tr></thead><tbody><tr><td>Home</td><td>Introduction, tech stack, contact</td></tr><tr><td>Blog</td><td>Study / Jabdori / Daily / Review / News</td></tr><tr><td>Papers</td><td>Archive of papers I have written</td></tr><tr><td>Projects</td><td>Showcase of GitHub repositories &amp; releases</td></tr><tr><td>Chatbot</td><td>AI Q&amp;A chatbot about me (planned)</td></tr></tbody></table>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="앞으로-할-것들">What's Next<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/jabdori-first#%EC%95%9E%EC%9C%BC%EB%A1%9C-%ED%95%A0-%EA%B2%83%EB%93%A4" class="hash-link" aria-label="Direct link to What's Next" title="Direct link to What's Next" translate="no">​</a></h2>
<ul class="contains-task-list containsTaskList_mC6p">
<li class="task-list-item"><input type="checkbox" disabled=""> Actually deploy &amp; connect the chatbot</li>
<li class="task-list-item"><input type="checkbox" disabled=""> Fill in the papers/projects data</li>
<li class="task-list-item"><input type="checkbox" disabled=""> Keep blogging consistently (the hardest part...)</li>
</ul>
<p>I plan to use this as a low-pressure place to keep records. Please drop by often!</p>]]></content:encoded>
            <category>Jabdori</category>
        </item>
        <item>
            <title><![CDATA[[News] AI/HPC Weekly Clipping: 2026.04.14]]></title>
            <link>https://hwkim-dev.github.io/hwkim-dev/ko/blog/news-first</link>
            <guid>https://hwkim-dev.github.io/hwkim-dev/ko/blog/news-first</guid>
            <pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[A roundup of this week's notable news in my areas of interest (deep learning inference, GPU architecture, HPC).]]></description>
            <content:encoded><![CDATA[<p>A roundup of this week's notable news in my areas of interest (deep learning inference, GPU architecture, HPC).</p>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="이번-주-주요-소식">This Week's Highlights<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/news-first#%EC%9D%B4%EB%B2%88-%EC%A3%BC-%EC%A3%BC%EC%9A%94-%EC%86%8C%EC%8B%9D" class="hash-link" aria-label="Direct link to This Week's Highlights" title="Direct link to This Week's Highlights" translate="no">​</a></h2>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="1-nvidia-blackwell-2세대-추론-벤치마크-공개">1. NVIDIA Blackwell 2nd-Generation Inference Benchmarks Released<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/news-first#1-nvidia-blackwell-2%EC%84%B8%EB%8C%80-%EC%B6%94%EB%A1%A0-%EB%B2%A4%EC%B9%98%EB%A7%88%ED%81%AC-%EA%B3%B5%EA%B0%9C" class="hash-link" aria-label="Direct link to 1. NVIDIA Blackwell 2nd-Generation Inference Benchmarks Released" title="Direct link to 1. NVIDIA Blackwell 2nd-Generation Inference Benchmarks Released" translate="no">​</a></h3>
<p>Published benchmark results report FP8 inference throughput on the next-generation Blackwell architecture up to <strong>4× higher</strong> than on H100.<br>
The improved memory-bandwidth efficiency during the LLM decoding phase stands out in particular.</p>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="2-flashattention-3-논문-arxiv-공개">2. FlashAttention-3 Paper Released on arXiv<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/news-first#2-flashattention-3-%EB%85%BC%EB%AC%B8-arxiv-%EA%B3%B5%EA%B0%9C" class="hash-link" aria-label="Direct link to 2. FlashAttention-3 Paper Released on arXiv" title="Direct link to 2. FlashAttention-3 Paper Released on arXiv" translate="no">​</a></h3>
<p>The third paper in the FlashAttention series has been released.<br>
It raises the efficiency of attention kernels by exploiting the <strong>Tensor Memory Accelerator (TMA)</strong> and asynchronous pipelines of the Hopper architecture (H100).</p>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="3-pytorch-27-릴리즈">3. PyTorch 2.7 Released<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/news-first#3-pytorch-27-%EB%A6%B4%EB%A6%AC%EC%A6%88" class="hash-link" aria-label="Direct link to 3. PyTorch 2.7 Released" title="Direct link to 3. PyTorch 2.7 Released" translate="no">​</a></h3>
<p>Alongside stability improvements to <code>torch.compile</code>, automatic CUDA Graph support has been strengthened.</p>
<hr>
<p><em>These are personal notes and may contain errors. Be sure to check the original sources!</em></p>]]></content:encoded>
            <category>News</category>
            <category>AI</category>
            <category>GPU</category>
        </item>
        <item>
            <title><![CDATA[[Review] CUDA by Example: The Best Book for Getting Started with GPUs]]></title>
            <link>https://hwkim-dev.github.io/hwkim-dev/ko/blog/review-first</link>
            <guid>https://hwkim-dev.github.io/hwkim-dev/ko/blog/review-first</guid>
            <pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Introducing the book that helped me the most when I first learned CUDA programming.]]></description>
            <content:encoded><![CDATA[<p>Introducing the book that helped me the most when I first learned CUDA programming.</p>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="책-정보">Book Details<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/review-first#%EC%B1%85-%EC%A0%95%EB%B3%B4" class="hash-link" aria-label="Direct link to Book Details" title="Direct link to Book Details" translate="no">​</a></h2>
<ul>
<li class=""><strong>Title</strong>: CUDA by Example: An Introduction to General-Purpose GPU Programming</li>
<li class=""><strong>Authors</strong>: Jason Sanders, Edward Kandrot</li>
<li class=""><strong>Publisher</strong>: Addison-Wesley Professional (2010)</li>
<li class=""><strong>Difficulty</strong>: ⭐⭐☆☆☆ (beginner)</li>
</ul>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="왜-좋은가">Why It's Good<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/review-first#%EC%99%9C-%EC%A2%8B%EC%9D%80%EA%B0%80" class="hash-link" aria-label="Direct link to Why It's Good" title="Direct link to Why It's Good" translate="no">​</a></h2>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="예제-중심-구성">Example-Driven Structure<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/review-first#%EC%98%88%EC%A0%9C-%EC%A4%91%EC%8B%AC-%EA%B5%AC%EC%84%B1" class="hash-link" aria-label="Direct link to Example-Driven Structure" title="Direct link to Example-Driven Structure" translate="no">​</a></h3>
<p>Rather than leading with theory, it shows <strong>working code</strong> first and then explains it, which makes it intuitive.<br>
The material progresses naturally from writing kernels to memory management, texture/constant memory, and streams.</p>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="다루는-핵심-개념">Key Concepts Covered<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/review-first#%EB%8B%A4%EB%A3%A8%EB%8A%94-%ED%95%B5%EC%8B%AC-%EA%B0%9C%EB%85%90" class="hash-link" aria-label="Direct link to Key Concepts Covered" title="Direct link to Key Concepts Covered" translate="no">​</a></h3>
<table><thead><tr><th>Chapter</th><th>Topic</th></tr></thead><tbody><tr><td>3</td><td>Writing &amp; launching basic kernels</td></tr><tr><td>4</td><td>Parallel programming with blocks (vector sum, Julia set)</td></tr><tr><td>5</td><td>Thread cooperation, shared memory &amp; parallel reduction (dot product)</td></tr><tr><td>9</td><td>Atomic operations (atomics)</td></tr><tr><td>10</td><td>CUDA streams</td></tr></tbody></table>
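<p>The parallel reduction listed in the table above relies on a halving pattern: in each step, the first half of the active threads adds in the values held by the second half, until one value remains. The plain C sketch below simulates that pattern sequentially on the CPU to show the indexing. It is not the book's code; the function name <code>block_reduce_sum</code> is hypothetical, and it assumes the array length is a power of two.</p>

```c
/* Sequential simulation of an in-block tree reduction over a shared-memory array.
 * Assumes n is a power of two, as this halving pattern requires. */
float block_reduce_sum(float *cache, int n)
{
    for (int stride = n / 2; stride != 0; stride /= 2) {
        /* On the GPU, every thread with tid < stride performs this add in parallel,
         * and __syncthreads() separates one stride from the next. */
        for (int tid = 0; tid < stride; ++tid)
            cache[tid] += cache[tid + stride];
    }
    return cache[0];
}
```

<p>For example, reducing <code>{1, 2, 3, 4, 5, 6, 7, 8}</code> takes three steps (strides 4, 2, 1) and leaves the sum 36 in <code>cache[0]</code>.</p>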
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="아쉬운-점">Drawbacks<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/review-first#%EC%95%84%EC%89%AC%EC%9A%B4-%EC%A0%90" class="hash-link" aria-label="Direct link to Drawbacks" title="Direct link to Drawbacks" translate="no">​</a></h2>
<ul>
<li class="">Published in 2010, it does not cover newer architectures (Volta/Ampere/Hopper).</li>
<li class="">For warp-level primitives (<code>__shfl_sync</code>, etc.), you need to consult NVIDIA's official CUDA Programming Guide separately.</li>
</ul>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="추천-대상">Who Should Read It<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/review-first#%EC%B6%94%EC%B2%9C-%EB%8C%80%EC%83%81" class="hash-link" aria-label="Direct link to Who Should Read It" title="Direct link to Who Should Read It" translate="no">​</a></h2>
<p><strong>Strongly recommended</strong> for anyone who knows C and is just starting with CUDA.<br>
For serious optimization, move on to the Programming Guide and GTC talks afterward.</p>
<p><strong>Overall: 4 / 5</strong> ⭐⭐⭐⭐☆</p>]]></content:encoded>
            <category>Review</category>
            <category>CUDA</category>
            <category>Book</category>
        </item>
        <item>
            <title><![CDATA[[Study] CUDA Kernel Optimization: Memory Access Patterns]]></title>
            <link>https://hwkim-dev.github.io/hwkim-dev/ko/blog/study-first</link>
            <guid>https://hwkim-dev.github.io/hwkim-dev/ko/blog/study-first</guid>
            <pubDate>Tue, 14 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[While studying deep learning inference optimization, I summarized how strongly memory access patterns affect performance when writing CUDA kernels.]]></description>
            <content:encoded><![CDATA[<p>While studying deep learning inference optimization, I summarized how strongly memory access patterns affect performance when writing CUDA kernels.</p>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="핵심-개념">Key Concepts<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/study-first#%ED%95%B5%EC%8B%AC-%EA%B0%9C%EB%85%90" class="hash-link" aria-label="Direct link to Key Concepts" title="Direct link to Key Concepts" translate="no">​</a></h2>
<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="coalesced-memory-access">Coalesced Memory Access<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/study-first#coalesced-memory-access" class="hash-link" aria-label="Direct link to Coalesced Memory Access" title="Direct link to Coalesced Memory Access" translate="no">​</a></h3>
<p>When the threads of a warp access <strong>contiguous addresses</strong> in GPU global memory, the hardware coalesces those accesses into as few memory transactions as possible (for 32 consecutive 4-byte words, a single 128-byte transaction).<br>
Non-contiguous (strided) access touches more memory segments, so the number of transactions grows and effective bandwidth drops sharply.</p>
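<p>A simplified model makes the effect of strides visible: the C sketch below counts how many aligned 128-byte memory segments one warp of 32 threads touches. Treating each segment as one transaction is an assumption for illustration; recent NVIDIA GPUs service global loads in 32-byte sectors, so exact counts vary by architecture and access width. The function name <code>segments_touched</code> is hypothetical.</p>

```c
#include <stdint.h>

#define WARP_SIZE 32
#define SEGMENT_BYTES 128 /* simplified transaction size; real GPUs also track 32-byte sectors */

/* Count the distinct aligned 128-byte segments a warp touches when
 * thread i reads the element at address base + i * stride_elems * elem_bytes. */
int segments_touched(uint64_t base, uint64_t stride_elems, uint64_t elem_bytes)
{
    uint64_t seen[WARP_SIZE];
    int count = 0;

    for (uint64_t i = 0; i < WARP_SIZE; ++i) {
        uint64_t seg = (base + i * stride_elems * elem_bytes) / SEGMENT_BYTES;
        int dup = 0;
        for (int j = 0; j < count; ++j) {
            if (seen[j] == seg) {
                dup = 1;
                break;
            }
        }
        if (!dup)
            seen[count++] = seg;
    }
    return count;
}
```

<p>For 4-byte <code>float</code> loads, a stride of 1 touches 1 segment, a stride of 2 touches 2, and a stride of 32 touches 32, so every thread triggers its own transaction. A base address that is not 128-byte aligned (for example, offset 64 with stride 1) already spreads the warp over 2 segments.</p>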
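<h3 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="shared-memory-활용">Using Shared Memory<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/study-first#shared-memory-%ED%99%9C%EC%9A%A9" class="hash-link" aria-label="Direct link to Using Shared Memory" title="Direct link to Using Shared Memory" translate="no">​</a></h3>
<p>Shared memory is on-chip SRAM that, on recent NVIDIA architectures, physically shares the same storage as the L1 cache. Loading data into it ahead of time in tiles lets each value fetched from global memory be reused many times, which greatly reduces the number of global memory accesses.</p>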
<div class="language-c codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#bfc7d5;--prism-background-color:#292d3e"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-c codeBlock_bY9V thin-scrollbar" style="color:#bfc7d5;background-color:#292d3e"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#bfc7d5"><span class="token plain">__global__ </span><span class="token keyword" style="font-style:italic">void</span><span class="token plain"> </span><span class="token function" style="color:rgb(130, 170, 255)">matmul_tiled</span><span class="token punctuation" style="color:rgb(199, 146, 234)">(</span><span class="token keyword" style="font-style:italic">float</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">*</span><span class="token plain">A</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token keyword" style="font-style:italic">float</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">*</span><span class="token plain">B</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token keyword" style="font-style:italic">float</span><span class="token plain"> </span><span class="token operator" style="color:rgb(137, 221, 255)">*</span><span class="token plain">C</span><span class="token punctuation" style="color:rgb(199, 146, 234)">,</span><span class="token plain"> </span><span class="token keyword" style="font-style:italic">int</span><span class="token plain"> N</span><span class="token punctuation" style="color:rgb(199, 146, 234)">)</span><span class="token plain"> </span><span class="token punctuation" style="color:rgb(199, 146, 234)">{</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    __shared__ </span><span class="token keyword" style="font-style:italic">float</span><span class="token plain"> sA</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">TILE</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">TILE</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    __shared__ </span><span class="token keyword" style="font-style:italic">float</span><span class="token plain"> sB</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">TILE</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">[</span><span class="token plain">TILE</span><span class="token punctuation" style="color:rgb(199, 146, 234)">]</span><span class="token punctuation" style="color:rgb(199, 146, 234)">;</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain">    </span><span class="token comment" style="color:rgb(105, 112, 152);font-style:italic">// ...</span><span class="token plain"></span><br></span><span class="token-line" style="color:#bfc7d5"><span class="token plain"></span><span class="token punctuation" style="color:rgb(199, 146, 234)">}</span><br></span></code></pre></div></div>
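<p>The kernel above elides its body. To show what the tiles do, here is a CPU-only C emulation of the same shared-memory tiling scheme, written as plain loops over simulated block and thread indices, with a counter of reads from the input matrices standing in for global-memory loads. It assumes <code>N</code> is a multiple of <code>TILE</code>; <code>TILE = 4</code> and the function name <code>matmul_tiled_cpu</code> are hypothetical choices for illustration, not the author's kernel.</p>

```c
#define TILE 4 /* hypothetical tile size; CUDA kernels commonly use 16 or 32 */

/* CPU emulation of shared-memory tiled matmul C = A * B (N x N, row-major).
 * Assumes N % TILE == 0. *global_loads counts reads of A and B ("global memory"). */
void matmul_tiled_cpu(const float *A, const float *B, float *C, int N, long *global_loads)
{
    float sA[TILE][TILE], sB[TILE][TILE]; /* stand-ins for the __shared__ tiles */
    *global_loads = 0;

    for (int by = 0; by < N / TILE; ++by)       /* blockIdx.y */
        for (int bx = 0; bx < N / TILE; ++bx) { /* blockIdx.x */
            float acc[TILE][TILE] = {{0}};      /* one accumulator per "thread" */
            for (int t = 0; t < N / TILE; ++t) {
                /* Phase 1: each thread (ty, tx) loads one element of each tile. */
                for (int ty = 0; ty < TILE; ++ty)
                    for (int tx = 0; tx < TILE; ++tx) {
                        sA[ty][tx] = A[(by * TILE + ty) * N + t * TILE + tx];
                        sB[ty][tx] = B[(t * TILE + ty) * N + bx * TILE + tx];
                        *global_loads += 2;
                    }
                /* (__syncthreads() here on the GPU) */
                /* Phase 2: partial dot products read only the on-chip tiles. */
                for (int ty = 0; ty < TILE; ++ty)
                    for (int tx = 0; tx < TILE; ++tx)
                        for (int k = 0; k < TILE; ++k)
                            acc[ty][tx] += sA[ty][k] * sB[k][tx];
                /* (__syncthreads() again before the next tile overwrites sA/sB) */
            }
            for (int ty = 0; ty < TILE; ++ty)
                for (int tx = 0; tx < TILE; ++tx)
                    C[(by * TILE + ty) * N + bx * TILE + tx] = acc[ty][tx];
        }
}
```

<p>Each element loaded into a tile is reused <code>TILE</code> times from the on-chip copy, so the number of global reads drops from <code>2 * N^3</code> for the naive triple loop to <code>2 * N^3 / TILE</code>. On the GPU, the two phases inside the tile loop are separated by <code>__syncthreads()</code>, as marked in the comments.</p>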
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="오늘의-실험-결과">Today's Results<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/study-first#%EC%98%A4%EB%8A%98%EC%9D%98-%EC%8B%A4%ED%97%98-%EA%B2%B0%EA%B3%BC" class="hash-link" aria-label="Direct link to Today's Results" title="Direct link to Today's Results" translate="no">​</a></h2>
<table><thead><tr><th>Implementation</th><th>Throughput (GFLOPS)</th></tr></thead><tbody><tr><td>Naive (global memory)</td><td>42</td></tr><tr><td>Coalesced</td><td>198</td></tr><tr><td>+ Shared Memory</td><td>573</td></tr></tbody></table>
<p>Coalescing plus shared-memory tiling gave about a <strong>13.6× speedup</strong> over the naive version (573 / 42); the tiling step alone added about 2.9× on top of the coalesced version (573 / 198).</p>
<h2 class="anchor anchorTargetHideOnScrollNavbar_vjPI" id="다음-목표">Next Steps<a href="https://hwkim-dev.github.io/hwkim-dev/ko/blog/study-first#%EB%8B%A4%EC%9D%8C-%EB%AA%A9%ED%91%9C" class="hash-link" aria-label="Direct link to Next Steps" title="Direct link to Next Steps" translate="no">​</a></h2>
<ul>
<li class="">Bank conflict analysis and padding strategies</li>
<li class="">Using the read-only cache via <code>__ldg()</code></li>
<li class="">Patterns for minimizing warp divergence</li>
</ul>]]></content:encoded>
            <category>Study</category>
            <category>CUDA</category>
            <category>GPU</category>
        </item>
    </channel>
</rss>