fix(latex): apply Claude's final feedback to Paper 02 (bibliography, Fermi estimates, conclusion)
@@ -8,4 +8,8 @@
|
|||||||
\@writefile{toc}{\contentsline {section}{\numberline {4}PagedFieldprintAttention: A Custom Fused Triton Kernel Proposal}{2}{section.4}\protected@file@percent }
|
\@writefile{toc}{\contentsline {section}{\numberline {4}PagedFieldprintAttention: A Custom Fused Triton Kernel Proposal}{2}{section.4}\protected@file@percent }
|
||||||
\@writefile{toc}{\contentsline {subsection}{\numberline {4.1}Preliminary Benchmark Estimates}{2}{subsection.4.1}\protected@file@percent }
|
\@writefile{toc}{\contentsline {subsection}{\numberline {4.1}Preliminary Benchmark Estimates}{2}{subsection.4.1}\protected@file@percent }
|
||||||
\@writefile{toc}{\contentsline {section}{\numberline {5}Conclusion}{2}{section.5}\protected@file@percent }
|
\@writefile{toc}{\contentsline {section}{\numberline {5}Conclusion}{2}{section.5}\protected@file@percent }
|
||||||
\gdef \@abspage@last{2}
|
\bibcite{memorizing}{1}
|
||||||
|
\bibcite{retro}{2}
|
||||||
|
\bibcite{flashattention}{3}
|
||||||
|
\bibcite{pagedattention}{4}
|
||||||
|
\gdef \@abspage@last{3}
|
||||||
|
|||||||
@@ -1,4 +1,4 @@
|
|||||||
This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023/Debian) (preloaded format=pdflatex 2026.5.25) 25 MAY 2026 11:50
|
This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023/Debian) (preloaded format=pdflatex 2026.5.25) 25 MAY 2026 12:12
|
||||||
entering extended mode
|
entering extended mode
|
||||||
restricted \write18 enabled.
|
restricted \write18 enabled.
|
||||||
%&-line parsing enabled.
|
%&-line parsing enabled.
|
||||||
@@ -382,8 +382,7 @@ LaTeX Info: Redefining \Relbar on input line 971.
|
|||||||
\mathdisplay@stack=\toks28
|
\mathdisplay@stack=\toks28
|
||||||
LaTeX Info: Redefining \[ on input line 2953.
|
LaTeX Info: Redefining \[ on input line 2953.
|
||||||
LaTeX Info: Redefining \] on input line 2954.
|
LaTeX Info: Redefining \] on input line 2954.
|
||||||
)
|
) (./main.aux)
|
||||||
No file main.aux.
|
|
||||||
\openout1 = `main.aux'.
|
\openout1 = `main.aux'.
|
||||||
|
|
||||||
LaTeX Font Info: Checking defaults for OML/cmm/m/it on input line 24.
|
LaTeX Font Info: Checking defaults for OML/cmm/m/it on input line 24.
|
||||||
@@ -404,6 +403,7 @@ LaTeX Font Info: Checking defaults for PD1/pdf/m/n on input line 24.
|
|||||||
LaTeX Font Info: ... okay on input line 24.
|
LaTeX Font Info: ... okay on input line 24.
|
||||||
LaTeX Font Info: Checking defaults for PU/pdf/m/n on input line 24.
|
LaTeX Font Info: Checking defaults for PU/pdf/m/n on input line 24.
|
||||||
LaTeX Font Info: ... okay on input line 24.
|
LaTeX Font Info: ... okay on input line 24.
|
||||||
|
|
||||||
*geometry* driver: auto-detecting
|
*geometry* driver: auto-detecting
|
||||||
*geometry* detected driver: pdftex
|
*geometry* detected driver: pdftex
|
||||||
*geometry* verbose mode - [ preamble ] result:
|
*geometry* verbose mode - [ preamble ] result:
|
||||||
@@ -471,6 +471,7 @@ LaTeX Font Info: ... okay on input line 24.
|
|||||||
* (1in=72.27pt=25.4mm, 1cm=28.453pt)
|
* (1in=72.27pt=25.4mm, 1cm=28.453pt)
|
||||||
|
|
||||||
Package hyperref Info: Link coloring OFF on input line 24.
|
Package hyperref Info: Link coloring OFF on input line 24.
|
||||||
|
(./main.out) (./main.out)
|
||||||
\@outlinefile=\write3
|
\@outlinefile=\write3
|
||||||
\openout3 = `main.out'.
|
\openout3 = `main.out'.
|
||||||
|
|
||||||
@@ -490,6 +491,7 @@ LaTeX Info: Redefining \showhyphens on input line 24.
|
|||||||
Package microtype Info: No adjustment of tracking.
|
Package microtype Info: No adjustment of tracking.
|
||||||
Package microtype Info: No adjustment of interword spacing.
|
Package microtype Info: No adjustment of interword spacing.
|
||||||
Package microtype Info: No adjustment of character kerning.
|
Package microtype Info: No adjustment of character kerning.
|
||||||
|
|
||||||
(/usr/share/texlive/texmf-dist/tex/latex/microtype/mt-ptm.cfg
|
(/usr/share/texlive/texmf-dist/tex/latex/microtype/mt-ptm.cfg
|
||||||
File: mt-ptm.cfg 2006/04/20 v1.7 microtype config. file: Times (RS)
|
File: mt-ptm.cfg 2006/04/20 v1.7 microtype config. file: Times (RS)
|
||||||
)
|
)
|
||||||
@@ -554,29 +556,26 @@ LaTeX Font Info: Trying to load font information for TS1+ptm on input line 7
|
|||||||
(/usr/share/texlive/texmf-dist/tex/latex/psnfss/ts1ptm.fd
|
(/usr/share/texlive/texmf-dist/tex/latex/psnfss/ts1ptm.fd
|
||||||
File: ts1ptm.fd 2001/06/04 font definitions for TS1/ptm.
|
File: ts1ptm.fd 2001/06/04 font definitions for TS1/ptm.
|
||||||
)
|
)
|
||||||
[2] (./main.aux)
|
[2] [3] (./main.aux)
|
||||||
***********
|
***********
|
||||||
LaTeX2e <2023-11-01> patch level 1
|
LaTeX2e <2023-11-01> patch level 1
|
||||||
L3 programming layer <2024-01-22>
|
L3 programming layer <2024-01-22>
|
||||||
***********
|
***********
|
||||||
|
|
||||||
|
|
||||||
Package rerunfilecheck Warning: File `main.out' has changed.
|
LaTeX Warning: Label(s) may have changed. Rerun to get cross-references right.
|
||||||
(rerunfilecheck) Rerun to get outlines right
|
|
||||||
(rerunfilecheck) or use package `bookmark'.
|
|
||||||
|
|
||||||
Package rerunfilecheck Info: Checksums for `main.out':
|
Package rerunfilecheck Info: File `main.out' has not changed.
|
||||||
(rerunfilecheck) Before: <no file>
|
(rerunfilecheck) Checksum: 0FC071C82A1723F9BC8E142AE932A6CB;1472.
|
||||||
(rerunfilecheck) After: 0FC071C82A1723F9BC8E142AE932A6CB;1472.
|
|
||||||
)
|
)
|
||||||
Here is how much of TeX's memory you used:
|
Here is how much of TeX's memory you used:
|
||||||
12813 strings out of 476106
|
12859 strings out of 476106
|
||||||
203551 string characters out of 5793933
|
204087 string characters out of 5793933
|
||||||
1936975 words of memory out of 5000000
|
1936975 words of memory out of 5000000
|
||||||
34494 multiletter control sequences out of 15000+600000
|
34509 multiletter control sequences out of 15000+600000
|
||||||
600462 words of font info for 147 fonts, out of 8000000 for 9000
|
601548 words of font info for 160 fonts, out of 8000000 for 9000
|
||||||
59 hyphenation exceptions out of 8191
|
59 hyphenation exceptions out of 8191
|
||||||
79i,11n,93p,1013b,456s stack positions out of 10000i,1000n,20000p,200000b,200000s
|
79i,11n,93p,1013b,466s stack positions out of 10000i,1000n,20000p,200000b,200000s
|
||||||
</usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmex10.pfb></us
|
</usr/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmex10.pfb></us
|
||||||
r/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi10.pfb></usr/shar
|
r/share/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi10.pfb></usr/shar
|
||||||
e/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi7.pfb></usr/share/texli
|
e/texlive/texmf-dist/fonts/type1/public/amsfonts/cm/cmmi7.pfb></usr/share/texli
|
||||||
@@ -585,10 +584,10 @@ f-dist/fonts/type1/public/amsfonts/cm/cmsy10.pfb></usr/share/texlive/texmf-dist
|
|||||||
/fonts/type1/urw/times/utmb8a.pfb></usr/share/texlive/texmf-dist/fonts/type1/ur
|
/fonts/type1/urw/times/utmb8a.pfb></usr/share/texlive/texmf-dist/fonts/type1/ur
|
||||||
w/times/utmr8a.pfb></usr/share/texlive/texmf-dist/fonts/type1/urw/times/utmri8a
|
w/times/utmr8a.pfb></usr/share/texlive/texmf-dist/fonts/type1/urw/times/utmri8a
|
||||||
.pfb>
|
.pfb>
|
||||||
Output written on main.pdf (2 pages, 98439 bytes).
|
Output written on main.pdf (3 pages, 108708 bytes).
|
||||||
PDF statistics:
|
PDF statistics:
|
||||||
71 PDF objects out of 1000 (max. 8388607)
|
100 PDF objects out of 1000 (max. 8388607)
|
||||||
50 compressed objects within 1 object stream
|
78 compressed objects within 1 object stream
|
||||||
13 named destinations out of 1000 (max. 500000)
|
19 named destinations out of 1000 (max. 500000)
|
||||||
42497 words of extra memory for PDF output out of 42996 (max. 10000000)
|
42545 words of extra memory for PDF output out of 42996 (max. 10000000)
|
||||||
|
|
||||||
|
|||||||
Binary file not shown.
@@ -68,11 +68,35 @@ It must be explicitly noted that this concatenation modifies the underlying math
|
|||||||
\subsection{Preliminary Benchmark Estimates}
|
\subsection{Preliminary Benchmark Estimates}
|
||||||
To quantify the necessity of this kernel, we provide back-of-the-envelope estimates for a 13B parameter model operating at a 64k token context window:
|
To quantify the necessity of this kernel, we provide back-of-the-envelope estimates for a 13B parameter model operating at a 64k token context window:
|
||||||
\begin{itemize}
|
\begin{itemize}
|
||||||
\item \textbf{Naive Unfused Dual-Attention:} Requires materializing the full $N \times N$ attention matrix to HBM twice, transferring approximately 8GB of data per layer. At 40 layers, this incurs an estimated $O(\text{100+ ms})$ latency penalty per token, rendering the system unusable for interactive generation.
|
\item \textbf{Naive Unfused Dual-Attention:} Assuming a hidden dimension $d \approx 5120$ and standard FP16 precision (2 bytes per element), materializing the full $N \times N$ attention matrix ($64000 \times 64000$) requires $\approx 8$ GB of memory per layer. For a 40-layer model, this forces $\approx 320$ GB of intermediate HBM reads and writes per token. On an NVIDIA A100 with $\approx 2$ TB/s of memory bandwidth, these transfers alone impose a bandwidth-bound latency floor of $\approx 160$ ms per token. This renders the system unusable for interactive generation, where target latencies are typically $<20$ ms per token.
|
||||||
\item \textbf{PagedFieldprintAttention (Fused):} By maintaining intermediate softmax reductions in SRAM and relying on PagedAttention's block-level K/V caching, memory transfers are reduced by an order of magnitude, preserving the $O(N)$ memory complexity of FlashAttention and adding an estimated $<5\%$ overhead compared to standard inference.
|
\item \textbf{PagedFieldprintAttention (Fused):} By maintaining intermediate softmax reductions in SRAM and relying on PagedAttention's block-level K/V caching, memory transfers are reduced by an order of magnitude, preserving the $O(N)$ memory complexity of FlashAttention and adding an estimated $<5\%$ overhead compared to standard inference.
|
||||||
\end{itemize}
|
\end{itemize}
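
As a rough check of the figures above, the arithmetic can be written out explicitly. This is only a sketch under simplifying assumptions that we introduce for illustration: $N = 64000$ tokens, a single $N \times N$ FP16 score matrix counted once per layer (one attention head, one pass over HBM), and a sustained bandwidth of exactly $2$ TB/s.
\[
\begin{aligned}
M_{\text{layer}} &= N^2 \times 2\ \text{bytes} = 64000^2 \times 2\ \text{bytes} \approx 8.2\ \text{GB}, \\
M_{\text{total}} &= 40 \times M_{\text{layer}} \approx 328\ \text{GB}, \\
t_{\min} &\approx \frac{M_{\text{total}}}{2\ \text{TB/s}} \approx 164\ \text{ms}.
\end{aligned}
\]
Rounding $M_{\text{layer}}$ down to $8$ GB gives the $\approx 320$ GB and $\approx 160$ ms quoted above. Counting additional heads, or both the write and the subsequent read of each matrix, would only raise this lower bound.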
|
||||||
|
|
||||||
\section{Conclusion}
|
\section{Conclusion}
|
||||||
Theoretical mathematics and alignment philosophy mean nothing if they cannot physically run on silicon. By diagnosing the catastrophic failures of synchronous hashing and unfused attention equations, we have engineered the required hardware optimizations. Asynchronous Merkle Validation, deterministic INT8 quantization, and the PagedFieldprintAttention fused kernel provide the physical blueprints for deploying Verifiable Dual-Path Architectures at massive scale.
|
Theoretical mathematics and alignment philosophy mean nothing if they cannot physically run on silicon. By diagnosing the catastrophic failures of synchronous hashing and unfused attention equations, we have specified the required hardware optimizations. Asynchronous Merkle Validation, deterministic INT8 quantization, and the PagedFieldprintAttention fused kernel provide the physical blueprints for deploying Verifiable Dual-Path Architectures at massive scale.
|
||||||
|
|
||||||
|
\begin{thebibliography}{9}
|
||||||
|
|
||||||
|
\bibitem{memorizing}
|
||||||
|
Wu, Y., Rabe, M. N., Hutchins, D., \& Szegedy, C. (2022).
|
||||||
|
\textit{Memorizing Transformers}.
|
||||||
|
International Conference on Learning Representations (ICLR).
|
||||||
|
|
||||||
|
\bibitem{retro}
|
||||||
|
Borgeaud, S., Mensch, A., Hoffmann, J., Cai, T., Rutherford, E., Millican, K., ... \& Sifre, L. (2021).
|
||||||
|
\textit{Improving Language Models by Retrieving from Trillions of Tokens}.
|
||||||
|
arXiv preprint arXiv:2112.04426.
|
||||||
|
|
||||||
|
\bibitem{flashattention}
|
||||||
|
Dao, T., Fu, D., Ermon, S., Rudra, A., \& Ré, C. (2022).
|
||||||
|
\textit{FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness}.
|
||||||
|
Advances in Neural Information Processing Systems (NeurIPS).
|
||||||
|
|
||||||
|
\bibitem{pagedattention}
|
||||||
|
Kwon, W., Li, Z., Zhuang, S., Sheng, Y., Zheng, L., Yu, C. H., ... \& Stoica, I. (2023).
|
||||||
|
\textit{Efficient Memory Management for Large Language Model Serving with PagedAttention}.
|
||||||
|
Symposium on Operating Systems Principles (SOSP).
|
||||||
|
|
||||||
|
\end{thebibliography}
|
||||||
|
|
||||||
\end{document}
|
\end{document}
|
||||||
|
|||||||