learning Whitebox Transformer Implementation Interpretable deep learning architecture and data representation Resemblance of Cross-Attention-like Operator with Conditional GMM Denoiser Demystify cross attention mechanism (WIP) Mini World Model with CRATE