Generalization Bounds for Transformer Channel Decoders
By: Qinshan Zhang, Bin Chen, Yong Jiang, and more
Potential Business Impact:
Makes wireless signals more reliable, with fewer decoding errors.
Transformer channel decoders, such as the Error Correction Code Transformer (ECCT), have shown strong empirical performance in channel decoding, yet their generalization behavior remains theoretically unclear. This paper studies the generalization performance of ECCT from a learning-theoretic perspective. By establishing a connection between multiplicative noise estimation errors and the bit error rate (BER), we derive an upper bound on the generalization gap via bit-wise Rademacher complexity. The resulting bound characterizes the dependence on code length, model parameters, and training set size, and applies to both single-layer and multi-layer ECCTs. We further show that parity-check-based masked attention induces sparsity that reduces the covering number, leading to a tighter generalization bound. To the best of our knowledge, this work provides the first theoretical generalization guarantees for this class of decoders.
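The abstract does not reproduce the bound itself. As a rough orientation only, a standard empirical Rademacher complexity generalization bound takes the following generic form; the paper's bit-wise variant and exact constants may differ. With probability at least 1 - \delta over a training sample S of size m,

    R(f) \;\le\; \hat{R}_S(f) \;+\; 2\,\hat{\mathfrak{R}}_S(\mathcal{F}) \;+\; 3\sqrt{\frac{\ln(2/\delta)}{2m}} \qquad \text{for all } f \in \mathcal{F},

where R and \hat{R}_S are the population and empirical risks and \hat{\mathfrak{R}}_S(\mathcal{F}) is the empirical Rademacher complexity of the decoder class. In a bound of this shape, code length and model parameters enter through the complexity term, while the training set size m drives the 1/\sqrt{m} decay.

To make the masked-attention point concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of how a parity-check matrix can be turned into an attention mask: positions that never share a parity check are barred from attending to each other, so most attention scores are zeroed out and the effective hypothesis class shrinks. The mask rule, helper names, single-head unbatched attention, and the toy (7,4) Hamming code are illustrative assumptions.

    import numpy as np

    def parity_check_mask(H):
        """Build a binary attention mask from a parity-check matrix H (r x n).

        Illustrative rule: two bit positions may attend to each other only if
        they appear together in at least one parity check.
        """
        # (H^T H)[i, j] counts how many checks contain both bit i and bit j.
        overlap = H.T @ H
        mask = (overlap > 0).astype(float)
        np.fill_diagonal(mask, 1.0)  # always allow self-attention
        return mask

    def masked_attention(Q, K, V, mask):
        """Scaled dot-product attention; disallowed pairs are set to -inf before softmax."""
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)
        scores = np.where(mask > 0, scores, -np.inf)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V

    # Toy example: parity-check matrix of the (7,4) Hamming code.
    H = np.array([[1, 1, 0, 1, 1, 0, 0],
                  [1, 0, 1, 1, 0, 1, 0],
                  [0, 1, 1, 1, 0, 0, 1]])
    mask = parity_check_mask(H)
    rng = np.random.default_rng(0)
    Q = rng.normal(size=(7, 8)); K = rng.normal(size=(7, 8)); V = rng.normal(size=(7, 8))
    out = masked_attention(Q, K, V, mask)
    print(f"fraction of attention pairs removed: {1 - mask.mean():.2f}")

The sparser the mask, the fewer attention patterns the model can realize, which is the intuition behind the paper's covering-number reduction argument.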
Similar Papers
TransCoder: A Neural-Enhancement Framework for Channel Codes
Information Theory
Makes wireless messages clearer, even with bad signals.
Lowering the Error Floor of Error Correction Code Transformer
Information Theory
Fixes errors in computer messages better.
From Small to Large: Generalization Bounds for Transformers on Variable-Size Inputs
Machine Learning (CS)
Helps AI understand bigger data from smaller samples.