The last 1x1 layer of the NNMF network is a problem and had to be removed. The structure is now:

```
Sequential(
  (0): ReLU()
  (1): Unfold(kernel_size=(5, 5), dilation=(1, 1), padding=(0, 0), stride=(1, 1))
  (2): Fold(output_size=torch.Size([24, 24]), kernel_size=(1, 1), dilation=1, padding=0, stride=1)
  (3): L1NormLayer()
  (4): Conv2d(75, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
  (5): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (6): ReLU()
  (7): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1))
  (8): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (9): ReLU()
  (10): Unfold(kernel_size=(2, 2), dilation=(1, 1), padding=(0, 0), stride=(2, 2))
  (11): Fold(output_size=torch.Size([12, 12]), kernel_size=(1, 1), dilation=1, padding=0, stride=1)
  (12): L1NormLayer()
  (13): Conv2d(128, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
  (14): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (15): ReLU()
  (16): Unfold(kernel_size=(5, 5), dilation=(1, 1), padding=(0, 0), stride=(1, 1))
  (17): Fold(output_size=torch.Size([8, 8]), kernel_size=(1, 1), dilation=1, padding=0, stride=1)
  (18): L1NormLayer()
  (19): Conv2d(800, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
  (20): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (21): ReLU()
  (22): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
  (23): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (24): ReLU()
  (25): Unfold(kernel_size=(2, 2), dilation=(1, 1), padding=(0, 0), stride=(2, 2))
  (26): Fold(output_size=torch.Size([4, 4]), kernel_size=(1, 1), dilation=1, padding=0, stride=1)
  (27): L1NormLayer()
  (28): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
  (29): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
  (30): ReLU()
  (31): Unfold(kernel_size=(4, 4), dilation=(1, 1), padding=(0, 0), stride=(1, 1))
  (32): Fold(output_size=torch.Size([1, 1]), kernel_size=(1, 1), dilation=1, padding=0, stride=1)
  (33): L1NormLayer()
  (34): Conv2d(1024, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
  (35): ReLU()
  (36): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1))
  (37): ReLU()
  (38): Unfold(kernel_size=(1, 1), dilation=(1, 1), padding=(0, 0), stride=(1, 1))
  (39): Fold(output_size=torch.Size([1, 1]), kernel_size=(1, 1), dilation=1, padding=0, stride=1)
  (40): L1NormLayer()
  (41): Conv2d(96, 10, kernel_size=(1, 1), stride=(1, 1), bias=False)
  (42): Softmax(dim=1)
  (43): Flatten(start_dim=1, end_dim=-1)
)
```
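
`L1NormLayer` is not a standard PyTorch module, so here is a minimal sketch of what one Unfold → Fold → L1NormLayer → 1x1 Conv2d stage from the printout could look like. The assumptions are mine: that `L1NormLayer` normalizes the channel vector at each spatial position to unit L1 norm, the epsilon value, the `unfold_block` helper name, and the 3-channel 28x28 input used in the shape check (only the 24x24 output size is stated above).

```python
import torch
import torch.nn as nn


class L1NormLayer(nn.Module):
    """Sketch: scale each spatial position so its channel vector has unit L1 norm.

    Assumption about what the L1NormLayer in the printout does; after the preceding
    ReLU the activations are non-negative, so this makes them sum to 1 per position.
    """

    def __init__(self, eps: float = 1e-20) -> None:  # eps value is a guess
        super().__init__()
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x / (x.abs().sum(dim=1, keepdim=True) + self.eps)


def unfold_block(in_channels: int, out_channels: int,
                 kernel: int, stride: int, out_size: int) -> nn.Sequential:
    """One 'convolution via Unfold/Fold' stage as it appears in the printout:
    extract k*k patches, lay them out again as a spatial grid with the patch
    contents stacked along the channel axis (Fold with kernel_size=1),
    L1-normalize each position, then mix channels with a bias-free 1x1 conv."""
    return nn.Sequential(
        nn.Unfold(kernel_size=kernel, stride=stride),
        nn.Fold(output_size=(out_size, out_size), kernel_size=1),
        L1NormLayer(),
        nn.Conv2d(in_channels * kernel * kernel, out_channels,
                  kernel_size=1, bias=False),
    )


if __name__ == "__main__":
    # Shape check for the first stage, layers (1)-(4) above: a 3-channel 28x28
    # input gives 75 = 3*5*5 patch channels on a 24x24 grid (input size inferred).
    x = torch.rand(8, 3, 28, 28)
    block = unfold_block(in_channels=3, out_channels=32,
                         kernel=5, stride=1, out_size=24)
    print(block(x).shape)  # torch.Size([8, 32, 24, 24])
```

Read this way, each stage applies one normalized, bias-free 1x1 mixing step over local patches (the NNMF-style part), and the final Conv2d(96, 10) → Softmax → Flatten at 1x1 resolution emits the class probabilities directly.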