| end of split 1 / 23 | epoch 1 | time: 12890.89s | valid loss 1.63 | valid ppl 5.12 | learning rate 20.0000
| end of split 2 / 23 | epoch 1 | time: 12855.27s | valid loss 1.54 | valid ppl 4.68 | learning rate 20.0000
| end of split 3 / 23 | epoch 1 | time: 12884.56s | valid loss 1.52 | valid ppl 4.56 | learning rate 20.0000
| end of split 4 / 23 | epoch 1 | time: 12880.62s | valid loss 1.50 | valid ppl 4.49 | learning rate 20.0000
| end of split 5 / 23 | epoch 1 | time: 12913.85s | valid loss 1.45 | valid ppl 4.25 | learning rate 20.0000
| end of split 6 / 23 | epoch 1 | time: 12863.04s | valid loss 1.43 | valid ppl 4.17 | learning rate 20.0000
| end of split 1 / 23 | epoch 1 | time: 11612.74s | valid loss 1.42 | valid ppl 4.14 | learning rate 20.0000
| end of split 2 / 23 | epoch 1 | time: 11608.13s | valid loss 1.41 | valid ppl 4.09 | learning rate 20.0000
| end of split 3 / 23 | epoch 1 | time: 11614.50s | valid loss 1.40 | valid ppl 4.07 | learning rate 20.0000
| end of split 1 / 23 | epoch 1 | time: 11849.05s | valid loss 1.40 | valid ppl 4.05 | learning rate 20.0000
| end of split 2 / 23 | epoch 1 | time: 11832.21s | valid loss 1.39 | valid ppl 4.02 | learning rate 20.0000
| end of split 3 / 23 | epoch 1 | time: 11836.04s | valid loss 1.39 | valid ppl 4.00 | learning rate 20.0000
| end of split 4 / 23 | epoch 1 | time: 11864.39s | valid loss 1.41 | valid ppl 4.09 | learning rate 20.0000
| end of split 5 / 23 | epoch 1 | time: 11822.88s | valid loss 1.38 | valid ppl 3.97 | learning rate 20.0000
| end of split 6 / 23 | epoch 1 | time: 11825.85s | valid loss 1.37 | valid ppl 3.95 | learning rate 20.0000
| end of split 1 / 23 | epoch 1 | time: 12095.59s | valid loss 1.37 | valid ppl 3.95 | learning rate 20.0000
| end of split 2 / 23 | epoch 1 | time: 12096.87s | valid loss 1.37 | valid ppl 3.94 | learning rate 20.0000
| end of split 3 / 23 | epoch 1 | time: 12099.01s | valid loss 1.37 | valid ppl 3.93 | learning rate 20.0000
| end of split 4 / 23 | epoch 1 | time: 12144.47s | valid loss 1.36 | valid ppl 3.90 | learning rate 20.0000
| end of split 5 / 23 | epoch 1 | time: 12078.76s | valid loss 1.36 | valid ppl 3.91 | learning rate 20.0000
| end of split 6 / 23 | epoch 1 | time: 12089.28s | valid loss 1.36 | valid ppl 3.89 | learning rate 20.0000
| end of split 7 / 23 | epoch 1 | time: 12074.83s | valid loss 1.36 | valid ppl 3.88 | learning rate 20.0000
| end of split 8 / 23 | epoch 1 | time: 1702.88s | valid loss 1.49 | valid ppl 4.44 | learning rate 20.0000
| end of split 9 / 23 | epoch 1 | time: 12114.93s | valid loss 1.35 | valid ppl 3.87 | learning rate 20.0000
| end of split 10 / 23 | epoch 1 | time: 12097.43s | valid loss 1.36 | valid ppl 3.88 | learning rate 20.0000
| end of split 11 / 23 | epoch 1 | time: 12087.62s | valid loss 1.37 | valid ppl 3.93 | learning rate 20.0000
| end of split 12 / 23 | epoch 1 | time: 12094.86s | valid loss 1.35 | valid ppl 3.85 | learning rate 20.0000
| end of split 13 / 23 | epoch 1 | time: 12121.99s | valid loss 1.35 | valid ppl 3.84 | learning rate 20.0000
| end of split 14 / 23 | epoch 1 | time: 12097.94s | valid loss 1.35 | valid ppl 3.84 | learning rate 20.0000
| end of split 15 / 23 | epoch 1 | time: 12110.23s | valid loss 1.35 | valid ppl 3.85 | learning rate 20.0000
| end of split 16 / 23 | epoch 1 | time: 12095.26s | valid loss 1.34 | valid ppl 3.83 | learning rate 20.0000
| end of split 17 / 23 | epoch 1 | time: 12095.95s | valid loss 1.34 | valid ppl 3.82 | learning rate 20.0000
| end of split 18 / 23 | epoch 1 | time: 12082.73s | valid loss 1.34 | valid ppl 3.82 | learning rate 20.0000
| end of split 19 / 23 | epoch 1 | time: 12083.04s | valid loss 1.34 | valid ppl 3.82 | learning rate 20.0000
| end of split 20 / 23 | epoch 1 | time: 12110.60s | valid loss 1.34 | valid ppl 3.82 | learning rate 20.0000
| end of split 21 / 23 | epoch 1 | time: 12114.68s | valid loss 1.33 | valid ppl 3.80 | learning rate 20.0000
| end of split 22 / 23 | epoch 1 | time: 12104.29s | valid loss 1.34 | valid ppl 3.80 | learning rate 20.0000
| end of split 23 / 23 | epoch 1 | time: 8203.61s | valid loss 1.34 | valid ppl 3.80 | learning rate 20.0000
| end of split 1 / 23 | epoch 2 | time: 12120.35s | valid loss 1.33 | valid ppl 3.80 | learning rate 20.0000
| end of split 2 / 23 | epoch 2 | time: 12097.96s | valid loss 1.33 | valid ppl 3.79 | learning rate 20.0000
| end of split 3 / 23 | epoch 2 | time: 12092.09s | valid loss 1.33 | valid ppl 3.78 | learning rate 20.0000
| end of split 4 / 23 | epoch 2 | time: 12108.95s | valid loss 1.33 | valid ppl 3.78 | learning rate 20.0000
| end of split 5 / 23 | epoch 2 | time: 12111.96s | valid loss 1.33 | valid ppl 3.78 | learning rate 20.0000
| end of split 6 / 23 | epoch 2 | time: 12107.57s | valid loss 1.33 | valid ppl 3.78 | learning rate 20.0000
| end of split 7 / 23 | epoch 2 | time: 1703.68s | valid loss 1.44 | valid ppl 4.22 | learning rate 20.0000
| end of split 8 / 23 | epoch 2 | time: 12123.18s | valid loss 1.34 | valid ppl 3.83 | learning rate 20.0000
| end of split 9 / 23 | epoch 2 | time: 8179.90s | valid loss 1.33 | valid ppl 3.78 | learning rate 20.0000
| end of split 10 / 23 | epoch 2 | time: 12099.22s | valid loss 1.33 | valid ppl 3.78 | learning rate 20.0000
| end of split 11 / 23 | epoch 2 | time: 12153.83s | valid loss 1.33 | valid ppl 3.77 | learning rate 20.0000
| end of split 12 / 23 | epoch 2 | time: 12156.23s | valid loss 1.33 | valid ppl 3.77 | learning rate 20.0000
| end of split 13 / 23 | epoch 2 | time: 12109.41s | valid loss 1.32 | valid ppl 3.76 | learning rate 20.0000
| end of split 14 / 23 | epoch 2 | time: 12089.66s | valid loss 1.33 | valid ppl 3.77 | learning rate 20.0000
| end of split 15 / 23 | epoch 2 | time: 12138.02s | valid loss 1.32 | valid ppl 3.75 | learning rate 20.0000