| end of split 1 / 23 | epoch 1 | time: 11669.17s | valid loss 1.63 | valid ppl 5.09 | learning rate 20.0000
| end of split 2 / 23 | epoch 1 | time: 11663.18s | valid loss 1.55 | valid ppl 4.70 | learning rate 20.0000
| end of split 3 / 23 | epoch 1 | time: 11773.26s | valid loss 1.49 | valid ppl 4.45 | learning rate 20.0000
| end of split 4 / 23 | epoch 1 | time: 11686.93s | valid loss 1.51 | valid ppl 4.55 | learning rate 20.0000
| end of split 5 / 23 | epoch 1 | time: 11659.15s | valid loss 1.44 | valid ppl 4.24 | learning rate 20.0000
| end of split 6 / 23 | epoch 1 | time: 11662.28s | valid loss 1.43 | valid ppl 4.18 | learning rate 20.0000
| end of split 1 / 23 | epoch 1 | time: 11827.93s | valid loss 1.42 | valid ppl 4.12 | learning rate 20.0000
| end of split 2 / 23 | epoch 1 | time: 11870.51s | valid loss 1.41 | valid ppl 4.10 | learning rate 20.0000
| end of split 3 / 23 | epoch 1 | time: 11847.01s | valid loss 1.40 | valid ppl 4.06 | learning rate 20.0000
| end of split 1 / 23 | epoch 1 | time: 11619.72s | valid loss 1.39 | valid ppl 4.03 | learning rate 20.0000
| end of split 2 / 23 | epoch 1 | time: 11613.37s | valid loss 1.39 | valid ppl 4.00 | learning rate 20.0000
| end of split 3 / 23 | epoch 1 | time: 11614.88s | valid loss 1.39 | valid ppl 4.00 | learning rate 20.0000
| end of split 4 / 23 | epoch 1 | time: 11615.26s | valid loss 1.41 | valid ppl 4.09 | learning rate 20.0000
| end of split 5 / 23 | epoch 1 | time: 11595.75s | valid loss 1.37 | valid ppl 3.95 | learning rate 20.0000
| end of split 6 / 23 | epoch 1 | time: 11610.69s | valid loss 1.37 | valid ppl 3.94 | learning rate 20.0000
| end of split 1 / 23 | epoch 1 | time: 12295.21s | valid loss 1.37 | valid ppl 3.94 | learning rate 20.0000
| end of split 2 / 23 | epoch 1 | time: 12302.05s | valid loss 1.37 | valid ppl 3.92 | learning rate 20.0000
| end of split 3 / 23 | epoch 1 | time: 12296.26s | valid loss 1.37 | valid ppl 3.92 | learning rate 20.0000
| end of split 4 / 23 | epoch 1 | time: 12293.98s | valid loss 1.36 | valid ppl 3.90 | learning rate 20.0000
| end of split 5 / 23 | epoch 1 | time: 12293.67s | valid loss 1.36 | valid ppl 3.91 | learning rate 20.0000
| end of split 6 / 23 | epoch 1 | time: 12300.27s | valid loss 1.36 | valid ppl 3.88 | learning rate 20.0000
| end of split 7 / 23 | epoch 1 | time: 12285.05s | valid loss 1.35 | valid ppl 3.86 | learning rate 20.0000
| end of split 8 / 23 | epoch 1 | time: 1732.28s | valid loss 1.53 | valid ppl 4.64 | learning rate 20.0000
| end of split 9 / 23 | epoch 1 | time: 12308.78s | valid loss 1.35 | valid ppl 3.86 | learning rate 20.0000
| end of split 10 / 23 | epoch 1 | time: 12306.58s | valid loss 1.35 | valid ppl 3.86 | learning rate 20.0000
| end of split 11 / 23 | epoch 1 | time: 12314.49s | valid loss 1.36 | valid ppl 3.90 | learning rate 20.0000
| end of split 12 / 23 | epoch 1 | time: 12311.65s | valid loss 1.35 | valid ppl 3.84 | learning rate 20.0000
| end of split 13 / 23 | epoch 1 | time: 12333.67s | valid loss 1.34 | valid ppl 3.83 | learning rate 20.0000
| end of split 14 / 23 | epoch 1 | time: 12315.06s | valid loss 1.34 | valid ppl 3.82 | learning rate 20.0000
| end of split 15 / 23 | epoch 1 | time: 12322.23s | valid loss 1.34 | valid ppl 3.84 | learning rate 20.0000
| end of split 16 / 23 | epoch 1 | time: 12318.94s | valid loss 1.34 | valid ppl 3.81 | learning rate 20.0000
| end of split 17 / 23 | epoch 1 | time: 12326.56s | valid loss 1.34 | valid ppl 3.81 | learning rate 20.0000
| end of split 18 / 23 | epoch 1 | time: 12316.58s | valid loss 1.34 | valid ppl 3.83 | learning rate 20.0000
| end of split 19 / 23 | epoch 1 | time: 12360.78s | valid loss 1.34 | valid ppl 3.83 | learning rate 20.0000
| end of split 20 / 23 | epoch 1 | time: 12347.99s | valid loss 1.34 | valid ppl 3.81 | learning rate 20.0000
| end of split 21 / 23 | epoch 1 | time: 12342.64s | valid loss 1.33 | valid ppl 3.79 | learning rate 20.0000
| end of split 22 / 23 | epoch 1 | time: 12343.95s | valid loss 1.33 | valid ppl 3.79 | learning rate 20.0000
| end of split 23 / 23 | epoch 1 | time: 8358.19s | valid loss 1.34 | valid ppl 3.80 | learning rate 20.0000
| end of split 1 / 23 | epoch 2 | time: 12309.01s | valid loss 1.33 | valid ppl 3.78 | learning rate 20.0000
| end of split 2 / 23 | epoch 2 | time: 12304.02s | valid loss 1.33 | valid ppl 3.78 | learning rate 20.0000
| end of split 3 / 23 | epoch 2 | time: 12314.77s | valid loss 1.33 | valid ppl 3.79 | learning rate 20.0000
| end of split 4 / 23 | epoch 2 | time: 12309.68s | valid loss 1.33 | valid ppl 3.77 | learning rate 20.0000
| end of split 5 / 23 | epoch 2 | time: 12298.33s | valid loss 1.33 | valid ppl 3.78 | learning rate 20.0000
| end of split 6 / 23 | epoch 2 | time: 12305.77s | valid loss 1.33 | valid ppl 3.77 | learning rate 20.0000
| end of split 7 / 23 | epoch 2 | time: 12296.93s | valid loss 1.33 | valid ppl 3.78 | learning rate 20.0000
| end of split 8 / 23 | epoch 2 | time: 12293.94s | valid loss 1.32 | valid ppl 3.76 | learning rate 20.0000
| end of split 9 / 23 | epoch 2 | time: 1733.38s | valid loss 1.47 | valid ppl 4.35 | learning rate 20.0000
| end of split 10 / 23 | epoch 2 | time: 12291.81s | valid loss 1.33 | valid ppl 3.77 | learning rate 20.0000
| end of split 11 / 23 | epoch 2 | time: 12366.15s | valid loss 1.33 | valid ppl 3.76 | learning rate 20.0000
| end of split 12 / 23 | epoch 2 | time: 12322.27s | valid loss 1.33 | valid ppl 3.76 | learning rate 20.0000
| end of split 13 / 23 | epoch 2 | time: 12329.00s | valid loss 1.32 | valid ppl 3.75 | learning rate 20.0000
| end of split 14 / 23 | epoch 2 | time: 12343.04s | valid loss 1.32 | valid ppl 3.75 | learning rate 20.0000