| end of split 1 / 54 | epoch 1 | time: 4046.52s | valid loss 1.33 | valid ppl 3.77 | learning rate 20.0000
| end of split 2 / 54 | epoch 1 | time: 4043.44s | valid loss 1.19 | valid ppl 3.29 | learning rate 20.0000
| end of split 3 / 54 | epoch 1 | time: 4053.06s | valid loss 1.14 | valid ppl 3.13 | learning rate 20.0000
| end of split 4 / 54 | epoch 1 | time: 4047.01s | valid loss 1.11 | valid ppl 3.03 | learning rate 20.0000
| end of split 5 / 54 | epoch 1 | time: 4042.49s | valid loss 1.09 | valid ppl 2.99 | learning rate 20.0000
| end of split 6 / 54 | epoch 1 | time: 4042.80s | valid loss 1.07 | valid ppl 2.90 | learning rate 20.0000
| end of split 7 / 54 | epoch 1 | time: 4046.11s | valid loss 1.05 | valid ppl 2.85 | learning rate 20.0000
| end of split 8 / 54 | epoch 1 | time: 4046.88s | valid loss 1.04 | valid ppl 2.84 | learning rate 20.0000
| end of split 9 / 54 | epoch 1 | time: 4041.52s | valid loss 1.03 | valid ppl 2.81 | learning rate 20.0000
| end of split 10 / 54 | epoch 1 | time: 4053.48s | valid loss 1.02 | valid ppl 2.79 | learning rate 20.0000
| end of split 11 / 54 | epoch 1 | time: 4048.76s | valid loss 1.02 | valid ppl 2.76 | learning rate 20.0000
| end of split 12 / 54 | epoch 1 | time: 4046.03s | valid loss 1.03 | valid ppl 2.79 | learning rate 20.0000
| end of split 13 / 54 | epoch 1 | time: 4044.83s | valid loss 1.01 | valid ppl 2.74 | learning rate 20.0000
| end of split 14 / 54 | epoch 1 | time: 4049.84s | valid loss 1.00 | valid ppl 2.72 | learning rate 20.0000
| end of split 15 / 54 | epoch 1 | time: 4051.21s | valid loss 1.00 | valid ppl 2.72 | learning rate 20.0000
| end of split 1 / 54 | epoch 1 | time: 4110.39s | valid loss 1.00 | valid ppl 2.71 | learning rate 20.0000
| end of split 2 / 54 | epoch 1 | time: 4103.65s | valid loss 0.99 | valid ppl 2.70 | learning rate 20.0000
| end of split 3 / 54 | epoch 1 | time: 4107.61s | valid loss 0.99 | valid ppl 2.69 | learning rate 20.0000
| end of split 4 / 54 | epoch 1 | time: 4100.84s | valid loss 0.99 | valid ppl 2.69 | learning rate 20.0000
| end of split 5 / 54 | epoch 1 | time: 4100.73s | valid loss 0.99 | valid ppl 2.69 | learning rate 20.0000
| end of split 6 / 54 | epoch 1 | time: 4102.88s | valid loss 0.98 | valid ppl 2.67 | learning rate 20.0000
| end of split 7 / 54 | epoch 1 | time: 4102.37s | valid loss 0.98 | valid ppl 2.66 | learning rate 20.0000
| end of split 8 / 54 | epoch 1 | time: 4109.03s | valid loss 0.98 | valid ppl 2.66 | learning rate 20.0000
| end of split 9 / 54 | epoch 1 | time: 4102.79s | valid loss 0.98 | valid ppl 2.65 | learning rate 20.0000
| end of split 10 / 54 | epoch 1 | time: 4109.37s | valid loss 0.97 | valid ppl 2.65 | learning rate 20.0000
| end of split 11 / 54 | epoch 1 | time: 4098.81s | valid loss 0.97 | valid ppl 2.64 | learning rate 20.0000
| end of split 12 / 54 | epoch 1 | time: 4102.42s | valid loss 0.98 | valid ppl 2.66 | learning rate 20.0000
| end of split 13 / 54 | epoch 1 | time: 4104.50s | valid loss 0.97 | valid ppl 2.63 | learning rate 20.0000
| end of split 14 / 54 | epoch 1 | time: 4105.88s | valid loss 0.97 | valid ppl 2.63 | learning rate 20.0000
| end of split 15 / 54 | epoch 1 | time: 4110.56s | valid loss 0.97 | valid ppl 2.63 | learning rate 20.0000
| end of split 16 / 54 | epoch 1 | time: 4101.80s | valid loss 0.96 | valid ppl 2.62 | learning rate 20.0000
| end of split 17 / 54 | epoch 1 | time: 4106.40s | valid loss 0.96 | valid ppl 2.62 | learning rate 20.0000
| end of split 18 / 54 | epoch 1 | time: 4087.16s | valid loss 0.96 | valid ppl 2.61 | learning rate 20.0000
| end of split 19 / 54 | epoch 1 | time: 4084.77s | valid loss 0.96 | valid ppl 2.61 | learning rate 20.0000
| end of split 20 / 54 | epoch 1 | time: 4094.51s | valid loss 0.96 | valid ppl 2.61 | learning rate 20.0000
| end of split 1 / 34 | epoch 1 | time: 4098.87s | valid loss 0.96 | valid ppl 2.60 | learning rate 20.0000
| end of split 2 / 34 | epoch 1 | time: 391.14s | valid loss 1.17 | valid ppl 3.23 | learning rate 20.0000
| end of split 3 / 34 | epoch 1 | time: 4108.77s | valid loss 0.96 | valid ppl 2.61 | learning rate 20.0000
| end of split 4 / 34 | epoch 1 | time: 4102.40s | valid loss 0.96 | valid ppl 2.61 | learning rate 20.0000
| end of split 5 / 34 | epoch 1 | time: 4100.62s | valid loss 0.95 | valid ppl 2.60 | learning rate 20.0000
| end of split 6 / 34 | epoch 1 | time: 4102.16s | valid loss 0.95 | valid ppl 2.59 | learning rate 20.0000
| end of split 7 / 34 | epoch 1 | time: 4102.53s | valid loss 0.95 | valid ppl 2.59 | learning rate 20.0000
| end of split 8 / 34 | epoch 1 | time: 4099.27s | valid loss 0.95 | valid ppl 2.59 | learning rate 20.0000
| end of split 9 / 34 | epoch 1 | time: 4085.06s | valid loss 0.95 | valid ppl 2.60 | learning rate 20.0000
| end of split 1 / 34 | epoch 1 | time: 4096.33s | valid loss 0.95 | valid ppl 2.59 | learning rate 20.0000
| end of split 2 / 34 | epoch 1 | time: 390.91s | valid loss 1.15 | valid ppl 3.14 | learning rate 20.0000
| end of split 3 / 34 | epoch 1 | time: 4104.58s | valid loss 0.95 | valid ppl 2.59 | learning rate 20.0000
| end of split 4 / 34 | epoch 1 | time: 4108.14s | valid loss 0.95 | valid ppl 2.59 | learning rate 20.0000
| end of split 5 / 34 | epoch 1 | time: 4099.78s | valid loss 0.95 | valid ppl 2.59 | learning rate 20.0000
| end of split 6 / 34 | epoch 1 | time: 4101.01s | valid loss 0.95 | valid ppl 2.58 | learning rate 20.0000
| end of split 7 / 34 | epoch 1 | time: 4099.47s | valid loss 0.95 | valid ppl 2.58 | learning rate 20.0000
| end of split 8 / 34 | epoch 1 | time: 4099.02s | valid loss 0.95 | valid ppl 2.58 | learning rate 20.0000
| end of split 9 / 34 | epoch 1 | time: 4081.95s | valid loss 0.95 | valid ppl 2.59 | learning rate 20.0000
| end of split 10 / 34 | epoch 1 | time: 4106.84s | valid loss 0.95 | valid ppl 2.58 | learning rate 20.0000
| end of split 11 / 34 | epoch 1 | time: 4096.42s | valid loss 0.95 | valid ppl 2.58 | learning rate 20.0000
| end of split 12 / 34 | epoch 1 | time: 390.69s | valid loss 0.95 | valid ppl 2.58 | learning rate 20.0000
| end of split 13 / 34 | epoch 1 | time: 4101.25s | valid loss 0.95 | valid ppl 2.58 | learning rate 20.0000
| end of split 14 / 34 | epoch 1 | time: 4101.14s | valid loss 0.95 | valid ppl 2.58 | learning rate 20.0000
| end of split 15 / 34 | epoch 1 | time: 4112.65s | valid loss 0.95 | valid ppl 2.57 | learning rate 20.0000
| end of split 16 / 34 | epoch 1 | time: 4102.52s | valid loss 0.94 | valid ppl 2.57 | learning rate 20.0000
| end of split 17 / 34 | epoch 1 | time: 4108.03s | valid loss 0.94 | valid ppl 2.57 | learning rate 20.0000
| end of split 18 / 34 | epoch 1 | time: 4101.40s | valid loss 0.94 | valid ppl 2.56 | learning rate 20.0000
| end of split 19 / 34 | epoch 1 | time: 4099.51s | valid loss 0.94 | valid ppl 2.57 | learning rate 20.0000
| end of split 20 / 34 | epoch 1 | time: 4102.71s | valid loss 0.94 | valid ppl 2.56 | learning rate 20.0000
| end of split 21 / 34 | epoch 1 | time: 4103.02s | valid loss 0.94 | valid ppl 2.56 | learning rate 20.0000
| end of split 22 / 34 | epoch 1 | time: 4112.45s | valid loss 0.94 | valid ppl 2.56 | learning rate 20.0000
| end of split 23 / 34 | epoch 1 | time: 4099.56s | valid loss 0.94 | valid ppl 2.56 | learning rate 20.0000
| end of split 24 / 34 | epoch 1 | time: 4107.08s | valid loss 0.94 | valid ppl 2.56 | learning rate 20.0000
| end of split 25 / 34 | epoch 1 | time: 4103.92s | valid loss 0.94 | valid ppl 2.55 | learning rate 20.0000
| end of split 26 / 34 | epoch 1 | time: 4089.40s | valid loss 0.94 | valid ppl 2.56 | learning rate 20.0000
| end of split 27 / 34 | epoch 1 | time: 4102.76s | valid loss 0.94 | valid ppl 2.55 | learning rate 20.0000
| end of split 28 / 34 | epoch 1 | time: 4099.45s | valid loss 0.94 | valid ppl 2.55 | learning rate 20.0000
| end of split 29 / 34 | epoch 1 | time: 390.82s | valid loss 1.14 | valid ppl 3.13 | learning rate 20.0000
| end of split 30 / 34 | epoch 1 | time: 4098.49s | valid loss 0.94 | valid ppl 2.56 | learning rate 20.0000
| end of split 31 / 34 | epoch 1 | time: 4098.27s | valid loss 0.93 | valid ppl 2.55 | learning rate 20.0000
| end of split 32 / 34 | epoch 1 | time: 4079.04s | valid loss 0.94 | valid ppl 2.55 | learning rate 20.0000
| end of split 33 / 34 | epoch 1 | time: 3540.79s | valid loss 0.93 | valid ppl 2.55 | learning rate 20.0000
| end of split 34 / 34 | epoch 1 | time: 4825.08s | valid loss 0.94 | valid ppl 2.56 | learning rate 20.0000
| end of split 1 / 34 | epoch 2 | time: 390.35s | valid loss 0.94 | valid ppl 2.56 | learning rate 20.0000
| end of split 2 / 34 | epoch 2 | time: 4110.38s | valid loss 0.93 | valid ppl 2.55 | learning rate 20.0000
| end of split 3 / 34 | epoch 2 | time: 4098.69s | valid loss 0.93 | valid ppl 2.55 | learning rate 20.0000
| end of split 4 / 34 | epoch 2 | time: 390.46s | valid loss 1.13 | valid ppl 3.09 | learning rate 20.0000
| end of split 5 / 34 | epoch 2 | time: 4097.96s | valid loss 0.93 | valid ppl 2.55 | learning rate 20.0000
| end of split 6 / 34 | epoch 2 | time: 4099.00s | valid loss 0.93 | valid ppl 2.55 | learning rate 20.0000
| end of split 7 / 34 | epoch 2 | time: 4102.24s | valid loss 0.93 | valid ppl 2.54 | learning rate 20.0000
| end of split 8 / 34 | epoch 2 | time: 4101.77s | valid loss 0.93 | valid ppl 2.54 | learning rate 20.0000
| end of split 9 / 34 | epoch 2 | time: 4111.17s | valid loss 0.93 | valid ppl 2.54 | learning rate 20.0000