Recent runs by Menno on his cluster show tremendously bad runtime performance of the Tenstream solver in his DALES simulations. To check what is going on, I tried to run his benchmark setup here in Munich on our Linux machines.
The setup is as follows:
We run a 64x64 pixel simulation on 64 cores (i.e. 8x8 pixels per core). The simulation has a horizontal resolution of 100m and a vertical resolution of 24m, going up to 5484m, which results in 229 vertical levels in the dynamics grid and about 260 levels for the radiation. Radiation is called every 30s and the simulation is integrated forward in time for 1200s. Tenstream also provides an adaptive spectral integration mode in which the radiation is only guaranteed to be updated every 300s.
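As a quick sanity check on these numbers (plain arithmetic, nothing Tenstream-specific):

```python
# Quick sanity check of the benchmark setup numbers (plain arithmetic only).
nx = ny = 64                # horizontal pixels
ncores = 64                 # cores / MPI ranks
print((nx * ny) // ncores)  # 64 pixels per core -> an 8x8 tile

dz, ztop = 24.0, 5484.0     # vertical spacing and domain top [m]
print(ztop / dz)            # ~228.5 layers, i.e. roughly 229 vertical levels

t_end, dt_rad = 1200, 30    # integration time and radiation interval [s]
print(t_end // dt_rad)      # 40 radiation calls with full updates
dt_guaranteed = 300         # adaptive mode: guaranteed update interval [s]
print(t_end // dt_guaranteed)  # only 4 guaranteed updates with adaptive spectral integration
```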
The following times are from an Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz with 64 cores (hyperthreaded):
Solver | Time (s) | Normed (vs. -twostr_only) |
---|---|---|
orig. RRTMG | 46 | 0.27 |
-rrtmg_only | 53 | 0.31 |
-twostr_only | 170 | 1.0 |
-twostr_only -schwarzschild | 178 | 1.04 |
10str, ILU(default) | 9950 | 58 |
10str, MG | 14339 | 84 |
10str, adaptive, ILU(default) | 1643 | 9.6 |
10str, adaptive, MG | 2066 | 12.1 |
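For reference, the Normed column is simply each runtime divided by the -twostr_only runtime of the same table:

```python
# Reproduce the "Normed" column: runtime relative to the -twostr_only run.
times = {
    "orig. RRTMG": 46,
    "-rrtmg_only": 53,
    "-twostr_only": 170,
    "-twostr_only -schwarzschild": 178,
    "10str, ILU(default)": 9950,
    "10str, MG": 14339,
    "10str, adaptive, ILU(default)": 1643,
    "10str, adaptive, MG": 2066,
}
ref = times["-twostr_only"]
for solver, t in times.items():
    print(f"{solver:32s} {t/ref:6.2f}")  # 46/170 -> 0.27, 9950/170 -> 58.5, ...
```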
A factor of 60 slower is really not what I was hoping for. I don't yet know why it performs this badly, and at the moment I do not really have time to dig deep, but it is concerning that I also get a lot of preconditioner failures:
    Linear solar_diff_ solve did not converge due to DIVERGED_DTOL iterations 11
    0 Resetted initial guess to zero and try again with gmres
This suggests that reusing an earlier solution hurts the solver rather than helping it.
In fact, I ran the test again without reusing the solutions and the simulation was actually slightly faster! This is either a bug that sneaked in somewhere or a really weird system; I will have to investigate this further. I hope to come back to this at the end of this week or next week.
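For context, the fallback the log lines describe (zero the initial guess and retry with GMRES when the preconditioned solve diverges) looks roughly like the following. This is not Tenstream's actual Fortran code, just a petsc4py sketch of the same pattern on a toy system:

```python
# Sketch of the "reset initial guess and retry with GMRES" fallback,
# using petsc4py on a small toy matrix (not Tenstream's real solve).
from petsc4py import PETSc

n = 100
A = PETSc.Mat().createAIJ([n, n])
A.setUp()
for i in range(n):                  # simple 1D Laplacian as a stand-in system
    A.setValue(i, i, 2.0)
    if i > 0:
        A.setValue(i, i - 1, -1.0)
    if i < n - 1:
        A.setValue(i, i + 1, -1.0)
A.assemble()

x, b = A.createVecs()
b.set(1.0)
x.set(1.0)                          # pretend this is a reused old solution

ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setType(PETSc.KSP.Type.BCGS)
ksp.getPC().setType(PETSc.PC.Type.ILU)
ksp.setInitialGuessNonzero(True)    # reuse the previous solution as a first guess

ksp.solve(b, x)
if ksp.getConvergedReason() == PETSc.KSP.ConvergedReason.DIVERGED_DTOL:
    # The reused guess made the residual blow up: reset it and retry with
    # GMRES, as the "Resetted initial guess to zero" log message suggests.
    x.set(0.0)
    ksp.setType(PETSc.KSP.Type.GMRES)
    ksp.solve(b, x)
```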
Update with a recent fix to reusing solutions:
Solver | Time (s) | Normed (vs. -twostr_only) |
---|---|---|
orig. RRTMG | 41 | 0.27 |
-rrtmg_only | 49 | 0.33 |
-twostr_only | 148 | 1.0 |
-twostr_only -schwarzschild | 144 | 0.97 |
10str, ILU(default) | 7665 | 51 |
10str, MG | 8513 | 57 |
10str, adaptive, ILU(default) | 1391 | 9.3 |
10str, adaptive, MG | 1523 | 10.3 |
Still not too convincing…
Going back to my previous performance study, I then tried keeping the solar angles constant:
Solver | Time (s) | Normed (vs. -twostr_only) |
---|---|---|
orig. RRTMG | 46 | 0.30 |
-rrtmg_only | 51 | 0.32 |
-twostr_only | 157 | 1.0 |
-twostr_only -schwarzschild | 160 | 1.02 |
10str, ILU(default) | 6463 | 41 |
10str, MG | 5187 | 33 |
10str, adaptive, ILU(default) | 1115 | 7.1 |
10str, adaptive, MG | 1341 | 8.5 |
A factor of 30 is still slower than I recall it being on the DKRZ cluster, but without digging deeper into why that is and where exactly the solvers spend their time, I guess this is as far as I will go at the moment.
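If anyone wants to dig into the timing breakdown, PETSc's built-in logging would be my first stop. A minimal petsc4py sketch of how one could instrument a solve (the stage name is purely illustrative); passing the -log_view runtime option to the executable achieves the same thing:

```python
# Minimal PETSc profiling sketch: wrap the solve in a named stage and
# print the per-stage/per-event timing table at the end.
from petsc4py import PETSc

PETSc.Log.begin()                            # start collecting PETSc timings
stage = PETSc.Log.Stage("solar_diff_solve")  # illustrative stage name
stage.push()
# ... the radiation solve would run here ...
stage.pop()
PETSc.Log.view()                             # dump the timing summary
```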
To conclude, I think the bad runtime comes from:
- a bug in the usage of saved solutions
- the fact that in your benchmark simulation the sun position changes rapidly in the morning hours (see the sketch below)
- maybe machine-specific reasons? (maybe you could run my setup again… I'll push the setup next week)
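Regarding the sun position: in the morning the solar zenith angle changes fastest, so a solution saved from an earlier call quickly becomes a poor initial guess. A rough estimate with the textbook zenith-angle formula; the latitude, declination and start time below are made-up example values, not Menno's actual setup:

```python
# Rough estimate of how fast the solar zenith angle changes in the morning.
# Latitude, declination and start hour are illustrative values only.
import math

lat = math.radians(52.0)  # example latitude (roughly the Netherlands)
dec = math.radians(10.0)  # example solar declination

def zenith(hour):
    # Standard formula: cos(theta) = sin(lat)sin(dec) + cos(lat)cos(dec)cos(H),
    # with hour angle H = 15 degrees per hour from local solar noon.
    H = math.radians(15.0 * (hour - 12.0))
    cos_t = math.sin(lat) * math.sin(dec) + math.cos(lat) * math.cos(dec) * math.cos(H)
    return math.degrees(math.acos(cos_t))

t0, t1 = 8.0, 8.0 + 1200.0 / 3600.0  # start of run and 1200s later
print(f"zenith at {t0:.2f}h: {zenith(t0):.2f} deg")
print(f"zenith at {t1:.2f}h: {zenith(t1):.2f} deg")
print(f"change over one run: {zenith(t0) - zenith(t1):.2f} deg")  # a few degrees
```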