Recent runs by Menno on his cluster show tremendously bad runtime performance of the Tenstream solver in his DALES simulations. To check what is going on, I tried to run his benchmark setup here in Munich on our Linux machines.
The setup is as follows:
We run a 64x64 pixel simulation on 64 cores (i.e. 8x8 pixels per core). The simulation has a horizontal resolution of 100m and a vertical resolution of 24m, going up to 5484m, which results in 229 vertical levels in the dynamics grid and about 260 levels for the radiation. Radiation is called every 30s and the simulation is integrated forward in time for 1200s. Tenstream also provides an adaptive spectral integration mode in which the radiation is only guaranteed to be updated every 300s.
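As a quick sanity check on these numbers (plain arithmetic, nothing Tenstream-specific):

```python
# Quick sanity check of the benchmark setup numbers (plain arithmetic only).
nx = ny = 64                # horizontal pixels
ncores = 64                 # cores / MPI ranks
print((nx * ny) // ncores)  # 64 pixels per core -> an 8x8 tile

dz, ztop = 24.0, 5484.0     # vertical spacing and domain top [m]
print(ztop / dz)            # ~228.5 layers, i.e. roughly 229 vertical levels

t_end, dt_rad = 1200, 30    # integration time and radiation interval [s]
print(t_end // dt_rad)      # 40 radiation calls with full updates
dt_guaranteed = 300         # adaptive mode: guaranteed update interval [s]
print(t_end // dt_guaranteed)  # only 4 guaranteed updates with adaptive spectral integration
```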
The following times are from an Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz with 64 cores (hyperthreaded):
Solver | Time (s) | Normed (vs. -twostr_only) |
---|---|---|
orig. RRTMG | 46 | 0.27 |
-rrtmg_only | 53 | 0.31 |
-twostr_only | 170 | 1.0 |
-twostr_only -schwarzschild | 178 | 1.04 |
10str, ILU(default) | 9950 | 58 |
10str, MG | 14339 | 84 |
10str, adaptive, ILU(default) | 1643 | 9.6 |
10str, adaptive, MG | 2066 | 12.1 |
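For reference, the Normed column is simply each runtime divided by the -twostr_only runtime of the same table:

```python
# Reproduce the "Normed" column: runtime relative to the -twostr_only run.
times = {
    "orig. RRTMG": 46,
    "-rrtmg_only": 53,
    "-twostr_only": 170,
    "-twostr_only -schwarzschild": 178,
    "10str, ILU(default)": 9950,
    "10str, MG": 14339,
    "10str, adaptive, ILU(default)": 1643,
    "10str, adaptive, MG": 2066,
}
ref = times["-twostr_only"]
for solver, t in times.items():
    print(f"{solver:32s} {t/ref:6.2f}")  # 46/170 -> 0.27, 9950/170 -> 58.5, ...
```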
A factor of 60 slower is really not what I was hoping for. I don't yet know why it performs this badly, and at the moment I do not really have time to dig deep, but it is concerning that I also get a lot of preconditioner failures:
    Linear solar_diff_ solve did not converge due to DIVERGED_DTOL iterations 11
    0 Resetted initial guess to zero and try again with gmres
This suggests that reusing an earlier solution hurts the solver rather than helping it.
In fact, I ran the test again without reusing the solutions and the simulation was actually slightly faster! This is either a bug that sneaked in somewhere or a really weird system; I will have to investigate this further. I hope to come back to this at the end of this week or next week.
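For context, the fallback the log lines describe (zero the initial guess and retry with GMRES when the preconditioned solve diverges) looks roughly like the following. This is not Tenstream's actual Fortran code, just a petsc4py sketch of the same pattern on a toy system:

```python
# Sketch of the "reset initial guess and retry with GMRES" fallback,
# using petsc4py on a small toy matrix (not Tenstream's real solve).
from petsc4py import PETSc

n = 100
A = PETSc.Mat().createAIJ([n, n])
A.setUp()
for i in range(n):                  # simple 1D Laplacian as a stand-in system
    A.setValue(i, i, 2.0)
    if i > 0:
        A.setValue(i, i - 1, -1.0)
    if i < n - 1:
        A.setValue(i, i + 1, -1.0)
A.assemble()

x, b = A.createVecs()
b.set(1.0)
x.set(1.0)                          # pretend this is a reused old solution

ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setType(PETSc.KSP.Type.BCGS)
ksp.getPC().setType(PETSc.PC.Type.ILU)
ksp.setInitialGuessNonzero(True)    # reuse the previous solution as a first guess

ksp.solve(b, x)
if ksp.getConvergedReason() == PETSc.KSP.ConvergedReason.DIVERGED_DTOL:
    # The reused guess made the residual blow up: reset it and retry with
    # GMRES, as the "Resetted initial guess to zero" log message suggests.
    x.set(0.0)
    ksp.setType(PETSc.KSP.Type.GMRES)
    ksp.solve(b, x)
```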
Update with a recent fix to reusing solutions:
Solver | Time (s) | Normed (vs. -twostr_only) |
---|---|---|
orig. RRTMG | 41 | 0.27 |
-rrtmg_only | 49 | 0.33 |
-twostr_only | 148 | 1.0 |
-twostr_only -schwarzschild | 144 | 0.97 |
10str, ILU(default) | 7665 | 51 |
10str, MG | 8513 | 57 |
10str, adaptive, ILU(default) | 1391 | 9.3 |
10str, adaptive, MG | 1523 | 10.3 |
Still not too convincing…
Going back to my previous performance study, I then tried keeping the solar angles constant:
Solver | Time (s) | Normed (vs. -twostr_only) |
---|---|---|
orig. RRTMG | 46 | 0.30 |
-rrtmg_only | 51 | 0.32 |
-twostr_only | 157 | 1.0 |
-twostr_only -schwarzschild | 160 | 1.02 |
10str, ILU(default) | 6463 | 41 |
10str, MG | 5187 | 33 |
10str, adaptive, ILU(default) | 1115 | 7.1 |
10str, adaptive, MG | 1341 | 8.5 |
A factor of 30 is still slower than I recall it being on the DKRZ cluster, but without digging deeper into why that is and where exactly the solvers spend their time, I guess this is as far as I will go at the moment.
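If anyone wants to dig into the timing breakdown, PETSc's built-in logging would be my first stop. A minimal petsc4py sketch of how one could instrument a solve (the stage name is purely illustrative); passing the -log_view runtime option to the executable achieves the same thing:

```python
# Minimal PETSc profiling sketch: wrap the solve in a named stage and
# print the per-stage/per-event timing table at the end.
from petsc4py import PETSc

PETSc.Log.begin()                            # start collecting PETSc timings
stage = PETSc.Log.Stage("solar_diff_solve")  # illustrative stage name
stage.push()
# ... the radiation solve would run here ...
stage.pop()
PETSc.Log.view()                             # dump the timing summary
```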
To conclude, I think the bad runtime comes from:
- a bug in the usage of saved solutions
- the fact that in your benchmark simulation the sun position changes rapidly in the morning hours (see the sketch below)
- maybe machine-specific reasons? (maybe you could run my setup again… I'll push the setup next week)
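Regarding the sun position: in the morning the solar zenith angle changes fastest, so a solution saved from an earlier call quickly becomes a poor initial guess. A rough estimate with the textbook zenith-angle formula; the latitude, declination and start time below are made-up example values, not Menno's actual setup:

```python
# Rough estimate of how fast the solar zenith angle changes in the morning.
# Latitude, declination and start hour are illustrative values only.
import math

lat = math.radians(52.0)  # example latitude (roughly the Netherlands)
dec = math.radians(10.0)  # example solar declination

def zenith(hour):
    # Standard formula: cos(theta) = sin(lat)sin(dec) + cos(lat)cos(dec)cos(H),
    # with hour angle H = 15 degrees per hour from local solar noon.
    H = math.radians(15.0 * (hour - 12.0))
    cos_t = math.sin(lat) * math.sin(dec) + math.cos(lat) * math.cos(dec) * math.cos(H)
    return math.degrees(math.acos(cos_t))

t0, t1 = 8.0, 8.0 + 1200.0 / 3600.0  # start of run and 1200s later
print(f"zenith at {t0:.2f}h: {zenith(t0):.2f} deg")
print(f"zenith at {t1:.2f}h: {zenith(t1):.2f} deg")
print(f"change over one run: {zenith(t0) - zenith(t1):.2f} deg")  # a few degrees
```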