Resource Requirements¶

This chapter summarizes EDGE’s resource requirements for seismic simulations.

Memory¶

EDGE’s memory requirements depend on the chosen convergence rate and the number of fused runs.

In this section we only consider required memory for 4-node tetrahedral elements, the elastic wave equations (9 quantities), double precision arithmetic (64-bit per value) and the following data structures in every element:

Degrees Of Freedom (DOFs)

Time Integrated DOFs

Each of the eight Riemann solvers (sometimes called flux solvers)

Each of the three Jacobians (sometimes called star matrices)

Therefore, the memory requirements for the mesh, kinematic sources, internal dynamic rupture boundaries, etc. are neglected.

Increasing the convergence rate, increases the number of modes per element. An increase in the number of fused runs, also increases the memory footprint of every element. However, data is shared for fused simulations, which reduces the relative memory footprint per run. For example, a second order simulation without fused runs (C1) requires 6,336 bytes in theory. By fusing eight runs, the per-element footprint increases to 10,368 bytes. This is equivalent to an increase by \(\frac{10.368}{6,336} \approx 1.64\) in required memory. However, the memory footprint per element and forward run decreases to \(\frac{10,368}{8} = 1,296\). The corresponding improvement per forward run is therefore: \(\frac{6,336}{1,296} \approx 4.9\).

The following table gives the memory footprint per element in dependency of the order for a non-fused run (C1), four (C4), and eight fused runs (C8):

Order	Modes	C1-Bytes	C4-Bytes	C8-Bytes
1	1	5,904	6,336	6,912
2	4	6,336	8,064	10,368
3	10	7,200	11,520	17,280
4	20	8,640	17,280	28,800
5	35	10,800	25,920	46,080
6	56	13,824	38,016	70,272
7	84	17,856	54,144	102,528

Element Throughput¶

Analogue, to the discussed memory requirements, all considerations in this section are limited to 4-node tetrahedral elements, the elastic wave equations (9 quantities), and double precision arithmetic (64-bit per value). Further, the reported times per element and time step were measured simulating the LOH.1 benchmark with a total of 350,264 tetrahedral elements. Architecture was a single node of Cori Phase II (Intel Xeon Phi 7250 68-core processors at 1.4 GHz with Intel Turbo Boost enabled) and all data allocated in High Bandwidth Memory (HBM/MCDRAM).

The following table shows the required time per element and per time step in dependency of the order for non-fused configurations (C1) and eight fused runs (C8):

Order	C1-Seconds	C8-Seconds
2	6.06E-08	1.30E-07
3	1.14E-07	2.88E-07
4	2.08E-07	7.30E-07
5	4.41E-07
6	6.93E-07