iprof output
Basic output from running “iprof” with no arguments, for a run configured with --res ne4pg2_ne4pg2 --compset F2010-SCREAMv1 on 1 node of Aurora. Collected during the Aurora Hackathon, May 2025.
x4217c0s3b0n0.hostmgmt2217.cm.aurora.alcf.anl.gov 0: THAPI: Trace location: /home/jacob/thapi-traces/thapi_aggreg--2025-05-14T06:23:29+00:00
x4217c0s3b0n0.hostmgmt2217.cm.aurora.alcf.anl.gov 0: BACKEND_MPI | 1 Hostnames | 12 Processes | 12 Threads |
(total aggregated over all ranks)
Name | Time | Time(%) | Calls | Average | Min | Max |
MPI_Waitall | 33.15s | 36.74% | 219296 | 151.17us | 123ns | 450.40ms |
MPI_Bcast | 26.97s | 29.89% | 838708 | 32.16us | 117ns | 164.16ms |
MPI_Init | 9.70s | 10.75% | 12 | 808.56ms | 808.55ms | 808.57ms |
MPI_Barrier | 7.98s | 8.85% | 6040 | 1.32ms | 135ns | 128.28ms |
MPI_Waitany | 6.86s | 7.60% | 53591 | 127.99us | 149ns | 121.11ms |
MPI_Startall | 1.43s | 1.59% | 184368 | 7.77us | 316ns | 1.66ms |
MPI_Allreduce | 1.05s | 1.17% | 44268 | 23.83us | 1.26us | 24.10ms |
MPI_Wtime | 1.00s | 1.11% | 5720339 | 175.53ns | 101ns | 274.76us |
MPI_Type_create_hvector | 534.48ms | 0.59% | 1976 | 270.49us | 218ns | 11.11ms |
MPI_Wait | 400.33ms | 0.44% | 13594 | 29.45us | 112ns | 10.47ms |
MPI_Recv | 292.56ms | 0.32% | 1012 | 289.09us | 728ns | 10.65ms |
MPI_Allgather | 148.01ms | 0.16% | 48 | 3.08ms | 11.55us | 13.26ms |
MPI_Isend | 136.87ms | 0.15% | 76684 | 1.78us | 342ns | 40.60us |
MPI_Send | 62.70ms | 0.07% | 10765 | 5.82us | 335ns | 1.17ms |
MPI_Gather | 56.97ms | 0.06% | 600 | 94.95us | 627ns | 9.39ms |
MPI_Type_create_indexed_block | 56.62ms | 0.06% | 2568 | 22.05us | 797ns | 1.21ms |
MPI_Comm_dup | 54.11ms | 0.06% | 342 | 158.20us | 1.75us | 10.68ms |
PMPI_Bcast | 49.01ms | 0.05% | 1122 | 43.68us | 116ns | 11.75ms |
MPI_Irecv | 43.75ms | 0.05% | 86437 | 506.17ns | 141ns | 906.98us |
PMPI_Allreduce | 42.39ms | 0.05% | 1238 | 34.24us | 205ns | 7.42ms |
MPI_Comm_create | 40.01ms | 0.04% | 554 | 72.21us | 1.86us | 920.51us |
PMPI_Waitall | 31.51ms | 0.03% | 92 | 342.52us | 191ns | 9.99ms |
MPI_Scatterv | 16.85ms | 0.02% | 240 | 70.20us | 683ns | 700.86us |
MPI_Pack | 16.24ms | 0.02% | 10 | 1.62ms | 10.89us | 5.17ms |
MPI_Comm_rank | 14.05ms | 0.02% | 48762 | 288.09ns | 102ns | 11.79us |
MPI_Reduce | 13.80ms | 0.02% | 830 | 16.62us | 370ns | 889.11us |
MPI_Comm_size | 9.93ms | 0.01% | 40013 | 248.06ns | 102ns | 9.36us |
MPI_Info_set | 8.11ms | 0.01% | 1248 | 6.50us | 202ns | 1.41ms |
MPI_Type_create_hindexed_c | 7.17ms | 0.01% | 270 | 26.57us | 1.14us | 1.12ms |
MPI_File_open | 4.24ms | 0.00% | 110 | 38.53us | 377ns | 69.82us |
MPI_Type_create_struct | 3.42ms | 0.00% | 614 | 5.57us | 871ns | 180.14us |
MPI_Comm_free | 3.25ms | 0.00% | 290 | 11.22us | 668ns | 71.91us |
PMPI_Comm_free | 3.11ms | 0.00% | 98 | 31.73us | 1.74us | 79.31us |
MPI_Allgatherv | 2.69ms | 0.00% | 96 | 28.05us | 13.00us | 88.00us |
PMPI_Barrier | 2.65ms | 0.00% | 518 | 5.12us | 112ns | 249.24us |
MPI_File_close | 2.10ms | 0.00% | 98 | 21.46us | 1.85us | 938.91us |
PMPI_Comm_dup | 1.90ms | 0.00% | 110 | 17.29us | 3.80us | 47.63us |
MPI_Type_free | 1.85ms | 0.00% | 3674 | 504.57ns | 104ns | 8.66us |
PMPI_Info_set | 1.70ms | 0.00% | 3935 | 432.68ns | 152ns | 7.48us |
MPI_Get_address | 1.51ms | 0.00% | 9000 | 167.61ns | 99ns | 7.66us |
MPI_Type_commit | 1.27ms | 0.00% | 5498 | 231.53ns | 106ns | 11.32us |
PMPI_Allgather | 1.24ms | 0.00% | 380 | 3.26us | 1.69us | 19.28us |
MPI_File_set_view | 1.12ms | 0.00% | 420 | 2.66us | 235ns | 36.99us |
MPI_Request_free | 953.43us | 0.00% | 2176 | 438.16ns | 125ns | 13.99us |
MPI_Type_size | 878.78us | 0.00% | 5124 | 171.50ns | 103ns | 9.44us |
PMPI_Info_get | 614.89us | 0.00% | 1643 | 374.25ns | 140ns | 3.44us |
PMPI_Info_dup | 605.95us | 0.00% | 154 | 3.93us | 843ns | 8.32us |
PMPI_Type_size_x | 537.43us | 0.00% | 3295 | 163.11ns | 105ns | 1.43us |
MPI_Aint_diff | 502.47us | 0.00% | 3819 | 131.57ns | 99ns | 4.86us |
MPI_Send_init | 472.27us | 0.00% | 1020 | 463.01ns | 148ns | 8.09us |
PMPI_Gather | 459.15us | 0.00% | 88 | 5.22us | 1.30us | 23.15us |
PMPI_Type_create_struct | 457.07us | 0.00% | 110 | 4.16us | 2.27us | 9.84us |
PMPI_Alltoall | 445.51us | 0.00% | 50 | 8.91us | 4.61us | 28.99us |
MPI_Recv_init | 422.28us | 0.00% | 1020 | 414.00ns | 149ns | 5.23us |
MPI_Initialized | 411.30us | 0.00% | 1728 | 238.02ns | 104ns | 6.55us |
PMPI_Type_get_envelope_c | 410.34us | 0.00% | 1876 | 218.73ns | 119ns | 7.17us |
MPI_File_get_info | 396.00us | 0.00% | 86 | 4.60us | 3.21us | 6.73us |
PMPI_Type_free | 386.15us | 0.00% | 362 | 1.07us | 108ns | 15.08us |
MPI_Comm_get_info | 381.62us | 0.00% | 110 | 3.47us | 2.19us | 8.69us |
MPI_Type_get_envelope_c | 309.13us | 0.00% | 1747 | 176.95ns | 119ns | 1.05us |
PMPI_Info_free | 299.15us | 0.00% | 276 | 1.08us | 428ns | 4.45us |
PMPI_Get_address | 298.62us | 0.00% | 1809 | 165.07ns | 101ns | 1.80us |
PMPI_Gatherv | 279.82us | 0.00% | 88 | 3.18us | 1.37us | 9.56us |
PMPI_Type_get_attr | 270.25us | 0.00% | 1242 | 217.59ns | 116ns | 1.68us |
PMPI_Type_get_true_extent | 253.44us | 0.00% | 1328 | 190.84ns | 108ns | 805ns |
MPI_Comm_group | 206.77us | 0.00% | 854 | 242.12ns | 111ns | 1.31us |
PMPI_Comm_rank | 199.72us | 0.00% | 770 | 259.38ns | 114ns | 1.15us |
MPI_Group_translate_ranks | 193.34us | 0.00% | 732 | 264.13ns | 123ns | 1.43us |
MPI_Type_create_struct_c | 190.13us | 0.00% | 112 | 1.70us | 540ns | 10.44us |
PMPI_Status_set_elements_x | 184.18us | 0.00% | 684 | 269.27ns | 114ns | 1.68us |
MPI_Op_create | 179.88us | 0.00% | 182 | 988.37ns | 148ns | 7.23us |
MPI_Info_free | 153.23us | 0.00% | 94 | 1.63us | 602ns | 3.69us |
PMPI_Type_get_extent | 148.43us | 0.00% | 656 | 226.26ns | 112ns | 1.44us |
PMPI_Comm_size | 144.24us | 0.00% | 548 | 263.20ns | 115ns | 944ns |
MPI_Issend_c | 141.61us | 0.00% | 42 | 3.37us | 1.02us | 18.25us |
PMPI_Type_dup | 128.94us | 0.00% | 218 | 591.45ns | 201ns | 4.03us |
PMPI_Type_commit | 125.03us | 0.00% | 370 | 337.91ns | 113ns | 1.50us |
MPI_File_read_at_all | 118.71us | 0.00% | 184 | 645.15ns | 254ns | 2.52us |
MPI_File_read_at | 107.31us | 0.00% | 234 | 458.57ns | 247ns | 3.01us |
MPI_Group_union | 101.97us | 0.00% | 324 | 314.74ns | 178ns | 1.10us |
MPI_Isend_c | 98.38us | 0.00% | 42 | 2.34us | 873ns | 5.70us |
MPI_Get_count_c | 97.33us | 0.00% | 414 | 235.09ns | 122ns | 827ns |
MPI_Type_size_c | 94.45us | 0.00% | 414 | 228.15ns | 115ns | 1.00us |
PMPI_Comm_set_attr | 89.72us | 0.00% | 178 | 504.07ns | 129ns | 3.02us |
MPI_Op_free | 88.38us | 0.00% | 182 | 485.60ns | 141ns | 1.11us |
PMPI_File_set_errhandler | 86.39us | 0.00% | 98 | 881.53ns | 153ns | 2.10us |
PMPI_Type_set_attr | 83.19us | 0.00% | 220 | 378.13ns | 147ns | 2.25us |
PMPI_Comm_get_attr | 78.70us | 0.00% | 108 | 728.69ns | 379ns | 1.83us |
PMPI_Info_get_nkeys | 76.00us | 0.00% | 144 | 527.78ns | 115ns | 1.31us |
MPI_Comm_set_errhandler | 70.28us | 0.00% | 96 | 732.14ns | 131ns | 7.68us |
MPI_Group_free | 65.44us | 0.00% | 196 | 333.90ns | 127ns | 1.30us |
PMPI_Info_create | 61.80us | 0.00% | 112 | 551.80ns | 184ns | 2.95us |
PMPI_Get_processor_name | 60.23us | 0.00% | 88 | 684.45ns | 314ns | 1.56us |
MPI_Group_range_incl | 57.31us | 0.00% | 132 | 434.18ns | 166ns | 4.66us |
PMPI_Comm_test_inter | 54.33us | 0.00% | 110 | 493.93ns | 173ns | 1.15us |
MPI_File_write_at_all | 46.27us | 0.00% | 50 | 925.38ns | 412ns | 1.58us |
MPI_Get_processor_name | 40.50us | 0.00% | 24 | 1.69us | 615ns | 2.85us |
MPI_Info_get | 39.12us | 0.00% | 120 | 325.97ns | 138ns | 1.09us |
MPI_Group_incl | 36.05us | 0.00% | 98 | 367.90ns | 171ns | 1.12us |
MPI_Comm_set_attr | 31.10us | 0.00% | 12 | 2.59us | 1.04us | 3.84us |
MPI_Irecv_c | 29.10us | 0.00% | 42 | 692.93ns | 172ns | 2.14us |
PMPI_Irecv | 22.17us | 0.00% | 42 | 527.90ns | 160ns | 1.47us |
MPI_Info_dup | 20.53us | 0.00% | 10 | 2.05us | 1.39us | 2.62us |
MPI_Get_count | 19.89us | 0.00% | 46 | 432.43ns | 268ns | 939ns |
MPI_Comm_create_keyval | 17.82us | 0.00% | 12 | 1.49us | 864ns | 1.89us |
MPI_Info_create | 12.68us | 0.00% | 8 | 1.58us | 1.15us | 2.21us |
MPI_Comm_free_keyval | 9.49us | 0.00% | 12 | 790.83ns | 677ns | 919ns |
MPI_File_read_all | 6.15us | 0.00% | 6 | 1.03us | 292ns | 2.21us |
MPI_Finalized | 6.10us | 0.00% | 12 | 508.17ns | 398ns | 835ns |
PMPI_Comm_create_keyval | 5.32us | 0.00% | 4 | 1.33us | 531ns | 2.38us |
PMPI_Op_create | 2.86us | 0.00% | 2 | 1.43us | 1.39us | 1.47us |
PMPI_Type_create_keyval | 2.52us | 0.00% | 2 | 1.26us | 880ns | 1.64us |
PMPI_Initialized | 1.32us | 0.00% | 2 | 659.00ns | 619ns | 699ns |
Total | 1.50min | 100.00% | 7418061 |
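To compare rows across tables it helps to convert the mixed time units (min, s, ms, us, ns) to seconds. The following is a minimal Python sketch, assuming the rows keep the pipe-delimited layout shown above; `parse_time` and `parse_row` are hypothetical helper names for illustration, not part of iprof or THAPI.

```python
import re

# Unit suffixes that appear in the tables above, mapped to seconds.
_UNIT_SECONDS = {"min": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}
# Longer suffixes are listed first so "ms" is not read as "s".
_TIME_RE = re.compile(r"^([0-9]*\.?[0-9]+)(min|ms|us|ns|s)$")


def parse_time(token):
    """Convert a token such as '151.17us' or '1.50min' to seconds."""
    match = _TIME_RE.match(token.strip())
    if match is None:
        raise ValueError(f"unrecognized time token: {token!r}")
    return float(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def parse_row(line):
    """Split one 'Name | Time | Time(%) | Calls | Average | Min | Max |' row."""
    cells = [cell.strip() for cell in line.split("|")]
    name, time, pct, calls, avg, tmin, tmax = cells[:7]
    return {
        "name": name,
        "time_s": parse_time(time),
        "pct": float(pct.rstrip("%")),
        "calls": int(calls),
        "avg_s": parse_time(avg),
        "min_s": parse_time(tmin),
        "max_s": parse_time(tmax),
    }


row = parse_row(
    "MPI_Waitall | 33.15s | 36.74% | 219296 | 151.17us | 123ns | 450.40ms |"
)
# Cross-check: total time divided by call count should match the Average column.
print(row["name"], row["time_s"] / row["calls"], row["avg_s"])
```

The printed values are rounded in the table, so derived quantities will agree with the reported columns only to a few significant digits. The "Total" rows have fewer columns and would need separate handling.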
BACKEND_ZE | 1 Hostnames | 12 Processes | 24 Threads |
(Level0 API calls, total aggregated over all ranks)
Name | Time | Time(%) | Calls | Average | Min | Max |
zeEventHostSynchronize | 6.83min | 82.84% | 5425221 | 75.53us | 142ns | 3.33ms |
zeCommandListAppendLaunchKernel | 1.15min | 13.91% | 6784752 | 10.14us | 6.02us | 620.53us |
zeCommandListAppendMemoryCopy | 6.72s | 1.36% | 576764 | 11.65us | 4.69us | 2.75ms |
zeKernelSetGroupSize | 2.28s | 0.46% | 6784752 | 336.23ns | 131ns | 284.99us |
zeModuleCreate | 2.00s | 0.41% | 6177 | 324.33us | 38.43us | 11.06ms |
zeCommandListAppendMemoryFill | 1.68s | 0.34% | 128424 | 13.05us | 6.25us | 443.13us |
zeCommandQueueExecuteCommandLists | 961.86ms | 0.19% | 395172 | 2.43us | 1.05us | 179.94us |
zeEventHostReset | 879.58ms | 0.18% | 395172 | 2.23us | 217ns | 160.08us |
zeMemGetAddressRange | 395.99ms | 0.08% | 1219561 | 324.70ns | 125ns | 90.77us |
zeMemAllocDevice | 382.13ms | 0.08% | 2425 | 157.58us | 13.26us | 426.61us |
zeCommandListReset | 214.06ms | 0.04% | 395112 | 541.76ns | 183ns | 31.82us |
zeCommandListClose | 118.90ms | 0.02% | 395172 | 300.88ns | 131ns | 123.26us |
zexMemOpenIpcHandles | 102.59ms | 0.02% | 7601 | 13.50us | 4.61us | 916.91us |
zeMemCloseIpcHandle | 102.21ms | 0.02% | 7425 | 13.77us | 5.88us | 76.08us |
zeModuleDestroy | 63.16ms | 0.01% | 5905 | 10.70us | 1.65us | 152.78us |
zeKernelCreate | 43.47ms | 0.01% | 9561 | 4.55us | 679ns | 42.38us |
zeMemFree | 38.26ms | 0.01% | 2521 | 15.18us | 2.84us | 160.61us |
zeContextMakeMemoryResident | 25.21ms | 0.01% | 2425 | 10.40us | 4.27us | 341.61us |
zeEventCreate | 20.04ms | 0.00% | 49976 | 401.04ns | 224ns | 15.73us |
zeKernelDestroy | 15.80ms | 0.00% | 5521 | 2.86us | 313ns | 92.11us |
zeCommandListCreateImmediate | 14.09ms | 0.00% | 48 | 293.52us | 32.95us | 891.31us |
zexDriverImportExternalPointer | 10.88ms | 0.00% | 24 | 453.13us | 29.39us | 1.07ms |
zeCommandQueueCreate | 5.77ms | 0.00% | 93 | 62.04us | 9.14us | 250.02us |
zeMemAllocShared | 5.14ms | 0.00% | 24 | 214.32us | 46.19us | 388.61us |
zeDeviceGetSubDevices | 4.78ms | 0.00% | 11114 | 430.20ns | 132ns | 12.15us |
zeMemAllocHost | 3.51ms | 0.00% | 84 | 41.75us | 15.00us | 381.62us |
zeDriverGetExtensionFunctionAddress | 3.12ms | 0.00% | 168 | 18.57us | 420ns | 324.98us |
zeKernelSetIndirectAccess | 2.26ms | 0.00% | 5521 | 408.81ns | 139ns | 2.40us |
zeCommandListCreate | 1.99ms | 0.00% | 60 | 33.15us | 6.48us | 789.91us |
zeContextDestroy | 1.65ms | 0.00% | 12 | 137.17us | 96.67us | 155.75us |
zeModuleBuildLogDestroy | 1.50ms | 0.00% | 5905 | 254.69ns | 119ns | 6.22us |
zeEventDestroy | 1.34ms | 0.00% | 528 | 2.55us | 1.30us | 16.01us |
zeEventPoolCreate | 839.12us | 0.00% | 36 | 23.31us | 8.52us | 185.83us |
zexMemGetIpcHandles | 611.94us | 0.00% | 116 | 5.28us | 783ns | 14.92us |
zeCommandListDestroy | 553.53us | 0.00% | 36 | 15.38us | 5.46us | 34.60us |
zeEventPoolDestroy | 313.45us | 0.00% | 12 | 26.12us | 23.91us | 30.57us |
zeDeviceGet | 150.82us | 0.00% | 120 | 1.26us | 142ns | 9.41us |
zeContextCreate | 66.48us | 0.00% | 48 | 1.39us | 931ns | 2.21us |
zeInit | 64.77us | 0.00% | 48 | 1.35us | 535ns | 4.98us |
zeDriverGet | 41.95us | 0.00% | 84 | 499.36ns | 153ns | 997ns |
zeDeviceGetRootDevice | 41.65us | 0.00% | 180 | 231.39ns | 114ns | 2.33us |
zeDriverGetApiVersion | 9.59us | 0.00% | 12 | 799.50ns | 435ns | 1.95us |
Total | 8.24min | 100.00% | 22623912 |
Device profiling | 1 Hostnames | 12 Processes | 12 Threads | 12 Devices | 12 Subdevices |
(time spent on the GPU: computation kernels and memcopies.
zeCommandListAppendMemoryCopy(D2D) could be MPI device-to-device copies)
Name | Time | Time(%) | Calls | Average | Min | Max |
zeCommandListAppendMemoryCopy(D2D) | 1.92s | 99.82% | 395172 | 4.86us | 720ns | 30.88us |
Kokkos::Impl::ParallelFor<scream::p3:[...]nst::{lambda(sycl::_V1::nd_item<2>)#1} | 519.68us | 0.03% | 2 | 259.84us | 254.56us | 265.12us |
zeCommandListAppendMemoryCopy(D2H) | 496.64us | 0.03% | 148 | 3.36us | 1.92us | 6.72us |
Kokkos::Impl::ParallelFor<scream::RRT[...]nst::{lambda(sycl::_V1::nd_item<2>)#1} | 404.64us | 0.02% | 12 | 33.72us | 32.64us | 35.36us |
Kokkos::Impl::ParallelFor<scream::p3:[...]nst::{lambda(sycl::_V1::nd_item<2>)#1} | 372.96us | 0.02% | 12 | 31.08us | 27.68us | 42.08us |
Kokkos::Impl::ParallelFor<scream::RRT[...]nst::{lambda(sycl::_V1::nd_item<2>)#1} | 366.24us | 0.02% | 12 | 30.52us | 29.60us | 31.68us |
Kokkos::Impl::FunctorWrapperRangePoli[...]>, Kokkos::RangePolicy<Kokkos::SYCL> > | 270.88us | 0.01% | 12 | 22.57us | 21.76us | 23.52us |
Kokkos::Impl::ParallelFor<scream::Sur[...]nst::{lambda(sycl::_V1::nd_item<2>)#1} | 251.68us | 0.01% | 12 | 20.97us | 20.16us | 22.88us |
Kokkos::Impl::FunctorWrapperRangePoli[...]s::MemoryTraits<9u> > >::DestroyTag> > | 121.76us | 0.01% | 24 | 5.07us | 2.56us | 7.36us |
zeCommandListAppendMemoryCopy(H2D) | 106.08us | 0.01% | 148 | 716.76ns | 80ns | 1.28us |
Kokkos::Impl::FunctorWrapperRangePoli[...]s::MemoryTraits<9u> > >::DestroyTag> > | 101.28us | 0.01% | 36 | 2.81us | 2.40us | 3.52us |
Kokkos::Impl::FunctorWrapperRangePoli[...]>, Kokkos::RangePolicy<Kokkos::SYCL> > | 93.44us | 0.00% | 10 | 9.34us | 8.00us | 10.40us |
Kokkos::Impl::FunctorWrapperRangePoli[...]s::MemoryTraits<8u> > >::DestroyTag> > | 81.28us | 0.00% | 12 | 6.77us | 6.40us | 7.20us |
Kokkos::Impl::ParallelFor<scream::RRT[...]nst::{lambda(sycl::_V1::nd_item<2>)#1} | 59.84us | 0.00% | 24 | 2.49us | 1.76us | 3.04us |
Kokkos::Impl::ParallelFor<scream::RRT[...]nst::{lambda(sycl::_V1::nd_item<2>)#1} | 50.56us | 0.00% | 12 | 4.21us | 3.36us | 8.32us |
Kokkos::Impl::ParallelFor<scream::RRT[...]nst::{lambda(sycl::_V1::nd_item<2>)#1} | 47.04us | 0.00% | 12 | 3.92us | 3.20us | 4.48us |
Kokkos::Impl::ParallelFor<scream::RRT[...]nst::{lambda(sycl::_V1::nd_item<2>)#1} | 45.92us | 0.00% | 12 | 3.83us | 3.52us | 4.16us |
Kokkos::Impl::ParallelFor<scream::RRT[...]nst::{lambda(sycl::_V1::nd_item<2>)#1} | 44.32us | 0.00% | 12 | 3.69us | 3.36us | 4.32us |
Kokkos::Impl::FunctorWrapperRangePoli[...]s::MemoryTraits<8u> > >::DestroyTag> > | 44.32us | 0.00% | 12 | 3.69us | 3.04us | 4.32us |
zeCommandListAppendMemoryCopy(M2D) | 28.56us | 0.00% | 288 | 99.17ns | 80ns | 400ns |
Kokkos::Impl::ParallelFor<scream::phy[...]nst::{lambda(sycl::_V1::nd_item<2>)#1} | 28.48us | 0.00% | 12 | 2.37us | 1.60us | 3.52us |
Total | 1.93s | 100.00% | 395996 |
Explicit memory traffic (BACKEND_MPI) | 1 Hostnames | 12 Processes | 12 Threads |
(all MPI data, on and off-node)
Name | Byte | Byte(%) | Calls | Average | Min | Max |
MPI_Irecv | 9.51GB | 91.48% | 84591 | 112.38kB | 4B | 829.44kB |
MPI_Isend | 449.11MB | 4.32% | 75043 | 5.98kB | 4B | 294.91kB |
PMPI_Status_set_elements_x | 116.79MB | 1.12% | 684 | 170.75kB | 0B | 19.97MB |
MPI_Bcast | 66.27MB | 0.64% | 838708 | 79.02B | 0B | 262.14kB |
MPI_Send | 62.62MB | 0.60% | 9948 | 6.29kB | 4B | 737.28kB |
MPI_Send_init | 41.38MB | 0.40% | 1020 | 40.57kB | 288B | 177.76kB |
MPI_Recv_init | 41.38MB | 0.40% | 1020 | 40.57kB | 288B | 177.76kB |
MPI_File_write_at_all | 34.72MB | 0.33% | 50 | 694.42kB | 0B | 9.99MB |
MPI_Issend_c | 34.72MB | 0.33% | 42 | 826.69kB | 8B | 9.99MB |
MPI_File_read_at_all | 19.54MB | 0.19% | 184 | 106.20kB | 768B | 884.74kB |
MPI_File_read_at | 10.42MB | 0.10% | 234 | 44.52kB | 4B | 262.14kB |
MPI_Reduce | 5.96MB | 0.06% | 830 | 7.18kB | 4B | 2.36MB |
MPI_Allreduce | 2.25MB | 0.02% | 44268 | 50.93B | 4B | 1.02kB |
PMPI_Bcast | 405.54kB | 0.00% | 1029 | 394.11B | 4B | 44.35kB |
MPI_Isend_c | 21.90kB | 0.00% | 42 | 521.52B | 16B | 3.38kB |
MPI_Irecv_c | 21.90kB | 0.00% | 42 | 521.52B | 16B | 3.38kB |
PMPI_Allreduce | 6.78kB | 0.00% | 1238 | 5.48B | 4B | 32B |
MPI_Recv | 4.66kB | 0.00% | 1012 | 4.61B | 4B | 8B |
MPI_File_read_all | 0B | 0.00% | 6 | 0.00B | 0B | 0B |
Total | 10.39GB | 100.00% | 1059991 |
Explicit memory traffic (BACKEND_ZE) | 1 Hostnames | 12 Processes | 12 Threads |
(L0 memory transfer; "M" = malloc)
Name | Byte | Byte(%) | Calls | Average | Min | Max |
zeCommandListAppendMemoryCopy(D2D) | 16.67GB | 89.60% | 395172 | 42.17kB | 4.10kB | 177.76kB |
zeContextMakeMemoryResident | 1.93GB | 10.38% | 2425 | 795.99kB | 1B | 66.36MB |
zeCommandListAppendMemoryCopy(H2D) | 2.24MB | 0.01% | 148 | 15.10kB | 640B | 16.37kB |
zeCommandListAppendMemoryCopy(D2H) | 2.24MB | 0.01% | 148 | 15.10kB | 640B | 16.37kB |
zeCommandListAppendMemoryCopy(M2D) | 36.86kB | 0.00% | 288 | 128.00B | 128B | 128B |
Total | 18.60GB | 100.00% | 398181 |
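As a rough cross-check, the D2D byte count here can be combined with the D2D device time from the device profiling table (1.92s over the same 395172 calls) to estimate an average per-copy transfer rate. The sketch below assumes decimal units (1 kB = 1000 B, 1 GB = 1e9 B), which is consistent with maxima such as 262.14kB = 262144 B. The numbers are copied from the tables above, and both are summed over all 12 ranks, so the result is an average rate per copy rather than a node-level bandwidth.

```python
# Decimal byte units (assumption, consistent with 262.14kB == 262144 B above).
KB = 1e3
GB = 1e9

# Values copied from the zeCommandListAppendMemoryCopy(D2D) rows above.
d2d_bytes = 16.67 * GB   # Explicit memory traffic (BACKEND_ZE)
d2d_time_s = 1.92        # Device profiling
d2d_calls = 395172       # same call count in both tables

# Average size of one copy; should agree with the reported 42.17kB average.
avg_copy_bytes = d2d_bytes / d2d_calls

# Average rate while a copy is executing on the device.
avg_rate_bytes_per_s = d2d_bytes / d2d_time_s

print(f"average copy size: {avg_copy_bytes / KB:.2f} kB")
print(f"average D2D rate:  {avg_rate_bytes_per_s / GB:.2f} GB/s")
```

With an average copy of roughly 42 kB, each transfer is small, so the derived rate likely reflects per-copy overhead more than peak device bandwidth.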