69bd4bfe44
* Import gda_devel back into develop Squashed commit of the following: commit 90761d552392ca1f5261fec2e6a08455b0ebc368 Author: Brandon Potter <BKP@users.noreply.github.com> Date: Thu Jul 24 14:50:47 2025 -0500 Only issue a single completion per wavefront (#199) commit 0056a8a4a7465d520b85c5cb6829ab88783e82f4 Author: Aurelien Bouteiller <aurelien.bouteiller@amd.com> Date: Thu Jul 24 14:12:35 2025 -0400 non-fetching amos are implicit nbi, we do not need the terminal quiet. (#179) commit 75d1bfe0b0afa5cfd5a7dfae89e9de6f1087e531 Author: Alsop, John <johnathan.alsop@amd.com> Date: Tue Jul 8 10:25:43 2025 -0700 Relax ibgda synchronization (#191) * rocshmem mcm: relax ibgda orderings convert all SEQ_CST orderings in queue_pair to RELAXED except: -system scope ring_doorbell access: required to flush push buffer (unless data is uncached - in which case a waitcnt is sufficient) -agent scope leader thread read in post_qpe_rma: unclear why this is necessary, but when relaxed, the code breaks. Either the waitcnt or the L1inv associated with agent scope SEQ_CST is needed for functionality. * Undo changing atomic_signal_fence from SEQ_CST to RELAXED as this appears to have no performance advantage and we are not entirely sure it is correct --------- Co-authored-by: Aurelien Bouteiller <abouteil@amd.com> commit c42139564afb47db27e9ec87c25ddc4f5c3e5ad2 Author: Edgar Gabriel <edgargabriel@users.noreply.github.com> Date: Mon Jul 7 13:56:19 2025 -0500 Make gda_devel branch work without MPI library (#188) * First cut on adding the no-mpi path to gpu_ib more functions to follow. add mpi_init_singleton stuff * make gda compile with no-mpi support * gda_device without mpi support * fixes for functional tests - disable the mpi_init_singleton tests in the unit tests. There is no point in fixing them on this branch to adjust to the new structure/logic.
- replace MPI_Barrier with rocshmem_barrier_all in tester.cpp - I missed one Allgather statement in gda_device.cpp, add the non-MPI version for that call as well * Update src/gpu_ib/gda_device.cpp Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com> * Update tests/functional_tests/CMakeLists.txt Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com> --------- Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com> commit 0506e69cea2e2ef9bd6cab1207e750da1731ffa5 Author: Brandon Potter <BKP@users.noreply.github.com> Date: Thu Jun 26 19:12:49 2025 -0500 Check for counter load order update in send queue (#178) commit 5a18841111c96eb9b526f0bd11a853b38f69707e Author: Avinash Kethineedi <avinash.kethineedi@amd.com> Date: Thu Jun 26 15:10:44 2025 -0500 Refactor Barrier_all and Sync_all to use default context (GDA) (#175) - Removed context-specific implementations of barrier_all and sync_all - Added barrier_all and sync_all to the default context implementation - Updated functional tests to use the default context for barrier_all and sync_all commit 4d76d6bfca90aad9ca7b607c2800392ed025a695 Author: Aurelien Bouteiller <aurelien.bouteiller@amd.com> Date: Tue Jun 24 14:24:48 2025 -0400 Reeneable Release by default (#168) commit a68208f2b1c64b9db5f5589c44854b37168da557 Author: Brandon Potter <BKP@users.noreply.github.com> Date: Tue Jun 24 12:20:22 2025 -0500 Fix issues with queue_pair (#167) * Add amo fetch_add and non_fetch add self tester * Validate both ways * Intermediate debug for atomic hang * Fixes for amo test * Convert to release build * Revert SYSTEM to AGENT for scope * Restore tester arguments * Make nonfetch amo into blocking call commit 9085416fa4a51aa66ae3222493409679d0daff29 Author: Aurelien Bouteiller <abouteil@amd.com> Date: Mon Jun 23 22:30:00 2025 -0400 bugfix: prevent reuse of sqe items before they are ready commit 0c832b225c4abb8778e4e825fee5032871403557 Author: Edgar Gabriel <edgargabriel@users.noreply.github.com> Date: 
Tue Jun 17 09:17:24 2025 -0500 change default compilation mode for gda_devel (#162) for the moment, switch to Debug builds being the default, since it seems to be more stable with DeepEp commit 3b01d1a50f1531cb7f66c19cd61643d7d2742e4c Author: Yiltan <ytemucin@amd.com> Date: Thu Jun 12 16:08:32 2025 -0400 Add Broadcom support for gda_devel (#148) * Added bnxt headers * Updated bnxt headers to compile with rocSHMEM * Preliminary BNXT Support * Update direct verbs to 2025/05/30 drop * Use umem_reg to create queues commit 8db6465e27527855627a167ca58beee17895ed65 Author: Andrew Boyer <andrew.boyer@amd.com> Date: Tue May 20 17:01:39 2025 -0400 gpu_ib ionic: Address review comment (#137) commit 81512cc10349b1bd4874d5963632bd28d9201a1d Author: Brandon Potter <BKP@users.noreply.github.com> Date: Tue May 20 15:57:17 2025 -0500 Check RMA functional test data in GPU kernel (#91) (#132) Co-authored-by: Yiltan <ytemucin@amd.com> commit e9fc5914f5f4d9a89af6e417e6f096d8f235884a Author: Andrew Boyer <andrew.boyer@amd.com> Date: Tue May 20 16:35:07 2025 -0400 gpu_ib ionic: add gpu_ib provider for ionic (#133) Port gpu_ib ionic changes from earlier proof-of-concept codebase. Build with GPUIB_IONIC=1 to enable ionic and disable mlx5. Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Andrew Boyer <andrew.boyer@amd.com> commit 986d1908fd126df027f9e189517260c3c7dbb48c Author: Andrew Boyer <andrew.boyer@amd.com> Date: Fri May 16 09:07:43 2025 -0400 gpu_ib: Cleanups to Mlx5 provider to ease Ionic integration (#129) Keep both pd_orig and pd_parent. Add some helpers for lane mask etc. Add generic defines in a few places. commit 4926a1067451c37dfec28385e70521e8ee5b693f Author: Andrew Boyer <andrew.boyer@amd.com> Date: Thu May 15 14:07:33 2025 -0400 gpu_ib: Fix up putmem_wave() (#128) Add a thread ID check to GPUIBContext::putmem_wave() so that only one thread gets through. Since the context layer checks, the QP layer doesn't need to. 
Thus QueuePair::put_nbi() and QueuePair::put_nbi_wave() are the same and can be combined. Signed-off-by: Andrew Boyer <andrew.boyer@amd.com> commit b81b84f63f470a2c8eecd1a9db82415c6ac4b2d7 Author: Edgar Gabriel <edgargabriel@users.noreply.github.com> Date: Thu May 15 11:41:21 2025 -0500 re-add code to select closest NIC to a GPU (#127) commit b87e7e84f6845ad18cf8286a84051d59b79218a2 Author: Brandon Potter <BKP@users.noreply.github.com> Date: Mon May 12 17:09:00 2025 -0500 Fix MPI_Comm bug (#123) commit 8cb3879047b8e36e23b91aaeb12c4f5563e974df Author: Avinash Kethineedi <avinash.kethineedi@amd.com> Date: Fri May 9 13:13:08 2025 -0500 Fix Barrier API implementation and add missing variants (#121) - Fixed issues in the existing Barrier API - Allocated sync buffers of team using the symmetric heap - Added missing thread-level and wavefront-level Barrier APIs - Updated functional tests to cover all Barrier variants commit 849f365487e59e264578e3eefaf483cad3233472 Author: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com> Date: Thu May 8 16:58:58 2025 -0400 Missing variable in ibgda branch and use create_ctx to avoid default ctx (#120) in num_pes and my_pe commit da710c22b7f4182dca29d062cafb51c42a967356 Author: Brandon Potter <BKP@users.noreply.github.com> Date: Thu May 8 14:36:46 2025 -0500 Refactor several classes and bugfixes (#115) * Merge backend connection and network classes * Use agent scope instead of system scope for counters * Remove monitor thread commit 99238b1d92d922ede619469b74e82dd69ae4e3e8 Author: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com> Date: Thu May 8 14:52:52 2025 -0400 Add verification, fix only rank0 runs the test (#114) commit d7ec7888a9c5f6c571284041d728911bd7d2562d Author: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com> Date: Thu May 8 10:55:40 2025 -0400 new tester: put to all pes from all lanes concurrently - ibgda (#113) * Add put to all pes from all lanes concurrently * This runs on ro 64(8x8) pes, the workload increases with 
the num_pes so it gets very slow at scale * Adapt for ibgda branch commit 51fe737b2ec6606a5337fdf90a57b877899817e5 Author: Avinash Kethineedi <avinash.kethineedi@amd.com> Date: Wed May 7 18:20:12 2025 -0500 Fix and extend Barrier_All API support (#110) - Fixed issues in the existing Barrier_All API implementation - Added missing thread-level and wavefront-level Barrier_All APIs - Updated functional tests to cover all Barrier_All variants commit c971b4a27b82447e4a13ca226798a16ad00a7d34 Author: Brandon Potter <BKP@users.noreply.github.com> Date: Wed May 7 11:45:12 2025 -0500 Serialize entrance into queue pair code by PE (#108) commit c2d0fbbbf88c07b67616e55d5025f8b960542753 Author: Yiltan <ytemucin@amd.com> Date: Wed May 7 12:38:58 2025 -0400 Fix ibv_reg_mr when using subcommunicators (#104) commit ee79ccd01c35d2b54923cc510c8449268846ce73 Author: Edgar Gabriel <edgargabriel@users.noreply.github.com> Date: Tue May 6 11:10:12 2025 -0500 add code for determining closest NIC to a GPU (#100) add code for detecting the closest NIC given a GPU device ID. The code is based on the same functionality in Transferbench, and has been stripped down to the required functionality in rocSHMEM. (Note: there is probably more code that could be removed or simplified.) There are two interfaces that are of interest: - int GetClosestNicToGpu(int gpuIndex, char **dev_name): returns the id of the NIC in the device list as well as the name of the device (if dev_name is not a nullptr); - void DisplayTopology(bool outputToCsv): prints out the entire topology detected on the node. This does not happen automatically, but could be integrated in the future with some debugging output when the user sets an environment variable.
commit e83c3dc9facb8d0b3a6029171ca8b055d4918e5a Author: Brandon Potter <BKP@users.noreply.github.com> Date: Tue May 6 11:09:56 2025 -0500 Fix several bugs of gda_devel branch (#103) * Revert "Use 32-bit counter values" This reverts commit 65a5b99c67624e221850bc405cfb6d79f754a7d6. * Call hipMemset after allocation on QueuePair members * Undo previous relaxations and use SEQ_CST atomics * Remove placement new on QueuePair creation * Bugfix on outstanding wqe table off by one commit 5b022083887808854efbc8cadb463dedd8d59bec Author: Brandon Potter <BKP@users.noreply.github.com> Date: Tue May 6 11:09:40 2025 -0500 Remove unused code (#102) * Remove unused code * Remove unused connection method commit dbaee3711f6b9d8bd43bc97d8127869d1e185d05 Author: Brandon Potter <brandon.potter@amd.com> Date: Fri May 2 15:45:51 2025 -0500 Add AMO support commit 96d7c3260f9acb85f94b60460fb6ec9645527d69 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 30 23:13:34 2025 -0500 Change names around commit 0de5a5a87b6bc3b72cd459c11f952e10d5fe65bc Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 30 22:34:48 2025 -0500 Remove unused code commit 30d247ef5b9106f584a348c18fae7c4d2257d2f9 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 30 22:23:47 2025 -0500 Replace do-while with while commit 65a5b99c67624e221850bc405cfb6d79f754a7d6 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 30 22:11:22 2025 -0500 Use 32-bit counter values commit a65c4c9210cf6450ef70150178d2dfad5d326e43 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 30 22:08:22 2025 -0500 Relax synchronization commit 7008f4f73d69bdd7de2aac79d73fe2bdf9dcdab7 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 30 21:58:42 2025 -0500 Remove unused method commit 5c720484dafbc354db922b440fe747dbca7ca0c2 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 30 21:48:53 2025 -0500 Use __shfl for broadcast commit 77ca7559ff9dcab2da81bc24b11fbc18a32216a0 Author: 
Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 30 21:40:44 2025 -0500 Relax order commit f9196d946776c39d461237b21124bb8b7ad7b84e Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 30 21:29:58 2025 -0500 Relax synchronization commit a6e32c672278a0f4f6bc4ac8d9ba73d555669ce9 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 30 21:10:27 2025 -0500 Rename sq variables commit c732a7d51e737b6deaaff037cb73759da3601d14 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 30 20:37:54 2025 -0500 Rename variables in quiet commit 1a557219628e5009abb50bf247852abb1b28bc03 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 30 20:27:59 2025 -0500 Rename quiet counter variables commit 0023ab69ea8db80d9fdb3e0a99796f723ec6f896 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 30 20:24:15 2025 -0500 Refactor quiet commit 2b9a14f58d2b37df5cd5e6f386fe8d91ead8bd21 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 30 20:05:19 2025 -0500 Replace some lds broadcasts with __shfl commit e34a9125dd393af58f7a89f454be53123b2e1cdc Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 30 19:42:47 2025 -0500 Use constant for wavefront size instead of literal commit e24e9811d204389610db7688667c135c915644b5 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 30 19:38:42 2025 -0500 Remove debug statements commit e48a60d32cb12a7b05b6b6394558e4a44468229e Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 30 15:55:59 2025 -0500 Fixed several bugs - stable commit 82484d5e8ca194b31ca21040cad5747aab2dbdff Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 30 12:10:29 2025 -0500 Fix bug in post_wqe_rma commit 4f4897b70c95f4e160a496469ce78303ebca90a0 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 30 08:22:17 2025 -0500 Use better variable name commit 13f83532132f028e0669a0baff98ffebd5f6f530 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 30 08:18:42 2025 -0500 Remove 
atomics for cqe64 access commit 9d0dcb3d125ee2e29440b43e57704f0e71b838fd Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 29 22:40:28 2025 -0500 Use volatile on cqe polling commit 44e75211435055447ad1f4a08ce37c7eebc02e5a Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 29 21:35:15 2025 -0500 Debug synchronization commit 0abce72b38f7aade806444e1e442a52066941b14 Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 28 11:18:25 2025 -0500 Minor changes commit 2b8c7c12203081e4af2611eca42af1cec32b28d0 Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 28 09:39:16 2025 -0500 Implement mt queues commit c58e6031dc5ac905b64fd0be3e6bd3ea98b0dd24 Merge: d7b33a87 cb69467f Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 28 10:07:48 2025 -0500 Merge branch 'abouteil/gpuib_bare-dlmalloc' into bpotter/gpuib_bare-04_28_25-devel commit cb69467f46e8974ae0e5a7945f4c7c01ecb53454 Author: Aurelien Bouteiller <abouteil@amd.com> Date: Mon Apr 28 10:13:08 2025 -0400 dlmalloc: resolve drift with ibgda branch commit b1eb1f375a58b49bbf2a635191c528dd3c49be0a Author: Aurelien Bouteiller <aurelien.bouteiller@amd.com> Date: Wed Apr 9 11:57:07 2025 -0400 Add unit tester for dlmalloc, rework single_heap, pow2bins unit testers accordingly * add dlmalloc get_used/get_avail, and have all strats allocators also have a get_used * Rework memallocator unit tests: bin size is per strat, alignment is verified in singleheap Signed-off-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com> commit f8ff728719fa5039cd4280762a37e8a295e0790c Author: Aurelien Bouteiller <aurelien.bouteiller@amd.com> Date: Fri Mar 28 14:17:49 2025 -0400 Add dlmalloc_strat allocator strategy Use mspace variant to ease encapsulation Make pow2bins and dlmalloc cmake selectable Signed-off-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com> commit d7b33a870b8d5a43ecdee5712b4d0c7624821d94 Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 27 15:56:54 2025 -0500 Use SND 
DBR offset commit f8f5094dd87d6b495d2b647a4a40c87862d1d35b Merge: 397e058f 9ef5fa1e Author: Brandon Potter <BKP@users.noreply.github.com> Date: Sun Apr 27 11:09:03 2025 -0500 Merge pull request #74 from ROCm/ytemucin/gpuib_bare-04-25-25 Ytemucin/gpuib bare 04 25 25 commit 9ef5fa1e2f195cd7f0700fa3defd70004fe9acc1 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 23 16:50:02 2025 -0500 Check default_ctx_ ptr before freeing commit 2bd8ffd20f890f6c4456fd025a4074b868dfd8ee Author: Avinash Kethineedi <avinash.kethineedi@amd.com> Date: Mon Apr 14 09:18:57 2025 -0500 Update backend to use provided MPI communicator during library initialization (#79) * Update backend to use provided MPI communicator during library initialization, default to `MPI_COMM_WORLD` * Update `rocshmem_my_pe` and `rocshmem_n_pes` host APIs - Return values from backend if initialized; otherwise, fall back to MPI_Singleton. commit 2bba0d133f05db927185eb314108a2608f064e25 Author: Edgar Gabriel <edgargabriel@users.noreply.github.com> Date: Mon Apr 14 12:02:09 2025 -0500 Revamp the uniqueId code to support subgroups of processes (#80) * add code for bootstrapping the bootstrapping code has been extracted from the MSCCLPP library, which in part is based on the code from NVIDIA. The code has been modified to match the specific requirements of the rocSHMEM library. * add code to use the new uniqueId bootstrapping * adjust init_attr example extend the rocshmem_init_attr example to use two disjoint groups of processes, in order to trigger the new code path.
* add env variable for bootstrap timeout * Update examples/rocshmem_init_attr_test.cc Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com> * Update src/rocshmem.cpp Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com> --------- Co-authored-by: Aurelien Bouteiller <Aurelien.bouteiller@gmail.com> commit 4c40fe180f1eabe208baf2d8b79045abc48da6bb Author: Yiltan <yiltan@amd.com> Date: Fri Apr 25 11:48:59 2025 -0500 Required changes to compile with deepep - three missing apis (barriers and fence) - Enable -fpic commit 397e058f4decef93045ab015647e247936feb83e Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 23 05:01:05 2025 -0500 Cleanup debug statements commit f12dc302067a4c72c88e143ca3dc80da4df2a07e Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 23 04:52:21 2025 -0500 Disabler tester and TicketMutex commit 637ba31aeff8c46460edcaf93bb1c91b96dbee6a Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 23 04:41:35 2025 -0500 Remove monitor thread commit 9366976d7082062e6cbd5e6804060caefa93afc7 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 23 04:32:25 2025 -0500 Revert "Revert "Remove print statements"" This reverts commit fdff1dcf9f1a8ca5ff5f07e8fd7da50097991d15. commit fe0d4fafe056394b59eb10cb060913241bc26b64 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 23 04:31:13 2025 -0500 Revert "Revert "Turn off debug"" This reverts commit 11a754c40cc2b07a4f6ef87030532a1ff3fdc02e. commit d79fbf06ff4ac385c8ecc95e3a623d9847fe928a Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 23 04:30:33 2025 -0500 Fix THE OTHER bug commit 11a754c40cc2b07a4f6ef87030532a1ff3fdc02e Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 23 04:02:33 2025 -0500 Revert "Turn off debug" This reverts commit 0584485ee0b5b0b772a1ecbb8afc167f91e09853. 
commit fdff1dcf9f1a8ca5ff5f07e8fd7da50097991d15 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 23 03:53:16 2025 -0500 Revert "Remove print statements" This reverts commit 4f6fee0eca48c69f2581e9aca31cad4b67b11201. commit 0584485ee0b5b0b772a1ecbb8afc167f91e09853 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 22 20:06:32 2025 -0500 Turn off debug commit 4f6fee0eca48c69f2581e9aca31cad4b67b11201 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 23 03:46:29 2025 -0500 Remove print statements commit aef4122cf9d0aaf917337f495993bf024310263c Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 23 03:41:44 2025 -0500 Fixes THE bug commit 9fa906740bcb751adc3870a7dca85bdd60cc95d1 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 23 03:04:24 2025 -0500 Undo tester changes commit 120d91f739d8e6d167240a70c7a9f8c5a2657f2a Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 23 02:58:27 2025 -0500 Viola? commit 024f9c1042237ecc15af4041e17b96d0a0efd4fa Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 23 02:24:12 2025 -0500 Add debug statments for dest_info commit 961499146e7d85a82764df51592541c5f0149854 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 23 02:08:36 2025 -0500 Flip ctx destory commit b0fc2833a82d3ea7ceaa086f8715a3fc06325c9b Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 23 01:56:13 2025 -0500 Move ctx out of shared memory commit bd77f4cc7883175debea18480bebbd5c363ea5a0 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 23 01:43:32 2025 -0500 Add a second context create commit c43d26f67a301fecc1d5c4f1039d29ef415aaf5c Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 22 20:04:33 2025 -0500 Simplify CQE checks commit e1b384a980958410e93ec0c5e79572dad40698f2 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 22 18:09:28 2025 -0500 Use DPRINTF instead of printf commit dc7b6304076f9b4698fb7b3dee58aa33d1610e97 Author: Brandon 
Potter <brandon.potter@amd.com> Date: Tue Apr 22 13:13:35 2025 -0500 Remove ibv_fork_init commit 3ae9e9815095acd2403fe1fef189793e73d996d2 Author: Brandon Potter <brandon.potter@amd.com> Date: Sat Apr 19 21:15:48 2025 -0500 Try to use hipHostMalloc commit 70e1ff54868b891d899095e693fad116ca427e32 Author: Brandon Potter <brandon.potter@amd.com> Date: Sat Apr 19 21:05:11 2025 -0500 Use hipHostMalloc instead of default allocator commit ea35cf47976957b28e896edb39f60f616579eed2 Author: Brandon Potter <brandon.potter@amd.com> Date: Sat Apr 19 16:05:20 2025 -0500 rkey/lkey debug commit 5837f148df96a8b71cd23a6a8fba5ab475656cf0 Author: Brandon Potter <brandon.potter@amd.com> Date: Sat Apr 19 15:32:46 2025 -0500 Convert rkey/lkey back to BE commit 22a916d565dde29725d562280021ede456144634 Author: Brandon Potter <brandon.potter@amd.com> Date: Sat Apr 19 15:29:35 2025 -0500 rkey/lkey debug commit d2995708b818b6fea0d6d2b8649deafaa89faf7e Author: Brandon Potter <brandon.potter@amd.com> Date: Sat Apr 19 14:49:03 2025 -0500 Add monitor thread commit b3036fe91a00f81e13a9b6ff8b973e3c5e9a59bc Author: Brandon Potter <brandon.potter@amd.com> Date: Sat Apr 19 11:50:31 2025 -0500 Add more debug messages commit bba391f662c5e0d08483edeb72be4c5e8a09dd47 Author: Brandon Potter <brandon.potter@amd.com> Date: Sat Apr 19 11:17:53 2025 -0500 Minor changes to debug statements commit 2d87c1185474089abf89b8f58733f8bea4c73bda Author: Brandon Potter <brandon.potter@amd.com> Date: Sat Apr 19 10:41:42 2025 -0500 Allocate network queue pair memory in host memory commit 8bda2c170e125f325a68212c049a422aefa43c63 Author: Brandon Potter <brandon.potter@amd.com> Date: Fri Apr 18 01:29:37 2025 -0500 dbrec debugging commit 74eaae19d999e20f231f40eebeed11211f56891b Author: Brandon Potter <brandon.potter@amd.com> Date: Fri Apr 18 01:02:19 2025 -0500 Dump qp debug info commit 3bea876d7f9f9729dbf143a8457327f9696741c7 Author: Brandon Potter <brandon.potter@amd.com> Date: Fri Apr 18 00:33:29 2025 -0500 More debug info 
commit 058eb7a1f66f5fb410eefdcfe38f710f8c95b3de Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 17 23:48:16 2025 -0500 Debug information commit def7da96d8b4d78be628561aa7b692866ce5f56e Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 17 23:20:06 2025 -0500 Change init attr cap commit 3de749f72e7f302773e15dcc5d528927f9b1cc97 Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 17 23:08:57 2025 -0500 Bugfix on param type commit bd1c0db5b035d187d4f34c70b660e6bd1600d882 Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 17 23:05:13 2025 -0500 More debug commit 71472506517da890d635d75fc3e06d182f29aa52 Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 17 22:20:01 2025 -0500 Debug effort commit ae2cf6aa89818203dc21ad14015e3aca0c89d193 Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 17 14:14:25 2025 -0500 Remove unused functions commit 483f12cac9df35f8482b369b02d28b9dffb48bba Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 17 14:03:08 2025 -0500 Remove host-side calls into the qps commit f77a4f360e25cc4a9eb5d6d8adc906b2a061f1a4 Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 17 13:07:30 2025 -0500 Add device object file commit 109c2e42889c475c5fe95c91de2af1b5e94ed2bb Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 17 13:07:16 2025 -0500 Add ticket mutex file commit 59c36a2e7a13c558858efc9bcaffb960af0f7fb5 Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 17 13:05:59 2025 -0500 Try to protect doorbell with mutex commit 579601dee2f21cf9e818aec9629d541f9b44a28a Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 17 12:03:31 2025 -0500 Cleanup doorbell ringing code commit 82e446dd3d73d496d9479ffde04dfea6bbd30304 Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 17 10:43:03 2025 -0500 more doorbell prints commit 1d428caa847a82db086cf371f1c2a1b10c2c5c10 Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 17 10:33:01 2025 
-0500 Add print statements commit abdd15872434d9c73d4eac54aacd31473e3ab654 Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 17 10:14:04 2025 -0500 Increase blueflame back to two reg and add prints commit 8ef245a39fbb8d13b851f99406a10d3a7d6df7cd Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 17 10:02:52 2025 -0500 Add print statuements commit 5a8874866c8e6e53601d733d153c858cbc77c417 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 16 20:52:46 2025 -0500 Minor modifications to printf debug commit 61a02e8cce0a9c9e9832fc8b742c1f7e780a4e66 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 16 20:46:54 2025 -0500 Remove ipc unit tests commit bdba6adba80fb734350ff89d241c98e4e5c471fa Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 19:33:53 2025 -0500 Add print statements commit 238c65bc60e94dee32282277af9283a4e04beba3 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 16:25:00 2025 -0500 Remove optional doorbell ringing support commit 8aae494e048888a793be6ab56c21f0577785624b Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 14:40:51 2025 -0500 Only allocate space for one blueflame register commit 591b45b553712cdbd7d452e7b2e40386edd659eb Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 14:25:05 2025 -0500 Convert protected members to private commit 8024ba1e5f5b560523c6f7c9584e9681eb6b36a3 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 13:53:25 2025 -0500 Fixes commit 5d427906c41bc1936b8a3156da9dbfd28a84ced7 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 12:13:49 2025 -0500 Debug - omit address commit 110d98b48d5bddd2b6b9898912eed69999b34c46 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 12:05:02 2025 -0500 Uncomment some code commit de6be1a04290c42803028e5927fc09ead978ed2d Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 11:39:41 2025 -0500 Modify print commit 
d6a1d2115c2c24a59699a9b4a8d9d83ef09cf694 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 11:36:37 2025 -0500 Change tester arguments commit b0ce33992a932638d7e7632b20874e1f6f7fb337 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 11:34:18 2025 -0500 Add prints commit a3e6111259fcce8fa21bc55050d8e5ec639f8956 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 11:32:05 2025 -0500 Add print statements commit a6da7c32bb11f33046a6c82f2c607b1698f1486f Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 11:27:47 2025 -0500 Add device-side print commit 9a4a79a9a00f80b19d8859dc7df83e4062fdf301 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 11:18:04 2025 -0500 Add wqe debug host print commit 6c8bb7cd7db827bb8f600e772b8938249f217074 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 10:55:13 2025 -0500 Initialize wqe fields without host post call commit 4f7c7b94a23907205295ef8aa329b1b9576a7308 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 10:30:06 2025 -0500 Remove endian conversion since it's done on host commit 2e97c16adf59c4bb6d8e63ef246ad07e5c423c04 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 10:27:27 2025 -0500 Set rkey/lkey using backend commit 5b40fab11d692257a18c0e410aeb0f30415cdc94 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 09:02:51 2025 -0500 bugfix endian commit bfad8ff80cf079ac7df57a62da0f656d1d84d798 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 08:59:08 2025 -0500 endian conversion commit 2d2405c4eb4de901f2363639c82041e8a16be803 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 08:38:14 2025 -0500 Enable tester commit d9a992511993d66c860de9e6ddc506e5a4e87f50 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 08:31:39 2025 -0500 Add in rkey/lkey writes commit f04aad5cc99fc79b8884795fcadc3b6f16b43af3 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 
08:28:15 2025 -0500 Add rkey/lkey check commit c39f781f343053f83a9c2bf4bbb24d4a1fa13368 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 15 08:13:29 2025 -0500 Add documentation, psuedocode, and modify commit f4397c0451d8c9d2e0e8184adb904db06dfb02aa Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 21:32:20 2025 -0500 Finish removing fence commit 93144311c33ed86f77bb903e2340690ce38f2271 Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 21:20:23 2025 -0500 Remove fence commit 149ad98c6e55fa1f7ce44d6d58eb404a0862bd84 Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 20:53:42 2025 -0500 Style change commit 5608d393d52b79c3b0c2ebeb3d5aefb00732d563 Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 20:50:51 2025 -0500 Remove comments commit 419fc03139e29cb0de86336448d2aed1425041df Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 20:42:45 2025 -0500 Straight line code commit eec14a54211f9da876d34fc368c58d3dff3d9032 Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 18:01:42 2025 -0500 Remove singlethreadpolicy commit a6f7023fd13c14af1c1870990084471a54567b17 Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 17:49:00 2025 -0500 Minor fixes commit ed51cca1d5f3f195c0ea98a8162a9b2aac6dbbb8 Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 14:49:32 2025 -0500 Style changes for backend commit aa1928382a8841c158f3be3715f0a806657f742b Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 14:28:05 2025 -0500 Minor fixes commit 039b3b6a168887190be3a3bbdd3a3d65920a1e2e Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 14:18:19 2025 -0500 Remove inlining mechanism commit 68c5dc8f6522a40da025b8163424c0e48911c5b9 Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 13:59:43 2025 -0500 Remove unused header file commit 377a1fc3a6dda454a1c4efd288e8efaf3226e205 Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 
14 13:54:31 2025 -0500 Fix comment and variable name commit ef32def751f7c4c3aaf2e43b38ecc220568f9e3e Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 13:31:05 2025 -0500 Encapsulate members in queue_pair commit 079c9b337907d331095156da6468007b43d77d85 Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 13:19:17 2025 -0500 Style change commit d5ea67eb8e949003c2f7f2dcc737f7b8e3e34df2 Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 13:16:06 2025 -0500 Cleanup for queue_pair class commit 3bed59f7dd8dbab2716ae5d67368f673924c9f63 Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 12:53:33 2025 -0500 Add documentation for segments commit f25d2581db56e2a3495d64fb889a1ed5fb099069 Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 12:10:35 2025 -0500 Remove unused struct commit 1939224c0777d65512d8b255ec2f58afbd270910 Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 12:07:33 2025 -0500 Remove method commit 7ef084ebcaabad6d5ace155b5642e7720cd7b90d Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 12:04:16 2025 -0500 Remove unused variable commit 3f7f356d499b1628abc31b13cea5305a9aecf1de Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 11:52:19 2025 -0500 Cleanup files commit e9ee4bf908a83a89d0176fad034b53b27108e33a Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 11:35:56 2025 -0500 Style changes for queue_pair and segment_builder commit b9a697901ada72c4b00b96c3b3df99301de4fce7 Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 11:16:14 2025 -0500 Remove weird + 1 offset commit 0ea6f3438f31b19b1cfba1dd17ceeb736b5c90d5 Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 11:13:49 2025 -0500 Rename sq fields commit f8d7ca9bf0fe22f1108e0b938ac3e4a6d1e0ac87 Author: Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 10:55:31 2025 -0500 Remove unused headers commit 840eb360a2323687aba187a302136d02b2a7bf2d Author: 
Brandon Potter <brandon.potter@amd.com> Date: Mon Apr 14 10:49:30 2025 -0500 Cleanup gpu_ib context files commit 3349683a72eebaca70880058547e11ac6e9a21d6 Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 23:03:50 2025 -0500 Continue document MLX structures commit 000b54b8dd123c2e9913467b51e954700ae328ea Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 22:46:54 2025 -0500 Document gpu queue-pair MLX structures commit 3151fd34c06b35fbf93f9a707924119c472a7b51 Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 22:08:21 2025 -0500 Bugfix for host RDMA_WRITE WQEs commit 441fa32e5031a3eab71468b76c94ad3de4456ebb Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 22:04:16 2025 -0500 Add host-side initial RDMA_WRITE WQEs back commit 1afe02368fb862f5b8ceb65c1bd9e504d20f3a74 Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 21:46:16 2025 -0500 Try to remove host-side post_wqe commit 3bb4b5527e8b17a4a66ab2544031ee2a73f36bbd Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 21:39:40 2025 -0500 Always allocate queues in gpu memory commit 4aca3989ec244012f32dc401c53eb5f5263cf04e Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 21:16:14 2025 -0500 Bugfix for connection class commit ceb0ebeb387f71a622ac10c0f276725cfe654d43 Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 21:07:45 2025 -0500 Refactor connection class commit 8b31d8927c021ed07d2766b9d05d155162df8fd3 Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 20:53:28 2025 -0500 Refactor some files commit 2998e9bf0565bf29ef39306445990034b30389a8 Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 20:18:05 2025 -0500 Update connection class commit 519e9f7538160f0d2246c189a635528c39841475 Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 19:12:06 2025 -0500 Cleanup connection and network classes commit c49d2e7097218cd6a7ef8ce565a8811a4b6061cb Author: Brandon Potter 
<brandon.potter@amd.com> Date: Sun Apr 13 18:50:50 2025 -0500 Remove unused member commit a2e2bd020b821176923ac62ca5f57645a8113f93 Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 17:54:29 2025 -0500 Add uncached heap option commit 629099694ed8d3a6c7480a6c342086676d6a8a9f Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 17:05:53 2025 -0500 Device mem for cq/sq queues commit 768a2211f36092b5fb869cb491dc24b4d65a2991 Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 16:49:40 2025 -0500 Change heap allocation policies commit 6afa39dbf092dcfede2f5d39ae1d4ebe2e350fdc Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 16:15:36 2025 -0500 Remove compile options and cleanup commit 40a5c52cedd993d4d47196916f2a551555f1e007 Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 15:24:29 2025 -0500 Cleanup coalescer files commit 37ea4b4332f2efd14b789a77259030bbe68ec77f Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 15:20:26 2025 -0500 Cleaup files commit 9742294b1356eed3978af8a9a7c58a717df2feb2 Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 15:11:17 2025 -0500 Cleanup rocshmemgpu and team files commit a9c11ce854be01399ebcbc7eb510ea6cab984c4e Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 14:56:34 2025 -0500 Cleanup gpu ib team files commit cca52a6656b8d7dece78556c60b80deb55e4cccc Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 14:38:53 2025 -0500 Add inline and cleanup commit 87cde8bfa470d0f90cf9f32af1014437cd9d29f7 Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 14:25:27 2025 -0500 Cleaup file commit d7d619b6d647a84b9fa0b2e90ffdeef32d7c3c04 Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 13:50:20 2025 -0500 Cleanup host files commit 8edc427a9185b4b0c9abe5bfb8a33cc5d597faa1 Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 13:18:50 2025 -0500 Minor style changes to context_device commit 
66c8eb38303005628404641371f92e19060fcfbb Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 13:03:30 2025 -0500 Remove unused constants commit 2ccb34ee6046790ba1b0266deb91a0ede2bc17b0 Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 12:57:57 2025 -0500 Remove unnecessary init functions commit b9a0ac2b8416b858124079f7e9a3cf3ad251bd4c Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 12:45:48 2025 -0500 Remove manage memory stubs commit 6143152a019e3ad0491233e5554ce4c4422b7e91 Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 12:34:55 2025 -0500 Remove comment commit f56a95acf6aa58ffb21fddacc7ccf7fd570c6d30 Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 12:23:41 2025 -0500 Remove unused ThreadImpl types commit e9bb49011f16364d0c3f8644733792768a4bc946 Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 12:03:34 2025 -0500 Move constant into different file commit af0340d5d299205b9f9f1b285a74cdad33e929e7 Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 11:53:50 2025 -0500 Remove g_ret mechanisms commit 72d86f16e8d1df6a6d7fa5adba7b74e0b276c488 Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 11:36:42 2025 -0500 Remove unused externSharedBytes method commit c4f6e08c63dcef653b00459a6555a8552fbe616d Author: Brandon Potter <brandon.potter@amd.com> Date: Sun Apr 13 11:25:52 2025 -0500 Remove unused variables commit 4cde4a4cc2300dd5733a10dca98ca0ea860c737f Author: Brandon Potter <brandon.potter@amd.com> Date: Fri Apr 11 11:15:24 2025 -0500 Tear out internal references to removed atomics commit d335eac148b5bc3e4d828b530b16d1498e2999d1 Author: Brandon Potter <brandon.potter@amd.com> Date: Fri Apr 11 10:57:54 2025 -0500 Remove unused atomic types commit 0089c11fb0bbb538f70e1c95637dc77b7d5687cf Merge: 5b265666 b8dc6a2e Author: Brandon Potter <BKP@users.noreply.github.com> Date: Fri Apr 11 10:18:00 2025 -0500 Merge pull request #71 from 
Yiltan/yiltan-cleanup-april-11 Yiltan cleanup april 11 commit b8dc6a2edf380c538659724f693f635d9959f049 Author: Yiltan <yiltan@amd.com> Date: Fri Apr 11 10:09:02 2025 -0500 removed unused collevtive buffers commit 4e459483c2402712200237fb6cf32709d28b2a70 Author: Yiltan <yiltan@amd.com> Date: Fri Apr 11 10:02:05 2025 -0500 removed USE_SINGLE_NODE commit cf262a984e91486dd4324af2b7130052fe081966 Author: Yiltan <yiltan@amd.com> Date: Fri Apr 11 10:00:38 2025 -0500 removed network impl off commit 4e6188d2875cd9c19ed2ef346e87e011a29c020b Author: Yiltan <yiltan@amd.com> Date: Fri Apr 11 08:42:38 2025 -0500 removed reliable connection into connection commit 274f68f0d165486f1734945c24a8d054a926d3c4 Author: Yiltan <yiltan@amd.com> Date: Fri Apr 11 08:37:34 2025 -0500 remove rocshmem_calc.hpp commit 97c7ad44b205aa8fa7ae1127d1b0e8d60803c977 Author: Yiltan <yiltan@amd.com> Date: Fri Apr 11 08:33:17 2025 -0500 Removed more unused files commit 5b265666f561db70508094c472af3f655551d9dd Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 10 16:58:20 2025 -0500 Remove straggler wait_until variants commit 13bdb0c58e5256f7ee36dc99453649b083e649df Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 10 16:51:14 2025 -0500 Remove get variants commit 6f7ac10561a325059b395f9ed7a903ae468eda69 Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 10 14:42:32 2025 -0500 Remove unnecessary interfaces commit df386a2c9e27bb34b0aaf890b355f31ca396b438 Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 10 12:12:43 2025 -0500 Tear out SYNC, WG_RMA, related functional tests commit af6dcfdcb0ae88ccebbacfc4a3a2f2a0a1eebb49 Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 10 11:13:55 2025 -0500 Tear out signal ops from include and dependencies commit fc78420fe44b0be6667aa359dc8eec9f9ed6c306 Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 10 10:46:37 2025 -0500 Remove debug header commit 583edb968262bb14b4a72c62324760051016c0b6 Author: 
Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 10 10:42:21 2025 -0500 Tear out collectives from include and dependencies commit a47c80e4e2c1a12315598146b5a0f8f4133e78f3 Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 10 08:46:57 2025 -0500 Remove empty RC functions commit 47e5387a42eec1b31fa152832cef0062311875ae Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 10 08:36:03 2025 -0500 Remove qe dumper and debug commit 10baf8d2471a0d0ac4de141ba11ca96a9e7d6a4c Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 10 08:32:06 2025 -0500 Remove helper_macros header since dependencies removed commit 5a13783e7e3519444723308710452a25edb1e98f Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 10 08:28:41 2025 -0500 Remove dev_mono_linear strategy commit b5666de736ff8d274eff0eef038b761383cc9419 Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 10 08:20:40 2025 -0500 Remove container strategies commit 6a3adf5ca423fddf3e38f6a2746690054b706841 Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 10 08:15:05 2025 -0500 Remove bitwise gtest and matrix container commit 8346c73a6ac79b35a0372ac19222a81d9f287bd9 Author: Brandon Potter <brandon.potter@amd.com> Date: Thu Apr 10 08:10:18 2025 -0500 Remove array container commit 7bb8e7efa0d35727f911bd9eccb4eb0b37dc9d59 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 9 23:46:33 2025 -0500 Remove DC transport files commit 9f9de2338eff67d7984a8c8bf197e83bdc088bc4 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 9 22:37:33 2025 -0500 Remove relative pathing for includes commit 92fe8e5bd94a788785aa61e1aec5d023f451baf4 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 9 22:07:21 2025 -0500 Remove todo notes commit 7202d143147cea0240b80f89b95fb1e7a16401b1 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 9 21:58:23 2025 -0500 Remove extra line commit 615236311a6b54a4fae13e19f0ea7b9d4796c5d4 Author: Brandon Potter 
<brandon.potter@amd.com> Date: Wed Apr 9 21:49:29 2025 -0500 Merge backend classes commit 7e9a7b1c46be201d8822b284b1f049d4c9737b2e Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 9 14:19:27 2025 -0500 Remove USE_RO and USE_IPC conditions commit adebaa285f29a8fdefe446b859e6df1957f81ae5 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 9 13:52:36 2025 -0500 Tear out IPC call points commit a9f912bbca0fc71efa3381980ae3da789f79d637 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 9 10:28:48 2025 -0500 Tear out hdp_policy commit b8de7035a738076a3c8e7c05f307505834c14e60 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 9 10:19:29 2025 -0500 Convert backend_type to GPUIB only commit 582cfcde612f86c3987cd411c29e4992e0125847 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 9 10:06:24 2025 -0500 Tear out IPC conduit commit dcab3a2b268d1724a0649f82204d73884c3160f0 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 9 10:01:55 2025 -0500 Tear out RO conduit commit 7226902bc4a50a6223c665af2e0607607af77a6b Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 9 09:57:25 2025 -0500 Tear out atomic and notifier files commit 4295c43867294887b12711848a36b7d98e272066 Merge: d7809b3b 99942d91 Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 9 09:13:31 2025 -0500 Merge branch 'gpu-ib-working-draft' into bpotter/gpuib_mini commit 99942d919baedb27cf7d33962a35667f749d02ee Merge: 42fa4e9f dc61fb61 Author: Yiltan <ytemucin@amd.com> Date: Wed Apr 9 10:09:19 2025 -0400 Merge pull request #67 from Yiltan/gpu-ib-working-draft Removed HDP code and error checking to ibv_* functions commit d7809b3b5f86a6d5fdf704af36628be80a8e68ab Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 9 08:40:46 2025 -0500 Remove unused wrapper class commit e12c08e6fa5caf08a2ce5ec1a1be84e4b3bf8dcc Author: Brandon Potter <brandon.potter@amd.com> Date: Wed Apr 9 00:05:57 2025 -0500 Remove unused EBO spinlock commit 
483ec9dc43195a9322f630f3e164359d95d9d3bc Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 8 23:56:15 2025 -0500 Remove slab heap commit d0c0991d42eec0756c9e6778a315b1a94a467744 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 8 23:46:33 2025 -0500 Remove unused unit test for ipc commit 4f1661199c911924f9b1d3349231ff11769476c8 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 8 23:44:45 2025 -0500 Fix store_asm function and util memcpy funcs commit dd72a4f4e28f4c2f0d99910a88e8457fa6564f7e Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 8 23:05:42 2025 -0500 Replace wallClk code with hip function commit 89e19320818902d4d7062f47afd7d39dcb8686c7 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 8 22:44:11 2025 -0500 Remove unused __read_clock function commit a3b67d91ef02bcfbe626f8f6fb8ab9ab1bca610d Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 8 22:18:13 2025 -0500 Remove unused forward_list commit f6767d8a48afc7f0a90193bfa58aa0dbf0c40d16 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 8 22:01:15 2025 -0500 Disable verification of functional tests commit 6158e946e29989f22d7fdc922e41c5402f063617 Author: Brandon Potter <brandon.potter@amd.com> Date: Tue Apr 8 22:00:52 2025 -0500 Increase functional test loop size to 200 commit dc61fb6102d30af41bc0dae6d9eb7ba40bc75394 Author: Yiltan <yiltan@amd.com> Date: Mon Apr 7 12:55:32 2025 -0500 Added error checking to verbs functions commit 8df72764da0447ecc91d9eb59fe01c3cd66b7f46 Author: Yiltan <yiltan@amd.com> Date: Mon Apr 7 10:21:06 2025 -0500 removed unused file commit f6beb9ef97e164b38645301a4dec075410eb41ee Author: Yiltan <yiltan@amd.com> Date: Mon Apr 7 10:19:07 2025 -0500 removed hdp comments commit bf032889e27bb1611dfd07d0d69c5dbe45a93ccd Author: Yiltan <yiltan@amd.com> Date: Mon Apr 7 09:54:46 2025 -0500 fixed dc commit 42fa4e9fe2137e595ba47e02d271887cc6a8dd28 Author: Yiltan <yiltan@amd.com> Date: Thu Apr 3 14:17:54 2025 -0500 
cant lock cq if on device mem commit 27dd09ca5f92538f505223840b34c4fef70b812e Author: Yiltan <yiltan@amd.com> Date: Thu Apr 3 08:06:37 2025 -0500 null ptr commit f5f0efe88869faa2e3549c11fd6082553a548ed9 Author: Yiltan <yiltan@amd.com> Date: Wed Apr 2 15:25:44 2025 -0500 comment out hdp commit 0b631d593cad7ed7ca58716d5678ee41dc0c43c2 Author: Yiltan <yiltan@amd.com> Date: Wed Apr 2 12:50:00 2025 -0500 GPU_IB Compiles commit 9ba9b1fb6299491b54dd9a328df4702931947a05 Author: Yiltan <yiltan@amd.com> Date: Wed Apr 2 10:07:04 2025 -0500 Add GPU IB back * Revert "Only issue a single completion per wavefront (#199)" (#205) This reverts commit 90761d552392ca1f5261fec2e6a08455b0ebc368. (cherry picked from commit 99b4c93e1f8c9177bf1c236b86732c1209847519) * GDA Cmake modifications, move topology to gpu_ib specific folder * Do not use ../thing.h * Use WF_SIZE: AMDGCN_WAVEFRONT_SIZE is deprecated * 2-way merge between context_ipc and context_gpuib * Select MTU based on network config (#214) * rocSHMEM GDA BNXT POC (#213) * rocSHMEM GDA PoC for Thor 2 (233.2.76.0) (cherry picked from commit d0d5c51528e362858f5dc38a46d8214ac519b044) * Rename gpu_ib to gda * Renaming part2: includes and cmakery * Fix DISPATCH macro; use backend_comm when needed; some GPUDevices where left * Consolidate GDA_CHECK_NNULL/CHECK_ZERO/CHECK_HIP to look and feel similar * Update copyrights to the new style * Rework default-ctx init, missing heap init, missing qpe field * backend_gda: single init, use systematic naming for setup/cleanup, prefix team structures, * setup_wrk_psync must precede setup_teams etc * silence recasting error * Some remnants of GDADevice and missing friend classes, public some fields, it compiles * Fix redefinitaion of CHECK_HIP in functional testers, we still have a duplicate definition that would probably be better having only one * typo in backend_type * Undo unneeded change to functional test driver * Add -lnuma * ctx must be initialized after qps * gda: Disable non-functional 
tests (#216) * Do not try to run functional tests that are not implemented * Revert "Increase functional test loop size to 200" This reverts commit 6158e946e29989f22d7fdc922e41c5402f063617. * Make a specific test case for gda * Disabled further tests that do not currently pass with explanation as to why disabled (cherry picked from commit 27c5c6ff09f259e2b59fbe5934b88751ba47cbfc) * gda_devel: teams with MPI initalization (#229) * Fix missing communicator initialization * Reenable team functional testers --------- Co-authored-by: Edgar Gabriel <Edgar.Gabriel@amd.com> (cherry picked from commit 8d700a986f5e64e40b45babddc8e84d8d8028dea) * [GDA] Query for the correct GID index (#215) * Added GID query code for CX7/Thor2 NICs (cherry picked from commit 2bc0d6c719a0f43955cef1bbcec77261ae797e54) * Reorder code to make ipc and gda more similar * Do not double free Wrk_Sync, uniform styling with ipc * Remove unused includes * Abort when using not-implemented device functions * BNXT Compiles * Silence compiler warnings * Cleanup unused .h * Uniform indentation between ipc and gda * gda: add cleanups, address todos * Disable pingpong tests, enable defaultctxtest * Reenable testing non-fetching amos * build scripts: use a single script backed for all gda variants enable configuring INSTALL_PREFIX and BUILD_TYPE from the command line same order in all scripts * fix: prevent double free in `GDADefaultContextProxy` with custom move assignment * The default move assignment, invoked during initialization of `default_context_proxy_`, caused the default context’s QPs to be freed prematurely because the destructor is triggered by the xrvalue after initialization. 
* Undo changes to the amo standard tester during gda_devel dbaee371, as they cause RO failures --------- Co-authored-by: Yiltan <ytemucin@amd.com> Co-authored-by: Edgar Gabriel <Edgar.Gabriel@amd.com> Co-authored-by: Yiltan <yiltan@amd.com> Co-authored-by: avinashkethineedi <Avinash.kethineedi@amd.com> Co-authored-by: bpotter <brandon.potter@amd.com>
694 lines
27 KiB
Bash
Executable File
#!/bin/bash
###############################################################################
# Copyright (c) Advanced Micro Devices, Inc. All rights reserved.
#
# SPDX-License-Identifier: MIT
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to
# deal in the Software without restriction, including without limitation the
# rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
# sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
###############################################################################

# "true ||" short-circuits the tty check, so the colored labels are always used.
if true || tty -s; then
  PRETTY_FAILED="\033[1;31mFAILED\033[0m"
  PRETTY_PASSED="\033[1;32mPASSED\033[0m"
else
  PRETTY_FAILED="FAILED"
  PRETTY_PASSED="PASSED"
fi
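
# Illustrative sketch, not part of the original driver. The condition above
# always selects the colored labels because "true ||" short-circuits the
# "tty -s" check. One possible variant emits the escape codes only when
# stdout is a terminal. FormatStatus is a hypothetical helper; note that it
# tests stdout with "[ -t 1 ]", whereas "tty -s" tests stdin.
FormatStatus() {
  local label=$1 color_code=$2
  if [ -t 1 ]; then
    # Bold text in the requested ANSI color, then reset.
    printf '\033[1;%sm%s\033[0m\n' "$color_code" "$label"
  else
    printf '%s\n' "$label"
  fi
}
# Example: FormatStatus FAILED 31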

# These names/values should match the TestType enum in rocSHMEM/tests/functional_tests/tester.hpp
declare -A TEST_NUMBERS=(
  ["get"]="0"
  ["getnbi"]="1"
  ["put"]="2"
  ["putnbi"]="3"
  ["amo_fadd"]="4"
  ["amo_finc"]="5"
  ["amo_fetch"]="6"
  ["amo_fcswap"]="7"
  ["amo_add"]="8"
  ["amo_inc"]="9"
  ["amo_cswap"]="10"
  ["init"]="11"
  ["pingpong"]="12"
  ["randomaccess"]="13"
  ["barrierall"]="14"
  ["syncall"]="15"
  ["sync"]="16"
  ["collect"]="17"
  ["fcollect"]="18"
  ["alltoall"]="19"
  ["alltoalls"]="20"
  ["shmemptr"]="21"
  ["p"]="22"
  ["g"]="23"
  ["wgget"]="24"
  ["wggetnbi"]="25"
  ["wgput"]="26"
  ["wgputnbi"]="27"
  ["waveget"]="28"
  ["wavegetnbi"]="29"
  ["waveput"]="30"
  ["waveputnbi"]="31"
  ["teambroadcast"]="32"
  ["teamreduction"]="33"
  ["teamctxget"]="34"
  ["teamctxgetnbi"]="35"
  ["teamctxput"]="36"
  ["teamctxputnbi"]="37"
  ["teamctxinfra"]="38"
  ["putnbimr"]="39"
  ["amo_set"]="40"
  ["amo_swap"]="41"
  ["amo_fetchand"]="42"
  ["amo_fetchor"]="43"
  ["amo_fetchxor"]="44"
  ["amo_and"]="45"
  ["amo_or"]="46"
  ["amo_xor"]="47"
  ["pingall"]="48"
  ["putsignal"]="49"
  ["wgputsignal"]="50"
  ["waveputsignal"]="51"
  ["putsignalnbi"]="52"
  ["wgputsignalnbi"]="53"
  ["waveputsignalnbi"]="54"
  ["signalfetch"]="55"
  ["wgsignalfetch"]="56"
  ["wavesignalfetch"]="57"
  ["teamwgbarrier"]="58"
  ["defaultctxget"]="59"
  ["defaultctxgetnbi"]="60"
  ["defaultctxput"]="61"
  ["defaultctxputnbi"]="62"
  ["defaultctxp"]="63"
  ["defaultctxg"]="64"
  ["wavebarrierall"]="65"
  ["wgbarrierall"]="66"
  ["wavesyncall"]="67"
  ["wgsyncall"]="68"
  ["teambarrier"]="69"
  ["teamwavebarrier"]="70"
  ["wavesync"]="71"
  ["wgsync"]="72"
  ["teamctxsingleinfra"]="73"
  ["teamctxblockinfra"]="74"
  ["teamctxoddeveninfra"]="75"
)
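
# Illustrative helper, not part of the original driver: prints the entries of
# an associative array such as TEST_NUMBERS as "<number> <name>" lines sorted
# by number, which may help when cross-checking the table against the TestType
# enum. The name ListTestNumbers is hypothetical, and "local -n" (nameref)
# assumes bash 4.3 or newer. The array passed in must not be named _table.
ListTestNumbers() {
  local -n _table=$1
  local name
  for name in "${!_table[@]}"; do
    printf '%s %s\n' "${_table[$name]}" "$name"
  done | sort -n
}
# Example: ListTestNumbers TEST_NUMBERS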

ExecTest() {
  TEST_NAME=$1
  NUM_RANKS=$2
  NUM_WG=$3
  NUM_THREADS=$4
  MAX_MSG_SIZE=$5

  TIMEOUT=$((5 * 60)) # Timeout in seconds

  TEST_NUM=${TEST_NUMBERS[$TEST_NAME]}

  if [[ "" == "$TEST_NUM" ]]
  then
    echo "Test $TEST_NAME does not exist" >&2
    DRIVER_RETURN_STATUS=1
    return
  fi

  if [[ "" == "$ROCSHMEM_MAX_NUM_CONTEXTS" ]]
  then
    ROCSHMEM_MAX_NUM_CONTEXTS=$NUM_WG
  fi

  # MPI Parameters
  LAUNCHER=mpirun
  OPTIONS=" -n $NUM_RANKS -mca pml ucx -mca osc ucx"
  OPTIONS+=" -x ROCSHMEM_MAX_NUM_CONTEXTS=$ROCSHMEM_MAX_NUM_CONTEXTS"
  OPTIONS+=" -x UCX_ROCM_IPC_SIGPOOL_MAX_ELEMS=16384"
  OPTIONS+=" --map-by numa --timeout $TIMEOUT"

  if [[ "" != "$HOSTFILE" ]]
  then
    OPTIONS+=" --hostfile $HOSTFILE"
  fi

  # Construct Test Command
  TEST_LOG_NAME="$TEST_NAME"_n"$NUM_RANKS"_w"$NUM_WG"_z"$NUM_THREADS"
  CMD="$LAUNCHER $OPTIONS $APP -a $TEST_NUM -w $NUM_WG -z $NUM_THREADS"

  if [[ "" != "$MAX_MSG_SIZE" ]]
  then
    CMD+=" -s $MAX_MSG_SIZE"
    TEST_LOG_NAME+=_"$MAX_MSG_SIZE"B
  fi

  CMD+=" >> $LOG_DIR/$TEST_LOG_NAME.log 2>&1"

  # Run Test
  echo $TEST_LOG_NAME
  echo "# $CMD" >"$LOG_DIR/$TEST_LOG_NAME.log"
  eval $CMD

  # Validate Test
  if [ $? -ne 0 ]
  then
    echo -e "$PRETTY_FAILED: $TEST_LOG_NAME" >&2
    cat "$LOG_DIR/$TEST_LOG_NAME.log"
    DRIVER_RETURN_STATUS=1
    FAILED_LIST="$FAILED_LIST $TEST_LOG_NAME"
  fi

  unset ROCSHMEM_MAX_NUM_CONTEXTS
}
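
# Illustrative sketch, not part of the original driver: reproduces the log-file
# naming that ExecTest builds in TEST_LOG_NAME as a standalone function, so the
# name of a log can be predicted without running a test. BuildTestLogName is a
# hypothetical helper; the message-size suffix is added only when a fifth
# argument is given, mirroring the MAX_MSG_SIZE check in ExecTest.
BuildTestLogName() {
  local name=$1 ranks=$2 wg=$3 threads=$4 max_msg_size=$5
  local log_name="${name}_n${ranks}_w${wg}_z${threads}"
  if [[ -n "$max_msg_size" ]]; then
    log_name+="_${max_msg_size}B"
  fi
  printf '%s\n' "$log_name"
}
# Example: BuildTestLogName put 2 16 128 8   prints put_n2_w16_z128_8B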

TestRMAPut() {
  ################################################################################
  #       | Name               | Ranks | Workgroups | Threads | Max Message Size #
  ################################################################################
  ExecTest "put"                 2       1            1         1048576
  ExecTest "put"                 2       1            1024      512
  ExecTest "put"                 2       8            1         1048576
  ExecTest "put"                 2       16           128       8
  ExecTest "put"                 2       32           256       512
  ExecTest "put"                 2       64           1024      8

  ExecTest "wgput"               2       1            64        1048576
  ExecTest "wgput"               2       2            64        1048576
  ExecTest "wgput"               2       16           64        8

  ExecTest "waveput"             2       1            64        1048576
  ExecTest "waveput"             2       2            64        1048576
  ExecTest "waveput"             2       2            128       1048576
  ExecTest "waveput"             2       16           128       8

  ExecTest "defaultctxput"       2       4            128       1024
  ExecTest "teamctxput"          2       4            128       1024
  ExecTest "teamctxput"          2       16           256       1024

  ExecTest "p"                   2       1            1         128
  ExecTest "p"                   2       1            1024      2
  ExecTest "p"                   2       8            1         32
  ExecTest "p"                   2       16           128       4

  ExecTest "shmemptr"            2       1            1         8
  ExecTest "shmemptr"            2       1            1024      8
  ExecTest "shmemptr"            2       8            1         8
  ExecTest "shmemptr"            2       16           128       8

  ################################# Non-Blocking #################################

  ExecTest "putnbi"              2       1            1         1048576
  ExecTest "putnbi"              2       1            1024      512
  ExecTest "putnbi"              2       8            1         1048576
  ExecTest "putnbi"              2       16           128       8
  ExecTest "putnbi"              2       32           256       512
  ExecTest "putnbi"              2       64           1024      8

  ExecTest "wgputnbi"            2       1            64        1048576
  ExecTest "wgputnbi"            2       2            64        1048576
  ExecTest "wgputnbi"            2       16           64        8

  ExecTest "waveputnbi"          2       1            64        1048576
  ExecTest "waveputnbi"          2       2            64        1048576
  ExecTest "waveputnbi"          2       2            128       1048576
  ExecTest "waveputnbi"          2       16           128       8

  ExecTest "defaultctxputnbi"    2       4            128       1024
  ExecTest "teamctxputnbi"       2       4            128       1024
  ExecTest "teamctxputnbi"       2       16           256       1024
}

TestRMAGet() {
  ################################################################################
  #       | Name               | Ranks | Workgroups | Threads | Max Message Size #
  ################################################################################
  ExecTest "get"                 2       1            1         1048576
  ExecTest "get"                 2       1            1024      512
  ExecTest "get"                 2       8            1         1048576
  ExecTest "get"                 2       16           128       8
  ExecTest "get"                 2       32           256       512
  ExecTest "get"                 2       64           1024      8

  ExecTest "wgget"               2       1            64        1048576
  ExecTest "wgget"               2       2            64        1048576
  ExecTest "wgget"               2       16           64        8

  ExecTest "waveget"             2       1            64        1048576
  ExecTest "waveget"             2       2            64        1048576
  ExecTest "waveget"             2       2            128       1048576
  ExecTest "waveget"             2       16           128       8

  ExecTest "defaultctxget"       2       4            128       1024
  ExecTest "teamctxget"          2       4            128       1024
  ExecTest "teamctxget"          2       16           256       1024

  ExecTest "g"                   2       1            1         128
  ExecTest "g"                   2       1            1024      1
  ExecTest "g"                   2       8            1         32
  ExecTest "g"                   2       16           128       4

  ################################# Non-Blocking #################################

  ExecTest "getnbi"              2       1            1         1048576
  ExecTest "getnbi"              2       1            1024      512
  ExecTest "getnbi"              2       8            1         1048576
  ExecTest "getnbi"              2       16           128       8
  ExecTest "getnbi"              2       32           256       512
  ExecTest "getnbi"              2       64           1024      8

  ExecTest "wggetnbi"            2       1            64        1048576
  ExecTest "wggetnbi"            2       2            64        1048576
  ExecTest "wggetnbi"            2       16           64        8

  ExecTest "wavegetnbi"          2       1            64        1048576
  ExecTest "wavegetnbi"          2       2            64        1048576
  ExecTest "wavegetnbi"          2       2            128       1048576
  ExecTest "wavegetnbi"          2       16           128       8

  ExecTest "defaultctxgetnbi"    2       4            128       1024
  ExecTest "teamctxgetnbi"       2       4            128       1024
  ExecTest "teamctxgetnbi"       2       16           256       1024
}

TestRMA() {
  TestRMAPut
  if [ "0" == "$ROCSHMEM_DRIVER_DISABLE_GET" ]; then
    TestRMAGet
  fi
}
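
# Illustrative sketch, not part of the original driver. TestRMA runs the Get
# tests only when ROCSHMEM_DRIVER_DISABLE_GET is exactly "0", so an unset or
# empty variable skips them (unless a default is assigned elsewhere in the
# script, which is not shown here). The hypothetical helper GetTestsEnabled
# mirrors that comparison so the behavior can be checked in isolation.
GetTestsEnabled() {
  # Succeeds only for the literal value "0"; unset and empty both fail.
  [[ "${1-}" == "0" ]]
}
# Example: GetTestsEnabled "$ROCSHMEM_DRIVER_DISABLE_GET" && echo "Get tests will run"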

TestAMO() {
  ################################################################################
  #       | Name               | Ranks | Workgroups | Threads | Max Message Size #
  ################################################################################
  ExecTest "amo_fetch"           2       1            1
  ExecTest "amo_fetch"           2       1            1024
  ExecTest "amo_fetch"           2       8            1
  ExecTest "amo_fetch"           2       32           128

  ExecTest "amo_set"             2       1            1
  ExecTest "amo_set"             2       8            1
  ExecTest "amo_set"             2       32           1

  ExecTest "amo_fcswap"          2       1            1
  ExecTest "amo_fcswap"          2       32           1
  ExecTest "amo_fcswap"          2       8            1

  ExecTest "amo_finc"            2       1            1
  ExecTest "amo_finc"            2       1            1024
  ExecTest "amo_finc"            2       8            1
  ExecTest "amo_finc"            2       32           128

  ExecTest "amo_inc"             2       1            1
  ExecTest "amo_inc"             2       1            1024
  ExecTest "amo_inc"             2       8            1
  ExecTest "amo_inc"             2       32           128

  ExecTest "amo_fadd"            2       1            1
  ExecTest "amo_fadd"            2       1            1024
  ExecTest "amo_fadd"            2       8            1
  ExecTest "amo_fadd"            2       32           128

  ExecTest "amo_add"             2       1            1
  ExecTest "amo_add"             2       1            1024
  ExecTest "amo_add"             2       8            1
  ExecTest "amo_add"             2       32           128

  ExecTest "amo_fetchand"        2       1            1

  ExecTest "amo_and"             2       1            1

  ExecTest "amo_xor"             2       1            1
}
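
# Illustrative sketch, not part of the original driver. Several AMO tests above
# repeat the same workgroup/thread combinations; a small wrapper could express
# such a sweep in one call. RunSweep is a hypothetical helper: each argument
# after the rank count is a quoted "workgroups threads [max_msg_size]" tuple
# that is split on whitespace and forwarded to ExecTest.
RunSweep() {
  local name=$1 ranks=$2
  shift 2
  local tuple
  local -a fields
  for tuple in "$@"; do
    read -r -a fields <<<"$tuple"
    ExecTest "$name" "$ranks" "${fields[@]}"
  done
}
# Example: RunSweep "amo_fetch" 2 "1 1" "1 1024" "8 1" "32 128"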

TestSigOps() {
  ################################################################################
  #       | Name               | Ranks | Workgroups | Threads | Max Message Size #
  ################################################################################
  ExecTest "putsignal"           2       1            1         1048576
  ExecTest "putsignal"           2       2            32        1048576
  ExecTest "wgputsignal"         2       2            32        1048576
  ExecTest "waveputsignal"       2       1            32        1048576
  ExecTest "waveputsignal"       2       2            64        1048576

  ExecTest "putsignalnbi"        2       1            1         1048576
  ExecTest "putsignalnbi"        2       2            32        1048576
  ExecTest "wgputsignalnbi"      2       2            32        1048576
  ExecTest "waveputsignalnbi"    2       1            32        1048576
  ExecTest "waveputsignalnbi"    2       2            64        1048576

  ExecTest "signalfetch"         2       1            1
  ExecTest "wgsignalfetch"       2       2            32
  ExecTest "wavesignalfetch"     2       1            32
  ExecTest "wavesignalfetch"     2       1            64
}

TestColl() {
  ################################################################################
  #       | Name               | Ranks | Workgroups | Threads | Max Message Size #
  ################################################################################
  ExecTest "barrierall"          2       1            1

  ExecTest "wavebarrierall"      2       1            1

  ExecTest "wgbarrierall"        2       1            1

  ExecTest "teambarrier"         2       1            1
  ExecTest "teambarrier"         2       16           64
  ExecTest "teambarrier"         2       32           256
  ExecTest "teambarrier"         2       39           1024

  ExecTest "teamwavebarrier"     2       1            1
  ExecTest "teamwavebarrier"     2       16           64
  ExecTest "teamwavebarrier"     2       32           256
  ExecTest "teamwavebarrier"     2       39           1024

  ExecTest "teamwgbarrier"       2       1            1
  ExecTest "teamwgbarrier"       2       16           64
  ExecTest "teamwgbarrier"       2       32           256
  ExecTest "teamwgbarrier"       2       39           1024

  ExecTest "sync"                2       1            1
  ExecTest "sync"                2       16           64
  ExecTest "sync"                2       32           256
  ExecTest "sync"                2       39           1024

  ExecTest "wavesync"            2       1            1
  ExecTest "wavesync"            2       16           64
  ExecTest "wavesync"            2       32           256
  ExecTest "wavesync"            2       39           1024

  ExecTest "wgsync"              2       1            1
  ExecTest "wgsync"              2       16           64
  ExecTest "wgsync"              2       32           256
  ExecTest "wgsync"              2       39           1024

  ExecTest "syncall"             2       1            1

  ExecTest "wavesyncall"         2       1            1

  ExecTest "wgsyncall"           2       1            1

  ExecTest "alltoall"            2       1            1         512

  ExecTest "teambroadcast"       2       1            1         32768

  ExecTest "fcollect"            2       1            1         512
  ExecTest "fcollect"            2       1            1         32768

  ExecTest "teamreduction"       2       1            1         32768
}

TestOther() {
  ################################################################################
  #       | Name               | Ranks | Workgroups | Threads | Max Message Size #
  ################################################################################
  ExecTest "init"                2       1            1

  ExecTest "pingpong"            2       1            1
  ExecTest "pingpong"            2       8            1
  ExecTest "pingpong"            2       32           1

  ExecTest "pingall"             2       1            1
  ExecTest "pingall"             2       8            1
  ExecTest "pingall"             2       32           1

  # This test requires more contexts than workgroups.
  # Note: ExecTest unsets ROCSHMEM_MAX_NUM_CONTEXTS before returning, so this
  # export only affects the first ExecTest call below.
  export ROCSHMEM_MAX_NUM_CONTEXTS=1024
  ExecTest "teamctxinfra"        2       1            1
  ExecTest "teamctxsingleinfra"  2       1            1
  ExecTest "teamctxblockinfra"   4       1            1
  ExecTest "teamctxblockinfra"   5       1            1
  ExecTest "teamctxoddeveninfra" 4       1            1
  ExecTest "teamctxoddeveninfra" 5       1            1
  unset ROCSHMEM_MAX_NUM_CONTEXTS
}
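
# Illustrative sketch, not part of the original driver. Because ExecTest unsets
# ROCSHMEM_MAX_NUM_CONTEXTS before it returns, a value exported once in
# TestOther reaches only the first test that follows it. If the override were
# meant to apply to several tests, a wrapper could re-export it per call.
# ExecWithMaxContexts is a hypothetical helper and assumes ExecTest is defined.
ExecWithMaxContexts() {
  local max_contexts=$1
  shift
  export ROCSHMEM_MAX_NUM_CONTEXTS=$max_contexts
  ExecTest "$@"
}
# Example: ExecWithMaxContexts 1024 "teamctxblockinfra" 4 1 1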
|
|
|
|
# TODO: remove when GDA is feature complete
|
|
TestGDA() {
|
|
  ##############################################################################
  # | Name | Ranks | Workgroups | Threads | Max Message Size #
  ##############################################################################
  ExecTest "put" 2 1 1 1048576
  ExecTest "put" 2 1 1024 512
  ExecTest "put" 2 8 1 1048576
  ExecTest "put" 2 16 128 8
  ExecTest "put" 2 32 256 512
  ExecTest "put" 2 64 1024 8

  # ExecTest "wgput" 2 1 64 1048576
  # ExecTest "wgput" 2 2 64 1048576
  # ExecTest "wgput" 2 16 64 8

  ExecTest "waveput" 2 1 64 1048576
  ExecTest "waveput" 2 2 64 1048576
  ExecTest "waveput" 2 2 128 1048576
  ExecTest "waveput" 2 16 128 8

  ExecTest "defaultctxput" 2 4 128 1024
  ExecTest "teamctxput" 2 4 128 1024
  ExecTest "teamctxput" 2 16 256 1024

  # ExecTest "get" 2 1 1 1048576
  # ExecTest "get" 2 1 1024 512
  # ExecTest "get" 2 8 1 1048576
  # ExecTest "get" 2 16 128 8
  # ExecTest "get" 2 32 256 512
  # ExecTest "get" 2 64 1024 8

  # ExecTest "wgget" 2 1 64 1048576
  # ExecTest "wgget" 2 2 64 1048576
  # ExecTest "wgget" 2 16 64 8

  # ExecTest "waveget" 2 1 64 1048576
  # ExecTest "waveget" 2 2 64 1048576
  # ExecTest "waveget" 2 2 128 1048576
  # ExecTest "waveget" 2 16 128 8

  # ExecTest "defaultctxget" 2 4 128 1024
  # ExecTest "teamctxget" 2 4 128 1024
  # ExecTest "teamctxget" 2 16 256 1024

  # ExecTest "g" 2 1 1 128
  # ExecTest "g" 2 1 1024 2
  # ExecTest "g" 2 8 1 32
  # ExecTest "g" 2 16 128 4

  # Implemented but known incorrect
  # ExecTest "p" 2 1 1 128
  # ExecTest "p" 2 1 1024 2
  # ExecTest "p" 2 8 1 32
  # ExecTest "p" 2 16 128 4

  ################################ Non-Blocking ################################

  ExecTest "putnbi" 2 1 1 1048576
  ExecTest "putnbi" 2 1 1024 512
  ExecTest "putnbi" 2 8 1 1048576
  ExecTest "putnbi" 2 16 128 8
  ExecTest "putnbi" 2 32 256 512
  ExecTest "putnbi" 2 64 1024 8

  # ExecTest "wgputnbi" 2 1 64 1048576
  # ExecTest "wgputnbi" 2 2 64 1048576
  # ExecTest "wgputnbi" 2 16 64 8

  ExecTest "waveputnbi" 2 1 64 1048576
  ExecTest "waveputnbi" 2 2 64 1048576
  ExecTest "waveputnbi" 2 2 128 1048576
  ExecTest "waveputnbi" 2 16 128 8

  ExecTest "defaultctxputnbi" 2 4 128 1024
  ExecTest "teamctxputnbi" 2 4 128 1024
  ExecTest "teamctxputnbi" 2 16 256 1024

  # ExecTest "getnbi" 2 1 1 1048576
  # ExecTest "getnbi" 2 1 1024 512
  # ExecTest "getnbi" 2 8 1 1048576
  # ExecTest "getnbi" 2 16 128 8
  # ExecTest "getnbi" 2 32 256 512
  # ExecTest "getnbi" 2 64 1024 8

  # ExecTest "wggetnbi" 2 1 64 1048576
  # ExecTest "wggetnbi" 2 2 64 1048576
  # ExecTest "wggetnbi" 2 16 64 8

  # ExecTest "wavegetnbi" 2 1 64 1048576
  # ExecTest "wavegetnbi" 2 2 64 1048576
  # ExecTest "wavegetnbi" 2 2 128 1048576
  # ExecTest "wavegetnbi" 2 16 128 8

  # ExecTest "defaultctxgetnbi" 2 4 128 1024
  # ExecTest "teamctxgetnbi" 2 4 128 1024
  # ExecTest "teamctxgetnbi" 2 16 256 1024

  #TestAMO() {
  ##############################################################################
  # | Name | Ranks | Workgroups | Threads | Max Message Size #
  ##############################################################################
  # ExecTest "amo_fetch" 2 1 1
  # ExecTest "amo_fetch" 2 1 1024
  # ExecTest "amo_fetch" 2 8 1
  # ExecTest "amo_fetch" 2 32 128

  # ExecTest "amo_set" 2 1 1
  # ExecTest "amo_set" 2 8 1
  # ExecTest "amo_set" 2 32 1

  # ExecTest "amo_fcswap" 2 1 1
  # ExecTest "amo_fcswap" 2 32 1
  # ExecTest "amo_fcswap" 2 8 1

  # Works on CX7, not implemented on BNXT
  # ExecTest "amo_finc" 2 1 1
  # ExecTest "amo_finc" 2 1 1024
  # ExecTest "amo_finc" 2 8 1
  # ExecTest "amo_finc" 2 32 128

  # This works but tester requires get
  # ExecTest "amo_inc" 2 1 1
  # ExecTest "amo_inc" 2 1 1024
  # ExecTest "amo_inc" 2 8 1
  # ExecTest "amo_inc" 2 32 128

  # Works on CX7, not implemented on BNXT
  # ExecTest "amo_fadd" 2 1 1
  # ExecTest "amo_fadd" 2 1 1024
  # ExecTest "amo_fadd" 2 8 1
  # ExecTest "amo_fadd" 2 32 128

  # This works but tester requires get
  # ExecTest "amo_add" 2 1 1
  # ExecTest "amo_add" 2 1 1024
  # ExecTest "amo_add" 2 8 1
  # ExecTest "amo_add" 2 32 128

  # ExecTest "amo_fetchand" 2 1 1

  # ExecTest "amo_and" 2 1 1

  # ExecTest "amo_xor" 2 1 1

  #TestColl() {
  ##############################################################################
  # | Name | Ranks | Workgroups | Threads | Max Message Size #
  ##############################################################################
  ExecTest "barrierall" 2 1 1
  ExecTest "teambarrier" 2 1 1

  ExecTest "sync" 2 1 1
  ExecTest "syncall" 2 1 1

  # ExecTest "alltoall" 2 1 1 512

  # ExecTest "teambroadcast" 2 1 1 32768

  # ExecTest "fcollect" 2 1 1 512
  # ExecTest "fcollect" 2 1 1 32768

  # ExecTest "teamreduction" 2 1 1 32768

  #TestOther() {
  ##############################################################################
  # | Name | Ranks | Workgroups | Threads | Max Message Size #
  ##############################################################################
  ExecTest "init" 2 1 1

  # ExecTest "pingpong" 2 1 1
  # ExecTest "pingpong" 2 8 1
  # ExecTest "pingpong" 2 32 1

  # This test requires more contexts than workgroups
  export ROCSHMEM_MAX_NUM_CONTEXTS=1024
  ExecTest "teamctxinfra" 2 1 1
  unset ROCSHMEM_MAX_NUM_CONTEXTS
}

ValidateInput() {
  INPUT_COUNT=$1
  if [ "$INPUT_COUNT" -lt 3 ] ; then
    echo "This script must be run with at least 3 arguments."
    # Double quotes so that ${0} expands to the script name.
    echo "Usage: ${0} argument1 argument2 argument3 [argument4]"
    echo " argument1 : path to the tester driver"
    echo " argument2 : test type to run, e.g. put"
    echo " argument3 : directory to put the output logs"
    echo " argument4 : path to hostfile (optional)"
    exit 1
  fi
}

ValidateLogDir() {
  # Quote the path so that directories containing spaces are handled.
  if [ ! -d "$1" ]; then
    echo "LOG_DIR=$1 does not exist"
    mkdir -p "$1"
    echo "Created $1"
  fi
}

APP=$1
TEST=$2
LOG_DIR=$3
HOSTFILE=$4

DRIVER_RETURN_STATUS=0
ROCSHMEM_DRIVER_DISABLE_GET="${ROCSHMEM_DRIVER_DISABLE_GET:-1}"

ValidateInput $#
ValidateLogDir "$LOG_DIR"

case $TEST in
  *"gda")
    TestGDA
    ;;
  *"all")
    TestRMA
    TestAMO
    TestSigOps
    TestColl
    TestOther
    ;;
  *"rma")
    TestRMA
    ;;
  *"amo")
    TestAMO
    ;;
  *"sigops")
    TestSigOps
    ;;
  *"coll")
    TestColl
    ;;
  *"other")
    TestOther
    ;;
  *)
    ##############################################################################
    # | Name | Ranks | Workgroups | Threads | Max Message Size #
    ##############################################################################
    ExecTest "$TEST" 2 1 1 8
    ;;
esac

# Combine the accumulated driver status with the exit status of the case
# statement above.
EXIT_STATUS=$(($DRIVER_RETURN_STATUS || $?))
if [ $EXIT_STATUS -eq 0 ]; then
  echo -e "TESTS PASSED"
else
  echo -e "TESTS FAILED: $FAILED_LIST"
fi
exit $EXIT_STATUS