rocm-systems

Author	SHA1	Message	Date
systems-assistant[bot]	327778ef18	Add nccl_debug variable and values (#2756 ) * nccl-debug variables table test * spacing * spacing * RCCL variable edits from SME * Update projects/rccl/docs/api-reference/env-variables.rst Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com> --------- Co-authored-by: Matt Williams <Matt.Williams+amdeng@amd.com> Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com> Co-authored-by: Matt Williams <matt.williams@amd.com> Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>	2026-01-30 12:00:18 -05:00
David Yat Sin	00e8a67165	rocr: Restore mmap flags back to MAP_PRIVATE (#2886 ) Change mmap flags back to MAP_PRIVATE as MAP_SHARED increases allocation time. Transparent huge pages are disabled for MAP_SHARED by default.	2026-01-30 08:36:05 -08:00
systems-assistant[bot]	1211790607	Direct Reduce Scatter Implementation (#2765 ) * Add new implementation of direct send/recv reduce scatter * Resolved conflicts * Add multiple channels support to the reduction kernel of direct reduce scatter and adjust offset into buffer to utilize multiple channels. * Resolve validation issue when number of elements is not divisible by number of channels leaving elements unaccount for in reduction. * fix proxy hang * set maxSrcs to 64 in reduceCopy * optimize multi-channel code * fix validation issue in single node MI300 * Tune the message size range for 2,4, and 8 Nodes * Move Direct RS into separate kernel * Add Copyright * resolve review comments * resolve review comments * fix merge build issue * revert move Direct RS into separate kernel * address review comments * address review comments --------- Co-authored-by: KawtharShafie <kawtharshafie@gmail.com> Co-authored-by: Ghadeer Alabandi <abandiga@gmail.com> Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>	2026-01-30 09:27:27 -06:00
systems-assistant[bot]	055909d335	Set default max channels to 48 for MI350 multi-node (#2759 ) * make 48 the default max channels for MI350 * address review comments --------- Co-authored-by: Ghadeer Alabandi <abandiga@gmail.com> Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com> Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>	2026-01-30 09:22:42 -06:00
Alysa Liu	13091e18ad	libhsakmt: Add THEROCK_SANITIZER support for ASAN builds (#2978 ) Add THEROCK_SANITIZER support for ASAN builds. Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>	2026-01-30 10:02:10 -05:00
Jin Jung	25d0107d24	SWDEV-575867: Fix error code for mapped graphics resources (#2662 )	2026-01-30 07:47:13 -05:00
Alexandra Sidorova	8800e03058	[CLR] Added missed ostream include to amd_hip_bfloat16.h (#2960 )	2026-01-30 07:42:38 -05:00
Jan Stephan	20a745962f	Fix graph API binary paths (#2884 ) Signed-off-by: Jan Stephan <jan.stephan@amd.com>	2026-01-30 10:51:37 +01:00
Junhua Shen	0d98c3bdd5	libhsakmt: Implement per-context topology for multi-context KFD support (#2405 ) This enhances libhsakmt's capabilities for multi-context KFD support by implementing per-context topology management. Changes: * Add hsaKmtGetClockCountersCtx for multi-context support - Add context-aware version of hsaKmtGetClockCounters - Original API is retained as a wrapper calling the ctx-version with primary context * Enable independent debug sessions across multiple KFD contexts -Create hsa_kfd_debug_context, introduce context-aware debug APIs, shift debug state to per-context * Add perf sub-context for per-context performance counter management - Introduce hsa_kfd_perf_context, move counter properties, add context - aware perf APIs, and update initialization * Refactor FMM for per-context resource management - Refactor multiple global variables related to FMM, including GPU ID arrays , svm, cpuvm_aperture, and mem_handle_aperture to hsa_kfd_fmm_context * Implement per-context topology for complete context isolation - Migrate global topology data (g_system, g_props, map_user_to_sysfs_node_id) to per-context hsa_kfd_topology_context structure - Update all topology functions to accept HsaKFDContext parameter for context-aware operations (validate_nodeid, get_node_props, get_iolink_props, etc.) - Refactor topology snapshot management for per-context isolation - Add context-aware PMC trace access APIs Signed-off-by: Junhua Shen <Junhua.Shen@amd.com>	2026-01-30 09:42:25 +08:00
vedithal-amd	a838b0c07b	[rocprofiler-compute] Fix test case for MI 308 (#2934 ) * Fix test case for MI 308 * Use consistent naming of GPUs in comment	2026-01-29 18:54:52 -05:00
Venkateshwar Reddy Kandula	dea3da3a6f	[rocprofiler-sdk][CI] Fix rocprofiler-sdk CI change rocm version to 7.2.0 (#2979 ) * Update aqlprofile-continuous_integration.yml to use rocm-7.2.0 * Update rocprofiler-sdk-continuous_integration.yml to use rocm-7.2.0 * Update rocprofiler-sdk-docs.yml to use rocm-7.2.0	2026-01-29 17:28:52 -06:00
pghoshamd	adaaa883b2	SWDEV-576090 Fix mem leaks and double free of signals (#2817 )	2026-01-29 16:53:27 -05:00
Mark Meserve	94c246eb9e	attach: fix typos and older names in documentation (#2684 )	2026-01-29 16:46:24 -05:00
vedithal-amd	4b364df43b	[rocprofiler-compute] Enable panel level csv files for roofline panel (#2887 ) * Enable panel level csv files for roofline panel * Fixed comments	2026-01-29 15:32:54 -05:00
Rahul Manocha	c4f7593001	clr: Update signal count and pool size for staging buffer (#2889 ) * clr: Update signal count and pool size for staging buffer * Change to naming of variables etc --------- Co-authored-by: Rahul Manocha <rmanocha@amd.com>	2026-01-29 10:34:00 -08:00
systems-assistant[bot]	58c203e252	Fix channel overuse for 1 rank comms (#2760 ) * Fix channel overuse for 1 rank comms * limit channels when warpSpeed is enabled but not used * enable std::min check against # of CUs for maxChannels computation when warpSpeed is enabled --------- Co-authored-by: Mustafa Abduljabbar <muabdulj@amd.com> Co-authored-by: isaki001 <ioannissakiotis@gmail.com> Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>	2026-01-29 12:13:46 -06:00
Benjamin Welton	b509e9bd77	[rocprofiler-sdk] Fix domain_ops_padding for 515+ HIP operations (#2941 ) * [rocprofiler-sdk] Fix domain_ops_padding for 515+ HIP operations The HIP runtime API now has 515+ operations (as of ROCm 7.x), but domain_ops_padding was set to 512. This caused std::out_of_range exceptions when checking operations >= 512 via std::bitset::test(). Changes: - Increase domain_ops_padding from 512 to 1024 - Add compile-time static_assert to validate padding is sufficient for all API domains (HIP, HSA, marker, RCCL, rocDecode, rocJPEG) Co-Authored-By: Claude (claude-opus-4.5) <noreply@anthropic.com> * Update projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/context/domain.cpp Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [rocprofiler-sdk] Apply clang-format-11 to domain.cpp Co-Authored-By: Claude (claude-opus-4.5) <noreply@anthropic.com> * Rework implementation to ensure coverage of all operation enums * Fix compiler error in unit test for enum_string.cpp * Fix data types of domain_ops_padding values * Revert some changes in domain.cpp --------- Co-authored-by: Claude (claude-opus-4.5) <noreply@anthropic.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>	2026-01-29 12:26:33 -05:00
Yiannis Papadopoulos	66ee941fea	rocr/aie: Throw exception for malformed packets and packet submission errors (#2528 )	2026-01-29 10:52:42 -05:00
pghoshamd	bc20b51f40	SWDEV-561708 Counted queue size from env var (#2844 ) * SWDEV-561708 Counted queue size from env var * use counted_queue_size for test * remove rocrtst changes; add a const for default queue size * Remove env var from test; use queue->size * Improve env var documentation * Correct type	2026-01-29 10:00:37 -05:00
Venkateshwar Reddy Kandula	a7c3e8392a	[rocprofiler-sdk] Use venv for fixing CI docker image workflow (#2955 ) * use python virtual env for aws cli * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * use 7.2 amdgpu for ubuntu --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-29 09:53:15 -05:00
David Galiffi	4c458fae9c	[rocprofiler-systems] Fix ROCM_VERSION guard used for the scratch_memory_record structure (#2948 ) - Fix ROCM_VERSION guard used for the scratch_memory_record structure - This fixes a rocm/7.0.2 build failure --------- Signed-off-by: David Galiffi <David.Galiffi@amd.com>	2026-01-29 09:34:27 -05:00
moniljethva	b5e4074c78	Adding support for GFX 11.5 in AQL Profiler (#2340 ) * Adding support of AQL Profiler for GFX 11.5 * Removing hard coded value for sa_number * Adding instance count for WGP block, removing hard coded values. * Fixed SQ counter block and TD counter block instances	2026-01-29 17:39:12 +05:30
Jaydeep	190d9a8e27	SWDEV-561273 - hip samples on TheRock build using HIP LANGUAGE and hip-lang package. (#1794 )	2026-01-29 09:15:58 +01:00
Bindhiya Kanangot Balakrishnan	fa6f071751	[SWDEV-574637] Avoid redundant hive gpu resets (#2657 ) Mode-1 GPU reset affects entire XGMI hive. Added xgmi_hive_id check to reset only once for same-hive GPUs while preserving separate resets for different hives or no hives. - Example: `sudo amd-smi reset -G` or `sudo amd-smi reset -G -g 0` on MI300 will reset all GPU's only once. Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>	2026-01-28 22:59:17 -06:00
Sumanth Gavini	e9c72b06b0	[ROCM-1036] Dynamic fan support detection in set -h (#2721 ) Show "N/A" for ASICs without fan support `amd-smi set -h` fan help text will be dynamic instead of "0-255 or 0-100%" Signed-off-by: Sumanth Gavini <sumanth.gavini@amd.com>	2026-01-28 22:44:25 -06:00
koushikbillakanti-amd	e9b143323a	[SWDEV-498649] Fix reset cli AttributeError (#2203 ) * Fix SWDEV-498649: Handle missing attributes safely in set_gpu --------- Co-authored-by: gabrpham <Gabriel.Pham@amd.com> Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>	2026-01-28 22:39:50 -06:00
Yazen AL Musaffar	19725abbf4	[SWDEV-560702] Per process MEM usages does not add up to per GPU MEM usage. (#2888 ) * Update pyhton docs for process memory usage * Added comment for processes total memory usage --------- Signed-off-by: yalmusaf <Yazen.ALMusaffar@amd.com>	2026-01-28 22:34:20 -06:00
Loganaden Velvindron	bf36e5f620	Fix disabled fortify source security flag (#2570 ) Fix spurious character that caused CI issue.	2026-01-28 22:30:24 -06:00
peterjunpark	159e751788	docs(amdsmi): add link to amd-smi-virt (#2543 ) Update install page virt references Signed-off-by: Peter Park <peter.park@amd.com>	2026-01-28 22:24:55 -06:00
Joseph Narlo	48a4cda75c	[SWDEV-552552] Provide CLI testing within amd-smi-lib-tests install (#2485 ) * Add common module * Added information to help with unknowns * Allow paring of cmds * change cmd print default * Reduce cmds to be tested --------- Signed-off-by: amd-josnarlo <joseph.narlo@amd.com> Co-authored-by: amd-josnarlo <joseph.narlo@amd.com>	2026-01-28 22:16:01 -06:00
Adam Pryor	cf3e283d85	[FMDEV-170733] Remove amd-smi ptl set check (#2933 ) Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>	2026-01-28 22:12:17 -06:00
systems-assistant[bot]	27be824745	[SWDEV-565483] Add power profile set/get to amd-smi CLI (#1905 ) * Fix exception handling in power profile commands * Update CHANGELOG.md * Update amdsmi_parser.py for the single character argument for --profile as -o --------- Co-authored-by: Koushik Billakanti <Koushik.Billakanti@amd.com> Co-authored-by: gabrpham <Gabriel.Pham@amd.com> Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>	2026-01-28 22:00:18 -06:00
Gopesh Bhardwaj	680a92769c	Fixing aqlprofile ASM statement (#2881 ) * Fixing aqlprofile ASM statement * Removing f16 tests	2026-01-29 09:01:41 +05:30
Tao Sang	66a1e38387	SWDEV-577011 Fix missing ais symbols in Windows (#2871 ) Fix missing ais symbols in rocr in Windows	2026-01-28 22:29:30 -05:00
Copilot	14f9f2537a	Add artifact upload steps to AMDSMI CI workflow for PR builds (#2936 )	2026-01-28 18:14:47 -05:00
David Yat Sin	99d88827fb	Update CODEOWNERS for ROCR-Runtime (#2790 )	2026-01-28 15:53:23 -05:00
Yazen AL Musaffar	0c54f1d6f6	[AMD-SMI] [SWDEV-572092] amd-smi does not redirect output to file when --json option is used. (#2389 ) * Fix for amd-smi json file redirection is broken Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com> * merge branch develop into SWDEV-572092 Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com> --------- Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>	2026-01-28 13:54:44 -06:00
German Andryeyev	a5ada1e6e3	SWDEV-567852 - Clean-up HIP events (#2708 ) * SWDEV-567852 - Clean-up HIP events Removed unused fields, optimized memory allocation, improved encapsulation, modernized with C++11 auto, added documentation	2026-01-28 13:34:07 -05:00
Swati Rawat	9de4a2ebb1	Correct rocprofv3 usage instructions (#2925 ) * Correct rocprofv3 usage * Apply suggestion from @SwRaw * Apply suggestion from @SwRaw * Update .gitignore	2026-01-28 22:46:19 +05:30
Jason Bonnell	d917259953	Add --verbose to ctest to get more output (#2928 )	2026-01-28 22:43:14 +05:30
Sajina PK	e265e0e24f	[rocprofiler-systems]: Add documentation for communication API tracing (#2478 ) Add documentation for communication runtime tracing for MPI, UCX, RCCL. --------- Co-authored-by: David Galiffi <David.Galiffi@amd.com>	2026-01-27 23:48:27 -05:00
SaleelK	5c7c549301	clr: Fix some nullptr checks and prints (#2825 )	2026-01-27 16:45:17 -08:00
vedithal-amd	996202f560	[rocprofiler-compute] Backport documentation changes from ROCm 7.1 release branch (#2894 ) * Backport documentation changes from ROCm 7.1 release branch * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Address review comments --------- Co-authored-by: Pratik Basyal <pratik.basyal@amd.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-27 17:22:41 -05:00
vedithal-amd	717cdde126	Update test_metric_validation.py to handle MI325X (#2866 )	2026-01-27 16:12:05 -05:00
vedithal-amd	93407271df	[rocprofiler-compute] Fix docker file for testing (#2883 ) * Fix docker file for testing * Add correct WORKDIR	2026-01-27 16:11:29 -05:00
cfallows-amd	4d7f709510	[rocprofiler-compute] Update baseline comparison notes in documentation (#2878 ) * Update baseline comparison with anchor, text, samples, image in CLI page. Fixes broken 404 links after grafana was removed. Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com> * Update options in list to full name, correct gpu id option. Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com> * Formatting and broken intersphinx fixed * Indentation formatting fixed --------- Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com> Co-authored-by: prbasyal <prbasyal@amd.com> Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>	2026-01-27 16:04:21 -05:00
Yiannis Papadopoulos	fdb19e5a4c	rocr: Format script skips non-existing files in sparse checkouts (#2360 )	2026-01-27 15:58:53 -05:00
Shadi Dashmiz	b816d10802	Fix for pntr attri query from a peer device (#2722 ) * Fix for pntr attri query from a peer device Signed-off-by: sdashmiz <shadi.dashmiz@amd.com> * SWDEV-577116 : Fix qeury on peer device - if access is disabled query should return error. Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Signed-off-by: sdashmiz <shadi.dashmiz@amd.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-27 15:25:14 -05:00
sluzynsk-amd	f37b100c34	SWDEV-563777 - further reduce compilation warnings (#2331 ) This change resolves some of the warnings generated during clr builds. Quiet regular output of doxygen. Disable non-documented warnings of doxygen. Signed-off-by: Sebastian Luzynski <Sebastian.Luzynski@amd.com>	2026-01-27 20:51:16 +01:00
Yazen AL Musaffar	b7829db10a	[AMD-SMI] [SWDEV-553392] Removed Driver Reload capability from amd-smi cli only. (#2665 ) * Removed Driver Reload capability from amd-smi cli only Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com> * Updates Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com> * updates Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com> * Update CHANGELOG.md --------- Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com> Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>	2026-01-27 13:33:03 -06:00

74799 Commits
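
Commit fa6f071751 in the log above notes that a Mode-1 GPU reset affects an entire XGMI hive, so GPUs sharing a hive should be reset only once, while GPUs in different hives or in no hive are still reset separately. A minimal sketch of that de-duplication idea follows. The function name `plan_resets`, the `(gpu_index, hive_id)` tuple layout, and the use of `None` for "no hive" are illustrative assumptions, not the amd-smi implementation.

```python
from typing import Hashable, Iterable, List, Optional, Tuple


def plan_resets(gpus: Iterable[Tuple[int, Optional[Hashable]]]) -> List[int]:
    """Return the GPU indices that should receive a reset request.

    Each entry in `gpus` is (gpu_index, hive_id). A hive_id of None means
    the GPU is not part of any hive (an assumption made for this sketch).
    """
    seen_hives = set()
    to_reset = []
    for gpu_index, hive_id in gpus:
        if hive_id is None:
            # GPUs outside any hive are always reset individually.
            to_reset.append(gpu_index)
            continue
        if hive_id in seen_hives:
            # This hive was already covered by an earlier GPU's reset.
            continue
        seen_hives.add(hive_id)
        to_reset.append(gpu_index)
    return to_reset
```

With hypothetical input `[(0, "a"), (1, "a"), (2, "b"), (3, None), (4, None)]`, the sketch returns `[0, 2, 3, 4]`: one reset per hive, plus one for each GPU that has no hive.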