* Add HasExpertSchedMode device prop
* Add unit tests for HasExpertSchedMode
* Add gfx12 check for HasExpertSchedMode prop
* Update gfx major version check and test for ExpertSchedMode
* Minor fix and ROCr version bump
* Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h
* Apply suggestion from @dayatsin-amd
---------
Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
Co-authored-by: David Yat Sin <77975354+dayatsin-amd@users.noreply.github.com>
* Update README documentation links for clarity and consistency across projects
- Changed links in the README files for `clr`, `hipother`, and `hip-tests` to use relative paths instead of absolute URLs, improving navigation within the repository.
* Update CONTRIBUTING documentation to use relative links for improved navigation
- Changed absolute URLs to relative paths in the CONTRIBUTING.md files for the hip and hipother projects, enhancing consistency and ease of access within the repository.
* [hip] Docs: Overhaul HW implementation page
* Update hardware implementation and glossary
* Update programming model
* Add performance optimization
* Split into how-to and understanding
---------
Signed-off-by: Jan Stephan <jan.stephan@amd.com>
Co-authored-by: Jan Stephan <jan.stephan@amd.com>
Co-authored-by: Julia Jiang <julia.jiang@amd.com>
* SWDEV-533237 Add initial support for hipOccupancyAvailableDynamicSMemPerBlock API
* SWDEV-533237 Add hipOccupancyAvailableDynamicSMemPerBlock wrapper for nvidia
* SWDEV-533237 Add implementation of hipOccupancyAvailableDynamicSMemPerBlock API
* SWDEV-533237 Add LDSAlignment field in Isa table
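The role of an LDS alignment value in this kind of query can be sketched as below. This is a minimal illustration and not the CLR implementation: the helper name `available_dynamic_smem` and all of its parameters are hypothetical, and the arithmetic only assumes the documented CUDA meaning of the matching API (the largest dynamic shared memory size that still allows `numBlocks` blocks per multiprocessor).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, for illustration only (not the runtime's code).
//   lds_per_cu    : LDS bytes usable on one compute unit (assumed input)
//   lds_alignment : LDS allocation granularity in bytes (the kind of value
//                   an LDSAlignment field would supply)
//   static_lds    : bytes of statically declared shared memory in the kernel
//   num_blocks    : blocks that should remain resident per compute unit
std::size_t available_dynamic_smem(std::size_t lds_per_cu,
                                   std::size_t lds_alignment,
                                   std::size_t static_lds,
                                   int num_blocks) {
  assert(num_blocks > 0 && lds_alignment > 0);
  // Split the LDS evenly across the requested resident blocks.
  std::size_t per_block = lds_per_cu / static_cast<std::size_t>(num_blocks);
  // Round down to the allocation granularity, since a block cannot be
  // given a partial allocation unit.
  per_block -= per_block % lds_alignment;
  // Static shared memory is taken from the same per-block budget.
  return per_block > static_lds ? per_block - static_lds : 0;
}
```

The sizes used to exercise this helper (for example 65536 bytes of LDS and a 512-byte granularity) are illustrative inputs, not values taken from any ISA table.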
---------
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
* Add examples to tools folder
* Correct P2P memory access section
* Sync porting guide
* Add HIP Graph tutorial
* Add hint about using amdgpu-dkms for IPC API
* Add a few more env variables
* SWDEV-546311 - implement hipKernelGetLibrary & hipLibraryEnumerateKernels API
* Fix for LibraryEnumerateKernel and KernelGetName
* Update Enumerate Kernels to handle 0 numKernels
* Minor fixes to function names
* Fix error checking in internal function
* Update changelog for new APIs
---------
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
1. Create a set of mini NUMA interfaces.
On Linux, the interface is based on system calls rather than libnuma.
On Windows, the interface also works, but the policy class is a dummy.
Unlike Linux, Windows doesn't provide a numactl tool or a NUMA library to set up a NUMA policy, so
the default policy is followed on Windows, that is, the closest host NUMA node is used to allocate
pinned host memory in hipHostMalloc().
To get the closest host NUMA node of a GPU device, query the new attribute
hipDeviceAttributeHostNumaId. Then you can create a thread with CPU affinity on that NUMA node.
For an example, see the test in hip-tests/catch/perftests/memory/hipPerfHostNumaAllocWin.cc.
2. Remove pfnSetThreadGroupAffinity and pfnGetNumaNodeProcessorMaskEx, as the functions have been exposed since Windows 7 and Windows Server 2008.
3. Other minor fixes.
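A Linux-side analogue of "create a thread with CPU affinity on the NUMA node" might look like the sketch below. It is not taken from this change or from the referenced Windows test: the helper names `parse_cpulist` and `pin_current_thread_to_node` are hypothetical, and the node id is assumed to have been obtained beforehand by querying hipDeviceAttributeHostNumaId. It reads the node's CPU list from sysfs and uses the sched_setaffinity system call wrapper, with no libnuma dependency.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#if defined(__linux__)
#include <sched.h>
#endif

// Parse a sysfs cpulist string such as "0-3,8,10-11" into CPU ids.
std::vector<int> parse_cpulist(const std::string& list) {
  std::vector<int> cpus;
  std::stringstream ss(list);
  std::string item;
  while (std::getline(ss, item, ',')) {
    if (item.empty()) continue;
    const std::size_t dash = item.find('-');
    const int first = std::stoi(item.substr(0, dash));
    const int last =
        (dash == std::string::npos) ? first : std::stoi(item.substr(dash + 1));
    for (int cpu = first; cpu <= last; ++cpu) cpus.push_back(cpu);
  }
  return cpus;
}

// Pin the calling thread to the CPUs of host NUMA node `node`.
// Assumption: `node` was queried from the GPU via hipDeviceAttributeHostNumaId.
// Returns false on non-Linux builds or when the node cannot be resolved.
bool pin_current_thread_to_node(int node) {
  assert(node >= 0);
#if defined(__linux__)
  std::ifstream file("/sys/devices/system/node/node" + std::to_string(node) +
                     "/cpulist");
  std::string line;
  if (!file || !std::getline(file, line)) return false;
  cpu_set_t set;
  CPU_ZERO(&set);
  for (int cpu : parse_cpulist(line)) {
    if (cpu >= 0 && cpu < CPU_SETSIZE) CPU_SET(cpu, &set);
  }
  if (CPU_COUNT(&set) == 0) return false;
  // A pid of 0 applies the mask to the calling thread.
  return sched_setaffinity(0, sizeof(set), &set) == 0;
#else
  (void)node;
  return false;
#endif
}
```

A worker thread would call `pin_current_thread_to_node(node)` before allocating pinned host memory, so that the allocation and the thread land on the same node. On Windows the equivalent step would go through the Win32 affinity functions named in item 2 instead.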
* SWDEV-554608 - Add hipHostRegisterIoMemory for hipHostRegister
* SWDEV-554174 Added hipHostRegisterIoMemory flag in test cases
* SWDEV-554174 - Formatting corrections
* SWDEV-554608 - set HSA_AMD_MEMORY_POOL_UNCACHED_FLAG if IoMemory is set
---------
Co-authored-by: Anavena Venkatesh <Anavena.Venkatesh@amd.com>
Co-authored-by: Rambabu Swargam <rambabu.swargam@amd.com>
* SWDEV-545950 - Add hipStreamCopyAttributes API Implementation
* Add unit test for hipStreamCopyAttributes API
* Add ChangeLog and nvidia mapping for the API
* Update rocprofiler-sdk with new HIP API details
* [rocprofiler-sdk] handle hipStreamCopyAttributes in stream tracing service
- This new HIP function takes multiple stream arguments and needs to be skipped because it has no explicit create/destroy/set functionality.
* Update HIP_RUNTIME_API_TABLE_STEP_VERSION in clr and rocprofiler-sdk
* Resolve merge conflicts
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Return an error when ext_fine_grain_pool is unavailable for
hipHostMallocUncached, hipHostAllocUncached and
hipExtHostRegisterUncached.
Disable the related tests on Navi4x, where
ext_fine_grain_pool is unavailable.
- Clean up and standardization of MIT licenses after discussion with legal team.
- Update README.md with blurb for top-level files.
- MIT License explicitly mentioned for relevant projects.
- Removal of years.
- Copyright attribution should be to `Advanced Micro Devices, Inc.` and not `AMD ROCm(TM) Software`
- Removal of `All rights reserved.`
- Reduce line width of the text for readability.
- Add clear visual separators for additional licenses.
- Convert text files to markdown format for aforementioned separators.
- Update build scripts to point to renamed files.
- Fixed SMI doc references
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
* Revert "SWDEV-547589 - Add hipDeviceMallocUncached to hipMemCreate (#815)"
This reverts commit 5ce7103555.
* Revert "SWDEV-547589 - comment for flag hipDeviceMallocUncached in hipMemcreate (#339)"
This reverts commit 04dac5eae3.
* SWDEV-551942 - implement hipMemAllocationTypeUncached in hipMemCreate
- Renaming old `README.md` files to keep their information intact.
- Default `README.md` files will have the deprecation notice to be mirrored back into the individual repos.
- Change ROCR-Runtime mirroring to `develop` branch.