Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
This PR addresses the edge case reported in https://github.com/ROCm/rocm-systems/issues/694.
`hip-config-amd.cmake` uses rocm_agent_enumerator to determine which GPU architecture to target when no target is specified.
https://github.com/ROCm/rocm-systems/blob/9a02dae75f8df9d8f08923d34d06d76e96ced7b4/projects/clr/hipamd/hip-config-amd.cmake.in#L86-L95
On WSL, both `readFromKFD` and `readFromLSPCI` are skipped. If `readFromTargetLstFile()` isn't in use, `readFromROCMINFO()` is called instead. If rocminfo times out, rocm_agent_enumerator prints the following message to stdout:
```
"Timeout querying rocminfo. Are you compiling with more than 254 threads?"
```
Because this message goes to stdout rather than being reported as an error, the `execute_process` call in the linked CMake code captures it in `OUTPUT_VARIABLE` and passes it on as if it were a list of valid gfx architectures, which causes these errors during the CMake build:
```
clang++: error: invalid target ID 'Timeout'; format is a processor name followed by an optional colon-delimited list of features followed by an enable/disable sign (e.g., 'gfx908:sramecc+:xnack-')
clang++: error: invalid target ID 'querying'; format is a processor name followed by an optional colon-delimited list of features followed by an enable/disable sign (e.g., 'gfx908:sramecc+:xnack-')
clang++: error: invalid target ID 'rocminfo.'; format is a processor name followed by an optional colon-delimited list of features followed by an enable/disable sign (e.g., 'gfx908:sramecc+:xnack-')
clang++: error: invalid target ID 'Are'; format is a processor name followed by an optional colon-delimited list of features followed by an enable/disable sign (e.g., 'gfx908:sramecc+:xnack-')
clang++: error: invalid target ID 'you'; format is a processor name followed by an optional colon-delimited list of features followed by an enable/disable sign (e.g., 'gfx908:sramecc+:xnack-')
clang++: error: invalid target ID 'compiling'; format is a processor name followed by an optional colon-delimited list of features followed by an enable/disable sign (e.g., 'gfx908:sramecc+:xnack-')
```
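To illustrate the failure mode: the errors above show that each whitespace-separated word of the timeout message reached clang++ as its own target ID. The sketch below reproduces that splitting in Python and flags tokens that do not look like a gfx target ID. The regex is a simplified assumption based on the format described in the error text, not clang's actual validator, and this kind of filtering is not what this PR does.

```python
import re

# Stdout of rocm_agent_enumerator when rocminfo times out (message shown above).
STDOUT = "Timeout querying rocminfo. Are you compiling with more than 254 threads?\n"

# Simplified, assumed target-ID shape: "gfx" + hex digits, then optional
# ":feature+" / ":feature-" suffixes, e.g. "gfx908:sramecc+:xnack-".
TARGET_ID = re.compile(r"gfx[0-9a-f]+(?::[a-z0-9]+[+-])*")


def split_into_targets(output: str) -> list[str]:
    """Split output on whitespace, mirroring how the words reached clang++."""
    return output.split()


def invalid_targets(output: str) -> list[str]:
    """Return every token that does not look like a gfx target ID."""
    return [t for t in split_into_targets(output) if not TARGET_ID.fullmatch(t)]


print(invalid_targets(STDOUT)[:3])           # ['Timeout', 'querying', 'rocminfo.']
print(invalid_targets("gfx90a\ngfx1030\n"))  # []
```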
If rocm_agent_enumerator writes the message to stderr instead of stdout, it no longer lands in `OUTPUT_VARIABLE` and can be captured through `ERROR_VARIABLE` instead. This PR does so by changing the print statement; the `logging` module would also work.
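A minimal sketch of the stderr approach, assuming the message text shown above; `report_timeout` is a hypothetical helper name, not necessarily how the script is structured.

```python
import sys

TIMEOUT_MSG = "Timeout querying rocminfo. Are you compiling with more than 254 threads?"


def report_timeout() -> None:
    # Writing to stderr keeps the diagnostic out of stdout, which the CMake
    # execute_process() call captures as the list of targets (OUTPUT_VARIABLE).
    print(TIMEOUT_MSG, file=sys.stderr)


report_timeout()
```

The `logging` alternative behaves the same way, because a default `logging.StreamHandler` writes to `sys.stderr`.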
Removing extra print that was added for backward compatibility.
Change-Id: I12a5346708886861a6e3cd6440830e6425e647d9
[ROCm/rocminfo commit: 9f6d7cdf6b]
Make all re.compile calls use raw strings to prevent future SyntaxWarnings if backslash escape sequences are used in regular expressions.
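As a short illustration (the pattern is hypothetical, not copied from the script): a raw string hands each backslash straight to the regex engine, so it compiles to exactly the same pattern as the fully escaped form, without relying on Python tolerating unknown escapes such as `\s` or `\d`.

```python
import re

# Fully escaped form: every backslash doubled so Python keeps it literally.
ESCAPED = re.compile("gfx_target_version\\s+(\\d+)")
# Raw-string form: same characters, easier to read, no invalid-escape warning.
RAW = re.compile(r"gfx_target_version\s+(\d+)")

line = "gfx_target_version 90010"
print(RAW.pattern == ESCAPED.pattern)  # True
print(RAW.match(line).group(1))        # 90010
```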
https://github.com/ROCm/rocminfo/pull/66
Suggested-by: Yiyang Wu <xgreenlandforwyy@gmail.com>
Change-Id: I6c7aaf016c588bb2ae5a0f979da7d423a78d6ec3
[ROCm/rocminfo commit: e1716642ff]
In Python 3, invalid escape sequences such as `\d` in non-raw string literals (common in regular expressions) are deprecated, and these were generating SyntaxWarnings.
Patch submitted by Tianao Ge <getianao@gmail.com> on GitHub:
https://github.com/ROCm/rocminfo/pull/55
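The warning can be reproduced with a small sketch that compiles two one-line snippets and records what the compiler emits. Python 3.6 through 3.11 report invalid escapes such as `\d` as `DeprecationWarning`; Python 3.12 and later report `SyntaxWarning`. The helper name is hypothetical.

```python
import warnings


def escape_warning_category(source: str):
    """Compile source and return the category of the first warning, or None."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        compile(source, "<example>", "exec")
    return caught[0].category if caught else None


# Non-raw literal containing the invalid escape "\d": a warning is recorded
# (DeprecationWarning before 3.12, SyntaxWarning from 3.12 on).
print(escape_warning_category('pattern = "\\d+"'))
# Raw literal: no warning.
print(escape_warning_category('pattern = r"\\d+"'))
```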
Change-Id: Icbcf2803291add5b5f3971ac9901a8927d23f225
[ROCm/rocminfo commit: 429baf04fb]
rocm_agent_enumerator may invoke rocminfo. rocminfo opens the
GPU device, which allocates a limited resource. Beyond 254
concurrent processes, this resource is exhausted and rocminfo
returns an error.
This patch makes rocm_agent_enumerator retry when it receives a
failure message from rocminfo indicating that KFD is out of memory.
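A minimal sketch of such a retry loop. The marker string, timeout, and delay are assumptions (the exact rocminfo failure text is not given here), and `run` stands in for whatever function launches rocminfo and returns its output, so the loop can be exercised without a GPU.

```python
import time
from typing import Callable, Optional


def query_with_retry(run: Callable[[], str], busy_marker: str,
                     timeout_s: float = 60.0, delay_s: float = 0.1,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call run() until its output no longer contains busy_marker.

    Returns that output, or None once timeout_s has elapsed, at which point
    the caller would report a timeout (on stderr) instead of a target list.
    """
    deadline = clock() + timeout_s
    while True:
        output = run()
        if busy_marker not in output:
            return output          # rocminfo got a KFD slot and answered
        if clock() >= deadline:
            return None            # give up: too many concurrent processes
        sleep(delay_s)             # back off before trying again
```

Injecting `clock` and `sleep` keeps the loop testable; a real caller would use the defaults.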
Change-Id: I8637e214f5fa012642975c28578ae6bf9200eda8
[ROCm/rocminfo commit: b57c02d131]
New versions of amdkfd include the gfx architecture version number
for all GPUs surfaced in the HSA topology. This patch adds this as
the preferred way for rocm_agent_enumerator to check for supported
gfx architecture numbers.
Kernels whose amdkfd lacks this feature do not expose the value in
the topology. In that case rocm_agent_enumerator falls back to
checking against the PCI IDs. If the PCI ID lookup also fails, it
falls back to the heavyweight rocminfo method.
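A sketch of the topology check, assuming the sysfs path `/sys/class/kfd/kfd/topology/nodes/*/properties` and that `gfx_target_version` encodes major * 10000 + minor * 100 + stepping, with minor and stepping printed in hex (so 90010 becomes gfx90a). Helper names are hypothetical.

```python
import glob
from typing import List, Optional

# Assumed sysfs layout: one properties file per HSA topology node.
TOPOLOGY_GLOB = "/sys/class/kfd/kfd/topology/nodes/*/properties"


def gfx_from_properties(text: str) -> Optional[str]:
    """Turn a node's gfx_target_version into a gfx name, if present.

    CPU nodes report 0 and older kernels omit the key; both yield None.
    """
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if key == "gfx_target_version":
            version = int(value)
            if version == 0:
                return None
            major = version // 10000
            minor = (version // 100) % 100
            step = version % 100
            return "gfx{:d}{:x}{:x}".format(major, minor, step)
    return None


def read_from_topology(pattern: str = TOPOLOGY_GLOB) -> List[str]:
    """Collect gfx names for every GPU node that reports one."""
    targets = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            gfx = gfx_from_properties(f.read())
        if gfx:
            targets.append(gfx)
    return targets


print(gfx_from_properties("gfx_target_version 90010"))   # gfx90a
print(gfx_from_properties("gfx_target_version 100300"))  # gfx1030
```

An empty result from `read_from_topology` is the signal to move on to the PCI ID lookup.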
Change-Id: I5cf22e1069114675092e97ae52331b829cfafb04
[ROCm/rocminfo commit: f419b81bdf]
rocminfo is a very heavyweight mechanism for learning a lot of
information about the GPUs that are attached to the system.
It opens up the limited /dev/kfd resource to gather lots of
information about each device, while rocm_agent_enumerator really
only wants the gfx number of AMD devices attached to the system.
To avoid this heavyweight lookup in most cases, this patch switches
the order of tests. Rather than starting with rocminfo and then
falling back to a poorly maintained PCI ID list, this patch changes
the agent enumerator to start by checking the PCI ID list (fast
case) and then falling back to rocminfo (slow case) if the PCI ID
list is out of date.
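The reordering amounts to a first-success chain over readers sorted from cheapest to most expensive. A minimal sketch with hypothetical reader functions; because readers are called lazily, rocminfo is only launched when every cheaper reader comes back empty.

```python
from typing import Callable, List, Sequence

Reader = Callable[[], List[str]]


def enumerate_targets(readers: Sequence[Reader]) -> List[str]:
    """Return the first non-empty target list, trying readers in order."""
    for reader in readers:
        targets = reader()
        if targets:
            return targets
    return []
```

For example, a call like `enumerate_targets([read_from_pci_ids, read_from_rocminfo])` (hypothetical names) never touches `/dev/kfd` when the PCI ID list already recognizes every device.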
Change-Id: If24b8bc3baeeb6adad362abbb288ef3728383bce
[ROCm/rocminfo commit: e9b7de43be]
The PCI ID backup method in rocm_agent_enumerator, where the
tool uses lspci to find all AMD GPU devices in the system and
manually matches them to gfx versions, is extremely outdated. The
PCI ID list did not include anything after Vega 10, and the
actual call to lspci no longer returned anything due to some
missing conversions.
This patch adds all GPUs that might be needed by ROCr up through
Navy Flounder. The PCI ID to gfx mapping is taken from the amdgpu
driver and libhsakmt.
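A sketch of the lspci path. The two table entries are illustrative only (Vega 10 and Vega 20 device IDs); the real table is derived from the amdgpu driver and libhsakmt as noted above. Passing `text=True` to `subprocess.run` decodes the output to `str` so it can be matched against `str` patterns.

```python
import re
import subprocess
from typing import Dict, List

# Illustrative subset; not the table shipped in the patch.
PCI_ID_TO_GFX: Dict[str, str] = {
    "687f": "gfx900",  # Vega 10
    "66af": "gfx906",  # Vega 20
}

# `lspci -n` lines look like "03:00.0 0300: 1002:687f (rev c1)";
# 1002 is AMD's PCI vendor ID, the next four hex digits the device ID.
AMD_DEVICE = re.compile(r"\b1002:([0-9a-f]{4})\b")


def gfx_from_lspci(output: str) -> List[str]:
    """Map each AMD device line to a gfx name via the lookup table."""
    targets = []
    for line in output.splitlines():
        m = AMD_DEVICE.search(line)
        if m and m.group(1) in PCI_ID_TO_GFX:
            targets.append(PCI_ID_TO_GFX[m.group(1)])
    return targets


def read_from_lspci() -> List[str]:
    try:
        # -n prints numeric IDs; -d 1002: restricts output to AMD devices.
        result = subprocess.run(["lspci", "-n", "-d", "1002:"],
                                capture_output=True, text=True, check=False)
    except FileNotFoundError:  # lspci is not installed
        return []
    return gfx_from_lspci(result.stdout)
```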
Change-Id: I58b77bb6aa631f575352fc444d2542f265909706
[ROCm/rocminfo commit: ea5ce46fb4]
On Ubuntu 20.04, there is no longer a python or python2 executable.
Currently the script fails with:
/usr/bin/env: ‘python’: No such file or directory
Change-Id: Ib310b8aa7c1bd62973ef3cc8bcaf571831ad4435
[ROCm/rocminfo commit: 5d6be5b808]
This is a Python port of rocm_agent_enumerator, which is used by HIP/HCC to
determine the available AMDGPU targets on a system.
Its previous implementation was written in C++, which made it somewhat hard to
deploy onto different distros and architectures. A Python port should remove
this issue.
[ROCm/rocminfo commit: 8b018900f6]