## Motivation
Resolved: SWDEV-566226
The current implementation of agents in rocprof-systems keeps only the minimal set of information required to populate the `info_agent` table of the rocpd database. A significant amount of agent data is therefore left out of the database. This change fixes that by storing the additional agent information in the `extdata` field of the `info_agent` table.
## Technical Details
This PR introduces an additional field in the `agent` structure that holds a JSON-formatted string with all the additional information we can acquire about a particular agent. This data is collected during the initial fetching of agents and is afterwards written to the database.
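As a rough illustration of the idea, the sketch below shows an agent record carrying a JSON string alongside its regular fields. It is not the code from this PR: the struct layout, the `make_extdata` helper, and the property names (`cu_count`, `wave_front_size`, `gfx_target`) are all assumptions chosen for the example.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical, trimmed-down agent record. The real `agent` structure in
// rocprofiler-systems has more fields and may use different names.
struct agent
{
    std::uint64_t id = 0;
    std::string   name;
    std::string   extdata;  // JSON-formatted string with the additional agent information
};

// Assumed helper: packs a few illustrative properties into one JSON object.
// The string value is assumed not to need JSON escaping.
std::string make_extdata(std::uint32_t cu_count, std::uint32_t wave_front_size,
                         const std::string& gfx_target)
{
    std::ostringstream os;
    os << "{\"cu_count\":" << cu_count
       << ",\"wave_front_size\":" << wave_front_size
       << ",\"gfx_target\":\"" << gfx_target << "\"}";
    return os.str();
}
```

In this sketch the string would be built once, when the agent is first discovered, and stored unchanged until the row is written to the database.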
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
* SWDEV-557412 - Incorporate proper chunk offset when remapping virtual memory (#1848)
* SWDEV-557412 - Incorporate proper offset when remapping virtual memory
* Fix condition to check if VMHeap allocation address matches a chunk address
* Move offset calculation outside if/else block
---------
Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
* SWDEV-567852 - Clean-up hip::init() (#1948)
* SWDEV-559267 - Use CLPrint to DevLogPrintf with Log Level - detail debug. (#1160)
* SWDEV-548892 - Stop using ocml isinf wrapper (#1854)
* SWDEV-562708 - change default maximum SVM size to 256GB (#1731)
* SWDEV-503089 - Fix and enable disabled HIP tests from math group (#1319)
* SWDEV-503089 - Fix and enable disabled HIP tests from math group
* SWDEV-503089 - Move single precision reduced run to a common function
* SWDEV-548892 - Stop using ockl steadyctr function (#1882)
Directly use the builtin
* Implement PTL support (#1957)
* Implement PTL support
Signed-off-by: adapryor <Adam.pryor@amd.com>
(cherry picked from commit 45bc31292e7940a3b8fca044ef7df22047b95733)
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
---------
Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
* SWDEV-558080 - Add recommended granularity (#1176)
* Add recommended granularity
* Improve granularity testing
* Update based on feedback
* Fix and enable VMM tests on cuda (#1855)
* Fix and enable VMM tests on cuda
* Minor syntax fixes
---------
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
* [rocprofiler-systems] Add support for ompt_callback_thread_begin (#1681)
* Add thread_begin callback
* Make OMPT callbacks that are instant have start_ts = end_ts
* SWDEV-567514: Remove default stream wait (#1977)
- when virtual map command is called
- can create deadlock
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
* Fix flaky test Unit_hipStreamAddCallback_StrmSyncTiming (#2022)
* Review comments
* skip the 3 failing tests to merge hip-tests rocm-systems PR
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Co-authored-by: GunaShekar <agunashe@amd.com>
Co-authored-by: agunashe <ajay.gunashekar@amd.com>
Co-authored-by: Ethan Trinh <Ethan.Trinh@amd.com>
Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
Co-authored-by: Victor Zhang <111778801+victzhan@users.noreply.github.com>
Co-authored-by: German Andryeyev <56892148+gandryey@users.noreply.github.com>
Co-authored-by: usrihari123 <srihari.u@amd.com>
Co-authored-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Co-authored-by: anujshuk-amd <anujshuk@amd.com>
Co-authored-by: itrowbri <Ian.Trowbridge@amd.com>
Co-authored-by: marantic-amd <marantic@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: cadolphe-amd <chris.adolphe@amd.com>
Co-authored-by: Karthik Jayaprakash <54370791+kjayapra-amd@users.noreply.github.com>
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Co-authored-by: Todd tiantuo Li <88386084+lttamd@users.noreply.github.com>
Co-authored-by: amilanov-amd <Aleksandar.Milanov@amd.com>
Co-authored-by: Adam Pryor <61172547+adam360x@users.noreply.github.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: AidanBeltonS <abeltons@amd.com>
Co-authored-by: Rahul Manocha <153310294+manocharahul@users.noreply.github.com>
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
Co-authored-by: Kian Cossettini <Kian.Cossettini@amd.com>
Co-authored-by: Shadi Dashmiz <94885391+shadidashmiz@users.noreply.github.com>
Co-authored-by: Ioannis Assiouras <38722728+iassiour@users.noreply.github.com>
Co-authored-by: Ajay GunaShekar <86270081+agunashe@users.noreply.github.com>
1. Create a set of mini NUMA interfaces.
On Linux, the interface is based on system calls rather than libnuma.
On Windows, the interface also works, but the policy class is a dummy.
Unlike Linux, Windows doesn't provide a numactl tool or a NUMA library to set up the NUMA policy, so
the default policy is followed on Windows: the closest host NUMA node is used to allocate
pinned host memory in hipHostMalloc().
To get the closest host NUMA node of a GPU device, query the new attribute
hipDeviceAttributeHostNumaId. You can then create a thread with CPU affinity on that NUMA node.
For an example, see the test in hip-tests/catch/perftests/memory/hipPerfHostNumaAllocWin.cc.
2. Remove pfnSetThreadGroupAffinity and pfnGetNumaNodeProcessorMaskEx, as these functions have been available since Windows 7 and Windows Server 2008.
3. Other minor fixes.
* SWDEV-546485 Port and clean up for hipPerfBufferCopyRectSpeed
* SWDEV-546485 Port and clean up for hipPerfDevMemReadSpeed
* SWDEV-546485 Port and clean up for hipPerfDevMemWriteSpeed
* SWDEV-546485 Port and clean up for hipPerfHostNumaAlloc
* SWDEV-546485 Port and clean up for hipPerfMemcpy
* SWDEV-546485 Port and clean up for hipPerfMemMallocCpyFree
* SWDEV-546485 Port and clean up for hipPerfMemset
* SWDEV-546485 Port and clean up for hipPerfSampleRate
* SWDEV-546485 Port and clean up for hipPerfSharedMemReadSpeed
* SWDEV-546485 Ported and fixed up segfault for hipPerfMemFill
* SWDEV-545485 Returning to unedited stage
[ROCm/hip-tests commit: 04469c0cde]
* SWDEV-543981 new kernel latency test with different timing modes and taking multiple iterations of same test
* SWDEV-543981 cleanup
* SWDEV-543981 removed outdated hit test
* SWDEV-543981 Updated timing kernel
[ROCm/hip-tests commit: d227a8110c]
* SWDEV-532641 Inter GPU copy performance improvements
* SWDEV-532641 changed source data pointer type to vector type
[ROCm/hip-tests commit: feaa82ac46]
1. Remove clock functions from some tests that don't need them.
2. In some memory pool tests and coherency tests, a timer-based kernel
delay isn't reliable; use pinned-host-memory-based notification instead.
3. Add CHECK_PCIE_ATOMICS_SUPPORT before some tests.
4. catch/unit/memory/hipMemoryAllocateCoherent.cc is removed,
as it is unused and was already excluded from the build.
5. Some tests can still pass even if clock rate = 0, so they
are kept as is.
6. Some logic and formatting improvements in some tests.
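The notification pattern from item 2 can be sketched in plain C++. This is only an analogy under stated assumptions: a CPU thread stands in for the GPU kernel, a `std::atomic<bool>` stands in for a flag in pinned host memory, and the function name `run_until_notified` is made up for this example. It is not the code used in the tests.

```cpp
#include <atomic>
#include <thread>

// Stand-in for a kernel that must keep running until the host releases it.
// In the HIP tests the flag would live in pinned host memory; here an atomic
// shared between two CPU threads plays that role.
int run_until_notified()
{
    std::atomic<bool> go{false};
    std::atomic<int>  finished{0};

    std::thread worker([&] {
        // Spin on the flag instead of waiting for a fixed amount of time.
        while(!go.load(std::memory_order_acquire)) std::this_thread::yield();
        finished.store(1, std::memory_order_release);
    });

    // Work that must happen while the worker is still "running".
    // The worker cannot have finished yet because the flag is not set.
    int finished_before = finished.load(std::memory_order_acquire);

    go.store(true, std::memory_order_release);  // notify the worker
    worker.join();

    // Encodes both observations: 0 before the notification, 1 after it.
    return finished_before * 10 + finished.load(std::memory_order_acquire);
}
```

Unlike a timed delay, the ordering here does not depend on clock rates or on how long the host-side work takes.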
Change-Id: I6b3c6bf54c61cffd45cd6f17c75998f751b75725
[ROCm/hip-tests commit: ec8ff45a1d]
Unit_hipMemPoolApi_BasicAlloc expects to run on device 0, but other
tests select non-zero devices in multi-GPU runs. This causes
Unit_hipMemPoolApi_BasicAlloc to hang. Fix by setting device 0 at the start
of Unit_hipMemPoolApi_BasicAlloc.
SWDEV-508872 - Fix Perf_hipPerfMemFill_test
When the memory size is 2G, the test is so slow that it appears to hang.
Capping the maximum memory size at 1G lets the test pass in an acceptable time.
Change-Id: Ie26dbf597e5ba8cb898d1aae5ed5ecf0267c3228
[ROCm/hip-tests commit: 94eea4db59]
- Removed unnecessary hipEventRecord and fixed time calculation in
hipPerfDispatchSpeed.cc where it was off by a factor of 1,000.
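The actual change to hipPerfDispatchSpeed.cc is not reproduced here. As a general illustration of how a factor-of-1,000 error can be avoided, the sketch below keeps the time unit in the type, so the conversion from milliseconds to microseconds is done by `std::chrono` rather than by a hand-written constant. The helper name `us_per_dispatch` is hypothetical.

```cpp
#include <chrono>

// Hypothetical helper: average time per dispatch, in microseconds.
// The total is passed as a millisecond duration so the unit is explicit.
double us_per_dispatch(std::chrono::duration<double, std::milli> total, int dispatches)
{
    // Conversion between floating-point durations is implicit and exact in unit.
    std::chrono::duration<double, std::micro> total_us = total;
    return total_us.count() / dispatches;
}
```

For example, a total of 2 ms over 1,000 dispatches gives 2 microseconds per dispatch.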
Change-Id: If538e1d236cf0e6d3c69caf7af53c9095d812ad6
[ROCm/hip-tests commit: b1f9f86543]
Having the same target name causes the same includes to be processed twice.
Change-Id: I53469a07e6dee375ea4a4700ccac3c9487b79e4a
[ROCm/hip-tests commit: c03ad253fd]
- Do a hipEventRecord on the null stream first. This creates the streams up front
and keeps stream creation overhead out of the timing of the core functionality.
Change-Id: I117dccc42c92836fa113214d31bf14da49deba77
[ROCm/hip-tests commit: fb5e1d33d9]
Correct the size of the allocated buffers.
Extend the number of executed tests.
Make sure the warm-up finishes before starting the test.
Use a non-blocking stream for Async tests.
Align the output with the results.
Change-Id: Ie107fd83c0a95dacb537d8bca0b534cf6a6d5032
[ROCm/hip-tests commit: 9971540ac8]