rocm-systems

Author	SHA1	Message	Date
Sean Keely	3ebe99f96d	Add experimental option to force discovery of all copy agents. Discards all user provided async copy agent info and relies on pointer info discovery. Change-Id: Ife3e708a49ffccbede4983ab47d5ed0032970857	2022-05-14 18:08:57 -05:00
Sean Keely	13a0cdfa77	Use block pointer info in async copy. Only block info can return an agent which is disabled in the process. Change-Id: I34cb1f9eea9217e10a484726c90d930e3414e769	2022-05-14 18:08:57 -05:00
Sean Keely	247606c455	Report owning agent with pointer info block information. Physical owning agent may not be visible to the current process due to RVD. Change-Id: Ib463336a5ed73a479f3aa74eb140932b9e0435fb	2022-05-14 18:08:57 -05:00
Sean Keely	c289a43e88	Allow zero agent handle in AsyncCopy APIs. IPC use cases with RVD set can't convey proper agent handles. Runtime discovery is required to properly route the copy in this case. Change-Id: I4c97e132fb4b6ac1040de1cb17fe5a3e36d6be48	2022-05-14 18:08:49 -05:00
Sean Keely	ace0599c69	Report pointer info queries to released fragments as type UNKNOWN. We should not leak suballocation info to users. Change-Id: I13b2a22bf5517b523ba04ddc039b49da8378b55f	2022-05-09 13:46:16 -05:00
Sean Keely	0ba9b162db	Ensure IPC imports always create an allocation map entry. Simplifies behavior. A memory type now either always generates an entry or never does. Change-Id: Ie98cddea01e801308ac0ba650795fdef92b7e47d	2022-05-09 13:46:16 -05:00
Sean Keely	752cfd5ffd	Adjust include paths for new header locations. Thunk and rocm_smi_lib paths have been updated. Change-Id: If2948172f8064dd992cbccbc2a80f9161ad4d457	2022-05-09 14:44:32 -04:00
Ranjith Ramakrishnan	bb4da8545a	File Reorganization changes with backward compatibility Wrapper header files and library soft links for backward compatibility Install interface updated with /opt/rocm/include Change-Id: If772b24320f9d1de90f9be0930b1f2aa1d073777	2022-05-06 19:12:14 -04:00
Sean Keely	7f370dd84c	Drop build dependency on DeviceLibs. DeviceLibs is still needed but is found and included by clang now. Change-Id: I03ff7dc91c028d2ee6747aa1779d223a9ba13915	2022-05-06 01:01:05 -04:00
Sean Keely	0ee82742a7	Switch to CLOCK_BOOTTIME for HSA system clock. This is consistent with KFD and has significantly better latency. KFD is taking this as the definition of the SystemClockCounter. Change-Id: I4c1b3bc58c738206265c55ebefd41356c013bfe5	2022-05-05 15:27:29 -04:00
David Yat Sin	cd0788938c	Remove unused variable Change-Id: Ie29eb1cabef38c259280237c32d83aaa126e3b7a	2022-05-04 13:32:06 -04:00
Yifan Zhang	54c8b7900d	add gfx1036 support Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Change-Id: Ifc1b3cf2e46cf753f57470ebc6b034c1a349d3d2	2022-04-29 17:52:22 -04:00
Shweta Khatri	1b0440e7b3	Assemble trap handler at build time. Eliminates the need for manually assembling the source of the second level trap handler to produce the shader binary. Also separated blit shaders' binary source and version one second level trap handler binary sources into different header files. Change-Id: If29a18ee06dc083ec880ea962f234c6b5cac806a	2022-04-28 20:14:14 -04:00
Jonathan Kim	658b053943	Bypass HDP flush during SDMA copies on A+A GPU-CPU xGMI connections Host to device SDMA copies do not require an HDP cache flush when connected by xGMI since data copies over the data fabric and not HDP. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Sean Keely <sean.keely@amd.com> Change-Id: I78d73a47edcc1a9c0ba59f33cf91485f13f1c45b	2022-04-27 21:45:26 -04:00
Sean Keely	64dae113b1	Minor typo fixes. Declare the type of HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT and add a missing break statement. Change-Id: I86ce8a2e620438e046b60cee991ce1fbe07a3e88	2022-04-26 15:51:22 -04:00
Sean Keely	2eedf953f3	Handle scratch interleave per SE for gfx10+ On gfx10+ we need to issue a minimum count of active lanes or groups before ADC moves on. Ensure that scratch allocations attempt to reach this limit. Occupancy throttling due to OOM condition may still drop below this limit. Change-Id: I0edf2e40fbe1a95e9a262564cebd2b6a82501a0b	2022-04-26 15:32:03 -04:00
Jeremy Newton	178a7a5cfa	Drop some unnecessary definitions __x86_64__ and __AMD64__ should be already defined by the compiler to specify the compilation target and shouldn't be defined manually. I fixed two x86_64 checks to include VS variables, as removing this might cause it to fail to compile on that compiler. Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com> Change-Id: I600ff449af85bf7d83ecab167d97933922e2d917	2022-04-19 12:22:42 -04:00
Jeremy Newton	ddf4edcafc	Use CMAKE_INSTALL_* Instead of installing to lib or include, use CMAKE_INSTALL_LIBDIR and CMAKE_INSTALL_INCLUDEDIR to allow the builder to override if desired. The default LIBDIR should be "lib" to avoid breaking ROCm packaging, but using GNUInstallDirs would use lib64 on RHEL. By setting a default value prior to including GNUInstallDirs, we can always use "lib" unless the builder explicitly overrides it via "-DCMAKE_INSTALL_LIBDIR", which is typical in most distro scripts. Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com> Change-Id: I135f21bcfeb02b6849f6e8ca403b39c029a02d5c	2022-04-19 12:22:42 -04:00
Jeremy Newton	a0931f4a3c	Only default IMAGE_SUPPORT=ON for x86 Image support does not compile on other architectures, since it relies on the x86 only header "x86intrin.h". Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com> Change-Id: I120d15870e74e20bd618e6f5da8c05e28fb1203b	2022-04-12 09:24:45 -04:00
Konstantin Zhuravlyov	9265409f08	Add code object v5 support Change-Id: I03522765056e99ed49e6c5e213ee3753852de27b	2022-04-12 08:53:27 -04:00
Sean Keely	b3caf6782b	Revert "Release host buffers after segment freeze." This reverts commit `03a52655a8`. Change-Id: Idc7e568b2b54a226dbe4d189b25a78be3bd16eea	2022-04-11 20:43:07 -05:00
Sean Keely	4e9849034d	Correct inf loop defect in fast clock init. Each time delay is grown we need to reset elapsed. We want to take the most accurate sample from the set at fixed delay. Without this we will hang if there is ever an insufficiently accurate, high unit clock read. Change-Id: Ic65f364067789ac85a6572d67af2d77528e265bb	2022-04-01 16:15:37 -04:00
Sean Keely	03a52655a8	Release host buffers after segment freeze. Release staging buffers after loading has completed. The debugger no longer uses this copy. Change-Id: I46f36b50033bebe5a9ebc648b291d46f1d09b21d	2022-03-23 23:53:02 -05:00
Sean Keely	048700f2e7	Correct loader memory interfaces. The loader must use internal interfaces to access page allocation flags. Code pages should also ensure use of cached memory. Also relocate i-cache flush after code page copy. Change-Id: I86d36243b6eebb1d46b991b372a5236baaf941ab	2022-03-23 23:52:56 -05:00
Sean Keely	fbc48521dc	Correct queue error reporting. VM faults should not report via the queue error handler. The system event contains much more useful information. Change-Id: I744d9b97b23334d7ed2c0f450111c1b8032567e3	2022-03-23 23:37:53 -05:00
Sean Keely	af0f90800d	Ignore hive id for CPUs when selecting copy paths. Hive ID is used during copy path selection to locate an optimal pool of SDMA engines. However, for CPU-GPU connections we always want to use the host port facing engines, known generally as the PCIe optimized engines. We want this selection even when the connection is XGMI hence dropping the hive id for CPUs. Change-Id: Iffe44174afecfc0bb3272b806fce549c930a49d9	2022-03-18 18:48:44 -05:00
Sean Keely	7e73760cd0	Revert "add gfx1036 support" Compiler is not promoted to mainline yet. This reverts commit `2f97f17df9`. Change-Id: I7256aeb3698ee3ae640a9f457a929abe24d5ef17	2022-03-18 02:35:01 -05:00
Yifan Zhang	2f97f17df9	add gfx1036 support Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Change-Id: I075779b1369fde759c29572fa2027a3748d6ed4c	2022-03-05 13:16:19 +08:00
Sean Keely	8a6954c63c	Do not allow occupancy restriction on cooperative groups. Excessive scratch allocations can normally trigger occupancy reduction. This breaks cooperative groups so if occupancy reduction is required on a cooperative dispatch fail with OOM. Change-Id: I64612a2e38bf1286f3b74c1c2a68ab0c85452771	2022-03-02 19:59:30 -06:00
Sean Keely	552dcead93	Correct scratch allocation logic to account for asymmetric harvest. With asym. harvest hw does not issue groups equally to each SE, occasionally hw will skip an SE so that the distribution reflects each SE's CU count. Scratch resources must be allocated to reflect this asymmetric distribution of groups. Change-Id: I65e26206500483ea18e6e8796e65ecba5354b029	2022-03-02 19:59:30 -06:00
Sean Keely	cedc3e80a8	Do not bump up total scratch size for large cached allocations. HW does not ignore low bits of the scratch wave count and will stride beyond the end of the allocation if the wave count is ever indivisible by SE count. Rather than returning the allocation size for cached large scratch allocations, use the requested scratch size in scratch setup. Scratch cache will retain the cached allocation's size. Change-Id: I0129ddc99a8940d01d8fbcd0b02d5061f31f456d	2022-03-02 20:48:19 -05:00
Saravanan Solaiyappan	a496adafaa	Consider apt/yum upgrade operation check in package scripts. Include the upgrade operation check in the prerm and postun scripts in package. Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com> Change-Id: Ic766d8d68b5168e5f1b065d846ca2604d281e5be	2022-02-24 10:26:04 -05:00
Sean Keely	b9a0c1d313	Do not discard fragment allocator blocks multiple times. discardBlock may be called multiple times on the same block. We must not discard the block multiple times or we will corrupt in-use memory accounting. Change-Id: Ife9f3162785965a795dcf81887d4d447cc096e62	2022-02-10 18:39:46 -06:00
Sean Keely	266cd68524	Add fallback case for cache line size. KFD sometimes returns 0 for cache line sizes. Change-Id: If82de0068318bbc138f0d1d4692ff908359174ad	2022-02-10 18:39:46 -06:00
Sean Keely	21291b48c6	Retrieve cache line size from KFD topology. Change-Id: I16ddd9d9888bb973eccf3c562619894c88c7df15	2022-01-16 08:44:44 -06:00
Sean Keely	a6742209f7	Correct queue minimum size enforcement. Minimum queue size was not enforced at the Agent level. Minimum size should be one page to give uniformity across all asics. Change-Id: I26394f79458d09fbceb79fc8aaf495e2c26a8ff3	2022-01-16 08:28:34 -06:00
Sean Keely	a65f3f5b71	Improve scratch error detection in debug mode. Adds asserts for invalid dispatch dims and scratch requests that don't actually use scratch. Change-Id: I6e6eef3f17dc38adaf96550fa55bd8625868efa3	2022-01-31 20:53:24 -05:00
Sean Keely	37942c982a	Add HSA_AMD_AGENT_INFO_COOPERATIVE_COMPUTE_UNIT_COUNT. On gfx90a only a reduced number of CUs must be used for cooperative dispatches due to CWSR and launcher interactions with asymmetric harvest. We must use one fewer CUs per SE than the lowest count of CUs on any SE. Also adds env var HSA_COOP_CU_COUNT which enables the cooperative CU count computation. Set to 1 to enable the new computation. This is an opt-in feature that will become enabled by default (opt-out) in a future release. Change-Id: Ifbb75ced3bbc15876eef44922c6a4f6fde8c4c28	2022-01-31 15:22:07 -05:00
Chen Gong	dec63b4f15	Correct the gfx version of gfx90c to 90c Corrections have been made in libhsakmt, and corresponding changes are required here as well. Signed-off-by: Chen Gong <curry.gong@amd.com> Change-Id: Ib697ce25278c2c5ac6ef0206930ec285f46c60d1	2022-01-25 19:05:46 +08:00
Jeremy Newton	bd1a4adf35	Install license file See Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com> Change-Id: I80e9664b5ade520d9bf9b9a20ac36d67cfe85107	2022-01-17 10:54:54 -05:00
David Yat Sin	86164fbfec	Fix for segfault after removing PrefetchRange from map The start iterator becomes invalid after it is removed from std::map prefetch_map_. This was causing a segfault when the iterator is incremented afterwards. Signed-off-by: David Yat Sin <david.yatsin@amd.com> Change-Id: I4b0b763d2cb4ee99c0b8571c2c526b834e74077a	2022-01-10 17:47:02 -05:00
Sean Keely	fce6ba052e	Correct documentation typo. ROCM_VISIBLE_DEVICES was used where ROCR_VISIBLE_DEVICES was intended. Change-Id: I644a546f3c9dd0b50898ef8a21dbb8f5c3a36926	2021-12-10 16:19:30 -06:00
Sean Keely	df55cb0450	Rework memory locks to allow device parallelism in alloc/free. Prior solution used a single global lock to protect the memory tracking structures. This change protects the memory tracking structure with a shared mutex (rw lock) in shared (r) mode for memory allocations and frees so that long duration processes, calling to kfd, can be done in parallel. Operations which must modify the memory map take the mutex in exclusive mode (w) and must not call to the thunk while holding the mutex. The fragment allocator now requires separate protection and is protected with a mutex at the device level. Protecting at the device level, rather than pool, allows retention of the current recursive design and allows calling Trim from within Allocate. This could be made finer (pool level locks) but would require backing out of Allocate entirely to call Trim. Trim and any retried Allocation must be done in isolation (per device) or we may report OOM when memory is actually available in some pool's fragment cache. So some device level serialization is required in at least some paths. Change-Id: I7c1e94d6965ffcc602b12fefdd3a6e97b84b5e00	2021-11-24 19:22:05 -06:00
Sean Keely	fc75731034	Add comments to GetPcieBlit. Comments call out the specific operation being selected since the ternary nest is a bit hard to read. Change-Id: If033dbaa6cba132e96196ad3fc6d5572042041f4	2021-11-15 19:34:03 -06:00
Sean Keely	b198016949	Fix leak in hsa_amd_interop_map_buffer. Agent temp array could have leaked if one of the given agent handles was invalid. Change-Id: I9e638b3a4f6bb917a4e3209ad81a1253bb603365	2021-11-15 19:22:20 -06:00
Sean Keely	f48a786662	Correct order of argument check and default assignment in lock APIs. Argument must be checked for nullptr before being dereferenced and filled with the default return value. Change-Id: I9ff366f066a5e18c78129bf59cc3ba00fca3ef18	2021-11-15 19:22:02 -06:00
Sean Keely	322588a60e	Add missing return in ScopeGuard::operator=. This omission did not cause problems earlier due to having not been instanced. Change-Id: I7a54f82e06c299902f3bf6b4d3737cc5e30961ad	2021-11-15 18:50:46 -06:00
Sean Keely	19454fcf26	Correct node id assertion in pointer info. Size of the node map was used as the max node id previously. This is wrong when RVD is used. Change-Id: Ic632ec96891b92186e5b68cd53f81414db34f59f	2021-11-10 22:09:24 -06:00
Sean Keely	c9eb85e205	Correct size of SVM node array. Was size of the map. Needs to be size of the node id range. Change-Id: I92501ea7adca5c30dbb0fdabd2c421dea58f8d6f	2021-11-10 21:23:42 -06:00
Sean Keely	d65e00bcc5	Include event_id in SDMA interrupt payload. The event id assists KFD in locating the proper event associated with the interrupt. Change-Id: I75d58b6be74dd5b1edb0c5fe2b9d01538a649ba1	2021-11-10 20:57:11 -06:00
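
Commit 86164fbfec above describes a segfault caused by incrementing a std::map iterator after its element had been erased. The log does not show the actual fix, so the following is only a minimal sketch of the usual safe-erase pattern, not the change made in the runtime. RangeMap and EraseKeysInRange are hypothetical names used for illustration, and the int keys stand in for whatever prefetch_map_ really stores.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-in for a range table keyed by start address.
// The real prefetch_map_ key and value types are not shown in the log.
using RangeMap = std::map<int, int>;

// Erases every entry whose key lies in [lo, hi) and returns the number
// of entries removed.
std::size_t EraseKeysInRange(RangeMap& ranges, int lo, int hi) {
  assert(lo <= hi);
  std::size_t erased = 0;
  auto it = ranges.lower_bound(lo);
  while (it != ranges.end() && it->first < hi) {
    // erase(iterator) invalidates only the erased iterator and returns
    // the iterator to the following element, so the loop continues from
    // the returned value instead of incrementing the dead iterator.
    it = ranges.erase(it);
    ++erased;
  }
  return erased;
}
```

Calling EraseKeysInRange on a map with keys 1 through 5 and the range [2, 4) removes keys 2 and 3 and leaves the other three entries untouched.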


655 commits