rocm-systems

Author	SHA1	Message	Date
Amber Lin	19b4a16ead	Correct NumCaches against the CPU node In a NUMA system, topology should report NumCaches as the number of caches within the node but current code reports the total caches in the system. This patch fixes the error. This patch also uses cpuid to get cache information instead of reading from sysfs files. See "Intel Corporation, Intel 64 and IA-32 Architectures Software Developer's Manual Volume 2(2A, 2B & 2C) Instruction Set Reference" 3-179 for cpuid instruction features used in this patch. Change-Id: I8ecece6c2b230741822620b44e66ddc201ff5112 [ROCm/ROCR-Runtime commit: `73ad0a1942`]	2016-05-03 11:39:33 -04:00
Felix Kuehling	87bd249ed5	Add gfx70x support Change-Id: I400adb62b5225ef3a42da279d067fb0a62907089 [ROCm/ROCR-Runtime commit: `97e51ce33d`]	2016-04-25 14:27:44 -04:00
Andres Rodriguez	e8d96eac7a	package: rename to hsathk-rocm-dev Since we include headers and not just a library anymore, we should be considered a -dev package and not a lib package. Change-Id: I220465ea4ffc8d66d8d76e6716e6c6c50cdacea1 [ROCm/ROCR-Runtime commit: `44572965f6`]	2016-04-13 19:39:54 -04:00
Andres Rodriguez	ade12f4ec1	Adopt new ROCm packaging guidelines All files should go into /opt/rocm/$component For developer convenience, a single include directory is created through symlinks, from the component include directory to /opt/rocm/include. Similarly, a unified linked directory is present in /opt/rocm/lib The component lib directory should not include linker names (library names without version numbers). This commit also fixes 'make rpm' running correctly without the need for sourcing build/envsetup.sh Change-Id: I95a680f6d3e3bd1ae688d0694934a0577dbd007c [ROCm/ROCR-Runtime commit: `9f355b78a0`]	2016-04-11 18:30:54 -04:00
Felix Kuehling	f0af6eceed	Fix 4GB and larger system memory allocations Intermediate size was stored in a 32-bit variable. This resulted in 4GB allocations failing in KFD due to 0 size. Larger allocations would allocate the wrong amount of memory. Change-Id: If19dedf64952f1d2edd813793241e12c0362d220 [ROCm/ROCR-Runtime commit: `82b3fad320`]	2016-04-11 11:17:06 -04:00
Andres Rodriguez	e4f1d95ef2	package: change install directory to /opt/rocm Align with the rest of the driver stack on the new installation path /opt/rocm/* This mechanism for generating packages should be changed for something nicer and more standards compliant in the future. Change-Id: Ic31409b0d0b8f6ee4b25296d2580982a76aab564 [ROCm/ROCR-Runtime commit: `31861c838e`]	2016-04-08 11:41:49 -04:00
David Ogbeide	9abf85c06b	libhsakmt: get CPU model name from proc/cpuinfo HSA thunk is currently only aware of GPU node model info, CPU names are NULL. Signed-off-by: David Ogbeide <davidboyowa.ogbeide@amd.com> Change-Id: I3c2adbb8566a5048b44c39fff4fd8228912468ff [ROCm/ROCR-Runtime commit: `682776d89a`]	2016-03-23 11:11:18 -04:00
Felix Kuehling	a8a5960095	Add environment variable to disable GPU caching This option may help debug synchronization or coherency issues involving the GPU caches. It works only on dGPUs, by changing the cache policy of the GPUVM default aperture to "coherent", which is implemented as non-cached on current dGPU hardware. Change-Id: I544ac9cc5c0cf1fa5c4e30f67aa42b3b5e44ae67 [ROCm/ROCR-Runtime commit: `06d391c6c9`]	2016-03-17 18:51:47 -04:00
Harish Kasiviswanathan	718e3600b8	Add QPI or HT io_links Create QPI or HT links among all NUMA nodes. For now, assume all the NUMA nodes are interconnected with same Weight (=1). Change-Id: Id48ba95b9d75515a186f7dc5006b19bd92743ae3 [ROCm/ROCR-Runtime commit: `f1fbacca15`]	2016-03-15 21:10:53 -04:00
Harish Kasiviswanathan	14e60b6ab3	Get processor vendor from /proc/cpuinfo Change-Id: I9039385d268ef1693fab121cbf1caf442129a12e [ROCm/ROCR-Runtime commit: `ee1dd5d9c2`]	2016-03-15 15:37:52 -04:00
shaoyunl	0c6a45ca49	Add Imprecise flag for memory access fault KFD may not be able to provide the precise VM fault address and status. This flag will indicate whether the event data has the fault details Change-Id: I15ffd5c25f555003c6450cc0700efb769418f76b [ROCm/ROCR-Runtime commit: `79077811f5`]	2016-03-14 15:17:17 -04:00
Felix Kuehling	a31106ee4c	Report SVM heap in topology The Runtime requested this information so they can tell easily whether a pointer is part of HSA shared address space or not. Change-Id: If2041ed34031636677d692bc2dc6625634027ed4 [ROCm/ROCR-Runtime commit: `0ed29f5191`]	2016-03-14 11:52:36 -04:00
Harish Kasiviswanathan	dbe8c8faba	Fix indirect io_links Connect only (Peer-to-Peer) GPUs that belong to same NUMA node. Without this additional check non direct GPUs would also get connected. Change-Id: I9a5ed19b8f06cd0527854cbbdb51ede99eade28b [ROCm/ROCR-Runtime commit: `8ff2bcd48d`]	2016-03-11 18:54:32 -05:00
Felix Kuehling	68f1b37518	Fix lstopo Lstopo doesn't have system memory mappings at low addresses. Make sure we leave enough GPUVM address space for kernel allocations (currently only CWSR) before the start of the user-managed SVM aperture. Change-Id: Ic197f7bd5a3cfb150a0da2bfdbc848664e7869be [ROCm/ROCR-Runtime commit: `cac0c08496`]	2016-03-11 11:01:12 -05:00
Harish Kasiviswanathan	6409b00f8f	Add indirect io_links Connect (Peer-to-Peer) GPUs that belong to same NUMA node. Connect all [GPU] <--> [Non Parent NUMA] node Change-Id: Ib4b08a6545d28b7dce4c9b1a90378bfc51bed07e [ROCm/ROCR-Runtime commit: `7042292c60`]	2016-03-10 15:11:17 -05:00
Harish Kasiviswanathan	35e6692134	Allocate memory for indirect io_links To simplify, allocate maximum needed memory for node_t->link array. No need for realloc when indirect links are added. Trade off - for some nodes more memory than required will be allocated. This means the loop to compute the number of direct (reverse) io_links for a CPU node is not necessary. Change-Id: I2b2559142cbec3b262d0b4ea5fdebfd8f36c28fc [ROCm/ROCR-Runtime commit: `1e729510d2`]	2016-03-10 15:10:48 -05:00
Felix Kuehling	029002d073	Add support for hsaKmtRegisterGraphicsHandleToNodes Change-Id: I6fd7154dea78188480d5cb89ac237bad572356c4 [ROCm/ROCR-Runtime commit: `61ec3df2f9`]	2016-03-10 11:16:02 -05:00
Ben Goz	c32a504b59	Support MapMemoryToGPUNodes on APU Change-Id: Ie77a2eb23cd9fe6671ff9e0630977220218e55dd Signed-off-by: Ben Goz <ben.goz@amd.com> [ROCm/ROCR-Runtime commit: `b1393f8224`]	2016-03-09 21:31:52 -05:00
Felix Kuehling	f171fef754	Clean up GPUVM aperture management Non-canonical GPUVM aperture doesn't exist on dGPUs. Remove comments and code that say otherwise. Fix alignment of GPUVM aperture for gfx801. Requires the same workaround as gfx802. It's not used for anything on gfx801 yet, but will be soon. Change-Id: I88607fe7b340081cc0715b85f28fdbf5f1bb0ad7 [ROCm/ROCR-Runtime commit: `b837c3e7b0`]	2016-03-09 10:55:12 -05:00
Harish Kasiviswanathan	ac547f8cb2	Add reverse direct io_links The Kernel only creates one way direct link - GPU(PCI_BUS) --> [Parent NUMA Node] Create the reverse direct io_link here - [Parent NUMA Node] --> GPU(PCI_BUS) Change-Id: I829a1b1b7f34bda42871ede3472d60915e88418c [ROCm/ROCR-Runtime commit: `1d1c30db7c`]	2016-03-04 15:54:03 -05:00
Harish Kasiviswanathan	5fc05ab059	Add free_nodes() helper function Change-Id: I18ae0ac91b05275d7ad9d93175bae06870080844 [ROCm/ROCR-Runtime commit: `a80d2f2303`]	2016-03-03 18:33:59 -05:00
Ben Goz	8baf22651d	Align hsaKmtMapMemoryToGPUNodes according to thunk spec Change-Id: I507ba5c6029ca5e7088c25930d46f5221679ace4 Signed-off-by: Ben Goz <ben.goz@amd.com> [ROCm/ROCR-Runtime commit: `e2fb4bc312`]	2016-03-02 16:12:03 +02:00
shaoyunl	8067849931	Export libKmtSetTrapHandler symbol as global Change-Id: I065dbecd05e992bc528128d893edaf636c1beff7 [ROCm/ROCR-Runtime commit: `fea5ab9114`]	2016-03-01 10:30:02 -05:00
Harish Kasiviswanathan	268045084d	Fix io_links sysfs directory name typo Change-Id: I4f6fb43c4a038b94c0f94f66ee383e83ad0ffa62 [ROCm/ROCR-Runtime commit: `bf03058112`]	2016-02-29 11:15:29 -05:00
Jay Cornwall	537f217f11	Fix race in dGPU event page setup events_page is unprotected from multiple allocation. The first event creation ioctl is unprotected from a race with args.event_page_offset being set (for page setup) and null (all subsequent invocations). Change-Id: I40ba712a17e9eff257785f90c553a74ad09c661d Signed-off-by: Yair Shachar <Yair.Shachar@amd.com> [ROCm/ROCR-Runtime commit: `3a662ac712`]	2016-02-28 07:14:23 -05:00
Felix Kuehling	d56931260a	Fix address space leak in __fmm_release Use the object size when freeing address space, instead of the parameter passed in by the caller. The parameter may be incorrect due to app or runtime bugs, or when the buffer is an AQL ring buffer with double mapping workaround. Change-Id: I00bb31d4520ef969a49d6d5ea723e8a33418acc3 [ROCm/ROCR-Runtime commit: `006f3ee41b`]	2016-02-26 09:19:21 -05:00
Felix Kuehling	8ae4e547bc	Use aligned size for looking up userptr object after allocation The alignment performed in vm_find_object_by_address isn't sufficient because it doesn't take into account the offset from the start of the page. This fixes a bug where certain unaligned userpointers and sizes fail to register correctly. Change-Id: I17872e264467a619f5e1bedb7e1ed3d994a856bf [ROCm/ROCR-Runtime commit: `8a0161d6bb`]	2016-02-25 19:47:05 -05:00
Ben Goz	18953cfa9a	Mapping public VRAM BO to cpu Change-Id: I2ff62ff0784f8ce556ad80739a177b90d866f1b4 Signed-off-by: Ben Goz <ben.goz@amd.com> [ROCm/ROCR-Runtime commit: `3f02a3cf0b`]	2016-02-24 17:30:15 +02:00
Felix Kuehling	6d856ebbff	Fix memory leaks due to stale CPU mappings Use the aligned size of the buffer objects for CPU unmapping in __fmm_release instead of relying on the unaligned size passed in by the caller. Change-Id: If986ec24e9a05d32981549fddbf143221fc40bac [ROCm/ROCR-Runtime commit: `7a383f9d88`]	2016-02-16 18:12:05 -05:00
Felix Kuehling	99325bf7c4	Add support for register/deregister memory for dGPU Allocate SVM address space for the registered memory and use new userptr support in KFD to create a system memory BO associated with the given user pointer. Map this BO at the SVM address for CPU access. MapMemoryToGPU can be used with the registered user pointer and will return the SVM address as alternate GPUVA. Change-Id: I4886e193c51fb6870a567878870c36bf8b5c3748 [ROCm/ROCR-Runtime commit: `85f9efb1a0`]	2016-02-16 18:12:05 -05:00
Ben Goz	89905c0cd7	Align gpu-id-array size to multiple of sizeof(uint32_t) Change-Id: I9f46b6a331a8d928ef570b420fb60b99b2edfdd1 Signed-off-by: Ben Goz <ben.goz@amd.com> [ROCm/ROCR-Runtime commit: `00386734b1`]	2016-02-16 11:27:06 -05:00
Harish Kasiviswanathan	e8327090b9	gfx803: Add performance counter information Change-Id: Id81b43e90029306f03c84752cef06dc336e3a4a9 [ROCm/ROCR-Runtime commit: `04b92b8e05`]	2016-02-12 16:39:39 -05:00
Harish Kasiviswanathan	f4f0ffc8cb	Adding missing performance counters for gfx801 Few more counters are now available in GFX8 register specs. So adding them. Also for gfx700 and gfx801 report correct number of SQ perf counter slots Change-Id: I9e6b4b10238230aabeccbfaa5e491a28b5e54f2d [ROCm/ROCR-Runtime commit: `1a0f915957`]	2016-02-12 16:37:21 -05:00
Ben Goz	0dc374e1a4	Fix double free issue and pointer alignment Change-Id: Id5bab454d53d404883a92282168b3f6cbc468cbb Signed-off-by: Ben Goz <ben.goz@amd.com> [ROCm/ROCR-Runtime commit: `b37f99a01e`]	2016-02-12 11:21:32 -05:00
Felix Kuehling	03720306b9	Make hsaKmtAllocMemory more compliant with the Thunk spec Allocations from GPU nodes will return VRAM, not system memory. Only non-paged allocation from GPU nodes is supported. System memory can only be allocated from CPU nodes (usually node 0). The HostAccess flag is no longer used to distinguish the memory type. It only indicates, whether the memory is mapped for CPU access. Maintain compatibility with broken KfdTests by returning system memory for paged-memory requested from GPU nodes. Change-Id: I514defede735f55e6de436f41944125b6f2c4ccf [ROCm/ROCR-Runtime commit: `887b32fe86`]	2016-02-10 10:29:54 -05:00
Yair Shachar	8359dc3119	Disable scratch Host allocation - via debug registration flags. Change-Id: Ia6e5f86ec3979c4a49800f7af4509442a4e5be27 Signed-off-by: Yair Shachar <Yair.Shachar@amd.com> [ROCm/ROCR-Runtime commit: `a815a4337f`]	2016-02-10 07:52:32 -05:00
Ben Goz	18aab410cc	Adding support to hsaKmtMapMemoryToGPUNodes Change-Id: Iab6222402a43c3cd31b0efc5a316a6482986258e Signed-off-by: Ben Goz <ben.goz@amd.com> [ROCm/ROCR-Runtime commit: `7070f7ec5e`]	2016-02-09 17:34:29 +02:00
shaoyunl	60bbf00fb1	libhsaKmt: Add CWSR support on dGPU This is the thunk part of the CWSR support. 1. SDMA queues don't support CWSR, so there is no need to allocate the context save/restore memory for them 2. Allocate the context save/restore memory in the local frame buffer for dGPU Change-Id: Ie83506f0cced2a5a537c49d68125796d831c2764 [ROCm/ROCR-Runtime commit: `4e6c25e55b`]	2016-02-04 15:00:58 -05:00
shaoyunl	4c5a3ca774	libhsakmt: Use GPU ID instead of Node ID in set_process_dgpu_aperture Change-Id: I0e66ca4a018c15c009a3516d250f0044a4407878 [ROCm/ROCR-Runtime commit: `7e40877e81`]	2016-02-04 10:32:23 -05:00
Felix Kuehling	61039bcd36	Remove gfx802 page size workaround on gfx803 All tonga page size alignment is done in the memory management functions in fmm.c. All other code only specifies the minimum alignment it needs and lets fmm.c handle the HW-specific alignment. Clean up aligned-exec memory allocation in queue.c to remove hard-coded TONGA_PAGE_SIZE alignments and remove code duplication. Make sure alignments are consistent between allocate and free. Change-Id: Ia8923448173d1cef315af24cebff12adef385cb0 [ROCm/ROCR-Runtime commit: `cc9fc386bd`]	2016-01-28 16:05:18 -05:00
David Ogbeide	8fce9f7026	libhsakmt: Add marketing names for GPU nodes HSA thunk API returns null when querying for GPU node marketing names due to empty system topology file. - Add marketing names to device GFX IP data structs. - Modify name retrieval to pull from data structs instead of file. Signed-off-by: David Ogbeide <davidboyowa.ogbeide@amd.com> Change-Id: I30ea04111be7e0df2e93894f801fbeb414ffa790 [ROCm/ROCR-Runtime commit: `4e4a881940`]	2016-01-25 11:03:54 -05:00
Felix Kuehling	db5b6fd35a	Link libhsakmt with -z nodelete This prevents the library from being unloaded at runtime, even when dlclose is called. This preserves global variables, such as state about the SVM address space and avoids catastrophic leaks on dlclose. Change-Id: I34f1d19a450835200e9d4815458e8d1b3045053c [ROCm/ROCR-Runtime commit: `cc7491ec71`]	2016-01-22 18:08:19 -05:00
Amber Lin	07500db1df	Revert "Free resources when dlclose is called" This reverts commit `4dd9dbb128`. Conflicts: src/fmm.c src/perfctr.c Change-Id: Ib6113c2dd3962c72100c7f74cdef6897e1df40b3 [ROCm/ROCR-Runtime commit: `7416805a44`]	2016-01-22 17:58:33 -05:00
Harish Kasiviswanathan	b687eaf2c2	Don't limit number of supported HSA Nodes Remove #define MAX_NODES 8 Change-Id: I756cadc652543dd17ea48a1c956adc08c3d2631a [ROCm/ROCR-Runtime commit: `5e53205b9e`]	2016-01-15 17:27:43 -05:00
Harish Kasiviswanathan	14358ee07f	Don't limit number of supported GPUs Stop using NUM_OF_SUPPORTED_GPUS. For now the definition itself cannot be removed as ioctl code is in upstream Kernel. Change-Id: If846625a8ad5062d5483e762850c793d3c00b9d0 [ROCm/ROCR-Runtime commit: `ce83dc623f`]	2016-01-15 11:44:42 -05:00
Harish Kasiviswanathan	add443f1ef	Use new ioctl for getting process apertures Change-Id: I73678744ad73942edec442ad9c6d38637f7e1235 [ROCm/ROCR-Runtime commit: `e7e1361c3d`]	2016-01-12 12:09:25 -05:00
Felix Kuehling	c89d3124d9	Implement hsaKmtRegisterMemoryToNodes Fix hsaKmtRegisterMemory to be a no-op for now and move the multi-GPU implementation to hsaKmtRegisterMemoryToNodes. Make GPU memory mappings of host memory visible to all GPUs by default. Device memory is still visible to the allocating GPU only by default (but can be overridden with hsaKmtRegisterMemoryToNodes for experimenting with P2P). Change-Id: I73408afbe3b10c8dad2ab3a780f58413249692e6 [ROCm/ROCR-Runtime commit: `063ad3ad9e`]	2016-01-08 16:00:23 -05:00
Ben Goz	2fa7eef572	Adding support for mGPU Change-Id: I5ed184e6a58b38d9dde48867f14513d161cf41a9 Signed-off-by: Ben Goz <ben.goz@amd.com> [ROCm/ROCR-Runtime commit: `ea0f9d2a0b`]	2016-01-04 15:35:15 +02:00
Ben Goz	d874bcd8b3	Fix AQL Double buffer allocation mode Change-Id: I5162ffd89416d317fd0ca0fc51da523298488922 Signed-off-by: Ben Goz <ben.goz@amd.com> [ROCm/ROCR-Runtime commit: `53b208adf2`]	2016-01-04 15:34:53 +02:00
Yair Shachar	63f646d050	Add support for scratch GPUVM on host memory This is required when we have a debug session Change-Id: If9d6d2d23a9016b6ca9562e02a91fc16e0354ee4 Signed-off-by: Yair Shachar <Yair.Shachar@amd.com> [ROCm/ROCR-Runtime commit: `681f4dcecc`]	2015-12-20 15:50:50 +02:00
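
Commit f0af6eceed above attributes the 4GB allocation failure to an intermediate size held in a 32-bit variable. The sketch below only models that arithmetic; it is not the libhsakmt code. The function names and the use of bit masks to stand in for fixed-width C integers are assumptions made for illustration.

```python
# Illustrative model only: Python ints are unbounded, so a mask stands in
# for the value range of a fixed-width unsigned C variable.
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
GIB = 1 << 30


def size_via_32bit_intermediate(size_bytes: int) -> int:
    """Hypothetical helper: the byte count after passing through a 32-bit variable."""
    return size_bytes & MASK32


def size_via_64bit_intermediate(size_bytes: int) -> int:
    """Hypothetical helper: the byte count after passing through a 64-bit variable."""
    return size_bytes & MASK64


if __name__ == "__main__":
    for gib in (2, 4, 6):
        requested = gib * GIB
        print(
            f"{gib} GiB requested:",
            "32-bit intermediate ->", size_via_32bit_intermediate(requested),
            "| 64-bit intermediate ->", size_via_64bit_intermediate(requested),
        )
```

Under this model a 4 GiB request wraps to a size of 0 and a 6 GiB request wraps to 2 GiB, which is consistent with the two symptoms the commit message describes: a failure on zero size, and the wrong amount of memory for larger requests.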


142 Commits
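
Commit 8ae4e547bc in the log notes that aligning only the size of a user pointer range is insufficient because it ignores the offset from the start of the page. A minimal sketch of that general point follows, assuming a 4 KiB page size; the helper names are hypothetical and this is not the code from fmm.c.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def align_down(value: int, alignment: int) -> int:
    """Round value down to a multiple of alignment (a power of two)."""
    return value & ~(alignment - 1)


def align_up(value: int, alignment: int) -> int:
    """Round value up to a multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def size_aligned_only(addr: int, size: int) -> int:
    """Rounds up the size alone and ignores where addr sits within its page."""
    return align_up(size, PAGE_SIZE)


def covering_size(addr: int, size: int) -> int:
    """Size of the page-aligned range that fully covers [addr, addr + size)."""
    start = align_down(addr, PAGE_SIZE)
    end = align_up(addr + size, PAGE_SIZE)
    return end - start


if __name__ == "__main__":
    addr, size = 0x1800, 0x1000  # starts mid-page and spans two pages
    print(hex(size_aligned_only(addr, size)), hex(covering_size(addr, size)))
```

For a buffer that starts in the middle of a page, the size-only rounding yields one page while the covering range is two pages, so a lookup keyed on the smaller value would not match an object that was registered with the full covering size.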