62f3f250ce
Optimizations include:
- Gang greedily by placing gang leaders on the first D2D SDMA blit
  context to avoid deadlocking with other gang leaders and items.
  This is fine because we can't avoid an oversubscription problem
  when there is only 1 xGMI link anyway, so treat all xGMI links as
  a single pipe for ganging.
- Non-leader gang items don't have to poll on dependency signals,
  which opens up more non-blocking SDMA channels.
- Unlock the gang lock when gangs are not needed.
- Change the gang factor lookup from a vector pair to a map and
  register all GPUs in the gang factor lookup regardless of link
  type, so that we can take advantage of the O(logN) direct
  key/value lookup time.

Fixes include:
- HSA_PAGE_SIZE_4KB was the wrong macro to use for the gang size
  limit. As a result, small copies ended up ganging and hitting the
  latency limit. Use a hardcoded 4096 bytes instead.
- Cap the auxiliary gang factor to the number of non-xGMI SDMA
  engines.

Change-Id: Ic23fde131502906a807134a04599aa6d012e8cbb