This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
#include "../include/hip/hcc_detail/program_state.hpp"
|
|
|
|
|
|
|
|
|
|
#include "../include/hip/hcc_detail/code_object_bundle.hpp"
|
|
|
|
|
|
|
|
|
|
#include "hip_hcc_internal.h"
|
2017-12-03 23:09:06 +00:00
|
|
|
#include "hsa_helpers.hpp"
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
#include "trace_helper.h"
|
|
|
|
|
|
|
|
|
|
#include "elfio/elfio.hpp"
|
|
|
|
|
|
|
|
|
|
#include <link.h>
|
|
|
|
|
|
|
|
|
|
#include <hsa/hsa.h>
|
|
|
|
|
#include <hsa/hsa_ext_amd.h>
|
|
|
|
|
|
|
|
|
|
#include <cassert>
|
|
|
|
|
#include <cstddef>
|
|
|
|
|
#include <cstdint>
|
|
|
|
|
#include <memory>
|
|
|
|
|
#include <mutex>
|
2017-11-20 22:41:46 +00:00
|
|
|
#include <sstream>
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
#include <stdexcept>
|
|
|
|
|
#include <string>
|
|
|
|
|
#include <unordered_map>
|
2017-11-20 22:41:46 +00:00
|
|
|
#include <unordered_set>
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
#include <utility>
|
|
|
|
|
#include <vector>
|
|
|
|
|
|
|
|
|
|
using namespace ELFIO;
|
|
|
|
|
using namespace hip_impl;
|
|
|
|
|
using namespace std;
|
|
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
namespace {
|
|
|
|
|
struct Symbol {
|
|
|
|
|
string name;
|
|
|
|
|
ELFIO::Elf64_Addr value = 0;
|
|
|
|
|
Elf_Xword size = 0;
|
|
|
|
|
Elf_Half sect_idx = 0;
|
|
|
|
|
uint8_t bind = 0;
|
|
|
|
|
uint8_t type = 0;
|
|
|
|
|
uint8_t other = 0;
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
inline Symbol read_symbol(const symbol_section_accessor& section, unsigned int idx) {
|
|
|
|
|
assert(idx < section.get_symbols_num());
|
|
|
|
|
|
|
|
|
|
Symbol r;
|
|
|
|
|
section.get_symbol(idx, r.name, r.value, r.size, r.bind, r.type, r.sect_idx, r.other);
|
|
|
|
|
|
|
|
|
|
return r;
|
|
|
|
|
}
|
2017-11-21 13:15:13 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
template <typename P>
|
|
|
|
|
inline section* find_section_if(elfio& reader, P p) {
|
|
|
|
|
const auto it = find_if(reader.sections.begin(), reader.sections.end(), move(p));
|
2017-11-21 13:15:13 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
return it != reader.sections.end() ? *it : nullptr;
|
|
|
|
|
}
|
2017-11-21 13:15:13 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
vector<string> copy_names_of_undefined_symbols(const symbol_section_accessor& section) {
|
|
|
|
|
vector<string> r;
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
for (auto i = 0u; i != section.get_symbols_num(); ++i) {
|
|
|
|
|
// TODO: this is boyscout code, caching the temporaries
|
|
|
|
|
// may be of worth.
|
2017-11-21 13:15:13 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
auto tmp = read_symbol(section, i);
|
|
|
|
|
if (tmp.sect_idx == SHN_UNDEF && !tmp.name.empty()) {
|
|
|
|
|
r.push_back(std::move(tmp.name));
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
return r;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
const std::unordered_map<std::string, std::pair<ELFIO::Elf64_Addr, ELFIO::Elf_Xword>>&
|
2018-05-18 10:14:46 -05:00
|
|
|
symbol_addresses(bool rebuild = false) {
|
2018-03-12 11:29:03 +05:30
|
|
|
static unordered_map<string, pair<Elf64_Addr, Elf_Xword>> r;
|
|
|
|
|
static once_flag f;
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-05-18 10:14:46 -05:00
|
|
|
auto cons = [rebuild]() {
|
|
|
|
|
if (rebuild) {
|
|
|
|
|
r.clear();
|
|
|
|
|
}
|
|
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
dl_iterate_phdr(
|
|
|
|
|
[](dl_phdr_info* info, size_t, void*) {
|
2017-11-21 13:15:13 +00:00
|
|
|
static constexpr const char self[] = "/proc/self/exe";
|
|
|
|
|
elfio reader;
|
|
|
|
|
|
|
|
|
|
static unsigned int iter = 0u;
|
|
|
|
|
if (reader.load(!iter ? self : info->dlpi_name)) {
|
|
|
|
|
auto it = find_section_if(
|
2018-03-12 11:29:03 +05:30
|
|
|
reader, [](const class section* x) { return x->get_type() == SHT_SYMTAB; });
|
2017-11-21 13:15:13 +00:00
|
|
|
|
|
|
|
|
if (it) {
|
|
|
|
|
const symbol_section_accessor symtab{reader, it};
|
|
|
|
|
|
|
|
|
|
for (auto i = 0u; i != symtab.get_symbols_num(); ++i) {
|
|
|
|
|
auto tmp = read_symbol(symtab, i);
|
|
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
if (tmp.type == STT_OBJECT && tmp.sect_idx != SHN_UNDEF) {
|
|
|
|
|
const auto addr = tmp.value + (iter ? info->dlpi_addr : 0);
|
|
|
|
|
r.emplace(move(tmp.name), make_pair(addr, tmp.size));
|
2017-11-21 13:15:13 +00:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
++iter;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
return 0;
|
2018-03-12 11:29:03 +05:30
|
|
|
},
|
|
|
|
|
nullptr);
|
2018-05-18 10:14:46 -05:00
|
|
|
};
|
|
|
|
|
|
|
|
|
|
call_once(f, cons);
|
|
|
|
|
if (rebuild) {
|
|
|
|
|
cons();
|
|
|
|
|
}
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
return r;
|
|
|
|
|
}
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
void associate_code_object_symbols_with_host_allocation(const elfio& reader,
|
|
|
|
|
section* code_object_dynsym,
|
|
|
|
|
hsa_agent_t agent,
|
|
|
|
|
hsa_executable_t executable) {
|
|
|
|
|
if (!code_object_dynsym) return;
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
const auto undefined_symbols =
|
|
|
|
|
copy_names_of_undefined_symbols(symbol_section_accessor{reader, code_object_dynsym});
|
2017-11-21 13:15:13 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
for (auto&& x : undefined_symbols) {
|
|
|
|
|
if (globals().find(x) != globals().cend()) return;
|
2017-11-21 13:15:13 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
const auto it1 = symbol_addresses().find(x);
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
if (it1 == symbol_addresses().cend()) {
|
|
|
|
|
throw runtime_error{"Global symbol: " + x + " is undefined."};
|
|
|
|
|
}
|
2017-11-20 22:41:46 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
static mutex mtx;
|
|
|
|
|
lock_guard<mutex> lck{mtx};
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
if (globals().find(x) != globals().cend()) return;
|
|
|
|
|
globals().emplace(x, (void*)(it1->second.first));
|
|
|
|
|
void* p = nullptr;
|
|
|
|
|
hsa_amd_memory_lock(reinterpret_cast<void*>(it1->second.first), it1->second.second,
|
|
|
|
|
nullptr, // All agents.
|
|
|
|
|
0, &p);
|
2017-11-21 13:15:13 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
hsa_executable_agent_global_variable_define(executable, agent, x.c_str(), p);
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
}
|
2018-03-12 11:29:03 +05:30
|
|
|
}
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
vector<char> code_object_blob_for_process() {
|
|
|
|
|
static constexpr const char self[] = "/proc/self/exe";
|
|
|
|
|
static constexpr const char kernel_section[] = ".kernel";
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
elfio reader;
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
if (!reader.load(self)) {
|
|
|
|
|
throw runtime_error{"Failed to load ELF file for current process."};
|
|
|
|
|
}
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
auto kernels =
|
|
|
|
|
find_section_if(reader, [](const section* x) { return x->get_name() == kernel_section; });
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
vector<char> r;
|
|
|
|
|
if (kernels) {
|
|
|
|
|
r.insert(r.end(), kernels->get_data(), kernels->get_data() + kernels->get_size());
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
}
|
|
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
return r;
|
|
|
|
|
}
|
|
|
|
|
|
2018-05-18 10:14:46 -05:00
|
|
|
const unordered_map<hsa_isa_t, vector<vector<char>>>& code_object_blobs(bool rebuild = false) {
|
2018-03-12 11:29:03 +05:30
|
|
|
static unordered_map<hsa_isa_t, vector<vector<char>>> r;
|
|
|
|
|
static once_flag f;
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-05-18 10:14:46 -05:00
|
|
|
auto cons = [rebuild]() {
|
2018-06-15 16:45:03 -05:00
|
|
|
// names of shared libraries who .kernel sections already loaded
|
|
|
|
|
static unordered_set<string> lib_names;
|
2018-03-12 11:29:03 +05:30
|
|
|
static vector<vector<char>> blobs{code_object_blob_for_process()};
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-05-18 10:14:46 -05:00
|
|
|
if (rebuild) {
|
2018-06-15 16:45:03 -05:00
|
|
|
r.clear();
|
2018-05-18 10:14:46 -05:00
|
|
|
blobs.clear();
|
|
|
|
|
}
|
|
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
dl_iterate_phdr(
|
|
|
|
|
[](dl_phdr_info* info, std::size_t, void*) {
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
elfio tmp;
|
2018-06-15 16:45:03 -05:00
|
|
|
if ((lib_names.find(info->dlpi_name) == lib_names.end()) &&
|
|
|
|
|
(tmp.load(info->dlpi_name))) {
|
2018-03-12 11:29:03 +05:30
|
|
|
const auto it = find_section_if(
|
|
|
|
|
tmp, [](const section* x) { return x->get_name() == ".kernel"; });
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-06-15 16:45:03 -05:00
|
|
|
if (it) {
|
|
|
|
|
blobs.emplace_back(
|
|
|
|
|
it->get_data(), it->get_data() + it->get_size());
|
|
|
|
|
// register the shared library as already loaded
|
|
|
|
|
lib_names.emplace(info->dlpi_name);
|
|
|
|
|
}
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
}
|
|
|
|
|
return 0;
|
2018-03-12 11:29:03 +05:30
|
|
|
},
|
|
|
|
|
nullptr);
|
|
|
|
|
|
|
|
|
|
for (auto&& blob : blobs) {
|
|
|
|
|
Bundled_code_header tmp{blob};
|
|
|
|
|
if (valid(tmp)) {
|
|
|
|
|
for (auto&& bundle : bundles(tmp)) {
|
|
|
|
|
r[triple_to_hsa_isa(bundle.triple)].push_back(bundle.blob);
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
}
|
|
|
|
|
}
|
2018-03-12 11:29:03 +05:30
|
|
|
}
|
2018-05-18 10:14:46 -05:00
|
|
|
};
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
call_once(f, cons);
|
|
|
|
|
if (rebuild) {
|
|
|
|
|
cons();
|
|
|
|
|
}
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
return r;
|
|
|
|
|
}
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
vector<pair<uintptr_t, string>> function_names_for(const elfio& reader, section* symtab) {
|
|
|
|
|
vector<pair<uintptr_t, string>> r;
|
|
|
|
|
symbol_section_accessor symbols{reader, symtab};
|
2017-11-01 22:33:13 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
for (auto i = 0u; i != symbols.get_symbols_num(); ++i) {
|
|
|
|
|
// TODO: this is boyscout code, caching the temporaries
|
|
|
|
|
// may be of worth.
|
|
|
|
|
auto tmp = read_symbol(symbols, i);
|
|
|
|
|
|
|
|
|
|
if (tmp.type == STT_FUNC && tmp.sect_idx != SHN_UNDEF && !tmp.name.empty()) {
|
|
|
|
|
r.emplace_back(tmp.value, tmp.name);
|
|
|
|
|
}
|
2017-11-01 22:33:13 +00:00
|
|
|
}
|
|
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
return r;
|
|
|
|
|
}
|
2017-11-01 22:33:13 +00:00
|
|
|
|
2018-05-18 10:14:46 -05:00
|
|
|
const vector<pair<uintptr_t, string>>& function_names_for_process(bool rebuild = false) {
|
2018-03-12 11:29:03 +05:30
|
|
|
static constexpr const char self[] = "/proc/self/exe";
|
2017-11-01 22:33:13 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
static vector<pair<uintptr_t, string>> r;
|
|
|
|
|
static once_flag f;
|
2017-11-01 22:33:13 +00:00
|
|
|
|
2018-05-18 10:14:46 -05:00
|
|
|
auto cons = [rebuild]() {
|
2018-03-12 11:29:03 +05:30
|
|
|
elfio reader;
|
2017-11-01 22:33:13 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
if (!reader.load(self)) {
|
|
|
|
|
throw runtime_error{"Failed to load the ELF file for the current process."};
|
|
|
|
|
}
|
2017-11-01 22:33:13 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
auto symtab =
|
|
|
|
|
find_section_if(reader, [](const section* x) { return x->get_type() == SHT_SYMTAB; });
|
2017-11-01 22:33:13 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
if (symtab) r = function_names_for(reader, symtab);
|
2018-05-18 10:14:46 -05:00
|
|
|
};
|
|
|
|
|
|
|
|
|
|
call_once(f, cons);
|
|
|
|
|
if (rebuild) {
|
|
|
|
|
cons();
|
|
|
|
|
}
|
2017-11-01 22:33:13 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
return r;
|
|
|
|
|
}
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-05-18 10:14:46 -05:00
|
|
|
const unordered_map<string, vector<hsa_executable_symbol_t>>& kernels(bool rebuild = false) {
|
2018-03-12 11:29:03 +05:30
|
|
|
static unordered_map<string, vector<hsa_executable_symbol_t>> r;
|
|
|
|
|
static once_flag f;
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-05-18 10:14:46 -05:00
|
|
|
auto cons = [rebuild]() {
|
|
|
|
|
if (rebuild) {
|
|
|
|
|
r.clear();
|
|
|
|
|
executables(rebuild);
|
|
|
|
|
}
|
|
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
static const auto copy_kernels = [](hsa_executable_t, hsa_agent_t,
|
|
|
|
|
hsa_executable_symbol_t s, void*) {
|
|
|
|
|
if (type(s) == HSA_SYMBOL_KIND_KERNEL) r[name(s)].push_back(s);
|
|
|
|
|
|
|
|
|
|
return HSA_STATUS_SUCCESS;
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
};
|
|
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
for (auto&& agent_executables : executables()) {
|
|
|
|
|
for (auto&& executable : agent_executables.second) {
|
|
|
|
|
hsa_executable_iterate_agent_symbols(executable, agent_executables.first,
|
|
|
|
|
copy_kernels, nullptr);
|
|
|
|
|
}
|
|
|
|
|
}
|
2018-05-18 10:14:46 -05:00
|
|
|
};
|
|
|
|
|
|
|
|
|
|
call_once(f, cons);
|
|
|
|
|
if (rebuild) {
|
|
|
|
|
cons();
|
|
|
|
|
}
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
return r;
|
|
|
|
|
}
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
void load_code_object_and_freeze_executable(
|
|
|
|
|
const string& file, hsa_agent_t agent,
|
|
|
|
|
hsa_executable_t
|
|
|
|
|
executable) { // TODO: the following sequence is inefficient, should be refactored
|
|
|
|
|
// into a single load of the file and subsequent ELFIO
|
|
|
|
|
// processing.
|
|
|
|
|
static const auto cor_deleter = [](hsa_code_object_reader_t* p) {
|
|
|
|
|
if (p) {
|
|
|
|
|
hsa_code_object_reader_destroy(*p);
|
|
|
|
|
delete p;
|
|
|
|
|
}
|
|
|
|
|
};
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
using RAII_code_reader = unique_ptr<hsa_code_object_reader_t, decltype(cor_deleter)>;
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
if (!file.empty()) {
|
|
|
|
|
RAII_code_reader tmp{new hsa_code_object_reader_t, cor_deleter};
|
|
|
|
|
hsa_code_object_reader_create_from_memory(file.data(), file.size(), tmp.get());
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
hsa_executable_load_agent_code_object(executable, agent, *tmp, nullptr, nullptr);
|
|
|
|
|
|
|
|
|
|
hsa_executable_freeze(executable, nullptr);
|
|
|
|
|
|
|
|
|
|
static vector<RAII_code_reader> code_readers;
|
|
|
|
|
static mutex mtx;
|
|
|
|
|
|
|
|
|
|
lock_guard<mutex> lck{mtx};
|
|
|
|
|
code_readers.push_back(move(tmp));
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
}
|
|
|
|
|
}
|
2018-03-12 11:29:03 +05:30
|
|
|
} // namespace
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
namespace hip_impl {
|
|
|
|
|
const unordered_map<hsa_agent_t, vector<hsa_executable_t>>&
|
2018-05-18 10:14:46 -05:00
|
|
|
executables(bool rebuild) { // TODO: This leaks the hsa_executable_ts, it should use RAII.
|
2018-03-12 11:29:03 +05:30
|
|
|
static unordered_map<hsa_agent_t, vector<hsa_executable_t>> r;
|
|
|
|
|
static once_flag f;
|
2017-11-21 02:40:34 +00:00
|
|
|
|
2018-05-18 10:14:46 -05:00
|
|
|
auto cons = [rebuild]() {
|
2018-03-12 11:29:03 +05:30
|
|
|
static const auto accelerators = hc::accelerator::get_all();
|
2017-11-21 02:40:34 +00:00
|
|
|
|
2018-05-18 10:14:46 -05:00
|
|
|
if (rebuild) {
|
2018-06-15 16:45:03 -05:00
|
|
|
// do NOT clear r so we reuse instances of hsa_executable_t
|
|
|
|
|
// created previously
|
2018-05-18 10:14:46 -05:00
|
|
|
code_object_blobs(rebuild);
|
|
|
|
|
}
|
|
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
for (auto&& acc : accelerators) {
|
|
|
|
|
auto agent = static_cast<hsa_agent_t*>(acc.get_hsa_agent());
|
2017-11-21 02:40:34 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
if (!agent || !acc.is_hsa_accelerator()) continue;
|
2017-11-21 02:40:34 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
hsa_agent_iterate_isas(*agent,
|
|
|
|
|
[](hsa_isa_t x, void* pa) {
|
|
|
|
|
const auto it = code_object_blobs().find(x);
|
2017-11-21 02:40:34 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
if (it != code_object_blobs().cend()) {
|
|
|
|
|
hsa_agent_t a = *static_cast<hsa_agent_t*>(pa);
|
2017-11-21 02:40:34 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
for (auto&& blob : it->second) {
|
|
|
|
|
hsa_executable_t tmp = {};
|
2017-11-21 02:40:34 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
hsa_executable_create_alt(
|
|
|
|
|
HSA_PROFILE_FULL,
|
|
|
|
|
HSA_DEFAULT_FLOAT_ROUNDING_MODE_DEFAULT, nullptr,
|
|
|
|
|
&tmp);
|
2017-11-21 02:40:34 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
// TODO: this is massively inefficient and only
|
|
|
|
|
// meant for illustration.
|
|
|
|
|
string blob_to_str{blob.cbegin(), blob.cend()};
|
|
|
|
|
tmp = load_executable(blob_to_str, tmp, a);
|
2017-11-21 02:40:34 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
if (tmp.handle) r[a].push_back(tmp);
|
|
|
|
|
}
|
|
|
|
|
}
|
2017-11-21 02:40:34 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
return HSA_STATUS_SUCCESS;
|
|
|
|
|
},
|
|
|
|
|
agent);
|
|
|
|
|
}
|
2018-05-18 10:14:46 -05:00
|
|
|
};
|
|
|
|
|
|
|
|
|
|
call_once(f, cons);
|
|
|
|
|
if (rebuild) {
|
|
|
|
|
cons();
|
|
|
|
|
}
|
2017-11-21 02:40:34 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
return r;
|
|
|
|
|
}
|
2017-11-21 02:40:34 +00:00
|
|
|
|
2018-05-18 10:14:46 -05:00
|
|
|
const unordered_map<uintptr_t, string>& function_names(bool rebuild) {
|
2018-03-12 11:29:03 +05:30
|
|
|
static unordered_map<uintptr_t, string> r{function_names_for_process().cbegin(),
|
|
|
|
|
function_names_for_process().cend()};
|
|
|
|
|
static once_flag f;
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-05-18 10:14:46 -05:00
|
|
|
auto cons = [rebuild]() {
|
|
|
|
|
if (rebuild) {
|
|
|
|
|
r.clear();
|
|
|
|
|
function_names_for_process(rebuild);
|
|
|
|
|
r.insert(function_names_for_process().cbegin(),
|
|
|
|
|
function_names_for_process().cend());
|
|
|
|
|
}
|
|
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
dl_iterate_phdr(
|
|
|
|
|
[](dl_phdr_info* info, size_t, void*) {
|
2017-11-01 22:33:13 +00:00
|
|
|
elfio tmp;
|
|
|
|
|
if (tmp.load(info->dlpi_name)) {
|
2018-03-12 11:29:03 +05:30
|
|
|
const auto it = find_section_if(
|
|
|
|
|
tmp, [](const section* x) { return x->get_type() == SHT_SYMTAB; });
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2017-11-01 22:33:13 +00:00
|
|
|
if (it) {
|
|
|
|
|
auto n = function_names_for(tmp, it);
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2017-11-01 22:33:13 +00:00
|
|
|
for (auto&& f : n) f.first += info->dlpi_addr;
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
r.insert(make_move_iterator(n.begin()), make_move_iterator(n.end()));
|
2017-11-01 22:33:13 +00:00
|
|
|
}
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
}
|
2017-11-01 22:33:13 +00:00
|
|
|
|
|
|
|
|
return 0;
|
2018-03-12 11:29:03 +05:30
|
|
|
},
|
|
|
|
|
nullptr);
|
2018-05-18 10:14:46 -05:00
|
|
|
};
|
|
|
|
|
|
|
|
|
|
call_once(f, cons);
|
|
|
|
|
if (rebuild) {
|
|
|
|
|
static mutex mtx;
|
|
|
|
|
lock_guard<mutex> lck{mtx};
|
|
|
|
|
cons();
|
|
|
|
|
}
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
return r;
|
|
|
|
|
}
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-05-18 10:14:46 -05:00
|
|
|
const unordered_map<uintptr_t, vector<pair<hsa_agent_t, Kernel_descriptor>>>& functions(bool rebuild) {
|
2018-03-12 11:29:03 +05:30
|
|
|
static unordered_map<uintptr_t, vector<pair<hsa_agent_t, Kernel_descriptor>>> r;
|
|
|
|
|
static once_flag f;
|
|
|
|
|
|
2018-05-18 10:14:46 -05:00
|
|
|
auto cons = [rebuild]() {
|
|
|
|
|
if (rebuild) {
|
2018-08-03 17:02:50 -05:00
|
|
|
r.clear();
|
2018-05-18 10:14:46 -05:00
|
|
|
function_names(rebuild);
|
|
|
|
|
kernels(rebuild);
|
|
|
|
|
globals(rebuild);
|
|
|
|
|
}
|
|
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
for (auto&& function : function_names()) {
|
|
|
|
|
const auto it = kernels().find(function.second);
|
|
|
|
|
|
|
|
|
|
if (it != kernels().cend()) {
|
|
|
|
|
for (auto&& kernel_symbol : it->second) {
|
|
|
|
|
r[function.first].emplace_back(
|
|
|
|
|
agent(kernel_symbol),
|
2018-05-11 03:35:10 +01:00
|
|
|
Kernel_descriptor{kernel_object(kernel_symbol), it->first});
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
}
|
|
|
|
|
}
|
2018-03-12 11:29:03 +05:30
|
|
|
}
|
2018-05-18 10:14:46 -05:00
|
|
|
};
|
|
|
|
|
|
|
|
|
|
call_once(f, cons);
|
|
|
|
|
if (rebuild) {
|
|
|
|
|
static mutex mtx;
|
|
|
|
|
lock_guard<mutex> lck{mtx};
|
|
|
|
|
cons();
|
|
|
|
|
}
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
return r;
|
|
|
|
|
}
|
2017-11-28 19:15:29 +00:00
|
|
|
|
2018-05-18 10:14:46 -05:00
|
|
|
unordered_map<string, void*>& globals(bool rebuild) {
|
2018-03-12 11:29:03 +05:30
|
|
|
static unordered_map<string, void*> r;
|
|
|
|
|
static once_flag f;
|
2018-05-18 10:14:46 -05:00
|
|
|
auto cons =[rebuild]() {
|
|
|
|
|
if (rebuild) {
|
|
|
|
|
r.clear();
|
|
|
|
|
symbol_addresses(rebuild);
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
r.reserve(symbol_addresses().size());
|
|
|
|
|
};
|
|
|
|
|
|
|
|
|
|
call_once(f, cons);
|
|
|
|
|
if (rebuild) {
|
|
|
|
|
cons();
|
|
|
|
|
}
|
2017-11-28 19:15:29 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
return r;
|
|
|
|
|
}
|
2017-12-03 23:09:06 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
hsa_executable_t load_executable(const string& file, hsa_executable_t executable,
|
|
|
|
|
hsa_agent_t agent) {
|
|
|
|
|
elfio reader;
|
|
|
|
|
stringstream tmp{file};
|
2017-12-03 23:09:06 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
if (!reader.load(tmp)) return hsa_executable_t{};
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
const auto code_object_dynsym = find_section_if(
|
|
|
|
|
reader, [](const ELFIO::section* x) { return x->get_type() == SHT_DYNSYM; });
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
associate_code_object_symbols_with_host_allocation(reader, code_object_dynsym, agent,
|
|
|
|
|
executable);
|
This switches HIP from its currently convoluted macro + pfe based dispatch mechanism to a more natural one partially based on the existing module API. The basic idea is that HCC will always correctly emit __global__ functions: as empty-bodied stubs, on host, and as kernels, on device. It then becomes trivial to obtain the mangled name on host, at dispatch, from the function's address, and then to use the mangled name to retrieve the kernel. This should address all problems stemming from serialisation, dubious mismatches due to the manufactured functor, macro-isms et al. It also immediately enables support for generalised globals as a consequence of that being available in the module API. Finally, it will make debug much easier, since the actual names of the __global__ functions will automatically be used in traces etc. One detail is that due to how dispatch works now (hipLaunchKernel and hipLaunchKernelGGL are themselves variadic function templates which deduce the function type of the callee), in certain cases it may be necesssary to insert explicit casts to ensure that the variadic argument list selects a viable overload - this can be observed in some unit tests. Eventually we may be able to remove this limitation, but for now it does not appear terribly onerous. The code is not extremely HIPpie, nor is it fully optimised, but rather is intended as a starting point for the HIP team to make its own.
2017-11-01 15:09:59 +00:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
load_code_object_and_freeze_executable(file, agent, executable);
|
2018-02-18 14:19:21 -05:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
return executable;
|
|
|
|
|
}
|
2018-02-18 14:19:21 -05:00
|
|
|
|
2018-03-12 11:29:03 +05:30
|
|
|
// To force HIP to load the kernels and to setup the function
|
|
|
|
|
// symbol map on program startup
|
|
|
|
|
class startup_kernel_loader {
|
|
|
|
|
private:
|
|
|
|
|
startup_kernel_loader() { functions(); }
|
|
|
|
|
startup_kernel_loader(const startup_kernel_loader&) = delete;
|
|
|
|
|
startup_kernel_loader& operator=(const startup_kernel_loader&) = delete;
|
|
|
|
|
static startup_kernel_loader skl;
|
|
|
|
|
};
|
|
|
|
|
startup_kernel_loader startup_kernel_loader::skl;
|
|
|
|
|
|
|
|
|
|
} // Namespace hip_impl.
|