rocm-systems/source/lib/rocprofiler/agent.cpp
Benjamin Welton e8a5845661 Buffered Counter Collection API (#179)
* Added buffer counter collection API.

Initial testing added into counter-collection sample.

Added support for constant metrics in counter collection (#194)

* Added support for constant metrics in counter collection

Adds support and test cases for constant metrics (such as max wave size)
and adds the kernel duration metric (though this is not yet
calculated).

* Minor doc updates

* Simple counter unit tests (#199)

* Simple counter unit tests

Unit tests and some minor fixes for simple and derived counter evaluation

* Added unit tests for reduction operations (#200)

* Added unit tests for reduction operations

* added tests for combo (constant+regular) counters (#201)

source formatting (clang-format v11) (#202)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

source formatting (clang-format v11) (#203)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

Local changes

source formatting (clang-format v11) (#205)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

Minor doc fix

Remove kernel_duration, migrate over set_dimensions to after HSA init

source formatting (clang-format v11) (#207)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

Added output to ROCPROFILER_SAMPLE_OUTPUT_FILE:

* Remove integer based counter in return struct

This causes a lot of complications and seems to provide limited benefit
over just treating all counters as doubles. For ease of use, drop the integer
based counter.

* source formatting (clang-format v11) (#217)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Add correlation id support to counters (#218)

Adds correlation id support to counter collection. Requires tracing
to be enabled to return any useful value currently (since we do not
have HIP kernel tracing yet).

* source formatting (clang-format v11) (#223)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Add sample that attempts to fetch all counters

On whatever machine this test is run on, an attempt is made to fetch every
counter available on the platform from a kernel execution.
Each counter is fetched one time to check that the counter can be
fetched on the platform and that the counter returns the correct instance
count (however, due to the lack of transparency from AQL profiler, this
check is not functional for some counters). We do not do any implicit
reduction on any counter; the result is that we see more counters than
the number of events being requested.
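
The per-counter check described above can be sketched roughly as follows. This is a minimal illustration, not the sample's actual code: `format_instance_check` is a hypothetical helper, and the expected count is assumed to come from the instance-count query while the observed count is the number of values collected for that counter. The line format mirrors the log included in this message.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of rocprofiler): compares the instance count
// reported by the query against the number of values actually collected and
// formats one log line. A mismatch is prefixed with "[ERROR]".
std::string
format_instance_check(std::size_t        id,
                      const std::string& name,
                      std::size_t        expected,
                      std::size_t        got)
{
    std::string line = (expected == got) ? "" : "[ERROR]";
    line += "Counter ID: " + std::to_string(id) + " (" + name + ") expected " +
            std::to_string(expected) + " instances and got " + std::to_string(got);
    return line;
}
```

Since no implicit reduction is applied, a mismatch here only shows that the two counts disagree; it does not by itself mean the counter failed to collect.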

Below is the status of all counters on MI210. All counters appear
functional with the changes in this PR. However, the instance count
returned will be greater than that returned by
rocprofiler_query_counter_instance_count.

Got 516 counters collected
Counter ID: 0 (size) expected 1 instances and got 1
Counter ID: 1 (processor_id_low) expected 1 instances and got 1
Counter ID: 2 (capability) expected 1 instances and got 1
Counter ID: 3 (local_mem_size) expected 1 instances and got 1
Counter ID: 4 (min_latency) expected 1 instances and got 1
Counter ID: 5 (weight) expected 1 instances and got 1
Counter ID: 6 (node_from) expected 1 instances and got 1
Counter ID: 7 (version_major) expected 1 instances and got 1
Counter ID: 8 (version_minor) expected 1 instances and got 1
Counter ID: 9 (mem_clk_max) expected 1 instances and got 1
Counter ID: 10 (num_xcc) expected 1 instances and got 1
Counter ID: 11 (width) expected 1 instances and got 1
Counter ID: 12 (flags) expected 1 instances and got 1
Counter ID: 13 (size_in_bytes) expected 1 instances and got 1
Counter ID: 14 (array_count) expected 1 instances and got 1
Counter ID: 15 (num_gws) expected 1 instances and got 1
Counter ID: 16 (simd_id_base) expected 1 instances and got 1
Counter ID: 17 (max_waves_per_simd) expected 1 instances and got 1
Counter ID: 18 (sdma_fw_version) expected 1 instances and got 1
Counter ID: 19 (gfx_target_version) expected 1 instances and got 1
Counter ID: 20 (max_bandwidth) expected 1 instances and got 1
Counter ID: 21 (cpu_core_id_base) expected 1 instances and got 1
Counter ID: 22 (cache_line_size) expected 1 instances and got 1
Counter ID: 23 (level) expected 1 instances and got 1
Counter ID: 24 (min_bandwidth) expected 1 instances and got 1
Counter ID: 25 (location_id) expected 1 instances and got 1
Counter ID: 26 (wave_front_size) expected 1 instances and got 1
Counter ID: 27 (lds_size_in_kb) expected 1 instances and got 1
Counter ID: 28 (simd_count) expected 1 instances and got 1
Counter ID: 29 (fw_version) expected 1 instances and got 1
Counter ID: 30 (recommended_transfer_size) expected 1 instances and got 1
Counter ID: 31 (simd_per_cu) expected 1 instances and got 1
Counter ID: 32 (association) expected 1 instances and got 1
Counter ID: 33 (mem_banks_count) expected 1 instances and got 1
Counter ID: 34 (latency) expected 1 instances and got 1
Counter ID: 35 (max_latency) expected 1 instances and got 1
Counter ID: 36 (cpu_cores_count) expected 1 instances and got 1
Counter ID: 37 (io_links_count) expected 1 instances and got 1
Counter ID: 38 (domain) expected 1 instances and got 1
Counter ID: 39 (max_engine_clk_fcompute) expected 1 instances and got 1
Counter ID: 40 (caches_count) expected 1 instances and got 1
Counter ID: 41 (simd_arrays_per_engine) expected 1 instances and got 1
Counter ID: 42 (cache_lines_per_tag) expected 1 instances and got 1
Counter ID: 43 (gds_size_in_kb) expected 1 instances and got 1
Counter ID: 44 (cu_per_simd_array) expected 1 instances and got 1
Counter ID: 45 (type) expected 1 instances and got 1
Counter ID: 46 (max_slots_scratch_cu) expected 1 instances and got 1
Counter ID: 47 (vendor_id) expected 1 instances and got 1
Counter ID: 48 (device_id) expected 1 instances and got 1
Counter ID: 49 (heap_type) expected 1 instances and got 1
Counter ID: 50 (drm_render_minor) expected 1 instances and got 1
Counter ID: 51 (num_sdma_engines) expected 1 instances and got 1
Counter ID: 52 (node_to) expected 1 instances and got 1
Counter ID: 53 (num_sdma_xgmi_engines) expected 1 instances and got 1
Counter ID: 54 (num_sdma_queues_per_engine) expected 1 instances and got 1
Counter ID: 55 (hive_id) expected 1 instances and got 1
Counter ID: 56 (num_cp_queues) expected 1 instances and got 1
Counter ID: 57 (max_engine_clk_ccompute) expected 1 instances and got 1
Counter ID: 517 (MAX_WAVE_SIZE) expected 1 instances and got 1
Counter ID: 518 (SE_NUM) expected 1 instances and got 1
Counter ID: 519 (SIMD_NUM) expected 1 instances and got 1
Counter ID: 520 (CU_NUM) expected 1 instances and got 1
[ERROR]Counter ID: 521 (SQ_WAIT_INST_LDS) expected 1 instances and got 8
[ERROR]Counter ID: 522 (TCP_TCP_TA_DATA_STALL_CYCLES) expected 16 instances and got 128
Counter ID: 523 (GRBM_COUNT) expected 1 instances and got 1
Counter ID: 524 (GRBM_GUI_ACTIVE) expected 1 instances and got 1
Counter ID: 525 (GRBM_CP_BUSY) expected 1 instances and got 1
Counter ID: 526 (GRBM_SPI_BUSY) expected 1 instances and got 1
Counter ID: 527 (GRBM_TA_BUSY) expected 1 instances and got 1
Counter ID: 528 (GRBM_TC_BUSY) expected 1 instances and got 1
Counter ID: 529 (GRBM_CPC_BUSY) expected 1 instances and got 1
Counter ID: 530 (GRBM_CPF_BUSY) expected 1 instances and got 1
Counter ID: 531 (GRBM_UTCL2_BUSY) expected 1 instances and got 1
Counter ID: 532 (GRBM_EA_BUSY) expected 1 instances and got 1
Counter ID: 533 (CPC_ME1_BUSY_FOR_PACKET_DECODE) expected 1 instances and got 1
Counter ID: 534 (CPC_UTCL1_STALL_ON_TRANSLATION) expected 1 instances and got 1
Counter ID: 535 (CPC_CPC_STAT_BUSY) expected 1 instances and got 1
Counter ID: 536 (CPC_CPC_STAT_IDLE) expected 1 instances and got 1
Counter ID: 537 (CPC_CPC_STAT_STALL) expected 1 instances and got 1
Counter ID: 538 (CPC_CPC_TCIU_BUSY) expected 1 instances and got 1
Counter ID: 539 (CPC_CPC_TCIU_IDLE) expected 1 instances and got 1
Counter ID: 540 (CPC_CPC_UTCL2IU_BUSY) expected 1 instances and got 1
Counter ID: 541 (CPC_CPC_UTCL2IU_IDLE) expected 1 instances and got 1
Counter ID: 542 (CPC_CPC_UTCL2IU_STALL) expected 1 instances and got 1
Counter ID: 543 (CPC_ME1_DC0_SPI_BUSY) expected 1 instances and got 1
Counter ID: 544 (CPF_CMP_UTCL1_STALL_ON_TRANSLATION) expected 1 instances and got 1
Counter ID: 545 (CPF_CPF_STAT_BUSY) expected 1 instances and got 1
Counter ID: 546 (CPF_CPF_STAT_IDLE) expected 1 instances and got 1
Counter ID: 547 (CPF_CPF_STAT_STALL) expected 1 instances and got 1
Counter ID: 548 (CPF_CPF_TCIU_BUSY) expected 1 instances and got 1
Counter ID: 549 (CPF_CPF_TCIU_IDLE) expected 1 instances and got 1
Counter ID: 550 (CPF_CPF_TCIU_STALL) expected 1 instances and got 1
[ERROR]Counter ID: 551 (SPI_CSN_WINDOW_VALID) expected 1 instances and got 8
[ERROR]Counter ID: 552 (SPI_CSN_BUSY) expected 1 instances and got 8
[ERROR]Counter ID: 553 (SPI_CSN_NUM_THREADGROUPS) expected 1 instances and got 8
[ERROR]Counter ID: 554 (SPI_CSN_WAVE) expected 1 instances and got 8
[ERROR]Counter ID: 555 (SPI_RA_REQ_NO_ALLOC) expected 1 instances and got 8
[ERROR]Counter ID: 556 (SPI_RA_REQ_NO_ALLOC_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 557 (SPI_RA_RES_STALL_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 558 (SPI_RA_TMP_STALL_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 559 (SPI_RA_WAVE_SIMD_FULL_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 560 (SPI_RA_VGPR_SIMD_FULL_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 561 (SPI_RA_SGPR_SIMD_FULL_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 562 (SPI_RA_LDS_CU_FULL_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 563 (SPI_RA_BAR_CU_FULL_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 564 (SPI_RA_BULKY_CU_FULL_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 565 (SPI_RA_TGLIM_CU_FULL_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 566 (SPI_RA_WVLIM_STALL_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 567 (SPI_SWC_CSC_WR) expected 1 instances and got 8
[ERROR]Counter ID: 568 (SPI_VWC_CSC_WR) expected 1 instances and got 8
[ERROR]Counter ID: 569 (SQ_ACCUM_PREV) expected 1 instances and got 8
[ERROR]Counter ID: 570 (SQ_CYCLES) expected 1 instances and got 8
[ERROR]Counter ID: 571 (SQ_BUSY_CYCLES) expected 1 instances and got 8
[ERROR]Counter ID: 572 (SQ_WAVES) expected 1 instances and got 8
[ERROR]Counter ID: 573 (SQ_LEVEL_WAVES) expected 1 instances and got 8
[ERROR]Counter ID: 574 (SQ_WAVES_EQ_64) expected 1 instances and got 8
[ERROR]Counter ID: 575 (SQ_WAVES_LT_64) expected 1 instances and got 8
[ERROR]Counter ID: 576 (SQ_WAVES_LT_48) expected 1 instances and got 8
[ERROR]Counter ID: 577 (SQ_WAVES_LT_32) expected 1 instances and got 8
[ERROR]Counter ID: 578 (SQ_WAVES_LT_16) expected 1 instances and got 8
[ERROR]Counter ID: 579 (SQ_BUSY_CU_CYCLES) expected 1 instances and got 8
[ERROR]Counter ID: 580 (SQ_ITEMS) expected 1 instances and got 8
[ERROR]Counter ID: 581 (SQ_INSTS) expected 1 instances and got 8
[ERROR]Counter ID: 582 (SQ_INSTS_VALU) expected 1 instances and got 8
[ERROR]Counter ID: 583 (SQ_INSTS_VALU_ADD_F16) expected 1 instances and got 8
[ERROR]Counter ID: 584 (SQ_INSTS_VALU_MUL_F16) expected 1 instances and got 8
[ERROR]Counter ID: 585 (SQ_INSTS_VALU_FMA_F16) expected 1 instances and got 8
[ERROR]Counter ID: 586 (SQ_INSTS_VALU_TRANS_F16) expected 1 instances and got 8
[ERROR]Counter ID: 587 (SQ_INSTS_VALU_ADD_F32) expected 1 instances and got 8
[ERROR]Counter ID: 588 (SQ_INSTS_VALU_MUL_F32) expected 1 instances and got 8
[ERROR]Counter ID: 589 (SQ_INSTS_VALU_FMA_F32) expected 1 instances and got 8
[ERROR]Counter ID: 590 (SQ_INSTS_VALU_TRANS_F32) expected 1 instances and got 8
[ERROR]Counter ID: 591 (SQ_INSTS_VALU_ADD_F64) expected 1 instances and got 8
[ERROR]Counter ID: 592 (SQ_INSTS_VALU_MUL_F64) expected 1 instances and got 8
[ERROR]Counter ID: 593 (SQ_INSTS_VALU_FMA_F64) expected 1 instances and got 8
[ERROR]Counter ID: 594 (SQ_INSTS_VALU_TRANS_F64) expected 1 instances and got 8
[ERROR]Counter ID: 595 (SQ_INSTS_VALU_INT32) expected 1 instances and got 8
[ERROR]Counter ID: 596 (SQ_INSTS_VALU_INT64) expected 1 instances and got 8
[ERROR]Counter ID: 597 (SQ_INSTS_VALU_CVT) expected 1 instances and got 8
[ERROR]Counter ID: 598 (SQ_INSTS_VALU_MFMA_I8) expected 1 instances and got 8
[ERROR]Counter ID: 599 (SQ_INSTS_VALU_MFMA_F16) expected 1 instances and got 8
[ERROR]Counter ID: 600 (SQ_INSTS_VALU_MFMA_BF16) expected 1 instances and got 8
[ERROR]Counter ID: 601 (SQ_INSTS_VALU_MFMA_F32) expected 1 instances and got 8
[ERROR]Counter ID: 602 (SQ_INSTS_VALU_MFMA_F64) expected 1 instances and got 8
[ERROR]Counter ID: 603 (SQ_INSTS_VALU_MFMA_MOPS_I8) expected 1 instances and got 8
[ERROR]Counter ID: 604 (SQ_INSTS_VALU_MFMA_MOPS_F16) expected 1 instances and got 8
[ERROR]Counter ID: 605 (SQ_INSTS_VALU_MFMA_MOPS_BF16) expected 1 instances and got 8
[ERROR]Counter ID: 606 (SQ_INSTS_VALU_MFMA_MOPS_F32) expected 1 instances and got 8
[ERROR]Counter ID: 607 (SQ_INSTS_VALU_MFMA_MOPS_F64) expected 1 instances and got 8
[ERROR]Counter ID: 608 (SQ_INSTS_MFMA) expected 1 instances and got 8
[ERROR]Counter ID: 609 (SQ_INSTS_VMEM_WR) expected 1 instances and got 8
[ERROR]Counter ID: 610 (SQ_INSTS_VMEM_RD) expected 1 instances and got 8
[ERROR]Counter ID: 611 (SQ_INSTS_VMEM) expected 1 instances and got 8
[ERROR]Counter ID: 612 (SQ_INSTS_SALU) expected 1 instances and got 8
[ERROR]Counter ID: 613 (SQ_INSTS_SMEM) expected 1 instances and got 8
[ERROR]Counter ID: 614 (SQ_INSTS_FLAT) expected 1 instances and got 8
[ERROR]Counter ID: 615 (SQ_INSTS_FLAT_LDS_ONLY) expected 1 instances and got 8
[ERROR]Counter ID: 616 (SQ_INSTS_LDS) expected 1 instances and got 8
[ERROR]Counter ID: 617 (SQ_INSTS_GDS) expected 1 instances and got 8
[ERROR]Counter ID: 618 (SQ_INSTS_EXP_GDS) expected 1 instances and got 8
[ERROR]Counter ID: 619 (SQ_INSTS_BRANCH) expected 1 instances and got 8
[ERROR]Counter ID: 620 (SQ_INSTS_SENDMSG) expected 1 instances and got 8
[ERROR]Counter ID: 621 (SQ_INSTS_VSKIPPED) expected 1 instances and got 8
[ERROR]Counter ID: 622 (SQ_INST_LEVEL_VMEM) expected 1 instances and got 8
[ERROR]Counter ID: 623 (SQ_INST_LEVEL_SMEM) expected 1 instances and got 8
[ERROR]Counter ID: 624 (SQ_INST_LEVEL_LDS) expected 1 instances and got 8
[ERROR]Counter ID: 625 (SQ_VALU_MFMA_BUSY_CYCLES) expected 1 instances and got 8
[ERROR]Counter ID: 626 (SQ_WAVE_CYCLES) expected 1 instances and got 8
[ERROR]Counter ID: 627 (SQ_WAIT_ANY) expected 1 instances and got 8
[ERROR]Counter ID: 628 (SQ_WAIT_INST_ANY) expected 1 instances and got 8
[ERROR]Counter ID: 629 (SQ_ACTIVE_INST_ANY) expected 1 instances and got 8
[ERROR]Counter ID: 630 (SQ_ACTIVE_INST_VMEM) expected 1 instances and got 8
[ERROR]Counter ID: 631 (SQ_ACTIVE_INST_LDS) expected 1 instances and got 8
[ERROR]Counter ID: 632 (SQ_ACTIVE_INST_VALU) expected 1 instances and got 8
[ERROR]Counter ID: 633 (SQ_ACTIVE_INST_SCA) expected 1 instances and got 8
[ERROR]Counter ID: 634 (SQ_ACTIVE_INST_EXP_GDS) expected 1 instances and got 8
[ERROR]Counter ID: 635 (SQ_ACTIVE_INST_MISC) expected 1 instances and got 8
[ERROR]Counter ID: 636 (SQ_ACTIVE_INST_FLAT) expected 1 instances and got 8
[ERROR]Counter ID: 637 (SQ_INST_CYCLES_VMEM_WR) expected 1 instances and got 8
[ERROR]Counter ID: 638 (SQ_INST_CYCLES_VMEM_RD) expected 1 instances and got 8
[ERROR]Counter ID: 639 (SQ_INST_CYCLES_SMEM) expected 1 instances and got 8
[ERROR]Counter ID: 640 (SQ_INST_CYCLES_SALU) expected 1 instances and got 8
[ERROR]Counter ID: 641 (SQ_THREAD_CYCLES_VALU) expected 1 instances and got 8
[ERROR]Counter ID: 642 (SQ_IFETCH) expected 1 instances and got 8
[ERROR]Counter ID: 643 (SQ_IFETCH_LEVEL) expected 1 instances and got 8
[ERROR]Counter ID: 644 (SQ_LDS_BANK_CONFLICT) expected 1 instances and got 8
[ERROR]Counter ID: 645 (SQ_LDS_ADDR_CONFLICT) expected 1 instances and got 8
[ERROR]Counter ID: 646 (SQ_LDS_UNALIGNED_STALL) expected 1 instances and got 8
[ERROR]Counter ID: 647 (SQ_LDS_MEM_VIOLATIONS) expected 1 instances and got 8
[ERROR]Counter ID: 648 (SQ_LDS_ATOMIC_RETURN) expected 1 instances and got 8
[ERROR]Counter ID: 649 (SQ_LDS_IDX_ACTIVE) expected 1 instances and got 8
[ERROR]Counter ID: 650 (SQ_ACCUM_PREV_HIRES) expected 1 instances and got 8
[ERROR]Counter ID: 651 (SQ_WAVES_RESTORED) expected 1 instances and got 8
[ERROR]Counter ID: 652 (SQ_WAVES_SAVED) expected 1 instances and got 8
[ERROR]Counter ID: 653 (SQ_INSTS_SMEM_NORM) expected 1 instances and got 8
[ERROR]Counter ID: 654 (SQC_DCACHE_INPUT_VALID_READYB) expected 1 instances and got 8
[ERROR]Counter ID: 655 (SQC_TC_REQ) expected 1 instances and got 8
[ERROR]Counter ID: 656 (SQC_TC_INST_REQ) expected 1 instances and got 8
[ERROR]Counter ID: 657 (SQC_TC_DATA_READ_REQ) expected 1 instances and got 8
[ERROR]Counter ID: 658 (SQC_TC_DATA_WRITE_REQ) expected 1 instances and got 8
[ERROR]Counter ID: 659 (SQC_TC_DATA_ATOMIC_REQ) expected 1 instances and got 8
[ERROR]Counter ID: 660 (SQC_TC_STALL) expected 1 instances and got 8
[ERROR]Counter ID: 661 (SQC_ICACHE_REQ) expected 1 instances and got 8
[ERROR]Counter ID: 662 (SQC_ICACHE_HITS) expected 1 instances and got 8
[ERROR]Counter ID: 663 (SQC_ICACHE_MISSES) expected 1 instances and got 8
[ERROR]Counter ID: 664 (SQC_ICACHE_MISSES_DUPLICATE) expected 1 instances and got 8
[ERROR]Counter ID: 665 (SQC_DCACHE_REQ) expected 1 instances and got 8
[ERROR]Counter ID: 666 (SQC_DCACHE_HITS) expected 1 instances and got 8
[ERROR]Counter ID: 667 (SQC_DCACHE_MISSES) expected 1 instances and got 8
[ERROR]Counter ID: 668 (SQC_DCACHE_MISSES_DUPLICATE) expected 1 instances and got 8
[ERROR]Counter ID: 669 (SQC_DCACHE_ATOMIC) expected 1 instances and got 8
[ERROR]Counter ID: 670 (SQC_DCACHE_REQ_READ_1) expected 1 instances and got 8
[ERROR]Counter ID: 671 (SQC_DCACHE_REQ_READ_2) expected 1 instances and got 8
[ERROR]Counter ID: 672 (SQC_DCACHE_REQ_READ_4) expected 1 instances and got 8
[ERROR]Counter ID: 673 (SQC_DCACHE_REQ_READ_8) expected 1 instances and got 8
[ERROR]Counter ID: 674 (SQC_DCACHE_REQ_READ_16) expected 1 instances and got 8
[ERROR]Counter ID: 675 (TA_TA_BUSY) expected 16 instances and got 128
[ERROR]Counter ID: 676 (TA_TOTAL_WAVEFRONTS) expected 16 instances and got 128
[ERROR]Counter ID: 677 (TA_BUFFER_WAVEFRONTS) expected 16 instances and got 128
[ERROR]Counter ID: 678 (TA_BUFFER_READ_WAVEFRONTS) expected 16 instances and got 128
[ERROR]Counter ID: 679 (TA_BUFFER_WRITE_WAVEFRONTS) expected 16 instances and got 128
[ERROR]Counter ID: 680 (TA_BUFFER_ATOMIC_WAVEFRONTS) expected 16 instances and got 128
[ERROR]Counter ID: 681 (TA_BUFFER_TOTAL_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 682 (TA_BUFFER_COALESCED_READ_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 683 (TA_BUFFER_COALESCED_WRITE_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 684 (TA_ADDR_STALLED_BY_TC_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 685 (TA_ADDR_STALLED_BY_TD_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 686 (TA_DATA_STALLED_BY_TC_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 687 (TA_FLAT_WAVEFRONTS) expected 16 instances and got 128
[ERROR]Counter ID: 688 (TA_FLAT_READ_WAVEFRONTS) expected 16 instances and got 128
[ERROR]Counter ID: 689 (TA_FLAT_WRITE_WAVEFRONTS) expected 16 instances and got 128
[ERROR]Counter ID: 690 (TA_FLAT_ATOMIC_WAVEFRONTS) expected 16 instances and got 128
[ERROR]Counter ID: 691 (TD_TD_BUSY) expected 16 instances and got 128
[ERROR]Counter ID: 692 (TD_TC_STALL) expected 16 instances and got 128
[ERROR]Counter ID: 693 (TD_SPI_STALL) expected 16 instances and got 128
[ERROR]Counter ID: 694 (TD_LOAD_WAVEFRONT) expected 16 instances and got 128
[ERROR]Counter ID: 695 (TD_ATOMIC_WAVEFRONT) expected 16 instances and got 128
[ERROR]Counter ID: 696 (TD_STORE_WAVEFRONT) expected 16 instances and got 128
[ERROR]Counter ID: 697 (TD_COALESCABLE_WAVEFRONT) expected 16 instances and got 128
[ERROR]Counter ID: 698 (TCP_GATE_EN1) expected 16 instances and got 128
[ERROR]Counter ID: 699 (TCP_GATE_EN2) expected 16 instances and got 128
[ERROR]Counter ID: 700 (TCP_TD_TCP_STALL_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 701 (TCP_TCR_TCP_STALL_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 702 (TCP_READ_TAGCONFLICT_STALL_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 703 (TCP_WRITE_TAGCONFLICT_STALL_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 704 (TCP_ATOMIC_TAGCONFLICT_STALL_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 705 (TCP_PENDING_STALL_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 706 (TCP_TA_TCP_STATE_READ) expected 16 instances and got 128
[ERROR]Counter ID: 707 (TCP_VOLATILE) expected 16 instances and got 128
[ERROR]Counter ID: 708 (TCP_TOTAL_ACCESSES) expected 16 instances and got 128
[ERROR]Counter ID: 709 (TCP_TOTAL_READ) expected 16 instances and got 128
[ERROR]Counter ID: 710 (TCP_TOTAL_WRITE) expected 16 instances and got 128
[ERROR]Counter ID: 711 (TCP_TOTAL_ATOMIC_WITH_RET) expected 16 instances and got 128
[ERROR]Counter ID: 712 (TCP_TOTAL_ATOMIC_WITHOUT_RET) expected 16 instances and got 128
[ERROR]Counter ID: 713 (TCP_TOTAL_WRITEBACK_INVALIDATES) expected 16 instances and got 128
[ERROR]Counter ID: 714 (TCP_UTCL1_REQUEST) expected 16 instances and got 128
[ERROR]Counter ID: 715 (TCP_UTCL1_TRANSLATION_MISS) expected 16 instances and got 128
[ERROR]Counter ID: 716 (TCP_UTCL1_TRANSLATION_HIT) expected 16 instances and got 128
[ERROR]Counter ID: 717 (TCP_UTCL1_PERMISSION_MISS) expected 16 instances and got 128
[ERROR]Counter ID: 718 (TCP_TOTAL_CACHE_ACCESSES) expected 16 instances and got 128
[ERROR]Counter ID: 719 (TCP_TCP_LATENCY) expected 16 instances and got 128
[ERROR]Counter ID: 720 (TCP_TCC_READ_REQ_LATENCY) expected 16 instances and got 128
[ERROR]Counter ID: 721 (TCP_TCC_WRITE_REQ_LATENCY) expected 16 instances and got 128
[ERROR]Counter ID: 722 (TCP_TCC_READ_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 723 (TCP_TCC_WRITE_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 724 (TCP_TCC_ATOMIC_WITH_RET_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 725 (TCP_TCC_ATOMIC_WITHOUT_RET_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 726 (TCP_TCC_NC_READ_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 727 (TCP_TCC_NC_WRITE_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 728 (TCP_TCC_NC_ATOMIC_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 729 (TCP_TCC_UC_READ_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 730 (TCP_TCC_UC_WRITE_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 731 (TCP_TCC_UC_ATOMIC_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 732 (TCP_TCC_CC_READ_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 733 (TCP_TCC_CC_WRITE_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 734 (TCP_TCC_CC_ATOMIC_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 735 (TCP_TCC_RW_READ_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 736 (TCP_TCC_RW_WRITE_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 737 (TCP_TCC_RW_ATOMIC_REQ) expected 16 instances and got 128
Counter ID: 738 (TCA_CYCLE) expected 32 instances and got 32
Counter ID: 739 (TCA_BUSY) expected 32 instances and got 32
Counter ID: 740 (TCC_CYCLE) expected 32 instances and got 32
Counter ID: 741 (TCC_BUSY) expected 32 instances and got 32
Counter ID: 742 (TCC_REQ) expected 32 instances and got 32
Counter ID: 743 (TCC_STREAMING_REQ) expected 32 instances and got 32
Counter ID: 744 (TCC_NC_REQ) expected 32 instances and got 32
Counter ID: 745 (TCC_UC_REQ) expected 32 instances and got 32
Counter ID: 746 (TCC_CC_REQ) expected 32 instances and got 32
Counter ID: 747 (TCC_RW_REQ) expected 32 instances and got 32
Counter ID: 748 (TCC_PROBE) expected 32 instances and got 32
Counter ID: 749 (TCC_PROBE_ALL) expected 32 instances and got 32
Counter ID: 750 (TCC_READ) expected 32 instances and got 32
Counter ID: 751 (TCC_WRITE) expected 32 instances and got 32
Counter ID: 752 (TCC_ATOMIC) expected 32 instances and got 32
Counter ID: 753 (TCC_HIT) expected 32 instances and got 32
Counter ID: 754 (TCC_MISS) expected 32 instances and got 32
Counter ID: 755 (TCC_WRITEBACK) expected 32 instances and got 32
Counter ID: 756 (TCC_EA_WRREQ) expected 32 instances and got 32
Counter ID: 757 (TCC_EA_WRREQ_64B) expected 32 instances and got 32
Counter ID: 758 (TCC_EA_WR_UNCACHED_32B) expected 32 instances and got 32
Counter ID: 759 (TCC_EA_WRREQ_STALL) expected 32 instances and got 32
Counter ID: 760 (TCC_EA_WRREQ_IO_CREDIT_STALL) expected 32 instances and got 32
Counter ID: 761 (TCC_EA_WRREQ_GMI_CREDIT_STALL) expected 32 instances and got 32
Counter ID: 762 (TCC_EA_WRREQ_DRAM_CREDIT_STALL) expected 32 instances and got 32
Counter ID: 763 (TCC_TOO_MANY_EA_WRREQS_STALL) expected 32 instances and got 32
Counter ID: 764 (TCC_EA_WRREQ_LEVEL) expected 32 instances and got 32
Counter ID: 765 (TCC_EA_ATOMIC) expected 32 instances and got 32
Counter ID: 766 (TCC_EA_ATOMIC_LEVEL) expected 32 instances and got 32
Counter ID: 767 (TCC_EA_RDREQ) expected 32 instances and got 32
Counter ID: 768 (TCC_EA_RDREQ_32B) expected 32 instances and got 32
Counter ID: 769 (TCC_EA_RD_UNCACHED_32B) expected 32 instances and got 32
Counter ID: 770 (TCC_EA_RDREQ_IO_CREDIT_STALL) expected 32 instances and got 32
Counter ID: 771 (TCC_EA_RDREQ_GMI_CREDIT_STALL) expected 32 instances and got 32
Counter ID: 772 (TCC_EA_RDREQ_DRAM_CREDIT_STALL) expected 32 instances and got 32
Counter ID: 773 (TCC_EA_RDREQ_LEVEL) expected 32 instances and got 32
Counter ID: 774 (TCC_TAG_STALL) expected 32 instances and got 32
Counter ID: 775 (TCC_NORMAL_WRITEBACK) expected 32 instances and got 32
Counter ID: 776 (TCC_ALL_TC_OP_WB_WRITEBACK) expected 32 instances and got 32
Counter ID: 777 (TCC_NORMAL_EVICT) expected 32 instances and got 32
Counter ID: 778 (TCC_ALL_TC_OP_INV_EVICT) expected 32 instances and got 32
Counter ID: 779 (TCC_EA_RDREQ_DRAM) expected 32 instances and got 32
Counter ID: 780 (TCC_EA_WRREQ_DRAM) expected 32 instances and got 32
[ERROR]Counter ID: 1893 (MeanOccupancyPerCU) expected 1 instances and got 8
[ERROR]Counter ID: 1894 (MeanOccupancyPerActiveCU) expected 1 instances and got 8
[ERROR]Counter ID: 1895 (TA_BUSY_avr) expected 16 instances and got 1
[ERROR]Counter ID: 1896 (TA_BUSY_max) expected 16 instances and got 1
[ERROR]Counter ID: 1897 (TA_BUSY_min) expected 16 instances and got 1
[ERROR]Counter ID: 1898 (TA_TA_BUSY_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1899 (TA_TOTAL_WAVEFRONTS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1900 (TA_ADDR_STALLED_BY_TC_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1901 (TA_ADDR_STALLED_BY_TD_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1902 (TA_DATA_STALLED_BY_TC_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1903 (TA_FLAT_WAVEFRONTS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1904 (TA_FLAT_READ_WAVEFRONTS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1905 (TA_FLAT_WRITE_WAVEFRONTS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1906 (TA_FLAT_ATOMIC_WAVEFRONTS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1907 (TA_BUFFER_WAVEFRONTS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1908 (TA_BUFFER_READ_WAVEFRONTS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1909 (TA_BUFFER_WRITE_WAVEFRONTS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1910 (TA_BUFFER_ATOMIC_WAVEFRONTS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1911 (TA_BUFFER_TOTAL_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1912 (TA_BUFFER_COALESCED_READ_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1913 (TA_BUFFER_COALESCED_WRITE_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1914 (TD_TD_BUSY_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1915 (TD_TC_STALL_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1916 (TD_LOAD_WAVEFRONT_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1917 (TD_ATOMIC_WAVEFRONT_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1918 (TD_STORE_WAVEFRONT_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1919 (TD_COALESCABLE_WAVEFRONT_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1920 (TD_SPI_STALL_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1921 (TCP_GATE_EN1_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1922 (TCP_GATE_EN2_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1923 (TCP_TD_TCP_STALL_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1924 (TCP_TCR_TCP_STALL_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1925 (TCP_READ_TAGCONFLICT_STALL_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1926 (TCP_WRITE_TAGCONFLICT_STALL_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1927 (TCP_ATOMIC_TAGCONFLICT_STALL_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1928 (TCP_VOLATILE_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1929 (TCP_TOTAL_ACCESSES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1930 (TCP_TOTAL_READ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1931 (TCP_TOTAL_WRITE_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1932 (TCP_TOTAL_ATOMIC_WITH_RET_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1933 (TCP_TOTAL_ATOMIC_WITHOUT_RET_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1934 (TCP_TOTAL_WRITEBACK_INVALIDATES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1935 (TCP_UTCL1_REQUEST_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1936 (TCP_UTCL1_TRANSLATION_MISS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1937 (TCP_UTCL1_TRANSLATION_HIT_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1938 (TCP_UTCL1_PERMISSION_MISS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1939 (TCP_TOTAL_CACHE_ACCESSES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1940 (TCP_TCP_LATENCY_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1941 (TCP_TA_TCP_STATE_READ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1942 (TCP_TCC_READ_REQ_LATENCY_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1943 (TCP_TCC_WRITE_REQ_LATENCY_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1944 (TCP_TCC_READ_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1945 (TCP_TCC_WRITE_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1946 (TCP_TCC_ATOMIC_WITH_RET_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1947 (TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1948 (TCP_TCC_NC_READ_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1949 (TCP_TCC_NC_WRITE_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1950 (TCP_TCC_NC_ATOMIC_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1951 (TCP_TCC_UC_READ_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1952 (TCP_TCC_UC_WRITE_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1953 (TCP_TCC_UC_ATOMIC_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1954 (TCP_TCC_CC_READ_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1955 (TCP_TCC_CC_WRITE_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1956 (TCP_TCC_CC_ATOMIC_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1957 (TCP_TCC_RW_READ_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1958 (TCP_TCC_RW_WRITE_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1959 (TCP_TCC_RW_ATOMIC_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1960 (TCP_PENDING_STALL_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1961 (TCA_CYCLE_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1962 (TCA_BUSY_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1963 (TCC_BUSY_avr) expected 32 instances and got 1
[ERROR]Counter ID: 1964 (TCC_WRREQ_STALL_max) expected 32 instances and got 1
[ERROR]Counter ID: 1965 (TCC_CYCLE_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1966 (TCC_BUSY_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1967 (TCC_REQ_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1968 (TCC_STREAMING_REQ_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1969 (TCC_NC_REQ_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1970 (TCC_UC_REQ_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1971 (TCC_CC_REQ_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1972 (TCC_RW_REQ_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1973 (TCC_PROBE_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1974 (TCC_PROBE_ALL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1975 (TCC_READ_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1976 (TCC_WRITE_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1977 (TCC_ATOMIC_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1978 (TCC_HIT_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1979 (TCC_MISS_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1980 (TCC_WRITEBACK_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1981 (TCC_EA_WRREQ_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1982 (TCC_EA_WRREQ_64B_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1983 (TCC_EA_WR_UNCACHED_32B_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1984 (TCC_EA_WRREQ_STALL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1985 (TCC_EA_WRREQ_IO_CREDIT_STALL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1986 (TCC_EA_WRREQ_GMI_CREDIT_STALL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1987 (TCC_EA_WRREQ_DRAM_CREDIT_STALL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1988 (TCC_TOO_MANY_EA_WRREQS_STALL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1989 (TCC_EA_WRREQ_LEVEL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1990 (TCC_EA_RDREQ_LEVEL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1991 (TCC_EA_ATOMIC_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1992 (TCC_EA_ATOMIC_LEVEL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1993 (TCC_EA_RDREQ_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1994 (TCC_EA_RDREQ_32B_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1995 (TCC_EA_RD_UNCACHED_32B_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1996 (TCC_EA_RDREQ_IO_CREDIT_STALL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1997 (TCC_EA_RDREQ_GMI_CREDIT_STALL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1998 (TCC_EA_RDREQ_DRAM_CREDIT_STALL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1999 (TCC_TAG_STALL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 2000 (TCC_NORMAL_WRITEBACK_sum) expected 32 instances and got 1
[ERROR]Counter ID: 2001 (TCC_ALL_TC_OP_WB_WRITEBACK_sum) expected 32 instances and got 1
[ERROR]Counter ID: 2002 (TCC_NORMAL_EVICT_sum) expected 32 instances and got 1
[ERROR]Counter ID: 2003 (TCC_ALL_TC_OP_INV_EVICT_sum) expected 32 instances and got 1
[ERROR]Counter ID: 2004 (TCC_EA_RDREQ_DRAM_sum) expected 32 instances and got 1
[ERROR]Counter ID: 2005 (TCC_EA_WRREQ_DRAM_sum) expected 32 instances and got 1
[ERROR]Counter ID: 2006 (FETCH_SIZE) expected 32 instances and got 1
[ERROR]Counter ID: 2007 (WRITE_SIZE) expected 32 instances and got 1
[ERROR]Counter ID: 2008 (WRITE_REQ_32B) expected 32 instances and got 1
[ERROR]Counter ID: 2009 (CU_OCCUPANCY) expected 1 instances and got 8
Counter ID: 2010 (CU_UTILIZATION) expected 1 instances and got 1
[ERROR]Counter ID: 2011 (TOTAL_16_OPS) expected 1 instances and got 8
[ERROR]Counter ID: 2012 (TOTAL_32_OPS) expected 1 instances and got 8
[ERROR]Counter ID: 2013 (TOTAL_64_OPS) expected 1 instances and got 8
Counter ID: 2014 (AggSysCycles) expected 1 instances and got 1
Counter ID: 2015 (GpuUtil) expected 1 instances and got 1
Counter ID: 2016 (CpUtil) expected 1 instances and got 1
Counter ID: 2017 (SpiUtil) expected 1 instances and got 1
Counter ID: 2018 (TaUtil) expected 1 instances and got 1
Counter ID: 2019 (TcUtil) expected 1 instances and got 1
Counter ID: 2020 (EaUtil) expected 1 instances and got 1
[ERROR]Counter ID: 2021 (InstrFetchLatency) expected 1 instances and got 8
[ERROR]Counter ID: 2022 (WaveOccupancy) expected 1 instances and got 8
[ERROR]Counter ID: 2023 (WaveDuration) expected 1 instances and got 8
[ERROR]Counter ID: 2024 (WaveDepWait) expected 1 instances and got 8
[ERROR]Counter ID: 2025 (WaveIssueWait) expected 1 instances and got 8
[ERROR]Counter ID: 2026 (WaveExec) expected 1 instances and got 8
[ERROR]Counter ID: 2027 (ValuIops) expected 1 instances and got 8
[ERROR]Counter ID: 2028 (MfmaFlops) expected 1 instances and got 8
[ERROR]Counter ID: 2029 (MfmaFlopsF16) expected 1 instances and got 8
[ERROR]Counter ID: 2030 (MfmaFlopsBF16) expected 1 instances and got 8
[ERROR]Counter ID: 2031 (MfmaFlopsF32) expected 1 instances and got 8
[ERROR]Counter ID: 2032 (MfmaFlopsF64) expected 1 instances and got 8
[ERROR]Counter ID: 2033 (ScaPipeIssueUtil) expected 1 instances and got 8
[ERROR]Counter ID: 2034 (ValuPipeIssueUtil) expected 1 instances and got 8
[ERROR]Counter ID: 2035 (VmemPipeIssueUtil) expected 1 instances and got 8
[ERROR]Counter ID: 2036 (MfmaUtil) expected 1 instances and got 8
[ERROR]Counter ID: 2037 (AvgNumActiveThreads) expected 1 instances and got 8
[ERROR]Counter ID: 2038 (VmemLatency) expected 1 instances and got 8
[ERROR]Counter ID: 2039 (SmemLatency) expected 1 instances and got 8
[ERROR]Counter ID: 2040 (LdsUtil) expected 1 instances and got 8
[ERROR]Counter ID: 2041 (LdsPipeIssueUtil) expected 1 instances and got 8
[ERROR]Counter ID: 2042 (LdsLatency) expected 1 instances and got 8
[ERROR]Counter ID: 2043 (LdsBankConflict) expected 1 instances and got 8
[ERROR]Counter ID: 2044 (L1iCacheHitRate) expected 1 instances and got 8
[ERROR]Counter ID: 2045 (sL1dCacheHitRate) expected 1 instances and got 8
[ERROR]Counter ID: 2046 (vL1dBufCoalesceRate) expected 16 instances and got 1
[ERROR]Counter ID: 2047 (vL1dCacheUtil) expected 16 instances and got 1
[ERROR]Counter ID: 2048 (vL1dCacheTcbHitRate) expected 16 instances and got 1
[ERROR]Counter ID: 2049 (vL1dCacheWaveLatency) expected 16 instances and got 1
[ERROR]Counter ID: 2050 (vL1dReadFromL2Latency) expected 16 instances and got 1
[ERROR]Counter ID: 2051 (vL1dWriteToL2Latency) expected 16 instances and got 1
[ERROR]Counter ID: 2052 (vL1dRdTagConfStallRate) expected 16 instances and got 1
[ERROR]Counter ID: 2053 (vL1dWrTagConfStallRate) expected 16 instances and got 1
[ERROR]Counter ID: 2054 (vL1dAtomicTagConfStallRate) expected 16 instances and got 1
[ERROR]Counter ID: 2055 (vL1dMissReqStallRate) expected 16 instances and got 1
[ERROR]Counter ID: 2056 (vL1dDataPendRate) expected 16 instances and got 1
[ERROR]Counter ID: 2057 (vL1dDataRetStallRate) expected 16 instances and got 1
[ERROR]Counter ID: 2058 (L2CacheHitRate) expected 32 instances and got 1
[ERROR]Counter ID: 2059 (L2CacheTagRamStallRate) expected 32 instances and got 1
[ERROR]Counter ID: 2060 (EaRdLatency) expected 32 instances and got 1
[ERROR]Counter ID: 2061 (EaRdIoStallRate) expected 32 instances and got 1
[ERROR]Counter ID: 2062 (EaRdGmiStallRate) expected 32 instances and got 1
[ERROR]Counter ID: 2063 (EaRdDramStallRate) expected 32 instances and got 1
[ERROR]Counter ID: 2064 (EaWrLatency) expected 32 instances and got 1
[ERROR]Counter ID: 2065 (EaWrIoStallRate) expected 32 instances and got 1
[ERROR]Counter ID: 2066 (EaWrGmiStallRate) expected 32 instances and got 1
[ERROR]Counter ID: 2067 (EaWrDramStallRate) expected 32 instances and got 1
[ERROR]Counter ID: 2068 (EaWrStarveRate) expected 32 instances and got 1
[ERROR]Counter ID: 2069 (EaAtomicLatency) expected 32 instances and got 1
[ERROR]Counter ID: 2070 (TCP_TCP_TA_DATA_STALL_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 2071 (TCP_TCP_TA_DATA_STALL_CYCLES_max) expected 16 instances and got 1
[ERROR]Counter ID: 2072 (VFetchInsts) expected 16 instances and got 8
[ERROR]Counter ID: 2073 (VWriteInsts) expected 16 instances and got 8
[ERROR]Counter ID: 2074 (FlatVMemInsts) expected 1 instances and got 8
[ERROR]Counter ID: 2075 (LDSInsts) expected 1 instances and got 8
[ERROR]Counter ID: 2076 (FlatLDSInsts) expected 1 instances and got 8
[ERROR]Counter ID: 2077 (VALUUtilization) expected 1 instances and got 8
[ERROR]Counter ID: 2078 (VALUBusy) expected 1 instances and got 8
[ERROR]Counter ID: 2079 (SALUBusy) expected 1 instances and got 8
[ERROR]Counter ID: 2080 (FetchSize) expected 32 instances and got 1
[ERROR]Counter ID: 2081 (WriteSize) expected 32 instances and got 1
[ERROR]Counter ID: 2082 (MemWrites32B) expected 32 instances and got 1
[ERROR]Counter ID: 2083 (L2CacheHit) expected 32 instances and got 1
[ERROR]Counter ID: 2084 (MemUnitStalled) expected 16 instances and got 1
[ERROR]Counter ID: 2085 (WriteUnitStalled) expected 32 instances and got 1
[ERROR]Counter ID: 2086 (LDSBankConflict) expected 1 instances and got 8

* source formatting (clang-format v11) (#225)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* cmake formatting (cmake-format) (#224)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Minor fixes

* source formatting (clang-format v11) (#226)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Minor test change

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
2023-11-17 01:49:51 -08:00

844 lines
31 KiB
C++

// MIT License
//
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#include <rocprofiler/agent.h>
#include <rocprofiler/fwd.h>
#include <rocprofiler/rocprofiler.h>
#include "lib/rocprofiler/agent.hpp"
#include "lib/rocprofiler/hsa/agent_cache.hpp"
#include <fmt/core.h>
#include <glog/logging.h>
#include <hsa/hsa_api_trace.h>
#include <libdrm/amdgpu.h>
#include <xf86drm.h>
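#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <filesystem>
#include <fstream>
#include <limits>
#include <memory>
#include <optional>
#include <regex>
#include <sstream>
#include <stdexcept>
#include <string>
#include <system_error>
#include <tuple>
#include <type_traits>
#include <unordered_map>
#include <unordered_set>
#include <vector>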
namespace rocprofiler
{
namespace agent
{
namespace
{
namespace fs = ::std::filesystem;
struct cpu_info
{
long processor = -1;
long family = -1;
long model = -1;
long physical_id = -1;
long core_id = -1;
long apicid = -1;
std::string vendor_id = {};
std::string model_name = {};
bool is_valid() const
{
return !(processor < 0 || family < 0 || model < 0 || physical_id < 0 || core_id < 0 ||
apicid < 0 || vendor_id.empty() || model_name.empty());
}
};
auto
parse_cpu_info()
{
auto ifs = std::ifstream{"/proc/cpuinfo"};
auto data = std::vector<cpu_info>{};
if(!ifs) return data;
auto read_blocks = [&ifs]() {
auto blocks = std::vector<std::vector<std::string>>{};
auto current_block = std::vector<std::string>{};
auto line = std::string{};
while(std::getline(ifs, line))
{
if(line.empty())
{
// a blank line terminates the current processor block
if(!current_block.empty()) blocks.emplace_back(std::move(current_block));
current_block.clear();
}
else
{
current_block.emplace_back(line);
}
}
// flush the final block when the file does not end with a blank line
if(!current_block.empty()) blocks.emplace_back(std::move(current_block));
return blocks;
};
auto processor_blocks = read_blocks();
auto processor_info = std::vector<cpu_info>{};
processor_info.reserve(processor_blocks.size());
for(const auto& bitr : processor_blocks)
{
auto info_v = cpu_info{};
for(const auto& itr : bitr)
{
auto match = std::smatch{};
// compile the pattern once rather than on every line
static const auto re = std::regex{".*: (.*)$"};
if(std::regex_match(itr, match, re))
{
if(match.size() == 2)
{
std::ssub_match value = match[1];
if(itr.find("vendor_id") == 0)
info_v.vendor_id = value.str();
else if(itr.find("model name") == 0)
info_v.model_name = value.str();
else if(itr.find("processor") == 0)
info_v.processor = std::stol(value.str());
else if(itr.find("cpu family") == 0)
info_v.family = std::stol(value.str());
else if(itr.find("model") == 0 && itr.find("model name") != 0)
info_v.model = std::stol(value.str());
else if(itr.find("physical id") == 0)
info_v.physical_id = std::stol(value.str());
else if(itr.find("core id") == 0)
info_v.core_id = std::stol(value.str());
else if(itr.find("apicid") == 0)
info_v.apicid = std::stol(value.str());
}
}
}
if(info_v.is_valid())
processor_info.emplace_back(info_v);
else
{
LOG(ERROR) << "Invalid processor info: "
<< fmt::format("processor={}, vendor={}, family={}, model={}, name={}, "
"physical id={}, core id={}, apicid={}",
info_v.processor,
info_v.vendor_id,
info_v.family,
info_v.model,
info_v.model_name,
info_v.physical_id,
info_v.core_id,
info_v.apicid);
}
}
return processor_info;
}
auto&
get_cpu_info()
{
static auto _v = parse_cpu_info();
return _v;
}
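// check whether the owner-read permission bit is set on the file; this does not verify
// that the calling process itself is able to read it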
bool
is_readable(const fs::path& fpath)
{
auto ec = std::error_code{};
auto perms = fs::status(fpath, ec).permissions();
LOG_IF(ERROR, ec) << fmt::format(
"Error getting status for file '{}': {}", fpath.string(), ec.message());
return (!ec && (perms & fs::perms::owner_read) != fs::perms::none);
}
auto
read_file(const std::string& fname)
{
auto data = std::vector<std::string>{};
if(!is_readable(fs::path{fname}))
throw std::runtime_error{fmt::format("file '{}' cannot be read", fname)};
auto ifs = std::ifstream{fname};
if(!ifs || !ifs.good())
throw std::runtime_error{fmt::format("file '{}' cannot be read", fname)};
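// read whitespace-delimited tokens; testing the extraction itself (rather than eof())
// keeps a final token that is not followed by a trailing newline
for(auto value = std::string{}; ifs >> value;)
{
data.emplace_back(value);
}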
return data;
}
auto
read_map(const std::string& fname)
{
auto data = std::unordered_map<std::string, std::string>{};
if(!is_readable(fs::path{fname}))
throw std::runtime_error{fmt::format("file '{}' cannot be read", fname)};
auto ifs = std::ifstream{fname};
if(!ifs || !ifs.good())
throw std::runtime_error{fmt::format("file '{}' cannot be read", fname)};
auto last_label = std::string{};
while(true)
{
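auto label = std::string{};
// stop once no further label can be extracted
if(!(ifs >> label) || label.empty()) break;
auto entry = std::string{};
// test the extraction itself rather than eof() so that a final entry without a
// trailing newline is accepted
if(!(ifs >> entry))
throw std::runtime_error{
fmt::format("unexpected file format in '{}' at {}", fname, label)};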
auto ret = data.emplace(label, entry);
if(!ret.second)
throw std::runtime_error{
fmt::format("duplicate entry in '{}': '{}' (='{}'). last label was '{}'",
fname,
label,
entry,
last_label)};
if(!label.empty()) last_label = std::move(label);
}
return data;
}
template <typename MapT, typename Tp>
void
read_property(const MapT& data, const std::string& label, Tp& value)
{
using mutable_type = std::remove_const_t<Tp>;
get_agent_available_properties().insert(label);
if constexpr(std::is_enum<Tp>::value)
{
using value_type = std::underlying_type_t<mutable_type>;
// never expect this to be true but it does guard against infinite recursion
static_assert(!std::is_enum<value_type>::value, "Expected non-enum type");
auto value_v = static_cast<value_type>(value);
read_property(data, label, value_v);
if constexpr(std::is_const<Tp>::value)
const_cast<mutable_type&>(value) = static_cast<mutable_type>(value_v);
else
value = static_cast<Tp>(value_v);
}
else
{
static_assert(std::is_integral<Tp>::value, "Expected integral type");
using value_type = std::conditional_t<std::is_signed<Tp>::value, intmax_t, uintmax_t>;
if(data.find(label) == data.end())
{
LOG(ERROR) << "agent properties map missing " << label << " entry";
return;
}
auto iss = std::istringstream{data.at(label)};
// value-initialize so that an empty entry never leaves the value indeterminate
auto local_value = value_type{};
iss >> local_value;
// verify that we have used the correct data sizes
constexpr auto min_value = std::numeric_limits<Tp>::min();
constexpr auto max_value = std::numeric_limits<Tp>::max();
if(local_value < min_value)
{
throw std::runtime_error{
fmt::format("data with label {} has a value (={}) which is less "
"than the min value for the type (={})",
label,
local_value,
min_value)};
}
else if(local_value > max_value)
{
throw std::runtime_error{fmt::format("data with label {} has a value (={}) which is "
"greater "
"than the max value for the type (={})",
label,
local_value,
max_value)};
}
if constexpr(std::is_const<Tp>::value)
const_cast<mutable_type&>(value) = static_cast<mutable_type>(local_value);
else
value = static_cast<Tp>(local_value);
}
}
constexpr auto
compute_version(uint32_t major_v, uint32_t minor_v, uint32_t patch_v)
{
return (major_v * 10000) + (minor_v * 100) + patch_v;
}
auto
read_topology()
{
using unique_agent_t = std::unique_ptr<rocprofiler_agent_t, void (*)(rocprofiler_agent_t*)>;
auto sysfs_nodes_path = fs::path{"/sys/class/kfd/kfd/topology/nodes/"};
if(!fs::exists(sysfs_nodes_path))
throw std::runtime_error{
fmt::format("sysfs nodes path '{}' does not exist", sysfs_nodes_path.string())};
using pc_sampling_config_vec_t = std::vector<rocprofiler_pc_sampling_configuration_t>;
static auto mi200_pc_sampling_config = pc_sampling_config_vec_t{
rocprofiler_pc_sampling_configuration_t{ROCPROFILER_PC_SAMPLING_METHOD_HOST_TRAP,
ROCPROFILER_PC_SAMPLING_UNIT_TIME,
1UL,
1000000000UL,
0}};
const auto& cpu_info_v = get_cpu_info();
auto data = std::vector<unique_agent_t>{};
uint64_t idcount = 0;
uint64_t nodecount = 0;
while(true)
{
auto idx = idcount++;
auto node_path = sysfs_nodes_path / std::to_string(idx);
// assumes that node folders are numbered contiguously starting from zero and thus once
// we are missing a node folder for a number, there are no more nodes
if(!fs::exists(node_path)) break;
// skip if we don't have permission to read the file
if(!is_readable(node_path)) continue;
auto properties = std::unordered_map<std::string, std::string>{};
auto name_prop = std::vector<std::string>{};
auto gpu_id_prop = std::vector<std::string>{};
try
{
properties = read_map(node_path / "properties");
name_prop = read_file(node_path / "name");
gpu_id_prop = read_file(node_path / "gpu_id");
} catch(const std::runtime_error& e)
{
// any of the properties, name, or gpu_id files may have triggered the error
LOG(ERROR) << "Error reading sysfs files in '" << node_path.string()
<< "' :: " << e.what();
continue;
}
// we may have been able to open the properties file but if it was empty, we ignore it
if(properties.empty()) continue;
auto agent_info = rocprofiler_agent_t{};
memset(&agent_info, 0, sizeof(agent_info));
agent_info.size = sizeof(rocprofiler_agent_t);
agent_info.id.handle = idx;
agent_info.type = ROCPROFILER_AGENT_TYPE_NONE;
agent_info.node_id = nodecount++;
if(!name_prop.empty())
agent_info.model_name = strdup(name_prop.front().c_str());
else
agent_info.model_name = "";
if(!gpu_id_prop.empty()) agent_info.gpu_id = std::stoull(gpu_id_prop.front());
read_property(properties, "cpu_cores_count", agent_info.cpu_cores_count);
read_property(properties, "simd_count", agent_info.simd_count);
if(agent_info.cpu_cores_count > 0)
agent_info.type = ROCPROFILER_AGENT_TYPE_CPU;
else if(agent_info.simd_count > 0)
agent_info.type = ROCPROFILER_AGENT_TYPE_GPU;
read_property(properties, "mem_banks_count", agent_info.mem_banks_count);
read_property(properties, "caches_count", agent_info.caches_count);
read_property(properties, "io_links_count", agent_info.io_links_count);
read_property(properties, "cpu_core_id_base", agent_info.cpu_core_id_base);
read_property(properties, "simd_id_base", agent_info.simd_id_base);
read_property(properties, "max_waves_per_simd", agent_info.max_waves_per_simd);
read_property(properties, "lds_size_in_kb", agent_info.lds_size_in_kb);
read_property(properties, "gds_size_in_kb", agent_info.gds_size_in_kb);
read_property(properties, "num_gws", agent_info.num_gws);
read_property(properties, "wave_front_size", agent_info.wave_front_size);
read_property(properties, "array_count", agent_info.array_count);
read_property(properties, "simd_arrays_per_engine", agent_info.simd_arrays_per_engine);
read_property(properties, "cu_per_simd_array", agent_info.cu_per_simd_array);
read_property(properties, "simd_per_cu", agent_info.simd_per_cu);
read_property(properties, "max_slots_scratch_cu", agent_info.max_slots_scratch_cu);
read_property(properties, "gfx_target_version", agent_info.gfx_target_version);
read_property(properties, "vendor_id", agent_info.vendor_id);
read_property(properties, "device_id", agent_info.device_id);
read_property(properties, "location_id", agent_info.location_id);
read_property(properties, "domain", agent_info.domain);
read_property(properties, "drm_render_minor", agent_info.drm_render_minor);
read_property(properties, "hive_id", agent_info.hive_id);
read_property(properties, "num_sdma_engines", agent_info.num_sdma_engines);
read_property(properties, "num_sdma_xgmi_engines", agent_info.num_sdma_xgmi_engines);
read_property(
properties, "num_sdma_queues_per_engine", agent_info.num_sdma_queues_per_engine);
read_property(properties, "num_cp_queues", agent_info.num_cp_queues);
read_property(properties, "max_engine_clk_ccompute", agent_info.max_engine_clk_ccompute);
agent_info.name = "";
agent_info.product_name = "";
agent_info.vendor_name = "";
if(agent_info.type == ROCPROFILER_AGENT_TYPE_GPU)
{
constexpr auto workgrp_max = 1024;
constexpr auto grid_max = std::numeric_limits<uint32_t>::max();
read_property(
properties, "max_engine_clk_fcompute", agent_info.max_engine_clk_fcompute);
read_property(properties, "local_mem_size", agent_info.local_mem_size);
read_property(properties, "fw_version", agent_info.fw_version.Value);
read_property(properties, "capability", agent_info.capability.Value);
read_property(properties, "sdma_fw_version", agent_info.sdma_fw_version.Value);
agent_info.fw_version.Value &= 0x3ff;
agent_info.sdma_fw_version.Value &= 0x3ff;
agent_info.workgroup_max_size = workgrp_max; // hardcoded in hsa-runtime
agent_info.workgroup_max_dim = {workgrp_max, workgrp_max, workgrp_max};
agent_info.grid_max_size = grid_max; // hardcoded in hsa-runtime
agent_info.grid_max_dim = {grid_max, grid_max, grid_max};
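// guard against a missing or zero simd_per_cu property (integer division by zero)
agent_info.cu_count =
(agent_info.simd_per_cu > 0) ? (agent_info.simd_count / agent_info.simd_per_cu) : 0;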
if(int drm_fd = 0; (drm_fd = drmOpenRender(agent_info.drm_render_minor)) >= 0)
{
uint32_t major_version = 0;
uint32_t minor_version = 0;
auto* device_handle = amdgpu_device_handle{};
if(amdgpu_device_initialize(
drm_fd, &major_version, &minor_version, &device_handle) == 0)
{
auto major = (agent_info.gfx_target_version / 10000) % 100;
auto minor = (agent_info.gfx_target_version / 100) % 100;
auto step = (agent_info.gfx_target_version % 100);
agent_info.name =
strdup(fmt::format("gfx{}{}{:x}", major, minor, step).c_str());
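// guard against a null marketing name before duplicating it; product_name keeps
// its empty-string default otherwise
if(const char* marketing_name = amdgpu_get_marketing_name(device_handle);
marketing_name != nullptr)
agent_info.product_name = strdup(marketing_name);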
agent_info.vendor_name = strdup("AMD");
amdgpu_gpu_info gpu_info = {};
if(amdgpu_query_gpu_info(device_handle, &gpu_info) == 0)
{
agent_info.family_id = gpu_info.family_id;
}
amdgpu_device_deinitialize(device_handle);
}
drmClose(drm_fd);
}
// TODO(jomadsen): make contingent on whether this process acquired the PC sampling
// device lock
{
constexpr auto gfx90a_version = compute_version(9, 0, 10);
if(agent_info.gfx_target_version >= gfx90a_version)
{
agent_info.pc_sampling_configs = mi200_pc_sampling_config.data();
agent_info.num_pc_sampling_configs = mi200_pc_sampling_config.size();
}
}
}
else if(agent_info.type == ROCPROFILER_AGENT_TYPE_CPU)
{
agent_info.cu_count = agent_info.cpu_cores_count;
agent_info.vendor_name = strdup("CPU");
for(const auto& itr : cpu_info_v)
{
if(agent_info.cpu_core_id_base == itr.apicid)
{
agent_info.name = strdup(itr.model_name.c_str());
agent_info.product_name = strdup(agent_info.name);
agent_info.family_id = itr.family;
break;
}
}
}
if(properties.count("num_xcc") > 0)
read_property(properties, "num_xcc", agent_info.num_xcc);
else
agent_info.num_xcc = 1;
agent_info.max_waves_per_cu = agent_info.simd_per_cu * agent_info.max_waves_per_simd;
if(agent_info.simd_arrays_per_engine > 0)
{
agent_info.num_shader_banks =
agent_info.array_count / agent_info.simd_arrays_per_engine;
// depends on above
if(agent_info.simd_per_cu > 0 &&
agent_info.num_shader_banks * agent_info.simd_arrays_per_engine > 0)
{
agent_info.cu_per_engine =
(agent_info.simd_count / agent_info.simd_per_cu) /
(agent_info.num_shader_banks * agent_info.simd_arrays_per_engine);
}
}
agent_info.mem_banks = nullptr;
agent_info.caches = nullptr;
agent_info.io_links = nullptr;
if(agent_info.mem_banks_count > 0)
{
agent_info.mem_banks = new rocprofiler_agent_mem_bank_t[agent_info.mem_banks_count];
for(uint32_t i = 0; i < agent_info.mem_banks_count; ++i)
{
auto subproperties =
read_map(node_path / "mem_banks" / std::to_string(i) / "properties");
read_property(subproperties, "heap_type", agent_info.mem_banks[i].heap_type);
read_property(
subproperties, "size_in_bytes", agent_info.mem_banks[i].size_in_bytes);
read_property(subproperties, "flags", agent_info.mem_banks[i].flags.MemoryProperty);
read_property(subproperties, "width", agent_info.mem_banks[i].width);
read_property(subproperties, "mem_clk_max", agent_info.mem_banks[i].mem_clk_max);
}
}
if(agent_info.caches_count > 0)
{
agent_info.caches = new rocprofiler_agent_cache_t[agent_info.caches_count];
for(uint32_t i = 0; i < agent_info.caches_count; ++i)
{
auto subproperties =
read_map(node_path / "caches" / std::to_string(i) / "properties");
read_property(
subproperties, "processor_id_low", agent_info.caches[i].processor_id_low);
read_property(subproperties, "level", agent_info.caches[i].level);
read_property(subproperties, "size", agent_info.caches[i].size);
read_property(
subproperties, "cache_line_size", agent_info.caches[i].cache_line_size);
read_property(
subproperties, "cache_lines_per_tag", agent_info.caches[i].cache_lines_per_tag);
read_property(subproperties, "association", agent_info.caches[i].association);
read_property(subproperties, "latency", agent_info.caches[i].latency);
read_property(subproperties, "type", agent_info.caches[i].type.Value);
}
}
if(agent_info.io_links_count > 0)
{
agent_info.io_links = new rocprofiler_agent_io_link_t[agent_info.io_links_count];
for(uint32_t i = 0; i < agent_info.io_links_count; ++i)
{
auto subproperties =
read_map(node_path / "io_links" / std::to_string(i) / "properties");
read_property(subproperties, "type", agent_info.io_links[i].type);
read_property(subproperties, "version_major", agent_info.io_links[i].version_major);
read_property(subproperties, "version_minor", agent_info.io_links[i].version_minor);
read_property(subproperties, "node_from", agent_info.io_links[i].node_from);
read_property(subproperties, "node_to", agent_info.io_links[i].node_to);
read_property(subproperties, "weight", agent_info.io_links[i].weight);
read_property(subproperties, "min_latency", agent_info.io_links[i].min_latency);
read_property(subproperties, "max_latency", agent_info.io_links[i].max_latency);
read_property(subproperties, "min_bandwidth", agent_info.io_links[i].min_bandwidth);
read_property(subproperties, "max_bandwidth", agent_info.io_links[i].max_bandwidth);
read_property(subproperties,
"recommended_transfer_size",
agent_info.io_links[i].recommended_transfer_size);
read_property(subproperties, "flags", agent_info.io_links[i].flags.LinkProperty);
}
}
data.emplace_back(new rocprofiler_agent_t{agent_info}, [](rocprofiler_agent_t* ptr) {
if(ptr)
{
auto free_cstring = [](const char*& val) {
if(val && ::strnlen(val, 1) > 0) ::free(const_cast<char*>(val));
val = "";
};
delete[] ptr->mem_banks;
delete[] ptr->caches;
delete[] ptr->io_links;
free_cstring(ptr->name);
free_cstring(ptr->vendor_name);
free_cstring(ptr->product_name);
free_cstring(ptr->model_name);
}
delete ptr;
});
}
return data;
}
auto&
get_agent_topology()
{
static auto _v = read_topology();
return _v;
}
auto&
get_agent_caches()
{
static auto _v = std::vector<hsa::AgentCache>{};
return _v;
}
} // namespace
std::vector<const rocprofiler_agent_t*>
get_agents()
{
auto& agents = rocprofiler::agent::get_agent_topology();
auto pointers = std::vector<const rocprofiler_agent_t*>{};
pointers.reserve(agents.size());
for(auto& agent : agents)
{
pointers.emplace_back(agent.get());
}
return pointers;
}
void
construct_agent_cache(::HsaApiTable* table)
{
if(!table) return;
auto rocp_agents = agent::get_agents();
auto hsa_agents = std::vector<hsa_agent_t>{};
// Get HSA Agents
table->core_->hsa_iterate_agents_fn(
[](hsa_agent_t agent, void* data) {
CHECK_NOTNULL(static_cast<std::vector<hsa_agent_t>*>(data))->emplace_back(agent);
return HSA_STATUS_SUCCESS;
},
&hsa_agents);
LOG_IF(FATAL, rocp_agents.size() != hsa_agents.size())
<< "Found " << rocp_agents.size() << " rocprofiler agents and " << hsa_agents.size()
<< " HSA agents";
auto hsa_agent_node_map = std::unordered_map<uint32_t, hsa_agent_t>{};
for(const auto& itr : hsa_agents)
{
if(uint32_t node_id = 0;
table->core_->hsa_agent_get_info_fn(
itr, static_cast<hsa_agent_info_t>(HSA_AMD_AGENT_INFO_DRIVER_NODE_ID), &node_id) ==
HSA_STATUS_SUCCESS)
{
hsa_agent_node_map[node_id] = itr;
}
}
auto agent_map =
std::unordered_map<uint32_t, std::tuple<const rocprofiler_agent_t*, hsa_agent_t>>{};
for(const auto* ritr : rocp_agents)
{
// reuse the node-id lookup built above instead of re-querying every HSA agent
auto hitr = hsa_agent_node_map.find(ritr->node_id);
if(hitr != hsa_agent_node_map.end())
agent_map.emplace(ritr->node_id, std::make_tuple(ritr, hitr->second));
}
LOG_IF(ERROR, agent_map.size() != hsa_agents.size())
<< "rocprofiler was only able to map " << agent_map.size()
<< " rocprofiler agents to HSA agents, expected " << hsa_agents.size();
// For Pre-ROCm 6.0 releases
#if ROCPROFILER_HSA_RUNTIME_VERSION <= 100900
# define HSA_AMD_AGENT_INFO_NEAREST_CPU 0xA113
#endif
auto find_nearest_hsa_cpu_agent = [&table, &agent_map](uint32_t node_id) {
auto _nearest_cpu = hsa_agent_t{.handle = 0};
auto _hsa_agent = std::get<1>(agent_map.at(node_id));
if(table->core_->hsa_agent_get_info_fn(
_hsa_agent,
static_cast<hsa_agent_info_t>(HSA_AMD_AGENT_INFO_NEAREST_CPU),
&_nearest_cpu) != HSA_STATUS_SUCCESS)
{
const auto* _rocp_agent = std::get<0>(agent_map.at(node_id));
auto distance_min = std::numeric_limits<int32_t>::max();
for(uint32_t i = 0; i < _rocp_agent->io_links_count; ++i)
{
const auto& io_link = _rocp_agent->io_links[i];
auto _from = io_link.node_from;
auto _to = io_link.node_to;
LOG_IF(FATAL, _from != node_id)
<< "unexpected condition for node_id=" << node_id << ". io_link[" << i
<< "].node_from=" << _from
<< ". Expected this to match the node_id (node_to=" << _to << ")";
if(agent_map.find(_to) == agent_map.end())
{
LOG(WARNING) << "no agent mapping for io_link[" << i << "].node_to=" << _to
<< " in rocprofiler agent " << node_id;
continue;
}
auto [_to_rocp_agent, _to_hsa_agent] = agent_map.at(_to);
auto _distance = std::abs(static_cast<int32_t>(_from - _to));
if(_distance > 0 && _distance < distance_min &&
_to_rocp_agent->type == ROCPROFILER_AGENT_TYPE_CPU)
{
distance_min = _distance;
_nearest_cpu = _to_hsa_agent;
}
}
}
return _nearest_cpu;
};
auto is_duplicate = [](const auto* agent_v) {
for(const auto& itr : get_agent_caches())
{
if(itr == agent_v) return true;
}
return false;
};
// Generate supported agents
for(const auto& itr : agent_map)
{
const auto* rocp_agent = std::get<0>(itr.second);
auto hsa_agent = std::get<1>(itr.second);
if(is_duplicate(rocp_agent)) continue;
// AgentCache is only for GPU agents
if(rocp_agent->type != ROCPROFILER_AGENT_TYPE_GPU) continue;
auto _nearest_cpu = find_nearest_hsa_cpu_agent(itr.first);
try
{
get_agent_caches().emplace_back(
rocp_agent, hsa_agent, itr.first, _nearest_cpu, *table->amd_ext_);
} catch(const std::runtime_error& err)
{
// non-GPU agents were skipped above, so rocp_agent is always a GPU agent here
LOG(ERROR) << fmt::format("rocprofiler agent <-> HSA agent mapping failed: {} ({})",
rocp_agent->node_id,
err.what());
}
}
}
// Look up the HSA agent handle for a rocprofiler agent; std::nullopt if no AgentCache matches
std::optional<hsa_agent_t>
get_hsa_agent(const rocprofiler_agent_t* agent)
{
for(const auto& itr : get_agent_caches())
{
if(itr == agent) return itr.get_hsa_agent();
}
return std::nullopt;
}
// Look up the rocprofiler agent for an HSA agent handle; nullptr if no AgentCache matches
const rocprofiler_agent_t*
get_rocprofiler_agent(hsa_agent_t agent)
{
for(const auto& itr : get_agent_caches())
{
if(itr == agent) return itr.get_rocp_agent();
}
return nullptr;
}
// Return a copy of the AgentCache for a rocprofiler agent; std::nullopt if none matches
std::optional<hsa::AgentCache>
get_agent_cache(const rocprofiler_agent_t* agent)
{
for(const auto& itr : get_agent_caches())
{
if(itr == agent) return itr;
}
return std::nullopt;
}
// Return a copy of the AgentCache for an HSA agent handle; std::nullopt if none matches
std::optional<hsa::AgentCache>
get_agent_cache(hsa_agent_t agent)
{
for(const auto& itr : get_agent_caches())
{
if(itr == agent) return itr;
}
return std::nullopt;
}
std::unordered_set<std::string>&
get_agent_available_properties()
{
static std::unordered_set<std::string> _prop;
return _prop;
}
} // namespace agent
} // namespace rocprofiler
extern "C" {
rocprofiler_status_t
rocprofiler_query_available_agents(rocprofiler_available_agents_cb_t callback,
size_t agent_size,
void* user_data)
{
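// agent_size is the caller's sizeof(rocprofiler_agent_t); reject callers whose
// struct is larger than the one this library was built with
if(agent_size > sizeof(rocprofiler_agent_t))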
{
LOG(ERROR) << "rocprofiler_agent_t used by caller is ABI-incompatible with "
"rocprofiler_agent_t in rocprofiler";
return ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_ABI;
}
auto&& pointers = rocprofiler::agent::get_agents();
return callback(pointers.data(), pointers.size(), user_data);
}
}