rocm-systems/projects
Benjamin Welton ed5b2ac165 Fix deadlock in InterceptQueue::Submit when packet count exceeds queue capacity (#855)
InterceptQueue::Submit had an "all-or-nothing" packet submission policy that
could cause infinite retry loops when the number of packets to submit exceeded
the available queue slots. When 504+ packets needed submission to a ~500-slot
queue, the system would:
1. Set submitted_count=0 (submit nothing)
2. Add retry barrier packet
3. Trigger async handler via StoreRelaxed
4. Attempt to submit overflow packets
5. Fail again due to the same space constraints
6. Repeat from step 1 indefinitely, never making forward progress
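
The retry loop above can be reproduced with a small model. This is only a sketch, not the actual InterceptQueue code: the names SubmitAllOrNothing and StuckAfterRetries are hypothetical, and the model optimistically assumes the queue drains completely between retries.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the original policy: submit every pending packet
// or none at all.
std::size_t SubmitAllOrNothing(std::size_t pending, std::size_t free_slots) {
  return (pending <= free_slots) ? pending : 0;
}

// Returns how many packets are still pending after max_rounds retries.
// Assumption: the queue is completely empty at the start of each retry,
// which is the most favorable case for the submitter.
std::size_t StuckAfterRetries(std::size_t pending, std::size_t capacity,
                              std::size_t max_rounds) {
  assert(capacity > 0);
  for (std::size_t round = 0; round < max_rounds && pending > 0; ++round) {
    pending -= SubmitAllOrNothing(pending, capacity);
  }
  return pending;
}
```

With 504 pending packets and 500 slots, StuckAfterRetries leaves all 504 packets pending no matter how many rounds are allowed, because the batch can never fit in one piece.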

Solution:
Added partial packet submission capability during overflow processing while
preserving the original "all-or-nothing" behavior for normal operations.
When processing overflow packets and insufficient space exists for all packets,
the system now submits as many packets as possible rather than none.

The fix:
- Detects overflow processing via !overflow_.empty()
- Allows partial submission: submitted_count = free_slots - barrier_reservation
- Maintains atomicity guarantees for normal packet rewrites
- Prevents infinite retry loops by ensuring forward progress
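
A minimal sketch of the decision described in the list above. It is not the code from the commit: SubmittedCount, RoundsToDrainOverflow, and the one-slot kBarrierReservation are assumptions made for illustration, and the boolean parameter stands in for the !overflow_.empty() check.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: one slot is held back for the retry barrier packet.
constexpr std::size_t kBarrierReservation = 1;

// Returns how many of `pending` packets to submit right now.
// processing_overflow models the !overflow_.empty() check.
std::size_t SubmittedCount(std::size_t pending, std::size_t free_slots,
                           bool processing_overflow) {
  if (pending <= free_slots) return pending;        // everything fits
  if (!processing_overflow) return 0;               // normal path: all-or-nothing
  if (free_slots <= kBarrierReservation) return 0;  // no room beyond the barrier
  return free_slots - kBarrierReservation;          // partial submission
}

// Counts the rounds needed to drain an overflow batch, assuming the queue
// is completely empty at the start of each round.
std::size_t RoundsToDrainOverflow(std::size_t pending, std::size_t capacity) {
  assert(capacity > kBarrierReservation);  // guarantees progress every round
  std::size_t rounds = 0;
  while (pending > 0) {
    pending -= SubmittedCount(pending, capacity, /*processing_overflow=*/true);
    ++rounds;
  }
  return rounds;
}
```

Under these assumptions, 504 overflow packets against 500 slots drain in two rounds (499 packets, then the remaining 5), while the normal path still submits nothing when a batch does not fit.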

This resolves deadlocks in high-throughput scenarios while maintaining
backward compatibility and the original design intent for packet rewrite
atomicity.
2025-09-09 14:06:29 -07:00