Fixed buffer overflow in ReduceOrCopy
The bug caused AllGathers and ReduceScatters of fewer than
8 bytes to fail in certain cases.
Change-Id: I33e1beb50805bfdb457ae16a90e3f91c1b283b9b
Reviewed-on: http://git-master/r/1011505
Reviewed-by: Przemek Tredak <ptredak@nvidia.com>
Tested-by: Przemek Tredak <ptredak@nvidia.com>
[ROCm/rccl commit: 9442285526]
Committed by: Przemek Tredak
Parent: 60799f9427
Commit: b5a2ca138d
@@ -1,5 +1,5 @@
 /*************************************************************************
- * Copyright (c) 2015, NVIDIA CORPORATION. All rights reserved.
+ * Copyright (c) 2015-2016, NVIDIA CORPORATION. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
@@ -274,7 +274,7 @@ __device__ inline void ReduceOrCopy(const int tid,
   const int UNROLL2 = (UNROLL >= 2) ? (UNROLL / 2) : 1;
   const bool NOUNROLL2 = ((UNROLL / 2) == 0);
 
-  int Npreamble = AlignUp(dest0, alignof(PackType)) - dest0;
+  int Npreamble = (N<alignof(PackType)) ? N : AlignUp(dest0, alignof(PackType)) - dest0;
 
   // stage 0: check if we'll be able to use the fast, 64-bit aligned path.
   // If not, we'll just use the slow preamble path for the whole operation
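
Reviewer note: the effect of the Npreamble change can be illustrated with a small host-side sketch. This is not the NCCL code. The names align_up, preamble_old, preamble_fixed and kPackAlign are hypothetical, and the sketch assumes 1-byte elements and an 8-byte PackType alignment, so element counts and byte counts coincide.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: PackType needs 8-byte alignment (the diff mentions a 64-bit path).
constexpr std::size_t kPackAlign = 8;

// Hypothetical helper: round a byte address up to the next multiple of align.
constexpr std::uintptr_t align_up(std::uintptr_t addr, std::size_t align) {
  return (addr + align - 1) / align * align;
}

// Old computation: distance to the next aligned address, ignoring the count n.
constexpr std::size_t preamble_old(std::uintptr_t dest, std::size_t /*n*/) {
  return align_up(dest, kPackAlign) - dest;
}

// Fixed computation: if the whole buffer is smaller than one pack,
// the preamble covers exactly n elements and never runs past the end.
constexpr std::size_t preamble_fixed(std::uintptr_t dest, std::size_t n) {
  return n < kPackAlign ? n : align_up(dest, kPackAlign) - dest;
}

// A 4-byte buffer starting 1 byte past an aligned address:
// the old formula asks for 7 elements, more than the buffer holds.
static_assert(preamble_old(0x1001, 4) == 7, "old preamble overruns a 4-byte buffer");
static_assert(preamble_fixed(0x1001, 4) == 4, "fixed preamble is clamped to N");
// Buffers of 8 or more elements keep the original alignment behaviour.
static_assert(preamble_fixed(0x1001, 16) == 7, "large buffers are unchanged");
```

Under these assumptions the old expression can return a preamble longer than N whenever the buffer is shorter than one pack and dest0 is misaligned, which matches the sub-8-byte failures described in the commit message.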