Files
rocm-systems/test/common/ProcessIsolatedTestRunner.md
T
Atul Kulkarni 7e10267dfd Added a Process Isolated Test Runner (#1993)
* Added single process isolation support to execute tests

* Address review comments

* Update README

* Removed requirement of explicit call to clear method

* Added macros for simplified usage

* Updated tests to use process isolation framework

* Adjust summary output format for isolated tests

* Updated rccl_wrap tests

* Used process isolation in AllocTests

* Used process isolation and fixed failing tests

* Modified test output, added signal handling

Updated macros to handle lambdas

* Convert argcheck tests to isolated tests

* Convert proxy tests to isolated tests

* Remove non-supported test

* Fixed file descriptor handling and clearing env vars for tests
2025-12-08 10:36:05 -06:00

1131 rader
33 KiB
Markdown

# Process Isolated Test Runner
A lightweight C++ testing framework for running Google Test cases in isolated processes with clean environment settings.
## Table of Contents
- [Overview](#overview)
- [Why Use Process Isolation?](#why-use-process-isolation)
- [Quick Start](#quick-start)
- [Core Concepts](#core-concepts)
- [API Reference](#api-reference)
- [Examples](#examples)
- [Best Practices](#best-practices)
- [Troubleshooting](#troubleshooting)
---
## Overview
`ProcessIsolatedTestRunner` is a framework that executes tests in separate processes using `fork()`. This ensures complete isolation between tests, particularly useful when testing code with static variables or environment-dependent behavior.
**Key Features:**
- ✅ Process-based test isolation (each test runs in its own process)
- ✅ Per-test environment variable management
- ✅ Configurable timeouts
- ✅ Sequential or stop-on-failure execution
- ✅ Thread-safe test registration
- ✅ Detailed test result reporting
**Location:** `test/common/ProcessIsolatedTestRunner.hpp`
---
## Why use Process Isolation?
### Problem: Static Variable Pollution
Consider this RCCL code with static variables:
```cpp
void rcclSetP2pNetChunkSize(struct ncclComm* comm, int& chunkSize) {
static int p2pNetChunkSize = RCCL_VALUE_UNSET; // ← Static variable!
if (p2pNetChunkSize == RCCL_VALUE_UNSET) {
const char* inputStr = getenv("NCCL_P2P_NET_CHUNKSIZE");
if (inputStr) {
// Parse the environment variable value
p2pNetChunkSize = parseValue(inputStr); // e.g., "12345" → 12345
} else {
// No env var set, calculate value based on architecture...
p2pNetChunkSize = calculateValue();
}
}
chunkSize = p2pNetChunkSize;
}
```
**How the static variable gets set:**
1. First time called: `p2pNetChunkSize == RCCL_VALUE_UNSET` is true
2. Code reads environment variable with `getenv("NCCL_P2P_NET_CHUNKSIZE")`
3. If env var exists → parse its value (e.g., "12345" string) and assign to static variable
4. If env var doesn't exist → calculate default value and assign to static variable
5. Static variable is now set and **persists for the lifetime of the process**
**Without Process Isolation:**
```cpp
TEST(MyTest, FirstTest) {
setenv("NCCL_P2P_NET_CHUNKSIZE", "12345", 1);
rcclSetP2pNetChunkSize(comm, chunkSize);
// ✓ getenv() returns "12345"
// ✓ Static variable p2pNetChunkSize gets set to 12345
// ✓ chunkSize is now 12345
}
TEST(MyTest, SecondTest) {
unsetenv("NCCL_P2P_NET_CHUNKSIZE");
rcclSetP2pNetChunkSize(comm, chunkSize);
// ❌ getenv() returns nullptr (env var cleared)
// ❌ BUT: p2pNetChunkSize != RCCL_VALUE_UNSET (still 12345 from FirstTest!)
// ❌ Code skips the if-block, never reads env var or recalculates
// ❌ chunkSize is STILL 12345 from previous test!
// This test will fail or produce incorrect results
}
```
**The Problem:** Static variables are initialized once per process and persist across multiple tests. Even if you change or clear environment variables, the static variable retains its old value.
**With Process Isolation:**
```cpp
// Each test runs in a separate process
// Static variables are reset for each test
// ✅ Tests are truly independent
```
### Common Use Cases
1. **Testing environment variable behavior** - When code reads env vars into static variables
2. **Testing architecture-specific logic** - Different GPU architectures with cached state
3. **Testing initialization code** - One-time initialization patterns
4. **Testing configuration changes** - When config is cached statically
---
## Quick Start
### Basic Example (Using Macros)
The simplest way to use ProcessIsolatedTestRunner is with the macros:
```cpp
#include "common/ProcessIsolatedTestRunner.hpp"
TEST(Rcclwrap, MyIsolatedTest) {
// Single test with environment variables - all in one call!
RUN_ISOLATED_TEST_WITH_ENV("TestWithCleanEnvironment",
[]() {
// This runs in a separate process
const char* value = getenv("MY_VARIABLE");
EXPECT_STREQ(value, "test_value");
EXPECT_TRUE(someFunction());
},
{{"MY_VARIABLE", "test_value"}}
);
}
TEST(Rcclwrap, MyIsolatedTests) {
// Multiple tests with different configurations
RUN_ISOLATED_TESTS(
ProcessIsolatedTestRunner::TestConfig("Test1", []() {
EXPECT_TRUE(checkCondition1());
}),
ProcessIsolatedTestRunner::TestConfig("Test2", []() {
EXPECT_TRUE(checkCondition2());
}).withEnvironment({{"VAR", "value"}}),
ProcessIsolatedTestRunner::TestConfig("Test3", []() {
EXPECT_TRUE(checkCondition3());
}).withTimeout(std::chrono::seconds(60))
);
}
```
### Manual API (For Advanced Use Cases)
You can also use the API directly for more control:
```cpp
#include "common/ProcessIsolatedTestRunner.hpp"
TEST(Rcclwrap, MyIsolatedTests) {
// Register a test with environment variables
ProcessIsolatedTestRunner::registerTest(
ProcessIsolatedTestRunner::TestConfig(
"TestWithCleanEnvironment",
[]() {
// This runs in a separate process
const char* value = getenv("MY_VARIABLE");
EXPECT_STREQ(value, "test_value");
// Your test logic here
EXPECT_TRUE(someFunction());
})
.withEnvironment({{"MY_VARIABLE", "test_value"}})
);
// Execute all registered tests
bool allTestsPassed = ProcessIsolatedTestRunner::executeAllTests();
EXPECT_TRUE(allTestsPassed);
}
```
---
## Core Concepts
### 1. Test Configuration (`TestConfig`)
Defines how a test should be executed:
```cpp
TestConfig config(
"TestName", // Test name (for reporting)
[]() { /* logic */ } // Test function (lambda or function pointer)
);
// Optional configurations
config.withEnvironment({{"VAR1", "value1"}, {"VAR2", "value2"}})
.withTimeout(std::chrono::seconds(60))
.withCleanEnvironment(false); // Inherit parent environment
```
### 2. Test Registration
Tests must be registered before execution:
```cpp
// Method 1: Full configuration
ProcessIsolatedTestRunner::registerTest(config);
// Method 2: Simple (name + logic only)
ProcessIsolatedTestRunner::registerTest("SimplTest", []() {
EXPECT_TRUE(true);
});
// Method 3: With environment
ProcessIsolatedTestRunner::registerTest(
"EnvTest",
[]() { /* logic */ },
{{"ENV_VAR", "value"}}
);
```
### 3. Test Execution
**⚠️ IMPORTANT:** Tests do NOT run automatically after registration. You **MUST** explicitly call `executeAllTests()` to run them.
Execute all registered tests:
```cpp
// Default options (continue on failure, no verbose logging)
bool passed = ProcessIsolatedTestRunner::executeAllTests();
// Custom options
ProcessIsolatedTestRunner::ExecutionOptions options;
options.stopOnFirstFailure = true; // Stop after first failure
options.verboseLogging = true; // Print detailed logs
bool passed = ProcessIsolatedTestRunner::executeAllTests(options);
```
**Common Mistake:**
```cpp
// ❌ BAD: Tests registered but never executed!
TEST(MyTest, IsolatedTests) {
ProcessIsolatedTestRunner::registerTest("Test1", []() { /* ... */ });
ProcessIsolatedTestRunner::registerTest("Test2", []() { /* ... */ });
// Missing executeAllTests() - tests will NOT run!
}
// ✅ GOOD: Tests registered and executed
TEST(MyTest, IsolatedTests) {
ProcessIsolatedTestRunner::registerTest("Test1", []() { /* ... */ });
ProcessIsolatedTestRunner::registerTest("Test2", []() { /* ... */ });
bool passed = ProcessIsolatedTestRunner::executeAllTests();
EXPECT_TRUE(passed);
}
```
### 4. Test Results
Each test produces a `TestResult`:
```cpp
struct TestResult {
std::string testName; // Name of the test
bool passed; // Whether the test passed
bool skipped; // Whether the test was skipped
int exitCode; // Process exit code
pid_t processId; // Process ID that ran the test
std::chrono::milliseconds duration; // Execution duration
std::string errorMessage; // Error message if failed
std::unordered_map<std::string, std::string> environment; // Env used
};
```
---
## API Reference
### Macros (Recommended)
These macros provide the simplest way to use ProcessIsolatedTestRunner with minimal boilerplate.
#### `RUN_ISOLATED_TEST(test_name, test_body)`
Register and execute a single isolated test.
```cpp
RUN_ISOLATED_TEST("MySimpleTest", []() {
EXPECT_TRUE(someFunction());
});
```
#### `RUN_ISOLATED_TEST_WITH_ENV(test_name, test_body, ...)`
Register and execute a single isolated test with environment variables.
**Uses variadic macros** (`...` and `__VA_ARGS__`) to automatically handle commas in initializer lists without requiring extra parentheses.
```cpp
RUN_ISOLATED_TEST_WITH_ENV("MyEnvTest",
[]() {
const char* value = getenv("MY_VAR");
EXPECT_STREQ(value, "expected_value");
},
{{"MY_VAR", "expected_value"}}
);
// Multiple environment variables work naturally:
RUN_ISOLATED_TEST_WITH_ENV("MultiEnvTest",
[]() { /* test code */ },
{{"VAR1", "val1"}, {"VAR2", "val2"}, {"VAR3", "val3"}} // Commas handled automatically
);
```
**Note:** The macro uses `__VA_ARGS__` internally, which automatically handles commas in the environment variable initializer list. Users don't need to worry about preprocessor comma issues.
#### `RUN_ISOLATED_TESTS(...)`
Register and execute multiple isolated tests with various configurations.
```cpp
RUN_ISOLATED_TESTS(
ProcessIsolatedTestRunner::TestConfig("Test1", []() { ... }),
ProcessIsolatedTestRunner::TestConfig("Test2", []() { ... })
.withEnvironment({{"VAR", "value"}}),
ProcessIsolatedTestRunner::TestConfig("Test3", []() { ... })
.withTimeout(std::chrono::seconds(60))
);
```
#### `RUN_ISOLATED_TESTS_WITH_OPTIONS(options, ...)`
Register and execute multiple isolated tests with custom execution options.
```cpp
ProcessIsolatedTestRunner::ExecutionOptions opts;
opts.stopOnFirstFailure = true;
opts.verboseLogging = true;
RUN_ISOLATED_TESTS_WITH_OPTIONS(opts,
ProcessIsolatedTestRunner::TestConfig("Test1", []() { ... }),
ProcessIsolatedTestRunner::TestConfig("Test2", []() { ... })
);
```
### Main Methods (For Manual Use)
#### `registerTest()`
Register a test for later execution.
```cpp
// Variant 1: Full configuration
static void registerTest(const TestConfig& config);
// Variant 2: Simple registration
static void registerTest(
const std::string& name,
std::function<void()> testLogic
);
// Variant 3: With environment
static void registerTest(
const std::string& name,
std::function<void()> testLogic,
const std::unordered_map<std::string, std::string>& env
);
```
#### `executeAllTests()`
Execute all registered tests sequentially.
```cpp
static bool executeAllTests(
const ExecutionOptions& options = ExecutionOptions()
);
```
**Returns:** `true` if all tests passed, `false` if any failed.
**Note:** This method automatically clears all test registrations and results after execution, ensuring a clean state for the next test suite. Users do not need to call `clear()` manually.
#### `getTestResults()`
Retrieve detailed results from the last execution.
```cpp
static std::vector<TestResult> getTestResults();
```
#### `clear()`
Clear all registered tests and results.
```cpp
static void clear();
```
**Note:** Calling this method manually is typically not necessary, as `executeAllTests()` automatically clears registrations after execution. This method is primarily useful for advanced use cases or when tests are registered but not executed.
**⚠️ Automatic Warning:** If `clear()` is called when tests have been registered but not fully executed, it will automatically print a warning to stderr:
```
⚠️ WARNING: ProcessIsolatedTestRunner::clear() called with 2 unexecuted test(s)!
Registered: 2 test(s)
Executed: 0 test(s)
Did you forget to call executeAllTests()?
```
#### `getTestCount()`
Get the number of currently registered tests (before execution).
```cpp
static size_t getTestCount();
```
**Use case:** Verify that tests were actually registered and executed.
```cpp
TEST(MyTest, VerifyExecution) {
ProcessIsolatedTestRunner::clear();
// Register tests
ProcessIsolatedTestRunner::registerTest("Test1", []() { /* ... */ });
ProcessIsolatedTestRunner::registerTest("Test2", []() { /* ... */ });
// Check registration count
size_t registeredCount = ProcessIsolatedTestRunner::getTestCount();
EXPECT_EQ(registeredCount, 2) << "Expected 2 tests to be registered";
// Execute
bool passed = ProcessIsolatedTestRunner::executeAllTests();
EXPECT_TRUE(passed);
// Verify execution count
auto results = ProcessIsolatedTestRunner::getTestResults();
EXPECT_EQ(results.size(), registeredCount)
<< "Registered " << registeredCount << " tests but only "
<< results.size() << " executed";
}
```
### TestConfig Methods
#### `withEnvironment()`
Set environment variables for the test.
```cpp
TestConfig& withEnvironment(
const std::unordered_map<std::string, std::string>& env
);
```
**Note:** Variables are set in the child process only.
#### `withTimeout()`
Set a timeout for test execution.
```cpp
TestConfig& withTimeout(std::chrono::seconds timeoutSeconds);
```
**Default:** 30 seconds
#### `withCleanEnvironment()`
Control whether to inherit parent process environment.
```cpp
TestConfig& withCleanEnvironment(bool inherit = true);
```
**Default:** `true` (inherits parent environment)
---
## Examples
**Note:** The examples below use helper functions from `RcclWrapTests.cpp`:
```cpp
// Helper to create a mock NCCL communicator with specified architecture and ranks
static void CreateMockComm(ncclComm_t &mockComm,
struct ncclTopoSystem &mockTopo,
struct ncclTopoNode &mockGpuNode,
const char *arch,
int nRanks);
// Helper to cleanup a mock communicator
static void CleanupMockComm(ncclComm_t &mockComm);
```
### Example 1: Testing Environment Variable Behavior
```cpp
TEST(Rcclwrap, EnvironmentVariableTests) {
// Test 1: With environment variable set
ProcessIsolatedTestRunner::registerTest(
ProcessIsolatedTestRunner::TestConfig(
"WithEnvVarSet",
[]() {
ncclComm_t mockComm = nullptr;
struct ncclTopoSystem mockTopo;
struct ncclTopoNode mockGpuNode;
CreateMockComm(mockComm, mockTopo, mockGpuNode, "gfx942", 128);
int chunkSize = RCCL_VALUE_UNSET;
rcclSetP2pNetChunkSize(mockComm, chunkSize);
// Should use default architecture-based value
EXPECT_EQ(chunkSize, 1 << 19);
CleanupMockComm(mockComm);
})
.withEnvironment({{"NCCL_P2P_NET_CHUNKSIZE", "999999"}})
);
// Test 2: Without environment variable (clean state)
ProcessIsolatedTestRunner::registerTest(
ProcessIsolatedTestRunner::TestConfig(
"WithoutEnvVar",
[]() {
// Verify environment is clean
const char* value = getenv("NCCL_P2P_NET_CHUNKSIZE");
EXPECT_EQ(value, nullptr);
// Test default behavior
ncclComm_t mockComm = nullptr;
struct ncclTopoSystem mockTopo;
struct ncclTopoNode mockGpuNode;
CreateMockComm(mockComm, mockTopo, mockGpuNode, "gfx942", 32);
int chunkSize = RCCL_VALUE_UNSET;
rcclSetP2pNetChunkSize(mockComm, chunkSize);
EXPECT_EQ(chunkSize, 1 << 17); // Default for < 64 ranks
CleanupMockComm(mockComm);
})
);
// Execute both tests in isolated processes
bool passed = ProcessIsolatedTestRunner::executeAllTests();
EXPECT_TRUE(passed);
}
```
### Example 2: Testing Multiple Architectures
```cpp
TEST(Rcclwrap, ArchitectureTests) {
struct TestCase {
std::string name;
std::string arch;
int ranks;
int expectedChunkSize;
};
std::vector<TestCase> testCases = {
{"GFX942_SmallRanks", "gfx942", 32, 1 << 17},
{"GFX942_LargeRanks", "gfx942", 128, 1 << 19},
{"GFX950_SmallRanks", "gfx950", 8, 1 << 17},
{"GFX950_MediumRanks", "gfx950", 24, 1 << 18},
{"GFX950_LargeRanks", "gfx950", 64, 1 << 19},
};
for (const auto& tc : testCases) {
ProcessIsolatedTestRunner::registerTest(
ProcessIsolatedTestRunner::TestConfig(
tc.name,
[tc]() {
ncclComm_t mockComm = nullptr;
struct ncclTopoSystem mockTopo;
struct ncclTopoNode mockGpuNode;
CreateMockComm(mockComm, mockTopo, mockGpuNode, tc.arch.c_str(), tc.ranks);
int chunkSize = RCCL_VALUE_UNSET;
rcclSetP2pNetChunkSize(mockComm, chunkSize);
EXPECT_EQ(chunkSize, tc.expectedChunkSize)
<< "Failed for " << tc.arch << " with " << tc.ranks << " ranks";
CleanupMockComm(mockComm);
})
);
}
ProcessIsolatedTestRunner::ExecutionOptions options;
options.verboseLogging = true;
options.stopOnFirstFailure = false; // Run all tests even if one fails
bool passed = ProcessIsolatedTestRunner::executeAllTests(options);
EXPECT_TRUE(passed);
}
```
### Example 3: Testing with Timeouts
```cpp
TEST(Rcclwrap, TimeoutHandling) {
// Test that completes quickly
ProcessIsolatedTestRunner::registerTest(
ProcessIsolatedTestRunner::TestConfig(
"FastTest",
[]() {
EXPECT_TRUE(true);
})
.withTimeout(std::chrono::seconds(5))
);
// Test with longer timeout for complex operations
ProcessIsolatedTestRunner::registerTest(
ProcessIsolatedTestRunner::TestConfig(
"SlowTest",
[]() {
// Simulate slow operation
std::this_thread::sleep_for(std::chrono::seconds(2));
EXPECT_TRUE(true);
})
.withTimeout(std::chrono::seconds(10))
);
bool passed = ProcessIsolatedTestRunner::executeAllTests();
EXPECT_TRUE(passed);
}
```
### Example 4: Stop on First Failure
```cpp
TEST(Rcclwrap, CriticalTests) {
// Register multiple critical tests
ProcessIsolatedTestRunner::registerTest(
"CriticalTest1", []() { EXPECT_TRUE(checkCriticalCondition1()); });
ProcessIsolatedTestRunner::registerTest(
"CriticalTest2", []() { EXPECT_TRUE(checkCriticalCondition2()); });
ProcessIsolatedTestRunner::registerTest(
"CriticalTest3", []() { EXPECT_TRUE(checkCriticalCondition3()); });
// Stop on first failure - don't waste time if critical tests fail
ProcessIsolatedTestRunner::ExecutionOptions options;
options.stopOnFirstFailure = true;
bool passed = ProcessIsolatedTestRunner::executeAllTests(options);
EXPECT_TRUE(passed) << "Critical test suite failed";
}
```
---
## Best Practices
### 1. Use Macros for Simple Cases
```cpp
// ✅ GOOD: Simple and clean using macros
TEST(MyTest, SimpleIsolatedTest) {
RUN_ISOLATED_TEST("CheckSomething", []() {
EXPECT_TRUE(checkSomething());
});
}
// ❌ MORE VERBOSE: Manual registration (still valid for complex cases)
TEST(MyTest, SimpleIsolatedTest) {
ProcessIsolatedTestRunner::registerTest("CheckSomething", []() {
EXPECT_TRUE(checkSomething());
});
bool passed = ProcessIsolatedTestRunner::executeAllTests();
EXPECT_TRUE(passed);
}
```
### 2. Always Execute Registered Tests (When Using Manual API)
```cpp
TEST(MyTest, IsolatedTests) {
// Register tests
ProcessIsolatedTestRunner::registerTest(/* ... */);
// ✅ IMPORTANT: Don't forget to execute!
bool passed = ProcessIsolatedTestRunner::executeAllTests();
EXPECT_TRUE(passed);
}
```
**When Using Manual API (Optional Verification):**
You can verify that tests were registered and executed:
```cpp
TEST(MyTest, IsolatedTests) {
// Register tests
ProcessIsolatedTestRunner::registerTest("Test1", []() { /* ... */ });
ProcessIsolatedTestRunner::registerTest("Test2", []() { /* ... */ });
// Get count of registered tests
size_t registeredCount = ProcessIsolatedTestRunner::getTestCount();
EXPECT_EQ(registeredCount, 2) << "Expected 2 tests to be registered";
// Execute all tests (automatically clears after execution)
bool passed = ProcessIsolatedTestRunner::executeAllTests();
EXPECT_TRUE(passed);
// Optional: Verify execution count matches registration count
auto results = ProcessIsolatedTestRunner::getTestResults();
EXPECT_EQ(results.size(), registeredCount)
<< "Registered " << registeredCount << " but executed " << results.size();
}
```
### 3. Use Descriptive Test Names
```cpp
// ❌ BAD: Vague name
RUN_ISOLATED_TEST("Test1", []() { /* ... */ });
// ✅ GOOD: Descriptive name
RUN_ISOLATED_TEST("GFX942_LargeRanks_P2PChunkSize_ExpectHighValue",
[]() { /* ... */ }
);
```
### 4. Group Related Tests
```cpp
TEST(Rcclwrap, AllP2PChunkSizeTests) {
// Using macros to group related tests
RUN_ISOLATED_TESTS(
ProcessIsolatedTestRunner::TestConfig("GFX942_Test1", []() { ... }),
ProcessIsolatedTestRunner::TestConfig("GFX942_Test2", []() { ... }),
ProcessIsolatedTestRunner::TestConfig("GFX950_Test1", []() { ... }),
ProcessIsolatedTestRunner::TestConfig("GFX950_Test2", []() { ... })
);
}
```
### 5. Use Options for Better Control
```cpp
// For debugging: verbose + stop on failure
ProcessIsolatedTestRunner::ExecutionOptions debugOptions;
debugOptions.stopOnFirstFailure = true;
debugOptions.verboseLogging = true;
RUN_ISOLATED_TESTS_WITH_OPTIONS(debugOptions,
ProcessIsolatedTestRunner::TestConfig("Test1", []() { ... }),
ProcessIsolatedTestRunner::TestConfig("Test2", []() { ... })
);
// For CI: run all tests, collect all failures
ProcessIsolatedTestRunner::ExecutionOptions ciOptions;
ciOptions.stopOnFirstFailure = false;
ciOptions.verboseLogging = false;
RUN_ISOLATED_TESTS_WITH_OPTIONS(ciOptions,
ProcessIsolatedTestRunner::TestConfig("Test1", []() { ... }),
ProcessIsolatedTestRunner::TestConfig("Test2", []() { ... })
);
```
### 6. Set Appropriate Timeouts
```cpp
// ✅ GOOD: Different timeouts for different test types
RUN_ISOLATED_TESTS(
ProcessIsolatedTestRunner::TestConfig("QuickTest", []() { ... })
.withTimeout(std::chrono::seconds(5)),
ProcessIsolatedTestRunner::TestConfig("NormalTest", []() { ... })
.withTimeout(std::chrono::seconds(30)),
ProcessIsolatedTestRunner::TestConfig("SlowTest", []() { ... })
.withTimeout(std::chrono::seconds(120))
);
// ❌ BAD: Same long timeout for everything
RUN_ISOLATED_TESTS(
ProcessIsolatedTestRunner::TestConfig("Test1", []() { ... })
.withTimeout(std::chrono::seconds(300)),
ProcessIsolatedTestRunner::TestConfig("Test2", []() { ... })
.withTimeout(std::chrono::seconds(300))
);
```
### 7. Clean Up Resources in Tests
```cpp
RUN_ISOLATED_TEST("ResourceTest", []() {
ncclComm_t comm = nullptr;
struct ncclTopoSystem topo;
struct ncclTopoNode gpuNode;
CreateMockComm(comm, topo, gpuNode, "gfx942", 32);
try {
// Your test logic
EXPECT_TRUE(someTest(comm));
// ✅ GOOD: Clean up in all paths
CleanupMockComm(comm);
} catch (...) {
CleanupMockComm(comm);
throw;
}
});
```
### 8. Use RAII for GPU Resource Management
When tests allocate GPU memory, use RAII wrappers to ensure cleanup:
```cpp
// ✅ GOOD: RAII ensures cleanup even on failure
struct GPUBuffer {
void* ptr = nullptr;
size_t size;
GPUBuffer(size_t s) : size(s) {
hipError_t err = hipMalloc(&ptr, size);
ASSERT_EQ(err, hipSuccess);
}
~GPUBuffer() {
if (ptr) {
hipFree(ptr);
ptr = nullptr;
}
}
// Prevent copying
GPUBuffer(const GPUBuffer&) = delete;
GPUBuffer& operator=(const GPUBuffer&) = delete;
};
RUN_ISOLATED_TEST("GPUTest", []() {
GPUBuffer buffer(1024); // Automatically cleaned up
// ... test logic ...
// No manual cleanup needed - destructor handles it
});
// ❌ BAD: Manual cleanup can be forgotten
RUN_ISOLATED_TEST("GPUTest", []() {
void* buffer;
hipMalloc(&buffer, 1024);
// ... test logic ...
// If test fails before this line, buffer leaks!
hipFree(buffer);
});
```
### 9. Avoid GPU Initialization in Test Fixtures
When using process isolation, avoid initializing GPU resources in test fixture `SetUp()` methods:
```cpp
// ❌ BAD: GPU initialization in fixture (runs in parent process)
class GPUTests : public ::testing::Test {
protected:
void SetUp() override {
hipMalloc(&gpuBuffer, 1024); // Parent process - will pollute fork()!
}
void* gpuBuffer;
};
// ✅ GOOD: GPU initialization inside isolated test
class GPUTests : public ::testing::Test {
// Empty fixture or only CPU resources in SetUp()
};
TEST_F(GPUTests, MyTest) {
RUN_ISOLATED_TEST("GPUOperation", []() {
void* gpuBuffer;
hipMalloc(&gpuBuffer, 1024); // Child process only - safe!
// ... test logic ...
hipFree(gpuBuffer);
});
}
// ✅ EVEN BETTER: Use RAII + helper structure
struct GPUTestEnvironment {
void* buffer;
void setup() { hipMalloc(&buffer, 1024); }
void cleanup() { if (buffer) hipFree(buffer); }
~GPUTestEnvironment() { cleanup(); }
};
TEST_F(GPUTests, MyTest) {
RUN_ISOLATED_TEST("GPUOperation", []() {
GPUTestEnvironment env;
env.setup();
// ... test logic ...
env.cleanup(); // Explicit + destructor cleanup
});
}
```
---
## Troubleshooting
### Test Hangs / Times Out
**Symptom:** Test never completes, eventually times out.
**Solutions:**
1. Increase timeout: `.withTimeout(std::chrono::seconds(120))`
2. Check for deadlocks in test logic
3. Enable verbose logging to see where it hangs:
```cpp
options.verboseLogging = true;
```
### Environment Variables Not Being Set
**Symptom:** `getenv()` returns `nullptr` in test.
**Solutions:**
1. Verify environment variable name is correct
2. Check that you're calling `withEnvironment()`:
```cpp
config.withEnvironment({{"VAR_NAME", "value"}})
```
3. Verify the test is actually executing (check test name)
### Tests Pass Individually but Fail Together
**Symptom:** Individual tests pass, but fail when run in a suite.
**Cause:** This is the **exact problem** that ProcessIsolatedTestRunner solves!
**Solution:** Already solved - each test runs in isolated process. If you're still seeing this, check:
1. Are you using `executeAllTests()` correctly?
2. Are there shared external resources (files, network, etc.)?
### Fork Failures
**Symptom:** Error messages about fork() failing.
**Solutions:**
1. Check system resource limits: `ulimit -u` (max processes)
2. Reduce number of tests or run in smaller batches
3. Check for resource leaks in parent process
### Test Results Not Available
**Symptom:** `getTestResults()` returns empty vector.
**Solution:**
```cpp
// Call executeAllTests() first
ProcessIsolatedTestRunner::executeAllTests();
// Then get results
auto results = ProcessIsolatedTestRunner::getTestResults();
```
### Tests Registered but Never Executed
**Symptom:** Tests pass but you suspect they didn't actually run.
**Cause:** Forgot to call `executeAllTests()` after registration.
**Detection:**
```cpp
TEST(MyTest, IsolatedTests) {
// Register tests
ProcessIsolatedTestRunner::registerTest("Test1", []() { EXPECT_TRUE(true); });
ProcessIsolatedTestRunner::registerTest("Test2", []() { EXPECT_TRUE(true); });
// ❌ FORGOT TO CALL executeAllTests()!
// Later, when the test ends, registered tests are lost
}
```
**Solution:**
```cpp
TEST(MyTest, IsolatedTests) {
// Register tests
ProcessIsolatedTestRunner::registerTest("Test1", []() { EXPECT_TRUE(true); });
ProcessIsolatedTestRunner::registerTest("Test2", []() { EXPECT_TRUE(true); });
// ✅ ALWAYS execute registered tests
bool passed = ProcessIsolatedTestRunner::executeAllTests();
EXPECT_TRUE(passed);
// ✅ Optionally verify execution count
auto results = ProcessIsolatedTestRunner::getTestResults();
EXPECT_EQ(results.size(), 2) << "Expected 2 tests to execute";
}
```
**Prevention:** Always verify that `getTestResults().size()` matches your expected number of tests:
```cpp
// After execution
auto results = ProcessIsolatedTestRunner::getTestResults();
EXPECT_EQ(results.size(), expectedTestCount)
<< "Test count mismatch - some tests may not have executed";
```
---
## Implementation Details
### How It Works
1. **Registration Phase:**
- Tests are registered into a static vector
- Each test gets a `TestConfig` with name, logic, and environment
2. **Execution Phase:**
- Parent process iterates through registered tests
- For each test:
- `fork()` creates a child process
- Child applies environment variables
- Child executes test logic
- Parent waits for child to complete
- Result is collected and stored
3. **Result Collection:**
- Exit codes are captured from child processes
- Timing information is recorded
- All results stored in static vector
4. **Automatic Cleanup:**
- After execution completes, `executeAllTests()` automatically clears all test registrations and results
- This ensures a clean state for the next test suite without manual intervention
### Exit Codes
```cpp
enum RcclTestCode {
RCCL_TEST_SUCCESS = 0, // Test passed
RCCL_TEST_FAILURE = 1, // Test failed (assertion)
RCCL_TEST_UNKNOWN_EXCEPTION = 2, // Uncaught exception
RCCL_TEST_TIMEOUT = 3, // Test timed out
RCCL_TEST_SKIPPED = 4 // Test was skipped
};
```
### Thread Safety
The framework uses mutexes for thread-safe operations:
- Test registration (write)
- Result recording (write)
- Result retrieval (read)
---
## Limitations
1. **Process Overhead:** Each test creates a new process (fork overhead)
2. **Sequential Execution:** Tests run one at a time (not parallel)
3. **Linux/Unix Only:** Uses `fork()` - not available on Windows
4. **Memory Duplication:** Each forked process duplicates memory
5. **No Shared State:** Tests cannot share data between processes
---
## FAQ
**Q: When should I use ProcessIsolatedTestRunner vs regular Google Test?**
A: Use ProcessIsolatedTestRunner when:
- Testing code with static variables
- Testing environment variable behavior
- Testing one-time initialization
- Need guaranteed clean state between tests
Use regular Google Test when:
- Tests are truly independent
- No static state concerns
- Need parallel execution
- Testing simple units
**Q: Can I use this with MPI tests?**
A: Not directly. Process Isolated test runner is for single-process tests. For MPI tests, use `MPI Test Runner` instead. Process Isolated test runner is currently hooked into `rccl-UnitTestsFixtures` binary and MPI test runner is hooked into `rccl-UnitTestsMPI` binary. These are two independent implementation.
**Q: How do I debug a test that's running in an isolated process?**
A:
1. Enable verbose logging
2. Add print statements in your test lambda
3. Temporarily run the test logic outside the framework
4. Use GDB
**Q: Can I run tests in parallel?**
A: No, the current implementation only supports sequential execution.
**Q: Does this work with CTest/CMake?**
A: Yes! The tests are still Google Test cases, so they work with standard test runners.
**Q: Should I use the macros or the manual API?**
A: Use the macros (`RUN_ISOLATED_TEST`, `RUN_ISOLATED_TESTS`, etc.) for most cases - they're simpler and less error-prone. Use the manual API (`registerTest()` + `executeAllTests()`) only when you need more control over the registration/execution flow, such as:
- Dynamically generating test configurations at runtime
- Sharing test registration logic across multiple TEST blocks
- Advanced control flow scenarios
**Q: Do tests run automatically after registration, or do I need to call executeAllTests()?**
A: **You MUST call `executeAllTests()` explicitly.** Tests do NOT run automatically. If you forget to call it, your tests will be silently ignored. Always follow this pattern:
```cpp
TEST(MyTest, IsolatedTests) {
ProcessIsolatedTestRunner::registerTest("MyTest", []() { /* ... */ });
// ✅ REQUIRED: Execute the tests
bool passed = ProcessIsolatedTestRunner::executeAllTests();
EXPECT_TRUE(passed);
}
```
**Q: How can I detect if I forgot to execute registered tests?**
A: After `executeAllTests()`, verify that `getTestResults().size()` matches your expected test count:
```cpp
// Register N tests
ProcessIsolatedTestRunner::registerTest("Test1", []() { /* ... */ });
ProcessIsolatedTestRunner::registerTest("Test2", []() { /* ... */ });
// Execute
bool passed = ProcessIsolatedTestRunner::executeAllTests();
// Verify count
auto results = ProcessIsolatedTestRunner::getTestResults();
EXPECT_EQ(results.size(), 2) << "Expected 2 tests to run";
```
**Q: Do I need to call clear() manually?**
A: No. The `clear()` method is only useful for advanced use cases where you need to clear tests that were registered but never executed. If you manually call `clear()` when tests were registered but not executed, it will warn you:
```
⚠️ WARNING: ProcessIsolatedTestRunner::clear() called with 2 unexecuted test(s)!
Registered: 2 test(s)
Executed: 0 test(s)
Did you forget to call executeAllTests()?
```
---
## See Also
- **ProcessIsolatedTestRunner.hpp** - Full API documentation
- **ProcessIsolatedTestRunner.cpp** - Implementation details
- **RcclWrapTests.cpp** - Usage examples