Experiments
A/B test playbook strategies to find what works best for your users
Experiments let you split users into control and test groups to compare different intervention strategies. Retivo handles assignment, tracking, and statistical analysis — experiments auto-conclude when results are significant.
How It Works
Create Experiment (playbook + variant config)
│
├── Start → Users randomly assigned
│ ├── Control: playbook defaults
│ └── Test: overridden strategy_hints
│
├── Outcomes tracked per variant
│
└── Auto-concludes when p < 0.05 (two-proportion z-test)
- Pick a playbook to test
- Define what's different in the test variant (channel, tone, timing, etc.)
- Set the traffic split (e.g., 50/50)
- Start the experiment — Retivo assigns users deterministically
- Results update daily. When statistical significance is reached, Retivo auto-concludes and declares a winner
Create an Experiment
curl -X POST https://retivo.ai/api/experiments \
-H "Authorization: Bearer rt_live_..." \
-H "Content-Type: application/json" \
-d '{
"playbook_id": "pb_abc123",
"name": "Re-engagement: email vs in-app",
"variant_config": {
"control": { "weight": 0.5 },
"test": {
"weight": 0.5,
"strategy_hints_override": {
"preferred_channel": "in_app",
"tone": "casual"
}
}
}
}'
Response:
{
"id": "exp_xyz789",
"name": "Re-engagement: email vs in-app",
"status": "draft"
}
Variant Config
| Field | Type | Description |
|---|---|---|
| control.weight | number (0-1) | Fraction of users in control group |
| test.weight | number (0-1) | Fraction of users in test group |
| test.strategy_hints_override | object | Strategy hints that override the playbook defaults for the test group |
Weights must sum to 1.0. Common splits: 50/50, 70/30 (when you want to limit exposure to the test variant).
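The weight rule above can be checked on the client before the request is sent. The following is a minimal sketch, not Retivo's own validation: the function name `validate_variant_config` and the tolerance are assumptions made for illustration.

```python
import math


def validate_variant_config(variant_config: dict) -> None:
    """Raise ValueError if a weight is out of range or the weights do not sum to 1.0.

    Hypothetical client-side helper; the server may apply different checks.
    """
    total = 0.0
    for variant in ("control", "test"):
        weight = variant_config[variant]["weight"]
        if not 0 <= weight <= 1:
            raise ValueError(f"{variant}.weight must be between 0 and 1, got {weight}")
        total += weight
    # Compare with a tolerance because floating-point sums are not always exact.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights must sum to 1.0, got {total}")


# A 70/30 split that limits exposure to the test variant.
validate_variant_config({
    "control": {"weight": 0.7},
    "test": {"weight": 0.3, "strategy_hints_override": {"tone": "casual"}},
})
```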
What You Can Test
The strategy_hints_override can change any field the decision engine reads:
| Override | What it tests |
|---|---|
| preferred_channel: "in_app" | Email vs in-app delivery |
| tone: "casual" | Formal vs casual message tone |
| focus: "highlight new features" | Different messaging strategy |
| cooldown_hours: 24 | Contact frequency |
Start an Experiment
Experiments start in draft status. Start when ready:
curl -X PUT https://retivo.ai/api/experiments/{id}/start \
-H "Authorization: Bearer rt_live_..."
Once running, every user evaluated against the associated playbook is deterministically assigned to a variant using SHA256(experiment_id:user_id). The same user always gets the same variant.
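To illustrate how hash-based assignment can be both deterministic and evenly spread, here is a minimal sketch. Only the hash input `experiment_id:user_id` comes from the text above. How Retivo turns the digest into a variant is not documented here, so the bucketing step (first 4 bytes scaled to [0, 1) and compared against the control weight) and the name `assign_variant` are assumptions.

```python
import hashlib


def assign_variant(experiment_id: str, user_id: str, control_weight: float = 0.5) -> str:
    """Map a user to 'control' or 'test' from SHA256(experiment_id:user_id)."""
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode("utf-8")).digest()
    # Assumed bucketing: read the first 4 bytes as an integer and scale to [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return "control" if bucket < control_weight else "test"


# The same inputs always produce the same variant.
first = assign_variant("exp_xyz789", "user_42")
second = assign_variant("exp_xyz789", "user_42")
print(first == second)
```

Because the variant depends only on the two IDs, no assignment table is needed to keep a user in the same group across evaluations.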
View Results
curl https://retivo.ai/api/experiments/{id}/results \
-H "Authorization: Bearer rt_live_..."
Response:
{
"experiment": {
"id": "exp_xyz789",
"name": "Re-engagement: email vs in-app",
"status": "running",
"winner": null
},
"assignments": {
"control": 142,
"test": 138
},
"outcomes": [
{ "variant": "control", "outcome_type": "positive", "count": 45 },
{ "variant": "control", "outcome_type": "negative", "count": 28 },
{ "variant": "control", "outcome_type": "neutral", "count": 12 },
{ "variant": "test", "outcome_type": "positive", "count": 58 },
{ "variant": "test", "outcome_type": "negative", "count": 22 },
{ "variant": "test", "outcome_type": "neutral", "count": 15 }
]
}
Understanding Results
- Assignments: How many users were placed in each variant
- Outcomes: Positive/negative/neutral outcomes per variant, tracked over a 7-day attribution window
- Positive rate: positive / (positive + negative + neutral) per variant
- Lift: (test_rate - control_rate) / control_rate × 100%
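The two formulas above can be applied directly to the outcomes array of the results response. This is a small sketch using the sample counts from that response; the helper names `positive_rates` and `lift_percent` are made up for illustration.

```python
def positive_rates(outcomes: list) -> dict:
    """Return positive / (positive + negative + neutral) for each variant."""
    totals = {}
    positives = {}
    for row in outcomes:
        variant = row["variant"]
        totals[variant] = totals.get(variant, 0) + row["count"]
        if row["outcome_type"] == "positive":
            positives[variant] = positives.get(variant, 0) + row["count"]
    # Skip variants with no outcomes yet to avoid dividing by zero.
    return {v: positives.get(v, 0) / n for v, n in totals.items() if n > 0}


def lift_percent(control_rate: float, test_rate: float) -> float:
    """Relative improvement of test over control, as a percentage."""
    return (test_rate - control_rate) / control_rate * 100


outcomes = [
    {"variant": "control", "outcome_type": "positive", "count": 45},
    {"variant": "control", "outcome_type": "negative", "count": 28},
    {"variant": "control", "outcome_type": "neutral", "count": 12},
    {"variant": "test", "outcome_type": "positive", "count": 58},
    {"variant": "test", "outcome_type": "negative", "count": 22},
    {"variant": "test", "outcome_type": "neutral", "count": 15},
]
rates = positive_rates(outcomes)
print(f"{lift_percent(rates['control'], rates['test']):.1f}%")
```

For the sample response this gives a control rate of 45/85, a test rate of 58/95, and a lift of about 15.3%.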
Auto-Conclusion
Retivo's daily learning cron analyzes running experiments using a two-proportion z-test. When the p-value drops below 0.05, the experiment auto-concludes:
- Winner declared: The variant with the higher positive outcome rate
- Playbook updated: If the test variant wins, its strategy_hints_override is promoted to the playbook's default hints (via the tuning log)
- Status: Changes to concluded
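For readers who want to reproduce the significance check, the sketch below implements a standard two-proportion z-test. It assumes a pooled standard error, a two-sided p-value, and the positive rate defined earlier (all outcomes in the denominator). Retivo's exact variant of the test is not specified here, so treat this as an approximation rather than the production calculation.

```python
import math


def two_proportion_z_test(pos_a: int, n_a: int, pos_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    rate_a = pos_a / n_a
    rate_b = pos_b / n_b
    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (pos_a + pos_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0, 1.0  # all outcomes identical, no detectable difference
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Sample response: control 45 positive of 85 outcomes, test 58 of 95.
z, p = two_proportion_z_test(45, 85, 58, 95)
print(f"z={z:.2f} p={p:.2f} significant={p < 0.05}")
```

With the sample counts, z is roughly 1.10 and the p-value roughly 0.27, which is consistent with the experiment still showing status running.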
You can also manually cancel an experiment:
curl -X PUT https://retivo.ai/api/experiments/{id}/cancel \
-H "Authorization: Bearer rt_live_..."
Experiment Lifecycle
draft ──► running ──► concluded (auto, when significant)
│ │
└──► cancelled ◄──┘ (manual)
| Status | Description |
|---|---|
| draft | Created but not active. No users assigned yet. |
| running | Active. Users being assigned and outcomes tracked. |
| concluded | Statistically significant result found. Winner declared. |
| cancelled | Manually stopped. No winner declared. |
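Client code that tracks experiment status may find it useful to encode the lifecycle as a transition table. The sketch below is one reading of the diagram above (cancel is allowed from draft and from running, and concluded and cancelled are terminal). The names are hypothetical and not part of the Retivo API.

```python
# Allowed status transitions, as read from the lifecycle diagram.
ALLOWED_TRANSITIONS = {
    "draft": {"running", "cancelled"},
    "running": {"concluded", "cancelled"},
    "concluded": set(),   # terminal: winner declared
    "cancelled": set(),   # terminal: no winner declared
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the lifecycle allows moving from current to target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


print(can_transition("draft", "running"))
print(can_transition("concluded", "running"))
```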
Dashboard
Experiments can also be managed from the dashboard at Insights → Experiments. The UI provides:
- Create experiments with a visual form
- Start/cancel with one click
- Live results with outcome comparison bars and lift calculation
- Winner badge when concluded
Best Practices
- Run one experiment per playbook at a time. Multiple experiments on the same playbook will interfere with each other's results.
- Wait for significance. Don't cancel an experiment early because of interim results; a cancelled experiment declares no winner, while auto-conclusion waits until the difference is statistically significant (p < 0.05).
- Start with 50/50 splits. Unless you have a strong reason to limit exposure, equal splits reach significance faster.
- Test one variable at a time. If you change channel AND tone simultaneously, you won't know which change drove the result.
- Minimum sample size. Experiments typically need 50-100 outcomes per variant to reach significance, depending on effect size.