A/B testing in Flutter apps
By Ann Tech · 1 March 2026
A/B testing lets you compare two (or more) variants of a UI or flow and measure which performs better. In Flutter, you can run A/B tests with Firebase Remote Config, which handles user assignment and experiment management.
Setting up the experiment in Firebase
- Firebase Console → Remote Config → Experiments (A/B Testing) → Create experiment
- Choose "Remote Config" experiment type
- Define:
- Target: percentage of users, platform filter, app version
- Variants: control (existing) and test variants
- Parameter: e.g.,
checkout_button_color: 'blue'vs'green' - Goal metric: conversion event, retention, or custom metric
Flutter implementation
final remoteConfig = FirebaseRemoteConfig.instance;
await remoteConfig.setDefaults({
'checkout_button_color': 'blue', // Control variant default
'checkout_button_label': 'Buy Now',
});
await remoteConfig.fetchAndActivate();
// Read the variant this user is assigned to
final buttonColor = remoteConfig.getString('checkout_button_color');
final buttonLabel = remoteConfig.getString('checkout_button_label');
Tracking the goal metric
Firebase A/B Testing measures Firebase Analytics events. Log the conversion event:
// Log the experiment exposure (which variant the user saw)
FirebaseAnalytics.instance.logEvent(
name: 'experiment_exposure',
parameters: {
'experiment_id': 'checkout_button_test',
'variant': remoteConfig.getString('checkout_button_color'),
},
);
// Log the conversion event
FirebaseAnalytics.instance.logPurchase(
currency: 'USD',
value: order.total,
);
Custom A/B test without Firebase
If you want full control over assignment:
class ABTestService {
final _config = FirebaseRemoteConfig.instance;
final _analytics = FirebaseAnalytics.instance;
/// Assigns user to a variant deterministically based on their user ID.
/// Same user always gets the same variant.
String getVariant(String experimentId, List<String> variants) {
final userId = FirebaseAuth.instance.currentUser?.uid ?? 'anonymous';
final hash = '${experimentId}_${userId}'.hashCode.abs();
final index = hash % variants.length;
return variants[index];
}
/// Log which variant the user was shown.
void logExposure(String experimentId, String variant) {
_analytics.logEvent(
name: 'ab_exposure',
parameters: {
'experiment_id': experimentId,
'variant': variant,
},
);
}
/// Log a conversion.
void logConversion(String experimentId, {Map<String, Object>? extra}) {
_analytics.logEvent(
name: 'ab_conversion',
parameters: {
'experiment_id': experimentId,
...?extra,
},
);
}
}
Usage:
final abTest = ref.read(abTestServiceProvider);
final variant = abTest.getVariant('checkout_v2', ['control', 'variant_a']);
abTest.logExposure('checkout_v2', variant);
// Show the right UI
if (variant == 'variant_a') {
return const NewCheckoutFlow();
}
return const LegacyCheckoutFlow();
Multivariate testing
Test multiple parameters simultaneously:
// Firebase Remote Config parameters:
// checkout_layout: 'single_page' | 'multi_step'
// checkout_cta: 'Buy Now' | 'Complete Order' | 'Place Order'
// checkout_trust_badges: true | false
final layout = remoteConfig.getString('checkout_layout');
final cta = remoteConfig.getString('checkout_cta');
final showTrustBadges = remoteConfig.getBool('checkout_trust_badges');
return CheckoutScreen(
layout: layout,
ctaText: cta,
showTrustBadges: showTrustBadges,
);
Statistical significance
A/B testing requires adequate sample size before you can trust results. Firebase A/B Testing shows confidence levels in the dashboard — don't conclude an experiment early just because one variant is ahead.
Rules of thumb:
- Run for at least 1-2 weeks to catch weekly cycle effects
- Aim for 95% confidence before declaring a winner
- Don't stop the test when you see a result you like ("p-hacking")
Experiment lifecycle
- Design: What are you testing? What is the success metric?
- Launch: Start with a small percentage (10-20%) to catch bugs
- Monitor: Watch for crashes or negative UX signals in the test variant
- Analyze: Wait for statistical significance
- Ship: Roll out the winner to 100%
- Clean up: Remove the losing variant code and the experiment flag
Common pitfalls
Testing too many things at once. If you change color, layout, AND copy simultaneously, you can't know which change caused the result. Test one change at a time (unless running a factorial experiment intentionally).
Not logging exposure. If you A/B test a feature but only 50% of users scroll to see it, logging conversion on the full cohort is wrong. Log exposure when the user actually sees the variant.
Stopping tests early. The "winner" at day 3 is often not the winner at day 14. Novelty effects, weekly cycles, and small sample sizes all produce misleading early results.
Sign in to like, dislike, or report.