A/B testing in Flutter apps

By Ann Tech · 1 March 2026

#flutter #ab-testing #firebase #experiments

A/B testing lets you compare two (or more) variants of a UI or flow and measure which performs better. In Flutter, you can run A/B tests with Firebase A/B Testing, which handles user assignment and experiment management and delivers each user's variant to the app through Firebase Remote Config parameters.

Setting up the experiment in Firebase

Firebase Console → Remote Config → Experiments (A/B Testing) → Create experiment
Choose "Remote Config" experiment type
Define:
- Target: percentage of users, platform filter, app version
- Variants: control (existing) and test variants
- Parameter: e.g., checkout_button_color: 'blue' vs 'green'
- Goal metric: conversion event, retention, or custom metric

Flutter implementation

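import 'package:firebase_remote_config/firebase_remote_config.dart';

// Assumes Firebase.initializeApp() has completed and that this code
// runs inside an async function.
final remoteConfig = FirebaseRemoteConfig.instance;

// In-app defaults are used until a fetch succeeds, so keep them
// identical to the control variant.
await remoteConfig.setDefaults({
  'checkout_button_color': 'blue',   // Control variant default
  'checkout_button_label': 'Buy Now',
});

// Fetch the latest values (including this user's experiment variant)
// and make them available to the getters below.
await remoteConfig.fetchAndActivate();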

// Read the variant this user is assigned to
final buttonColor = remoteConfig.getString('checkout_button_color');
final buttonLabel = remoteConfig.getString('checkout_button_label');
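
Remote Config values arrive as plain strings, so a typo in the console or a value meant for a newer app version can reach code that does not know it. One way to guard against that, sketched here in plain Dart, is to parse the raw string into an enum and fall back to the control variant. The names `CheckoutButtonColor` and `parseButtonColor` are illustrative and not part of any Firebase API.

```dart
/// Hypothetical enum for the values of the 'checkout_button_color' parameter.
enum CheckoutButtonColor { blue, green }

/// Maps a raw Remote Config string to a known variant.
/// Unknown or empty values fall back to the control variant (blue).
CheckoutButtonColor parseButtonColor(String raw) {
  final normalized = raw.trim().toLowerCase();
  for (final color in CheckoutButtonColor.values) {
    if (color.name == normalized) return color;
  }
  return CheckoutButtonColor.blue;
}
```

In the snippet above you would then call `parseButtonColor(remoteConfig.getString('checkout_button_color'))` and switch on the enum when building the button.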

Tracking the goal metric

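Firebase A/B Testing evaluates goal metrics from Google Analytics for Firebase events, so the app needs to log the event you chose as the experiment's goal:

import 'package:firebase_analytics/firebase_analytics.dart';

// Optional: log your own exposure event (which variant the user saw)
// for analysis outside the Firebase dashboard.
await FirebaseAnalytics.instance.logEvent(
  name: 'experiment_exposure',
  parameters: {
    'experiment_id': 'checkout_button_test',
    'variant': remoteConfig.getString('checkout_button_color'),
  },
);

// Log the conversion event. `order` is your app's own order model;
// `value` must be a double.
await FirebaseAnalytics.instance.logPurchase(
  currency: 'USD',
  value: order.total,
);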

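Custom assignment without Firebase A/B Testing

If you want full control over assignment, you can bucket users yourself and still record exposure and conversion events through Firebase Analytics: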

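import 'dart:convert';

import 'package:firebase_analytics/firebase_analytics.dart';
import 'package:firebase_auth/firebase_auth.dart';

class ABTestService {
  final _analytics = FirebaseAnalytics.instance;

  /// Assigns user to a variant deterministically based on their user ID.
  /// Same user always gets the same variant.
  String getVariant(String experimentId, List<String> variants) {
    // Signed-out users all share the 'anonymous' bucket; use a persisted
    // install ID instead if you test signed-out flows.
    final userId = FirebaseAuth.instance.currentUser?.uid ?? 'anonymous';
    final index = _stableHash('${experimentId}_$userId') % variants.length;
    return variants[index];
  }

  /// 32-bit FNV-1a hash. String.hashCode is avoided because it is not
  /// guaranteed to be stable across platforms or Dart versions.
  int _stableHash(String input) {
    var hash = 0x811c9dc5;
    for (final byte in utf8.encode(input)) {
      hash ^= byte;
      // Multiply by the FNV prime 0x01000193, split so it stays exact on web.
      hash = ((hash << 24) + hash * 0x193) & 0xffffffff;
    }
    return hash;
  }

  /// Log which variant the user was shown.
  void logExposure(String experimentId, String variant) {
    _analytics.logEvent(
      name: 'ab_exposure',
      parameters: {
        'experiment_id': experimentId,
        'variant': variant,
      },
    );
  }

  /// Log a conversion.
  void logConversion(String experimentId, {Map<String, Object>? extra}) {
    _analytics.logEvent(
      name: 'ab_conversion',
      parameters: {
        'experiment_id': experimentId,
        ...?extra,
      },
    );
  }
}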

Usage:

final abTest = ref.read(abTestServiceProvider);
final variant = abTest.getVariant('checkout_v2', ['control', 'variant_a']);
abTest.logExposure('checkout_v2', variant);

// Show the right UI
if (variant == 'variant_a') {
  return const NewCheckoutFlow();
}
return const LegacyCheckoutFlow();

Multivariate testing

A multivariate test varies several parameters in the same experiment, with each variant defined as one combination of parameter values:

// Firebase Remote Config parameters:
// checkout_layout: 'single_page' | 'multi_step'
// checkout_cta: 'Buy Now' | 'Complete Order' | 'Place Order'
// checkout_trust_badges: true | false

final layout = remoteConfig.getString('checkout_layout');
final cta = remoteConfig.getString('checkout_cta');
final showTrustBadges = remoteConfig.getBool('checkout_trust_badges');

return CheckoutScreen(
  layout: layout,
  ctaText: cta,
  showTrustBadges: showTrustBadges,
);
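
The three parameters above have 2 × 3 × 2 = 12 possible combinations, and every combination you include as a variant needs its own share of traffic. A small sketch in plain Dart that enumerates the cells of a full factorial design; the `factorialCells` helper is hypothetical and only meant to make the count visible before you configure variants.

```dart
/// Returns every combination of the given parameter values (full factorial).
List<Map<String, Object>> factorialCells(Map<String, List<Object>> parameters) {
  var cells = <Map<String, Object>>[{}];
  for (final entry in parameters.entries) {
    // Extend each existing cell with every value of the next parameter.
    cells = [
      for (final cell in cells)
        for (final value in entry.value) {...cell, entry.key: value},
    ];
  }
  return cells;
}

final checkoutCells = factorialCells({
  'checkout_layout': ['single_page', 'multi_step'],
  'checkout_cta': ['Buy Now', 'Complete Order', 'Place Order'],
  'checkout_trust_badges': [true, false],
});
```

If 12 cells is more than your traffic can support, test a subset of combinations or run the parameters as separate experiments.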

Statistical significance

A/B testing requires an adequate sample size before you can trust the results. Firebase A/B Testing reports how confident it is in each variant's result in the dashboard. Don't conclude an experiment early just because one variant is ahead.

Rules of thumb:

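- Run for at least 1-2 weeks to cover weekly cycle effects
- Aim for 95% confidence before declaring a winner
- Don't stop the test the moment you see a result you like; checking repeatedly and stopping on a favorable reading ("peeking", a form of p-hacking) inflates false positives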
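
To make "adequate sample size" concrete, a common approximation for comparing two conversion rates p1 and p2 is: users per variant = (zAlpha + zBeta)² × (p1(1 − p1) + p2(1 − p2)) / (p1 − p2)². The sketch below applies it in plain Dart, assuming a two-sided test at 95% confidence (z = 1.96) and 80% power (z ≈ 0.8416). The function name and these defaults are assumptions for illustration, and this is not how Firebase computes its own results.

```dart
/// Approximate users needed per variant to detect a change in conversion
/// rate from [baselineRate] to [expectedRate] (two-proportion z-test).
int requiredSamplePerVariant(
  double baselineRate,
  double expectedRate, {
  double zAlpha = 1.96, // two-sided, 95% confidence
  double zBeta = 0.8416, // 80% power
}) {
  final delta = expectedRate - baselineRate;
  if (delta == 0) {
    throw ArgumentError('expectedRate must differ from baselineRate');
  }
  // Sum of the Bernoulli variances of the two variants.
  final variance = baselineRate * (1 - baselineRate) +
      expectedRate * (1 - expectedRate);
  final z = zAlpha + zBeta;
  return (z * z * variance / (delta * delta)).ceil();
}
```

Under these assumptions, detecting a lift from a 5% to a 6% conversion rate (`requiredSamplePerVariant(0.05, 0.06)`) needs a little over 8,000 users in each variant, which is often more than a few days of traffic.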

Experiment lifecycle

Design: What are you testing? What is the success metric?
Launch: Start with a small percentage (10-20%) to catch bugs
Monitor: Watch for crashes or negative UX signals in the test variant
Analyze: Wait for statistical significance
Ship: Roll out the winner to 100%
Clean up: Remove the losing variant code and the experiment flag
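
When you run your own assignment rather than Firebase A/B Testing, the "start with a small percentage" step can be done in app code. A minimal sketch in plain Dart, assuming a 32-bit FNV-1a hash of the experiment and user ID mapped to a bucket from 0 to 99; `fnv1a32`, `stableBucket`, and `isInRollout` are hypothetical helpers. With Firebase A/B Testing you would set the percentage in the console instead.

```dart
import 'dart:convert';

/// 32-bit FNV-1a hash. Unlike String.hashCode, the result does not depend
/// on the platform or Dart version.
int fnv1a32(String input) {
  var hash = 0x811c9dc5;
  for (final byte in utf8.encode(input)) {
    hash ^= byte;
    // Multiply by the FNV prime 0x01000193, split so it stays exact on web.
    hash = ((hash << 24) + hash * 0x193) & 0xffffffff;
  }
  return hash;
}

/// Maps a user to a stable bucket in the range 0..99 for one experiment.
int stableBucket(String experimentId, String userId) =>
    fnv1a32('$experimentId:$userId') % 100;

/// True if the user falls inside the first [percent] buckets.
bool isInRollout(String experimentId, String userId, int percent) =>
    stableBucket(experimentId, userId) < percent;
```

Because the bucket is fixed per user and experiment, raising `percent` from 10 to 50 keeps the original 10% in the experiment and only adds new users.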

Common pitfalls

Testing too many things at once. If you change color, layout, AND copy simultaneously, you can't know which change caused the result. Test one change at a time (unless running a factorial experiment intentionally).

Not logging exposure. If you A/B test a feature but only 50% of users scroll far enough to see it, measuring conversion over the full cohort dilutes the effect with users who never saw either variant. Log exposure when the user actually sees the variant.
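
One way to tie exposure to what the user actually saw is to log it from the widget that renders the variant, and to guard against duplicate events when that widget rebuilds. A minimal sketch in plain Dart; `ExposureTracker` and its callback are hypothetical, and in the app the callback would forward to your analytics client.

```dart
/// Signature of the function that records an exposure event.
typedef ExposureLogger = void Function(String experimentId, String variant);

/// Logs each experiment's exposure at most once per tracker instance
/// (for example, once per app session).
class ExposureTracker {
  ExposureTracker(this._log);

  final ExposureLogger _log;
  final Set<String> _seen = {};

  /// Call when the variant is actually rendered on screen.
  /// Returns true if an exposure event was logged by this call.
  bool markSeen(String experimentId, String variant) {
    // Set.add returns false when the experiment was already recorded.
    if (!_seen.add(experimentId)) return false;
    _log(experimentId, variant);
    return true;
  }
}
```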

Stopping tests early. The "winner" at day 3 is often not the winner at day 14. Novelty effects, weekly cycles, and small sample sizes all produce misleading early results.