Written by Amber Wang and Yoonji Kim at Lyft.
Background
Whenever you use the Lyft app, there is a complex balancing act happening behind the scenes. Various levers are used to keep the marketplace running smoothly: base prices and coupons for riders affect demand, while driver pay and bonuses affect the level of available supply. Because every change to prices and payments impacts Lyft’s costs and revenue, these levers give rise to key optimization problems, such as:
- How should we allocate budget between driver incentives and rider incentives?
- How do we invest resources to achieve x% ride growth, and how much does that cost in terms of short-term profit?
These are the questions the Foundational Models team at Lyft tries to answer in a systematic way. A key ingredient is understanding the effects of different types of investments — for instance, what will happen if we increase the total budget for driver incentives by x%? What will happen if we increase the rider price of all rides by y%? It’s worth noting that the long term effects of such decisions tend to dominate the short term effects: we may earn more short term profit from a ride if we charge riders more and pay drivers less, but lose riders and drivers in the long run.
Estimating the long term effects of resource allocation decisions is challenging in a multi-sided marketplace such as Lyft. Because these decisions tend to be consequential, their effects go beyond first order effects on directly affected users. For example, if we increase driver incentive spending by x% in week 1, drivers will drive more in week 1 (short term effect), and may return to drive a bit more in the following weeks (direct long term effects). But this is not the full picture: in week 1, when driver hours increase as a result of more incentives, riders will enjoy better experiences (e.g. less surge pricing, shorter wait times) and may want to return to Lyft in the future. However, more driver hours in the market also mean the average driver is less busy, and that idleness may discourage them from driving in the future. An analogous dynamic holds for any decision that boosts short term demand — riders are more likely to encounter bad experiences such as higher surge prices and longer wait times, whereas drivers may become busier and earn a bit more. At Lyft, we refer to such indirect effects as the “market-mediated effects.” Compared to the direct long term effects, these market-mediated long term effects are much harder to estimate.
Below is an illustration of the causal relationships we consider for the policy change “increase driver incentive spend.”
Summary of our solution
This article introduces the methodology framework we use to estimate the “market-mediated long term effects,” the more challenging component of the long-term effects. In short, we use a two-step approach that can be thought of as a surrogacy approach in its broad sense. Under the assumption that the “market-mediated” outcomes are fully mediated through negative user experiences, (1) we first estimate how our policy changes affect the distribution of core negative user experiences, then (2) estimate how negative user experiences affect users’ future behavior.
Both steps are done with observational causal inference, allowing fast and inexpensive updating. Verification is another major challenge. There is no single form of experiment that can provide a perfect verification; therefore, we have to combine multiple imperfect signals. Specifically, the first step methodology can be verified using a switch-back experiment, and the second step of estimation can in principle be verified using user-split experiments.
As the last step, we combine the direct long term effects (estimated separately) with the market-mediated long term effects and use region-split experiments to verify the overall long term effects. Region-split experiments generally suffer from poor pre-intervention fit and low power; we developed a forward selection algorithm that optimizes the experiment design by picking the treated and control regions.
Below, we describe our methodology in detail.
Step 1: from decisions to negative user experiences
We start with a question: when we adjust a policy decision today (e.g., raise the rider price water-level or change driver incentive budgets), how does the short-term user experience shift?
What we model: How does a policy change affect short-term user experiences? We focus on a set of negative user experiences that matter most downstream — such as long waits, high surge, and cancellations for riders, or lower hourly earnings, idleness, and reduced incentive earnings for drivers — and maintain the critical assumption that these are the only channels through which today’s decisions affect the future (by affecting today’s market). The goal of this step is to estimate the effect of our policy decisions on the distribution of these negative user experiences.
The challenge is that negative user experiences are highly cyclical and seasonal, varying with time‑of‑week, holidays, weather, and shifting supply/demand. A naive regression would blend these rhythms with the true effect of our decisions. Therefore, our approach:
- Residualizes predictable time-of-week patterns so we compare like-with-like, which mimics a “random shock” by capturing the deviation from this period’s normal status.
- Controls for remaining market information such as supply and demand so the coefficient of our policy decision reflects its incremental impact on negative user experience.
How we estimate (observational, residualized regressions): Conceptually, for any negative user experience metric (e.g., wait time), we fit a residualized model on deviations from the market’s own baseline:
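As a sketch (our own notation here, not necessarily the exact production specification), such a residualized regression takes the form:

```latex
\Delta Y_t \;=\; \beta_{\text{policy}}\,\Delta \text{Policy}_t \;+\; \gamma^\top \Delta X_t \;+\; \varepsilon_t
```

where each Δ term is a deviation from its time-of-week baseline: ΔY_t is the negative-experience metric (e.g., wait time), ΔPolicy_t is the policy lever, and ΔX_t collects residualized market controls such as supply and demand.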
Below is a visualization of a simulated example with one contextual variable. With residualized terms, we can capture how a policy change would marginally change negative experience (green solid line).
The coefficient beta policy tells us how much negative user experience moves when a policy decision (like rider price water‑level or incentive spend) deviates from its usual level, after accounting for typical time‑of‑week rhythms and market conditions. Because we measure everything as “deviation from normal,” the effects read like elasticities around everyday operating conditions. For example, a higher‑than‑usual price suppresses demand and hence is associated with a predictable decrease in negative rider experience (e.g. high surge, long wait time), holding everything else constant. The output is a calibrated response function that turns a policy change into a forecasted shift in the distribution of negative user experience, not just its average, with uncertainty to reflect real‑world variability.
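A minimal sketch of this residualize-then-regress idea on synthetic data (column names, the single control, and the plain OLS setup are illustrative assumptions, not Lyft's production model):

```python
import numpy as np
import pandas as pd

np.random.seed(0)

# Synthetic stand-in data; "policy", "demand", and "wait_time" are hypothetical columns.
df = pd.DataFrame({
    "hour_of_week": np.random.randint(0, 168, 5000),
    "policy": np.random.randn(5000),     # e.g. incentive spend level
    "demand": np.random.randn(5000),     # market control
    "wait_time": np.random.randn(5000),  # negative-experience metric
})

def residualize(df, cols, by="hour_of_week"):
    """Subtract each column's time-of-week mean, leaving deviations from baseline."""
    out = df.copy()
    for c in cols:
        out[c] = df[c] - df.groupby(by)[c].transform("mean")
    return out

res = residualize(df, ["policy", "demand", "wait_time"])

# OLS of the residualized outcome on the residualized policy lever plus controls.
X = np.column_stack([res["policy"], res["demand"], np.ones(len(res))])
beta, *_ = np.linalg.lstsq(X, res["wait_time"].to_numpy(), rcond=None)
beta_policy = beta[0]  # marginal effect of a policy deviation on wait-time deviation
```

In production one would also model the full distribution of the residualized outcome and carry uncertainty, as the article notes; this sketch only recovers the mean effect.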
How we validate: To validate this mapping, we use switch‑back experiments that alternate policy settings across comparable time slots and compare the model’s predicted changes in negative user experience to the experimental lifts we observe. The experiment either verifies our model or informs its iteration (e.g., changing controls or introducing additional ones).
Step 2: from negative user experiences to future outcomes
The next question is: given a change in a certain negative user experience today, how do future outcomes move (e.g., future rides, retention, driver hours)? In the broader surrogacy framing, Step 1 captures short‑term shifts in experience, and Step 2 translates those shifts into long‑term behavior under the assumption that long‑term market-mediated effects are completely mediated by short‑term negative user experiences.
What we model: The long term effects of negative user experiences. We treat these negative experiences as exposures that vary naturally across users, times, and places, and estimate their impact on future outcomes — such as future rides — while controlling for confounders (time, location, and rider history).
How we estimate (observational, double‑robust): We use Augmented Inverse Probability Weighting (AIPW, Chernozhukov et al., 2021), a doubly robust causal estimator combining (i) a propensity model for exposure (the likelihood of facing a given level of negative user experience, given context) and (ii) outcome models for future metrics, conditional on confounders. This yields average treatment effects of negative user experiences. We summarize the mapping via a “surrogacy index” that quantifies how much short-term negative experiences will affect long‑term outcomes; this is the scaling we use to move from short‑term exposure to negative experiences to long‑term impact.
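A toy illustration of the AIPW estimator on simulated data (the confounders, binary exposure, and outcome below are synthetic stand-ins, not Lyft features; the production models are richer than the linear/logistic fits used here):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Simulated data: X = confounders (e.g. rider history, time, location features),
# T = exposure to a negative experience (1 = long wait), Y = future rides.
n = 4000
X = rng.normal(size=(n, 3))
p = 1 / (1 + np.exp(-X[:, 0]))  # exposure probability depends on confounders
T = rng.binomial(1, p)
Y = 2.0 - 0.5 * T + X @ np.array([0.3, 0.2, 0.1]) + rng.normal(size=n)  # true effect = -0.5

# (i) Propensity model: P(T = 1 | X).
e_hat = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]

# (ii) Outcome models: E[Y | X, T = t], fit separately per exposure arm.
mu1 = LinearRegression().fit(X[T == 1], Y[T == 1]).predict(X)
mu0 = LinearRegression().fit(X[T == 0], Y[T == 0]).predict(X)

# Doubly robust (AIPW) score; its mean estimates the average treatment effect.
psi = mu1 - mu0 + T * (Y - mu1) / e_hat - (1 - T) * (Y - mu0) / (1 - e_hat)
ate = psi.mean()  # effect of a long wait on future rides, near -0.5 here
```

The estimator is "doubly robust" in that it remains consistent if either the propensity model or the outcome models are well specified, which is why it suits observational exposure data like naturally varying wait times.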
How we validate: We run user‑split experiments that perturb negative experiences and compare the model’s predicted changes in future outcomes to the experimental lifts, checking calibration (predicted vs. observed) for validation.
How it works together with Step 1: Step 1 converts a policy change into a shift in the distribution of negative user experiences; Step 2 converts that exposure shift into forecasted changes in future outcomes (e.g., future rides). Together, via the surrogacy index, they provide a causal link from today’s decision to long‑term business impact.
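The composition of the two steps can be sketched with hypothetical numbers (both coefficients below are made up purely to show the chaining, not actual Lyft estimates):

```python
# Step 1 output (assumed): extra minutes of wait per unit of policy deviation.
beta_policy = 0.8
# Step 2 output (assumed): change in future weekly rides per extra minute of wait.
surrogacy_index = -0.05

delta_policy = 1.0  # one unit of policy deviation from its usual level
delta_experience = beta_policy * delta_policy
delta_future_rides = surrogacy_index * delta_experience  # market-mediated long-term effect
```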
Step 3: verify overall LTE by region-split experiments
With the market‑mediated effect from Steps 1–2, we combine the direct long term effect (estimated separately) using a transparent formula grounded in market mechanics. This yields a single policy‑level forecast for long‑run rides and financials that reflects both mediated and direct channels.
To validate end‑to‑end predictions, we run region‑split experiments. We developed a forward‑selection algorithm, inspired by the forward difference‑in‑differences (FDiD, Li, 2024) approach, to choose treated and control regions: starting from a single treated region, we iteratively add treated regions that best improve pre‑period fit and expected power. The region-split experiments allow us to observe the overall long-term effects of a policy intervention on an entire market.
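A simplified sketch of the greedy idea on synthetic region series (this toy version only minimizes pre-period fit between treated and control averages; the real design, per the FDiD-inspired algorithm, also weighs expected power):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic pre-period ride series per region: shape (n_regions, n_weeks).
n_regions = 12
series = rng.normal(loc=100, scale=5, size=(n_regions, 30)) + np.linspace(0, 3, 30)

def preperiod_rmse(treated):
    """Gap between treated-average and control-average rides in the pre-period."""
    control = [r for r in range(n_regions) if r not in treated]
    gap = series[list(treated)].mean(axis=0) - series[control].mean(axis=0)
    return np.sqrt((gap ** 2).mean())

# Greedy forward selection: grow the treated set one region at a time,
# keeping whichever addition best improves pre-period fit.
treated = [int(np.argmin([preperiod_rmse([r]) for r in range(n_regions)]))]
while len(treated) < 4:
    candidates = [r for r in range(n_regions) if r not in treated]
    best = min(candidates, key=lambda r: preperiod_rmse(treated + [r]))
    treated.append(best)
# Remaining regions serve as controls; a good fit means the control average
# tracks the treated average before the intervention shock is injected.
```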
Below is a simulated example of a region-split experiment. We find a set of treated regions for which we can find a set of control regions that mimics the average behavior of the treated regions. Once we inject an intervention shock into the treated regions (e.g., an increase in incentives), we track the discrepancy between the average rides of the treated and control regions.
Conclusion
We presented a framework to connect today’s resource decisions to long‑run marketplace impact. First, we translate policy changes into shifts in negative user experience using residualized modeling and verify those short‑term responses with switch‑back experiments. Second, we map negative user experience to future outcomes with doubly‑robust observational inference (AIPW) and a surrogacy index, then validate with user‑split experiments. Finally, we combine the market‑mediated and direct long-term effects and validate end‑to‑end predictions via region‑split experiments, using a forward‑selection design to choose treated and control regions.
What this enables:
- Scenario planning and policy evaluation.
- Budget allocation across levers (pricing, incentives) informed by the short-term profit vs. long-term rides Pareto frontier.
- Continuous calibration as markets evolve, grounded in experimentally-verified observational causal inference.
The result is a model‑based, experiment-verified causal engine: decisions move user experience, user experience moves future behavior, and the composition yields long‑term business impact. This enables fully-informed decisions on resource allocation.
Lyft is hiring! If you’re passionate about Data Science, visit Lyft Careers to see our openings.
Source: eng.lyft.com
