Clarity through fairness

Fixing incorrect insurance penalties

Role: Lead product/service designer
Company: Metromile
Platforms: Web, SMS, push, email
Timeline: March 2019 – May 2019

Animated mockup of an email asking a Metromile customer to troubleshoot their Pulse device.

Metromile, now part of Lemonade, is a pay-per-mile car insurance company. Instead of paying a flat monthly premium, customers paid a base rate plus a per-mile rate for every mile they drove. The less you drove, the less you paid. The whole business model hinged on knowing, precisely, how many miles each customer drove each month.

That's what the Pulse device was for. A small, cube-shaped device that plugged into a port in the car, the Pulse collected driving data (time stamps, GPS coordinates, accelerometer readings) and sent it back to Metromile, where engineering used it to calculate daily mileage. But the device didn't always stay connected. It could lose signal in a dead zone, stop transmitting if the car sat idle for a while, lose power every time an EV turned off, get unplugged by a mechanic during service, or get pulled out deliberately by a customer trying to drive for free. No device signal, no mileage data. No mileage data, no bill.

So what happened when the device stopped transmitting? The original answer was to assume the worst: that the customer had unplugged it on purpose to drive for free. Metromile sent stern emails and eventually started charging a flat 250 "Penalty Miles" per day, roughly fifteen times what the average customer actually drove. It was a blunt instrument, and it hit hardest in exactly the wrong place: customers who drove the least (Metromile's ideal customers) were the most likely to lose device signal, because their cars sat idle for longer periods. The system was eroding trust with the very people the company's business model was built for.

The project was to fix this. But the deeper design challenge was that when a device lost signal, we could never know for certain why. All of those causes (dead zones, idle cars, EVs, mechanics, fraud) looked similar from the system's perspective, and for a long time there was no way to tell them apart. That changed when the data science team developed a model that could analyze device signal losses and partnered with product and design to figure out how it could be used. But the model assigned probabilities, not certainties. And every communication, every charge, every flow had to hold that uncertainty responsibly, without either punishing innocent customers or letting fraud go unchecked.

This was 2019, but the design problem was one the industry is now encountering everywhere as AI systems become decision-makers: how do you build an experience on top of a probabilistic system? How do you communicate with a user when the system that's making decisions about them isn't sure? What do you owe them when it's wrong? When the system can't be certain, clarity means designing for the people it gets wrong.

The broken status quo

Before this project, every customer whose Pulse device lost signal went through the same flow: a sequence of escalating emails, followed by flat-rate charges of 250 miles per day. The average Metromile customer drove about 6,000 miles a year (about 16 miles a day), so a 250-mile daily charge was severe. It was designed to motivate customers to reconnect their devices quickly. It also meant that any customer whose device lost signal for any reason was treated as a fraud suspect.
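To make the severity concrete, here is a quick back-of-the-envelope check in Python using only the rounded figures quoted above. The variable names are illustrative and not taken from any Metromile system.

```python
# Rounded figures from the text; names are illustrative only.
AVG_MILES_PER_YEAR = 6_000
PENALTY_MILES_PER_DAY = 250

avg_miles_per_day = AVG_MILES_PER_YEAR / 365
penalty_multiple = PENALTY_MILES_PER_DAY / avg_miles_per_day

print(f"Average daily mileage: {avg_miles_per_day:.1f}")    # about 16 miles
print(f"Penalty vs. average day: {penalty_multiple:.1f}x")  # about 15 times
```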

A flow diagram of the original Penalty Miles system, showing escalating emails followed by 250-mile-per-day charges applied to any customer whose device lost signal.

Two problems followed from this.

The first was that good customers were being punished. Device signal loss had many innocent causes: a dead zone, an infrequently driven car, a mechanic unplugging it during service. Customers in those situations were getting threatening emails and real charges.

"Your device is defective and you're threatening to charge me 250 miles a day until I fix your defective device. I don't have time for that." — New Customer NPS Survey

The second was that bad actors were getting away with it. A customer who actually wanted to drive for free could unplug their device, drive for eleven days before charges kicked in, replug before the first charge, and repeat. The system was too lenient for fraud and too harsh for everyone else.

A diagram showing the chain of negative outcomes from the Penalty Miles system: punished good customers, lenient on fraud, eroded NPS, and increased churn.

The scale of the problem was significant. In 2019, about 8% of Metromile customers were charged Penalty Miles every month. Of those, roughly 65% had been charged incorrectly for at least one day. That meant about 5% of customers were being wrongly charged every single month, a number large enough to show up in NPS, in churn, and in customer experience ticket volume.
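The 5% figure follows from multiplying the two shares. A minimal sketch of that arithmetic, using the rounded percentages quoted above (variable names are illustrative, not from any real system):

```python
# Rounded shares from the text; names are illustrative only.
share_charged_monthly = 0.08   # customers charged Penalty Miles in a given month
share_charged_in_error = 0.65  # of those, charged incorrectly for at least one day

share_wrongly_charged = share_charged_monthly * share_charged_in_error
print(f"{share_wrongly_charged:.1%} of all customers wrongly charged each month")
```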

What the NPS comments told us

To understand the texture of the problem, I worked with our user researcher to analyze historical device-related NPS comments. We surfaced four themes:

Users were frustrated when they missed the emails and racked up charges.

"You didn't even tell me I wasn't connected and then just charged me."

Users were frustrated when they were notified simply because they weren't driving.

"Stop requiring customers to call in when a vehicle is inactive. The entire point of Metromile is for people who don't drive that often. I've been a customer for two years. Two years of driving history. You know that I rarely drive any of my vehicles."

The tone of our communications felt threatening.

"Your device is defective and you're threatening to charge me 250 miles a day until I fix your defective device. I don't have time for that."

When we overcharged users, we lost their trust.

"You botched my mileage refund and now I have to go clean it up. Honestly, I'm going to be an active detractor of your company now."

Every one of these themes pointed back to the same underlying issue: the system assumed the worst, and then charged accordingly.

Designing around uncertainty

Around this time, a firmware update to the Pulse device unlocked new accelerometer and gyroscope data, which created an opening. The data science team used the new signals to build what we called the Unplug Model, a classifier that produced an "unplug score," a number between 0 and 1 where 0 meant the data showed no sign of device removal and 1 meant the data strongly indicated the device had been physically removed. The score told us what likely happened to the device, but not why.

A table showing the data science model's output: an unplug score between 0 and 1 for each device signal-loss event. — The data science model output table.

The model itself worked at the sensor level. It read accelerometer spikes, gravity changes, GPS gaps, and battery status, and produced a score. It didn't know why a device had lost signal. That interpretation was the work I did with PM and data science: mapping the model's outputs to real customer scenarios and deciding what each score range should mean for the experience. A few examples show why this was a probabilistic design problem rather than a deterministic one:

If a customer drove an electric vehicle, the model returned a score of 0. We couldn't tell whether they had unplugged the device or just turned their car off, so we defaulted to assuming innocence.

If a customer parked or drove in an area without signal and the device had an internal battery, no accelerometer data was returned and the model produced a score of 0.6.

If a car was in the shop and the device was unplugged by a mechanic, the model returned a score somewhere between 0.7 and 0.95.

If a customer purposefully unplugged their device, regardless of motive, the score returned was effectively the same as the mechanic case. (One data scientist joked that if they were really angry when they unplugged it, it might register a little higher.)

If a customer was in a car accident and the impact dislodged the device, the model returned a score of 0.9, nearly identical to a deliberate unplug. A customer dealing with the aftermath of a crash could be flagged as a fraud suspect.

Two things jumped out from that mapping. First, the model was useful but fuzzy: it could differentiate broad clusters of behavior, but not intent. A mechanic and a fraudster looked identical to it. Second, we were going to have to draw a line somewhere. After working through the scenarios with the PM and data science team, we decided that any score above 0.7 at the time of signal loss would be coded as a purposeful unplug.

That threshold decision was consequential. It meant that some customers in the above-0.7 bucket would be innocent (the mechanic or car accident cases, for example), and we had to design the experience knowing that would happen.
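The threshold rule itself is simple to state in code, which makes its limits easy to see. The sketch below is a hypothetical illustration in Python, not Metromile's implementation: the function name is invented, and the 0.8 scores for the mechanic and deliberate-unplug cases are arbitrary picks from the 0.7 to 0.95 range quoted above.

```python
UNPLUG_THRESHOLD = 0.7  # cutoff from the text: scores above it are coded as purposeful unplugs


def coded_as_purposeful_unplug(unplug_score: float) -> bool:
    """Return True when a score falls in the 'purposeful unplug' bucket."""
    return unplug_score > UNPLUG_THRESHOLD


# Scenario scores from the examples above; 0.8 is an illustrative value
# chosen from the 0.7 to 0.95 range, not a real model output.
example_scores = {
    "EV switched off": 0.0,
    "dead zone, battery device": 0.6,
    "mechanic unplugged device": 0.8,
    "customer unplugged device": 0.8,
    "crash dislodged device": 0.9,
}

for scenario, score in example_scores.items():
    print(f"{scenario}: {coded_as_purposeful_unplug(score)}")
```

The mechanic, the deliberate unplug, and the crash all land in the same bucket, which is exactly the ambiguity the rest of the experience had to absorb.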

A second complicating factor was the hardware itself. Metromile had shipped two kinds of Pulse devices: battery-powered devices, which could transmit even when the car was off, and non-battery devices, which could only transmit when the car was running. For non-battery devices, the model couldn't return an unplug score until after the device was reconnected, meaning we had to make decisions about a customer's experience before we knew anything at all.

The Unplug Model reduced the probability of wrongly charging a customer. It did not eliminate it. Everything downstream of the model had to account for the fact that the model itself would sometimes be wrong.

Fairness as a design principle

With the model in hand and the research grounded, I established three guiding principles for the project.

Assume best intentions. There are times when Metromile doesn't have the data to know what a user did. In those instances, give users the benefit of the doubt. When interacting with the experience, users should never feel like we're pointing the finger.

Be as quiet as possible. Users don't want to have to think about their car insurance, especially those who drive infrequently. When contact is necessary, make sure they get the info they need, but don't overburden them.

Proactively provide support. Handling the Pulse device can be confusing. Give people the relevant resources to feel empowered.

Looking back, these read like operating instructions for any system that makes decisions about people under uncertainty. "Assume best intentions" is how you'd want an AI to interpret ambiguous user behavior. "Be as quiet as possible" is the argument against over-triggering notifications or actions. "Proactively provide support" is the case for surfacing relevant context before someone has to ask. Today, teams are writing versions of these for LLMs. In 2019, there was no AI to do the reasoning, so we did it ourselves: mapping every plausible scenario against what the system could tell us and encoding the right response for each one. It's the same muscle you need now to write good AI operating instructions: the ability to imagine every edge case and decide, in advance, how the system should behave.

Mapping scenarios to experiences

I worked with the PM, data science, and engineering to enumerate every way a device could lose signal, then mapped each scenario to the model's output. The goal was to understand where our system could differentiate between situations and where it couldn't, because that boundary was where the experience needed to do the most careful work.

The questions I was asking:

What are all the possible scenarios where a device could lose signal?

What's the ideal service experience for each scenario?

Which scenarios look the same quantitatively even if they're qualitatively different?

Where is there a quantitative difference, but enough qualitative overlap that the experience can be consolidated?

In this experience map, each user intention has its own column while the color coding represents our system's ability to differentiate between scenarios. All user experiences coded in red, for instance, are ones where we knew a user had unplugged their device at the time of signal loss.

A matrix mapping every device signal-loss scenario to the model's ability to detect intent. Color coding shows where the system could differentiate scenarios versus where the experience had to consolidate ambiguous cases.

Data science x user experience map.

The battery versus non-battery split described earlier showed up here as a major complicating factor. In practice, Metromile could get real-time data from battery devices (which had their own internal power source and could send a signal even when the car was off) but could only get data from non-battery devices (which had no internal power source) once the device was reconnected to the car. Which device a user received was completely random, so through no fault of their own, some customers were in a fundamentally more ambiguous situation than others.

How the flows evolved

The Unplug Model reduced the probability of mistakenly charging customers, but it didn't eliminate it. The communications strategy had to account for that lingering uncertainty: addressing the worst-case user actions while not alienating users who had done nothing wrong.

I broke the flows into detailed comms flow charts that went through three iterations. Between each iteration, we worked cross-functionally to pressure-test the design against user needs, business goals, and technical constraints:

A flowchart showing two options for handling devices with no Unplug Score: routing to Underwriting versus applying No Signal charges. — The first iteration explored two options for devices with no Unplug Score (the blue columns). Option 1 routed these users to Underwriting for review, keeping them out of the charges flow entirely. This was more aligned with our principles but added operational load. Option 2 gave them No Signal charges, which was simpler but risked repeating the same problem we were trying to fix.

The second iteration of the flowchart, showing the Underwriting approach with consolidated flows after legal and engineering constraints required reworking the strategy. — The second iteration moved forward with the Underwriting approach. This revision also reflected two new constraints: legal and engineering confirmed that back-charging No Signal charges was not an option, which required reworking the strategy, and we realized two of the four flows were close enough to consolidate into one, reducing complexity.

The approach that moved forward routed ambiguous cases to Underwriting rather than charging them, and consolidated four flows down to three. Several other decisions shaped the final iteration.

Grace periods. Letting high-unplug-score customers self-serve a grace period was tempting, but it would have been too easy to exploit. Instead, customers with a high score could contact the customer experience team directly, who could listen to their story, look at the device history, and use discretion to decide whether a grace period was warranted. Human judgment stayed in the loop precisely at the point where the model was least reliable.

Delayed unplug scores. Non-battery devices couldn't return an unplug score until they reconnected, but once they did, the model sometimes came back with a high score. In collaboration with the underwriting team, we added reporting to flag those cases so underwriting could investigate without affecting innocent customers.

Crash override. If a customer reported an accident, the adjuster could manually exclude them from the signal loss flow by marking their device in the system, preventing them from receiving escalating communications while dealing with a claim. This was the design accommodating a real limitation in the model: a crash that dislodges a device and a deliberate unplug look identical to the model. For customers who didn't report the accident, this remained a blind spot, though they would still receive the gentler language and more supportive tone built into every communication, rather than the old threatening Penalty Miles emails.
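Taken together, these decisions amount to a small routing rule. The Python sketch below is one hypothetical way to express it, under stated assumptions: the names `Route`, `SignalLossEvent`, and `route_signal_loss` are invented for illustration, the order of the checks is an assumption, and the low-score branch is simplified because the text does not spell out its details. It is not the logic engineering actually shipped.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

UNPLUG_THRESHOLD = 0.7  # cutoff from the text


class Route(Enum):
    # Hypothetical labels for the paths described above.
    EXCLUDED_FOR_CLAIM = "excluded from signal-loss comms by an adjuster"
    UNDERWRITING_REVIEW = "no score yet: reviewed by Underwriting, not charged"
    HIGH_SCORE_FLOW = "high-unplug-score flow"
    STANDARD_FLOW = "standard supportive reconnect comms"


@dataclass
class SignalLossEvent:
    # None models a device with no unplug score yet, such as a
    # non-battery device that has not reconnected.
    unplug_score: Optional[float]
    # Set manually by a claims adjuster after a reported accident.
    crash_override: bool = False


def route_signal_loss(event: SignalLossEvent) -> Route:
    """Pick a flow for one signal-loss event (check order is an assumption)."""
    if event.crash_override:
        return Route.EXCLUDED_FOR_CLAIM
    if event.unplug_score is None:
        return Route.UNDERWRITING_REVIEW
    if event.unplug_score > UNPLUG_THRESHOLD:
        return Route.HIGH_SCORE_FLOW
    return Route.STANDARD_FLOW


print(route_signal_loss(SignalLossEvent(unplug_score=0.9, crash_override=True)).name)
print(route_signal_loss(SignalLossEvent(unplug_score=None)).name)
```

Checking the crash override first reflects the intent described above: a customer with a reported accident should never enter the escalating flow, whatever the model says.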

A detailed view of the comms flow for customers with a high unplug score, showing the path from signal loss through underwriting review to communication. — A closer look at the high-unplug-score flow.

The final service design document mapping each device scenario to its backend decision points, communications timeline, and user touchpoints. — The final service design and comms flow incorporating all three iterations of feedback. Each row maps a different device scenario to its backend decision points, communications timeline, and user touchpoints. This document served as both the design specification and the roadmap engineering built from.

Getting input early and often from product, engineering, customer experience, and underwriting prevented eleventh-hour changes. As the flows shifted, I updated the master flow document so nothing was lost in translation.

Renaming the experience

While reading through NPS comments, I noticed something smaller but meaningful. Customers routinely didn't know what the "Pulse device" was. Internal shorthand like "the Pulse" or just "Pulse" had leaked into customer-facing communications, and customers had no idea what it referred to. They felt daunted by the term "install." They didn't understand why they were getting "Penalty Miles."

I presented the evidence to the Creative Director, and with the support of her team we established new brand guidelines. "Plug in" replaced "install" because it felt like an approachable task. "No Signal charges" replaced "Penalty Miles" because it was more descriptive and less punitive. References to the device were standardized: "your Pulse device" or "the Metromile Pulse device," never just "Pulse" or "the Pulse." The language around charges shifted too, from accusatory ("we charged you miles") to factual ("a No Signal charge was applied to your account").

The updated brand guideline terminology changes.

These changes extended well beyond the signal loss flows. They cascaded through all Metromile communications and digital experiences, a total content overhaul across every touchpoint: web, app, email, SMS, push, and help center.

The renaming wasn't cosmetic. It was "assume best intentions" showing up in tone and "proactively provide support" showing up in clarity: customers can't feel empowered if they don't understand what's being said to them.

Collaboration and craft

Service design at this scale meant that what looked like one project was actually a coordinated effort across creative, engineering, customer experience, underwriting, legal, marketing, and content.

With the animator. Email was the hardest channel. Customers associate car-insurance emails with bad news, and we were asking them to pay attention to messages that often involved money. I worked with our animator to concept lightweight, playful animations for the emails, with two goals: convince people the emails were worth opening, and defuse the frustration of being told their car insurance needed their attention.

With content design on the comms. Writing the messages started in two Google Docs. I translated the skeleton of the flow into the docs, took an initial pass on every piece of copy, and then the cross-functional team swarmed, pointing out missing information, challenging the messaging, and refining the copy until it held. The comms had to meet the emotional bar the principles set: firm where necessary, kind by default, never threatening.

A Google Doc used for drafting and refining the email copy with the cross-functional team. — A snapshot of one of the docs I created to hand off the copy for the comms.

With the FAQ and help content. To make sure we were proactively answering customer questions (the third principle in action), I worked with the communications team to build a new set of Help Center FAQs tied to the updated flow, and I tracked the FAQs in a spreadsheet for engineering handoff.

A spreadsheet tracking the FAQs created for the new Help Center, with status and engineering handoff notes for each entry. — The FAQ tracking spreadsheet for dev handoff.

With marketing, for delivery. Once the copy was final, it moved into a recently revamped Iterable template and went through marketing review before shipping.

With engineering, as a source of truth. Throughout all of this, the master flow document stayed current. It was the single reference that captured backend decision points, customer communications, timing, variables to include in each message (vehicle make, model, year), assumptions, and tone guidelines. As new information came in, the flow updated; as the flow updated, engineering built from it.

It was a small team diligently revising every surface of the Metromile experience. Every flow had its own communication plan, and every communication plan had to be implemented across multiple channels with the right message, the right tone, and the right timing.

Outcomes

Animated email mockup notifying a customer that their Pulse device is having trouble (initial communication).

Animated email mockup asking the customer if they need help reconnecting their Pulse device.

Animated email mockup with a final notice that No Signal charges will begin if the device isn't reconnected.

Animated email mockup confirming the device is reconnected and no charges were applied.

Animated email mockup offering to send a new Pulse device when the existing one couldn't be reconnected.

A walkthrough of the final email and push notification mockups, in the order a customer would experience them — from the initial "trouble" notice through to the resolution or new-device offer.

The redesigned service launched in mid-2019. Within a few months, the numbers had moved.

Median days to reconnect an unplugged device went from 9 to 3. Customers were getting their devices back online faster, which meant less time in the uncertain middle for everyone.

Customer experience contact volumes dropped from ~15,000 per month to ~4,000 per month. The flows were doing the work that customer experience had previously been doing by hand.

NPS comments mentioning No Signal charges (the renamed Penalty Miles) dropped 30%. This was the sharpest signal that the experience felt different to customers, not just to the team.

Beyond the original scope

The service design was built flexibly enough that after launch, we were able to repurpose the flows and the comms strategy for all of our device onboarding and replacement flows, not just signal loss. That meant the investment paid back many times over: less engineering work across downstream projects, better customer experience across every Pulse interaction, and higher device adoption overall. Device-related premium collection improved as a result.

The strongest version of a service design is one that doesn't just solve the problem in front of it; it becomes reusable scaffolding for problems that didn't have a solution yet.

Back to clarity

When the system is uncertain, the burden belongs with the designer, not the user.

The question I kept coming back to on this project was: when a system can't be certain, who pays the cost of that uncertainty? The original design had answered that question by defaulting to charging the customer. The redesign answered it differently: by designing every flow, every piece of copy, every animation, every escalation path to hold the uncertainty on our side of the experience, not theirs.

If I'd kept working on this, I'd have pushed further on in-product device status. Buried in the Metromile app was a place where customers could check the connectivity of their device, but it wasn't easily accessible and the information wasn't always reliable. Bringing device status front and center in the app and web dashboard would reduce the stress around the device, give Metromile a clearer channel for communicating when action was needed, and give customers a natural jumping-off point when something was wrong. The service design addressed uncertainty after it happened. The next step was making it visible before it did. I communicated this recommendation to the Lemonade design team before I left, and today, users can see their device status in the Lemonade app.

A mobile screenshot of the Lemonade app showing the customer's Pulse device status, including connection state and a clear path to fix it. — Device status in the Lemonade app. Customers can now see when their device has lost connection and get a clear path to fix it, before charges begin.

An internal CX-agent view showing connection status, last heartbeat, and device type across all of a customer's vehicles. — The internal device management view. CX agents can see connection status, last heartbeat, and device type across all of a customer's vehicles, giving them the context to help proactively.

Six years later, the design problem the industry is working on (how to build humane, trustworthy experiences on top of models that can't be fully certain) looks a lot like this one, scaled up. The principles still hold.