7 common problems that derail A/B/n email testing success
Don't blame the channel if your efforts don't deliver the results you need. Keep this old and reliable tool shiny and new.
Whenever I begin working with new clients who face major problems with their email marketing, one of the first things I review is how they conduct their email testing.
A/B/n testing is the best way I know to structure effective campaigns and to measure whether a brand’s email strategies and tactics are succeeding or failing. But all too often, teams struggle to set up tests correctly and measure results accurately. That usually leads to ineffective email experiments and poor results.
If your testing program is unreliable, you won’t know whether your chosen strategies and tactics are working or failing. Don’t blame the email channel itself if your email efforts don’t deliver the results you need. Instead, look at how you test and measure results.
7 common testing problems and how to fix them
These crop up most often in my work with clients. Solutions to some of these challenges will require a total mindset change. For others, just learning the proper way to set up tests can resolve many of your current issues.
That’s the good part about testing. For every problem, there’s a way to correct it. Every time you solve a problem via testing, you take another step toward putting your email program on the right path.
1. Testing without a hypothesis
Many email marketers pick up the rudiments of testing by using the tools their ESPs give them, mainly for setting up basic A/B split tests on simple features such as subject lines.
However, this ad hoc, one-off approach is like learning to drive a car without knowing how to read a map. You can turn the car on just fine. But you need map skills to plan out a journey that will get you where you want to go with the fewest traffic jams and detours.
Yes, you could let Google Maps do the planning work for you. But all the data – what you provide and what it pulls from other sources – must line up right. If you type in the wrong destination or drive into a dead zone, you could end up miles from where you want to be.
That’s what happens to your email program when you either don’t test or test incorrectly. Your hypothesis is your road map for testing. It lays out what you think might happen and guides your choices for variables, testing segments, success metrics and even how to use the results.
2. Using the wrong conversion calculation
This relates to the customer’s journey and the test’s objective.
When you run a standard A/B split test on a website landing page, you often use “transactions/web sessions” as your conversion calculation to see how well the page is converting. This makes sense because you don’t know the path your customers took to reach the page, so you focus on that one part of the journey and ignore everything that happened before it.
In email, we do know the path our customers took to get from the email to the landing page. We put them on it, and we want to optimize it. We want to understand how well our email converted, so we need to use “transactions/emails delivered” to calculate our conversion. This takes the whole email journey into account and doesn’t just look at how well the landing page converted.
As you can see in these two client examples, the conversion rate followed through on what the opens and clicks signified. Marketers often use the “transactions/web sessions” calculation for vanity’s sake because it yields a higher percentage. However, it means that you could be optimizing for the wrong result.
[Figure: Testing segments via business-as-usual campaigns]
[Figure: Testing automated programs]
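To make the difference between the two conversion calculations concrete, here is a minimal Python sketch. The variant figures and the `conversion_rate` helper are hypothetical, invented only for illustration; they are not taken from the client examples.

```python
def conversion_rate(transactions: int, denominator: int) -> float:
    """Return transactions divided by the denominator, or 0.0 if it is not positive."""
    if denominator <= 0:
        return 0.0
    return transactions / denominator


# Hypothetical results for two email variants (illustrative numbers only).
variants = {
    "A": {"delivered": 50_000, "sessions": 2_000, "transactions": 100},
    "B": {"delivered": 50_000, "sessions": 1_000, "transactions": 80},
}

for name, v in variants.items():
    # Whole email journey: transactions / emails delivered.
    email_conv = conversion_rate(v["transactions"], v["delivered"])
    # Landing page only: transactions / web sessions.
    page_conv = conversion_rate(v["transactions"], v["sessions"])
    print(f"Variant {name}: email conversion {email_conv:.2%}, "
          f"page conversion {page_conv:.2%}")
```

With these made-up numbers, variant B looks stronger on “transactions/web sessions” even though variant A produced more transactions from the same number of delivered emails, which is the kind of wrong result the session-based calculation can push you toward.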
3. Measuring success with the wrong metrics
A workable testing plan needs relevant metrics to measure success accurately. The wrong metrics can inflate or deflate your results. This, in turn, can mislead you into optimizing for the losing variant instead of the winner.
The open rate, for example, has been a popular success metric since the early days of HTML email. But it’s a flawed and unreliable metric, especially now that Apple’s Mail Privacy Protection feature masks a campaign’s true open rate. Even if opens were accurate every time, the open rate would still not necessarily be the right metric.
Clicks are a more accurate engagement measure, but they don’t reveal how much money your campaign generated. If your goal is only to get clicks, go ahead and use the click rate. But if you’re rewarded on campaign revenue, you need to use a revenue metric such as number of purchases or basket value.
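A short Python sketch shows how the choice of metric can change which variant “wins.” The `VariantResult` class, the `winner_by` helper and all figures are hypothetical assumptions made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class VariantResult:
    """Raw counts for one test variant (hypothetical structure)."""
    name: str
    delivered: int
    opens: int
    clicks: int
    revenue: float

    @property
    def open_rate(self) -> float:
        return self.opens / self.delivered

    @property
    def click_rate(self) -> float:
        return self.clicks / self.delivered

    @property
    def revenue_per_email(self) -> float:
        return self.revenue / self.delivered


def winner_by(results: list[VariantResult], metric: str) -> str:
    """Return the name of the variant with the highest value for the given metric."""
    return max(results, key=lambda r: getattr(r, metric)).name


# Illustrative numbers only, not real campaign data.
results = [
    VariantResult("A", delivered=10_000, opens=2_500, clicks=300, revenue=4_000.0),
    VariantResult("B", delivered=10_000, opens=2_200, clicks=260, revenue=5_200.0),
]

for metric in ("open_rate", "click_rate", "revenue_per_email"):
    print(f"Winner by {metric}: {winner_by(results, metric)}")
```

In this invented data set, variant A leads on opens and clicks while variant B brings in more revenue per delivered email, so a team rewarded on revenue would pick the wrong variant if it judged the test on the open rate.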
4. Testing without statistical significance
If your testing results are statistically significant, it means that the differences between testing groups (the control group, which was unchanged, and the group that received a variable, such as a different call to action or subject line) are unlikely to have happened because of chance, error or uncounted events.
Having a small number of results can throw off significance testing, either because you could test only a fraction of your population or because the test didn’t run long enough to generate enough results. That’s why tests should run as long as possible (for automations) and reach a sample size large enough to detect a statistically significant difference (for campaigns).
Most testing uses a 5% significance level. This means that if your variable truly made no difference, a result as large as the one you observed would turn up by chance no more than five times in every 100 tests.
Results that aren’t statistically significant can lead you to draw the wrong conclusions and misinterpret both the test results and your campaign’s outcomes. Achieving 95% statistical significance indicates a 5% risk of concluding that a difference exists when there is no actual difference.
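One common way to check a conversion difference against the 5% threshold is a two-proportion z-test. The sketch below is one possible approach rather than the only valid one; the function name and the sample counts are assumptions made for illustration.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two conversion rates.

    conv_a / n_a is the control group, conv_b / n_b is the variant group.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that there is no real difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical test: 10,000 emails delivered per group (illustrative only).
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
print("Significant at the 5% level" if p < 0.05 else "Not significant at the 5% level")
```

A p-value below 0.05 corresponds to the 95% threshold described above. The normal approximation used here works best with reasonably large groups; very small tests may call for an exact method instead.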
5. Stopping with one test
The philosopher Heraclitus said, “No man ever steps in the same river twice, for it’s not the same river and he’s not the same man.”
The same is true for your email campaigns. Your subscriber base is always gaining new subscribers and losing old ones, and customers don’t react the same way every time to every campaign. A campaign that worked well one time might fall flat the next.
If you run only one test and then apply the results to all future campaigns, you’ll miss these subtle but important changes. That’s why you must bake testing into every campaign, testing everything more than once to exclude anomalies.
This will give you trends you can consult to learn general truths about your audience and indicate important shifts in attitudes and behavior. Use these to fine-tune or overhaul your campaigns’ approaches.
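As a rough illustration of reading trends across repeated tests, the Python sketch below summarizes a small test history. The campaign names, rates and helper functions are all hypothetical.

```python
from statistics import mean

# Hypothetical history of the same test repeated across campaigns.
history = [
    {"campaign": "Campaign 1", "control_rate": 0.020, "variant_rate": 0.026},
    {"campaign": "Campaign 2", "control_rate": 0.022, "variant_rate": 0.021},
    {"campaign": "Campaign 3", "control_rate": 0.019, "variant_rate": 0.024},
    {"campaign": "Campaign 4", "control_rate": 0.021, "variant_rate": 0.025},
]


def relative_uplift(control_rate: float, variant_rate: float) -> float:
    """Return the variant's uplift relative to the control (control_rate must be > 0)."""
    return (variant_rate - control_rate) / control_rate


def summarize(tests: list[dict]) -> dict:
    """Count how often the variant beat the control and average the relative uplift."""
    uplifts = [relative_uplift(t["control_rate"], t["variant_rate"]) for t in tests]
    return {
        "tests": len(uplifts),
        "variant_wins": sum(1 for u in uplifts if u > 0),
        "mean_uplift": mean(uplifts),
    }


summary = summarize(history)
print(f"Variant won {summary['variant_wins']} of {summary['tests']} tests, "
      f"mean uplift {summary['mean_uplift']:.1%}")
```

A single loss among several wins, as with the second campaign in this invented history, is the sort of anomaly that repeated testing helps you put in context.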
6. Testing only one element in a campaign
Subject-line testing is ubiquitous, mainly because many email platforms build A/B subject-line split testing into their tools. That’s a great start, but it gives you only part of the picture and is often misleading. A winning subject line that’s measured on the open rate doesn’t always predict a goal-achieving campaign.
That’s one reason why I developed the practice called Holistic Testing, which moves beyond single-channel, one-off, single-variable testing.
In holistic testing, you could use a motivation-based hypothesis that names the appropriate metric (conversions) and incorporates copy-related factors such as subject lines, headings, copy blocks, calls to action and even landing pages.
As long as every change you make to the variables supports the hypothesis, using multiple variables makes the test more robust. The difference between this and a multivariate test is that all the variables support the hypothesis, and when the winner is announced, we can apply what we’ve learned.
7. Not using what you learned to make email better
We don’t test to see what happens in a single campaign or to satisfy curiosity. We test to find out how our programs are working and what will improve them – now and in the long term. We test to determine if we are spending money on things that help us achieve our goals.
We test to discover trends and shifts in our audience that we can apply across other marketing channels – because our email audience is our customer population in a microcosm. Don’t let your test results languish in your email platform or in a team notebook.
An action plan for testing to refine an email campaign would look like this:
1. Develop a hypothesis that states what you expect to see and why and how you will measure success.
2. Report results accurately following the established testing plan.
3. Choose relevant metrics that measure outcomes (conversions, revenue, downloads, registrations, completed processes and the like).
4. Set a time length for the test (if an automation) or the number of tests to be performed (if a campaign) to generate enough results to pass significance testing.
5. Analyze results, write the conclusion and make recommendations for future campaigns.
6. Put results into action – both within your email marketing program and other channels where appropriate.
7. Refine and repeat the testing process to improve and continue the cycle of testing, analysis and implementation.
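Step 4 of the plan above depends on knowing roughly how many results are “enough.” The Python sketch below uses a standard approximation for comparing two conversion rates. The function name, the 80% power setting and the example rates are assumptions chosen for illustration, so treat the output as a planning estimate rather than a guarantee.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline_rate: float, expected_rate: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Estimate recipients needed per group to detect the expected difference.

    alpha is the significance level (5% by default) and power is the chance of
    detecting a real difference of this size (80% here, an assumed convention).
    """
    effect = expected_rate - baseline_rate
    if effect == 0:
        raise ValueError("expected_rate must differ from baseline_rate")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = (baseline_rate * (1 - baseline_rate)
                + expected_rate * (1 - expected_rate))
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical goal: lift conversion from 2.0% to 2.5% of emails delivered.
needed = sample_size_per_group(baseline_rate=0.020, expected_rate=0.025)
print(f"Roughly {needed:,} recipients per group")
```

The smaller the difference you hope to detect, the larger each group needs to be, which is why low-volume campaigns often need several sends before results pass significance testing.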
Testing is more important than ever. Are you ready?
The COVID-19 pandemic upended email marketers’ knowledge of our customers. In 2020, we needed testing to detect what customers wanted, what had changed and what had stayed the same in their responses to our campaigns.
The pandemic is receding in many areas but threatening to rise again in others. Testing will help us stay ahead of new changes and put those insights to work right away. That keeps our email programs relevant and valued to customers and raises email’s profile as a reliable tool to help our companies achieve success.
I mentioned earlier that your email database is a microcosm of your customer base. Accurate testing results can uncover shifts in customer thinking and motivation that you can use to test and update your social media, your website, your SMS marketing and even your offline direct marketing.
I can’t think of any other tool in the marketing kit that’s more versatile, cost-effective and adaptable than email. Accurate and up-to-date testing keeps this old reliable tool shiny and new.
Opinions expressed in this article are those of the guest author and not necessarily MarTech.