AI Personalization at Scale: What Works and What Is Still Hype
Some AI personalization features deliver measurable lift. Others are vendor demos disguised as strategy.
The vendor demo problem
Every marketing platform now sells AI personalization. The pitch is consistent: feed customer data into the system, let the AI optimize everything, watch engagement lift. The demos look great. The reality is narrower and more conditional than the pitch.
Some AI personalization capabilities work well today and produce measurable results in production environments. Others require data volumes most brands don't have, infrastructure most teams haven't built, or governance frameworks nobody has written yet. Knowing the difference between the two is the entire game.
I've deployed AI personalization features at multiple companies across different platforms. Here's what I've seen work, what I've seen fail, and where the line sits right now.
Send-time optimization: the quiet winner
If you're only going to implement one AI personalization feature, make it send-time optimization. It's boring, it's unsexy, and it consistently delivers.
The concept is simple: instead of sending a campaign to your entire list at 9 AM, the system predicts when each recipient is most likely to engage and delivers the message in that individual's optimal window. The AI learns from historical open and click patterns and adjusts per user over time.
In my experience, send-time optimization produces a 5-12% lift in open rates and a measurable (though smaller) lift in click rates. It requires no changes to creative, no changes to segmentation, and minimal setup. The data requirements are modest: a few months of send history is enough for the model to start making useful predictions.
The reason it works is that it's solving a narrow, well-defined problem with clean signals. The AI isn't trying to understand what a customer wants. It's trying to predict when they check their inbox. That's a tractable problem with reliable feedback loops.
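Because the signal is just "when does this person open email," the core of the model can be sketched very simply. The following is a hypothetical, minimal sketch (not any vendor's implementation): pick each recipient's most common open hour from history, and fall back to a global default when the per-user history is too thin to trust.

```python
from collections import Counter, defaultdict

DEFAULT_HOUR = 9   # global fallback send hour (assumed business default)
MIN_OPENS = 5      # below this, the per-user signal is too thin to trust

def optimal_send_hours(open_events):
    """open_events: iterable of (recipient_id, opened_at) pairs,
    where opened_at is a datetime. Returns recipient_id -> send hour."""
    hours_by_user = defaultdict(Counter)
    for user_id, opened_at in open_events:
        hours_by_user[user_id][opened_at.hour] += 1

    best = {}
    for user_id, counts in hours_by_user.items():
        if sum(counts.values()) >= MIN_OPENS:
            # Most frequent historical open hour wins.
            best[user_id] = counts.most_common(1)[0][0]
        else:
            # Not enough history: serve the human-chosen default.
            best[user_id] = DEFAULT_HOUR
    return best
```

Production systems use richer models (day-of-week effects, decay on old opens), but the shape is the same: a narrow prediction with a clean feedback loop.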
Every major ESP supports this now. If you're not using it, you're leaving the easiest AI win on the table.
Content recommendations: works with guardrails
AI-powered content recommendations (product suggestions in email, personalized website modules, "you might also like" blocks) are the second most reliable AI personalization feature. But the gap between good and bad implementations is wide.
Good implementations constrain the recommendation space. They define business rules that the AI operates within: don't recommend products the customer already owns, don't surface out-of-stock items, don't recommend across incompatible categories. The AI handles the ranking within those constraints. The constraints handle the edge cases the AI doesn't understand.
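In code, "constraints first, ranking second" is just a filter applied before the model ever sees the candidates. A hypothetical sketch, with made-up field names (`owned_skus`, `allowed_categories`) and `score_fn` standing in for whatever model produces the relevance score:

```python
def recommend(customer, candidates, score_fn, k=4):
    # Business rules run first: the model only ranks what survives them.
    eligible = [
        p for p in candidates
        if p["in_stock"]
        and p["sku"] not in customer["owned_skus"]
        and p["category"] in customer["allowed_categories"]
    ]
    # The AI handles ranking within the constrained space.
    return sorted(eligible, key=score_fn, reverse=True)[:k]
```

The design point: the rules live outside the model, so an edge case the model mishandles (an out-of-stock bestseller, an already-owned item) can never reach the customer.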
Bad implementations give the AI full autonomy and end up recommending winter coats to customers in Phoenix in July because the model optimized for click probability without business context. Or they recommend the same bestseller to everyone because the model converges on popularity when it doesn't have strong individual signals.
At Bed Bath & Beyond, we found that recommendation engines performed best when limited to a customer's active product categories plus one adjacent category. Broader than that and the recommendations felt random. Narrower and the model didn't have enough candidates to produce useful variation.
The key metric isn't recommendation click-through rate. It's incremental revenue per recipient compared to a non-personalized control. If your AI recommendations are driving clicks but not incremental purchases, the model is optimizing for engagement, not value.
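The metric itself is simple to compute once you run a holdout: revenue per recipient in the personalized arm minus revenue per recipient in the control arm. A sketch, assuming you can aggregate attributed revenue per arm:

```python
def incremental_revenue_per_recipient(treated_revenue, treated_n,
                                      control_revenue, control_n):
    """Revenue per recipient in each arm; the difference is the lift
    the AI recommendations actually earned."""
    return treated_revenue / treated_n - control_revenue / control_n
```

If this number is flat while recommendation clicks are up, the model is shifting attention, not creating value.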
Subject line generation: useful, not transformative
AI-generated subject lines have gotten meaningful attention from marketing teams. The capability is real: language models can produce dozens of subject line variants, test them against historical performance data, and predict which will perform best.
In practice, I've seen AI-generated subject lines produce a 3-8% lift in open rates over human-written ones when the human baseline is a team writing subject lines quickly under deadline pressure. When compared against a skilled copywriter who has time to craft and iterate, the gap shrinks to near zero.
Where AI subject lines add value is speed and consistency. A team that needs to produce 15 variants for a multi-segment campaign can use AI to generate candidates faster than writing them from scratch. The AI handles the volume. A human editor handles the brand voice and quality check.
Don't remove humans from the loop. AI subject lines occasionally produce technically optimized but tonally wrong output. A subject line that maximizes predicted open rate by using urgency language might conflict with your brand positioning. The AI doesn't know the difference.
Where AI personalization still falls short
Dynamic journey orchestration. The promise: AI decides in real time which channel, message, and timing each customer should receive at each point in their lifecycle. The reality: most implementations default to simple rules with an AI label. True multi-step, multi-channel journey optimization requires enormous data volumes, complex reward signals, and attribution models that don't exist in most organizations. I've seen this demoed impressively and deployed poorly more than any other AI feature.
Predictive content creation. Generating entire personalized email bodies or landing pages per customer sounds transformative. In practice, the quality control problem is unsolved at scale. When you're sending 2 million emails with dynamically generated body content, who reviews the output? What happens when the model produces something off-brand, factually wrong, or legally problematic? The governance infrastructure for AI-generated customer-facing content at scale doesn't exist at most companies. Until it does, this capability is a risk, not an advantage.
Emotion and sentiment-based personalization. Some platforms claim to detect customer sentiment and adjust messaging accordingly. The underlying models aren't reliable enough for production use. Customer sentiment inference from behavioral data (not explicit feedback) has high error rates. Building campaigns around unreliable sentiment predictions means you'll send empathetic messaging to happy customers and upbeat messaging to frustrated ones. That's worse than no personalization at all.
The guardrails that matter
AI personalization without governance is a liability waiting to surface. Three guardrails are non-negotiable.
Fallback content. Every AI-personalized element needs a default. If the model can't generate a recommendation, the subject line generation fails, or the send-time prediction has low confidence, the system should serve a human-curated fallback, not nothing or garbage. I've seen campaigns where the recommendation module served blank space to 8% of the list because no one defined the fallback behavior.
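The fallback rule is easy to enforce mechanically if every personalized slot resolves through one function. A hypothetical sketch, assuming the model returns a (content, confidence) pair or None on failure:

```python
CONFIDENCE_FLOOR = 0.4  # assumed threshold; tune per feature

def resolve_slot(model_output, fallback):
    """Return personalized content only when the model produced
    something usable with enough confidence; otherwise serve the
    human-curated fallback. The slot can never render empty."""
    if model_output is None:          # model errored or timed out
        return fallback
    content, confidence = model_output
    if content is None or confidence < CONFIDENCE_FLOOR:
        return fallback
    return content
```

The point of routing everything through one resolver is that "no one defined the fallback behavior" becomes impossible: the fallback is a required argument.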
Suppression logic. AI optimization will, left unchecked, over-index on your most responsive customers. Send-time optimization sends them more messages at their peak engagement time. Recommendations show them more of what they click on. Subject lines target their known preferences. The result is your best customers get fatigued fastest. Build frequency caps and suppression rules that the AI cannot override.
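A frequency cap that the AI cannot override is just a final gate that runs after every optimization decision. A minimal sketch, assuming a per-recipient send log and a rolling seven-day window:

```python
from datetime import timedelta

WEEKLY_CAP = 3  # assumed max messages per recipient per rolling 7 days

def may_send(send_log, user_id, now):
    """send_log: user_id -> list of past send datetimes.
    This check runs last, after all AI optimization, so no model
    decision can push a fatigued recipient over the cap."""
    window_start = now - timedelta(days=7)
    recent = [t for t in send_log.get(user_id, []) if t >= window_start]
    return len(recent) < WEEKLY_CAP
```

Because the gate sits outside the optimization loop, your most responsive customers get protected precisely when the models are most eager to message them.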
Human review thresholds. Define what percentage of output needs human review before send. For established features with track records (send-time optimization), the threshold can be low. For newer capabilities (AI-generated content), the threshold should be high. Reduce it over time as you build confidence in the output quality.
The takeaway
AI personalization is not one thing. It's a collection of capabilities at different maturity levels. Send-time optimization works today and should be running in every program. Content recommendations work with proper constraints. Subject line generation saves time without transforming outcomes. Dynamic orchestration, predictive content generation, and sentiment-based personalization are not ready for production at most organizations, regardless of what the vendor demo showed.
Deploy what works. Test what's promising. Be honest about what's still hype. That's the only personalization strategy that holds up over time.