Using statistics to find the nicest (and nastiest) food at Waitrose

Using the Wilson score interval to identify the most delicious, and disgusting, foods at Britain’s best online supermarket. Jump to the results.

The hardest bit of cooking, for me, has always been choosing what to cook. Sure, it’s fine if you only make dinner once a week – then you can flick through a designer cookbook and pick the prettiest picture.

But actual cooking, every day, without it taking up all your time – that’s tougher. You need food that is tasty, healthy, and affordable. Finding this is hard, so it’s easy to end up cooking the same things again and again.

And online shopping makes meal-planning even less inspiring – you can’t smell a tomato to see if it’s ripe. That’s why I was so excited when I realised I could use the power of statistics to find the overall most delicious – and the most disgusting – things you can buy at Ocado.

(For non-UK readers, Ocado is an online supermarket chain. It mostly sells food from Waitrose, the best British supermarket. And I have no affiliation with either company.)

How not to sort by average rating

I was mulling this over recently when I came across Evan Miller’s How Not To Sort By Average Rating. It’s a great article, and I realised I could use it to make my own shopping easier.

Ocado have great reviews on their site, with rich comments and star ratings, but they commit the second type of sin mentioned by Evan – they rank their groceries by mean rating. This means that you can’t reliably tell which groceries are actually the most popular.

Let me show you why this matters. Say I want soup for Friday lunch. I will find the “soup” category on the Ocado site, and then sort by customer rating. These are the top results:

As noted above, Ocado ranks by the average – mean – star rating for each product. This throws up some weird anomalies.

For example, in third place is Swedish blueberry soup, with just two reviews. Both those reviews are five-star, so it has a mean rating of five stars. Swedish blueberry soup may well be delicious, but with only two reviews, I’m unwilling to take a chance.

Much further down, in 10th place, is some gazpacho, with 48 ratings, of which 47 are positive and 46 are five-star. That means it has a slightly lower mean rating, of 4.91 stars, so it comes further down the list. But 46 people loved it enough both to review it and give it five stars.

I want to try that gazpacho! But to work out it was popular, I had to click every soup to check the number of reviews. I can’t do that with every single thing on my shopping list.

Estimating true popularity with the Wilson interval

So how can we unleash the full potential of Ocado’s reviews? Evan’s article explains how to trade off a high average rating against the overall number of reviews. We can calculate a confidence interval for each soup’s true popularity.

Here is Wilson’s interval in full:
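
For reference, this is the interval in its standard textbook form. The notation is chosen here for illustration: \hat{p} is the observed proportion of positive ratings, n is the number of ratings, and z is the normal quantile for the chosen confidence level (about 1.96 for 95%).

```latex
\[
\frac{\hat{p} + \dfrac{z^{2}}{2n} \;\pm\; z\sqrt{\dfrac{\hat{p}\,(1-\hat{p})}{n} + \dfrac{z^{2}}{4n^{2}}}}
     {1 + \dfrac{z^{2}}{n}}
\]
```

Taking the minus sign gives the lower bound, which is the number used for ranking.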

The maths looks complicated, but the premise is this (as articulated by a Hacker News comment): “If we rounded up the entire population and forced every single person to carefully review this item and issue a rating, what’s our best guess as to the percentage of people who would rate it positively?”

The clever thing about the Wilson interval is that it looks at the number of ratings as well as the value of the reviews. If few people have reviewed a product, our confidence interval is wide. As more people review it, the confidence interval narrows – because we’re more confident about how good or bad it really is.
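
To see the narrowing in numbers, here is a minimal Python sketch. The function name wilson_interval and the three review counts are made up for illustration; each hypothetical product has the same 90% positive share, and only the number of reviews changes.

```python
import math


def wilson_interval(positive, total, z=1.96):
    """Return (lower, upper) bounds of the Wilson score interval.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if total == 0:
        return (0.0, 1.0)  # no reviews: the data tell us nothing
    p_hat = positive / total
    denominator = 1 + z * z / total
    centre = p_hat + z * z / (2 * total)
    margin = z * math.sqrt(
        p_hat * (1 - p_hat) / total + z * z / (4 * total * total)
    )
    return ((centre - margin) / denominator, (centre + margin) / denominator)


# Hypothetical products: 90% positive at three different review counts.
widths = []
for positive, total in [(9, 10), (90, 100), (900, 1000)]:
    lower, upper = wilson_interval(positive, total)
    widths.append(upper - lower)
    print(f"{total:>5} reviews: {lower:.3f} to {upper:.3f}")
```

The printed interval shrinks as the review count grows, even though the share of happy reviewers never changes.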

So, we can now rank all the foods listed on the Ocado site. I wrote a Python script to scrape them all. For each item, I recorded the total number of reviews, and the proportion of reviewers who would recommend the product.

Then I wrote more Python code (based on Evan’s Ruby example) to calculate the Wilson score interval for each product. Here is the full script – you are welcome to use it for your own projects.
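
As a rough idea of what such a calculation looks like, here is a minimal sketch modelled on the shape of Evan’s Ruby example. It is not the script linked above: the function name, the tuple layout and the choice to return 0 for unreviewed products are all assumptions. The two soups use the counts quoted earlier, treating "positive" as the number of reviewers who liked the product.

```python
import math
from statistics import NormalDist


def ci_lower_bound(positive, total, confidence=0.95):
    """Lower bound of the Wilson score interval for a proportion."""
    if total == 0:
        return 0.0  # assumption: unreviewed products sort to the bottom
    # Two-sided normal quantile, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = positive / total
    numerator = (
        p_hat
        + z * z / (2 * total)
        - z * math.sqrt((p_hat * (1 - p_hat) + z * z / (4 * total)) / total)
    )
    return numerator / (1 + z * z / total)


# (name, positive reviews, total reviews), from the soup example above.
soups = [
    ("Swedish blueberry soup", 2, 2),
    ("Gazpacho", 47, 48),
]

# Rank by lower bound, best first.
ranked = sorted(soups, key=lambda s: ci_lower_bound(s[1], s[2]), reverse=True)
for name, positive, total in ranked:
    print(f"{name}: {ci_lower_bound(positive, total):.3f}")
```

Ranked this way, the gazpacho comes out well ahead of the blueberry soup, because two perfect reviews are not enough to push the lower bound very high.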

The results: sugar and convenience good…

Without further ado, here are the definitive results: the most popular of the 18,229 foods that you can buy at Ocado, ranked by the lower bound of the 95% Wilson score interval.

So what does this tell us about Britons’ tastes? Well, it seems we really like:

  • Fattening food. We bought the apple yogurt, in second place. It is satanic – so sugary that I threw it away unfinished. The passionfruit yogurt in 19th looks even sweeter. The green Thai soup in 13th is very nice, but at 500 calories a pot, it ought to be.
  • Convenience food. Frozen pains au chocolat and baguettes – these are indeed handy. Posh fish fingers. Ready-chopped shallots. You get the picture.
  • Reliable basics. Eggs and milk do surprisingly well. Who reviews milk?! But Clarence Court eggs are indeed very nice.
  • Specialist foods. Tofu, gluten-free bread, quark, dairy-free ice-cream – I guess tasty versions of these become cult items for people with restricted diets.

The full spreadsheet is here. You’ll see that I exclude some branded products from the list. This was because they had sponsored reviews, so I didn’t think it was fair to include them.

…Heston Blumenthal and runner beans bad

We can also calculate the most negatively rated items. This is quite simple – we just plug the same data into the same equation, but instead of looking at the number of reviewers who would recommend the item, we look at the number who wouldn’t.
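
In code, the switch is a one-line change. The sketch below is illustrative only: the function name is hypothetical, and it assumes that every reviewer answered the "would recommend" question, so that anyone who did not recommend the product counts as a negative. The example figures are those of the Heston Blumenthal baked alaska, which 8 of 91 reviewers recommended.

```python
import math


def wilson_lower_bound(count, total, z=1.96):
    """Lower bound of the Wilson score interval (z=1.96 is about 95%)."""
    if total == 0:
        return 0.0
    p_hat = count / total
    centre = p_hat + z * z / (2 * total)
    margin = z * math.sqrt(
        p_hat * (1 - p_hat) / total + z * z / (4 * total * total)
    )
    return (centre - margin) / (1 + z * z / total)


recommended, total = 8, 91
# Assumption: everyone who did not recommend it counts as a negative review.
not_recommended = total - recommended

positive_bound = wilson_lower_bound(recommended, total)
negative_bound = wilson_lower_bound(not_recommended, total)
print(f"Lower bound, proportion positive: {positive_bound:.3f}")
print(f"Lower bound, proportion negative: {negative_bound:.3f}")
```

Sorting by the negative bound, largest first, then gives a "most hated" list in the same way that the positive bound gives a "most loved" one.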

Here are the most hated things sold by Ocado, ranked by the lower bound of the 95% Wilson confidence interval for the proportion of reviewers who would not recommend them.

What are the patterns here? It seems fresh fruit and vegetables are often disappointing. We tried the runner beans – they tasted of stringy dishwater – and the peaches, which went from rock-hard to rotten overnight.

British bagels also suck, but we all knew that.

Speaking of serious problems, I really want to try the Heston Blumenthal baked alaska, which apparently consists of “smooth raspberry parfait encased in crisp chocolate glaze surrounded by banana and caramel parfait wrapped in a light sponge and covered in soft meringue”. Just 8 out of 91 reviewers recommended it, and this is what people said:

Horrible…
Synthetic…
Bleuch…
Just wrong…
[Tastes of] amyl acetate and the artificial strawberry flavour in the penicillin we had as kids…
Chemical Ali could have made better…
Simply the foulest dessert we’ve ever tasted…
Worst product I have ever had…
Even my dog wouldn’t eat it.

Here’s the full spreadsheet. The “Proportion positive” and “Proportion negative” columns show the Wilson boundaries – perhaps it will inspire your own shopping.

With luck, Ocado will eventually change the way they rank their items. In the meantime, I’ll be using the spreadsheet to find inspiration – and steering clear of Heston’s runner-bean surprise.

If you are interested in the maths behind the Wilson score interval, there’s a good discussion at Hacker News, including links to some critiques of the approach.

6 Comments

  1. Fascinating. I shall be looking at review ranking in a whole new light.

  2. This is brilliant.

    I was thinking about doing the same thing with just-eat.co.uk. I am never going to order from the pizza place with only one review….

  3. First, I wanted to say: Love your work! I’m an epidemiologist, not quite so stats-savvy as yourself, and certainly no coding guru, but loving the way you’re using the numbers for real, everyday things.
    Second, apologies for the third (below) as it’s about something you did a couple of years ago rather than this post, but there was no comments section I could see on the relevant blog.
    Third, regarding your “What Size Am I” app – have you had a lot of feedback? The thing that I’m particularly interested in is whether the shops actually make their clothes to fit the measurements they say they do. So, if your measurements did match a shop’s published sizing, would their clothes fit? I’d be interested in any views you might have on this!
    Carry on the good work,
    Lucy

  4. like your work
