CellarTracker Scoring Methodology

I want to update my scoring methodology for more consistency and am curious for folks’ input on how they approach scoring levels. I feel at times my scores don’t have enough dispersion, and I would like better personal guidelines for my scores.

As I think about a framework, I am gravitating to the following guideposts:

  • Generally speaking, I feel the total score range is 85-100, at least at the price point of wines I generally drink (for the most part $35+, although the occasional $15-25 bottle).
  • I generally want these scores to pattern-match existing CT scores, meaning a 91-93 score from me would correlate to a 92 on CT (not 100% of the time, but generally).
  • CT has its own recommended scoring guide, but this doesn’t seem applicable to how people score (CT recommends 80 = good, but I imagine folks would rate good a bit higher)
  • I think there are 4 major scoring categories: best ever/epiphany good, above average/pleasurable, house/cusp of drinkable, and harsh/burning/acidic. And then one can decide what guideposts are within this range.
  • Overall this would mean: lower scores are characterized by rough alcohol/tannin/acidity, more so than issues with fruit; 93 is the point where a wine feels balanced and pleasurable; higher scores are more subjective (e.g., a qualitative appreciation layered on top of the more quantitative scoring of aroma, balance, finish, etc.), and since I have not had as many 95+ wines as I have 90-93 wines, that is a category I will just have to get better at.


    So step 1, I was thinking guidepost ratings are:
    85 = Harsh/burning/acidic - don’t want to swallow
    89 = House wine/cusp of drinkability
    93 = Above Average/Pleasurable
    99 = Epiphany good wine

Leading me to then fill out the scale (at 2-point intervals):

85 = Harsh/burning/acidic - Don’t swallow
87 = Harsh/burning/acidic - Swallow if you want
89 = Slightly harsh/burning/acidic, but not overly so
91 = Well made, but perhaps lacking a component or not particularly pleasurable
93 = Above Average/Pleasurable
95 = Extremely good wine
97 = Makes time stop wine
99+ = Epiphany good wine
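
Purely as an illustration of the drafted 2-point scale above (not anything CellarTracker itself provides), here is a minimal Python sketch; the GUIDEPOSTS dict and the describe helper are hypothetical names, and the labels are simply the ones drafted in the list.

    # Minimal sketch of the drafted 2-point personal scale above; the names
    # GUIDEPOSTS and describe are hypothetical, purely for illustration.
    GUIDEPOSTS = {
        85: "Harsh/burning/acidic - don't swallow",
        87: "Harsh/burning/acidic - swallow if you want",
        89: "Slightly harsh/burning/acidic, but not overly so",
        91: "Well made, but perhaps lacking a component or not particularly pleasurable",
        93: "Above average/pleasurable",
        95: "Extremely good wine",
        97: "Makes time stop wine",
        99: "Epiphany good wine",
    }

    def describe(score: int) -> str:
        """Snap a score to the nearest defined guidepost and return its descriptor."""
        nearest = min(GUIDEPOSTS, key=lambda g: abs(g - score))
        return GUIDEPOSTS[nearest]

    print(describe(92))  # ties break toward the lower guidepost, so 91's descriptor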

Tasting Notes and Ratings
Tasting notes are one of the core components of CellarTracker; in fact, we currently have over 5 million user-generated reviews in our database! They offer a way for users to share their thoughts on how a particular wine is tasting at any given point in time.

Tasting notes are public and are seen by all users. These should not contain copyrighted material, scores from critics, or promotional material from retailers. In a nutshell, a tasting note should reflect your impressions on how a wine tastes. It is important to keep in mind that tens of thousands of people see these tasting notes every day. Any remarks that are defamatory, offensive, or strongly political may be edited or deleted. Play nicely!

With regard to copyrights, please note the following which appears on the personal tasting note entry screen:

Note: Material you enter here is publicly visible to all viewers of the site and should not include copyrighted material. The use of tasting notes is governed by Section 6 (Contributions) of the Terms and Conditions. Please use Pro Reviews to privately store copyrighted material and Articles to store winemaker notes.
CellarTracker also supports a variety of other public and private note types; for more information, see the Public and Private Note Types help topic.

Wine Rating Scale
CellarTracker uses a 50-100 point scale for rating wines.

Score    Grade  Meaning
98-100   A+     Extraordinary
94-97    A      Outstanding
90-93    A-     Excellent
86-89    B+     Very Good
80-85    B      Good
70-79    C      Below Average
50-69    D      Avoid
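
For reference, the published bands in that table are easy to express as a small lookup. This is only a sketch of the table above; the ct_grade helper is a made-up name, not a CellarTracker API.

    # The CT reference bands from the table above, as (floor, grade, meaning) rows;
    # ct_grade is an invented helper name for illustration only.
    CT_BANDS = [
        (98, "A+", "Extraordinary"),
        (94, "A", "Outstanding"),
        (90, "A-", "Excellent"),
        (86, "B+", "Very Good"),
        (80, "B", "Good"),
        (70, "C", "Below Average"),
        (50, "D", "Avoid"),
    ]

    def ct_grade(score: int) -> tuple:
        """Return the (grade, meaning) band a 50-100 score falls into."""
        for floor, grade, meaning in CT_BANDS:
            if score >= floor:
                return grade, meaning
        raise ValueError("CT scores run from 50 to 100")

    print(ct_grade(83))  # ('B', 'Good') per the official guide, whatever the note says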

I don’t have empirical proof handy, but I doubt the average user awards 80 points to a good wine, or 86 to a very good one; they score something much higher.

I don’t know, I found a note the other day and the comment was “best wine I have ever had” and then the score was 83.

So your goal is for your scores to conform to those of the average CT poster? That isn’t a goal I would personally pursue. Scores and TNs are primarily for me. That may be a bit selfish, but I’ll usually add some context to give a reader an idea of where my palate is. That’s more important than a score IMO.

Don’t want to swallow is an 85? Don’t want to swallow is a 75 tops, maybe a 70 in my book. I’d expect an 85 to be a good wine. This is one of the reasons CT average scores are useless to me. At least if there’s a TN I can ignore the score.

One of my resolutions this year is to use more of the scoring range, not less. The range of my scores has become so compressed as to be all but meaningless, which annoys me. In any case I’m going to use scores in the mid to high 80s more frequently for wines that I find to be good but not great. A reasonably priced wine should not feel any shame for getting 87-89 points.

I’m also seriously considering stealing A. So’s approach of using increments to address to some degree the false precision of the 100 point scale (see acyso's User Profile - CellarTracker for his scale).

What I am finding is:

  1. I think my historical scoring paradigm/what I posted in the OP correlates pretty well with current CT scores
  2. But as I drink smarter/better, I need to create scoring room for these better wines (said another way, a 95 to me today is a much better wine than a wine I would have said was a 95 five years ago).

Example:
I just had a 2009 Drouhin Clos Sorbe. Historically, and per my qualifications in the OP, it would be a 90. As it so happens, all four prior tasting notes gave it a 90, so success! However, I feel in my gut that was not a 90 wine; over recent years I have had more enjoyable 90-scoring wines. So this should be an 89, or something lower… I need more dispersion as I get more educated.

ASo:
Generally speaking, I suspect my scores are ~1-3 points above his, e.g., what I score a 95 he would give a 92-94, and what I give a 90 would be an 88-90 in his book (I think). It is mentally difficult to emulate his scores, as I don’t drink enough wines that are 95+ in his book. Further, I haven’t drunk enough of what he considers 93 vs. 95 vs. 97 wines to really appreciate the difference between his 100 and his 93… although I feel scores 94+ in CT are a bit of a random walk, so this is OK.

Conclusion so far:
Both the 2009 Drouhin and ASo example suggest I should use a slightly lower points scale, which will give me more dispersion to play with on average wines. However, I have to accept this will likely cause me to be a more conservative CT scorer.
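
One mechanical way to move to a slightly lower, wider scale without rescoring everything from scratch is a linear remap of the old compressed range onto a new one. The endpoints below (old 89-96 mapped onto 85-96) are invented purely to show the arithmetic, not a recommendation.

    # Hypothetical linear remap: stretch a compressed personal range onto a wider,
    # lower one so middling wines land lower and dispersion increases. The endpoint
    # choices are illustrative only.
    OLD_LO, OLD_HI = 89, 96
    NEW_LO, NEW_HI = 85, 96

    def remap(old_score: float) -> float:
        scale = (NEW_HI - NEW_LO) / (OLD_HI - OLD_LO)
        return round(NEW_LO + (old_score - OLD_LO) * scale, 1)

    for s in (89, 90, 92, 94, 96):
        print(s, "->", remap(s))  # 90 drops to about 86.6; the 96 top end stays put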

A secondary conclusion is that CT is missing an opportunity to do a better job of harmonizing scores. Perhaps they will take this up in the future, which would have the added bonus of increasing user engagement on the site.

I would prefer it if CT had the option of standardized scores. That is, I choose the adjectives and CT provides the score. Therefore, if I choose the adjective “good” it would start at 82 and build from there. Thus, if I want a wine to rate at least 90, I would have to choose “excellent”, and so on and so forth. I just don’t understand people who use private scales for PUBLIC notes. Why not just use private notes?

I use the CellarTracker system. Typically, I put the adjective “Good”, “Outstanding” etc. at the end of the note, along with the corresponding numerical score.

83 points… [rofl.gif] I love the honesty.

The rating system is so all over the place with how different folks use it that I have just given up. No scores, just written notes that emphasize helping others know whether they would enjoy it and whether it is ready to drink. Some people think 93 points is just ‘good’; for others that’s 85; still another might try a wine that is obviously corked and give it a 75 or whatever. I have rarely found any use for the numerical scores on CT. Then there’s another guy who just writes what Wine Spectator scored it and gives the same number…

Easier to search or create a rank order list on the score field?
The score field seems like the obvious place to record a score?
They’re not thinking about the community average?

I asked Eric if they could list wines in order of the average score of people you were a fan of, since CT calculates that. He gave me an explanation of why they couldn’t that was above my head technically. Think he said it was because it was a dynamic score, but I didn’t understand how that made it different from the average of all scores.
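
For what it’s worth, the “average score among people you follow” idea is simple to express in principle. The sketch below uses invented column names, toy data, and pandas; it says nothing about how CT actually stores or computes scores.

    import pandas as pd

    # Toy sketch of "rank wines by the average score of people I follow"; the
    # column names, user IDs, and fan list are all invented for illustration and
    # do not reflect CellarTracker's actual data model.
    notes = pd.DataFrame({
        "user":  ["a", "b", "c", "a", "d", "b"],
        "wine":  ["Wine 1", "Wine 1", "Wine 1", "Wine 2", "Wine 2", "Wine 2"],
        "score": [92, 88, 95, 90, 84, 91],
    })
    fans_of = {"a", "b"}  # the tasters whose palates I trust

    fan_avg = (notes[notes["user"].isin(fans_of)]
               .groupby("wine")["score"]
               .mean()
               .sort_values(ascending=False))
    print(fan_avg)  # Wine 2 at 90.5, Wine 1 at 90.0, ranked by followed users only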

[winner.gif] We have another winner!

The whole point of Cellar Tracker notes and scores is to add what you think, not to guess what others think.

Hitting the average isn’t like getting an answer right on a test.

If you have a totally different impression of the wine, then indeed you should not have the same score. But if you think similar things as another taster’s note, the better outcome is two similar point scores rather than one at 90 and one at 80.

You could just use the YELP scale that so many these days are tied to.

1986 Chateau d’Yquem - Popped this bad boy at Red Lobster for the gf’s birthday. This didn’t pair well with anything on the Fisherman’s Platter. I asked the guy at the wine shop for something real special for her birthday and he came up with this. For $300 I would have expected at least a full bottle. Way too sweet. We switched to beer and gave this to some unsuspecting guy at the next table. Love the biscuits! ***

But that’s impossible when there are hundreds of different scoring paradigms in play, and you don’t have a representative sample of them participating in the scoring of the particular wine you are evaluating.

Whatever scoring system you employ, it will align with only some of those that others use. If your goal is to use one that fits somewhere near the middle of the road, your scores will be close to the mean of other CTers. That’s fine if you value the collective, but it means you will have to adopt what I perceive to be a distorted and compressed scale.

I understand that the bulk of people posting scores on CT are wine-savvy to the point where they are drinking more high quality wines than plonk, and that that contributes to the narrow range of scores. But it doesn’t explain all of it. Some of it is avoidance of cognitive dissonance: not wanting to give a wine you bought a low score. Some of it is only posting scores on the good stuff. But if that leads to a wine so unappealing you don’t want to swallow it getting a score of 85, it’s contributing to a distorted, inflated, compressed scoring system.

My personal approach to this is to avoid point scores altogether. I use 6 broad categories for non-flawed wine: poor, average, good, very good, excellent, outstanding. The main reason I do this: I’m not a good enough or frequent enough or consistent enough taster to be able to repeatedly score a wine within 1 or 2 points on a 50 point (or 30 or 25 point) scale anyway. I only use points for tastings where the organizer requests it. Even with broad categories, most of the wines I drink I rate as very good or excellent. But that’s because I’ve learned what I like over the years.

I subjectively give bad wines lower scores and good wines higher scores neener

There are so many reasons why that is just not reality, and why frankly it’s OK that it’s not. Here are just a few.

  • You are using different scales (see posts 1 & 2).
  • You come at said wine from very different perspectives, i.e., he loves The Prisoner and you drink nothing but Chablis (the real stuff, not the stuff in the box).
  • The other taster may have an ax to grind, be a winery fanboy, or just be nuts.
  • If you like looking at scores, use the CT average, as that weeds out the highs and lows (to me, still useless).
  • Or, if you use CT like I do, other tasters’ scores are irrelevant, as my scores are just a reference point for myself.

I hate to get existential here, but none of this means anything; your enjoyment, or lack thereof, should not change because someone else liked a wine 2 points or 20 points differently than you did.

Cellar Tracker, like democracy, is flawed, but be glad that it is still better than anything else we have come up with so far.

Shon –

Unlike others who have commented, I think trying to normalize your scores is laudable (albeit probably difficult). If you are scoring wines at all, you are either doing it because it is helpful to you or because you think it may be helpful to others. In order for your scores to be helpful to others, they need to understand what the scores mean. Trying to discern a community consensus as to the scoring scale is the most natural way to post scores that are understandable to others.

Now, since individuals’ own scoring scales do not necessarily correspond to a community average/consensus scale, putting your scores on the latter scale will not necessarily make them correspond to a particular reader’s own scale. It should, however, make them intelligible in the context of other community scores, and that should be helpful. I have previously analogized this to grading in academia: I can complain all I want that today’s grading scale is inflated compared to what it was when I studied, but if I want prospective employers and other academic institutions to be able to understand the grades that I give my students, I had better not give a B+ for work that is excellent, bordering on superb. Doing so will only confuse the consumers of the grades I give. Since I am not arrogant enough to believe that consumers will try to figure out how my scale differs from the norm, I try to conform to the norm.

I also think that the scale you set out is probably somewhat close to the CT norm (and, indeed, the norm among critics, if you consider them as a group), with the following caveats:

  1. People probably allow a little more space than you do between “barely drinkable” and “above average” – an average wine might be somewhere in the 87-90 range, depending on the scorer, but most scorers would not give a score in their subjective “average” range for a wine with decidedly unpleasant characteristics. (Note, also, that some of your unpleasant descriptors – harsh, burning, acidic – would not necessarily show as unpleasant in the context of what is expected from a particular wine, e.g., a Bourgueil, a young Port, a Riesling.)

In my post linked above (which is from 9 years ago, during which period scales have compressed upward a little), I suggested dividing the range below “above average” into a 3-point range for “unobjectionable but not particularly special,” with scores below that indicating that the taster “did not really like it very much.”

  2. For usability, it should not be so much your goal to have your scores uniformly fall near a median CT score for the corresponding wine as to have the distribution of your scores match the median distribution of a CT user. Statistically, the median scores for wines on CT (at least wines with a reasonable number of reviews) will tend much closer to the overall community average than do individual scores for a given wine – because the median scores are themselves averages that eliminate a degree of variability.

a) Just because a wine is fantastic (or terrible) in some quasi-objective sense does not mean it will show that way to every consumer on every night, so that variability will lead to some variation in scores.
b) Add to that the variability of different users’ score ranges.
c) Add to that the fact that the most fantastic wines are (because of cost, accessibility, or simply the fact that those who like them repurchase) more likely to be consumed again by those who have most enjoyed them in the past, who are more likely to be jaded, inured to excellence, or disappointed by the contrast of today’s experience to an even better previous one. And conversely, those who truly hate a given wine likely will not drink it again, giving it a better chance of being scored by someone who might find a redeeming characteristic.

As a result, if you were always hitting the CT average, you would be scoring on a compressed scale by comparison to other users.
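
A toy simulation makes that compression concrete. All of the distribution parameters below are invented purely for illustration (numpy assumed).

    import numpy as np

    # Toy simulation of the compression-by-averaging point: each wine has a "true"
    # quality, every taster adds personal noise, and the per-wine community average
    # washes much of that noise out. All numbers are invented for illustration.
    rng = np.random.default_rng(0)
    true_quality = rng.normal(91, 2.5, size=200)                             # 200 wines
    individual = true_quality[:, None] + rng.normal(0, 2.0, size=(200, 25))  # 25 notes each
    community_avg = individual.mean(axis=1)

    print("spread of individual scores:", round(float(individual.std()), 2))     # ~3.2 points
    print("spread of per-wine averages:", round(float(community_avg.std()), 2))  # ~2.5 points
    # A taster who always lands on the community average is, in effect, scoring on
    # the narrower ~2.5-point spread rather than the ~3.2-point spread of individual notes.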

– Matt

Exactly what I was about to post.