I think this got to one of mongoose's main critiques. It seemed that a lot of the individual experiments didn't show a significant change.
No... the data showed that the number of tasters correctly picking the odd one out did not reach the number needed for statistical significance at the .05 level.
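For anyone who wants to see what "the number necessary" looks like, here is a minimal sketch. It assumes a standard triangle test, where a taster who is purely guessing picks the odd one out 1 time in 3. The panel size of 24 is hypothetical, not taken from any particular exbeeriment, and the function names are made up for illustration.

```python
from math import comb

def tail_prob(n, k, p=1/3):
    """Probability of k or more correct picks out of n if every taster guesses."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def min_correct(n, alpha=0.05, p=1/3):
    """Smallest number of correct picks that guessing alone would produce
    with probability <= alpha. Returns None if no count qualifies."""
    for k in range(n + 1):
        if tail_prob(n, k, p) <= alpha:
            return k
    return None

panel_size = 24  # hypothetical panel, for illustration only
needed = min_correct(panel_size)
print(f"{panel_size} tasters: need at least {needed} correct "
      f"(chance probability {tail_prob(panel_size, needed):.3f})")
```

Falling short of that count doesn't show the beers are identical. It only means the result is one that guessing could plausibly have produced.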
However, when you compare a beer made with 4 "generally accepted best practices" vs a beer made with 4 "shortcuts", then it *is* different enough that tasters can perceive it.
When you say "tasters," you mean just those who picked the odd one out correctly, which includes those who were guessing, correct? One of the things I look at with results like these is not only how many supposedly could discern a difference, but also how many were unable to do so.
I.e., one critique is that individual process steps may have effects that are below the taster's detection threshold, but if brewers start taking that as gospel and deciding they can make the "short & shoddy" beer without having an impact on their brew, it's incorrect. We shouldn't look at the result of one experiment and use it to decide that a brewer's best practice is incorrect just because no difference was observed.
This is why I wonder about the cumulative effect of small differences that each are below the level of perception but when added together rise to a level that can be perceived.
I think I get at the heart of what you're saying in my above paragraph. The experiments are all interesting, but as you initially said, you saw brewers in the comments saying "oh well I guess I can stop doing process X now!" They were making changes based on information that was FAR too incomplete to justify a change if they were happy with their beer.
Actually I think it was someone else who said that.
This is where I think you go too far. Take the short & shoddy experiment: it was DEFINITELY significant. However, you said a 6-2 split in the preference test wasn't enough to suggest that the 4 "generally accepted best practices" were better than the "shortcuts", and in that case I disagree.
Sure, there's a small difference. But one thing I do is also look at those who could not discern a difference. The results were 6-2-4. Not quite as rock solid as 6-2 appears.
One way I suggest people think about these kinds of results is to assign a dollar value to them. In other words, would you bet $1000 that the results are correct? $100? $10? $5? While not "significance," such a thought experiment helps us think about how confident we are in the results. I'm at about $20 in this one. I surely would not bet $1000 on the accuracy of the results.
You discount things like preference TOO much. Yes, while I'll agree that we can't give preference too much weight, we shouldn't ignore it either. Human tasters (especially inexperienced and non-trained tasters) are an imprecise tool. Often they may not fully understand the root of their preferences. But that doesn't mean their preferences don't have weight.
Well, we'll have to disagree on that. I'm not sure what preference means. Only if there's an overwhelming majority on one side or another will I be more confident in the conclusion. Say, if 15 people could discern a difference and the preferences were split 14-1. That seems fairly convincing. A split of 9-6? Not so much. But that's me, you may think differently.
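To put rough numbers on splits like these, here is a minimal sketch of an exact two-sided sign test. It asks how likely a split at least this lopsided would be if tasters with a preference were really choosing 50/50. That 50/50 null is an assumption, and so is dropping the "no preference" votes, so it says nothing about what those 4 tasters mean. The function name is made up for illustration.

```python
from math import comb

def preference_p_value(a, b):
    """Exact two-sided sign test: probability of a split at least as
    lopsided as a-b if each taster with a preference picks 50/50.
    No-preference votes are NOT counted here (an assumption)."""
    n = a + b
    larger = max(a, b)
    tail = sum(comb(n, i) for i in range(larger, n + 1)) / 2**n
    return min(1.0, 2 * tail)

# 14-1 and 9-6 are the hypothetical splits above; 6-2 is the reported one.
for a, b in [(14, 1), (9, 6), (6, 2)]:
    print(f"{a}-{b}: p = {preference_p_value(a, b):.3f}")
```

Under those assumptions this prints p = 0.001 for 14-1, p = 0.607 for 9-6, and p = 0.289 for 6-2, which matches the intuition that only the first split is hard to explain by chance.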
Here's why: we're making beer for human consumption. These are people who like and drink beer, so it stands to reason that when we're talking about process, they're more often than not going to prefer the objectively better beer. I.e. their preferences give us a reasonable idea which processes are more likely to help us brew "better" beer. The vast majority of tasters don't like off flavors (hence why they're called off flavors).
You lost me at "objectively better beer." Yes, if the preferences align such that a large majority (14-1, say) agree one is better than the other, it seems also likely that I'd agree with them if sampling the same beer. But when it comes out as 6-2 with 4 expressing no preference, now I'm not so sure.
One could reasonably suggest that the reason the results were 6-2 in favor of good practices was that the short and shoddy beer had some objective flaws. Is it close? Yes, because we're talking about a brewer with a lot of experience and who probably has his other processes (i.e. sanitation, oxidation, etc) really well tuned. So even if he took the short and shoddy approach, it was probably still not "bad" beer. But 6-2 may be enough to call good processes objectively better.
Again, you're leaving out the 4 who had no preference.
One of the beauties of these things--both the exbeeriments and HBT--is that nobody has to agree with anyone else, and we all can use whatever processes we want. We get to make our own decisions!
If, for you, a 6-2 split with 4 no preferences is enough to make a conclusion, go for it. I've used Brulosophy results--the trub- or no-trub exbeeriments led me to try it myself, and I could not tell any difference. The exbeeriments can certainly give us ideas to try and those that make sense to us should be tried if possible.
Sometimes that leads down wrong paths, yes. For example, the W-34/70 fermented cold vs warm might suggest that the tasters prefer esters to clean lager styles, not that W-34/70 will produce a great clean lager at 82F. So larger sample sizes and more replication are important, as is a critical eye when *interpreting* the results in context.
Bingo! As we cannot know what the tasters are perceiving, then what do the results indicate? And if you get a panel of tasters for whom those flavors are more important, then there you are.
But when you impugn the concept of tasters based upon mere questions of panel construction, what they did before the tasting, etc, I think you're basically saying that you throw out the human tasting aspect of it entirely as a guide.
No, as I have noted above, if the split is 14-1 in preference for one over the other, that's a stronger result than 9-6... or 6-2-4.
And if you think I'm "impugning" the panel on the basis of panel construction, what they did before tasting, etc... well, maybe. I just don't know the answer to those questions, and neither does anyone else.
But there's not necessarily evidence that drinking an IPA an hour before tasting the best practices vs the short & shoddy beer will affect whether or not you can perceive that one is cleaner than another. No taster is ever a fully pristine slate. But human tasters are still able to do a LOT of things better than running beer through a spectrometer to determine its composition. Humans can tell you whether it tastes good to a human palate. And that is important data, even if it's somewhat noisy.
We're going to have to disagree on this. You are setting up straw men and using them to buttress your point. I'm going to knock them down right now.
No, there's no evidence that drinking an IPA an hour before tasting (or what about 5 minutes?) will affect perception. But there's also no evidence that it doesn't. And we don't know what they did before taste testing, which is the point. You don't know. I don't know. And as a scientist, I'm trained to think about what could cause the results to be false. This is one of many things that could cause them to be false.
And while I can tell you if something tastes good to a human palate, we can't know how universal that opinion is. That's the point here.
***************************
Good science is organized skepticism. We try to disprove things in science because as it turns out we cannot really prove anything. That's the whole basis of null hypothesis testing.
We're looking at potential causal processes here. Change this variable and what happens, if anything? To demonstrate causality you need to show correlation (differences with and without the experimental treatment), time order (cause precedes the effect), and nonspuriousness (there are no other explanations for the results).
My comments in this thread focus primarily on whether we're really measuring something useful, and whether there are other explanations for the results.
*****************
Again, bully for Marshall and his associates for trying to bring data to bear on issues related to homebrewing. Everyone can use or not use those results as they see fit. If using them results in better beer, I only hope that someday I get to enjoy one with you.