I don't think that is correct, AJ. If the null is that the two are indistinguishable, the alternative is that they are.
But they are distinguishable in some way or ways, i.e., one has more or less of some attribute (or attributes) than the other. The second part of the test really asks whether the beers are distinguishable with respect to the attribute the investigator is interested in. H0 should really be worded "Beer A and Beer B are indistinguishable with respect to the attribute or attributes of interest." It then becomes incumbent on the testing team to ensure that other attributes are hidden or masked.
For example, the test team would place the beers in opaque cups if the long lagering in the German vs. crash-cooling scenario I mentioned in my last post removed more color than the quicker program did.
Unless the investigators wanted to include color as part of the criteria for goodness.
Not that one is better than the other.
In that example the question was "Is one better?" I grant you that if the answer to the first part of the test is that the beers are indistinguishable there is no point in asking the question, but as I showed in that fermentation program example, the preference question augments our ability to confidently accept or reject the null hypothesis.
People are going ga-ga over preference, and it's a mistake to do that, IMO. People like what they like.
And that's fine if preference is the parameter you are trying to measure, as in a marketing study. I agree that in such a case, i.e., where a subjective question is being asked, greater thought is required on the part of the investigators.
The reason for doing exbeeriments like Brulosophy does is generally to see if processes make any detectable difference.
And doing the two part test can help them determine that if things are done carefully.
I've noted repeatedly my concerns about panel composition as well as whether the panels have palates that are useful for this, partly because we don't know what they were drinking/eating just prior to doing the testing.
Those and a lot of other things are a concern. As I've noted elsewhere the instructions in the ASBC MOA (or any source describing discrimination testing) will list some of these and what to do about them (isolating panelists, quiet, masking parameters that are not of interest).
Even so, if one beer is preferred by 60 percent of tasters who can distinguish them and the other is preferred by 40 percent, it's not clear that we have learned much, if anything. Which would we like better? Maybe the former, but possibly the latter.
This is a statistics game. There is no definite answer. The best we can do is compute the probability of the observed result under the null hypothesis and reject it (the null) or not.
As I like to say, what's the actionable intelligence? Answer: Not much if any.
Well in the example you give, if the panel has 20 members and 10 qualify (can tell the difference), the probability that this can happen under the null hypothesis is 9.2% and we cannot confidently conclude that it (the null hypothesis) can be rejected. This assumes a properly conducted test. If one of the beers were, for example, served warmer than the other, then 20 out of 20 would easily detect the odd beer and we would reject the null hypothesis, concluding that our fermentation program choice did indeed make a discernible difference. But the null hypothesis we would really be testing in that case is "Warm beer is not distinguishable from cold."
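That 9.2% figure is just the upper tail of a binomial distribution with a 1/3 chance of a correct guess in a triangle test. A minimal Python sketch, if it helps to check the arithmetic (the helper name binom_tail is mine):

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance of picking the odd beer in a triangle test by pure guessing is 1/3.
# Probability that 10 or more of 20 guessers pick it correctly:
print(round(binom_tail(20, 10, 1/3), 4))  # 0.0919
```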
Now if, in this case, we go on to process the "which did you prefer" responses we find, defining p(M,N,n) as the probability under the null hypothesis that out of M panelists N or more qualified (correctly identified the odd beer) and that n or more of those qualifiers preferred one particular beer:
p(20,10,4) = 7.9%
p(20,10,6) = 4.1%
p(20,10,8) = 0.87%
Thus if 6 of the qualifiers prefer one of the beers, the probability under the null hypothesis drops from above the usual significance level (5%) at 4 who prefer to beneath it, and we can reject the null hypothesis based on the preference part of the experiment, whereas we could not based only on detection of the odd beer (9.2%). That, IMO, is the actionable intelligence.
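A short Python sketch of the two-part computation (function names are mine; it reads the preference count one-sidedly, i.e. n or more of the qualifiers preferring one particular beer, which reproduces the figures above):

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def p_two_part(M, N, n):
    """Probability under the null hypothesis that N or more of M panelists
    pick the odd beer by chance (p = 1/3) AND n or more of those qualifiers
    prefer one particular beer by chance (p = 1/2)."""
    return sum(comb(M, k) * (1/3)**k * (2/3)**(M - k) * binom_tail(k, n, 1/2)
               for k in range(N, M + 1))

for n in (4, 6, 8):
    print(f"p(20,10,{n}) = {100 * p_two_part(20, 10, n):.2f}%")
```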
Forgive me, but a triangle test is to see if the beers can be distinguished from each other, isn't it?
Yes it is. Forgive me for being so stubborn on this point.
So where did I get this idea about the second part of the test? I didn't make it up (I should be so clever). I got it from the ASBC Triangle Test MOA and so have always assumed it to be part of a triangle test. That MOA described the procedure as I have been outlining it and had two tables. The first was of the probabilities that N or more out of M correctly identified the odd beer and the second was of the probabilities that n or more out of N preferred one or the other. Now I note that later (this was 25 years ago) editions of the MOA do not include this second table. I have no idea why. It's clear to me that the second part of the test increases its power (as illustrated by your example). But perhaps I am deceiving myself in thinking this.
Not about preference. It's used to identify who (presumably--but since it includes guessers that's suspect) can tell the beers apart.
It's used to see if the beers are detectably different by the panel. Thus, as I keep yelling, it is a test of panel and beer. The panel must be qualified. You can, as I have noted several times here, calibrate a panel for an objective test (e.g. investigation of processes that might or might not increase diacetyl) but it is harder to do so for a subjective test (is there any detectable difference at all?). As I have also said before, demographics appears to be the best we can do in such cases. Males between 18 and 21, members of your homebrew club, and members of InBev's QC team would doubtless give different results.
Including guessers is a feature, not a flaw. If the panel consists of all guessers the null hypothesis is likely to be accepted and we have demonstrated that this panel cannot distinguish the beers. That is useful information.
To some, not to all. Not all flavors are evident to all tasters, and not all flavors are desired by all tasters. I don't care for the flavor of Belgians; that's probably a personal failing, but I like what I like, and that's not what I like.
Again emphasizing that we must be very careful in picking panel members.
Not clear what you mean by qualifying.
Clearly, if you are trying to find out whether one beer is better than another, a taster who can't even tell that the beers are different is not qualified to express an opinion as to which is better. "Qualify" thus simply means being able to pick the odd beer.
I've tasted beers I find different but have no preference one over the other. I cannot say which is better. There is no objective standard for which is "better," only that certain people prefer one or the other or neither.
And that is exactly what a brewery is likely to be trying to figure out. Who are these certain people? They are the ones to be targeted for a particular brand. Suppose a brewery has beer A and beer B which it presents to a panel from the western part of its city. Suppose the panel of 20 has 18 qualifiers (the beers are pretty distinct) but that half prefer A and half prefer B. Then p(20,18,9) = 1.4E-7 so there is no question that the test result is valid, and it says that there is, among qualified tasters, no preference for brand A or brand B, and the brewery shoots for equal shelf space for each in that market.
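The 1.4E-7 figure can be checked the same way. A minimal Python sketch (helper names mine) computing the probability that 18 or more of 20 qualify by chance and 9 or more of the qualifiers prefer one particular beer:

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def p_two_part(M, N, n):
    # N or more of M qualify by chance (1/3), and n or more of the
    # qualifiers prefer one particular beer by chance (1/2).
    return sum(comb(M, k) * (1/3)**k * (2/3)**(M - k) * binom_tail(k, n, 1/2)
               for k in range(N, M + 1))

print(f"p(20,18,9) = {p_two_part(20, 18, 9):.1e}")  # about 1.4e-07
```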
That's partially true, AJ, but when the purpose of an experiment is to see if there's a detectable difference, to find out whether a process is reasonably robust or not, one shouldn't get lost in trying to see if there's this flaw or that flaw.
If that's what you want to test for then that's what you test for. But if I'm going to do all the work of using the German process as opposed to the crash cooling one I don't only want to know if there is a detectable difference. I want to know if the process change improved the beer.
If you brewed two beers side-by-side whose processes varied only by one being mashed .2 degrees higher than the other, you'd be unlikely to find anyone who could reliably note a difference.
True. So before I did a test on a production beer to see if turning down the heat (thereby saving money) a little made a difference I'd calibrate my panel with 0.2, 0.4, 0.6... degree difference pilot beers. If I couldn't get a panel demonstrably sensitive to this parameter I wouldn't do the test.