Do "professional" brewers consider brulosophy to be a load of bs?

I really appreciate the scientific rigor you are trying to bring to this debate. I do have a question and a comment.

Question: How many of the Brulosophy experiments have you actually read in full (more than the headline and results)? This question goes for other posters in this thread too; I see a lot of comments regarding the experimental design that don't seem to come from people who have really read the reports.

In reading the vast majority of the experiments, it seems to me that Marshall and crew are focused on detecting whether process or ingredient changes result in a perceptible difference. Everything else is intended to guide thinking about future experiments.

While I am a firm believer in fermentation temperature control, I do find it interesting that their experiments show that some other things I took as largely irrelevant may lead to perceptible changes in the beer that are easier for typical homebrew drinkers to detect than control of fermentation temperature.

Take, for example: tasters were able to distinguish between beer brewed with WLP001 and US-05, but were not able to distinguish between beer brewed with Galaxy and Mosaic hops. Tasters saw a difference between glass carboy and Corny keg fermentation, but did not see a difference between chocolate malt and Carafa Special II in a schwarzbier. In all of these examples I am much more impressed by whether people could detect a difference than by whether the qualified group preferred one over the other. When I design a recipe it is my preference that counts, but I will likely pay more attention to the choice of WLP001 vs. US-05 in the future.

Yes, these are all single experiments, and I'd also prefer to see them repeated before changing tried and true processes. That said, all of them are presented with sufficient detail in the reports that any of us could take on the challenge of trying to repeat them.

+1
 
Not a single one! And why not? Because my goal has not, up to this point, been to critique what Brulosophy did or didn't do, but rather to point out things like the importance of designing the test according to the information sought and the demographic it is sought from (a triangle test is a test of the panel and the beer), and some of the pitfalls, such as failure to mask a differentiating parameter that is not being investigated and possible procedural errors (e.g. failure to isolate panelists from one another and failure to randomize the order of presentation of the cups). The only comment I made about Brulosophy in particular was that if any criticism of what they did was justified it was probably in one of those areas, and I'd say that about anybody I knew was doing a triangle test.

haha ok

To me it seemed that your heavy science/math posts on this thread were intended to argue that the Brulosophy crew was somehow failing in their scientific rigor.

In hindsight this may be exactly the answer to the OP's question: not really all that interested in amateur science.
 
A friend recounted a story about his cousin who worked on a Ford assembly line. When asked what he did, he said he screwed these little white things into the engine, but didn't know what they were. Turns out they were spark plugs.

I'm sure there are more than a few people who work in breweries who know their own jobs but couldn't describe what's going on inside the brew.
 
Water is one of several stops (often the last one) on the way to very good beer, but other things are just as important or more so. Good water is a sine qua non for good beer, but good water is easily obtained by adding 1 gram of calcium chloride to each gallon of RO water used (mash and sparge). Brew with water like that to let you get a handle on grist design and preparation, mashing, hopping, and fermentation. Then, when those are under control, you can come back and tweak the water if you like, but don't expect dramatic differences.

These ubiquitous beliefs from this forum are worthy of debate, imo.
 
You should do that and maybe try some of the experiments.

I will. As for doing some of the experiments: I have done many experiments, usually more in the acid/base properties of malt line than the sensory line. But I have done sensory-based experiments exploring, for example, the question as to whether reducing chloride levels improves beer. It doesn't improve my beer, in my opinion, and as the demographic I am interested in is me, there would be little point in doing a triangle test. Beyond that, I don't have access to the facilities necessary for proper conduct of such a test and would therefore have questions as to its validity.

What the hell is "Virginia/Quebec" by the way?
Ogden is a municipality in the Estrie. I live there in the summer. I'll bet, now that you know that, you can figure out the significance of McLean and Virginia.

Quebec is in Canada.
Newfoundland is too.
 
That's exactly what Brulosophy and Drew and I at EB expect people to do.... We're presenting the results of our experiments as starting points for further exploration.

I did go and look at one of the experiments - the one that found a warm-fermented beer indistinguishable from a cool-fermented one. I chose that one as it flies in the face of common sense. In the writeup it is stated that a triangle test was used. There were several identifiable problems with the conduct of this test. First, the picture showed a bunch of guys sitting around a table with beer bottles in front of them. I'm not sure whether that represents the 'triangle test', but if it does then there is a big problem in that these people are able to interact with each other. Even if this photo does not depict the tasting, I doubt the 23 panelists were isolated from one another, and that is important. Second, the panelists were served 2 samples of A and one of B. Proper procedure requires that the panelists be given AAB, ABA, BAA, BBA, BAB, or ABB, with the sequence (order is important) randomly chosen from those 6. Third, two of the panelists were unable to decide. In a triangle test they must guess if unable to decide.

So if we take those 2 out of the panel, he found 9 out of 21 panelists were able to correctly identify the odd beer. Assuming that failure to randomize the order of presentation and failure to present 2 A's as frequently as 2 B's in the triplet does not have an effect on choice under the alternate hypothesis (that the beers are different), we find the probability of 9 or more correct out of 21 is 0.24 under the null hypothesis, which hardly supports the alternate hypothesis. Since the alternative hypothesis in this case is highly likely to be valid, we would suspect the panel (palate fatigue, ordinary fatigue, did this at the end of an evening of camaraderie, just had a squid, garlic and onion pizza...) or the environment (distractions, masking aromas...). Or did the use of dark malts mask some subtle differences in flavor? Or was the particular yeast strain chosen insensitive to temperature relative to others? Some of these things are mentioned in the comments to the web page where the results are posted.
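If you want to check that 0.24 yourself, a couple of lines of Python (scipy assumed) will do it:

from scipy.stats import binom
# chance that 9 or more of 21 pure guessers pick the odd beer (p = 1/3 each)
print(binom.sf(8, 21, 1/3))   # ~0.24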

Further investigation is definitely warranted if anyone really thinks that fermentation temperature might not have an effect on beer flavor.

We're not presenting what we feel are scientific conclusions in the way AJ is describing. We're all frustrated that people take them otherwise.

This seems to say: "We are doing some experiments and collecting data under questionable conditions, but we are putting it through triangle testing. Even so, we are frustrated that anyone would try to determine whether our results are meaningful or not." In an earlier post you suggested dropping publication of p values. As p values based on triangle tests aren't meaningful if the tests aren't triangle tests, perhaps that's best. Perhaps it's best not to mention triangle tests at all. Just tell the people what you did and report the numbers you got. Let the ones interested in significance (me and the other guy) figure it out for themselves.
 
To me it seemed that your heavy science/math posts on this thread were intended to argue that the Brulosophy crew was somehow failing in their scientific rigor.

I had, up to this point, suggested that, given the myriad details that must be attended to in running a meaningful triangle test, it would be difficult for amateurs to be successful. I have now checked one Brulosophy experiment and found indeed that there are problems with some of the things they did in that experiment. I suggested that they drop mention of triangle tests at all and just report the tasting panel results.

In hindsight this may be exactly the answer to the OP's question: not really all that interested in amateur science.
Many professional brewers aren't interested in brewing science, whether the scientist be working in his kitchen or in the labs of InBev. I once questioned a lab guy on what he was doing. He indignantly replied, "Look, I don't know why what I do works. I just know that it does work." That's sufficient for many. One of the wise men said something to the effect that success in brewing depends greatly on science but is, ultimately, an art.
 
Imo, the path to better brew is the water.

Water is one of several stops (often the last one) on the way to very good beer, but other things are just as important or more so. Good water is a sine qua non for good beer, but good water is easily obtained by adding 1 gram of calcium chloride to each gallon of RO water used (mash and sparge). Brew with water like that to let you get a handle on grist design and preparation, mashing, hopping, and fermentation. Then, when those are under control, you can come back and tweak the water if you like, but don't expect dramatic differences.

I think there's a middle ground here... The people who discount the value of water have good water to start with.

I brewed almost 9 years before going to RO+salts. I used Campden tablets to neutralize chloramine, but other than that I was using what came out of the tap.

During those 9 years, I did what AJ is suggesting. I honed my process. I got all the other aspects of brewing under pretty tight control. I was pitching plenty of healthy yeast. I was controlling fermentation temps. I was chilling rapidly. Etc etc etc. I made a lot of good beer in that time, and when I say that, I mean my beer was consistently good. There weren't "clunkers" that were outliers and bad, so I believe my process to be sound.

After I switched to RO+salts, I *immediately* noticed a difference in the beer. The beers, especially the pale ones, were noticeably brighter, more crisp, and just seemed cleaner. I was actually surprised at the difference. I was NOT expecting the water to make that big of an impact.

At the same time, however, one of my homebrew club buddies (@JonW) lives only about 20 miles away from me, in Huntington Beach. His water is sourced differently than Mission Viejo's. He makes excellent beer using only a charcoal filter (IIRC), and sees no reason to move to RO.

Water is a huge portion of beer. I'd venture to say that it's in some ways a question of your starting water, though. If your tap water is "good enough", switching to RO+salts will probably make only a minor difference. But if your tap water is NOT "good enough", as mine wasn't, changing the water can be a pretty big step.
 
Many of them, particularly related to water treatment, are debated at length in the Brewing Science forum.

I hope so, but somehow I feel the prevailing ideas can be found in any common brewing book. I think you should really rethink your stance on water. And I'm going to offer some unstatistical proof, and there is absolutely no insult intended in that. You are obviously brilliant and I admire your posts. I think a lot of times we all come off different than the inflection that was meant. See, I'm not an engineer, I don't need any proof, I don't have any ponies in this race, I barely like brewing, and I choose what to believe.

Now Mino Choi was on Denny's show, and he was on Basic Brewing, I think it was. His first mead, I guess, scored a perfect score from an accredited BJCP judge. From there he's won medals in everything; it's ridiculous. He has golds and best of shows; it's crazy. Anyways, when someone like that talks, I listen. He brings up how the Germans didn't move to Wisconsin for the tap water from the treatment facility. He gets his water from a special well and said that he filled up big 50-gallon containers or whatever of it. He stated that when he stopped using that water he started to lose. Now I know a guy who makes great beer and I've always wondered about it. His well is fed from a glacier behind his house. When Marshall and the crew do a write-up, I believe it, almost 100%, and it sticks with me, if nothing else as background thought. I know he doesn't want me to take it that way and I know others don't want me to take it that way. I know he's only offering it for thought and I know others think differently, but as far as I'm concerned I have no reason to not just go along with what I see.
 
Now Mino Choi was on Denny's show, and he was on Basic Brewing, I think it was. His first mead, I guess, scored a perfect score from an accredited BJCP judge. From there he's won medals in everything; it's ridiculous. He has golds and best of shows; it's crazy. Anyways, when someone like that talks, I listen. He brings up how the Germans didn't move to Wisconsin for the tap water from the treatment facility. He gets his water from a special well and said that he filled up big 50-gallon containers or whatever of it. He stated that when he stopped using that water he started to lose. Now I know a guy who makes great beer and I've always wondered about it. His well is fed from a glacier behind his house.

And I'm sure you, AJ, and I are in agreement, basically. Better water = better beer.

But it's when you jump to "water is the most important thing in brewing" that I think we diverge. I think it's important, but I don't want to overstate the significance of water relative to everything else.

It's hard to consistently medal in brewing competitions* without adequate water. And "ideal" water may be one of the differentiating factors between a beer that merely medals and one that's a BOS candidate. But I'd state this: perfect water + bad process = mediocre beer.

Water is important, but so is everything else.

(* This is of course assuming perfect judging, which we all know is hit or miss. But I think there will be a positive correlation between the better beers and better scores / medals / BOS, even if it's an imperfect correlation.)
 
@bwarbiany I understand your thinking. Take great water and poor process, etc., and there is no guarantee of quality; mediocre, you said. The problem is, in real life it's just not that way. I mean, consider some average skill: one person recirculates at 152, another passively lets it drop from 154 to 151. One ferments at 70, one at 66. These differences aren't going to mean anything in the long run, imo. Consider again: beer is 90 (90!) percent water. Looks like Fiji water would be close to 45 dollars for a batch. There is no way someone is going to outdo that with the same recipe because one mashed with tap at 152 and the other at, what, 156, 146, 160. I don't believe it; that's where I stand. I'm not jumping in, I'm all in until further notice, GIVEN some basic considerations obviously. This simple-looking tea is nothing simple. It was made with spring water from a canyon outside Boulder, CO, filled with a pretty cool hippie community. Anyways, you know how good our tap water is here. There is no comparo, imo. This tea is incredible, and it's the water.

 
I think you should really rethink your stance on water.
Way too late for that.

And I'm going to offer some unstatistical proof,
A contradiction in terms.

You are obviously brilliant
Thanks but let's not get too carried away.


...and I choose what to believe.
Most people eventually come to find out that doing this leads to problems in several life arenas. Investment decisions (frequently involving small breweries) are one. Now I do know a guy who made a killing investing in a small brewery (Victory), but I would hardly advise you to invest in one based on his experience, because the vast majority of people I know who did this lost at least a substantial part of their investments.

Choosing what to believe is called by various names, but "confirmation bias" is a common one. We are all, even engineers and scientists, subject to it, and because we know of its dangers we take great pains to blind ourselves to it. This is one of the advantages of a triangle test: neither the panelists nor the people serving the samples to them know which is the test beer.

...as far as I'm concerned I have no reason to not just go along with what I see.
An investor receives a free stock-pick newsletter suggesting that a certain stock is going to go up. It does. The first letter is followed up by a second predicting that a second stock will go up. It does. Shortly after that he gets another letter suggesting that if he holds any of a certain issue he should sell it because it is going down. It does. This happens until he has 7 letters correctly predicting stock market moves. An eighth letter comes saying that he has gotten free stock advice which clearly demonstrates that the writer can pick winners correctly time after time, and that from now on, if he wants more of this can't-lose advice, he'll have to pay $1000 for a year's subscription to the monthly newsletter. Should the guy send in the $1000?

This is an actual scam. Think about why.
 
An investor receives a free stock-pick newsletter suggesting that a certain stock is going to go up. It does. The first letter is followed up by a second predicting that a second stock will go up. It does. Shortly after that he gets another letter suggesting that if he holds any of a certain issue he should sell it because it is going down. It does. This happens until he has 7 letters correctly predicting stock market moves. An eighth letter comes saying that he has gotten free stock advice which clearly demonstrates that the writer can pick winners correctly time after time, and that from now on, if he wants more of this can't-lose advice, he'll have to pay $1000 for a year's subscription to the monthly newsletter. Should the guy send in the $1000?

This is an actual scam. Think about why.



Was this meant to be funny? Literally, this is great.
 
Take great water and poor process, etc., and there is no guarantee of quality; mediocre, you said. The problem is, in real life it's just not that way.
The problem is that in real life it is that way. I have great water. It must be great, as I've made some great beers with it. OTOH, I've made some stinkers - beers I'd be ashamed to offer you. And those have been the result of some error in process (e.g. fermentation temperature dropped too low for an ale).

The only reasonable explanation as to why someone would be so naive as to think that he can make a great beer with 'great' water despite process errors is the placebo effect, a form of confirmation bias.

Was this meant to be funny? Literally, this is great.
No. It was meant to be illustrative - to show that what you see isn't always to be accepted. If something seems too good to be true, it nearly always turns out not to be true. You might want to consider ESIL water or Kangen water.

You probably ought to take further water questions to the Brew Science forum. But wear your lead underwear.
 
@ajdelange oh, I get what you are saying about something being too good to be true. Stock futures, margin buying, pyramid scams, and get-rich-quick programs.
 
This seems to say: "We are doing some experiments and collecting data under questionable conditions, but we are putting it through triangle testing. Even so, we are frustrated that anyone would try to determine whether our results are meaningful or not."

This, in my mind, is the best description of the science of Brulosophy.
 
For folks concerned about poor tasters, etc., this write-up is fairly interesting.

http://brulosophy.com/2016/01/21/in...t-xbmt-performance-based-on-experience-level/

That discussion is about testing hypothesis H1: "Tasters with more experience are better able to discriminate differences in beers than those with less". The null hypothesis then is H0: "The ability to distinguish differences in beers is not related to experience."

In the post he lists correct odd beer detection percentages:
General Beer Drinker: 40%
Craft Beer Enthusiast: 49%
Home Brewer: 43%
BJCP in Training: 46%
BJCP certified: 44%

If we make the assumption that the level of experience ascends as we go down the list, then there is a correlation between experience and performance, but a weak one. Pearson's r is only 0.235, and the probability that we might see a value of r that high under H0 is 35%, which lends very weak support to the notion that performance is related to experience.

Now let's slaughter a sacred cow. Let's assume that the average craft beer enthusiast is actually more experienced (whatever that means) than BJCP judges. After all, he may have been critically tasting craft beers for many years while the BJCP judge may have been at this (albeit with great enthusiasm) for but a year or two. Certainly the data suggest that craft enthusiasts have more relevant experience than BJCP judges, as they scored better.

Now let's slaughter another sacred cow and assume that the BJCP judge in training is actually a better judge (has more relevant experience) than the average certified judge. The data suggest that this may be the case, as they performed better, and I know my beer-tasting skills were better when I was in the thick of training (weekly training panels with other judges) than they have ever been since. Anyway, this is my analysis and I can make any assumptions I want. This may lend some insight into the engineer's joke I mentioned in an earlier post, i.e. that all the statisticians on earth laid end to end wouldn't reach a conclusion.

With my new assumptions the 'levels of experience' are now in the same order as the performances and the conclusion is very different. For this rearranged data set Pearson's r is 0.988 and the probability that we would see r that big or bigger under the null hypothesis is only 0.08%. We are now on solid ground rejecting the idea that experience doesn't make a difference.
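If you want to verify both correlations, here is a minimal Python sketch (numpy and scipy assumed; the x values are just my 1-to-5 encoding of 'experience', and the one-sided p comes from the usual t transformation of r):

import numpy as np
from scipy.stats import t

def pearson_one_sided(x, y):
    r = np.corrcoef(x, y)[0, 1]
    df = len(x) - 2
    t_stat = r * np.sqrt(df / (1 - r**2))
    return r, t.sf(t_stat, df)   # one-sided P of r this large under no correlation

ranks = [1, 2, 3, 4, 5]
print(pearson_one_sided(ranks, [40, 49, 43, 46, 44]))  # ~ (0.235, 0.35)
print(pearson_one_sided(ranks, [40, 43, 44, 46, 49]))  # ~ (0.988, 0.0008)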

Bottom line here is that testing whether experience makes a difference or not depends very much on how we define experience.

The reason for the linked post is that readers, and the experimenters, have been concerned that many of the experiments result in answers that don't carry statistical significance at the levels we like. The author is seeking an explanation and has focused on his panels. This is definitely the right thing to do, but it overlooks the fact that the power of a panel depends on signal-to-noise ratio, which depends on the beers as well as the panel. The more different the beers, the louder the signal. The better the panel's skills, the less the noise.

Clearly he is dealing with signal-to-noise ratios that suggest that half or fewer of panelists are going to be able to detect the odd beer in a triplet. Assuming the number is half, and that he is running panels of 20, then 10 would be expected to pick the right beer on average. The statistical significance at the 10-out-of-20 level is 9.2% - not significant relative to the usual maximum acceptable level of 5%. A more powerful test is needed to attain significance. Improving the panel to the point where the signal-to-noise is such that 60% choose correctly would imply that 12 out of 20 would be successful on average. The significance level associated with 12 out of 20 is 1.2%.

Assuming the pool of tasters is what it is, improving the panel is going to be tricky. They could be given a tasting test, for example, but that would have to be done with care. Empaneling only tasters who can demonstrate the ability to tell the difference between the beers you want to test is clearly folly, so you would have to test them on some other beers - but how would you choose the other beers? This gets back to the earlier discussions (which so infuriated some) of matching the panel to the demographic you are interested in. A test showing that 85% of people chosen from a pool that has demonstrated that they can taste the difference doesn't tell you much about the man on the street or the average home brewer, or in fact about any particular demographic but the one you have sampled: those you already know can distinguish the beers.

But improving the panel isn't the only way to increase SNR and thus significance. Just making the panel larger will do the same thing. If the panel size were doubled to 40, then we'd expect numbers like 20 correct from a panel with a 50% probability of choosing correctly when the beers are distinguishable. The significance associated with 20 out of 40 is 2.1%.

From this we conclude that, while panels of 40 are tougher to handle than panels of 20, this may be all they need to do to gain the significance they desire. The fact that they seem to be unaware of this is a bit disturbing. I haven't delved far into the site, but it seems that they are attempting to apply statistical methods to their results (which is commendable) without understanding how to do that. They are certainly not alone in this. They should consult someone who knows how these things are done, or shell out the $45 for a copy of ASTM E1885, Standard Test Method for Sensory Analysis - Triangle Test.
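The binomial arithmetic behind those significance figures is easy to reproduce. A minimal Python sketch, assuming scipy:

from scipy.stats import binom

def triangle_significance(n_correct, n_panel):
    # P(this many or more correct if every panelist guesses with p = 1/3)
    return binom.sf(n_correct - 1, n_panel, 1/3)

print(triangle_significance(10, 20))  # ~0.092
print(triangle_significance(12, 20))  # ~0.013
print(triangle_significance(20, 40))  # ~0.021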
 
OMG, this thread was a long read. Now I have to check out Brulosophy to see what all this is about, LOL. I know I stopped racking to secondary after one or two times because it didn't seem to make a difference and I couldn't tell that sitting on the trub was hurting my beer. I also think I may have to get some iodine to see when my mash is done. Well, thanks guys for the "light" reading. :mug:
 
OMG, this thread was a long read. Now I have to check out Brulosophy to see what all this is about, LOL. I know I stopped racking to secondary after one or two times because it didn't seem to make a difference and I couldn't tell that sitting on the trub was hurting my beer. I also think I may have to get some iodine to see when my mash is done. Well, thanks guys for the "light" reading. :mug:

The iodine test is pretty worthless IMO. And also check out experimentalbrew.com
 
After perusing this thread, I think I am finally ready for my probabilities and statistics final. Thanks everybody!
 
Because you will pass every single time you do the test. That's why I stopped using it. I have never done a diastatic power calculation and never had a mash that failed to convert completely. What's the point of doing a test that tells me I'm completely converted when I already know that?
 
Those who are still flogging away at this might appreciate this thread:

https://www.homebrewtalk.com/showthread.php?t=633396

It's about how what you have drunk prior can affect your perception of later beers. Which, of course, is the single biggest issue with triangle testing as it is presented. It's why "qualifying" people on the basis of a lucky guess doesn't make any sense, and why a panel might have perceived more differences had the testing been done under more controlled conditions.
 
Yes, still flogging, and this post is about triangle testing from the perspective of ASTM E1885-04, Standard Test Method for Sensory Analysis - Triangle Test. I had recommended in an earlier post that the Brulosophy folks bite the bullet ($45 worth), spring for a copy of this, and follow it to the best of their ability given the resources they have, in order to ensure that they are following the protocol as established by at least one standards association and to help them interpret their data. As an example of this, I return to the fermentation temperature experiment, which had some problems in that they did not present the samples correctly and did not force panelists unable to make a choice to guess; here, though, I am more interested in their conclusions. They found that 9 out of 21 panelists picked the odd beer correctly (I took out the two who reported that they couldn't decide):
H0(Pd=0): Panelists: 21; 3-ary test; 9 Correct Choices; P(>= 9) = 0.23988
This means that under the null hypothesis (that the warm and cold fermented beers are indistinguishable) we would find that in 24% of panels of 21 members, 9 or more would pick the odd beer correctly. This, as they noted, is lukewarm support for rejection of the null hypothesis. They lament that the test wasn't statistically significant, accept the null hypothesis, and come up with the surprising conclusion that warm and cold fermented beers (or at least this schwarzbier) are not distinguishable. This they felt instinctively was the wrong conclusion, and they did a lot of thinking about what could explain it.

The interesting thing is that processing their findings per ASTM E1885 would have led them to the opposite conclusion. In the ASTM prescription, one considers not only the probability of rejecting the null hypothesis when it is true (false alarm, Type I error) but also the probability of rejecting the alternate hypothesis when it is true (false dismissal, Type II error). This relates to what I had called in earlier posts "signal to noise ratio" and the ROCs I had introduced, but for now we are interested not in how A.J. looks at things but in how an accepted standards organization does. There is only one null hypothesis: that the beers are indistinguishable. There are an infinite number of alternate hypotheses, ranging from the beers being barely distinguishable to the beers being so distinguishable that anyone can tell them apart. Remember from the earlier discussions that one is testing the beers and the panel. ASTM, as I did, measures distinguishability in terms of the percentage of the population of interest that can distinguish them, symbolized by the parameter 0 <= Pd <= 1. Thus H0 means Pd = 0 and H1 means Pd > 0.

Let's hypothesize (H1) that the beers tested by Brulosophy are distinguishable by the population represented by their panel to the extent of 10%. Nine of the panel members chose correctly:
H1(Pd=0.10): Panelists: 21; 3-ary test; 9 Correct Choices; P(< 9) = 0.52372; 0.00 < Pd < 0.35 with conf. 0.90
P(< 9) = 0.52372 says that if the beers are only 10% distinguishable, fewer than 9 people would correctly choose the odd beer in 52% of panels. That's weak support for the notion that the beers are only 10% differentiable. Now let's hypothesize that they are 20% differentiable:
H1(Pd=0.20): Panelists: 21; 3-ary test; 9 Correct Choices; P(< 9) = 0.28653; 0.00 < Pd < 0.35 with conf. 0.90
And now 38%:
H1(Pd=0.38): Panelists: 21; 3-ary test; 9 Correct Choices; P(< 9) = 0.04632; 0.00 < Pd < 0.35 with conf. 0.90
The probability that fewer than 9 panelists will be able to pick the odd beer goes down dramatically as the beers become more differentiable, and we can be confident, at the 0.05 level, that the differentiability is less than 38%. At the end of the line of data we see that we can, from these data, be 90% confident that 0.00 < Pd < 0.35, and we suppose that the beers probably are differentiable, as common sense has dictated all along. But Pd = 0 (the null hypothesis) is in the confidence interval, and we'd obviously feel better about our conclusion if it weren't.

The way to fix Brulosophy's problem with these beers is to increase the panel size (or go to a quaternary test, but ASTM doesn't know about those). Taking what I found from the Brulosophy results: I decide I am interested in a confidence level of about 0.05 for the Type I error (I want to be able to claim 'statistical significance' at the minimum acceptable level), am more relaxed about the confidence that I haven't falsely dismissed a difference that is real and so accept 10% (0.1) for the Type II risk, and am interested in finding whether beers are detectable at Pd < 0.3. I consult ASTM E1885 and find
Minimum Panelists = 53; Threshold (Required correct answers)= 24; P(>= 24|H0) = 0.04687; P(< 24|H1(Pd=0.30))= 0.09481
and assuming I get the threshold number of responses (correct answers) I see that
H0(Pd=0): Panelists: 53; 3-ary test; 24 Correct Choices; P(>= 24) = 0.04687
and
H1(Pd=0.30): Panelists: 53; 3-ary test; 24 Correct Choices; P(< 24) = 0.09481; 0.05 < Pd < 0.31 with conf. 0.90
which says that if I hit the threshold (24 out of 53) I can, with confidence, reject the null hypothesis and conclude that the beers are differentiable and, beyond that, say with 90% confidence that the differentiability is between 5% and 31%.
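For those who would rather compute than buy tables, the probabilities quoted above can be reproduced with a few lines of Python (a sketch, scipy assumed; Pc is the probability of a correct choice when a fraction Pd of the population can truly distinguish the beers and the rest guess):

from scipy.stats import binom

n, correct, k = 21, 9, 3
print(binom.sf(correct - 1, n, 1/k))         # H0: P(>= 9) ~ 0.2399
for pd in (0.10, 0.20, 0.38):
    pc = pd * (1 - 1/k) + 1/k                # distinguishers plus lucky guessers
    print(pd, binom.cdf(correct - 1, n, pc)) # P(< 9): ~0.524, ~0.287, ~0.046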

Thus I am recommending that Brulosophy, or anyone else who wants to report credible testing results using triangle tests, follow more or less what I have outlined here, which follows ASTM E1885. The standard contains tables which will give you all the numbers you need to determine the number of required tests and the significance of a test given its results. The formulas for computation of the confidence bounds are given. But suppose you don't want to spend the $45 (outrageous). No problem. You can get all the data from a simple Excel spreadsheet.

Put the following labels in Column A of the spreadsheet, starting at A1:

M
N
k
Pd - Differentiability
Pc - Probability correct
Alpha Type I risk
Beta Type II risk
Pc_ = N/M
Pd_ Calculated from Pc_
Confidence for Pd estimate
Z Critical value
Standard Dev in Pd_
Pd Upper Limit
Pd Lower Limit

Now put the following in the cells:

B1 Enter the number of panelists
B2 Enter the number of correct answers
B3 Enter the order of the test - 3 for a triangle test
B4 Enter Pd, the differentiability
B5 =B4*(1-1/B3)+1/B3
B6 =1-BINOM.DIST(B2-1,B1,1/B3,1)
B7 =BINOM.DIST(B2-1,B1,B5,1)
B8 =B2/B1
B9 = MAX(0,(B8-1/B3)/(1-1/B3))
B10 Enter the desired confidence level for the Pd estimate
B11 =NORM.INV(B10,0,1)
B12 =SQRT(B8*(1-B8)/B1)/(1 - 1/B3)
B13 =B9+B12*B11
B14 =MAX(0,B9-B12*B11)
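For those who prefer code to cells, here is the same arithmetic as a minimal Python sketch (scipy assumed; the names mirror the Column A labels):

from math import sqrt
from scipy.stats import binom, norm

def triangle_stats(M, N, k, Pd, conf=0.95):
    Pc = Pd * (1 - 1/k) + 1/k                    # B5: P(correct) under H1
    alpha = binom.sf(N - 1, M, 1/k)              # B6: Type I risk, P(>= N | guessing)
    beta = binom.cdf(N - 1, M, Pc)               # B7: Type II risk, P(< N | Pc)
    Pc_ = N / M                                  # B8: observed fraction correct
    Pd_ = max(0, (Pc_ - 1/k) / (1 - 1/k))        # B9: estimated differentiability
    z = norm.ppf(conf)                           # B11: one-sided critical value
    sd = sqrt(Pc_ * (1 - Pc_) / M) / (1 - 1/k)   # B12: std dev of Pd_
    return alpha, beta, Pd_, Pd_ + z * sd, max(0, Pd_ - z * sd)  # B13, B14

print(triangle_stats(53, 24, 3, 0.3))
# ~ (0.0469, 0.0948, 0.1792, 0.3479, 0.0105), matching the example below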

In trying to find the minimum number of panelists required, as in the example above, I would have to put in a value for Pd (0.3) and then try values for M and N until I got the desired alpha (0.05) and beta (0.1) values. Eventually I would stumble on 53 and 24 and the spreadsheet would look like:

M 53
N 24
k 3
Pd - Differentiability 0.3
Pc - Probability correct 0.5333
Alpha Type I risk 0.0469
Beta Type II risk 0.0948
Pc_ = N/M 0.452830189
Pd_ Calculated from Pc_ 0.1792
Confidence for Pd estimate 0.95
Z Critical value 1.644853627
Standard Dev in Pd_ 0.102560959
Pd Upper Limit 0.347943049
Pd Lower Limit 0.010547517

Everything below the beta line would be ignored when trying to find the test size. Unfortunately I can't figure out how to make Solver find M and N automatically, so manual groping is about the only way to go. E1885 has a table listing M vs. alpha and beta for different levels of Pd. I would think the best thing to do, if using the spreadsheet, would be to manually build up a rough table so that one would know where to start.

If you can write a program, here's pseudocode for finding M and N. The spreadsheet formulas will tell you how to calculate Pc, alpha, and beta.

set Pd
calculate Pc
M = 5
do
    N = 0
    do
        calculate alpha(N, M)
        N += 1
    while (alpha > desired) * (N < M)
    calculate beta(N-1, M)
    M += 1
while (beta > desired)
M -= 1; N -= 1
end
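And here is one way, as a Python sketch (scipy assumed), to turn that pseudocode into a runnable search:

from scipy.stats import binom

def find_panel_size(Pd, k=3, alpha_max=0.05, beta_max=0.10):
    Pc = Pd * (1 - 1/k) + 1/k
    M = 5
    while True:
        for N in range(M + 1):
            alpha = binom.sf(N - 1, M, 1/k)      # P(>= N | guessing)
            if alpha <= alpha_max:
                beta = binom.cdf(N - 1, M, Pc)   # P(< N | Pc)
                if beta <= beta_max:
                    return M, N
                break                            # beta too big: need a larger panel
        M += 1

print(find_panel_size(0.30))  # should land on (53, 24) per the example above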

The other use of the spreadsheet is in evaluating results. Suppose I used 53 panelists and got 25 correct responses (one more than the threshold). Then the spreadsheet would look like:

M 53
N 25
k 3
Pd - Differentiability 0.3
Pc - Probability correct 0.5333
Alpha Type I risk 0.0253
Beta Type II risk 0.1499
Pc_ = N/M 0.471698113
Pd_ Calculated from Pc_ 0.2075
Confidence for Pd estimate 0.95
Z Critical value 1.644853627
Standard Dev in Pd_ 0.102855252
Pd Upper Limit 0.376729005
Pd Lower Limit 0.038365335

This result tells me to reject the null hypothesis with more confidence than I had designed for, and that I can be 90% confident that 0.04 < Pd < 0.38.
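(With the Python sketch given earlier, this is just triangle_stats(53, 25, 3, 0.3), which should return approximately (0.0253, 0.1499, 0.2075, 0.3767, 0.0384), matching the table.)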

Summary: The Brulosophy people, or any others, have at their disposal the tools they need to draw meaningful conclusions from triangle tests, provided they are willing to learn to use them. The simplest route would be to buy a copy of ASTM E1885, as it will lend lots of insights beyond what I have room for here (though many of them can be found in my earlier posts). One could, however, get by very nicely with the spreadsheet here, though choosing panel size is a bit difficult as it must be done by trial and error.
 
https://www.homebrewtalk.com/showthread.php?t=633396

It's about how what you have drunk prior can affect your perception of later beers. Which, of course, is the single biggest issue with triangle testing as it is presented. It's why "qualifying" people on the basis of a lucky guess doesn't make any sense, and why a panel might have perceived more differences had the testing been done under more controlled conditions.

In triangle testing as presented by ASTM (and ASBC) there is no qualifying of the panel once the test is underway. This is not to be confused with the augmented triangle test I have discussed here, in which preference is asked of panelists who have qualified. But that is not in the triangle part of the test, nor does ASTM advocate a preference question - in fact, quite the opposite: they think having to pick the odd beer may bias one's decision about whether he likes it! ASBC used to have preference as part of their Triangle MOA but doesn't any more. I have shown how the preference question increases the sensitivity of the test and even remember some discussion of that in some ancient text, but I don't have my library up here and can't remember where it was anyway (De Clerck?).

ASTM does talk about qualifying panels in some cases, depending on the nature of the investigation (a concept I had mentioned in earlier posts). In those cases they cite the necessity of ensuring that enough time elapses after training/qualification that no residual palate effects are present. They also emphasize that the order of presentation is important and must be randomly chosen from among the 6 possibilities, as the probability of a correct choice for AAB would be different from that for BBA if A has more of a residual effect on the palate than B.

This may be the reason that higher-order tests are not done. If A contains a numbing level of hops, AAAB and BAAA just worsen the effect. And then there are 8 possible sequences instead of 6, so that panel sizes are ideally integer multiples of 8 rather than of 6, and so on. But higher-order tests are definitely more sensitive (a pentagon test typically requires 40% fewer panelists than a triangle).
 
I get value out of reading exbeeriments and will continue to do so, but if I had my druthers I'd like to hear that the experimenters were keeping track of which tasters can and can't predictably pick out the odd beer in their tests. The cans would continue to get invited to participate while the can'ts would be weeded out of the taster pool. What would be the point of continuing to use a taster who can never or only very infrequently pick out the different beer?

They could start the weeding process by looking at the exbeeriments with low p values. If a large number of tasters picked the correct beer, the ones that couldn't should have their other participation evaluated to see if they should continue to be used.
 
Since I'm advocating ASTM E1885 as gospel, let me quote a bit from ¶7.2:
Choose assessors in accordance with test objectives. For example, to project results to a general consumer population, assessors with unknown sensitivity might be selected. To increase protection of product quality, assessors with demonstrated acuity should be selected.
Thus it depends on whom and what you are interested in. The fact that assessors trained to tell the difference between Beer A and Beer B can tell the difference between Beer A and Beer B doesn't tell me much about whether my SO is going to be able to tell the difference. OTOH, if that panel was trained (or pruned) to detect diacetyl, then I have confidence that Beer C is different from Beer D with respect to diacetyl if they determine it to be so. The problem here seems to be that the investigators have not done all their homework in terms of determining what they want to determine before doing the experiments.
 
AJ,

Question for you. In earlier posts I did what someone called a meta-analysis (made me feel special to get such a term applied to *my* lowly self) when I looked at the warm fermentation experiments as a group.

Basically I included 7 of the 8, as one wasn't "cold vs. warm" but rather "static temp vs. variable temp". Of the 7, only 2-3 IIRC achieved significance at the p < 0.05 level on their own.

However, I made the leap that since they are all tracking the same variable, one could conceivably consider them to all be part of a larger experiment.

By that, I surmised that because 72 of 176 tasters were able to pick out the odd sample, and in a tasting panel of 176 tasters 72 correct corresponds to p = 0.021, it lends greater credence to the idea that the beers were distinguishable but that the degree of difference wasn't large enough to be teased out in the small experiments due to panel size.

Statistically, did I violate principles of experimentation to put these experiments together? Or is this an accepted practice when multiple separate experiments purport to study the same variable?
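For what it's worth, the pooled figure is a two-line check in Python (scipy assumed):

from scipy.stats import binom
print(binom.sf(71, 176, 1/3))   # P(72 or more of 176 by guessing) ~ 0.021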
 
It's quite common. Lots of PhDs have been granted to people who never stepped into a lab, and lots of decisions about medical, environmental, social... problems have been made based on meta-analysis. Sometimes there is no choice. If you wanted to test the hypothesis that people who live in Wisconsin live longer, you couldn't very well do a study, but you could look into the various counties' vital statistics records.

Check the quote in 278.

See the Wikipedia article on meta-analysis to see if you violated any principles. It lists several of the pitfalls. Let's say, for example, that you and your buddy from out of town brew a split batch, fermenting one half warm and the other cold. He takes half the beer (both sub-batches) home and has his homebrew club do a triangle test, but they don't understand how to do one properly and make many of the mistakes Brulosophy did. Your club also does a triangle test, but they have a copy of ASTM E1885 and follow it closely. The two tests give, unsurprisingly enough, different answers. Should you combine them? Probably not in this case, because one set of data was collected with known errors in procedure.

Now let's assume that the out-of-town club also has E1885 and follows it. The data are still different. Should you combine? Maybe. Since the beers are the same, the difference in results must be caused by differences in the panels. If your goal is to determine how the population in your town, let's say mostly of German ancestry, regards the beers, then you should probably not combine the results from the other town (mostly Irish). But, OTOH, if you want a panel that more closely represents the population of your state, you should combine the data.

Another "It depends" answer from the statisticians. I am not one of those but I still have to give that answer.
 