Do "professional" brewers consider brulosophy to be a load of bs?

After perusing this thread, I think I am finally ready for my probabilities and statistics final. Thanks everybody!
 
Because you will pass every single time you do the test. That's why I stopped using it. I have never done a diastatic power calculation and never had a mash that has failed to convert completely. What's the point of doing a test that tells me I'm completely converted when I already know that?
 
Those who are still flogging away at this might appreciate this thread:

https://www.homebrewtalk.com/showthread.php?t=633396

It's about how what you have drunk beforehand can affect your perception of later beers. That, of course, is the single biggest issue with the triangle testing as it is presented. It's why "qualifying" people on the basis of a lucky guess doesn't make any sense, and why a panel might have perceived more differences had the testing been done under more controlled conditions.
 
Yes, still flogging, and this post is about triangle testing from the perspective of ASTM E1885-04, Standard Test Method for Sensory Analysis - Triangle Test. I had recommended in an earlier post that the Brulosophy folks bite the bullet ($45 worth), spring for a copy, and follow it to the best of their ability given the resources they have, both to ensure that they follow the protocol as established by at least one standards association and to help them interpret their data. As an example I return to the fermentation temperature experiment, which had some problems: they did not present the samples correctly, and they did not force panelists unable to make a choice to guess. Here, though, I am more interested in their conclusions. They found that 9 out of 21 panelists picked the odd beer correctly (I took out the two who reported that they couldn't decide):
H0(Pd=0): Panelists: 21; 3-ary test; 9 Correct Choices; P(>= 9) = 0.23988
This means that under the null hypothesis (that the warm and cold fermented beers are indistinguishable) 9 or more panelists would pick the odd beer correctly in 24% of panels of 21 members. This, as they noted, is lukewarm support for rejection of the null hypothesis. They lament that the test wasn't statistically significant, accept the null hypothesis, and come up with the surprising conclusion that warm and cold fermented beers (or at least this Schwarzbier) are not distinguishable. This they felt instinctively was the wrong conclusion, and they did a lot of thinking about what could explain it. The interesting thing is that processing their findings per ASTM E1885 would have led them to the opposite conclusion.

In the ASTM prescription one considers not only the probability of rejecting the null hypothesis when it is true (false alarm, Type I error) but also the probability of rejecting the alternate hypothesis when it is true (false dismissal, Type II error). This relates to what I had called in earlier posts "signal to noise ratio" and the ROCs I had introduced, but for now we are not interested in how A.J. looks at things but rather how an accepted standards organization does. There is only one null hypothesis: that the beers are indistinguishable. There are an infinite number of alternate hypotheses, ranging from the beers being barely distinguishable to being so distinguishable that anyone can tell them apart. Remember from the earlier discussions that we noted repeatedly that one is testing the beers and the panel. ASTM, as I did, measures distinguishability as the percentage of the population of interest that can distinguish the beers, symbolized by the parameter 0 <= Pd <= 1. Thus H0 means Pd = 0 and H1 means Pd > 0.
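That 0.23988 is just a binomial tail probability; here is a one-line check, sketched in Python on the assumption that the scipy library is available:

from scipy.stats import binom
p = 1 - binom.cdf(8, 21, 1/3)  # P(>= 9 correct of 21 under H0) ~ 0.240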

Let's hypothesize (H1) that the beers tested by Brulosophy are distinguishable by the population represented by their panel to the extent of 10%. Nine of the panel members chose correctly:
H1(Pd=0.10): Panelists: 21; 3-ary test; 9 Correct Choices; P(< 9) = 0.52372; 0.00 < Pd < 0.35 with conf. 0.90
P(< 9) = 0.52372 says that if the beers are only 10% distinguishable, fewer than 9 people would correctly choose the odd beer in 52% of panels. That's weak support for the notion that the beers are only 10% differentiable. Now let's hypothesize that they are 20% differentiable:
H1(Pd=0.20): Panelists: 21; 3-ary test; 9 Correct Choices; P(< 9) = 0.28653; 0.00 < Pd < 0.35 with conf. 0.90
And now 38%:
H1(Pd=0.38): Panelists: 21; 3-ary test; 9 Correct Choices; P(< 9) = 0.04632; 0.00 < Pd < 0.35 with conf. 0.90
The probability that fewer than 9 panelists will be able to pick the odd beer goes down dramatically as the beer becomes more differentiable, and we can be confident, at the 0.05 level, that the differentiability is less than 38%. At the end of each line of data we see that we can, from these data, be 90% confident that 0.00 < Pd < 0.35, so we suppose that the beers probably are differentiable, as common sense has dictated all along. But Pd = 0 (the null hypothesis) is in the confidence interval, and we'd obviously feel better about our conclusion if it weren't. The way to fix Brulosophy's problem with these beers is to increase the panel size (or go to a quaternary test, but ASTM doesn't know about those). Taking what I found from the Brulosophy results, I decide I want a risk of about 0.05 for the Type I error (I want to be able to claim 'statistical significance' at the minimum acceptable level), am more relaxed about falsely dismissing a difference that is real and so accept 10% (0.1) for the Type II risk, and am interested in finding whether beers are detectable at Pd < 0.3. I consult ASTM E1885 and find
Minimum Panelists = 53; Threshold (Required correct answers)= 24; P(>= 24|H0) = 0.04687; P(< 24|H1(Pd=0.30))= 0.09481
and assuming I get the threshold number of responses (correct answers) I see that
H0(Pd=0): Panelists: 53; 3-ary test; 24 Correct Choices; P(>= 24) = 0.04687
and
H1(Pd=0.30): Panelists: 53; 3-ary test; 24 Correct Choices; P(< 24) = 0.09481; 0.05 < Pd < 0.31 with conf. 0.90
which says that if I hit the threshold (24 out of 53) I can, with confidence, reject the null hypothesis and conclude that the beers are differentiable and, beyond that, say with 90% confidence that the differentiability is between 5% and 31%.
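Both of those tail probabilities are again plain binomial sums; a minimal check in Python (assuming scipy; the variable names are mine):

from scipy.stats import binom
alpha = 1 - binom.cdf(23, 53, 1/3)                # P(>= 24 correct | H0) ~ 0.0469
beta = binom.cdf(23, 53, 0.30 * (1 - 1/3) + 1/3)  # P(< 24 correct | Pd = 0.30) ~ 0.0948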

Thus I am recommending that Brulosophy, or anyone who wants to report credible testing results using a triangle test, follow more or less what I have outlined here, which follows ASTM E1885. The standard contains tables which will give you all the numbers you need to determine the number of required tests and the significance of tests given test results. The formulas for computation of the confidence bounds are given. But suppose you don't want to spend the $45 (outrageous). No problem. You can get all the data from a simple Excel spreadsheet.

Put the following labels in column A of the spreadsheet, starting at A1:

M
N
k
Pd - Differentiability
Pc - Probability correct
Alpha Type I risk
Beta Type II risk
Pc_ = N/M
Pd_ Calculated from Pc_
Confidence for Pd estimate
Z Critical value
Standard Dev in Pd_
Pd Upper Limit
Pd Lower Limit

Now put the following in the cells:

B1 Enter the number of panelists
B2 Enter the number of correct answers
B3 Enter the order of the test - 3 for a triangle test
B4 Enter Pd, the differentiability
B5 =B4*(1-1/B3)+1/B3
B6 =1-BINOM.DIST(B2-1,B1,1/B3,1)
B7 =BINOM.DIST(B2-1,B1,B5,1)
B8 =B2/B1
B9 = MAX(0,(B8-1/B3)/(1-1/B3))
B10 Enter the desired confidence level for the Pd estimate
B11 =NORM.INV(B10,0,1)
B12 =SQRT(B8*(1-B8)/B1)/(1 - 1/B3)
B13 =B9+B12*B11
B14 =MAX(0,B9-B12*B11)
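For those who would rather not use Excel at all, here is a minimal Python sketch of the same calculations (the function and variable names are mine, not from E1885; it assumes scipy is installed):

from math import sqrt
from scipy.stats import binom, norm

def triangle_stats(m, n, k=3, pd=0.30, conf=0.95):
    # Mirrors the spreadsheet: m panelists, n correct choices, k-ary test
    pc = pd * (1 - 1/k) + 1/k                         # B5: P(correct) under H1
    alpha = 1 - binom.cdf(n - 1, m, 1/k)              # B6: Type I risk
    beta = binom.cdf(n - 1, m, pc)                    # B7: Type II risk
    pc_hat = n / m                                    # B8
    pd_hat = max(0, (pc_hat - 1/k) / (1 - 1/k))       # B9
    z = norm.ppf(conf)                                # B11: critical value
    sd = sqrt(pc_hat * (1 - pc_hat) / m) / (1 - 1/k)  # B12: std dev of Pd_
    return alpha, beta, pd_hat, pd_hat + z * sd, max(0, pd_hat - z * sd)

# triangle_stats(53, 24) reproduces the example below:
# (0.0469, 0.0948, 0.179, 0.348, 0.011)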

In trying to find the minimum number of panelists required, as in the example above, I would have to put in a value for Pd (0.3) and then try values for M and N until I got the alpha (0.05) and beta (0.1) values I want. Eventually I would stumble on 53 and 24 and the spreadsheet would look like:

M 53
N 24
k 3
Pd - Differentiability 0.3
Pc - Probability correct 0.5333
Alpha Type I risk 0.0469
Beta Type II risk 0.0948
Pc_ = N/M 0.452830189
Pd_ Calculated from Pc_ 0.1792
Confidence for Pd estimate 0.95
Z Critical value 1.644853627
Standard Dev in Pd_ 0.102560959
Pd Upper Limit 0.347943049
Pd Lower Limit 0.010547517

Everything below the beta line would be ignored when trying to find the test size. Unfortunately I can't figure out how to make Solver find M and N automatically, so manual groping is about the only way to go. E1885 has a table listing M vs. alpha and beta for different levels of Pd. I would think the best thing to do if using the spreadsheet would be to manually build up a rough table so that one would know where to start.

If you can write a program, here is a routine for finding M and N, shown as runnable Python rather than pseudo code. The spreadsheet formulas above show how Pc, alpha, and beta are calculated.

from scipy.stats import binom

def find_panel_size(pd, k=3, alpha_max=0.05, beta_max=0.10):
    pc = pd * (1 - 1/k) + 1/k  # P(correct choice) under H1
    m = 5
    while True:
        # smallest threshold n with alpha = P(>= n correct | H0) <= alpha_max
        for n in range(m + 1):
            alpha = 1 - binom.cdf(n - 1, m, 1/k)
            if alpha <= alpha_max:
                break
        beta = binom.cdf(n - 1, m, pc)  # P(< n correct | H1)
        if alpha <= alpha_max and beta <= beta_max:
            return m, n, alpha, beta
        m += 1

# find_panel_size(0.30) returns (53, 24, 0.0469, 0.0948) as in the example above

The other use of the spreadsheet is in evaluating results. Suppose I used 53 panelists and got 25 correct responses (one more than the threshold). Then the spreadsheet would look like:

M 53
N 25
k 3
Pd - Differentiability 0.3
Pc - Probability correct 0.5333
Alpha Type I risk 0.0253
Beta Type II risk 0.1499
Pc_ = N/M 0.471698113
Pd_ Calculated from Pc_ 0.2075
Confidence for Pd estimate 0.95
Z Critical value 1.644853627
Standard Dev in Pd_ 0.102855252
Pd Upper Limit 0.376729005
Pd Lower Limit 0.038365335

This result tells me to reject the null hypothesis with more confidence than I had designed for, and that I can be 90% confident that 0.04 < Pd < 0.38.

Summary: The Brulosophy people, or any others, have at their disposal the tools they need to draw meaningful conclusions from triangle tests, provided they are willing to learn to use them. The simplest route would be to buy a copy of ASTM E1885, as it will lend lots of insights beyond what I have room for here (though many of them can be found in my earlier posts). One could, however, get by very nicely with the spreadsheet given here, though choosing the panel size is a bit difficult as it must be done by trial and error.
 
https://www.homebrewtalk.com/showthread.php?t=633396

It's about how what you have drunk beforehand can affect your perception of later beers. That, of course, is the single biggest issue with the triangle testing as it is presented. It's why "qualifying" people on the basis of a lucky guess doesn't make any sense, and why a panel might have perceived more differences had the testing been done under more controlled conditions.

In triangle testing as presented by ASTM (and ASBC) there is no qualifying of the panel once the test is underway. This is not to be confused with the augmented triangle test I have discussed here, in which preference is asked of panelists who have qualified. But that is not in the triangle part of the test, nor does ASTM advocate a preference question - in fact quite the opposite. They think having to pick the odd beer may bias one's decision about whether he likes it! ASBC used to have preference as part of their Triangle MOA but doesn't any more. I have shown how the preference question increases the sensitivity of the test and even remember some discussion of that in some ancient text, but I don't have my library up here and can't remember where it was anyway (De Clerck?).

ASTM does talk about qualifying panels in some cases, depending on the nature of the investigation (a concept I had mentioned in earlier posts). In those cases they cite the necessity of ensuring that enough time elapses between training/qualification and testing so that no residual palate effects are present. They also emphasize that the order of presentation is important and must be randomly chosen from among the 6 possibilities, as the probability of a correct choice for AAB would be different from that for BBA if A has more of a residual effect on the palate than B.

This may be the reason that higher order tests are not done. If A contains a numbing level of hops, AAAB and BAAA just worsen the effect. And then there are 8 possible sequences instead of 6, so that panel sizes are ideally integer multiples of 8 rather than of 6, and so on. But higher order tests are definitely more sensitive (a pentagon test typically requires 40% fewer panelists than a triangle).
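For anyone implementing the protocol, here is a minimal sketch (Python; the names are mine) of generating the six triangle serving orders and handing them out to panelists, balanced as far as the panel size allows:

import random
from itertools import permutations

# The six distinct serving orders: AAB, ABA, BAA, ABB, BAB, BBA
ORDERS = sorted(set(permutations("AAB")) | set(permutations("ABB")))

def assign_orders(n_panelists):
    # Repeat the six orders as evenly as possible, then shuffle
    pool = ORDERS * (n_panelists // len(ORDERS)) \
        + random.sample(ORDERS, n_panelists % len(ORDERS))
    random.shuffle(pool)
    return ["".join(order) for order in pool]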
 
I get value out of reading exbeeriments and will continue to do so, but if I had my druthers I'd like to hear that the experimenters were keeping track of which tasters can and can't predictably pick out the odd beer in their tests. Those who can would continue to get invited to participate, while those who can't would be weeded out of the taster pool. What would be the point of continuing to use a taster who can never, or only very infrequently, pick out the different beer?

They could start the weeding process by looking at the exbeeriments with low p values. If a large number of tasters picked the correct beer, the ones that couldn't should have their other participation evaluated to see if they should continue to be used.
 
Since I'm advocating ASTM E1885 as gospel let me quote a bit from ¶7.2
Choose assessors in accordance with test objectives. For example, to project results to a general consumer population, assessors with unknown sensitivity might be selected. To increase protection of product quality, assessors with demonstrated acuity should be selected.
Thus it depends on whom and what you are interested in. The fact that assessors trained to tell the difference between Beer A and Beer B can tell the difference between Beer A and Beer B doesn't tell me much about whether my SO is going to be able to tell the difference. OTOH, if that panel was trained (or pruned) to detect diacetyl, then I have confidence that Beer C is different from Beer D with respect to diacetyl if they determine it to be so. The problem here seems to be that the investigators have not done all their homework in terms of deciding what they want to determine before doing the experiments.
 
AJ,

Question for you. In earlier posts I did what someone called a meta-analysis (made me feel special to get such a term applied to *my* lowly self) when I looked at the warm fermentation experiments as a group.

Basically I included 7 of the 8, as one wasn't "cold vs warm" but rather "static temp vs variable temp". Of the 7, only 2-3 IIRC achieved significance at the p<0.05 level on their own.

However, I made the leap that since they are all tracking the same variable, one could conceivably consider them to all be part of a larger experiment.

By that, I surmised that because 72 of 176 tasters were able to pick out the odd sample, and 72 correct in a tasting panel of 176 corresponds to p=0.021, it lends greater credence to the idea that the beers were distinguishable but that the degree of difference wasn't large enough to be teased out in the small experiments due to panel size.
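(A quick way to verify that figure, sketched in Python with scipy:

from scipy.stats import binom
p = 1 - binom.cdf(71, 176, 1/3)  # P(>= 72 correct of 176 under H0) ~ 0.021
)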

Statistically, did I violate principles of experimentation to put these experiments together? Or is this an accepted practice when multiple separate experiments purport to study the same variable?
 
It's quite common. Lots of PhDs have been granted to people who never stepped into a lab, and lots of decisions about medical, environmental, social... problems have been made based on meta-analysis. Sometimes there is no choice. If you wanted to test the hypothesis that people who live in Wisconsin live longer you couldn't very well do a study, but you could look into the various counties' vital statistics records.

Check the quote in post 278.

See the Wikipedia article on meta-analysis to see if you violated any principles. It lists several of the pitfalls. Let's say, for example, that you and your buddy from out of town brew a split batch, fermenting one half warm and the other cold. He takes half the beer (both sub batches) home and has his homebrew club do a triangle test, but they don't understand how to do one properly and make many of the mistakes Brulosophy did. Your club also does a triangle test, but they have a copy of ASTM E1885 and follow it closely. The two tests give, unsurprisingly enough, different answers. Should you combine them? Probably not in this case, because the one set of data was collected with known errors in procedure.

Now let's assume that the out of town club also has E1885 and follows it. The data are still different. Should you combine? Maybe. Since the beers are the same, the difference in results must be caused by differences in the panels. If your goal is to determine how the population in your town, let's say they are mostly of German ancestry, regards the beers, then you should probably not combine the results from the other town (mostly Irish). But, OTOH, if you want to test with a panel that more closely represents the population of your state, you should combine the data.

Another "It depends" answer from the statisticians. I am not one of those but I still have to give that answer.
 
https://en.wikipedia.org/wiki/Meta-analysis

Interesting. Based on the Wiki article, it seems that it is okay to combine. Clearly there wasn't any problem in the searching for studies (as this was covered by them all being run by the same group), and there isn't a publication bias problem because we know that Brulosophy publishes both results and non-results. Given that they used the same methodology in each study, that means that I didn't have to correct for multiple results to reach a homogeneous sample--they were already homogeneous.

However, the aggregation of studies is still limited by the quality of the original experiments. If there were flaws in their methodology (such as not doing AAB, ABA, BAA, ABB, BAB, BBA random presentation of samples, etc), that error is not in any way corrected for by the meta-analysis.

One could also accuse me of bias because I already believe that temperature control is important, but I could not use that bias in any way given my selection of studies. I took all 7 of the Brulosophy studies comparing cold vs warm ferment as written without qualifying them, and the only study I excluded--which was static vs variable temp--actually had a stronger result than studies I used so excluding it didn't strengthen my case.

But in general, I feel that the meta-analysis in this case has bolstered the contention that fermentation temp has a statistically significant effect on beer.
 
Because you will pass every single time you do the test. That's why I stopped using it. I have never done a diastatic power calculation and never had a mash that has failed to convert completely. What's the point of doing a test that tells me I'm completely converted when I already know that?

I guess I was thinking of doing it at 20-30 minutes and sparging from there if the conversion is done. That is one way to save a few minutes on brew day :ban:
 

Five different yeasts. That's right, five. Three different people testing it in different states. 8 experiments in total, 6 unable to show significance as tested. One of them brought to the National Homebrew convention and tasted by everybody, including famous people in homebrew. Tons of anecdotal and qualitative data. Meaningless preference data, oftentimes showing preference for the warm fermented anyway. And yet you have added everything together so you can prove something. I think you've done a great job of showing how real information can be skewed in any way anybody wants. You are a smart guy; I don't get what you are holding onto so much that you feel the need to add all these negative results together to reach a positive. Look, I didn't care if any of this was true from the get go, so I guess we just had opposite beginnings. It seems way too upsetting to me. If someone told me pasta could be made in cold water, it wouldn't make me all angry either. Beer can be mashed in cold water overnight, it turns out. I think it's cool, not something to disprove. Brulosophy is cool and interesting to me; I try not to attach any personal value to the findings.
 

A better test than iodine is to check the actual gravity of the mash.

The mash gravity is quite predictable for a known quantity of grain and water.

http://www.braukaiser.com/wiki/index.php?title=Understanding_Efficiency#Conversion_efficiency

I have done measurements recently as the mash proceeded, and you clearly see the gravity changing rapidly early on, then plateauing and slowly creeping toward 100% conversion after about 2-3 hours.

Things do still continue to happen after 20-30 minutes of mashing.
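As an illustration of the idea, here is a minimal sketch (Python) of the expected maximum mash gravity for a known grist and water volume, along the lines of the Braukaiser page above; the 0.80 fine-grind extract potential is an assumed typical value, so check your malt's spec sheet:

def max_mash_plato(grain_kg, water_l, extract_potential=0.80):
    # Maximum first-wort extract (degrees Plato) at 100% conversion;
    # treats 1 L of mash water as 1 kg
    extract_kg = grain_kg * extract_potential
    return 100 * extract_kg / (extract_kg + water_l)

# e.g. 5 kg of grain in 18 L of water -> about 18.2 deg P at full conversion;
# conversion efficiency = measured Plato / this maximum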
 

Sometimes I just like to geek out on numbers... I'm an engineer, after all ;)

This one was particularly interesting to me, though, as I've always thought that fermentation temp control was a significant step in improving my beer. I even thought that years ago when I did a double batch with my former brewing partner: we fermented one in a temp-controlled fridge but had to put the other in his spare bathroom, uncontrolled. The uncontrolled batch seemed "hot", i.e. with higher-alcohol flavors.

But for me, something just seemed "off" regarding these experiments. While the individual experiments didn't achieve significance, I was noticing that the error was always in the "correct" direction. From a statistics standpoint, that suggested to me that we weren't dealing with blind chance. It suggested that beer was not completely indifferent to fermentation temperatures, but that perhaps the effect was too small to be seen given the size of the tasting panels.

That's why I looked at the meta-analysis, and based upon what I can see, my tactic was NOT statistically unsound. Meta-analysis is used for just this purpose--to find effects that may be too small to be significant in individual experiments, but taken in the aggregate are meaningful.

FYI there's now a 9th ferm temp experiment, and this one actually hit p=0.002... Interestingly, this one took WLP300 and fermented the batches at either 60 or 72, both within range.

So with this, we're now at 91 of 208 (43.75%) correct identification of the odd beer in the triangle, against an expected 33% under the null hypothesis. This corresponds to p=0.001. Assuming meta-analysis is a valid technique, these experiments are absolutely demonstrating, IMHO, that fermentation temperature affects the finished product.

But of course, as you state, that doesn't get to preference, which is obviously an important factor; the warm ferment lager was preferred to the cool, which is a surprising finding.
 

And, of course, determining preference isn't the goal of a triangle test...

Thanks for crunching the numbers on all the temp results! Certainly interesting stuff!
 
But in general, I feel that the meta-analysis in this case has bolstered the contention that fermentation temp has a statistically significant effect on beer.

I think you are doubtless right, but even the single experiment I looked at (9 out of 21 correct guesses) supports that conclusion. Though those results support rejection of the hypothesis that fermentation temperature does not make a difference only at about the 24% significance level, that does not prove the null hypothesis is true. In the long post I showed that we can calculate the probable range of differentiabilities, and these data indicate it is between 0 and 0.35 with 90% confidence. Based on that we are not likely to accept the null hypothesis (Pd = 0). I took it one step further and modified the calculation routine to also calculate the most probable value of Pd given the number of correct answers obtained and the panel size. For 9 out of 21 correct this is Pd_max_likelihood = 14%. That's respectably far from the null hypothesis Pd = 0.

H1(Pd=0.30): Panelists: 21; 3-ary test; 9 Correct Choices; P(< 9) = 0.11886; 0.00 < Pd < 0.35 with conf. 0.90
Most Likely Pd: 0.140
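(For the record, a closed-form estimate follows from the spreadsheet's B9 formula: Pd_hat = (Pc_hat - 1/3)/(1 - 1/3) = (9/21 - 1/3)/(2/3) ~ 0.143, in line with the 0.140 printed above.)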
 
So with this, we're now at 91 of 208 (43.75%) correct identification of the odd beer in the triangle, against an expected 33% under the null hypothesis. This corresponds to p=0.001. Assuming meta-analysis is a valid technique, these experiments are absolutely demonstrating, IMHO, that fermentation temperature affects the finished product.
Let's look at the combined data and see what it implies:
H0(Pd=0): Panelists: 208; 3-ary test; 91 Correct Choices; P(>= 91) = 0.00112
H1(Pd=0.20): Panelists: 208; 3-ary test; 91 Correct Choices; P(< 91) = 0.18079; 0.09 < Pd < 0.22 with conf. 0.90
Most Likely Pd: 0.160

The larger sample size lets us toss the null hypothesis with a Type I risk of only 0.0011. Fiat! It also tightens our 90% confidence band for the differentiability (and gets 0 out of it) and gives us a maximum likelihood estimate of 16% for the differentiability, as opposed to 14%. That's not really very different. This consistency suggests that the various experimental results are indeed capable of being combined. But what did it buy us? A substantial improvement in support for rejecting the null hypothesis. But we already were pretty sure we should do that. Or, put another way, an alpha bigger than we might like, based on the assumption that it always needs to be < 0.05, isn't the whole story.
 
Excellent response. So what did you take from that xbmt? I took a lot. It confirmed what I had been thinking for a while: that ale, hefe, some yeasts are more reactive than others. I certainly never meant or intended to be pigeonholed into any yeast strain never showing a difference. Just that it didn't always matter in the grand scheme of things.

IMO the assumption is in defense of the purchased equipment. Quickly the assumption seems made that there was a difference, so yay, cold was better, I am right, this is how good beer is made. But putting oneself in not-caring shoes, one can see that 7 preferred the warm, 8 the cold, and 4 had no preference. Furthermore the brewer claims taking a 3rd place medal with a 75/25 mix of the warm and cold. So should I go buy a fridge and controller? If so, why? Is anyone really surprised that a hefe yeast showed a reaction to temp? Which one is better? The cold, right, because that is the common thought and the equipment defense. Surely this has to make sense to someone.

So they noticed a difference, now what? If you like hefes, it says to me that flavor could be manipulated by temp and that warm and cold would both be worth trying. It says to me that 72 would be OK for a hefe as preference seems split anyway, and he did well with the 3rd place. Nothing in this speaks to dogmatism about ferment temp in the grand scheme of things. Both warm and cold will make good beer, and the joy of not caring or needing extra equipment gives the warmer an edge to me.
 
Some day ABI, Miller, Heineken, et al. might stumble onto the repository of science that is Brulosophy and discover they can ferment their lagers warm with no consequence. Think of the implications to their bottom line from the increase in production and the decrease in residence time. They must not be spending their research money wisely.
 
No doubt, Bilsch. They can't though, can they, because of the past? It's my understanding they spend more now than ever to make their beer. They have to spend more on rice now than ever due to prices; can't remember the podcast source.
 
Since I'm advocating ASTM E1885 as gospel let me quote a bit from ¶7.2

Thus it depends on whom and what you are interested in. The fact that assessors trained to tell the difference between Beer A and Beer B can tell the difference between Beer A and Beer B doesn't tell me much about whether my SO is going to be able to tell the difference. OTOH, if that panel was trained (or pruned) to detect diacetyl, then I have confidence that Beer C is different from Beer D with respect to diacetyl if they determine it to be so. The problem here seems to be that the investigators have not done all their homework in terms of deciding what they want to determine before doing the experiments.

Hey AJ, this seems to be the same E1885 I linked back in post #25. Guess you answered my question about a credible source. Rang a bell because we both pointed out section 7.2. It's not $45; it's free. Oddly, my reading of the same document pointed out how much the brulosophers were getting right in their effort to use the triangle test, while your read is more or less the opposite.
 
Man, I'm not as nerdy into statistics as a few of you guys by a long shot, but I nerded out on this thread with you. Thanks AJ and bwarbiany for the great posts. My inclinations were initially the same as yours, B, but I didn't have the stats background to show it. And then, as AJ pointed out, it has to actually reach the null hypothesis in order to be considered fully indifferent (if I understood that correctly). So although the experiments didn't *prove* that fermentation temp mattered, they didn't *disprove* the theory either. Rather, they should've raised more questions and prompted more testing. Then the fact that the meta-analysis is actually accepted practice, and that it, in fact, does *prove* that fermentation temp will make a difference, only furthers my gut feeling that if the panel sizes were larger, we would see more significant results.

I'm not sure how inclined AJ is to converse with the brulosophy dudes, but just from my random interactions with them online, I have a feeling they'd actually appreciate some of this stuff. I would bet you that they're much closer to the mode of thinking of wanting to actually do things right, and would prefer to have such inclinations as fermentation temp proven correctly. And I'd venture to say that they're completely against those types in this thread who are taking these experiments as brewing gospel.

Lastly, after going through the meta-analysis on the ferment temp experiments, it'd be interesting to do the same with others that are testing the same variables.
 

Well, I suppose this is one way to view it, but lost in all the statistical handwaving is the ultimate problem with all of this: the measurement process being used by the Brulosophy approach is hugely flawed.

I'm not antistatistical--I've had 10 university stat courses in my life, 9 of which were at the graduate level. Nothing less than an A in any of those classes. PhD minor in Statistics (yeah, they had those). One of the things one learns when one beats one's head against the statistical wall like I did is that one should never, not ever, forget that if you don't measure your variables reliably and validly, the statistics are not--ARE NOT--worth a hill of beans.

I use this brulosophy material in my own classes as a way to illustrate what happens when people get lost in probability theory and forget that in the end, without accurately measuring the variables at issue, one's conclusions are really uncertain at best.

This was pointed out earlier in the thread, and it's an absolute thread-killer, in that it really cannot be refuted. This is fundamental to research, and to statistics.

We do not know who the samples are, and thus to what populations they may be generalizable. Tasters are "qualified" even if they just guessed the right answer, and then treated as if they are effective in distinguishing differences. We know that there is no consistency in what tasters may or may not have been drinking prior to the triangle tests, and it's clear that there is quite the potential for palate fatigue, or taste-bud numbing. We simply do not know who they are or how they were prepared, and that is not science; it's something else entirely.

I have little doubt there will be another attempt to convince people with statistical hand-waving, but in the end, you can largely ignore that effort. It's just a way to distract from the fundamental problem with the brulosophy approach, something that the reliance on statistics cannot overcome.

Of course, people can believe what they want to believe, and if this "let's assume it's all measured well and then proceed as if it's valid" stuff is convincing, well, so be it.
 

The probability that Marshall is reading this thread is 1. ;)

Although I've worked with a scientific modeler who says that there's never a probability of 0 or 1...
 
Hey AJ, this seems to be the same E1885 I linked back in post #25. Guess you answered my question about a credible source. Rang a bell because we both pointed out section 7.2. It's not $45; it's free.
Well, I certainly wish I'd remembered that post. I'd be $45 richer! As to the credibility of the source: it is, by definition, credible, as it is a standard. There are typos in it, and I think I have found an error in the confidence interval calculation, which I did not correct in the spreadsheet because it is a standard (and I'm not, at this point, 100% sure I'm right).

Oddly, my reading of the same document pointed out how much the brulosophers were getting right in their effort to use the triangle test, while your read is more or less the opposite.
In the one description of an experiment I read, all panelists were presented, at best, the permutations of AAB (no BBA). Panelists who couldn't decide were not instructed to guess. These are pretty glaring errors in protocol and suggest that there were others. For example, I doubt that they have individual isolated booths for their panel members. These things would introduce 'noise', yet the single experiment and the pooled experiment results both suggest that the differentiability is about 15%, so I think we have to conclude that they did some things right.
 

I understand where you're coming from... The null hypothesis vs achieving "statistical significance" at p<0.05 are two different things...

I agree with you that as we look at these experiments we cannot necessarily "toss out" the null hypothesis, but we can somewhat discount its likelihood when we get results that suggest an effect, just one not large enough to achieve p<0.05.

The question is how do you explain results to people who may not look at it the same way? That's where I think the meta-analysis can come in. By combining experiments we increase sample size, and we have a much stronger case for not just discounting the null hypothesis, but outright rejecting it.
 

Well, in this case note that the difference was between 60 and 72. Hefe yeast is generally accepted to produce more clove at low temp and more banana at "high" temp, but in no way was this experiment trying to ferment a hefe with uncontrolled temperatures. The temps were within accepted ranges for hefe yeast. They didn't try to ferment a hefe at 85 degrees.

What it suggests is that buying a fridge and temp controller gives you greater control over what your beer will be than not doing so. It suggests that there is a demonstrable difference between fermenting a hefe at 60 vs 72, that these produce different characteristics, and that if you want banana vs clove or vice versa, you should control for that.
 