Do "professional" brewers consider brulosophy to be a load of bs?

The other day I went into my back yard, heated some water, soaked a few pounds of grain, added some hops and yeast and stuff...and tonight I am sitting on the deck drinking the result. That water somehow turned into a tasty beverage. Go figure.

Before that I read a few exbeeriments and my takeaway was my beer should be OK if I mash at 153 instead of 151.5, or if I ferment at 64 instead of 62.11. I didn't take this as gospel, or the new brew dogma, or even a path to get my masters. Just some guys in a garage who like to brew and want to play around with "what ifs" for the rest of us schmucks to read or not read, believe or not believe.

My thoughts: Take your p-value argument to the science room so there is some other "science" there besides "water"!

Cheers! I am going to finish my home beer! :fro:
 
If you don't enjoy the discussion going on here, then you're free to ignore it.

As AJ said, some of us want to know what the numbers are actually saying, because as we've seen from a few posters in this thread (and even more so ALL OVER THIS SITE), people are taking these exbeeriments and saying, "See none of this **** matters!" Yet we all know from our own experiences that it often does.

Now we know from looking further into the data that the claim that these results are not significant (whether made by them or just by the crazies who go around claiming it for them) might not be true. For most of these experiments, they haven't reached a low enough threshold to say, "see, this doesn't matter." The only thing they've shown with some of these is that it *might not* matter. But then when we see it taken from a larger sample size, the numbers say, "actually this likely does matter."

This stuff is important to us. Sure, it might not be attempts at curing cancer, but that doesn't mean we don't find significance in this sort of discussion in OUR lives. That's all there is to it. If you don't enjoy the fact that some of us are enjoying the discussion, then, again, you're free to ignore it.

The other day I went into my back yard, heated some water, soaked a few pounds of grain, added some hops and yeast and stuff...and tonight I am sitting on the deck drinking the result. That water somehow turned into a tasty beverage. Go figure.

Before that I read a few exbeeriments and my takeaway was my beer should be OK if I mash at 153 instead of 151.5, or if I ferment at 64 instead of 62.11. I didn't take this as gospel, or the new brew dogma, or even a path to get my masters. Just some guys in a garage who like to brew and want to play around with "what ifs" for the rest of us schmucks to read or not read, believe or not believe.

My thoughts: Take your p-value argument to the science room so there is some other real science there besides "water"!

Cheers! I am going to finish my home beer! :fro:

ICYMI

and FTFY

If you don't like statistics, then avoid the discussion. If you don't like science, then avoid the discussions. For the most part, there isn't "science" being discussed there, but the real thing. And this topic being discussed is very much still relevant to the OP because these are professionals explaining the merits of the brulosophy findings/conclusions.
 
AJ, you're trying to teach Grandpa how to suck eggs.
Apparently Grandpa needs a refresher on some fundamental concepts.

I already know that you don't really understand this stuff, and that was evident earlier in your misunderstanding of reliability and validity,
Taking validity first, we note that in a triangle test we can observe the number of correct answers and relate it to the parameter of interest by Pd = (Pc - 1/k)/(1 - 1/k). Thus Pc is a valid thing to test in order to get an estimate of the state variable of interest. Now with respect to reliability, we know that people have self-noise and biases. We note that it is important in conducting a triangle test to minimize those noises and biases, and we know ways to decrease the extent to which that noise degrades the estimate. It seems we do know about reliability and validity after all, but statisticians in the real sciences, engineers and physicists may be more likely to discuss measurability, ergodicity and stationarity than to speak in terms of validity and reliability.
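For concreteness, here is a minimal sketch (mine, in Python, not AJ's) of that estimator; the function name is made up:
Code:
# Convert the observed fraction correct Pc in a k-alternative forced-choice
# test into the proportion of true distinguishers Pd via
# Pd = (Pc - 1/k) / (1 - 1/k), clamped at zero.
def pd_from_pc(correct: int, panelists: int, k: int = 3) -> float:
    pc = correct / panelists
    guess = 1.0 / k        # a non-distinguisher still guesses right 1/k of the time
    return max(0.0, (pc - guess) / (1.0 - guess))

# e.g. 15 correct out of 32 in a triangle test (k = 3)
print(pd_from_pc(15, 32))  # ~0.203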

..and now in your statement that a result would be significant if a larger panel were used. That's a game, set, match type of comment.
Yes, it is, as the truth of my statement is just common sense. Or at least it is to someone who has spent as much time as I have looking at data corrupted by noise. I have explained how this works and even given a chart that shows expected significance vs. panel size, parametric in Pd. You are saying that the chart is invalid and are claiming it to be so because I am a tyro when it comes to this stuff. Your argument would be a lot more convincing if you could critique my method or my results (which would also require you to explain why Table A1.1 in ASTM E1885, which shows better confidence with increased panel size, is wrong).
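A Monte Carlo sketch of that claim (my own construction, not AJ's actual spreadsheet): fix a true Pd, simulate many panels of each size, and average the exact one-sided binomial p-value each panel would report.
Code:
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(1)

def expected_p(pd: float, panelists: int, trials: int = 20_000) -> float:
    pc = 1/3 + (2/3) * pd                  # per-panelist probability of a correct call
    correct = rng.binomial(panelists, pc, size=trials)
    # exact one-sided p-value against the all-guessing null (p = 1/3)
    return binom.sf(correct - 1, panelists, 1/3).mean()

for m in (20, 40, 80, 160):
    print(m, round(expected_p(0.20, m), 3))
# The average p-value falls steadily as the panel grows; any single retest
# can still come out worse, but most come out better.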

When, in another post, you asserted again that I didn't understand reliability and presented a sequence of widely varying weights as representative of unreliable data, and I responded by showing you how to get 'actionable' information from it, you chose to ignore it, as you do with most of my rebuttals. And there are rebuttals in that case. If you had said that the scale's noise was non-ergodic or non-stationary then indeed the sequence would be unreliable. Had you said that, I would have concluded that you know what you are talking about, but as you just ignored the rebuttal I had to assume that you couldn't rebut my point, and accept the alternate hypothesis.


And you present all kinds of handwaving to try to buttress....some sort of point.
There have been several rather distinct points
  • Triangle tests are not inherently flawed. They can yield useful information to brewers
  • The essence of the triangle test is forced guessing
  • Increasing panel size improves the sensitivity of the test
  • Improperly done triangle tests can lead one astray. Care is needed
  • Increasing the order of the test from binary to triangle to quad increases the sensitivity

Not sure.
I don't see how you could have missed those as each was followed by analysis, graphs and numerical examples.

You're not correct,
If I were not correct then you should be able to refute some of the points more solidly than just fuming that I am a tyro and thus have no right to express an opinion. But, as you can't seem to figure out what those points are, obviously you can't do that. If you can't figure out what the points are, how can you reasonably offer an opinion as to whether they are correct or not, other than to claim that the reporter is globally disqualified? This is the approach you have taken and it is not one that leads to fruitful dialogue. And, ultimately, I am, of course, correct, as I have justified my positions using sound principles, most of which are found in fundamental texts on probability, statistics, estimation theory and so on. Furthermore, all the points except the last one above are supported by the fact that the triangle test is widely accepted and used by thousands of scientists. The third point is obvious.

..and so even if what you post *is* correct, it isn't a defense of what you wrote.
Huh?

I teach this material. I know it very, very well.
I think the basic problem here may be that you know one corner of statistics - the parts that are used in your field. Statistics is much, much broader than this. You seem unaware of much of the terminology and techniques of the other branches, e.g. ROCs, SEM, SNR, entropy, differentiability. At least when I have asked if you are familiar with those terms you have ignored the question.

You are a journeyman when it comes to this.
Well thanks but I think you mean the opposite.

If this were a water thread, I'd bow to your knowledge, which appears to be very good.
Thanks for this compliment too but if you see me say something that doesn't make sense with respect to water (and I have done that) then I don't want you to bow to my experience. I want you to point out what you see as wrong. I'm smart enough to know that just because I have taught it for many years doesn't mean that I'm always right. In fact I've found that those with long experience (the acknowledged 'experts') are quite often wrong on certain points as our memories slowly and subtly drift over time.

I feel that I must point out that while I have been thinking about brewing water for perhaps 20 years, I have been applying the statistical concepts I've advanced in this thread for more like 50 years to radar, sonar, signal processing, telemetry, communications, orbital mechanics, horology, navigation, antennas, pH measurement and color characterization. I should also point out that I have had very similar discussions with guys re water chemistry. They make outrageous statements (e.g. Henry's coefficient is a function of pH) and when bombarded with evidence to the contrary fall back on "I've been teaching it for years. You don't know what you are talking about."

Your attempt to deflect explanation of the misunderstanding...
I have never attempted to deflect explanation of the misunderstandings. I have attacked them directly. I have, in all cases, tried to give you enough information to help you to do some thinking or pencil and paper work or Monte Carlo simulations or even just reading to resolve your misunderstandings. If you can remove the blinders I think you can do that. When you say that increasing panel size may result in the same or reduced significance, that is true. You just need to understand that it is true proportionally less and less of the time as panel size increases, so that the expected result (are you familiar with the term "expected value"?) goes down. You have to accept that the probable significance levels depend not only on the width of the null distribution but also on that of the alternate distribution. A little thinking should make that obvious. Think rather than dismissing this as hand waving and you'll be there.


..is analogous to what I see people do in some fields, where if an analysis doesn't yield significance at the .05 level, they relax it to the .10 level.
My attempts to clearly explain the principles have nothing to do with changing the confidence levels. I have, though, demonstrated that if one obtains a marginal confidence level he is likely to get improved confidence by repeating the test with a larger panel (or going to a quad rather than a triangle).

In other words, we're going to show something matters even if we have to change the rules to make it so. I always laugh at that, and whenever I see it, I know what they know about statistics--not much.
I wouldn't assume that at all. My old boss used to say "Figures don't lie but liars figure."


You can keep posting charts and graphs until you're blue in the face, but that changes nothing--it just makes it appear, to the casual reader, that you in fact have an argument, when you do not.
Yes, there's nothing like data to mislead people, especially when it agrees with published data, and that's what I would hope readers would see: AJ's spreadsheet agrees with ASTM E1885, as do his curves; thus either AJ is right or AJ and ASTM are both wrong.

It is particularly distressing to me that as of my reading of this post two people had 'liked' it. This means you have taken them in and I have not done a good enough job of stating the case.

None of this makes you a bad person, but in this area, you might find it advantageous to listen more and post less.
You have stated that triangle testing is invalid because it is a forced guessing test. That's wrong. You have implied that only someone who doesn't know anything about statistics would think increasing the panel size would improve the power of a triangle test. That's wrong too. Were I to remain silent there would be more than two people who would accept those incorrect statements.
 
None of this makes you a bad person, but in this area, you might find it advantageous to listen more and post less.

IIRC, this is the second time you've had this sentiment about "this" not making AJ a "bad person." Rip apart his argument if you can, but this is a super weird thing to say.
 
and now in your statement that a result would be significant if a larger panel were used. That's a game, set, match type of comment.

You have implied that only someone who doesn't know anything about statistics would think increasing the panel size would improve the power of a triangle test. That's wrong too.

Respectfully, I think you're talking past each other here. If I might try to summarize what I *think* mongoose is saying, given the context, it is this:

Mongoose: "Experiment A obtained X correct samples out of Y tasters, which didn't achieve significance. Thinking you can just extrapolate out to X*3 and Y*3, and then declare significance because the sample size improved, is completely and utterly unsupported by statistics."

AJ: "We can learn some things from Experiment A, despite the fact that it didn't achieve 'significance'. These things are limited due to small sample size, but with a larger sample size, we could be more justified in the veracity of the conclusion we're trying to draw, because larger sample sizes increase the power of triangle testing."

OBVIOUSLY, you can't take an experiment and declare that if the same ratio held but the sample size was larger, it suddenly becomes significant. Because inherent in a small sample size and not declaring something "significant" is the likelihood of achieving that result by chance. Because of the nature of these experiments, that likelihood of chance resulting in the outcome means that you can't manufacture a larger sample size without actually performing an experiment with a larger sample size. And I think we *all* agree on that. I think that was a strawman created by Mongoose, because I didn't see you argue that. But if you had argued that, it would have been flatly wrong.

What AJ argued is that if the sample size was larger, the results are more powerful, as triangle testing is sensitive to sample size. And then AJ started going off on a completely different point, which is that even though the experiment doesn't achieve p<0.05, we can still look at the results in a statistical manner and ask "despite not achieving a low p-value, what can this experiment teach us?"

But nobody is arguing that you can simply carry forth a ratio of responses to a larger sample size without doing the experiment on that sample size. Only that a larger sample size is more sensitive and more powerful than a small sample size. (The method of performing a meta-analysis suggests you can aggregate similar experiments to improve sample size, but that's a different point.)
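To make the strawman concrete (a sketch with made-up numbers, not anyone's actual data): carrying the same ratio of correct answers over to a triple-sized panel mechanically shrinks the p-value, which is exactly why the extrapolation is illegitimate unless you actually run the bigger panel.
Code:
from scipy.stats import binom

def p_value(correct: int, panelists: int) -> float:
    # P(at least this many correct | everyone guessing, p = 1/3)
    return binom.sf(correct - 1, panelists, 1/3)

print(round(p_value(10, 19), 3))   # ~0.065: not significant at 0.05
print(round(p_value(30, 57), 3))   # same ratio, triple the panel: ~0.002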
 
Also, just because someone is a professional brewer does not mean he or she is knowledgeable in the craft. I'm an attorney, and I work for a mortgage company in attorney oversight and compliance (and a bunch of other things) - so I am a lawyer who lawyers other lawyers. And a lot of those lawyers are not as swift as one would hope given their hourly billing rate. Half the lawyers I worked with in private practice were not worth a ****, and going to law school is a lot harder than going to Siebel (or at least a lot more expensive).

The same applies to brewing. Think about it - if success is a function of beer quality, then BMC are just killing all the craft breweries five to one (craft beer is 17% of market volume and 20% of market value). Our club has one member who consistently medals at competition (over 25 this year), and also the head brewer at one of our local micros. I listen very carefully when either of them speaks.

I just do not get the hate on Brulosophy. Is it gospel? Of course not. At least they are pushing the scientific method by attempting to control variables, publish results and avoid random chance. Personally, I think the sample sizes are too small to yield significant data, and I wonder what the margin of error calculates out to be, but given the scale, it is a good read and information worth knowing.

My LHBS beer courses still teach to secondary beer, and a lot of my homebrew friends secondary - and you mention the word "secondary" here and in about six seconds someone is going to chime in with "don't do that, it doesn't help and only increases your risk of oxidation, and autolysis is a not an issue at the homebrew scale".

I would guess 75% of the techniques and the recipes we use are rote copies of what someone else has said.

/rant over, someone else can get on his or her soapbox now.
I feel there's a blurred line where "Craft" brewing and pro brewing meet... just as there's little comparison between a loaf of bread from a mass-produced bread and roll maker and one from a local old world style bakery... somewhat different priorities and products.
 
I feel there's a blurred line where "Craft" brewing and pro brewing meet... just as there's little comparison between a loaf of bread from a mass-produced bread and roll maker and one from a local old world style bakery... somewhat different priorities and products.

That cuts both ways. My local brewpub that doesn't distribute at all cuts corners all the time, especially on anything that could use some age. He just doesn't have the capacity to sit on product for any length of time. Then you have someone like Sierra Nevada or Stone with warehouses of barrel space... Each end of the spectrum has its pros and cons.
 
I feel there's a blurred line where "Craft" brewing and pro brewing meet... just as there's little comparison between a loaf of bread from a mass-produced bread and roll maker and one from a local old world style bakery... somewhat different priorities and products.

There's definitely shades of grey in between craft and not craft that depend on personal definitions (unless you simply accept the BA's take on it, which is a political position certainly open to scrutiny).

The line between pro and not pro is very clear cut. It's getting paid or not.

They're measures of different things.

And neither say anything intrinsic about quality.

The only blur between "craft" and "pro" is if you call homebrewers "craft" brewers as well. Which while not the first time I've heard that, it's not a typical position in my experience. But to each their own.
 
...I've seen tables here but darned if I know how to put one in....

Here you go:
Code:
Experiment               Panelists  No. Correct p Type I  p Type II   MLE Pd        Z         SD      Lower Lim  Upper Lim     Cat
Hochkurz                    26          7        0.815     0.0118        0        1.645      0.13         0        0.215       II
Roasted Part 3              24         10       0.2538     0.2446      0.125      1.645      0.151        0        0.373
Hop Stand                   22          9        0.293     0.2262      0.114      1.645      0.157        0        0.372
Yeast Comparison            32         15       0.0777     0.4407      0.203      1.645      0.132        0        0.421
Water Chemistry             20          8       0.3385     0.2065       0.1       1.645      0.164        0        0.37
Hop Storage                 16          4       0.8341     0.0207        0        1.645      0.162        0        0.267       II
Ferm. Temp Pt 8             20          8       0.3385     0.2065       0.1       1.645      0.164        0        0.37
Loose vs. bagged            21         12       0.0212     0.7717      0.357      1.645      0.162      0.091      0.624        I
Traditional vs. short       22         13       0.0116     0.8301      0.386      1.645      0.157      0.128      0.645        I
Post Ferm Ox Pt 2           20          9       0.1905     0.3566      0.175      1.645      0.167        0        0.449
Dry Hop at Yeast Pitch      16          6       0.4531     0.1624      0.063      1.645      0.182        0        0.361
Flushing w/ CO2             41         15       0.3849     0.0725      0.049      1.645      0.113        0        0.234
Boil Vigor                  21         11       0.0557     0.6215      0.286      1.645      0.163      0.017      0.555
Butyric Acid                16          9        0.05      0.6984      0.344      1.645      0.186      0.038      0.65         I
BIAB Squeezing              27          8       0.7245     0.0228        0        1.645      0.132        0        0.217       II
Dry Hop Length              19          3        0.976      0.001        0        1.645      0.125        0        0.206       II
Fermentation Temp Pt 7      22         11       0.0787     0.5415      0.25       1.645      0.16         0        0.513
Water Chemistry Pt. 8       22          8       0.4599     0.1178      0.045      1.645      0.154        0        0.298
The Impact Appearance Has   15          7        0.203     0.4006       0.2       1.645      0.193        0        0.518
Stainless vs Plastic        20          6       0.7028     0.0406        0        1.645      0.154        0        0.253       II
Yeast US-05 vs. K-97        21         15       0.0004     0.9806      0.571      1.645      0.148      0.328      0.815        I
LODO                        38         25          0       0.9863      0.487      1.645      0.115      0.297      0.677        I
Yeast WLP001 vs. US-05      23         15       0.0017     0.9424      0.478      1.645      0.149      0.233      0.723        I
Whirlfloc                   19          9       0.1462     0.4354      0.211      1.645      0.172        0        0.493
Yeast: Wyeast 1318 vs 10    21         12       0.0212     0.7717      0.357      1.645      0.162      0.091      0.624        I
Headspace                   20         11       0.0376     0.7002      0.325      1.645      0.167      0.051      0.599        I
Yeast US-05 vs. 34/70       34         25          0       0.9986      0.603      1.645      0.113      0.416      0.79         I
Hops: Galaxy vs. Mosaic     38         17       0.0954     0.3456      0.171      1.645      0.121        0        0.37
Storage Temperature         20         12        0.013     0.8342       0.4       1.645      0.164      0.13       0.67         I
Yeast Pitch Temp Pt. 2      20         15       0.0002     0.9903      0.625      1.645      0.145      0.386      0.864        I
Corny vs. Glass Fermenter   29         16       0.0126     0.7682      0.328      1.645      0.139       0.1       0.555        I
Brudragon: 1308 vs GY054    75         28       0.2675     0.0405      0.06       1.645      0.084        0        0.198       II
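As a check on how the table is built, here is a sketch (my reconstruction, not AJ's actual spreadsheet) reproducing the "Loose vs. bagged" row from its raw counts, assuming the "p Type II" column is the probability that a panel with true Pd = 0.2 falls short of the observed count:
Code:
from math import sqrt
from scipy.stats import binom

n, x, z = 21, 12, 1.645                   # panelists, correct answers, one-sided 90% z

p_type1 = binom.sf(x - 1, n, 1/3)         # P(>= x correct | all guessing)
p_type2 = binom.cdf(x - 1, n, 1/3 + (2/3) * 0.2)   # P(< x correct | Pd = 0.2)
pc = x / n
pd = max(0.0, (pc - 1/3) / (2/3))         # MLE of Pd
sd = sqrt(pc * (1 - pc) / n) / (2/3)      # std. dev. of the Pd estimate
lo, hi = max(0.0, pd - z * sd), pd + z * sd

print(round(p_type1, 4), round(p_type2, 4),
      round(pd, 3), round(sd, 3), round(lo, 3), round(hi, 3))
# -> 0.0212 0.7717 0.357 0.162 0.091 0.624, matching the table row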
 
But nobody is arguing that you can simply carry forth a ratio of responses to a larger sample size without doing the experiment on that sample size.
No, indeed! I thought that the frequent use of terms like 'on average' and 'retest' and the fact that I mentioned that some experimenters on retest would obtain worse significance but that the majority would obtain better would have made that clear. But maybe not. Perhaps having it restated by someone else as you have done will make it clearer. Thank you.

There is yet another way of looking at this and that is by information content, as I'd hinted in an earlier post. In a triangle test we can look at the observable (the number of panelists picking the odd beer correctly) and see whether the average information changes with panel size. It also depends on Pd (a more powerful signal conveys information better than a weak one) and on k, the order of the experiment. Larger k means less noise.

Information is measured in shannons in honor of Claude Shannon, who invented information theory at Bell Labs way back when. But many people still measure it in bits. One bit equals one shannon. The 26-letter English alphabet, if each letter were equally likely, would convey about log2(26) = 4.7 bits per letter. What this practically means is that you need 5 bits to represent an English letter. Add a bit for shift and a bit for control and you are at the 7-bit character codes of yore. The letter e, the most common in the language, contributes 0.378 bits of entropy whereas the much rarer z contributes 0.278. Thus, by these figures, the letter z contributes 0.1 bit less than the letter e. I put that in here so you have some sort of idea of how 'big' a bit is.

For a 3-ary test involving 25 panelists the entropy of the observable (fraction correct) for differentiability 0.20 is -1.28 Shannons (bits)
For a 3-ary test involving 50 panelists the entropy of the observable (fraction correct) for differentiability 0.20 is -1.78 Shannons (bits)
For a 3-ary test involving 100 panelists the entropy of the observable (fraction correct) for differentiability 0.20 is -2.28 Shannons (bits)
For a 3-ary test involving 100 panelists the entropy of the observable (fraction correct) for differentiability 0.40 is -2.30 Shannons (bits)

Thus each time we double the panel size, on average the entropy is decreased by half a bit and, as entropy is a measure of disorder, the information conveyed by the observable increases by that amount. This is what every beginning student of statistics knows: things improve as 1/sqrt(M).
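Those numbers can be reproduced under one assumption (mine): that "entropy of the observable" means the differential entropy of the Gaussian approximation to the fraction-correct statistic.
Code:
from math import log2, pi, e

def entropy_bits(panelists: int, pd: float, k: int = 3) -> float:
    pc = 1/k + (1 - 1/k) * pd             # per-panelist probability correct
    var = pc * (1 - pc) / panelists       # variance of the observed fraction
    return 0.5 * log2(2 * pi * e * var)   # differential entropy of a Gaussian

for m, pd in ((25, 0.20), (50, 0.20), (100, 0.20), (100, 0.40)):
    print(m, pd, round(entropy_bits(m, pd), 2))
# -> -1.28, -1.78, -2.28, -2.30: doubling the panel halves the variance,
#    which subtracts exactly half a bit.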

Only that a larger sample size is more sensitive and more powerful than a small sample size. (The method of performing a meta-analysis suggests you can aggregate similar experiments to improve sample size, but that's a different point.)
If one is certain that the multiple tests were done in the same way, there is no reason not to combine them. In fact, if one is certain that he can repeat the conditions of the test, it is doubtless more practical to repeat and combine the data than to start over with a new panel of twice the size. The second test should, however, have different panelists. This averages down any biases in the first panel (or the second).
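A sketch of that combination rule (assuming identical procedures and independent panelists): simply add the correct counts and the panel sizes and re-test the pooled totals.
Code:
from scipy.stats import binom

def pooled_p(results):                    # results = [(correct, panelists), ...]
    x = sum(c for c, _ in results)
    n = sum(m for _, m in results)
    return binom.sf(x - 1, n, 1/3)        # one-sided test against all-guessing

# two identically run 22-person panels, 11 correct each
print(round(pooled_p([(11, 22)]), 3))             # ~0.079 for one panel alone
print(round(pooled_p([(11, 22), (11, 22)]), 3))   # ~0.016 pooled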

The decisions as to panel size should be made with reference to a ROC curve. The data in table A1.1 of ASTM E1885 is ROC curve data. It is a bit hard to see what is going on from a table so here's a look at ROC curves for 25 and 136 panelists.

A brewer looking for confirmation of the effectiveness of a process or material change which he suspects is going to mean differentiability of about 0.20 would use the upper curve. He'd like to be pretty sure that if he concludes the effect is real it is indeed real, and so wants 0.01 confidence that a detectable difference he calls is real. He looks at the M = 25 ROC curve (because he has 25 tasters) and sees that he can get 0.01 confidence that the effect is real if he detects it, but must accept very poor confidence that if he rejects it, it really isn't there. This is not acceptable. He wants much more confidence that if he rejects the hypothesis of a differentiable difference he has justification for doing so. He thus consults the M = 136 curve and sees that with 136 panelists he can have 0.01 confidence that he hasn't falsely alarmed (detected a difference that isn't there) while at the same time enjoying confidence at the 0.2 level that he hasn't falsely dismissed a detectable difference if indeed there is one. So he establishes a threshold at 59 correct answers and resolves to accept the hypothesis of detectability if his panel gives him 59 or more correct answers and to reject it if it doesn't. The curve tells him that 136 panelists is enough for this test if Pd = 0.2. Pd may or may not, of course, equal 0.2, and the confidence levels he actually attains will depend both on Pd and on the randomness of the panel's decisions on the day of the test.

I didn't pick M = 136 because I like that number particularly. I chose it because it appears in Table A1.1 against alpha = 0.01 and beta = 0.2 for Pd = 0.2. Thus, for this particular example, my curve and the ASTM standard agree. My methods reproduce the ASTM table. Thus one can't accuse me of being wrong. I am following scripture. The scripture of the church of a false god, perhaps, but a god nevertheless.
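That operating point is easy to verify with plain binomial arithmetic (my sketch, same numbers as the curve and Table A1.1):
Code:
from scipy.stats import binom

M, threshold, pd = 136, 59, 0.20
pc = 1/3 + (2/3) * pd                     # ~0.467 per-taster hit probability

alpha = binom.sf(threshold - 1, M, 1/3)   # false alarm: pure guessers reach 59+
beta = binom.cdf(threshold - 1, M, pc)    # missed detection: a Pd = 0.2 panel falls short

print(round(alpha, 3), round(beta, 3))    # ~0.01 and ~0.2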

[attached image: ROC curves for panels of M = 25 and M = 136]
 
I really like brulosophy. It's easy to read and provides good information. Are there flaws in the method? Sure! Should the results be taken as gospel? Definitely not. But I think the author of the xbeeriments is pretty transparent about the results. He gives the results of the triangle taste test from his cohort of friends, which gives you some information. He usually gives you the number of people who chose the different sample correctly. And he goes further and asks those people whether they could actually tell the samples apart, giving you the number of people who got lucky on their pick.

I think the three pools of people (those who couldn't pick the odd sample out, those who picked right but did so by luck, and those who picked right because they could distinguish the samples) give you a pretty good idea of when a variable tested actually matters. And on top of that, the author gives his own view on the matter as additional information.

Overall, I think brulosophy does a pretty good job at controlling their variables and constants. Would it be nice if they could include 1000 people randomly picked from around the world, and do their experiment 3 times with 3 different groups of people? Of course it would. But the point is they still provide valuable information in a way that's easy to interpret. That's awesome in my book.
 
I really like brulosophy. It's easy to read and provides good information. Are there flaws in the method? Sure! Should the results be taken as gospel? Definitely not. But I think the author of the xbeeriments is pretty transparent about the results. He gives the results of the triangle taste test from his cohort of friends, which gives you some information. He usually gives you the number of people who chose the different sample correctly. And he goes further and asks those people whether they could actually tell the samples apart, giving you the number of people who got lucky on their pick.

I think the three pools of people (those who couldn't pick the odd sample out, those who picked right but did so by luck, and those who picked right because they could distinguish the samples) give you a pretty good idea of when a variable tested actually matters. And on top of that, the author gives his own view on the matter as additional information.

Overall, I think brulosophy does a pretty good job at controlling their variables and constants. Would it be nice if they could include 1000 people randomly picked from around the world, and do their experiment 3 times with 3 different groups of people? Of course it would. But the point is they still provide valuable information in a way that's easy to interpret. That's awesome in my book.


All of that is true, but you can't really do much with their conclusions because you just do not know.
 
We know Marshall commented on this thread earlier, but I do find it interesting that the most recent exbeeriment (posted yesterday) concluded with this:

"It&#8217;s true these results are but a single point of data that ought not be accepted as gospel,..."

I know they often conclude their blogs with something akin to this statement, but I found it curious that Jake decided to insert the word gospel this go round (who knows, maybe I've missed it before?), considering how much we've been blasting those on this thread and about this forum who tout these exbeeriments as gospel.

And even with this latest iteration, besides the questioning of how all of the important controls for the tasters were conducted, if they're trying to single out one variable, why not stick to that variable? Yes they're homebrewers who are likely filling two 5-gallon kegs that they themselves must consume the majority of. But if they're going to test the difference between pale malt and pilsner malt, why also add in a couple of other malts to the ingredient list (no matter how low of a percentage they might be)? Why add hops at different intervals instead of sticking with one 60 minute addition that adds a minimal amount of IBUs, but not enough to mask any differences in the flavor of the malts? Maybe the difference would still be rather negligible.

Then again, as bwarb and AJ have pointed out, this is another one that is above 33%. Not by much, but it still is.

And then, the rest of that conclusive paragraph ends with... "though based on my experience with both of these beers, my previous convictions about the differences between Pale and Pilsner malts have faded. I won't be ditching the use of continental Pilsner malt altogether, but I'll definitely be using more domestic Pale malt in styles I once believed required the former, especially if the biggest difference is a little more cash in my pocket."

In one sentence, he disclaims these results as gospel. In another, he confirms he won't completely stop using pilsner malt. In the last sentence, and the one likely to stick in the average reader's head, he basically accepts the results as confirmation enough that he'll stop using it when he deems it unnecessary. He frames it as "a little more cash in his pocket," which sounds very utilitarian to the average person. But he also said he buys in bulk, so the average cost difference per batch between the two malts is almost negligible (if a homebrewer is worried about a couple of bucks per batch, perhaps he should be rethinking his monthly budget). But in using this type of language to conclude the blog, he will likely convince many people that it would be completely OK to abandon pilsner malt if they just want to go ahead and use pale malt.

Now take this type of language for this one particular exbeeriment, and extrapolate it over several. All of a sudden, if someone wants to brew a traditional Czech pilsner, they've determined based on the language of these exbeeriments that the following are not necessary in order to make a good rendition of the style: malt choice, mash temperature, mash length, mash pH, using decoctions, vorlauf-ing, specific bittering hops, boil length, boiling with a lid on, fermentation temp (too many to link), fermentation timeline, yeast pitch rate, etc. And most of all, the fact that even if you compound all of these wrongdoings together, no worries! You'll still end up with only imperceptible differences in your beer!

Oh, wait a minute...

You see, while some of you here may be professionals in statistics, I've always been focused on their language, as that's my expertise (I'm at a nearly fluent level in two other languages besides English). And although they usually try to preface the conclusions in their closing paragraphs, they also usually insinuate that they're probably willing to give up the generally accepted rules for that particular data point. Thus contradicting themselves in the very same paragraph.

Yet, as their traditional vs. short and shoddy (aka compounding all the variables from each "confirmed") experiment showed, maybe having one particular aspect of your brew day go wrong won't completely ruin your beer. But maybe having all of these variables changed all at once will in fact produce a subpar beer.

IOW, in my estimation, their language is just as dangerous as the testing methods they're using, if not more so.
 
To me Brulosophy shows that there is a lot of latitude in the tolerances of the beer making process, and I'm thankful for that. If the latitude range was not as broad as it is, the world of home brewing would be a much smaller one.

I only wish that they would quit making so many IPA derivatives, and concentrate upon more subtle styles. You can hide a lot behind a massive infusion of hops. They should perhaps settle upon one subtle recipe, to eliminate the recipe itself as a variable. Then later move to another and noticeably different recipe and repeat each experiment.

https://mashmadeeasy.yolasite.com/
 
I think the professional community would be ambivalent to these experiments. Very few of them have direct applicability to professional processes and/or equipment.

Statistical analysis at the professional level tends to focus on internal QA/QC for process control against a set of standards, usually determined internally and documented through SOP.

Sensory analysis is certainly used, and its sophistication is generally directly proportional to the size of the brewery's operations. Large breweries with dedicated QA departments have sensory panels that are trained and regularly tested against dosed samples to determine their threshold levels. Sensory training takes years. There are companies entirely devoted to conducting sensory analysis for the brewing/distilling industry.

A nano brewer (one or two man operation) might look at a Brulosophy experiment with some interest but when you get in the 10bbl+ size brewery there is little applicability and at the large scale breweries with multi-state distribution, no applicability at all.
 
I think you guys are trying to apply your personal expertise and interest in statistics and language to use Brulosophy in an effort to prove points which weren't intended or expected...
If it isn't intended or expected that people would try to interpret the results of a triangle test in the way triangle tests are interpreted, why bother to follow some parts of the prescription for triangle testing while ignoring others, and why tell people that you are using triangle testing? It would be more intellectually honest to just describe what you are doing and give your results. But if you say "we did a triangle test and here's our conclusion and the confidence level," readers are going to assume that you followed a recognized protocol and have some expectation that your results are at least as reliable as the protocol allows. And people are going to look at your result in that light. When people find that the investigators really didn't follow a triangle protocol and that they apparently do not understand what a triangle test really is, when it applies, what its limitations are and how to conduct one, it raises questions as to what the data really mean, if anything. When the investigators, or, really, their fans, come back and say "Well, you never should have tried to interpret these tests as triangle tests," the obvious response is "Well, you never should have misled me into thinking you were doing triangle tests."

That's harsh and implies that the data are worthless. But those of us with some personal expertise and interest in statistics are interested in any data and want to see if there is any value in these tests whatever name they are called by. And it seems there is. In several cases we have found some even though the experimenters may not have been aware that it was there. And in ranting on about what we found and how we found it we can point to some things the experimenters could do to bring their procedures around to following more closely to a standard such as the oft mentioned ASTM E1885. At the same time we hope to educate the experimenters to the point where they are more likely to spend their test time effectively.

The most recently posted experiment is an interesting example:

My Impressions: These beers were much more difficult to discern than I imagined they'd be. Over multiple triangle test attempts, I picked the odd beer out and certainly lacked confidence in my choices.
Now if he had done these taste tests before he empaneled his tasters he would have known, from the difficulty he had in confidently detecting the difference, that the signal was weak, i.e. that the beers are not very different; that Pd is small. His test results eventually suggested that it is about 0.05, though it could have been (95% confidence) between 0 and 0.3. Barely being able to taste the difference says it is much closer to 0 than 0.3. Had he looked at the ASTM prescription (as I have recommended over and over) he would have seen no panel size less than 39 for Pd = 0.2 at any tabulated level of confidence (the table is for confidence levels < 0.2 for both alpha and beta) and would have known that a test with a panel of 22 members was not likely to give him a 'statistically significant' answer for signals < 0.2, which is where his pretasting strongly suggests he lies.

In order to reach statistical significance at this sample size, 12 tasters (p<0.05) would have had to correctly identify the unique sample, though only 8 (p=0.46) did, indicating participants in this xBmt were unable to reliably distinguish a lager made with Pale malt from one made with Pilsner malt.
This is only half the story. What is the confidence, based on these data, that fewer than 20% of tasters will be able to detect the apparently small difference? It is p = 0.12. That's not as strong as we would like but suggests that one may very well be able to get away with using whichever malt he has on hand without his 'customers' noticing the difference. I think that's significant.
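Both numbers are one-line binomial computations (a sketch; 22 tasters, 8 correct, per the write-up):
Code:
from scipy.stats import binom

n, x = 22, 8
print(round(binom.sf(x - 1, n, 1/3), 2))               # 0.46: the reported p-value
print(round(binom.cdf(x - 1, n, 1/3 + (2/3)*0.2), 2))  # 0.12: chance that a panel
                                                       # with Pd = 0.2 scores below 8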

It was noted in another post that the main man is reading this thread. Despite that, in this most recent test the practice of presenting two of one particular beer and one of the other persists. This is a violation of the protocol (which calls for balancing the six possible presentation orders) and something that is easy to change. We wonder why they persist in doing this. Perhaps in a desire to be consistent even if wrong, or perhaps they just don't care.

The earlier comments may seem like statistical gobbledy-gook and readers may think one needs a PhD in statistics to carry out a triangle test. That's not so. The procedure isn't written for PhDs in statistics. It's written for guys in bakeries, breweries, bottling plants, pharmaceutical labs, etc. Get it, read it, follow it and you are doing triangle tests. I think it's a pity the Brulosophers don't choose to do that, but then my job is to show them what they can do, not tell them what they should do.
 
To me Brulosophy shows that there is a lot of latitude in the tolerances of the beer making process, and I'm thankful for that. If the latitude range was not as broad as it is, the world of home brewing would be a much smaller one.

I only wish that they would quit making so many IPA derivatives, and concentrate upon more subtle styles. You can hide a lot behind a massive infusion of hops. They should perhaps settle upon one subtle recipe, to eliminate the recipe itself as a variable. Then later move to another and noticeably different recipe and repeat each experiment.

https://mashmadeeasy.yolasite.com/

I guess maybe I was a bit too wordy and/or my attempt at cleverness failed.

What they show is that there might be some latitude on one single variable, and even then, not all of the time, but sometimes.

This is the problem I have with the way they've worded things, and the problem others have (and now myself as well) with the way they're using these statistics. Although there might be some latitude on one single variable, and even then not all of the time but sometimes, what they have ALSO shown is that when you compound all of the latitude on these variables, it does in fact make a difference.

So maybe my brew day goes really well, but I missed on my mash temp by a bit. OK, no worries, as long as it's in the generally accepted (read: scientifically shown) guidelines for when the enzymes are working. Or maybe I don't have enough pale malt, so I go ahead and throw in some pilsner. Or maybe I don't have enough time to boil it for 60 minutes. Or maybe I don't have the time to do a 60 minute hopstand. Maybe I can relax a little and not get so stressed over a hobby because one thing went a bit awry. Or maybe even a couple of those things not going according to plan might not ruin my beer.

But even brulosophy has shown that if you add all of those mistakes together, you're at the least ending up with a perceptibly different beer. When I set out to make a certain beer, that's what I expect to end up in my glass. So for people like me, and likely many others in this thread and on this forum, when people start touting the results as gospel, or even less than that, just saying that there is some latitude in all aspects of brewing, I'll refute that. And in fact the one actual exbeeriment with "triangle testing" and everything also refuted that.
 
This is the problem I have with the way they've worded things, and the problem others have (and now myself as well) with the way they're using these statistics. Although there might be some latitude on one single variable, and even then not all of the time but sometimes, what they have ALSO shown is that when you compound all of the latitude on these variables, it does in fact make a difference.

Agreed. Their reliance on p<0.05 really makes the waters murky, particularly with small sample sizes.

If they have 19 tasters on a panel, they need 11 (~58%) to claim "significance". If they get 10 correct responses picking the odd beer, they claim it's not significant, when random chance alone would average only 6.33 correct guesses.

So they claim it's not significant that tasters couldn't "reliably detect the difference" and then readers say "See! It doesn't matter!!!" No, you're wrong. It probably DOES matter but the sensitivity of the test didn't prove it matters beyond an arbitrary threshold that the world has set.
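The 11-of-19 figure is just the critical value of the exact binomial test (a sketch):
Code:
from scipy.stats import binom

n = 19
for k in range(n + 1):
    if binom.sf(k - 1, n, 1/3) < 0.05:   # P(>= k correct | everyone guessing)
        print(k)                          # -> 11; chance alone averages 19/3 = 6.33
        break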

I think these experiments are very useful as a learning tool. It helps to understand how significant different processes are to making beer. But reliance on p<0.05 gives a veneer of statistical rigor to these experiments that make it WAY too easy for readers to completely misinterpret the findings.

(Edit: I keep using the word "prove" and I *know* it's grating on AJ lol... I should stop saying prove.)
 
So maybe my brew day goes really well, but I missed on my mash temp by a bit.
Maybe I can relax a little and not get so stressed over a hobby because one thing went a bit awry. Or maybe even a couple of those things not going according to plan might not ruin my beer.

We all have our favorite mistakes:

https://www.youtube.com/watch?v=AmIlUKo4dQc
 
It seems like the thread started with two camps: Brulosophy is awesome! OR Brulosophy is not scientific enough. Now a third camp is emerging: Brulosophy might not be scientific enough, but it still gives us insight into how effective a process might be.

But the third camp is incorrect. If it's not enough of a sample to draw any real conclusion from it, then the conclusion you draw has no basis. And as mentioned, most of them (that I've read) come to the conclusion that what they're testing doesn't matter. And often we know that it does matter, we just don't have any more rigorous experiment to confirm it.
 
And why haven't the mods shut this down yet? It's off topic, aggressive, and way too much math. And boring besides.
 
And why haven't the mods shut this down yet?

Because it's turned into a very productive discussion. It's actually quite rare on HBT to have such a long insightful conversation with so many intelligent minds giving their input.

It's a welcome change from "I just made my first beer. It's a 15% RIS. Why did it stop fermenting at 1.060?"
 
Because it's turned into a very productive discussion. It's actually quite rare on HBT to have such a long insightful conversation with so many intelligent minds giving their input.

It's a welcome change from "I just made my first beer. It's a 15% RIS. Why did it stop fermenting at 1.060?"

The sarcasm went whoosh
 
It seems like the thread started with two camps: Brulosophy is awesome! OR Brulosophy is not scientific enough. Now a third camp is emerging: Brulosophy might not be scientific enough, but it still gives us insight into how effective a process might be.

But the third camp is incorrect. If it's not enough of a sample to draw any real conclusion from it, then the conclusion you draw has no basis. And as mentioned, most of them (that I've read) come to the conclusion that what they're testing doesn't matter. And often we know that it does matter, we just don't have any more rigorous experiment to confirm it.

This is the issue though - you're getting too hung up on scientific process and statistical proof to even allow for the possibility of the third camp existing, thus labeling them as "incorrect".

Regarding your 3rd camp: Am I getting insight into an effective process? Me - absolutely. I'm thinking about things I never considered before. Is it "correct"? I don't know. Is it "provable"? PROBABLY NOT, at least without a tremendous use of resources.

So until people who complain about their testing not being good enough start to donate some serious cash to roll these exbeeriments and report on them in ways which fill their needs, maybe just take it for what it is and appreciate that there is some information there.
 
I do know that I love brewing beer. I have met a bunch of great folks who also love brewing beer. I also know that for me the difference between a 148°F and a 156°F mash is not a lot. It may make some differences, but to my burnt-out palate, not much. I am glad Brulosophy is out there doing stuff, brewing beers, asking questions, and sharing what they find out. There is so much brewing lore, but how much of it is just passed on from master to Padawan? Again, what I know is I can make some really good beer in my backyard and that is AWESOME!!

Peace Out :ban: :mug:
 
(Edit: I keep using the word "prove" and I *know* it's grating on AJ lol... I should stop saying prove.)

No, not at all, but you should be, as you are, aware that statistical analysis does not, in fact, prove anything but rather lets us quantify our uncertainty, which can be helpful (and profitable) when it comes to making decisions.

Suppose we have a coin and want to test its fairness (before entering into a game of two-up). We toss it 6 times and get 6 heads. Does this prove the coin is not fair? No, it doesn't, but as the probability of getting 6 heads in a row from a fair coin is 2^-6 ≈ 0.016, we can say we are confident that the coin is not fair at the 0.016 level. That's below the standard minimum accepted significance level, and so we may be tempted to declare that the coin is not fair even though there is still some chance that it is, to the point that we might want to ask the boxer (the guy that runs the game) to substitute a different coin. At a level of 0.016 we are not very confident that the coin is fair, to the point that we might want to take action. Would we be more confident if we got 10 heads in a row? Numerically, yes, as 10 heads in a row supports the thesis that the coin is unfair at the 0.001 level. We would be more likely to demand a new coin. Is 0.001 confident enough? Only you can decide.
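In code (a sketch using scipy's exact binomial test):
Code:
from scipy.stats import binomtest

# one-sided tests: how surprising are all-heads runs from a fair coin?
print(binomtest(6, 6, 0.5, alternative='greater').pvalue)    # 0.015625
print(binomtest(10, 10, 0.5, alternative='greater').pvalue)  # ~0.001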

... Brulosophy might not be scientific enough, but it still gives us insight into how effective a process might be.

But the third camp is incorrect. If it's not enough of a sample to draw any real conclusion from it, then the conclusion you draw has no basis.
Would that it were that simple! What is a real conclusion? Or, in terms of the previous remarks, what level of confidence is required to make a conclusion "real"? If real means the conclusion is proven, then there are no real conclusions, and that is, of course, always the case. So why do we bother? Because we gain information. Before testing we have no idea as to whether a process change effects a perceptible change. In the last Brulosophy experiment, the data support the idea that, using 8 correct answers as a detection threshold, a difference would be noted by 46% of panels even if we gave them the same beer. That's not very good support for the notion that the beers are differentiable. But the data also tell us that, using that same threshold (i.e. what the experiment measured), only 12% of panels would find them the same if we gave them samples of both beers. This is pretty good support for the notion that the beers are not differentiable.


And as mentioned, most of them (that I've read) come to the conclusion that what they're testing doesn't matter.
That's because they don't know how to interpret the data they have obtained nor how to pick a panel size. In many cases it is true - the data are not significant with respect to Type I or Type II errors. In several other cases, however, they are significant with respect to one or the other of those two error types. QED.
 
So until people who complain about their testing not being good enough start to donate some serious cash to roll these exbeeriments
Get me an estimate of how much additional cash is required to present AAB, ABA, BAA, BBA, BAB, and ABB in cups of the same color and tell me where to send the check.

...and report on them in ways which fill their needs,
The additional cost of that is the cost of obtaining a copy of the standard, which is $0 as it's online for free. Tell me where to send that [i.e. the link] too. Now, of course, I am assuming that the additional labor required to properly present the samples and collect and analyze the data is borne by interested volunteers. I won't offer to pay salaries. The data crunching can be done by the simple spreadsheet I posted way back in this thread. Incremental cost: $0.

I am not volunteering to build a sensory laboratory with the requisite lighting, temperature, air quality control and isolation booths. They would have to come up with some way of better isolating the tasters from one another than the posted pictures show.

.. maybe just take it for what it is and appreciate that there is some information there.
There is. Let's hope that they are willing to learn enough about this kind of testing to be able to find it all.
 
maybe just take it for what it is and appreciate that there is some information there.

BINGO! Marshall and I have spent a lot of time talking about this. This is what we both intend.


This really goes to the heart of this thread... is the information on Brulosophy to be taken seriously at all?

If the author(s) only intend on the information being there essentially for entertainment purposes, then that settles it. It's not meant to be a truly rigorous source and one should not cite a Brulosophy experiment as "proof" of anything. Read it, ponder it, create a meta analysis from it if you are so inclined, perhaps even duplicate an experiment for yourself. It is what it is and nothing more.
 
If the author(s) only intend on the information being there essentially for entertainment purposes, then that settles it.
Even so, there may be, and evidently is, usable information contained in the data. If their procedures are deemed consistent enough, then one can do meta (and other) analysis.
 
It seems like the thread started with two camps: Brulosophy is awesome! OR Brulosophy is not scientific enough. Now a third camp is emerging: Brulosophy might not be scientific enough, but it still gives us insight into how effective a process might be.

But the third camp is incorrect. If it's not enough of a sample to draw any real conclusion from it, then the conclusion you draw has no basis. And as mentioned, most of them (that I've read) come to the conclusion that what they're testing doesn't matter. And often we know that it does matter, we just don't have any more rigorous experiment to confirm it.

That's the rub. All they do is perform an experiment and offer a [very] limited amount of statistical analysis of the experiment. It's up to the reader to actually draw a conclusion [or not].

p>0.05 doesn't mean the variable being tested "doesn't matter". Nor does p<0.05 mean that the variable being tested "matters". All the p-value does is give us a rough confidence level about whether (and how much) it matters, based upon the experiment design and the tasting sensitivity of the panel.

Beyond that, you need to draw the conclusion for yourself.
 
That's the rub. All they do is perform an experiment and offer a [very] limited amount of statistical analysis of the experiment. It's up to the reader to actually draw a conclusion [or not].

p>0.05 doesn't mean the variable being tested "doesn't matter". Nor does p<0.05 mean that the variable being tested "matters". All the p-value does is give us a rough confidence level about whether (and how much) it matters, based upon the experiment design and the tasting sensitivity of the panel.

Beyond that, you need to draw the conclusion for yourself.

Unfortunately, I'll make the assumption that the majority of brulosophy readers think that whether p falls above or below 0.05 is all that matters for "true" or "false."
 
I wonder how some of you were able to watch the show Mythbusters. Too many variables left untested - how could they POSSIBLY attract a television audience!!!
 
I wonder how some of you were able to watch the show Mythbusters. Too many variables left untested - how could they POSSIBLY attract a television audience!!!

A great show! Warning: Scientific Content!! They made me laugh and think. I still use the five-second rule and double-dipping results: "they're busted". Love it :ban::mug:
 