Do "professional" brewers consider brulosophy to be a load of bs?

I've explained it as clearly as I can, and if you can't see it then I would, depending on your level of interest, suggest pushing some numbers around or even doing a Monte Carlo or two if you are so inclined. The main disconnect here is that you are arguing that a statistical test, even though more powerful than another, is less valid than the other. That could only matter if it led us to take the wrong action, which in this case would mean asking our hypothetical brewer who uses the more powerful test to decide against the valine-rich malt even though it does improve the beer (defining "improve" as a reduction in diacetyl). I don't see how that could possibly happen.

We're going to have to agree to disagree about all this. Partly you want these tests to be about specific elements of the beer (diacetyl, e.g.), whereas I'm looking for areas that potentially confound the results.

And allowing the guessers to be part of preference trials may be necessary for the statistical elements of tests to be met, but it is the antithesis of good measurement to include tasters who can't tell a difference. They've already indicated they have no preference since they can't even tell them apart. Then a forced choice introduces noise into the data, which makes no sense at all.

Frankly, I think some of this testing is an attempt to hide, under the veneer of "scientific" statistics, the fact that the tasting panels are suspect. I'd be much more inclined, if I wanted to do a preference test, to just ask people which they preferred, and bag the triangle test. Randomly assign which beer was tasted first, then see what you get.

If I can ever figure out how to do one of these exbeeriments at a level which will satisfy my desire to make it well-controlled, I'll do some of this. Problem is, I have a single 5-gallon system. Can't split a 10-gallon batch which is what I'd really like to do. I'd do one of these "ferment warm versus ferment cool" exbeeriments and then see what we can see. And I'd put some rules on tasters as to the conditions under which they can taste the beers. Not after a lot of other beers, not after eating a garlic-and-onion burger, like that. Repeated triangle tests, something that would validate their inclusion via ability to repeat their results.

Maybe this is all just a justification for an upgrade to my system? :) :)
 
Lemme put it this way: you can have taproom staff and your regular customers (none of whom are trained tasters) notice, consistently enough to be a pattern, smaller tweaks to beers (before they're aware the tweaks are there) than the variables Brulosophy experiments test.

Maybe that's scale, maybe not. So, grain of salt. That doesn't make what they're doing invalid; it just means you need to see the limitations of one group doing unreplicated experiments with small samples of partially unknown composition in unknown settings. If you're blindly trusting it, as some here seem to, you're a fool.
 
Most seriously: their experiments can not be replicated reliably by others. This is a key element of the scientific process.

I'd be quite interested in hearing about this. Who has failed to replicate their results?

Anecdotally, I tried Marshall's quick ale fermentation schedule and am not terribly pleased with the result. I might change my mind once the beer is done dryhopping, but I get a weird ester taste which I _think_ is a temp-related fruit flavor, which I didn't expect from fermenting at 64 degrees with US-05.
 
Perfect. Because I dry hopped a beer for 2 days and it is the best IPA I have made so far, but when I told some "pros" they were giving me crap while I just think the proof is in the pudding. Had I not told them I dry hopped for 2 days vs a week or whatever I think they would have had a completely different opinion on the beer and commended me for the brew :p :tank:

Interesting. I am in a brewing school and when Matt Brynildson (Brewmaster at Firestone Walker) presented his Hops lectures, he did show studies that indicate after 48 hours you really get nothing of value from dry hopping. Your 2 days makes perfect sense.

When Phil Leinhart (10 years at AB and Brewmaster at Ommegang) presented Wort boiling and Cooling, he very specifically covered Cold Break as part of that.

                Hot Trub      Cold Trub
Proteins        50-60%        50-70%
Tannins         20-30%        20-30%
Hop Resins      15-20%        6-12%
Ash             2-3%          2-3%
Particle Size   20-80 μm      0.5-1.0 μm

So pros do pay attention to ongoing studies, research, etc. You have to understand that commercial brewers need to look beyond whether the beer tastes good through the life of a fresh 5-gallon batch... their beer needs to maintain quality for several months at least. They are concerned with hot-side aeration, dissolved oxygen post fermentation, dissolved oxygen in packaging, beer staling precursors, storage temperatures, etc., and most of all CONSISTENCY in a product line. To change a recipe or process is a HUGE thing and is not done without considerable thought, research, and sensory analysis.
 
Lemme put it this way: you can have taproom staff and your regular customers (none of whom are trained tasters) notice, consistently enough to be a pattern, smaller tweaks to beers (before they're aware the tweaks are there) than the variables Brulosophy experiments test.

Maybe that's scale, maybe not. So, grain of salt. That doesn't make what they're doing invalid; it just means you need to see the limitations of one group doing unreplicated experiments with small samples of partially unknown composition in unknown settings. If you're blindly trusting it, as some here seem to, you're a fool.

I think you are the fool... for calling others fools in what has been a decent conversation.
 
So pros do pay attention to ongoing studies, research, etc.


SOME do. In my city, we have a lot of breweries, but only a handful are actually making good beer. I've talked to brewers at some of the bad breweries and they've brashly admitted they don't pay attention to the industry or any developments.

A few of these places have closed since I spoke to them. We're not talking about breweries who need to bottle each batch the same, we're talking about breweries who don't distribute, other than growlers, and can afford to take some chances and make things a little differently batch to batch.
 
Ugh.....

We are ALL fools.

So are we all chumps or are we all fools? Ugh to you; what great additions you've made to this conversation. You insulted everybody famous and now you're insulting all of us. Welcome to my ignore list, and have a nice life, chump.
 
Interesting. I am in a brewing school and when Matt Brynildson (Brewmaster at Firestone Walker) presented his Hops lectures, he did show studies that indicate after 48 hours you really get nothing of value from dry hopping. Your 2 days makes perfect sense.

Yeah, I believe the brewing folks at Oregon State University did some research on this a few years back. It definitely changed my process on dry-hopping. I rarely dry-hop longer than 3-4 days, then cold crash. Given that the hop presence is one of the quickest things in a beer to fade, leaving it on the hops for up to 14 days seems like it would have allowed the hop character to fade unnecessarily if all the oil extraction happens in 48ish hours.

I wonder how this applies to a somewhat standard homebrew process of dry hopping in the keg. I've heard a lot of brewers do this and anecdotally they say that it preserves hop character throughout the life of the brew. But if all the oils are extracted in 48 hours, it seems strange that it would continue contributing to flavor over this time frame...
 
We're going to have to agree to disagree about all this.
I'm not really sure there is that much disagreement.

Partly you want these tests to be about specific elements of the beer (diacetyl, e.g.),
Actually I want it to be about whatever the investigator is interested in. If H1 is "Valine-rich malt decreases diacetyl" then the focus is on diacetyl. If H1 is "Valine-rich malt improves customer acceptance in the demographic represented by this panel" then the focus is on preference. As I've said several times before, these are different tests with regard to the panel selection but not the beer. In the diacetyl case you want a panel sensitive to diacetyl, which you verify by doing the test with diacetyl-spiked samples. You can't do that with a preference-testing panel, but you can take steps to ensure that the panel is representative of the demographic you are interested in.

...whereas I'm looking for areas that potentially confound the results.
They abound, and that was the point of my original post in this thread. If the beers differ detectably in any attribute other than the one we are interested in, the triangle part of the two-stage test is immediately invalidated. The example I have used before is color. If use of the valine-rich malt changes the color in addition to the diacetyl, and the panelists can see the color, the test is invalid. The probability under H0 comes out small and the investigator is lulled into rejecting it because of something that has nothing to do with the parameter he is interested in, whether the question is about perceived diacetyl or preference. This is why my original and several follow-on posts emphasized that the investigators have to be very thoughtful about the design and conduct of the tests, and why I suggested that if there is a flaw in Brulosopher's approach it might well lie in this area.

And allowing the guessers to be part of preference trials may be necessary for the statistical elements of tests to be met, but it is the antithesis of good measurement to include tasters who can't tell a difference.
That depends on what the investigator is interested in. If he wants to know about diacetyl he shouldn't empanel a group that doesn't have demonstrated sensitivity to diacetyl (as demonstrated by triangle tests with diacetyl spiked beers). But in the preference case (H0: "Valine rich malt does not improve customer acceptance in the demographic represented by this panel") we want the panel to include people who can't tell the difference if we are interested in selling beer to a demographic which includes people who can't tell the difference. For such a panel it is possible that H0 may be true. If asked about preference a diacetyl sensitive panel would probably enthusiastically endorse the beer made with the valine enhanced wort causing the investigator to reject H0 and, given that he is interested in a market that has a decent proportion of people who can't tell the difference, thereby commit a type I error.

They've already indicated they have no preference since they can't even tell them apart. Then a forced choice introduces noise into the data, which makes no sense at all.
As noted above, sometimes it does. Type I errors can be as damaging as Type II. It appears that rejecting H0 when we shouldn't (Type I) is the threat in preference (subjective) investigations, whereas Type II errors (failing to reject H0 when we should) are the threat in tests where an objective answer (e.g. more or less diacetyl) is sought. In those cases guessers do introduce noise, but as noted in my last post we could easily reduce that noise by using quadruplets rather than triplets. The fact that this is not done indicates (to me, anyway) that the amount of noise injected in a triplet test is not problematic, or at least that the reduction from going to a quadruplet test is not justified by the extra difficulty of manipulating quadruplets.


Frankly, I think some of this testing is an attempt to hide, under the veneer of "scientific" statistics, the fact that the tasting panels are suspect.
My impression, as an engineer, is that people in fields like biology, medicine, the social sciences, finance and many others go to college and are given a tool kit of statistical tests which they then apply in their careers, eventually coming to a pass where they are plugging numbers into some software package without remembering what they learned years ago in college and thus not fully understanding what the results they get mean. Engineers do this too, BTW.

In homebrewing I think you find people scratching their heads over what the data from their homebrewing experiments mean; then they discover an ASBC MOA into which they can plug their numbers and get a determination as to whether they are 'statistically significant' or not, without having a real idea of what that means. This is kind of tricky stuff. If I go away from it for even a short period of time I have to sit down and rethink the basic concepts. Maybe it's just that I am not intrinsically good at statistics or don't have much experience with it, but as the discussion here shows, there are many subtle nuances in how experiments are conducted and how the data are analyzed. As engineers say, "If all the statisticians in the world were laid end to end they wouldn't reach a conclusion." It's supposed to be a joke but it is true because of the fundamental nature of statistics: it is a guessing game. That's why a statement
And to a guy in my field, "guessing" is the antithesis of reliability. Without reliability you cannot have validity--and it's very hard for me to see either here.
from a statistician kind of surprises me. Everything we observe is corrupted by noise. We cannot measure voltage with a voltmeter. We can only obtain from it an estimate of what the voltage is and must recognize that the reading is the true voltage plus some error. Statistics is the art of trying to control or at least quantify that error so that the guesses we are ultimately forced to report represent the truth at least fairly well. Well, that's my engineer's perspective on it.

I'd be much more inclined, if I wanted to do a preference test, to just ask people which they preferred, and bag the triangle test. Randomly assign which beer was tasted first, then see what you get.
Interesting that you say that, as just this morning I came up with a test. A number of participants are each presented with n objects, one of which is different from the others. The instructions to the participants are:

"You will be given a number of objects and a die. One of the objects is different from the other. Identify it. If you cannot use the die to randomly pick one of the objects. Separate the object you picked and one other object from the group. Now choose which of these two objects you prefer. If you cannot decide on one or the other use the die again (or a coin) to randomly select one."

Thus the test you propose is the first part of my test with n = 2, and the triangle test is the first part of my test with n = 3. The following sets of numbers show the probabilities, under the null hypothesis, that 10 out of 20 testers will choose the different object correctly AND that 5 of those 10 will prefer one or the other.
TR(20,10,5,1/n,1/2) means the panel size is 20, 10 correctly pick the odd object, 5 of those 10 prefer one or the other, the probability of picking correctly under H0 is 1/n, and the probability of preferring is 1/2. The first number in each row is the probability under the null hypothesis for the pick-the-odd-one part of the test alone, and the second is the same probability for the two-part test.

n = 2   TR(20,10,5,1/2,1/2)   0.588099     0.268503
n = 3   TR(20,10,5,1/3,1/2)   0.0918958    0.0507204
n = 4   TR(20,10,5,1/4,1/2)   0.0138644    0.00802728
n = 5   TR(20,10,5,1/5,1/2)   0.00259483   0.00153428

These numbers clearly show that a triangle test is a more powerful test than a pick-one-of-two test (which is why triangle tests are performed rather than pick-one-of-two tests) and that a quadrangle test is more powerful than a triangle test. They also show that the two-part test is more powerful than the triangle or quadrangle by itself, in that it increases one's confidence in what the data show. This is all, of course, under the assumption that the investigator does not step into one of the many potential pitfalls we have discussed.
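
For anyone who wants to check the first column of those numbers, here is a minimal sketch (Python, assuming the first-stage figure is just the binomial tail probability of 10 or more correct picks out of 20 under H0; the two-part figures depend on exactly how the preference stage is combined with the first stage, so I haven't tried to reproduce them here):

```python
from math import comb

def tail_prob(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance that k or more of n
    panelists pick the odd sample when each is right with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# First-stage probabilities under H0 for 10 of 20 correct picks,
# for the 2-, 3-, 4- and 5-object versions of the test
for n_objects in (2, 3, 4, 5):
    print(n_objects, tail_prob(20, 10, 1 / n_objects))
# -> 0.588099..., 0.0918958..., 0.0138644..., 0.0025948...
```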

If I can ever figure out how to do one of these exbeeriments at a level which will satisfy my desire to make it well-controlled, I'll do some of this.
I don't think you will ever be able to do enough experiments to get you past your conception that allowing guesses is a detriment. I think Monte Carlo is a much more promising approach.
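
If anyone wants to try the Monte Carlo route, here is a rough sketch of the sort of simulation AJ is suggesting; the panel mix (8 perfect discriminators plus 12 pure guessers) and the 11-of-20 cutoff are illustrative assumptions on my part, not anything Brulosophy actually uses:

```python
import random

def simulate_triangle(trials=100_000, panel=20, discriminators=8, critical=11):
    """Monte Carlo of a triangle test: 'discriminators' always spot the odd
    beer (an idealization), the rest guess with a 1/3 chance of being right.
    Returns the fraction of simulated panels reaching 'critical' correct picks
    (11 of 20 is the usual 5% cutoff for a triangle test)."""
    hits = 0
    for _ in range(trials):
        correct = discriminators + sum(
            random.random() < 1 / 3 for _ in range(panel - discriminators)
        )
        if correct >= critical:
            hits += 1
    return hits / trials

# How often does a panel of 8 true discriminators plus 12 guessers
# produce a "significant" triangle result?
print(simulate_triangle())
```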

Maybe this is all just a justification for an upgrade to my system? :) :)
If H0 is "You shouldn't upgrade your system" the level of support for that is p << 1.
 
Interesting. I am in a brewing school and when Matt Brynildson (Brewmaster at Firestone Walker) presented his Hops lectures, he did show studies that indicate after 48 hours you really get nothing of value from dry hopping. Your 2 days makes perfect sense.

When Phil Leinhart (10 years at AB and Brewmaster at Ommegang) presented Wort boiling and Cooling, he very specifically covered Cold Break as part of that.

                Hot Trub      Cold Trub
Proteins        50-60%        50-70%
Tannins         20-30%        20-30%
Hop Resins      15-20%        6-12%
Ash             2-3%          2-3%
Particle Size   20-80 μm      0.5-1.0 μm

So pros do pay attention to ongoing studies, research, etc. You have to understand that commercial brewers need to look beyond whether the beer tastes good through the life of a fresh 5-gallon batch... their beer needs to maintain quality for several months at least. They are concerned with hot-side aeration, dissolved oxygen post fermentation, dissolved oxygen in packaging, beer staling precursors, storage temperatures, etc., and most of all CONSISTENCY in a product line. To change a recipe or process is a HUGE thing and is not done without considerable thought, research, and sensory analysis.

Excellent, agree very much with you and I think you shed some light on some of the differences between pros and homebrewers.
 
Our sensory instructor pointed out that the earlier your beer is tasted in a trial, the better your score is likely to be....order matters.
 
So are we all chumps or are we all fools? Ugh to you; what great additions you've made to this conversation. You insulted everybody famous and now you're insulting all of us. Welcome to my ignore list, and have a nice life, chump.

I added this a long time ago but nobody responded:

if you were to lower the significance expectation to perhaps just 80-85% confidence to match the scienciness of these experiments, then suddenly a lot more of their experiments indicate something *might* actually be going on with their chosen variables than their conclusions currently indicate. In other words, where they come close to "significance" and just miss it by a little, there might actually be something happening worth further exploration. THOSE then are the xbmts that should be revisited in my view... And by independent teams, yadda yadda.

Getting back on topic... I'm sure most of the pros couldn't care less about Brulosophy. Why would they? They make beer and sell it. Even if it really sucks, they sell it to the masses just fine, no problem and no need for improvement. Baa baa.

Chump. :D
 
A perhaps more intuitive explanation as to why a triangle (or quadrangle) test is better than a binary test:

Suppose you have a panel of 60 guys, 30 of whom are BJCP masters and 30 of whom can't tell the difference between Bud Light and Pilsner Urquell ("All yellow fizzy beers to me," as a colleague used to say). In the binary test every one of them gets to vote on whether one beer is better than the other, so the panel has 60 members of whom half (50%) are unqualified. In a triangle test, on average 2/3 of the unqualified (20 of the 30) will be eliminated, so the panel now consists of 40 people, 10 of whom are unqualified and 30 qualified, and the percentage of qualified voters is now 75%.
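
A quick expected-value check of that arithmetic (the 30/30 split is from the example above; treating every qualified taster as passing the screen is of course an idealization):

```python
def surviving_mix(qualified, guessers, n_objects):
    """Expected make-up of the preference panel after a pick-the-odd-one
    screen: qualified tasters all pass, guessers pass with probability 1/n."""
    passing_guessers = guessers / n_objects
    share_qualified = 100 * qualified / (qualified + passing_guessers)
    return passing_guessers, share_qualified

print(surviving_mix(30, 30, 3))  # triangle:   (10.0, 75.0) -> 75% qualified
print(surviving_mix(30, 30, 4))  # quadrangle: (7.5, 80.0)  -> 80% qualified
```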
 
Our sensory instructor pointed out that the earlier your beer is tasted in a trial, the better your score is likely to be....order matters.

It's been awhile since I reviewed their methodology, but does Brulosophy keep the order of the triangle tests the same for each taster, or do they randomly assign order?

That would be interesting to look at in repeated tests....can those who correctly identified (or guessed right) the odd beer out repeat that result if the order of samples is different?
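
Randomizing the presentation isn't hard to do, for what it's worth. Here is a hypothetical sketch of how a home experimenter could randomize both which beer is the odd one out and the serving order for each taster (this is not a claim about Brulosophy's actual procedure):

```python
import random

def triangle_serving(beer_a="A", beer_b="B"):
    """One randomized triangle presentation: randomly choose which beer is
    the odd one out, then shuffle the serving order of the three glasses."""
    odd, pair = random.sample([beer_a, beer_b], 2)
    glasses = [pair, pair, odd]
    random.shuffle(glasses)
    return glasses, odd

# One independently randomized serving order per taster
for taster in range(5):
    print(taster, triangle_serving())
```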
 
It's been awhile since I reviewed their methodology, but does Brulosophy keep the order of the triangle tests the same for each taster, or do they randomly assign order?

That would be interesting to look at in repeated tests....can those who correctly identified (or guessed right) the odd beer out repeat that result if the order of samples is different?

Good point.
 
Aj and mongoose, awesome work, I hope to understand your discussions better some day.
 
Yeah, I believe the brewing folks at Oregon State University did some research on this a few years back. It definitely changed my process on dry-hopping. I rarely dry-hop longer than 3-4 days, then cold crash. Given that the hop presence is one of the quickest things in a beer to fade, leaving it on the hops for up to 14 days seems like it would have allowed the hop character to fade unnecessarily if all the oil extraction happens in 48ish hours.

I wonder how this applies to a somewhat standard homebrew process of dry hopping in the keg. I've heard a lot of brewers do this and anecdotally they say that it preserves hop character throughout the life of the brew. But if all the oils are extracted in 48 hours, it seems strange that it would continue contributing to flavor over this time frame...

I've had to be cautious dry hopping. Most of the time 3-4 days. However, some beers have an issue with renewed fermentation after dry hopping (research is shedding light on it; Cascade seems to be one of the hops that does it, among others), and if we crash out prematurely that can cause VDK to pop up down the line. Not as big an issue in the taproom, where it can be monitored, but we can't have kegs go out to distro like that. And obviously I'd rather not have it at all.

So yes. Paying attention to research.
 
Fascinating discussion of the statistics aspect by ajdelange and mongoose - I appreciate all this. But my own "testing" consists of making a specific change from what I did on a similar brew a couple of months ago and noting which one I liked better. Regardless of the statistical validity, the Brulosophy method is many orders of magnitude better than my "tests", so I pay close attention.
 
I'll comment:
Where Brulosophy goes a little too far in my view is expecting 95% confidence before they'll declare a result statistically significant. Come on... That's overkill.
That is really up to you to decide. It is incumbent on the investigator to report what he measured and what those data imply in terms of the null hypothesis, but it is up to you to decide whether you wish to reject it based on what was measured. People who do this for a living usually don't accept p > 5%, but if you are convinced by p = .15 or .20 then that's OK. It's just that most people won't agree with you.

The Brulosopy team and chump tasters ain't scientists.
It is also up to you to decide whether the null hypothesis tested against is valid. Remember that it is "This panel can't tell the difference between these beers". If you think the panel is biased or unqualified, or that the difference has been telegraphed to it, or... through improper procedure or panel selection, then you don't really care what the confidence level is because, in your opinion, the test is invalid.

But if you were to lower the significance expectation to perhaps just 80-85% confidence to match the scienciness of these experiments, then suddenly a lot more of their experiments indicate something *might* actually be going on with their chosen variables than their conclusions currently indicate. In other words, where they come close to "significance" and just miss it by a little, there might actually be something happening worth further exploration.
Take out "to match the scienciness of these experiments" and you are spot on. That is exactly what a close to significance level tells you. That something might be going on but you didn't measure it to statistical significance because your panel wasn't big enough. Before one can draw even that conclusion, however, he must be convinced that the methodology was sound. Marginally high confidence in the data from a flawed experiment is a useless as high confidence in the data from a flawed experiment.
 
AJ, thanks for the thoughtful response.

My opinions and further insights:

Personally I am pretty confident that Marshall & Co. are good brewers -- maybe even very good. While I've never tasted their beers, I also have no reason to doubt their experience and mastery of their processes, and they sure seem to keep better control of things than I would, plus I know they've won awards in competitions and are generally well respected by all who know them. Plus, I have interacted with Marshall several times and he seems to me to be a very level-headed and like-minded individual.

What I question personally is whether we should place much validity in their results when they expect John Q. Randomguy, who might know little or nothing about beer, to be able to detect differences between two beers at a 95% confidence level. In my view, if we take a looser approach and only expect John Q. and the various experienced tasters to detect a difference an average of maybe 80% of the time, with the ultimate goal being "MAYBE, JUST MAYBE, there is something going on here" rather than "yea, verily, this experiment has 95% confidence that there seems to be a difference," then that lower bar is easier to meet, or to "qualify" for further experimentation, instead of rehashing the same old "nope, we didn't achieve 'statistical significance' yet again." Statistically, if they only expect to be right about 80% of the time instead of 95%, the results reported should prove more interesting, at least in my own chumpy eyes.

Shameless admission: I actually review closely the p value achieved for each and every Brulosophy xbmt with a goal of about 0.15-0.20 as explained above so that I can find gold dust beneath the lack of gold nuggets from the conclusions that they often present to us. The data is there; I just have to interpret it myself.

I believe you and I are singing somewhat the same tune, just that you're more like Luciano Pavarotti and I'm more like the chump humming along and bopping his head with the tempo.

Cheers again all.
 
So are we all chumps or are we all fools? Ugh to you; what great additions you've made to this conversation. You insulted everybody famous and now you're insulting all of us. Welcome to my ignore list, and have a nice life, chump.


Simmer down or the mods will close the thread! :)
 
Interesting. I am in a brewing school and when Matt Brynildson (Brewmaster at Firestone Walker) presented his Hops lectures, he did show studies that indicate after 48 hours you really get nothing of value from dry hopping. Your 2 days makes perfect sense.


2 days of dry hopping has no effect. But it was the best IPA he's made. Therefore, less hop contribution is better.

Evidence that IPAs just aren't good?
 
Makes me sad that I only took one stats class in college. Who knew it would be so useful! :p

never got into stats, more of a probability man myself...

'draw five cards from a shuffled deck, what is the probability that two are red and one is a seven?'

factorials, ftw!:ban:
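
Since the puzzle as worded is ambiguous (does a red seven count toward both tallies?), here is a quick Monte Carlo of one reading of it, exactly two red cards and exactly one seven among the five drawn:

```python
import random

SUITS = ["hearts", "diamonds", "spades", "clubs"]  # hearts and diamonds are red
DECK = [(rank, suit) for rank in range(1, 14) for suit in SUITS]

def trial():
    hand = random.sample(DECK, 5)
    reds = sum(suit in ("hearts", "diamonds") for _, suit in hand)
    sevens = sum(rank == 7 for rank, _ in hand)
    return reds == 2 and sevens == 1

trials = 200_000
print(sum(trial() for _ in range(trials)) / trials)
```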
 
I'm not really sure there is that much disagreement.

About some of it, no. But about qualifying for panels, absolutely.

I've never had a problem including guessers in the triangle test statistics. That's what the test evaluates, whether the numbers are greater than what one would expect by pure guessing.

I don't think you will ever be able to do enough experiments to get you past your conception that allowing guesses is a detriment. I think Monte Carlo is a much more promising approach.

Here's why I do think that, and it's not a statistical reason, it's a measurement reason. It flies in the face of common sense that one would use, in a test of preference, people who demonstrably cannot make a preference decision.
They're guessing!

You want people who demonstrably can make that distinction doing such preference evaluation. There are other ways, better ways, to find such people, including seeing if people who choose the odd one out can do so repeatedly before moving them on to the preference test.

It's simply a reliability issue, nothing more.
 
Before one of the mods terminates this thread, I want to commend the OP for constructing one of the best loaded titles I've seen on HBT :mug:
Seriously, that's art right there ;)
 