Brulosophy expert can't tell a Blonde Ale made with straight RO from RO plus minerals

Whatever people think of these "experiments", one can't hide from the fact that most people, experienced beer drinkers or not, usually can't tell the difference between beer A and beer B.

Except that fact is not a fact at all. Here is a rather typical example:

"While 15 tasters (p<0.05) would have had to identify the unique sample in order to reach statistical significance, only 13 (p=0.10) made the accurate selection, indicating participants in this xBmt could not reliably distinguish an IPA made with a 7.4 oz/209.4 g dry hop charge from one made with an 11 oz/311.8 g dry hop charge."

Note that 13 tasters accurately selected the different sample, and that p=0.10.
This means that if there were no detectable difference between the beers, there was only a 10% chance that 13 or more tasters would get it right. But they did. To translate that into "indicating participants in this xBmt could not reliably distinguish" is misleading as hell. Anyone reading those words and not thoroughly familiar with triangle testing and p values would be very likely to think "the results show that there's no difference."

More reading: http://sonsofalchemy.org/wp-content/uploads/2020/05/Understanding_Brulosophy_2.pdf
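
For anyone who wants to check the arithmetic themselves, here's a minimal Python sketch of the exact binomial math behind a triangle test. The panel size isn't stated in the quote, so the 28 tasters below is my assumption; I picked it because it's the panel size that reproduces both quoted numbers (15 correct needed for p<0.05, 13 correct giving p≈0.10).

```python
# A minimal sketch, not from the article: exact binomial tail for a
# triangle test, where a blind guess is right 1/3 of the time.
from math import comb

def p_at_least(correct, n, p_chance=1/3):
    """P(X >= correct) if all n tasters were guessing blindly."""
    return sum(comb(n, k) * p_chance**k * (1 - p_chance)**(n - k)
               for k in range(correct, n + 1))

n = 28  # assumed panel size (not stated in the quote)
print(round(p_at_least(13, n), 3))  # 0.104 -> the quoted p = 0.10
print(round(p_at_least(15, n), 3))  # 0.022 -> significant at 0.05
print(round(p_at_least(14, n), 3))  # 0.050 -> just misses the cutoff
print(round(n / 3, 1))              # 9.3 correct expected by pure chance
```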
 
Nobody here has been more critical of Brulosophy's test methodology than me. But that said, I've long thought the experimental methods were actually rather good. They take pains to try to brew or adjust two batches keeping everything the same except the variable of interest, then see if there are any discernible differences. The whole idea is to remove all other explanations for the difference except one.

The testing part, not the brewing part, is where I've always had problems. No controls over who the population is, what they ate or drank beforehand.....the testing is just not very robust, and it introduces an explanation for differences besides that of the test samples.

But in the end, what Marshall does has no effect on my life. My beer tastes great; my family still loves me; he doesn't have access to my bank account. He's very straightforward in what he's doing, and the critical-thinking consumer of information can choose to accept it or not.

**********

I met Marshall at the BREW Boot Camp in March 2019. Nice guy--the sort with whom you'd like to sit down and have a beer. Not pretentious, just doing something in the beer area.

I'd brought a few bottles of my Darth Lager along for feedback from people. We were sitting around a coffee table in the lounge area and he had a half-glass of Darth. A few minutes passed and he said "Hey, let me have some more of that." But then, it was just him, and no comparisons. :) :)

**********

In the end, I've always thought this about Brulosophy: at least they're doing something. Is he making money from this? Sure. A lot? Nah. And nothing he does requires any money from me.

Very few of his detractors here--including me--are doing carefully-controlled experiments and then reporting on the results. At least, as Teddy would have said, he's "in the arena."
 
Except that fact is not a fact at all. Here is a rather typical example:

"While 15 tasters (p<0.05) would have had to identify the unique sample in order to reach statistical significance, only 13 (p=0.10) made the accurate selection, indicating participants in this xBmt could not reliably distinguish an IPA made with a 7.4 oz/209.4 g dry hop charge from one made with an 11 oz/311.8 g dry hop charge."

Note that 13 tasters accurately selected the different sample, and that p=0.10.
This means that if there were no detectable difference between the beers, there was only a 10% chance that 13 or more tasters would get it right. But they did. To translate that into "indicating participants in this xBmt could not reliably distinguish" is misleading as hell. Anyone reading those words and not thoroughly familiar with triangle testing and p values would be very likely to think "the results show that there's no difference."

More reading: http://sonsofalchemy.org/wp-content/uploads/2020/05/Understanding_Brulosophy_2.pdf

Not sure what your point was here. Statistical significance is simply an objective way of deciding whether to reject the null hypothesis (in this case, no difference) or not. There's nothing holy about the 5% level, or 1% level. They're just standards that have become....well, standard.

I would agree that translating a non-significant result into "participants....could not reliably distinguish" is misleading. Reliability is consistency of measurement, so I take issue with the use of that word; and in the end, it's the panel being measured, not the individuals. If an individual, in repeated triangle tests, could not consistently pick the odd one out, then that individual could not reliably do so--but a single triangle test says nothing about consistency.

And then there's the panel thing. Who constitutes it? To what population of beer drinkers does it generalize? People who drink to get drunk? People who like trying different things? People who love to deconstruct a beer into its various flavor components? Nobody knows. So if the panel can or cannot "reliably" distinguish between the beers, what does that tell us about how beer drinkers in general perceive them? I don't know.

And then you get to the individuals involved. What were they eating or drinking just prior to the triangle test? A couple of pints of very bitter IPA that fried their taste buds? Heavily spiced food?

So in the end, I don't know if an individual can't tell the difference because A) what they ate/drank killed their taste buds, or B) they don't have a very sensitive palate generally, or C) there really is no difference between the two samples.
 
Not sure what your point was here. Statistical significance is simply an objective way of deciding whether to reject the null hypothesis (in this case, no difference) or not. There's nothing holy about the 5% level, or 1% level. They're just standards that have become....well, standard.

My point here is that average readers don't understand significance, p value numbers, or even the basic fact that in a triangle test a blind guess is expected to be correct 1/3 of the time, and not half. So how could they let people know how to really think about the results? Well, they could append these words (with the appropriate numbers) to every test where p was >= .05...

"It should be noted that if there were no difference, on average we would expect 9 or 10 of the tasters to make the correct selection. But 13 (or more) did, which if there were no difference, was only 10% likely to happen."

But then, you wouldn't have well intentioned people posting things like "one can't hide from the fact that most people, experienced beer drinkers or not, usually can't tell the difference between beer A and beer B." And you wouldn't have people dropping or changing important parts of their process after being misled.
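
To make that concrete, here's a minimal sketch (my own, not Brulosophy's) of how such a caveat could be generated automatically from the raw numbers. The panel size of 28 is again my assumption.

```python
# A sketch of auto-generating the suggested caveat from the raw numbers.
from scipy.stats import binom

def plain_language_caveat(correct, n, p_chance=1/3):
    expected = n * p_chance
    p = binom.sf(correct - 1, n, p_chance)  # P(X >= correct) under "no difference"
    return (f"It should be noted that if there were no difference, on average "
            f"we would expect about {expected:.0f} of the {n} tasters to make "
            f"the correct selection. But {correct} (or more) did, which if "
            f"there were no difference, was only {p:.0%} likely to happen.")

print(plain_language_caveat(13, 28))  # the dry-hop xBmt quoted earlier
```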
 
The worst thing IMHO is their trying to draw general, broad conclusions from a single data point, which is bad science at best, and in a system as complex as beer and brewing is just laughable.
One example. They boil with and without a lid on to see whether DMS becomes detectable when boiling with the lid on. The problem is, there are several variables affecting the final level of DMS in beer. First you have the level of DMS precursors in the malt that makes up the grist. Then you have the lid design, which determines the amount of condensation and where that condensate ends up. Then you have DMS scrubbing during fermentation, which can be stronger or weaker depending on fermentation dynamics. Change any of those variables and you could find yourself with DMS levels that are above the detection threshold.
What do they do? They decide the only variable is lid-on/lid-off and test only for that, partly because with their simple means they have no way of measuring or controlling the other variables. All they could derive from such an experiment is, "We brewed this beer with this malt and this equipment and couldn't detect any difference vis-a-vis DMS between boiling with the lid on and off." Which is all jolly good for them, but frankly irrelevant for the general public, who under different conditions might very well get completely different results.
But turn that into a BeerXperiment and draw broad, albeit unproven, conclusions, and suddenly you're "radical" and "revolutionary" and you can sell a T-shirt with your name on it for a few bucks...
 
My point here is that average readers don't understand significance, p value numbers, or even the basic fact that in a triangle test a blind guess is expected to be correct 1/3 of the time, and not half. So how could they let people know how to really think about the results? Well, they could append these words (with the appropriate numbers) to every test where p was >= .05...

"It should be noted that if there were no difference, on average we would expect 9 or 10 of the tasters to make the correct selection. But 13 (or more) did, which if there were no difference, was only 10% likely to happen."

But then, you wouldn't have well intentioned people posting things like "one can't hide from the fact that most people, experienced beer drinkers or not, usually can't tell the difference between beer A and beer B." And you wouldn't have people dropping or changing important parts of their process after being misled.

OK, I agree. Most don't understand significance. If something is "significant," it must be important, right? :) :) :)

I taught college-level statistics for....well, 30+ years--retired in May--and one of the best examples of this came from the Brulosophy site.

There was an early Brulosophy experiment that compared Maris Otter with 2-Row. MO is my favorite malt, so naturally I was interested.

The results of the testing showed a significant result. But if you dug down into the preferences, guess what? Exactly the same number of testers preferred MO as did 2-Row.

Even though the result was "statistically significant," there was no actionable intelligence. If I had a brewery for profit, I'd want to know what most people preferred, so I could brew it. But nothing that came out of the results would have been helpful in that way. I suppose, if it doesn't make a difference, the "actionable intelligence" is to choose the cheaper of the two.

I suppose what it indicates, if you can take it at face value, is that some people like a maltier/fuller/richer flavor (MO), and some like a lighter/crisper flavor (2-Row). Who knew that people vary in what they like? :)
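
For the statistically inclined, the point is that distinguishability and preference are two separate tests. Here's a minimal sketch with made-up numbers (the actual MO vs. 2-Row panel counts aren't in front of me):

```python
# Made-up numbers: a triangle result can be significant while preference
# among the correct selectors is a dead-even coin flip.
from scipy.stats import binomtest

n_tasters, n_correct = 24, 14  # hypothetical panel
n_prefer_mo = 7                # of the 14 who picked the odd beer out

# Test 1: can the panel tell the beers apart? Chance rate is 1/3.
print(binomtest(n_correct, n_tasters, 1/3, alternative="greater").pvalue)  # ~0.009

# Test 2: among those who could, is one beer preferred? Chance rate is 1/2.
print(binomtest(n_prefer_mo, n_correct, 0.5).pvalue)  # 1.0 -- no preference signal
```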
 
This means that if there were no detectable difference between the beers, there was only a 10% chance that 13 or more tasters would get it right. But they did. To translate that into "indicating participants in this xBmt could not reliably distinguish" is misleading as hell.
The worst thing IMHO is their trying to draw general, broad conclusion from a single data-point which is at least bad science and in such a complex system such as beer and the brewing of beer is just laughable.
These are valid points, but you can consider what is presented and still get something useful out of their experiments. Think of it as just one data point. When I try something different, my normal method is to try to compare the experimental beer with one that I brewed, and drank, a few months ago. I just have no practical way to do the side-by-side test like they do. Obviously, side-by-side testing has a big advantage.
 
Brülosophy has helped me realize that good/bad beer is subjective and prattling on about how X makes beer better is just talk.

In this world there’s talkers and there’s doers. Brülosophy is doers. And in the doing, they show how little difference there is in the molehills we (myself included) make mountains of.
 
And in the doing, they show how little difference there is in the molehills we (myself included) make mountains of.

No. They really haven't shown that in most cases, if any. Not if you just go by the words they use to summarize the results, which is all most people can do, in the absence of intimate familiarity with the methodology or a concise explanation of what the results really mean (and don't mean).
 
These are valid points, but you can consider what is presented and still get something useful out of their experiments.

You certainly can, if you understand what the numbers actually mean. But why not share that in the write-ups? That would make the information useful to a lot more people.
 
The truth is that perfectly drinkable yet mediocre beer is really easy to make, good beer takes experience and effort, and great beer requires borderline religious fervor and dedication. Nobody ever 'accidentally' composed a masterpiece; masterpieces are always the result of practice and continual improvement.

Also, humans as a species have an absolutely pathetic sense of taste, being both incredibly subjective and easily fooled.
 
Brülosophy has helped me realize that good/bad beer is subjective and prattling on about how X makes beer better is just talk.

In this world there’s talkers and there’s doers. Brülosophy is doers. And in the doing, they show how little difference there is in the molehills we (myself included) make mountains of.


This hurts my eyes to read.

Bad beer exists, it is very real and it flows from the bottles and taps of home and craft breweries all over this fine country.
 
I really enjoy reading the Brülosophy site and occasionally listening to their podcast.
I think a lot of the info is very useful and listening to them can be very entertaining. Unlike other podcasts that are industry centric, I like the fact that it’s centered around home brewing. I don’t give a crap about listening to a podcast based on who is who and who they brew for.

The one major thing I disagree with Brülosophy on is the constant drumbeat of drinking fresh beer. I brew mostly lagers and some ales, but I never find them to be really good within two weeks or less.

Recently I’ve been using WLP840 American lager yeast, and I swear the beers taste bland and flabby until they hit the 6-week mark. Ales obviously need less time, but still don’t hit their stride until the 4th week or so.

All my beers are kegged so I am sampling throughout.

I occasionally will brew a pale or IPA, and those follow the same rule: usually best after 3-4 weeks.
 
"It should be noted that if there were no difference, on average we would expect 9 or 10 of the tasters to make the correct selection. But 13 (or more) did, which if there were no difference, was only 10% likely to happen."

It's been a complaint of mine for years with their exbeeriments. I actually like reading them but wouldn't put any faith in the results. In cases like the one you list, what they've actually shown is that there is quite likely a difference (p<0.10 is quite significant given the small sample size), but it needs a larger sample size (i.e. more tasters) to confirm a difference with statistical significance. Unfortunately, as you say, they imply that it means there's no difference. Really, their studies are at best pilots to see if it's worth studying something properly (even that's a stretch). Don't get me started on the idea that one batch of one beer vs. one batch of another beer can do anything more than prove that those two batches are different (you CAN'T reliably say that the variable is the reason for a difference unless you have multiple batches of each beer).
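
To put a rough number on "needs a larger sample size": here's a sketch that assumes the true rate of correct picks really is the observed 13/28 (a big assumption) and asks how many tasters a triangle test would need to detect it with decent power.

```python
# Sketch: required panel size, assuming the true correct-pick rate is 13/28.
from scipy.stats import binom

p_null, p_true = 1/3, 13/28      # chance rate vs. assumed true rate
alpha, target_power = 0.05, 0.80

for n in range(10, 1000):
    k_crit = int(binom.isf(alpha, n, p_null)) + 1  # min correct for p < 0.05
    power = binom.sf(k_crit - 1, n, p_true)        # chance the test detects it
    if power >= target_power:
        print(n, k_crit, round(power, 2))  # n lands in the vicinity of ~85,
        break                              # several times a typical xBmt panel
```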
 
This hurts my eyes to read.

Bad beer exists, it is very real and it flows from the bottles and taps of home and craft breweries all over this fine country.

I know what you’re saying, but one man’s bad beer is another man’s nectar of the Gods. I had a sour beer from McFates that was aged in Chardonnay barrels. I honestly don’t think I have a great palate, but you didn’t have to be a beer sommelier to get malt vinegar. It was absolutely undrinkable. I have in my life not finished 2 beers on purpose, and this was one of them. The bartender took the beer back and told me that people come from all around to get this beer. I believe it’s true, too; besides the vinegar, there was a strong flavor of marketing.
 
I know what you’re saying, but one man’s bad beer is another man’s nectar of the Gods. I had a sour beer from McFates that was aged in Chardonnay barrels. I honestly don’t think I have a great palate, but you didn’t have to be a beer sommelier to get malt vinegar.

Yep. All of my favourite Brett beers have noticeable malt vinegar in them (not in-your-face acetic acid, but a background tart sweetness). Favourite style = Flanders Red, where vinegar is part of the flavour profile. I really enjoy it. To each their own!
 
I have in my life not finished 2 beers on purpose, and this was one of them. The bartender took the beer back and told me that people come from all around to get this beer. I believe it’s true, too; besides the vinegar, there was a strong flavor of marketing.

I regularly leave unfinished beers, more than I could count. Beer is about the enjoyment of the flavors and craft; I'm rarely drinking for the buzz when I'm out, which is the only reason to drink meh beer. If a beer isn't well done, I would rather spend the calories on something I enjoy.
 
I used to be Mr. “I know good and bad beer”. And I still have an opinion on whether a beer is good or bad for my taste. Each beer has a story and I want to be able to express how I feel about a beer because I love talking about it. I don’t believe most of us can understand a beer after just a sip, at least not in depth.

Example: Firestone Walker Mind Haze. I get a really strong rosewater-like aroma from this beer. A powerfully flavored IPA is a big plus for a lot of people, but this beer is quite off-putting to me. Am I glad I drank the whole beer? Absolutely; in fact, over some weeks I drank a twelver of it. Was the only reason I drank the beer to get drunk? Absolutely not. Getting drunk on Mind Haze would be a very unpleasant experience for me, but I feel like I have truly experienced this beer and can have an intelligent conversation about what it’s about.

I’m not saying that there aren’t some sort of hypothetical standards, but testing ourselves to understand what those standards are would require far more than hanging out in a brewery and drinking beer with friends and family. Brülosophy does way more than even so-called experts do to that end, but even they just touch the tip of the iceberg. Searching for subjective standards has so many opportunity costs that it becomes a game of fighting windmills.
 
The results of the testing showed a significant result. But if you dug down into the preferences, guess what? Exactly the same number of testers preferred MO as did 2-Row.

This is probably the most important thing that they actually find with every one of their exbeeriments - tasters that can tell the beers apart are always* split with their preferences. That tells us something important - brewers need to stop claiming that one method/ingredient/other makes better beer. Better beer is entirely subjective! Case in point: I love lambics, but many beer drinkers would tip them out in disgust.

*always: means I haven't read one that doesn't, but they may exist.
 
My point here is that average readers don't understand significance, p value numbers, or even the basic fact that in a triangle test a blind guess is expected to be correct 1/3 of the time, and not half. So how could they let people know how to really think about the results? Well, they could append these words (with the appropriate numbers) to every test where p was >= .05...

fwiw, they do talk about the 1/3 thing quite often on the podcast.
 
My 2 cents: get over yourselves; we're all making beer, some good, some bad. We didn't invent this thing called beer; people have been brewing beer for thousands of years, some good, some bad.

So, good beer or bad beer is just random?
 
Let me preface by saying that I don't think what Brulosophy is doing is wrong/bad. I do think there are flaws in the testing that you can't really address without a lot of money behind it, which I don't think they have. I personally read Brulosophy (and have done so for many years) for entertainment and brewing interest, with some actual factual information sprinkled in. And really, that's what most of us do when we share our "knowledge" of brewing with others. Our experience, and how much others respect our knowledge of the subject, determines how much we influence others and their brewing process/habits. But I'm the type of person that listens, observes, and tests for myself. I don't normally take things at face value; usually, they just make me think of ways to prove them.

With all that said, I can see how it can stifle someone from exploring the craft on their own because someone already told them "do XYZ, because of ABC result". We can't control how people use the information we provide; we just need to be as transparent as possible with the information presented. I think they should always make sure to say that "this is subjective, based on many different sensory abilities".
 
I read through the experiment and I was curious about the adjusted water profile. It has a very low mineral content, and I think it would be somewhat hard to tell the beers apart. An approach I would like to see is RO water vs. a more minerally water, something like 190 ppm Ca, 10 ppm Mg, 15 ppm Na, 250 ppm SO4 and 150 ppm Cl, which is a water profile I usually use for some beers. It's more "English-y" than anything, but I find I enjoy it more than a lower mineral content.
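
Out of curiosity, here's my own back-of-envelope sketch (not from the article) of the salt additions that would get RO water into that neighborhood. The ion yields are computed from molecular weights and the salt list is my pick of the usual four; sanity-check the output against your favorite water calculator.

```python
# Rough grams-per-gallon of common brewing salts to approximate the
# target profile above, via non-negative least squares.
import numpy as np
from scipy.optimize import nnls

MG_PER_G_PER_GAL = 1000 / 3.785  # 1 g in 1 US gal -> ~264 mg/L total

# Mass fraction of each ion in each salt:   Ca, Mg, Na, SO4, Cl
salts = {
    "gypsum (CaSO4.2H2O)": [40.08/172.17, 0, 0, 96.06/172.17, 0],
    "CaCl2.2H2O":          [40.08/147.01, 0, 0, 0, 70.90/147.01],
    "epsom (MgSO4.7H2O)":  [0, 24.31/246.47, 0, 96.06/246.47, 0],
    "table salt (NaCl)":   [0, 0, 22.99/58.44, 0, 35.45/58.44],
}
target = np.array([190, 10, 15, 250, 150])  # ppm Ca, Mg, Na, SO4, Cl

A = np.array(list(salts.values())).T * MG_PER_G_PER_GAL
grams, _ = nnls(A, target)  # best non-negative fit; won't be exact

for name, g in zip(salts, grams):
    print(f"{name}: {g:.2f} g/gal")
print("achieved profile:", np.round(A @ grams, 0))
```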
 
Each beer has a story and I want to be able to express how I feel about a beer because I love talking about it.

The story often starts with an eyebrow raised at a glaring off-flavor, with feelings of disappointment at dropping $6 on a pint of homebrew. The story finishes with me wishing I had just bought a $9-$12 six-pack of Pilsner Urquell or Saison Dupont at the store.
 
I appreciate Brulosophy and accept the limitations of their methods. Their Short and Shoddy method got me back into brewing after a 2 year hiatus. If I had to tolerate the 6-8 hour brew days that the purists advocate, I simply would quit the hobby and go back to buying commercial beer. A 3-hour single-vessel BIAB brew day fits into my schedule and allows me to create good and sometimes great beer that I can share with my friends and family. That's what it's all about for me.
 
After a few years of haggling over it, on January 29, 2016, the ASA (American Statistical Association) finally issued a series of policy statements on the use and interpretation of p-values, including the P<0.05 threshold. The statements begin with the "Introduction" section found ~40% down the page at this link:

https://www.tandfonline.com/doi/full/10.1080/00031305.2016.1154108

A couple of culled statement points are as follows:

2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.

3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
Practices that reduce data analysis or scientific inference to mechanical “bright-line” rules (such as “p < 0.05”) for justifying scientific claims or conclusions can lead to erroneous beliefs and poor decision making. A conclusion does not immediately become “true” on one side of the divide and “false” on the other.

5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
Statistical significance is not equivalent to scientific, human, or economic significance. Smaller p-values do not necessarily imply the presence of larger or more important effects, and larger p-values do not imply a lack of importance or even lack of effect.

6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
Researchers should recognize that a p-value without context or other evidence provides limited information. For example, a p-value near 0.05 taken by itself offers only weak evidence against the null hypothesis.
 
If P<0.05 is factually meaningful in any way, then why are a plethora of medical/clinical/drug studies often found to be unrepeatable? I came across at least one statistician who tossed out ~25% unrepeatability as typical of such studies that rely upon P<0.05. And one might initially presume that such studies are well controlled.
 
If P<0.05 is factually meaningful in any way, then why are a plethora of medical/clinical/drug studies often found to be unrepeatable? I came across at least one statistician who tossed out ~25% unrepeatability as typical of such studies that rely upon P<0.05. And one might initially presume that such studies are well controlled.

How many studies testing similar (or the same) hypotheses had p values >.05 and, as a result (at least in part), were likely never published? Replication "crises" are prevalent in a lot of scientific domains, at least in part due to the publication process itself and the incessant desire to publish novel contributions. But, on the less bleak side, it's part of the natural evolution of a field. Not to argue that we should strive to do bad science, but things we think we know often aren't "true", at least not unequivocally. I'm an academic psychologist, and one of the watershed moments for each of my Ph.D. students is when we explain that no single study really matters much in terms of establishing whether a population effect actually exists.

1,500 scientists lift the lid on reproducibility
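
The publication-filter point is easy to see in simulation. Here's a minimal sketch with invented numbers: every simulated study below tests a real (but modest) effect, only the significant ones get "published", and then each published one is replicated exactly once.

```python
# Invented numbers: low power + a p < 0.05 publication filter means most
# published "findings" fail an exact replication, even with a real effect.
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(1)
n, alpha = 28, 0.05
k_crit = int(binom.isf(alpha, n, 1/3)) + 1  # 15 correct needed when n = 28

p_true = 0.45  # assumed true correct-pick rate: real but modest effect
studies = rng.binomial(n, p_true, size=100_000)
published = studies[studies >= k_crit]  # only the significant ones get written up

replications = rng.binomial(n, p_true, size=published.size)
print((studies >= k_crit).mean())        # ~0.24: the power of a single study
print((replications >= k_crit).mean())   # ~0.24: replication rate is just power
```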
 
You certainly can, if you understand what the numbers actually mean. But why not share that in the write-ups? That would make the information useful to a lot more people.

Using a generally misunderstood sensory evaluation system means that in the conclusions they can give people the information they want to hear, and not necessarily what the results actually say.
I wonder why somebody would do that? 🤔
 