Value of Brulosophy exbeeriments, others' experience, myths and beliefs

By the way, my take-away from the Brulosophy experiments as a whole is that pretty often in brewing "good enough" is good enough. Particularly with mash parameters, fermentation parameters, aeration (hot and cold) and sanitation. ...
+1, which leads us to this: that RECIPE DESIGN is far and away the most important factor for beer success. Specifically,
1) the selection and % of non-base-malts, and
2) yeast selection.

But xbmts typically focus more on process, not recipe (despite recipe being much more important!!).
 
I think they are pretty good about keeping interpretation of the triangle test clear: does the change result in a beer that can be differentiated by average beer drinkers without relying on visual cues? This is useful information, as it allows a brewer to make a risk-based assessment of changing a process. If the risk to the product is high and the savings are marginal, don't do it. If the risk is low and the savings in time or materials are significant, it might be worth a try.

I saved 10-15 minutes of my brewday based on their boiling-with-the-lid-on experiment. I now leave the lid on while bringing the kettle to a boil, and it gets there much faster. The experiment showed the change is not likely to impact the final product, the savings looked potentially meaningful, and it has changed my brewday.
 
+1, which leads us to this: that RECIPE DESIGN is far and away the most important factor for beer success. Specifically,
1) the selection and % of non-base-malts, and
2) yeast selection.

But xbmts typically focus more on process, not recipe (despite recipe being much more important!!).

haha wow no I really disagree with this. We had a brew club experiment where we all brewed the same recipe, one of the APAs from BCS, then brought them to a meeting and wow, these beers were all over the place. Process is huge; it is why so many commercial brewers really don't mind sharing recipes.
 
Good, I'm glad my post didn't come across as a personal attack, as that definitely wasn't the intention.

RE: luck in guessing, that's the reason for including large numbers (within reason, too large of groups can produce artificial significance) and then obtaining a p-value. The odds that everyone is guessing (and getting it correct) become very high once you get p-value below significance. For example, if two people guess correct, that's 1/3 * 1/3 = 1/9, 3 people becomes 1/27, etc. This is obviously oversimplified, but that's why so many of the exbeeriments come back as non-significant (imo).

Oh, I understand this stuff pretty well--I teach it! In fact, I used the exbeeriment comparing Maris Otter and 2-Row in my class in April, to help the students understand the difference between statistical significance and substantive importance. Just because something's significant doesn't mean it's important.

I also use these to demonstrate causality and the search for alternative explanations for results. Sometimes the students pay better attention to the beer examples. :)


We certainly agree on the ingredient ones (I think most feel that way actually). That's also why I don't understand your issue with this specific exbeeriment. You want personal preference, yet we all know everyone's will be different. The important piece is the total number of people who correctly picked the odd beer out. Going past that is simply anecdotal, maybe they should stop including that part?

I'm afraid I disagree--what would be the point if all you could do is say the beers are different? Unless you have a conclusion that points to a better process or ingredient, it's for nothing.

It's like saying (as I would in my class) that there's a statistically-significant difference between men and women with regard to, say, satisfaction with their marriage--and saying nothing more. Without knowing which direction the results point, there's nothing useful there.

I have my own issues with the panels as they're used--we don't know about the reliability of the panel (it's a one-shot tasting!), we don't know who they represent, we don't know their palates, we don't know what they were drinking or eating just prior to tasting.

This is why I'm more interested in the nonsignificant results--people can't tell a difference, which suggests there isn't a meaningful difference between the beers. But even then, testing by me or you is going to be needed, at our local level, to see if it matters to us.
 
haha wow no I really disagree with this. We had a brew club experiment where we all brewed the same recipe, one of the APAs from BCS, then brought them to a meeting and wow, these beers were all over the place. Process is huge; it is why so many commercial brewers really don't mind sharing recipes.
ha ha that's funny stuff. But seriously, the xbmts are showing us that as long as you follow good basic practices, many of these details don't matter much. Apparently your brew club ... umm ... [stepping away from the keyboard...]

And don't forget the vast majority of commercial brewers will NOT share their recipes. Or at least their *actual* recipes. ;)
 
Preference was almost 50:50!

Bingo!

In my opinion, too many people are using Xbmt results to try to steer them towards making "better" beer.

That's not the intent of triangle tests.

There's still a taste preference that shouldn't be overlooked, even if an Xbmt result is statistically significant for a perceived difference.
 
Oh, I understand this stuff pretty well--I teach it! In fact, I used the exbeeriment comparing Maris Otter and 2-Row in my class in April, to help the students understand the difference between statistical significance and substantive importance. Just because something's significant doesn't mean it's important.

I also use these to demonstrate causality and the search for alternative explanations for results. Sometimes the students pay better attention to the beer examples. :)




I'm afraid I disagree--what would be the point if all you could do is say the beers are different? Unless you have a conclusion that points to a better process or ingredient, it's for nothing.

It's like saying (as I would in my class) that there's a statistically-significant difference between men and women with regard to, say, satisfaction with their marriage--and saying nothing more. Without knowing which direction the results point, there's nothing useful there.

I have my own issues with the panels as they're used--we don't know about the reliability of the panel (it's a one-shot tasting!), we don't know who they represent, we don't know their palates, we don't know what they were drinking or eating just prior to tasting.

This is why I'm more interested in the nonsignificant results--people can't tell a difference, which suggests there isn't a meaningful difference between the beers. But even then, testing by me or you is going to be needed, at our local level, to see if it matters to us.

We're certainly in agreement here, I think all the exbeeriments are worthy of consideration, whether significance is "achieved" or not.

Would it be wonderful to have a consistent panel of 50 tasters ranging in skill level that get to sample the beer 3x a day for a week? Well sure! That's obviously a stretch, but I don't think it's fair to criticize their process in this manner. You might disagree.

Correct me here if I'm wrong, but it sounds like you take issue with the exbeeriments that achieve significance, but not those that don't? That seems awfully selective if so. Please, point me to a beer blog that publishes twice a week like clockwork and has more scientific rigor than this one.
 
We're certainly in agreement here, I think all the exbeeriments are worthy of consideration, whether significance is "achieved" or not.

OK. They're all worthy of consideration; the question is what you do once you consider them.


Would it be wonderful to have a consistent panel of 50 tasters ranging in skill level that get to sample the beer 3x a day for a week? Well sure! That's obviously a stretch, but I don't think it's fair to criticize their process in this manner. You might disagree.

Why not? I don't doubt their sincerity, and I appreciate the effort in trying to shed light on home brewing. But that sounds like grading someone on effort, not what they produce, and I never, not ever, give students credit for effort.

The fact we like the Brulosophy folks and appreciate their effort should not stand in for a critical eye on just what the results tell us.

Correct me here if I'm wrong, but it sounds like you take issue with the exbeeriments that achieve significance, but not those that don't?

I'm not sure that take issue is the right term. I'm trying to decide what, if any, actionable intelligence comes out of any of this. If that's not what you're interested in, that's fine, I'm all for people getting out of this whatever makes them happy. But we should not draw firm conclusions given the difficulties inherent in the approach.

I'll try to say it differently, perhaps I wasn't as clear as I could be before. There are a host of possible alternative explanations for the significant results--and in the end, if we can't have confidence that those alternative explanations have been eliminated, then we--perhaps just me--don't have much confidence in the results. Significance, by itself, tells us very little. As I noted above, it could be panel composition, what people did before they tasted the beers, what their palates are like, what the panel generalizes to, whether the tasters like the beer, etc. etc. etc. as possible reasons for the outcomes. If the results are overwhelming, that's different, but almost none of them are.

The null hypothesis here is that there is no difference between batches. Non-significant results mean we don't reject that hypothesis, leaving us to (lightly) conclude that perhaps there's no difference. The concerns above about panel composition and so on are less important--differences in the panel aren't going to account for not finding a significant difference, if that makes sense. So the non-significant results are a little more believable.

Further, even when people say they can perceive a difference and even when one is preferred over the other, we just don't know what that means.


That seems awfully selective if so.

I don't think so. I think I'm applying statistical, measurement, and scientific principles to puzzle this out. That's what I do. It may appear harsh, but as I've always said, this is not an indictment of Marshall or the Brulosophy people. You do what you can with what you have.

Please, point me to a beer blog that publishes twice a week like clockwork and has more scientific rigor than this one.

I cannot, but so what? If what you want is for me to say this is the best there is "out there," well, it would appear so. That's not the same as saying it passes all the demands of high-level, believable research, which it does not.
 
RE: luck in guessing, that's the reason for including large numbers (within reason, too large of groups can produce artificial significance) and then obtaining a p-value. The odds that everyone is guessing (and getting it correct) become very high once you get p-value below significance. For example, if two people guess correct, that's 1/3 * 1/3 = 1/9, 3 people becomes 1/27, etc. This is obviously oversimplified, but that's why so many of the exbeeriments come back as non-significant (imo).

Warning: pedantic stats instructor correction ahead.

You are stating the logic of the hypothesis test backwards. The null hypothesis supposes that everyone is guessing with a probability of getting the correct answer of 1/3 (see also my earlier response for another justification based on randomized cups). A small p-value/significant result makes no statement about the odds/probability/likelihood that people are guessing. Rather the opposite: if it were true that people are guessing, the probability of seeing a result this large or larger is equal to the p-value.

If this probability is very small, we would then be skeptical of the original claim: that all subjects are guessing. The p-value, however, can't be called a probability that subjects are not guessing as it is formed by supposing that they are. The logic may seem a little twisted, but I like to think of it like a proof by contradiction or evaluating the consequences of an argument. We suppose something, see where it leads, and if the result seems impossible or unlikely, we go back and challenge our original assumptions.
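
If anyone wants to see that logic in numbers, here is a minimal sketch in Python (standard library only). The panel size and number of correct picks are hypothetical, just to show how the tail probability is formed under the guessing assumption:

    from math import comb

    def triangle_p_value(n_tasters, n_correct, p_guess=1/3):
        """Probability of n_correct or more correct picks IF every taster is guessing."""
        return sum(
            comb(n_tasters, k) * p_guess**k * (1 - p_guess)**(n_tasters - k)
            for k in range(n_correct, n_tasters + 1)
        )

    # Hypothetical example: 20 tasters, 11 pick out the odd beer.
    print(round(triangle_p_value(20, 11), 3))

A small number out of that calculation says "this many correct picks would be surprising if everyone were guessing," not "here is the probability that they were guessing," which is exactly the distinction above.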
 
Isn't this the basis of the short and shoddy series of experiments? At least in that one they showed the instrument was able to distinguish a beer made with a starter + 60 min mash and boil + active fermentation temperature control from a beer made with one smack pack + 30 min mash and boil + ambient-temperature fermentation.

I think this got to one of mongoose's main critiques. It seemed that a lot of the individual experiments didn't show a significant change. However, when you compare a beer made with 4 "generally accepted best practices" vs a beer made with 4 "shortcuts", then it *is* different enough that tasters can perceive it.

I.e. one critique is that individual process steps may have effects that are below the taster's detection threshold, but if brewers start taking that as gospel and deciding they can make the "short & shoddy" beer without having an impact on their brew, it's incorrect. That we shouldn't look at the result of one experiment and use it to determine that brewer's best practice is incorrect just because no difference is observed.

OK. They're all worthy of consideration; the question is what you do once you consider them.

I think I get at the heart of what you're saying in my above paragraph. The experiments are all interesting, but as you initially said you saw brewers in the comments saying "oh well I guess I can stop doing process X now!" they were making changes based on information that was FAR too incomplete to justify a change if they were happy with their beer.

I'll try to say it differently, perhaps I wasn't as clear as I could be before. There are a host of possible alternative explanations for the significant results--and in the end, if we can't have confidence that those alternative explanations have been eliminated, then we--perhaps just me--don't have much confidence in the results. Significance, by itself, tells us very little. As I noted above, it could be panel composition, what people did before they tasted the beers, what their palates are like, what the panel generalizes to, whether the tasters like the beer, etc. etc. etc. as possible reasons for the outcomes. If the results are overwhelming, that's different, but almost none of them are.

This is where I think you go too far. I.e. the short & shoddy experiment, it was DEFINITELY significant. However, a 6-2 split in the preference test you said wasn't enough to justify that perhaps the 4 "generally accepted best practices" were better than the "shortcuts", and in that case I disagree.

You discount things like preference TOO much. Yes, while I'll agree that we can't give preference too much weight, we shouldn't ignore it either. Human tasters (especially inexperienced and non-trained tasters) are an imprecise tool. Often they may not fully understand the root of their preferences. But that doesn't mean their preferences don't have weight.

Here's why: we're making beer for human consumption. These are people who like and drink beer, so it stands to reason that when we're talking about process, they're more often than not going to prefer the objectively better beer. I.e. their preferences give us a reasonable idea which processes are more likely to help us brew "better" beer. The vast majority of tasters don't like off flavors (hence why they're called off flavors).

One could reasonably suggest that the reason the results were 6-2 in favor of good practices was that the short and shoddy beer had some objective flaws. Is it close? Yes, because we're talking about a brewer with a lot of experience and who probably has his other processes (i.e. sanitation, oxidation, etc) really well tuned. So even if he took the short and shoddy approach, it was probably still not "bad" beer. But 6-2 may be enough to call good processes objectively better.

Sometimes that leads down wrong paths, yes. I.e. the W-34/70 fermented cold vs warm might suggest that the tasters prefer esters to clean lager styles, not that W-34/70 will produce a great clean lager at 82F. And so higher sample sizes and more replication are important, and a critical eye at *interpreting* the results in context.

But when you impugn the concept of tasters based upon mere questions of panel construction, what they did before the tasting, etc, I think you're basically saying that you throw out the human tasting aspect of it entirely as a guide. But there's not necessarily evidence that drinking an IPA an hour before tasting the best practices vs the short & shoddy beer will affect whether or not you can perceive that one is cleaner than another. No taster is ever a fully pristine slate. But human tasters are still able to do a LOT of things better than running beer through a spectrometer to determine its composition. Humans can tell you whether it tastes good to a human palate. And that is important data, even if it's somewhat noisy.
 
Warning: pedantic stats instructor correction ahead.

You are stating the logic of the hypothesis test backwards. The null hypothesis supposes that everyone is guessing with a probability of getting the correct answer of 1/3 (see also my earlier response for another justification based on randomized cups). A small p-value/significant result makes no statement about the odds/probability/likelihood that people are guessing. Rather the opposite: if it were true that people are guessing, the probability of seeing a result this large or larger is equal to the p-value.

If this probability is very small, we would then be skeptical of the original claim: that all subjects are guessing. The p-value, however, can't be called a probability that subjects are not guessing as it is formed by supposing that they are. The logic may seem a little twisted, but I like to think of it like a proof by contradiction or evaluating the consequences of an argument. We suppose something, see where it leads, and if the result seems impossible or unlikely, we go back and challenge our original assumptions.

Thank you for the nice (correct) explanation. Funny how much trouble is had in understanding p-values and what they really mean!
 
Thank you for the nice (correct) explanation. Funny how much trouble is had in understanding p-values and what they really mean!

For sure. I've made these kinds of mistakes more times than I care to admit. Without going down the rabbit hole of statistical schools of thought, I will say one of the best criticisms of the use of p-values, in my opinion, is that they are often exactly the opposite of what people really want: the probability that the hypothesis is true, given the data. As noted, the p-value assumes the hypothesis and gives the probability of the data.
 
I live in a remote area where supplies and homebrewers are extremely scarce. Brulosophy has taught me more about brewing beer than anyone else has. My beer has never been easier to make and has never tasted better. I don't know where I would be without Marshall and his crew.
 
I think this got to one of mongoose's main critiques. It seemed that a lot of the individual experiments didn't show a significant change. However, when you compare a beer made with 4 "generally accepted best practices" vs a beer made with 4 "shortcuts", then it *is* different enough that tasters can perceive it.

I.e. one critique is that individual process steps may have effects that are below the taster's detection threshold, but if brewers start taking that as gospel and deciding they can make the "short & shoddy" beer without having an impact on their brew, it's incorrect. That we shouldn't look at the result of one experiment and use it to determine that brewer's best practice is incorrect just because no difference is observed.



I think I get at the heart of what you're saying in my above paragraph. The experiments are all interesting, but as you initially said you saw brewers in the comments saying "oh well I guess I can stop doing process X now!" they were making changes based on information that was FAR too incomplete to justify a change if they were happy with their beer.



This is where I think you go too far. I.e. the short & shoddy experiment, it was DEFINITELY significant. However, a 6-2 split in the preference test you said wasn't enough to justify that perhaps the 4 "generally accepted best practices" were better than the "shortcuts", and in that case I disagree.

You discount things like preference TOO much. Yes, while I'll agree that we can't give preference too much weight, we shouldn't ignore it either. Human tasters (especially inexperienced and non-trained tasters) are an imprecise tool. Often they may not fully understand the root of their preferences. But that doesn't mean their preferences don't have weight.

Here's why: we're making beer for human consumption. These are people who like and drink beer, so it stands to reason that when we're talking about process, they're more often than not going to prefer the objectively better beer. I.e. their preferences give us a reasonable idea which processes are more likely to help us brew "better" beer. The vast majority of tasters don't like off flavors (hence why they're called off flavors).

One could reasonably suggest that the reason the results were 6-2 in favor of good practices was that the short and shoddy beer had some objective flaws. Is it close? Yes, because we're talking about a brewer with a lot of experience and who probably has his other processes (i.e. sanitation, oxidation, etc) really well tuned. So even if he took the short and shoddy approach, it was probably still not "bad" beer. But 6-2 may be enough to call good processes objectively better.

Sometimes that leads down wrong paths, yes. I.e. the W-34/70 fermented cold vs warm might suggest that the tasters prefer esters to clean lager styles, not that W-34/70 will produce a great clean lager at 82F. And so higher sample sizes and more replication are important, and a critical eye at *interpreting* the results in context.

But when you impugn the concept of tasters based upon mere questions of panel construction, what they did before the tasting, etc, I think you're basically saying that you throw out the human tasting aspect of it entirely as a guide. But there's not necessarily evidence that drinking an IPA an hour before tasting the best practices vs the short & shoddy beer will affect whether or not you can perceive that one is cleaner than another. No taster is ever a fully pristine slate. But human tasters are still able to do a LOT of things better than running beer through a spectrometer to determine its composition. Humans can tell you whether it tastes good to a human palate. And that is important data, even if it's somewhat noisy.

There are certain foods that accentuate or mask certain flavors; that's not really something that can be argued. Certain foods can make you perceive more sweetness, while others can completely mask flavors. If one person comes straight from dinner and tests samples and another made sure they cleaned their palate, the results are compromised. I think most people that have a scientific background see there's not nearly enough control in these experiments to be able to take away anything other than maybes.

And human tasters can't really tell you anything about what a beer tastes like to your palate. But actual real data can tell you what effects changes on the process have to the beer. I don't care what can be perceived by a group of people, I care if I can perceive it. These experiments tell me nothing about changes to the beer and if I could perceive it.
 
We're certainly in agreement here, I think all the exbeeriments are worthy of consideration, whether significance is "achieved" or not.

Would it be wonderful to have a consistent panel of 50 tasters ranging in skill level that get to sample the beer 3x a day for a week? Well sure! That's obviously a stretch, but I don't think it's fair to criticize their process in this manner. You might disagree.

Correct me here if I'm wrong, but it sounds like you take issue with the exbeeriments that achieve significance, but not those that don't? That seems awfully selective if so. Please, point me to a beer blog that publishes twice a week like clockwork and has more scientific rigor than this one.

If I run an experiment and don't measure anything at all, then run a second experiment and only measure temperature, is the second experiment valid because it's more scientifically rigorous? Of course not, it's still a bad experiment. Running an experiment once, poorly controlling the variables, not testing results outside of poorly controlled test groups, none of this can really be called scientific rigor.

For some reason people are attracted by the "science" of these experiments and passionately defend the scientific validity of them. But as someone who gets paid to do research and perform experiments, these types of methods would tell me 0 about any work I was doing if this is how I approached my research. I'll probably get push back for saying this, but if someone is going to try and present work as having scientific rigor it should stand up to the same scrutiny and be held to the same rigor actual research is, and this work doesn't hold up to that.

Another poster pointed out what these videos are great for: learning how to brew.
 
I think this got to one of mongoose's main critiques. It seemed that a lot of the individual experiments didn't show a significant change.

No......the data showed that the number of tasters correctly picking the odd one out did not rise to the number necessary to denote statistical significance at the .05 level.
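
For the curious, that threshold falls out of standard binomial math. Here is a rough sketch in Python (my own illustration, not Brulosophy's code) of the smallest number of correct picks that clears p < 0.05 for a few panel sizes, assuming guessers succeed 1/3 of the time:

    from math import comb

    def p_value(n, k, p=1/3):
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    def min_correct_for_significance(n, alpha=0.05):
        # smallest k with P(X >= k) < alpha under pure guessing
        return next(k for k in range(n + 1) if p_value(n, k) < alpha)

    for n in (10, 20, 30):
        print(n, min_correct_for_significance(n))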



However, when you compare a beer made with 4 "generally accepted best practices" vs a beer made with 4 "shortcuts", then it *is* different enough that tasters can perceive it.

When you say "tasters" you mean just those that could, and include those who were guessing, correct? One of the things I look at with results like these is not only how many supposedly could discern a difference, but also how many were unable to do so.


I.e. one critique is that individual process steps may have effects that are below the taster's detection threshold, but if brewers start taking that as gospel and deciding they can make the "short & shoddy" beer without having an impact on their brew, it's incorrect. That we shouldn't look at the result of one experiment and use it to determine that brewer's best practice is incorrect just because no difference is observed.

This is why I wonder about the cumulative effect of small differences that each are below the level of perception but when added together rise to a level that can be perceived.


I think I get at the heart of what you're saying in my above paragraph. The experiments are all interesting, but as you initially said you saw brewers in the comments saying "oh well I guess I can stop doing process X now!" they were making changes based on information that was FAR too incomplete to justify a change if they were happy with their beer.

Actually I think it was someone else who said that.


This is where I think you go too far. I.e. the short & shoddy experiment, it was DEFINITELY significant. However, a 6-2 split in the preference test you said wasn't enough to justify that perhaps the 4 "generally accepted best practices" were better than the "shortcuts", and in that case I disagree.

Sure, there's a small difference. But one thing I do is also look at those who could not discern a difference. The results were 6-2-4. Not quite as rock solid as 6-2 appears.

One way I suggest people think about these kinds of results is to assign a dollar value to them. In other words, would you bet $1000 that the results are correct? $100? $10? $5? While not "significance," such a thought experiment helps us think about how confident we are in the results. I'm at about $20 in this one. I surely would not bet $1000 on the accuracy of the results.
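
To put a rough number on that 6-2-4 result, here's a quick back-of-the-envelope sign test, setting the 4 no-preference tasters aside (which is itself generous to the 6-2 reading). Just an illustration, not how the results were reported:

    from math import comb

    n, k = 8, 6  # 8 tasters expressed a preference; 6 favored one beer
    # Under the null that neither beer is preferred, each preference is a coin flip.
    one_sided = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    print(round(one_sided, 3), round(min(1.0, 2 * one_sided), 3))  # ~0.145 one-sided, ~0.289 two-sided

Either way you slice it, a 6-2 preference split among 8 people is well within what coin-flipping could produce.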

You discount things like preference TOO much. Yes, while I'll agree that we can't give preference too much weight, we shouldn't ignore it either. Human tasters (especially inexperienced and non-trained tasters) are an imprecise tool. Often they may not fully understand the root of their preferences. But that doesn't mean their preferences don't have weight.

Well, we'll have to disagree on that. I'm not sure what preference means. Only if there's an overwhelming majority on one side or another will I be more confident in the conclusion. Say, if 15 people could discern a difference and the preferences were split 14-1. That seems fairly convincing. A split of 9-6? Not so much. But that's me, you may think differently.

Here's why: we're making beer for human consumption. These are people who like and drink beer, so it stands to reason that when we're talking about process, they're more often than not going to prefer the objectively better beer. I.e. their preferences give us a reasonable idea which processes are more likely to help us brew "better" beer. The vast majority of tasters don't like off flavors (hence why they're called off flavors).

You lost me at "objectively better beer." Yes, if the preferences align such that a large majority (14-1, say) agree one is better than the other, it seems also likely that I'd agree with them if sampling the same beer. But when it comes out as 6-2 with 4 expressing no preference, now I'm not so sure.

One could reasonably suggest that the reason the results were 6-2 in favor of good practices was that the short and shoddy beer had some objective flaws. Is it close? Yes, because we're talking about a brewer with a lot of experience and who probably has his other processes (i.e. sanitation, oxidation, etc) really well tuned. So even if he took the short and shoddy approach, it was probably still not "bad" beer. But 6-2 may be enough to call good processes objectively better.

Again, you're leaving out the 4 who had no preference.

One of the beauties of these things--both the exbeeriments and HBT--is that nobody has to agree with anyone else, and we all can use whatever processes we want. We get to make our own decisions!

If, for you, a 6-2 split with 4 no preferences is enough to make a conclusion, go for it. I've used Brulosophy results--the trub- or no-trub exbeeriments led me to try it myself, and I could not tell any difference. The exbeeriments can certainly give us ideas to try and those that make sense to us should be tried if possible.


Sometimes that leads down wrong paths, yes. I.e. the W-34/70 fermented cold vs warm might suggest that the tasters prefer esters to clean lager styles, not that W-34/70 will produce a great clean lager at 82F. And so higher sample sizes and more replication is important, and a critical eye at *interpreting* the results in context.

Bingo! As we cannot know what the tasters are perceiving, then what do the results indicate? And if you get a panel of tasters for whom those flavors are more important, then there you are.

But when you impugn the concept of tasters based upon mere questions of panel construction, what they did before the tasting, etc, I think you're basically saying that you throw out the human tasting aspect of it entirely as a guide.

No, as I have noted above, if the split is 14-1 in preference to one over the other, that's a stronger result than 9-6.....or 6-2-4.

And if you think I'm "impugning" the panel on the basis of panel construction, what they did before tasting, etc......well, maybe. I just don't know the answer to those questions, and neither does anyone else.

But there's not necessarily evidence that drinking an IPA an hour before tasting the best practices vs the short & shoddy beer will affect whether or not you can perceive that one is cleaner than another. No taster is ever a fully pristine slate. But human tasters are still able to do a LOT of things better than running beer through a spectrometer to determine its composition. Humans can tell you whether it tastes good to a human palate. And that is important data, even if it's somewhat noisy.

We're going to have to disagree on this. You are setting up straw men and using them to buttress your point. I'm going to knock them down right now.

No, there's no evidence that drinking an IPA an hour before (what about 5 minutes?) will affect perception. But there's also not evidence that it doesn't. And we don't know what they did before taste testing, which is the point. You don't know. I don't know. And as a scientist, I'm trained to think about what could cause the results to be false. This is one of many things that could cause them to be false.

And while I can tell you if something tastes good to a human palate, we can't know how universal that opinion is. That's the point here.

***************************

Good science is organized skepticism. We try to disprove things in science because as it turns out we cannot really prove anything. That's the whole basis of null hypothesis testing.

We're looking at potential causal processes here. Change this variable and what happens, if anything? To demonstrate causality you need to show correlation (differences with and without the experimental treatment), time order (cause precedes the effect), and nonspuriousness (there are no other explanations for the results).

My comments in this thread focus primarily on whether we're really measuring something useful, and whether there are other explanations for the results.

*****************

Again, bully for Marshall and his associates for trying to bring data to bear on issues related to homebrewing. Everyone can use or not use those results as they see fit. If using them results in better beer, I only hope that someday I get to enjoy one with you.
 
If I run an experiment and don't measure anything at all, then run a second experiment and only measure temperature, is the second experiment valid because it's more scientifically rigorous? Of course not, it's still a bad experiment. Running an experiment once, poorly controlling the variables, not testing results outside of poorly controlled test groups, none of this can really be called scientific rigor.

For some reason people are attracted by the "science" of these experiments and passionately defend the scientific validity of them. But as someone who gets paid to do research and perform experiments, these types of methods would tell me 0 about any work I was doing if this is how I approached my research. I'll probably get push back for saying this, but if someone is going to try and present work as having scientific rigor it should stand up to the same scrutiny and be held to the same rigor actual research is, and this work doesn't hold up to that.

Another poster pointed out what these videos are great for: learning how to brew.

Expectations. That's really what I think the large majority of this debate comes down to. I don't believe any of the contributors are doing this as a full-time job, so you can only expect so much. Of course there are many areas where improvement could be made to increase rigor, but to me, they have exceeded what anyone else has done with respect to approaching brewing from a scientific perspective. This doesn't absolve them from improving, but I believe they have been, if you follow the timeline of their experimentation.

You put science in quotations, as if there is some threshold that must be achieved before you can use that term. Science is knowledge (ask someone who knows Latin). I doubt even their most ardent critic would say they hadn't contributed to homebrewing knowledge.
 
The attempts to precisely quantify every aspect of brewing reminds me a lot of the widely varying views of audiophiles. There was a famous electrical engineer whose lab tests of audio equipment were very controversial. He was widely derided for his view that two amplifiers that measure the same will sound the same. This opinion ignored the fact that the effect of the appurtenant equipment will introduce other variables, and that there may be factors involved that can't be measured.

Some audiophiles become so involved in the hobby that they can even hear a difference between two different sets of speaker cables. I have experienced this in my own system so I know it's true, but many people scoff at the idea.

Scientists are very critical of the reviews in audio magazines, insisting that they don't follow proper control procedures, etc., but the human factor always stymies the process. Naturally, you wouldn't want someone evaluating audio equipment who just left their job at a stamping plant, or someone with a cold, but it's not possible to eliminate all such variables. Another major roadblock to objective observation is the fact that audio memory is imprecise, so even an A/B test would be a subjective comparison.

The common thread between audio and brewing is that the final result depends on a sensory observation that is unique to each person. A person's sense of taste can change due to any number of factors including subtle odors in the room or even the use of medications.

As brewers we're all experimenting in an effort to produce better beer, but the vast majority of us don't come close to the taste tests that the Brulosopher performs, regardless of how imperfect they may be. The Brulosophy exBEERiments may not necessarily result in "hard" data, but they do suggest what we may expect in our own experimentation. The real value is that they are being shared with us, and I appreciate that.
 
I honestly don't find most of the experiments to be that helpful in brewing. Absolutely nothing can replace firsthand experience. Unless you use the same equipment, process, and ingredients as a particular experiment, you really can't pull any generalized rules from them.

I do see a TON of generalization online, citing the Brulosophy blog as a source or proof for arguments, and giving advice based on an experiment that may be related but not necessarily applicable. Even the authors are guilty of it. My own personal bias about blogs aside, I think a lot of brewers use the blog as a substitute for personal experience, which is a mistake.
 
Some audiophiles become so involved in the hobby that they can even hear a difference between two different sets of speaker cables. I have experienced this in my own system so I know it's true, but many people scoff at the idea.

I do accept that two different sets of cables can make a difference, but generally I think that most audiophiles go well beyond this statement into lunacy.

I think basically speaker wire falls into two categories:

  1. Good enough for the power it carries and the distance of the run.
  2. Not good enough for the power/distance it runs.

As an electrical engineer, I think there's really no evidence that the properties of copper wire change appreciably between two different brands given that it's of sufficient gauge to not be "lossy" over the length of the run. These things just aren't that complex.
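
To put rough numbers on "good enough for the power and the distance," here's a back-of-the-envelope calculation in Python. The values are illustrative assumptions: 16 AWG copper at roughly 4 ohms per 1000 feet, and a speaker treated as a plain 8-ohm resistive load, which real speakers are not:

    # Rough loss estimate for a speaker run; all values are illustrative assumptions.
    ohms_per_ft = 4.0 / 1000   # ~16 AWG copper
    run_ft = 25                # one-way length of the run
    speaker_ohms = 8.0

    wire_ohms = 2 * run_ft * ohms_per_ft            # out and back
    lost = wire_ohms / (wire_ohms + speaker_ohms)   # fraction of power dissipated in the wire
    print(f"{wire_ohms:.2f} ohms of wire, {lost:.1%} of power lost")  # on the order of a couple percent

That works out to roughly a tenth of a dB, which is the sense in which gauge and run length are what matter, not the brand on the spool.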
 
I do accept that two different sets of cables can make a difference, but generally I think that most audiophiles go well beyond this statement into lunacy.

I think basically speaker wire falls into two categories:

  1. Good enough for the power it carries and the distance of the run.
  2. Not good enough for the power/distance it runs.

As an electrical engineer, I think there's really no evidence that the properties of copper wire change appreciably between two different brands given that it's of sufficient gauge to not be "lossy" over the length of the run. These things just aren't that complex.

Haha! I knew I'd get myself in trouble with that analogy, and yes, I do agree that some audio claims do qualify as lunacy. There are endless arguments about this on the audio websites, so I don't want to go there, but I too was a skeptic and I'm a believer now.

One of the arguments I've heard, in regard to speaker wire, is the size of the strands and how they're wrapped can cause different interactions between adjacent strands, much like arcing, that can cause distortions in the audio signal at different frequencies.

An example of this effect may be exhibited in the different sound of the sought-after old guitar pickups from the 50's and 60's. Current day manufacturers would attempt to duplicate them on machines with the same type and gauge wire, same magnets, same number of turns, but they just didn't have the same sound. When they hand-wound them in the imperfect, uneven way the originals were wound, they got that sweet, coveted distortion guitar players were looking for. Electrically, I'm sure they measure the same, but they sound different.

At any rate, some people will never hear this sort of subtle difference just as some of the beer tasters may not pick up on subtle differences in taste.
 
I've always thought the Brulosophy guys have done a pretty decent job of caveating all of their exbeeriments in almost all of their posts. The latest, covering the 150+ that they have done, went out of its way to point out these issues. These guys aren't professional scientists, but are just beer geeks asking questions and coming up with one-off answers (that they repeatedly state isn't necessarily THE answer). Seems like some people just want to criticize, poke holes, burn strawmen, etc. If it's so bothersome, go ahead and establish your own methodology to be more scientifically "stringent", brew away and publish the results. From my perspective, it provides some interesting results that I may or may not incorporate. They are having fun, brewing beer they like and having some good discussions based on results that are not scientific from an academic perspective, but probably more robust than Joe Schmoe saying I did this so it is the way. I look forward to their articles and mesh that with all of the other input I get from my brewing, reading and research.

Sorry if it sounds like I'm bustin' anyone's chops, but sometimes we (myself included) need to lighten up and remember, no one is going to die because of a failed chi-squared test, p-value, or other statistical measure. It's about making better beer that WE (hobbyists) like, maybe learning something and having a bit of fun. Where does the science end and the art begin? Or vice-versa.
 
Well, it *is* a thread to discuss the issues of methodology and results. I don't know how to do that without carefully examining what they do and where the holes in the research are, if any.

The funny thing is that I think the processes of doing the exbeeriments are pretty decent. Pains are taken to make the two batches as equivalent as they can be, except for the variable at issue.

The problem is what the results mean. In the end, everybody can draw their own conclusions about them.
 
I give mad props to Marshall and his team. Brulosophy is my fav brewing website. I love it and look forward to all their postings.
I started brewing beer in an area with almost zero Homebrewers. I had to learn literally everything I know about brewing from the web. Almost all my brewing skills come from brulosophy, and I wouldn’t be making killer beer today, stoking my friends and neighbors, and winning competitions without that website.
I’ve written Marshall personally and asked him questions. He always takes the time to write me back. My current process of brewing is simple, easy and straightforward. I’ve cut my brew day down to 4 hours from start to finish.
I feel that in this hobby, you get out what you put in. Brulosophy has shown me that a lot of firmly held beliefs about brewing don’t always translate to the homebrew scene, and for that I’m always gonna be grateful.
 
For sure. I've made these kinds of mistakes more times than I care to admit. Without going down the rabbit hole of statistical schools of thought, I will say one of the best criticisms of the use of p-values, in my opinion, is that they are often exactly the opposite of what people really want: the probability that the hypothesis is true, given the data. As noted, the p-value assumes the hypothesis and gives the probability of the data.

My favorite is this:

If you decide that there is an effect, then there's a 5% chance (assuming you used alpha 0.05) that you've made the incorrect decision. So, of course there must then be a 95% chance that you've made the correct decision. Since you had to have done one or the other, those probabilities have to add to 100%. How could it possibly be otherwise? But no.
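
A quick simulation makes the gap concrete. All of the numbers below are made up for illustration: suppose only 1 in 5 comparisons involves a real, detectable difference, tasters pick the odd beer 50% of the time when it exists, panels have 20 tasters, and p < 0.05 counts as "there's an effect":

    import random
    from math import comb

    def p_value(n, k, p=1/3):
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    PANEL = 20
    threshold = next(k for k in range(PANEL + 1) if p_value(PANEL, k) < 0.05)

    random.seed(1)
    trials, false_alarms, real_hits = 100_000, 0, 0
    for _ in range(trials):
        real_difference = random.random() < 0.20         # assumed base rate of real effects
        p_correct = 0.50 if real_difference else 1 / 3   # assumed detection rate vs. pure guessing
        correct = sum(random.random() < p_correct for _ in range(PANEL))
        if correct >= threshold:
            if real_difference:
                real_hits += 1
            else:
                false_alarms += 1

    print(f"{false_alarms / (false_alarms + real_hits):.0%} of 'significant' results were false alarms")

Under those made-up assumptions, far more than 5% of the significant results are wrong calls even though alpha is 0.05; the 5% only describes what happens when the null is true, not the chance that any particular decision was correct.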
 
I honestly don't find most of the experiments to be that helpful in brewing. Absolutely nothing can replace firsthand experience. Unless you use the same equipment, process, and ingredients as a particular experiment, you really can't pull any generalized rules from them.

I do see a TON of generalization online, citing the Brulosophy blog as a source or proof for arguments, and giving advice based on an experiment that may be related but not necessarily applicable. Even the authors are guilty of it. My own personal bias about blogs aside, I think a lot of brewers use the blog as a substitute for personal experience, which is a mistake.
There is a TON of generalization on both sides of the "Brulosophy divide," it's fair to say. Whether people quote Brulosophy or 100-year-old brewing tradition, neither might be accurate advice today.

By design, laziness, or accident, my process has become rather simplified. Some thanks to stuff I have either read, or later "justified", with Brulosophy experiments. Some on advice provided by the people on this forum, and reading elsewhere, which I rather wanted to believe and/or try.

In the end, most of us will only get to taste OUR beer results ourselves, and perhaps share with a few friends or family. I've tasted other people's homebrew that was simply terrible. It was actually offered to me, so they must have thought it OK. My beer I swapped for theirs, they could not get over; it was clearly superior. I am intrigued to see just how good other people's homebrew might taste. I have a sneaking suspicion I'd be critical.

I've also bought craft beer in bars, etc., that just hasn't tasted great, but they've clearly taken pains to make a presentable product. Hit all their numbers, just not a recipe I like. A commercial or craft brewer is often at the mercy of the distribution system, or bar owner, in terms of getting the quality and taste they brewed to the consumer, imho, and that often fails where I live.

I think you can cut some corners in the homebrew process that are simply being overstated in importance at times. Often because the method simply mirrors that of a commercial brewer seeking consistency more than better taste, or just because that's how everyone says it's been done. It's just another anecdotal response, but I've done many of the things you're not supposed to do, based on brewers' lore, and got away with still making a beer that is more than adequate. Much of it, Brulosophy experiments validate.

In its defence, everything Brulosophy does is there for all to see and reference. They are completely neutral with regard to the interpretation of results. Whether it's pitch temps, fermentation temps, or mash control, their list of experiments is impressive. Beyond taking a response from the forum, which may be repeated several times as gospel, it's just another source of info.

Whether right or wrong, scientific or not, I find their results quite compelling. Until someone else takes the time to do the sheer volume of experiments they do, it'll continue to be so.
 
brewing is an art, not a science...unless you desire to be the next Anheuser-Busch.

I see lots of snide comments on other threads slamming this guy. Well at least he's doing something...step up to the plate and make a better blog yourself...

I find some of his things interesting. Some so-so. Some rather irrelevant. But if nothing else, it does make you think about...BEER...
 
Is there a means of reaching out to Brulosophy whereby to offer suggestions as to potential future exbeeriments?
 
Is there a means of reaching out to Brulosophy whereby to offer suggestions as to potential future exbeeriments?

Probably their Patreon page would get you the most consideration. Just kidding...sort of. Am sure they are happy enough to consider suggestions but believe they get way more suggestions than they can possibly run. Haha @Holden Caulfield beat me to it.

The bottleneck isn't the cash flow though. Takes time to make the batches and effort to find a tasting panel. There are plenty of skeptics that will tell you why their experiments are wrong/garbage/useless but I've never seen the skeptics offer a better designed experiment that they ran themselves. I've read a few posts of skeptics who gave it a try and then realized it is a lot harder to do than it sounds. Hope the Brulosophers are able to get back to tasting panels soon but when they do hope they keep the brewer's 10 trials as part of the evaluation.
 
They are completely neutral with regard to the interpretation of results.

They are absolutely not completely neutral with regards to the interpretation of results. In experiments where they find a P-value of, say, 0.06, they use these standard words: "... indicating participants in this xBmt were unable to reliably distinguish..."

A p-value of 0.06 means that if there were actually no detectable difference, there was only a 6% chance that as many (or more) tasters would have chosen the odd beer as actually did. Does that sound to you like there's very likely not a difference?

They absolutely could be completely neutral if they instead said "Results indicate that if there were no detectable difference between the beers, there was a 6% chance of "X" <fill in the blank> or more tasters identifying the beer that was different." That would be neutral and accurate.
 
brewing is an art, not a science...unless you desire to be the next Anheuser-Busch.

I see lots of snide comments on other threads slamming this guy. Well at least he's doing something...step up to the plate and make a better blog yourself...

I find some of his things interesting. Some so-so. Some rather irrelevant. But if nothing else, it does make you think about...BEER...

Yep, no shortage of Brulosophy naysayers on many a brew forum.

...

The bottleneck isn't the cash flow though. Takes time to make the batches and effort to find a tasting panel. There are plenty of skeptics that will tell you why their experiments are wrong/garbage/useless but I've never seen the skeptics offer a better designed experiment that they ran themselves. I've read a few posts of skeptics who gave it a try and then realized it is a lot harder to do than it sounds. Hope the Brulosophers are able to get back to tasting panels soon but when they do hope they keep the brewer's 10 trials as part of the evaluation.

And this...... indeed..... show me the money.
 
They are absolutely not completely neutral with regards to the interpretation of results. In experiments where they find a P-value of, say, 0.06, they use these standard words: "... indicating participants in this xBmt were unable to reliably distinguish..."

A p-value of 0.06 means that if there were actually no detectable difference, there was only a 6% chance that as many (or more) tasters would have chosen the odd beer as actually did. Does that sound to you like there's very likely not a difference?

They absolutely could be completely neutral if they instead said "Results indicate that if there were no detectable difference between the beers, there was a 6% chance of "X" <fill in the blank> or more tasters identifying the beer that was different." That would be neutral and accurate...
I am not treating their reports as scientific proof. Merely validating - along with other anecdotal evidence of my own, and other posters - that a process I use can be less stringent - less critically controlled - less important - than people might otherwise want me to believe. I am the ultimate arbiter, as explained, and that's my experience.

I invite everyone to test the limits of brewing outside some of the guidelines. I'm not here to argue it.
 
I am not treating their reports as scientific proof. Merely validating - along with other anecdotal evidence of my own, and other posters - that a process I use can be less stringent - less critically controlled - less important - than people might otherwise want me to believe. I am the ultimate arbiter, as explained, and that's my experience.

I invite everyone to test the limits of brewing outside some of the guidelines. I'm not here to argue it.

That's cool. But that in no way supports your claim that "They are completely neutral with regard to the interpretation of results," which is all that I addressed.
 
That's cool. But that in no way supports your claim that "They are completely neutral with regard to the interpretation of results," which is all that I addressed.
EDIT: Neutral in that they have nothing to gain by how they interpret the result. If their mathematical process is incorrect, then I can't comment.

They make people taste the same beer made two different ways. They have a panel of 10-30 people taste it. They try it themselves, knowing / hoping there should be a difference. Sometimes there is, sometimes there isn't. They appear to be sincere in what they attempt to do. Certainly have no cross to bear, and indeed often state they are surprised by their findings.

If the number who can tell a difference is nearly half, or even a third, I might think it's significant. More significant than the Brulosophy folks do at times. It's up to me what I decide, and how I interpret their findings. How I decide to proceed.

Few other websites offer the same levels or array of home brew testing experiments. Few on the HBT forum can offer more compelling evidence than their own experiences. I see no difference in using either as a source of information.
 
They make people taste the same beer made two different ways. They have a panel of 10-30 people taste it. They try it themselves, knowing / hoping there probably should be a difference. Sometimes there is, sometimes there isn't. They appear to be sincere in what they attempt to do. Certainly have no cross to bear, and indeed often state they are surprised by their findings.

If the number who can tell a difference is nearly half, or even a third, I might think it's significant. More significant than the Brulosophy folks do at times. It's up to me what I decide, and how I interpret their findings. How I decide to proceed.

Few other websites offer the same levels or array of home brew testing experiments. Few on the HBT forum can offer more compelling evidence than their own experiences. I see no difference in using either as a source of information.

Ok, if you're not going to address what I actually said, or answer the easy yes/no question that I put to you, there's really no reason to discuss any further. Peace out.
 
Ok, if you're not going to address what I actually said, or answer the easy yes/no question that I put to you, there's really no reason to discuss any further. Peace out.
I did edit my comment above......
 
Ok, if you're not going to address what I actually said, or answer the easy yes/no question that I put to you, there's really no reason to discuss any further. Peace out.

I did edit my comment above......

EDIT: Neutral in that they have nothing to gain by how they interpret the result.

Good, now we're getting somewhere. Do you think it's possible that by not making sure that readers know how to correctly interpret the P-values (which most do not), the result is that many of those readers take the words "... indicating participants in this xBmt were unable to reliably distinguish..." as an indication that there's likely no difference? If not, why not?
 
Brulosophy is cool. They come up with some good experiments on occasion. I respect their effort and their raw data. However I do not agree with their choice of wording in their interpretation of their results. Which I am free to do. We can all interpret the results however we see fit. It might however be nice if they would present their results in a more factual manner without any slant to the "unable to reliably distinguish" bit, which is a misrepresentation.
 
