Value of Brulosophy exbeeriments, others' experience, myths and beliefs

mongoose33
In another ongoing thread there was a diversion about the value of brulosophy exbeeriments, others' experience, and so on. Trying to direct the discussion here to a dedicated thread.

I read the brulosophy material religiously. I'm trained as a scientist, think that way as a natural extension of that background, and like the approach. It's not always useful, but I have gleaned some conclusions (subject to disconfirmation, as true science allows).

1. They're all one-shot exbeeriments, though there has been some effort at replication. Ultimately the results are based on specific panels of tasters whose generalizability is not always clear. So I, as a scientist, must take the results as tentative.

**************

2. There are some methodological flaws in the approach; however, as a rule, I find the exbeeriments well-executed. Blind tasting is an excellent approach, and there is an attempt to provide scientific rigor.

**************

3. I find the ingredient exbeeriments not that useful; people like what they like. Look at the exbeeriment that compared Maris Otter with standard 2-row. A significant number of tasters could discern a difference, but in the end, they were split exactly 50-50 as to preference. Nothing there that would help anyone except to try both and decide for themselves.

**************

4. The process exbeeriments I find more useful. The ones on dumping trub into the fermenter led me to try it myself; I can't tell any difference, so I do. (disclaimer: might excess trub in the fermenter lead to problems with storage later on? I don't know, I just know that I can't tell any difference within the time period during which I drink said beer).

But mostly, if you look at all of them, most of the time the results show no significant difference (in tasters' ability to detect a difference). I've drawn a tentative conclusion from this: homebrewing, if you follow basic processes of cleanliness/sanitation and other mechanisms, is a pretty robust and forgiving enterprise. Papazian's RDWHAHB would appear to be pretty well supported by this.

***************

5. In the end, and Revvy said something to this effect in the other thread, one's own experiences with this are probably the most important. I take the brulosophy stuff as working hypotheses to be tested against my own homebrewing setup and recipes.

I generally don't go further than that. Every exbeeriment is very situation- and recipe-specific. Would it work the same with a different yeast? Different ingredients? Different mash temps, fermentation temps, kegging or bottling procedure?

One can't know for sure if it applies to one's own situation; all one can do is try what seems to make sense and judge for oneself.

**************

And bravo to Marshall and the brulosophy people; someone is testing, testing, testing, and in the end nothing but good can ever come from that.
 
I'm not a scientist but I do like the exbeeriments. I think, like you said, they basically prove that the hobby is pretty forgiving. We tend to look at certain things as key tenets and (in my best Capt'n Barbossa voice), they are "more what you call guidelines".
 
I agree that the exbeeriments are a great way to learn about brewing, are well executed, and can have great value.

But a way that they could potentially be improved is to manipulate multiple variables in the same exbeeriment (which you hint at near the end of your post). I can't say this has never been done, but I don't recall seeing it in any of the exbeeriments I read about (and I just scanned the website and didn't see any).

So for example, one of the xbmts looked at dry hopping in a bag vs. no bag. They found no difference. Quite appropriately, all other factors were held constant. One of them is of course the hops themselves; in both cases, they were the same variety, and in pellet form. But it could be that there is no difference between bag and no bag when pellets are used, but there IS when whole hops are used.

If you manipulate both Hop Form (whole vs. pellet) and Bag (use vs. don't use), then you could answer that. That is more difficult because now you need 4 rather than 2 set-ups (whole, bag; whole, no bag; pellet, bag; pellet, no bag). Or perhaps the Variety of hops is also important. So you could use, say, a piney hop, a citrusy hop, and a resiny hop. Now you've got 12 conditions (enumerated in the sketch at the end of this post).

Obviously, this gets unworkable pretty soon (notice I'm not volunteering to carry out the study I just described)...but it COULD end up with conclusions like "When using a piney hop, it's best to use a bag, regardless of whether you are using pellets or whole hops. On the other hand, if using a citrusy hop, you definitely want to use a bag--but only if you are using pellets. If you're using whole hops, the bag makes no difference, so just do whatever's easiest. And if you're using a resiny hop, use a bag or not, use pellets or whole, as neither factor seemed to matter when resiny hops were used."

This kind of study is more complex, both in its execution and conclusions, but could help get to the bottom of why homebrewer X doesn't think it matters whether or not a hop bag is used, but homebrewer Y does.
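
For anyone who wants to see how fast the conditions multiply, here's a minimal Python sketch that enumerates the full factorial; the factor names and levels are just the hypothetical ones from my example:

# Enumerate every condition in the hypothetical 3 x 2 x 2 factorial above.
from itertools import product

factors = {
    "variety": ["piney", "citrusy", "resiny"],
    "hop_form": ["whole", "pellet"],
    "bag": ["bag", "no bag"],
}

conditions = list(product(*factors.values()))
for i, condition in enumerate(conditions, start=1):
    print(i, dict(zip(factors.keys(), condition)))

print("total conditions:", len(conditions))  # 3 * 2 * 2 = 12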
 
You've got to start somewhere, otherwise we'd never get started; and you've got to test things out to see for yourself, otherwise we'd never develop anything new. Brulosophy is what it is: a great starting-off point for many concepts that, given the incomplete data, you'll likely want to test out for yourself. Read something, add the variables you want to see added, trial it. Maybe some experiments on there stand on their own, but for the others I can't imagine many people seriously quoting them as absolute hard truth without having put them to the test themselves.

Having data in the first place on perception of dry hop rates etc. is a very good starting-off point. Also, I'm always surprised by the number of brewing books that reference homebrew experimentation, because it is often closer to the cutting edge.
 
I don't mind the experiments or (mostly) the way they are presented. There has been some exasperating lumping together of, say, provisional judges with highly experienced and skilled beer palates, but that's a topic for another time.
What makes me roll my eyes is when they make a conclusion and readers say "thanks, I'll add/remove that step from my brew day!" (or something to that effect.)
I'm a proponent of doing one's own research and drawing one's own conclusions, and of the idea that if something works, you should keep doing it. Basically, their research is valuable as something for me to emulate or build upon in my own experiments, not something to parrot or blindly follow.
 
In the spirit of scientific investigation, some constructive criticism:

1. They're all one-shot exbeeriments, though there has been some effort at replication.
No doubt true. But, I think this fails to represent how much better even a simple experiment is than anecdotal information. Also, most experiments are not replicated. Such is the economy of science. Saying that the exbeeriments are not frequently replicated puts them in "good company."

2. There are some methodological flaws in the approach; however, as a rule, I find the exbeeriments well-executed.

Please be more specific.

3. I find the ingredient exbeeriments not that useful; people like what they like.

I heartily agree. I think part of this is that all the work so far has been based on the triangle test. Triangle tests are great for situations where you want to make a process change or use a cheaper ingredient and keep a product the same. If you want to improve a product, it is far less useful.

Consider, as an alternative, giving two cups and asking tasters to pick a preference (Pepsi challenge style). Under the hypothesis that subjects are randomly picking which cups they like, the number that prefer A is binomial with p = 1/2 (i.e., a sign test). In essence a very similar test, but one that answers preference questions rather than sameness questions.
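
As a minimal sketch of what that looks like in practice (the counts are made up, and I'm assuming scipy >= 1.7 is available for binomtest):

# Sign test for a head-to-head preference ("Pepsi challenge") comparison.
from scipy.stats import binomtest

prefer_a = 14   # hypothetical count preferring beer A
n_tasters = 20  # hypothetical total tasters expressing a preference

# Under the null that tasters pick at random, prefer_a ~ Binomial(n, 1/2).
result = binomtest(prefer_a, n_tasters, p=0.5, alternative="two-sided")
print("sign test p-value:", round(result.pvalue, 3))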

4. The process exbeeriments I find more useful....
But mostly, if you look at all of them, most of the time the results show no significant difference (in tasters' ability to detect a difference).

I'm working on a white paper right now for how to combine multiple panels to look across similar experiments. My tentative conclusion thus far is that overall there seem to be indications of small, but discernible, differences when pooling. Such a conclusion doesn't invalidate what you are saying, but does add a nuance to it.

Every exbeeriment is very situation- and recipe-specific. Would it work the same with a different yeast? Different ingredients? Different mash temps, fermentation temps, kegging or bottling procedure?

Very true. The added difficulty is that the exbeeriments are based on experimenting on *people*, not experimenting on *beers*. An experiment on beers would take many batches and randomly assign a portion to get, say, high fermentation temperature and the remainder to get low fermentation temperature. Instead, we take two beers and ask many people to review them.

This is not a critique of the designs as experimenting on beers would be much more difficult. But we should be aware of it when trying to generalize the results to other situations, recipes, procedures, etc.

And bravo to Marshall and the brulosophy people; someone is testing, testing, testing, and in the end nothing but good can ever come from that.
+1000
 
I'm a proponent of doing one's own research and drawing one's own conclusions, and of the idea that if something works, you should keep doing it.

The difficult part is actually knowing if something works, which is tougher than it sounds. Say I claim A leads to B because I did A and observed B. This implies a counterfactual world in which I did not do A and consequently did not observe B. Getting a handle on such situations requires careful, methodical research to vary A and observe B. Since few things are quite as clear cut as "A always leads to B", this further raises the hurdle.

Basically, their research is valuable as something for me to emulate or build upon in my own experiments, not something to parrot or blindly follow.

If you actually run experiments, points to you. It's something I don't do often enough. Until I do more, I'll remain at least as skeptical of my own experience as experiments with more rigorous procedures.
 
It has become my favorite beer blog/site on the web. I still like the community here, but hands down they are posting interesting content. I absolutely agree with the idea that any attempt to run a controlled experiment is better than anecdotal experience. The plural of anecdote is not data.

But I believe there is room for improvement. I follow the logic of the triangle test and its ability to provide a statistical result from a relatively small sample size. But I question the method. I've tried triangle tasting myself, and it is not at all surprising to me that the testers have difficulty detecting differences: small pours in opaque cups, no prepping the audience on the experimental parameter or even the style of the beer.

I read about these taste tests being conducted at homebrew club meetings, conferences, breweries, and brewpubs, and I question the calibration of the testers. I believe if I were tasting a commercial macro beer on a daily basis I could get pretty good at pinpointing minor differences: this sample tastes like our beer is supposed to taste, that sample doesn't... aha, they must be different. But give me three small opaque cups of Vienna lager after I've had an IPA or two and ask me which of these is different, and I'd be challenged to tell them apart. Give me a clue, tell me one was intentionally oxidized, and see if I can pick out that oxidation flavor. Sure, that's not the way Bud does it, but Bud tasting panels are probably not drinking Pliny 20 minutes before their panel either. They might also be lost if asked to taste a panel of RIS instead of light American lager today.
 
IMHO the biggest shortcoming is that you might have tasting skills well above those of the tasting panel. Thus many of the results could be worthless to you. (or conversely, you may have poor taste too...)

Perhaps I'm hypercritical, but I would be more interested in how experts perform. OTOH, I'm not certain that panels made up of 100% beer judges would do better. Who knows.
... I take the brulosophy stuff as working hypotheses to be tested against my own homebrewing setup and recipes.
+1. I would never extrapolate from their results, esp. anything related to yeast/fermentation.
 
I think of the site as MythBusters. They busted a lot of myths, but many of their busted myths were less than scientific. Same with Brulosophy. Some of it is preference in taste, as noted above, but it also busts a lot of myths that have been around a while and that are probably true in a large-scale brewery, such as hot side aeration, since the average homebrewer is not piping wort through miles of piping like Coors would, or because we now have a better understanding of the brewing process than homebrewers did in the 1980s.
 
In the spirit of scientific investigation, some constructive criticism:


No doubt true. But, I think this fails to represent how much better even a simple experiment is than anecdotal information. Also, most experiments are not replicated. Such is the economy of science. Saying that the exbeeriments are not frequently replicated puts them in "good company."

Simple is better if it's done well. Anecdotal isn't necessarily worse, it's just information that has not been gathered systematically. But if you hear 15 anecdotal evaluations of a local beer and they are consistent in what's perceived, that carries weight in and of itself.


Please be more specific.

I'm going to add this to Eric's comment below.


I heartily agree. I think part of this is that all the work so far has been based on the triangle test. Triangle tests are great for situations where you want to make a process change or use a cheaper ingredient and keep a product the same. If you want to improve a product, it is far less useful.

In the ingredient exbeeriments I'm always at a loss to determine what I should do--unless the results are overwhelming in one direction. The exception, it seems to me, is if preferences are split evenly, then the cheaper option probably makes sense. If I were a brewery and found I could reduce my ingredient costs w/o reducing perceived or real quality, well, that's a no-brainer. So sometimes a "no difference" result might be valuable, to the right person. Just not to me. :)

Consider, as an alternative, giving two cups and asking tasters to pick a preference (Pepsi challenge style). Under the hypothesis that subjects are randomly picking which cups they like, the number that prefer A is binomial with p = 1/2 (i.e., a sign test). In essence a very similar test, but one that answers preference questions rather than sameness questions.

They do usually ask preference at the end, among those who could distinguish the different beer. Part of that process is what is methodologically muddier. If a person picks the odd one out just by guessing, then should they be included? A recent exbeeriment noted that one of the "successful" tasters admitted the beers tasted the same to them--despite having guessed correctly. But if people are guessing correctly (and that's the nature of the statistical test), should they also be included later, or excluded as tasters who just got lucky? Why would I want information from someone who just got lucky? Not sure I would.

I'm working on a white paper right now for how to combine multiple panels to look across similar experiments. My tentative conclusion thus far is that overall there seem to be indications of small, but discernible, differences when pooling. Such a conclusion doesn't invalidate what you are saying, but does add a nuance to it.

A meta-analysis? There actually has been an attempt to combine panels on the brulosophy site, and attempts to see if judges are better at distinguishing differences (IIRC, they aren't). The goal was to see if any of these criticisms held water in the data. Good on them for looking at it.

Very true. The added difficulty is that the exbeeriments are based on experimenting on *people*, not experimenting on *beers*. An experiment on beers would take many batches and randomly assign a portion to get, say, high fermentation temperature and the remainder to get low fermentation temperature. Instead, we take two beers and ask many people to review them.

In the end, the people are the measuring instrument trying to determine if the beers are different, at least within their sensory abilities. Typically in an experiment we'd take a sample of people, split them into two groups randomly, apply a treatment to one while the other was a control. Then we'd see what happened.

Here, the tasters are the measuring instrument trying to discern if there's a difference between the beers as a result of process or ingredient differences.

This is not a critique of the designs as experimenting on beers would be much more difficult. But we should be aware of it when trying to generalize the results to other situations, recipes, procedures, etc.

There is so much that is recipe and process dependent; in the end, we're back to trying what makes sense to us and seeing what happens.

It has become my favorite beer blog/site on the web. I still like the community here, but hands down they are posting interesting content. I absolutely agree with the idea that any attempt to run a controlled experiment is better than anecdotal experience. The plural of anecdote is not data.

But I believe there is room for improvement. I follow the logic of the triangle test and its ability to provide a statistical result from a relatively small sample size. But I question the method. I've tried triangle tasting myself, and it is not at all surprising to me that the testers have difficulty detecting differences: small pours in opaque cups, no prepping the audience on the experimental parameter or even the style of the beer.

I read about these taste tests being conducted at homebrew club meetings, conferences, breweries, and brewpubs, and I question the calibration of the testers. I believe if I were tasting a commercial macro beer on a daily basis I could get pretty good at pinpointing minor differences: this sample tastes like our beer is supposed to taste, that sample doesn't... aha, they must be different. But give me three small opaque cups of Vienna lager after I've had an IPA or two and ask me which of these is different, and I'd be challenged to tell them apart. Give me a clue, tell me one was intentionally oxidized, and see if I can pick out that oxidation flavor. Sure, that's not the way Bud does it, but Bud tasting panels are probably not drinking Pliny 20 minutes before their panel either. They might also be lost if asked to taste a panel of RIS instead of light American lager today.

Markstache asked me to elaborate, and this is one of the methodological issues. The measuring instrument assessing the beers is....unclear. Does everyone have clean palates? Are they three sheets to the wind? Just had a glass of burn-the-enamel-from-your-teeth 184-ibu hopmurder beer?

We don't know. Another issue is who the testers represent. Are they a cross-section of all beer drinkers? Are there regional variations in tastes such that a different panel in a different place would produce different results?

To what are the results generalizable? Maybe the people perceiving differences are super-tasters and I would never notice a difference.

According to the Hops book, some people can't perceive some tastes in hops. I'm not a super-taster by any means; I recall having an apricot beer that a friend of mine insisted had apricot extract used instead of real fruit. Maybe so--I couldn't taste the apricot in the first place, and neither could another person to whom I offered the beer. And yet, I don't recall what beer I had prior that may have screwed up my taste buds.

*******************

There's one other thing I've been cogitating on. Imagine 5 process changes one might make in brewing beer. Each one, by itself, produces a small improvement over not doing it--but one that's just below the threshold where one can discern it.

When we do exbeeriments we allow one variable to vary; what if there's an additive effect, such that no one process change will produce a result we can discern, but if we did all 5, it would be a WOWEE! result.

This has actually been the basis of my doing a lot of small things that are supposed to help with the quality of the brew. I see it as a continuous-quality-improvement approach: the additive effect of many small things will--I'm hoping--produce a powerful result.
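
To make the idea concrete, here's a toy signal-detection sketch. Every number in it is hypothetical, picked only to show how five sub-threshold nudges can add up to an obvious one:

# Toy model: each tweak shifts perceived quality by `effect_per_tweak`;
# a taster notices only if the perceived shift clears `threshold`, and
# perception is blurred by Gaussian noise with standard deviation `sigma`.
from scipy.stats import norm

effect_per_tweak = 0.4  # below the detection threshold on its own
threshold = 1.0         # just-noticeable difference
sigma = 0.3             # sensory noise

for n_tweaks in (1, 5):
    total_effect = n_tweaks * effect_per_tweak
    p_detect = norm.sf(threshold, loc=total_effect, scale=sigma)
    print(n_tweaks, "tweak(s): detection probability ~", round(p_detect, 3))

With these made-up numbers, one tweak gets detected about 2% of the time while all five together get detected essentially always--the WOWEE scenario.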
 
A meta-analysis?
Yes. Specifically, a closed testing procedure that allows one to get safe p-values no matter how many different groups of hypotheses one combines. E.g., we might want to pool the fermentation temp experiments to see if there is evidence against the null of no difference in all panels and also look at the individual panels. I'll try to remember to update this thread when I get it finished.
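
In the meantime, for anyone curious what pooling can look like, here's a minimal sketch of Fisher's method, one standard way to combine p-values from independent panels. To be clear, this is not the closed testing procedure described above, and the panel p-values are invented:

# Fisher's method: under the null (no difference in any panel),
# -2 * sum(log p_i) follows a chi-square distribution with 2k df.
import math
from scipy.stats import chi2

panel_pvalues = [0.09, 0.21, 0.04, 0.30]  # hypothetical panels

stat = -2 * sum(math.log(p) for p in panel_pvalues)
pooled_p = chi2.sf(stat, df=2 * len(panel_pvalues))
print("Fisher statistic:", round(stat, 2), "pooled p-value:", round(pooled_p, 4))

Note how four individually borderline panels can pool into a clearly significant result; that is the sense in which small but real differences may emerge when looking across experiments.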

In the end, the people are the measuring instrument trying to determine if the beers are different, at least within their sensory abilities. Typically in an experiment we'd take a sample of people, split them into two groups randomly, apply a treatment to one while the other was a control. Then we'd see what happened.

Here, the tasters are the measuring instrument trying to discern if there's a difference between the beers as a result of process or ingredient differences.
The point I am trying to make is that we are experimenting with beers on people (beer is the manipulation, people are the subjects). I'm using the word "experiment" here in the sense of "generating data," not so much a randomized trial. We can see this in the "n" of the study: the stats are based on the number of subjects, not the 2 beers employed. In fact, this is precisely the difficulty in generalizing to other beers. With two beers, they could be different due to the manipulation (say ferm temp), but they could also be different because one carboy happened to be dirty. As we increased the number of beers, random assignment of manipulation would "average out" such confounding sources. With only 2 beers, we simply can't do that.

As an aside, while it is not frequently done in the tasting panels I've seen, in my opinion the best justification of use of the binomial distribution for the triangle test is when each set of 3 cups is randomly assigned (independent, uniform) to one of the conditions "AAB", "ABA", "BAA", "BBA", "BAB", or "ABB" (for beers A and B). Under the hypothesis that the beers are identical, then each person would have picked the same numbered cup, regardless of serving order, and the correct identifications follow a binomial distribution. In so far as we have a randomized trial, it is on cups. I see the "guessing at random" justification as being an approximation to this ideal.
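
Here's a quick simulation sketch of that null, with randomized serving orders and tasters guessing blindly (panel size and simulation count are arbitrary):

# If the beers are identical, a taster's pick is right with probability 1/3,
# so the number of correct calls follows Binomial(n_tasters, 1/3).
import random

random.seed(42)
orders = ["AAB", "ABA", "BAA", "BBA", "BAB", "ABB"]
n_tasters, n_sims = 20, 10_000

total_correct = 0
for _ in range(n_sims):
    for _ in range(n_tasters):
        served = random.choice(orders)  # randomized cup order
        odd_letter = min(set(served), key=served.count)  # letter appearing once
        if random.randrange(3) == served.index(odd_letter):
            total_correct += 1

print("mean correct per panel under the null:",
      total_correct / n_sims)  # ~ 20/3 = 6.67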

Markstache asked me to elaborate, and this is one of the methodological issues. The measuring instrument assessing the beers is....unclear.

This is a classic issue in the behavioral and social sciences: as we make experiments more highly controlled and precise, we make them less and less like the real world (internal vs. external validity, to use the lingo). There is probably no solution that makes everyone happy. I tend to favor the "let them drink beer" perspective: let anyone in the panel, under any circumstances. I care about how my beer will be received by a general audience, and making the generalization to this group seems easiest with a wide variety of people on the panel. For those who are mostly concerned with BJCP type events, I could see the emphasis on the tasters being judges and taking the panel in a highly controlled environment.

There's one other thing I've been cogitating on....

I don't know if there is much to say, except to call for more data collection! Perhaps it is also useful to recall that failing to reject the null is not absolute proof that the null is true. At a minimum, it suggests that there is a reasonable case to be made for either procedure, so if you continue to believe one improves your beer, carry on.
 
I've found their articles interesting and have actually based equipment purchases off their reviews. Recently I have been trying to discover things for myself rather than let their results dictate how I brew. I see potential for error in some of their experiments that could sway results. And who can blame them? It's hard to read through texts, have the equipment and resources, and have the time to do all this. I think we can look back at experiments (even HSA) and see potential flaws that weren't considered before. We have to remember that brewing in our garage with a turkey fryer and water cooler should not override well-respected researchers. We could, however, interpret some of these experiments as non-issues at a homebrew level because we simply don't have the equipment or resources to control the variable in question. We've seen their results in our own garages, but I guess my question is whether or not we would think differently with the right equipment and/or right process? That is what a homebrewer should research and decide for themselves: whether it's feasible and worthwhile to do so.
 
We have to remember that brewing in our garage with a turkey fryer and water cooler should not override well-respected researchers.

From an admittedly brief reading of brewing texts, I've found many of the conclusions are based on studies that targeted things other than taste perception, such as dissolved oxygen, yeast cell counts, amounts of volatile organic compounds, etc. The implication being that if process A lowers DO or VOCs, the resulting beer will be better. Perhaps. But perhaps it will be below flavor thresholds.
 
This is the basis of my "cogitating" above. Maybe one thing produces a result that is positive but below the taste threshold; but what if there's an additive, cumulative effect?

Reminds me of getting high response rates in survey research; a high response rate is the sum of a lot of little things done together. One technique might increase RR a percent; another might get you 2 percent. But do enough of them and they start to add up.
 
1. They're all one-shot exbeeriments, though there has been some effort at replication. Ultimately the results are based on specific panels of tasters whose generalizability is not always clear. So I, as a scientist, must take the results as tentative.

**************

3. I find the ingredient exbeeriments not that useful; people like what they like. Look at the exbeeriment that compared Maris Otter with standard 2-row. A significant number of tasters could discern a difference, but in the end, they were split exactly 50-50 as to preference. Nothing there that would help anyone except to try both and decide for themselves.

**************

The replication problem is my big issue. I do sometimes consider the above #3.

What effort at replication has there been?

Also, I loathe the intentional misspelling of 'experiment'... it just seems tacky in the way that Weird Al sings about.

That said, I do read a lot of their stuff. The reason I bought the aeration system that I did is because of their experiments.
 
From an admittedly brief reading of brewing texts, I've found many of the conclusions are based on studies that targeted things other than taste perception, such as dissolved oxygen, yeast cell counts, amounts of volatile organic compounds, etc. The implication being that if process A lowers DO or VOCs, the resulting beer will be better. Perhaps. But perhaps it will be below flavor thresholds.

True, we all need to decide for ourselves what we prefer. It may not be feasible or worthwhile to chase a specific result. I would not write off anything based on these experiments by themselves, though. They are a good read and food for thought.
 
The replication problem is my big issue. I do sometimes consider the above #3.

What effort at replication has there been?

They have a few "exbeeriments" where they're looking at the same variable again. Yeast pitch temperature, ferm temperature. They're not strict replications, but they ultimately are testing the conclusions from the earlier trials.


Also, I loathe the intentional misspelling of 'experiment'... it just seems tacky in the way that Weird Al sings about.

You might think I used that phrase above intentionally to poke you :) but I have another reason. By using the phrase "exbeeriment" I'm not trying to anoint the process with more rigor than it might deserve. So when I use that term you know exactly the source of the information--brulosophy.

Reminds me of people on the Cast Boolits site--the word "boolit" is used by some to denote home-cast bullets, as opposed to commercial ones.
 
You freaking geeks crack me up! Especially the dude complaining about the name of the exBEERiments! They aren't presenting this as a great white paper or a new dogma of brewing. It's something to think about and if one of the exBEERiments hits home with you then think some more and maybe do one of your own.

Like I said, they are essentially showing this is a pretty forgiving hobby and sometimes we tend to stress over 1 degree here and .2 ph there when we need to relax and have a home brew.
 
I absolutely agree. The exBEERiments do serve a useful purpose; however, when you're dealing with something as variable as people's individual taste, your "data" will also vary.

I had been using US-05 for most of my brewing for several years when I began to notice a strange taste and even SMELL from that yeast. Then I began to notice a similar nastiness in Nottingham!

There are certainly enough people out there who would tell me they're very clean neutral yeasts and I'm full of it, but it's very real to me.

I still wanted the ease of dry yeast, so after the Brulosopher exBEERiments with W-34/70 at various temps, I decided to try it for myself. I use it for just about all my beers now. In fact, I took a first place with a blonde "ale" fermented with 34/70 in a recent competition.
 
There's one other thing I've been cogitating on. Imagine 5 process changes one might make in brewing beer. Each one, by itself, produces a small improvement over not doing it--but one that's just below the threshold where one can discern it.

When we do exbeeriments we allow one variable to vary; what if there's an additive effect, such that no one process change will produce a result we can discern, but if we did all 5, it would be a WOWEE! result.

This has actually been the basis of my doing a lot of small things that are supposed to help with the quality of the brew. I see it as a continuous-quality-improvement approach: the additive effect of many small things will--I'm hoping--produce a powerful result.

Isn't this the basis of the short & shoddy series of experiments? At least in that one they showed the instrument was able to distinguish a beer made with a starter + 60 min mash and boil + active fermentation temperature control from a beer made with one smack pack + 30 min mash and boil + ambient-temperature fermentation.
 
Two months ago I was lamenting to Stan Hieronymus that so many blindly follow experiment results without testing or experiencing them themselves. I believe it was after another "Belgian Beer: You're probably doing it wrong" article came out, and it was also a conversation about the widely varying ability to discern certain characters.

Stan reminded me of one of my favorite anecdotes, from his blog in 2010:
"A few years ago Martin Krottenthaler, a professor at the Weihenstephan
brewing university north of Munich, talked about research comparing
decoction mashing and less-time consuming infusion mashing. He flipped
through PowerPoint slides, explaining why lesser malts once made
decoction necessary. “Boiling is boiling,” he said, showing benchmarks
that the chemists recorded were different throughout the two processes
but the resulting worts produced almost identical profiles.

Then he introduced the human element. A tasting panel basically
confirmed the results, because few of its members could tell the
difference — but Krottenthaler was one of those who could pick out the
beers produced using decoction. “For me it was significant,” he said. "

*shrug* ...I dunno, just seemed relevant here.
cheers--
--Michael
 
Isn't this the basis of the short & shoddy series of experiments? At least in that one they showed the instrument was able to distinguish a beer made with a starter + 60 min mash and boil + active fermentation temperature control from a beer made with one smack pack + 30 min mash and boil + ambient-temperature fermentation.

We have to look at what the results indicate, not just that people could discern a difference. And yes, to some extent this was an exbeeriment in cumulative effects but the results leave me less than convinced of....well, what it all means.

Here's a place where I tend to lose my way with the exbeeriment results. The findings of that exbeeriment indicated this:

"...a total of 13 (p=0.012) identified the odd-beer-out, suggesting participants were indeed able to reliably distinguish a beer made with traditional methods from one made with an abbreviated mash and boil that was under-pitched and fermented warm.

The 13 participants who correctly selected the unique sample in the triangle test were instructed to complete a brief set of additional questions comparing only the two different beers, still blind to the nature of the xBmt. Likely aligning with the expectations of many, 6 tasters chose the traditional beer as their most preferred while only 2 reported liking the short & shoddy beer more. Equally as interesting is the fact 4 tasters reported having no preference despite perceiving a difference between the beers. Only 1 lone participant said they experienced no difference between the beers."


Some issues I have with this: First is the last sentence--one lone participant said they experienced no difference between the beers. If so, what was she/he doing in the preferences panel? It would appear that this person just guessed, got it right, and now is in the tasting panel--and yet, they can't tell a difference. I know that the statistical test is determining whether a greater-than-chance number of people are picking the correct sample, but how many "correct" ones are really just guesses? Seems to me that if you have to guess, you can't tell them apart, which means....shouldn't you be recorded as a wrong guess?
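
For what it's worth, sensory work has a standard back-of-the-envelope answer to "how many correct picks are just guesses": assume some number d of true discriminators always pick correctly while the other n - d guess at 1/3. The excerpt doesn't state the panel size, so the n below is purely hypothetical:

# Expected correct picks = d + (n - d)/3, so solving for d gives (3k - n)/2.
def estimated_discriminators(k_correct, n_tasters):
    return max(0.0, (3 * k_correct - n_tasters) / 2)

k, n = 13, 24  # 13 correct picks as reported; n = 24 is a made-up panel size
d = estimated_discriminators(k, n)
print("estimated true discriminators:", d)          # 7.5
print("correct picks that look like luck:", k - d)  # 5.5

In other words, under that model a sizable chunk of the "correct" tasters would indeed be lucky guessers, which is exactly the worry.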

Here's another issue: more of the panel liked the traditional beer better than the short-and-shoddy beer, but it was a 6-to-2 difference. What we don't know is how much better they liked it. Did they not like the other? Are the differences only minor? Would any of them have said "I'll drink this one, but this other one is swill?" Heck, four of the panel said they could perceive a difference but didn't have a preference.

Suppose the beer was a Belgian Dubbel. I don't care for Belgians. Something about the flavor I just don't care for. If I were able to distinguish two such beers in a triangle test, and then asked for my preference, I'd have to judge them this way: which tastes less bad to me? I wouldn't like either. And maybe I would dislike the flavor so much either way that I couldn't pick a "favorite."

My point is what if people are judging a style they don't like? To say they "prefer" one over the other....well, they don't prefer either. There's just a less...disgusting?....flavor in the one beer as compared to the other.

And what about those two who chose the short-and-shoddy beer? Perhaps they are poor homebrewers themselves, always using short-and-shoddy methods, and they picked out the beer that had the flavors--off-flavors, really--that they recognized from their own beer.

****************

In all this, again, the intent here is not to knock Brulosophy and Marshall for what they're doing. It can be easy to snipe at someone from behind the safety of the armor of having never done anything others can critique. Some do that. Even, perhaps, in this thread.

No research is perfect, and in the world where we involved human beings it's often tremendously hard to do good research. At least the Brulosophy people are trying some stuff, doing their best to adhere to a research protocol that will eliminate many if not most alternative explanations for their results, having others evaluate the resulting beer, and reporting on it. Bully for them!
 
You are certainly entitled to your opinions, but I think you're kind of losing your footing here.

As I understand it, the exbeeriments are set up to ask a single question: can a panel of "random" tasters tell apart two beers that have (as best they can manage) a single variable changed? The short-and-shoddy ones are a bit different, but the premise is the same. ALL that matters is: was there a detectable difference? To me, personal preferences are something extra on the side, but not really that important, for precisely the reason you pointed out. Everyone likes what they like. It would be egregious to throw out data points post-analysis, as the statistical tests account for randomness.
 
Some issues I have with this: First is the last sentence--one lone participant said they experienced no difference between the beers. If so, what was she/he doing in the preferences panel? It would appear that this person just guessed, got it right, and now is in the tasting panel--and yet, they can't tell a difference. I know that the statistical test is determining whether a greater-than-chance number of people are picking the correct sample, but how many "correct" ones are really just guesses? Seems to me that if you have to guess, you can't tell them apart, which means....shouldn't you be recorded as a wrong guess?

The problem is that of the people who got the correct answer, you can't actually tell who guessed; who fooled themselves into believing they tasted a difference, but actually guessed (e.g. post-hoc justification); who can't accurately compare beers, so that they tasted differences even between the beers that were the same (noise) but guessed correctly; who actually could taste a difference and who actually did taste a difference, but was unsure and so thought they guessed (lack of trust in senses).

The robust solution is to treat all correct tasters the same.
 
You are certainly entitled to your opinions, but I think you're kind of losing your footing here.

As I understand it, the exbeeriments are set up to ask a single question: can a panel of "random" tasters tell apart two beers that have (as best they can manage) a single variable changed? The short-and-shoddy ones are a bit different, but the premise is the same. ALL that matters is: was there a detectable difference? To me, personal preferences are something extra on the side, but not really that important, for precisely the reason you pointed out. Everyone likes what they like. It would be egregious to throw out data points post-analysis, as the statistical tests account for randomness.

I appreciate the civility with which you're disagreeing with me. But I would suggest that you're missing the point I tried to make, which might simply be the result of a poorly-written explanation.

Suppose tasters can tell a difference--so what? The crucial point for me is "what's the actionable intelligence that result produces?" If it doesn't result in a process or ingredient that allows me to produce better beer, what have I learned?

This is why, at one level, I tend to ignore ingredient exbeeriments (people will like what they like, which I might not like) and focus more on the process exbeeriments. If such a process exbeeriment shows no significant result, suggesting the process variable isn't producing a result most people can discern, then I tend to pay more attention.

Even the short-and-shoddy exbeeriment doesn't give me a lot of confidence in the results. The difference is only four tasters (6-2) and four others didn't have a preference. Pretty slim evidence to me. It may be more vital to you, I don't know.

This is why, unless the results are overwhelmingly one-sided for something that came in as significant, I tend to focus more on the results that say "no difference." And yes, I know I have no way to quantify Type II error when I say that.

******************

As long as I'm at it, here's another area I tend to have issues with.

There's an element of triangle testing that to me doesn't truly evaluate preference. I'd like to know whether people could differentiate between the samples more than once. The panels are said to be able to reliably distinguish between the beers, but that statement really can't be made.

Reliability means a measure is consistent and repeatable. We don't know that. If you gave me the same triangle test for 6 days straight, would I be able to pick out the odd-one-out each time? If I could, my results would be reliable.

But what if I had to guess? By chance I'd expect to get 2 correct.

Now, imagine I'm part of one of these panels. Am I just getting lucky in guessing? And if so, is that reliability? No, it's not. It's just luck.
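
The arithmetic of that 6-day thought experiment, assuming nothing but guessing:

# Correct picks across 6 independent triangle tests, guessing at 1/3 each time.
from math import comb

n, p = 6, 1 / 3
for k in range(n + 1):
    prob = comb(n, k) * p**k * (1 - p) ** (n - k)
    print(f"P({k} correct of {n} by luck) = {prob:.3f}")
# Expected correct = n * p = 2; going 6 for 6 by luck alone
# happens only (1/3)**6, about 0.14% of the time.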

So when I see results from the short-and-shoddy exbeeriment, I see this:

6 preferred the traditional
2 preferred the short-and-shoddy
4 had no preference
1 couldn't tell a difference.

The evidence here is thin, very thin.
 
By the way, my take-away from the Brulosophy experiments as a whole is that pretty often in brewing "good enough" is good enough. Particularly with mash parameters, fermentation parameters, aeration (hot and cold) and sanitation.

The reason for keeping good practices isn't that if you take one variable just out of bounds then the beer will be ruined, but that if enough variables start to run up against the boundaries, then things can go wrong fast.

Of course, some beers are more sensitive to certain parameters than others.
 
The problem is that of the people who got the correct answer, you can't actually tell who guessed; who fooled themselves into believing they tasted a difference, but actually guessed (e.g. post-hoc justification); who can't accurately compare beers, so that they tasted differences even between the beers that were the same (noise) but guessed correctly; who actually could taste a difference and who actually did taste a difference, but was unsure and so thought they guessed (lack of trust in senses).

The robust solution is to treat all correct tasters the same.

I'm still on the issue I noted just above for Isomerization. OK, you bring in all correct tasters, including those who couldn't really tell.

And now we're going to ask them for preference, when they couldn't tell a difference in the first place. That's the part where I have difficulty determining the actionable intelligence. In other words, what am I going to do differently now that I have this result, if anything--knowing that the result includes information from tasters who could not tell the difference between the beers in the first place?
 
Good, I'm glad my post didn't come across as a personal attack, as that definitely wasn't the intention.

RE: luck in guessing, that's the reason for including large numbers (within reason; too large a group can produce artificial significance) and then obtaining a p-value. The odds that everyone is guessing (and getting it correct) become very low once the p-value drops below the significance threshold. For example, if two people guess correctly, that's 1/3 * 1/3 = 1/9; 3 people becomes 1/27, etc. This is obviously oversimplified, but that's why so many of the exbeeriments come back as non-significant (imo).

We certainly agree on the ingredient ones (I think most feel that way actually). That's also why I don't understand your issue with this specific exbeeriment. You want personal preference, yet we all know everyone's will be different. The important piece is the total number of people who correctly picked the odd beer out. Going past that is simply anecdotal, maybe they should stop including that part?
 
There's always an issue of asking for a subjective opinion like preference in these kinds of things because taste and expectation play such a big role. Some responses will be dominated by taste, some by expectation for the style, some by comparison to a favored commercial example. I don't think having guessers involved is as big a problem as the subjectivity, to be honest, and even from the non-guessers there's a good chance the preference is actually "both are good, just different".


OTOH, if you were trying to clone a particular beer, and asking how close you got, or you were trying to fit a BJCP style, then the subjectivity is probably lower, and having the guessers involved does actually make the results worse.

A related fun game would be to ask several panels for a triangle test and a preference, but tell them different things about the beer beforehand, like which style it was supposed to be, or what you had varied.
 
In the end we're applying scientific techniques to something that's largely subjective - flavor.

It's hard to make solid science out of that to begin with, unless you start trying to measure things you can literally measure with instrumentation, like IBUs or something like that. Even then, you're left with NUMBERS, and you don't necessarily know how one person will enjoy or hate that particular number, or even what their threshold is for tasting it.
 
From the Brulosophy article today: "Out of the 12 of 21 blind tasters who were able to distinguish a beer fermented with Saflager W-34/70 at 60˚F/16˚C from the same beer fermented at 82˚F/28˚C, 7 selected the warm ferment beer as their preferred, 2 chose the cool ferment sample, 2 felt there was a difference but had no preference, and 1 thought there was no difference. This doesn’t mean the warm ferment lager was necessarily better, just that of the participants who were correct, a majority liked it more than the cool fermented sample."

I feel like this paragraph helps explain my position. The majority of people enjoyed the warm fermented lager vs. the cold (or "properly") fermented lager. You know what this means to me? These people don't like lagers, lol. What it really means is that the exbeeriment returned a significant result: the beers were different.
 
What the text there doesn't say is whether they were told that the beer was intended to be a lager before they gave their preference. That might have changed the results a lot - particularly if they are people that don't usually go for lager!
 
Here's another one: "A panel of 37 people with varying levels of experience participated in this xBmt. Each blind taster was served 2 samples of the no-boil Berliner Weisse and 1 sample of the boiled Berliner Weisse in differently colored opaque cups then instructed to select the unique sample. At this sample size, 18 tasters (p<0.05) would have had to accurately select the unique sample to achieve statistical significance. Ultimately, 31 tasters (p=0.0000000004) chose the different beer, suggesting participants were able to reliably distinguish the boiled Berliner Weisse from the no-boil sample.

The 31 participants who correctly selected the unique sample in the triangle test were then instructed to complete a brief preference survey comparing only the two different samples, all still blind to the variable. In the end, 15 tasters reported preferring the boiled sample, 13 said they liked the no-boil version better, and 3 people had no preference despite noting a difference between the beers."

Preference was almost 50:50! And it looks like no one (admitted to at least) guessed...
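
As a sanity check on the quoted number (assuming scipy >= 1.7; 1/3 is the triangle-test guessing rate):

# 31 of 37 correct on a triangle test, against guessing at p = 1/3.
from scipy.stats import binomtest

result = binomtest(31, 37, p=1 / 3, alternative="greater")
print(f"p-value: {result.pvalue:.1e}")  # ~4e-10, in line with the article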

What the text there doesn't say is whether they were told that the beer was intended to be a lager before they gave their preference. That might have changed the results a lot - particularly if they are people that don't usually go for lager!

Absolutely, but I bet they still would have found them different :)
 