Do "professional" brewers consider brulosophy to be a load of bs?

https://en.wikipedia.org/wiki/Meta-analysis

Interesting. Based on the Wiki article, it seems that it is okay to combine. Clearly there wasn't any problem in the searching for studies (as this was covered by them all being run by the same group), and there isn't a publication bias problem because we know that Brulosophy publishes both results and non-results. Given that they used the same methodology in each study, that means that I didn't have to correct for multiple results to reach a homogeneous sample--they were already homogeneous.

However, the aggregation of studies is still limited by the quality of the original experiments. If there were flaws in their methodology (such as not doing AAB, ABA, BAA, ABB, BAB, BBA random presentation of samples, etc), that error is not in any way corrected for by the meta-analysis.

One could also accuse me of bias because I already believe that temperature control is important, but I could not use that bias in any way given my selection of studies. I took all 7 of the Brulosophy studies comparing cold vs warm ferment as written, without qualifying them, and the only study I excluded--which was static vs variable temp--actually had a stronger result than the studies I used, so excluding it didn't strengthen my case.

But in general, I feel that the meta-analysis in this case has bolstered the contention that fermentation temp has a statistically significant effect on beer.
 
Because you will pass every single time you do the test. That's why I stopped using it. I have never done a diastatic power calculation and never had a mash that has failed to convert completely. What's the point of doing a test that tells me I'm completely converted when I already know that?

I guess I was thinking of doing it at 20-30 minutes and sparging from there if the conversion is done. That is one way to save a few mins on brew day :ban:
 

Five different yeasts. That's right, five. Three different people testing it in different states. 8 experiments in total, 6 unable to show significance as tested. One of them brought to the National Homebrew Convention and tasted by everybody, including famous people in homebrewing. Tons of anecdotal and qualitative data. Meaningless preference data oftentimes showing preference for the warm-fermented anyway. And yet you have added everything together so you can prove something. I think you've done a great job of showing how real information can be skewed in any way anybody wants. I don't get what you are holding onto so much that you feel the need to add all these negative results together to reach a positive. Look, I didn't care if any of this was true from the get-go, so I guess we just had opposite beginnings. It seems way too upsetting to me. If someone told me pasta could be made in cold water, it wouldn't make me all angry either. Beer can be mashed in cold water overnight, it turns out. I think it's cool, not something to disprove. Brulosophy is cool and interesting to me, I try not to attach any personal value to the findings.
 
I guess I was thinking of doing it at 20-30 minutes and sparging from there if the conversion is done. That is one way to save a few mins on brew day :ban:


A better test than iodine is to check the actual gravity of the mash.

The mash gravity is quite predictable for a known quantity of grain and water.

http://www.braukaiser.com/wiki/index.php?title=Understanding_Efficiency#Conversion_efficiency

I have done measurements recently as the mash proceeded and you clearly see the gravity changing rapidly early on and then plateauing and slowly creeping to 100% conversion after about 2-3 hours.

Things do still continue to happen after 20-30 minutes of mashing.
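If anyone wants to play with the arithmetic behind that, here's a minimal sketch of the kind of conversion-efficiency check the Braukaiser page describes. The 80% extract potential and the simple Plato bookkeeping are my own illustrative assumptions, not numbers from that page, so treat it as a rough guide only.

```python
# Rough conversion-efficiency check from a mash gravity reading.
# Assumptions (mine, for illustration): grain has ~80% of its dry weight
# available as extract, and 1 L of mash water weighs ~1 kg.

def max_mash_plato(grain_kg, water_liters, extract_potential=0.80):
    """Highest possible degrees Plato if 100% of the potential extract dissolves."""
    extract_kg = grain_kg * extract_potential
    return 100.0 * extract_kg / (water_liters + extract_kg)

def conversion_efficiency(measured_plato, grain_kg, water_liters,
                          extract_potential=0.80):
    """Fraction of the potential extract actually in solution at the time of the reading."""
    # Solve P = 100 * extract / (water + extract) for the dissolved extract.
    extract_kg = water_liters * measured_plato / (100.0 - measured_plato)
    return extract_kg / (grain_kg * extract_potential)

# Example: 5 kg of grain in 15 L of mash water, refractometer reads 17 degrees Plato.
print(round(max_mash_plato(5, 15), 1))               # ~21.1 P possible at full conversion
print(round(conversion_efficiency(17.0, 5, 15), 2))  # ~0.77, i.e. about 77% converted so far
```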
 
Five different yeasts. That's right, five. Three different people testing it in different states. 8 experiments in total, 6 unable to show significance as tested. One of them brought to the National Homebrew Convention and tasted by everybody, including famous people in homebrewing. Tons of anecdotal and qualitative data. Meaningless preference data oftentimes showing preference for the warm-fermented anyway. And yet you have added everything together so you can prove something. I think you've done a great job of showing how real information can be skewed in any way anybody wants. You are a smart guy; I don't get what you are holding onto so much that you feel the need to add all these negative results together to reach a positive. Look, I didn't care if any of this was true from the get-go, so I guess we just had opposite beginnings. It seems way too upsetting to me. If someone told me pasta could be made in cold water, it wouldn't make me all angry either. Beer can be mashed in cold water overnight, it turns out. I think it's cool, not something to disprove. Brulosophy is cool and interesting to me, I try not to attach any personal value to the findings.

Sometimes I just like to geek out on numbers... I'm an engineer, after all ;)

This one was particularly interesting to me, though, as I've always thought that fermentation temp control was a significant step in improving my beer. I even thought that years ago when I did a double-batch with my former brewing partner, and we fermented one in a temp-controlled fridge but had to put the other one in his spare bathroom un-controlled. The un-controlled batch seemed "hot", i.e. with higher-alcohol flavors.

But for me, something just seemed "off" regarding these experiments. While the individual experiments didn't achieve significance, I was noticing that the error was always in the "correct" direction. From a statistics standpoint, that told me we probably weren't dealing with blind chance. It suggested that beer was not completely indifferent to fermentation temperatures, but that perhaps the effect was too small to be seen based on the size of the tasting panels.

That's why I looked at the meta-analysis, and based upon what I can see, my tactic was NOT statistically unsound. Meta-analysis is used for just this purpose--to find effects that may be too small to be significant in individual experiments, but taken in the aggregate are meaningful.

FYI there's now a 9th ferm temp experiment, and this one actually hit p=0.002... Interestingly this one was taking WLP300 and fermenting the batches at either 60 or 72, both within range.

So with this, we're now at 91 of 208 (43.75%) correct identification of the odd beer in the triangle, against an expected null hypothesis of 33%. This corresponds to p=0.001. Assuming meta-analysis is a valid technique, these experiments are absolutely demonstrating IMHO that fermentation temperature affects the finished product.
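If anyone wants to check that number themselves, here's a minimal sketch of the pooled calculation in plain Python (an exact one-sided binomial tail; the 91-of-208 figures are just the counts quoted above):

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more correct picks by luck."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Pooled triangle-test results: 91 correct identifications out of 208 tasters,
# with a 1-in-3 chance of guessing right under the null hypothesis.
print(round(binom_tail(91, 208, 1/3), 5))   # ~0.001, in line with the p-value above
```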

But of course, as you state, that doesn't get to preference. That's obviously an important factor, and the warm-ferment lager was preferred to the cool, which is a surprising finding.
 

And, of course, determining preference isn't the goal of a triangle test...

Thanks for crunching the numbers on all the temp results! Certainly interesting stuff!
 
But in general, I feel that the meta-analysis in this case has bolstered the contention that fermentation temp has a statistically significant effect on beer.

I think you are doubtless right, but even the single-case experiment I looked at (9 out of 21 correct guesses) supports that conclusion. Though those results only support rejection of the hypothesis that fermentation temperature does not make a difference at the 25% significance level, that does not prove the null hypothesis is false. In the long post I showed that we can calculate the probable range of differentiabilities, and these data indicate it is between 0 and 0.35 with 90% confidence. Based on that we are not likely to accept the null hypothesis (Pd = 0). I took it one step further and modified the calculation routine to also calculate the most probable value of Pd given the number of correct answers obtained and the panel size. For 9 out of 21 correct this is Pd_max_likelihood = 14%. That's respectably far from the null hypothesis Pd = 0.

H1(Pd=0.30): Panelists: 21; 3-ary test; 9 Correct Choices; P(< 9) = 0.11886; 0.00 < Pd < 0.35 with conf. 0.90
Most Likely Pd: 0.140
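For anyone following along, here's a minimal sketch of the guessing model behind those numbers (my own back-of-the-envelope Python, not the routine that produced the output above, and it leaves out the confidence-interval part): a taster either genuinely perceives the difference, with probability Pd, or guesses at 1-in-3, so P(correct) = Pd + (1 - Pd)/3 and the maximum likelihood Pd is just the observed proportion mapped back through that relation.

```python
def pd_mle(correct, panelists):
    """Maximum-likelihood differentiability (Pd) from triangle-test counts.

    Model: P(correct) = Pd + (1 - Pd) / 3, i.e. distinguishers always pick the
    odd sample and everyone else guesses with probability 1/3.
    """
    pc_hat = correct / panelists
    return max(0.0, 1.5 * (pc_hat - 1/3))   # clamp at zero if the panel did worse than chance

print(round(pd_mle(9, 21), 2))    # 0.14, matching the single-experiment figure above
print(round(pd_mle(91, 208), 2))  # 0.16 for the pooled data discussed below
```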
 
So with this, we're now at 91 of 208 (43.75%) correct identification of the odd beer in the triangle, against an expected null hypothesis of 33%. This corresponds to p=0.001. Assuming meta-analysis is a valid technique, these experiments are absolutely demonstrating IMHO that fermentation temperature affects the finished product.
Let's look at the combined data and see what it implies:
H0(Pd=0): Panelists: 208; 3-ary test; 91 Correct Choices; P(>= 91) = 0.00112
H1(Pd=0.20): Panelists: 208; 3-ary test; 91 Correct Choices; P(< 91) = 0.18079; 0.09 < Pd < 0.22 with conf. 0.90
Most Likely Pd: 0.160

The larger sample size gets us to a confidence of 0.0011 that we can toss the null hypothesis. Fiat! It also tightens our 90% confidence band for differentiability (and gets 0 out of it) and gives us a maximum likelihood estimate of 16% for the differentiability as opposed to 14%. That's not really very different. This consistency suggests that the various experimental results are indeed capable of being combined. But what did it buy us? A substantial improvement in support for rejecting the null hypothesis. But we already were pretty sure we should do that. Or, put another way, an alpha bigger than we might like, based on the assumption that it always needs to be < 0.05, isn't the whole story.
 
Excellent response. So what did you take from that xbmt? I took a lot. It confirmed what I had been thinking for a while, that ale, hef, some yeasts are more reactive than others. I certainly never meant or intended to be pigeonholed into any yeast strain never showing a difference. Just that it didn't always matter in the grand scheme of things.

IMO the assumption is in defense of the purchased equipment. Quickly the assumption seems made that there was a difference, so yay, cold was better, I am right, this is how good beer is made. But putting oneself in not-caring shoes, one can see that 7 preferred the warm and 8 the cold, and 4 had no preference. Furthermore the brewer claims taking a 3rd place medal with a 75/25 mix of the warm and cold. So should I go buy a fridge and controller? If so, why? Is anyone really surprised that a hef yeast showed a reaction to temp? Which one is better? The cold, right, because that is the common thought and the equipment defense. Surely, this has to make sense to someone.


So they noticed a difference, now what? If you like hefs, it says to me that flavor could be manipulated by temp and that warm and cold would both be worth trying. It says to me that 72 would be OK for a hef, as preference seems split anyway and he did well with the 3rd place. Nothing in this speaks to dogmatism about ferment temp in the grand scheme of things. Both warm and cold will make good beer, and the joy of not caring or of not needing extra equipment gives the warmer an edge to me.
 
Some day ABI, Miller, Heineken, et al. might stumble onto the repository of science that is Brulosophy and discover they can ferment their lagers warm with no consequence. Think of the implications to their bottom line from the increase in production and the decrease in residence time. They must not be spending their research money wisely.
 
No doubt, Bilsch. They can't though, can they, because of the past? It's my understanding they spend more now than ever to make their beer. They have to spend more on rice now than ever due to prices, can't remember the podcast source.
 
Since I'm advocating ASTM E1885 as gospel, let me quote a bit from ¶7.2

Thus it depends on whom and what you are interested in. The fact that assessors trained to tell the difference between Beer A and B can tell the difference between beer A and B doesn't tell me much about whether my SO is going to be able to tell the difference. OTOH if that panel was trained (or pruned) to detect diacetyl then I have confidence that Beer C is different from beer D with respect to diacetyl if they determine it to be so. The problem here seems to be that the investigators have not done all their homework in terms of determining what they want to determine before doing the experiments.

Hey AJ, this seems to be the same E1885 I linked back in post #25. Guess you answered my question about a credible source. Rang a bell because we both pointed out section 7.2. It's not $45; it's free. Oddly, my reading of the same document pointed at how much the Brulosophers were getting right in their effort to use the triangle test, while your read is more or less the opposite.
 
Man, I'm not as nerdy into statistics as a few of you guys by a long shot, but I nerded out on this thread with you. Thanks AJ and bwarbiany for the great posts. My inclinations were initially the same as yours B, but I didn't have the stats background to show it. And then as AJ pointed out, it has to actually reach the null hypothesis in order to be considered fully indifferent (if I understood that correctly). So although the experiments didn't *prove* that fermentation temp mattered, they didn't *disprove* the theory either. Rather, they should've raised more questions and more testing. Then the fact that the meta-analysis is actually acceptable practice, and that it, in fact, does *prove* that fermentation temp will make a difference, only furthers my gut-feeling that if the panel sizes were larger, we would see more significant results.

I'm not sure how inclined AJ is to converse with the brulosophy dudes, but just from my random interactions with them online, I have a feeling they'd actually appreciate some of this stuff. I would bet you that they're much closer to the mode of thinking of wanting to actually do things right, and would prefer to have such inclinations as fermentation temp proven correctly. And I'd venture to say that they're completely against those types in this thread who are taking these experiments as brewing gospel.

Lastly, after going through the meta-analysis on the ferment temp experiments, it'd be interesting to do the same with others that are testing the same variables.
 

Well, I suppose this is one way to view it, but lost in all the statistical handwaving is the ultimate problem with all of this, and that is the measuring process being used by the Brulosophy approach is hugely flawed.

I'm not antistatistical--I've had 10 university stat courses in my life, 9 of which were at the graduate level. Nothing less than an A in any of those classes. PhD minor in Statistics (yeah, they had those). One of the things one learns when one beats one's head against the statistical wall like I did is that one should never, not ever, forget that if you don't measure your variables reliably and validly, the statistics are not--ARE NOT--worth a hill of beans.

I use this brulosophy material in my own classes as a way to illustrate what happens when people get lost in probability theory and forget that in the end, without accurately measuring the variables at issue, one's conclusions are really uncertain at best.

This was pointed out earlier in the thread, and it's an absolute thread-killer, in that it really cannot be refuted. This is fundamental to research, and to statistics.

We do not know who the samples are and thus to what populations they may be generalizable. Tasters are "qualified" even if they just guessed the right answer, and then treated as if they are effective in distinguishing differences. We know that there is no consistency in what tasters may have or may not have been drinking prior to the triangle tests, and it's clear that there is quite the potential for palate fatigue, or taste-bud numbing. We simply do not know who they are, how they are prepared, and that is not science, it's something else entirely.

I have little doubt there will be another attempt to convince people with statistical hand-waving, but in the end, you can largely ignore that effort. It's just a way to distract from the fundamental problem with the brulosophy approach, something that the reliance on statistics cannot overcome.

Of course, people can believe what they want to believe, and if this "let's assume it's all measured well and then proceed as if it's valid" stuff is convincing, well, so be it.
 

The probability that Marshall is reading this thread is 1. ;)

Although I've worked with a scientific modeler who says that there's never a probability of 0 or 1...
 
Hey AJ this seems to be the same E1885 I linked back in post #25. Guess you answered my question about credible source. Rang a bell cause we both pointed out section 7.2. It's not $45 it's free.
Well, I certainly wish I'd remembered that post. I'd be $45 richer! As to the credibility of the source, it is, by definition, credible as it is a standard. There are typos in it, and I think I have found an error in the confidence interval calculation which I did not correct in the spreadsheet because it is a standard (and I'm not, at this point, 100% sure I'm right).

Oddly my reading of same document pointed at how much the brulosophers were getting right in their effort to use the triangle test while your read is more or less the opposite.
In the one description of an experiment I read, all panelists were presented, at best, the permutations of AAB (no BBA). Panelists that couldn't decide were not instructed to guess. These are pretty glaring errors in protocol and suggest that there were others. For example, I doubt that they have individual isolated booths for their panel members. These things would introduce 'noise', yet the single experiment and the pooled experiment results both suggest that the differentiability is about 15%, so I think we have to conclude that they did some things right.
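For what it's worth, here's a minimal sketch of what the balanced-presentation requirement looks like in practice: each panelist gets one triplet, with the six permutations used (as nearly as possible) equally often and the assignment randomized. The Python and the function name are just my illustration, not anything taken from the standard.

```python
import random

def triangle_orders(n_panelists):
    """Assign serving orders for a triangle test: the six permutations, balanced and shuffled."""
    perms = ["AAB", "ABA", "BAA", "ABB", "BAB", "BBA"]
    # Repeat the full set enough times to cover the panel, trim, then shuffle.
    # If the panel size isn't a multiple of 6 the balance is only approximate.
    orders = (perms * ((n_panelists + 5) // 6))[:n_panelists]
    random.shuffle(orders)
    return orders

print(triangle_orders(12))   # e.g. ['BAB', 'AAB', 'ABB', ...], each permutation appearing twice
```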
 
I think you are doubtless right, but even the single-case experiment I looked at (9 out of 21 correct guesses) supports that conclusion. Though those results only support rejection of the hypothesis that fermentation temperature does not make a difference at the 25% significance level, that does not prove the null hypothesis is false. In the long post I showed that we can calculate the probable range of differentiabilities, and these data indicate it is between 0 and 0.35 with 90% confidence. Based on that we are not likely to accept the null hypothesis (Pd = 0). I took it one step further and modified the calculation routine to also calculate the most probable value of Pd given the number of correct answers obtained and the panel size. For 9 out of 21 correct this is Pd_max_likelihood = 14%. That's respectably far from the null hypothesis Pd = 0.

H1(Pd=0.30): Panelists: 21; 3-ary test; 9 Correct Choices; P(< 9) = 0.11886; 0.00 < Pd < 0.35 with conf. 0.90
Most Likely Pd: 0.140

Let's look at the combined data and see what it implies:
H0(Pd=0): Panelists: 208; 3-ary test; 91 Correct Choices; P(>= 91) = 0.00112
H1(Pd=0.20): Panelists: 208; 3-ary test; 91 Correct Choices; P(< 91) = 0.18079; 0.09 < Pd < 0.22 with conf. 0.90
Most Likely Pd: 0.160

The larger sample size gets us to a confidence of 0.0011 that we can toss the null hypothesis. Fiat! It also tightens our 90% confidence band for differentiability (and gets 0 out of it) and gives us a maximum likelihood estimate of 16% for the differentiability as opposed to 14%. That's not really very different. This consistency suggests that the various experimental results are indeed capable of being combined. But what did it buy us? A substantial improvement in support for rejecting the null hypothesis. But we already were pretty sure we should do that. Or, put another way, an alpha bigger than we might like, based on the assumption that it always needs to be < 0.05, isn't the whole story.

I understand where you're coming from... The null hypothesis vs achieving "statistical significance" at p<0.05 are two different things...

I agree with you that as we look at these experiments we cannot necessarily "toss out" the null hypothesis, but we can somewhat discount its likelihood when we get results that suggest an effect but one not large enough to achieve p<0.05.

The question is how do you explain results to people who may not look at it the same way? That's where I think the meta-analysis can come in. By combining experiments we increase sample size, and we have a much stronger case for not just discounting the null hypothesis, but outright rejecting it.
 
Excellent response. So what did you take from that xbmt? I took a lot. It confirmed what I had been thinking for a while, that ale, hef, some yeasts are more reactive than others. I certainly never meant or intended to be pigeonholed into any yeast strain never showing a difference. Just that it didn't always matter in the grand scheme of things.

IMO the assumption is in defense of the purchased equipment. Quickly the assumption seems made that there was a difference, so yay, cold was better, I am right, this is how good beer is made. But putting oneself in not-caring shoes, one can see that 7 preferred the warm and 8 the cold, and 4 had no preference. Furthermore the brewer claims taking a 3rd place medal with a 75/25 mix of the warm and cold. So should I go buy a fridge and controller? If so, why? Is anyone really surprised that a hef yeast showed a reaction to temp? Which one is better? The cold, right, because that is the common thought and the equipment defense. Surely, this has to make sense to someone.


So they noticed a difference, now what? If you like hefs, it says to me that flavor could be manipulated by temp and that warm and cold would both be worth trying. It says to me that 72 would be OK for a hef, as preference seems split anyway and he did well with the 3rd place. Nothing in this speaks to dogmatism about ferment temp in the grand scheme of things. Both warm and cold will make good beer, and the joy of not caring or of not needing extra equipment gives the warmer an edge to me.

Well, in this case note that the difference was between 60 and 72. Hefe yeast is generally accepted to produce more clove at low temp and more banana at "high" temp, but in no way was this experiment trying to ferment a hefe with uncontrolled temperatures. The temps were within accepted ranges for hefe yeast. They didn't try to ferment a hefe at 85 degrees.

What it suggests is that buying a fridge and temp controller gives you greater control over what your beer will be than not doing so. It suggests that there is a demonstrable difference between fermenting a hefe at 60 vs 72, that these produce different characteristics, and if you want banana vs clove or vice versa, you should control for that.
 
Although I've worked with a scientific modeler who says that there's never a probably of 0 or 1...

Probability that you will die: 1.00000000000
Probability that you will live forever: 0.000000000

Probability that you will pay taxes: 1.00000000000
Probability that you will never have to pay taxes: 0.00000000
 
Well, I suppose this is one way to view it, but lost in all the statistical handwaving is the ultimate problem with all of this, and that is the measuring process being used by the Brulosophy approach is hugely flawed.

I'm not antistatistical--I've had 10 university stat courses in my life, 9 of which were at the graduate level. Nothing less than an A in any of those classes. PhD minor in Statistics (yeah, they had those). One of the things one learns when one beats one's head against the statistical wall like I did is that one should never, not ever, forget that if you don't measure your variables reliably and validly, the statistics are not--ARE NOT--worth a hill of beans.

I use this brulosophy material in my own classes as a way to illustrate what happens when people get lost in probability theory and forget that in the end, without accurately measuring the variables at issue, one's conclusions are really uncertain at best.

This was pointed out earlier in the thread, and it's an absolute thread-killer, in that it really cannot be refuted. This is fundamental to research, and to statistics.

We do not know who the samples are and thus to what populations they may be generalizable. Tasters are "qualified" even if they just guessed the right answer, and then treated as if they are effective in distinguishing differences. We know that there is no consistency in what tasters may have or may not have been drinking prior to the triangle tests, and it's clear that there is quite the potential for palate fatigue, or taste-bud numbing. We simply do not know who they are, how they are prepared, and that is not science, it's something else entirely.

I have little doubt there will be another attempt to convince people with statistical hand-waving, but in the end, you can largely ignore that effort. It's just a way to distract from the fundamental problem with the brulosophy approach, something that the reliance on statistics cannot overcome.

Of course, people can believe what they want to believe, and if this "let's assume it's all measured well and then proceed as if it's valid" stuff is convincing, well, so be it.

I did read your stuff as well throughout the thread. With what time I have to read on this forum, it took me a couple of days to sort through this thread. It's obvious you know what you're talking about. And only now have you stated just how much you know about statistics, even though it was practically implied in most of your posts.

I guess that was also my point, that if we disregard just how flawed their testing is (and again, I think that if you talked directly with Marshall and crew, they'd readily admit that point), and simply look at their results... EVEN THEN, the stats are not in their favor. And by "their," I simply mean those who take the results and adhere to them like biblical orthodoxy.

And again, just with the interactions I've had with Marshall (albeit, online), I'd say if these sorts of things were presented to him, that he'd accept it with an open, scientific mind.

I could be wrong, of course, but this is my inclination.
 
Man, I'm not as nerdy into statistics as a few of you guys by a long shot, but I nerded out on this thread with you. Thanks AJ and bwarbiany for the great posts. My inclinations were initially the same as yours B, but I didn't have the stats background to show it. And then as AJ pointed out, it has to actually reach the null hypothesis in order to be considered fully indifferent (if I understood that correctly).
You are in the ballpark but not quite there. What this branch of statistics is all about is positing a hypothesis and then calculating the probability that some data you observed would arise under that hypothesis. If the probability is very low then the hypothesis probably isn't true and ought to be dismissed in favor of an alternative hypothesis.

So although the experiments didn't *prove* that fermentation temp mattered, they didn't *disprove* the theory either.
As has been pointed out (my attempts at wit in another post aside) there are no probabilities of 0 or 1, so there is always uncertainty. We can't know that hot and cold fermented beers are indistinguishable. We can just say that it is unlikely that they are given the data that we observed. The question becomes how improbable do they need to be before we decide that for practical purposes we should say they are not indistinguishable. That is up to those who interpret the data to decide. The probability thresholds are often determined by the costs associated with being wrong (i.e. that they are the same but we decide they are different).

Rather, they should've raised more questions and more testing.
The first thing we would want them to do is correct the procedural errors. Then we'd like to see some of the tests repeated using the correct procedures. If they say "Here's what we got following ASTM protocols" then their data is much more likely to be accepted than if they violate ASTM protocols as they have done in the past. You can challenge me on this by saying "What difference does it make if they only presented BBA?" I'd have to answer that I don't really know, but that question goes away if they follow the protocol (equal numbers of ABB, BAB, BBA, BAA, ABA and AAB).

Then the fact that the meta-analysis is actually acceptable practice, and that it, in fact, does *prove* that fermentation temp will make a difference, only furthers my gut-feeling that if the panel sizes were larger, we would see more significant results.
I'm assuming you put "prove" in quotes because data based on larger panel size does not prove any more than data from smaller panel sizes. It may strengthen our convictions with regard to a position on a hypothesis though.

The other approach here is to look at the other information that is available from triangle test data, i.e. the information on the range of likely differentiability and on the level of differentiability that most likely explains the observed data. This may help us to reject or accept the hypotheses even though the confidence levels are not what we hoped for. It may not be necessary to increase sample size or pool data from multiple tests, but doing either of those does improve the estimate of differentiability in addition to decreasing alpha.

I'm not sure how inclined AJ is to converse with the brulosophy dudes,
Perfectly. My original suggestion to them here was that they shell out the $45 to ASTM for the procedure, but I can now change that to suggest that they go to #25 and follow the link there. With that in hand there should be common ground enough that I can answer any questions they might have, make suggestions, etc. This I am happy to do.
 
Probability that you will die: 1.00000000000
Probability that you will live forever: 0.000000000

Probability that you will pay taxes: 1.00000000000
Probability that you will never have to pay taxes: 0.00000000

Death and taxes are knowns. No need to model the probabilities. ;)
 

A couple of you are masters at the multi-quoting, I'm too lazy for that. haha.

1) I gotcha. I think. So, if a test shows that the probability is low, and thus should be thrown out, yet none of these ferment temps experiments (except for perhaps one) have reached that threshold of low enough to throw out, then maybe we should be *at the very least* retesting?

2) Ok, so how improbable do they need to be before you decide that for practical purposes you should say they are not indistinguishable?

3) A) I'd agree. My inclinations were that if there were some set of guidelines, that these tasters were likely not under those guidelines (one test that keeps being mentioned is the one at HomebrewCon from last year - yet how many tasters were already drinking? How many were isolated? etc.). And B) no, I wouldn't challenge you on any of this... ;)

4) Precisely.

5) Yeah, I'm 99% sure I'm with you on this point.

6) Only thing I can say is, as others have said, I doubt Marshall and Co. are reading this thread. If they are, I think it'd be pretty awesome if the "two" of you got into contact. If they aren't, I think it'd be pretty cool if you, AJ, got in contact with them - as I think they'd be pretty accepting of your ideas.

In the very beginning when Marshall was alone, he was simply doing side-by-side tests, and if I remember correctly, even pretty much telling his testers what he was experimenting with. Then he decided to move to a triangle test because of suggestions from people he interacted with, who basically said to him, "I think it's awesome what you're trying to achieve here, but here's how you could improve." Then, after a bit more time, and after he had already published tons of *data*, he went back and said, "Actually, we were a bit too stringent with our specificity when it came to the null hypothesis thingamajig. With that said, here are a number of tests that would actually have reached significance had we started out with this method."

That being said, I really think that they would be receptive to hearing that they could, and should, be doing their testing, and even the stats part of their testing, that much better.
 
The question becomes how improbable do they need to be before we decide that for practical purposes we should say they are not indistinguishable. That is up to those who interpret the data to decide. The probability thresholds are often determined by the costs associated with being wrong (i.e. that they are the same but we decide they are different).

Well said.

It saddens me to see that so much research these days is about chasing p-values in order to be published.

Slightly off topic, but I'd be interested in your thoughts (and others') on:

http://fivethirtyeight.com/features...-agree-on-its-time-to-stop-misusing-p-values/
 
Well, I suppose this is one way to view it, but lost in all the statistical handwaving is the ultimate problem with all of this, and that is the measuring process being used by the Brulosophy approach is hugely flawed.
I am not sure what is meant here because in #273 you wrote:

Which, of course, is the single biggest issue with the triangle testing as it is presented. It's why "qualifying" people on the basis of a lucky guess doesn't make any sense,
This suggests that you feel that triangle testing itself is flawed because of the requirement that panelists guess if they can't decide. I've explained before why each increment from triangle, to quadrangle, to pentagonal testing increases the sensitivity of the test so I won't repeat that but rather ask that you step back and look at the forest. If triangle testing were flawed it wouldn't work and the food and beverage industries would probably have noticed this by now. They would not continue to use it nor would there be published standard procedures for it. Now perhaps I have misinterpreted the "...issue with the triangle testing as it is presented" phrase and perhaps this means that you accept the validity of triangle testing but find fault with the way Brulosophy has implemented it. I do too but wonder if "hugely flawed" is an accurate description. When I criticize them for only presenting 2 A's and 1 B (without knowing if they are permuted) I do so because the protocol calls for triplets equally and randomly distributed among the permutations of ABB and BAA. I recognize that there are doubtless good reasons for the random distribution requirements and can even guess what they are but I do not know what the effects of failure to adhere to this particular requirement are other than that they probably introduce noise. Noise masks the signal from the beer reducing the confidence level and our estimate of the signal to noise ratio i.e. the differentiability.



One of the things one learns when one beats one's head against the statistical wall like I did is that one should never, not ever, forget that if you don't measure your variables reliably and validly, the statistics are not--ARE NOT--worth a hill of beans.
Not true! I cut my eye teeth on this stuff using statistical estimation theory to decode signals immersed in highly colored noise. The measurements were most unreliable in this sense, and yet the stats we collected enabled us to decode the signals. The results weren't beautiful but the taxpayers ponied up a pretty tall mountain worth of beans in support of this effort. And we did beat our heads against the statistical wall, believe me. Because of this experience I view the art as being able to extract information in cases where the observations are corrupted. I see the faults in Brulosophy's implementation as corruptors, but not sufficiently strong corruptors to render the data they have collected useless. Why do I say that? Because I can extract differentiability estimates from their data. As noted in earlier posts, one of their temperature differential experiments gave an estimate of 14% for differentiability, and when that data was pooled with other sets of data the differentiability estimate was 16%. When they did another experiment on wheat beer the maximum likelihood estimate of the differentiability rose to 39%. Here's the analysis:

H1(Pd=0.20): Panelists: 32; 3-ary test; 19 Correct Choices; P(< 19) = 0.89677; 0.22 < Pd < 0.56 with conf. 0.90
Most Likely Pd: 0.39000
probs(32,19,3,.0,.90,1)
H0(Pd=0): Panelists: 32; 3-ary test; 19 Correct Choices; P(>= 19) = 0.00222

The consistency between the single test and the pooled data and the appreciably larger differentiability for the wheat beer (anyone who has fermented wheat beer at different temperatures knows the 'signal' with respect to temperature difference is much greater than with lagers), while they don't prove anything conclusively, suggest that Brulosophy's procedural errors may not be so serious after all. Perhaps were they to adhere to ASTM E1885-04 the differentiability estimates might go up (error induced noise goes down).



I use this brulosophy material in my own classes as a way to illustrate what happens when people get lost in probability theory and forget that in the end,
Well, clearly lots of people apply statistics in ways that lead to wrong conclusions, so anyone who knows what he is doing conducts reality checks. The reality checks I just gave on the Brulosophy data seem encouraging, though we don't know how they would change if they strictly followed the protocol.


...without accurately measuring the variables at issue, one's conclusions are really uncertain at best.
Naturally there is going to be uncertainty induced by the noise. The maximum likelihood estimate is the location of the peak of the likelihood function. That peak has finite width.

This was pointed out earlier in the thread, and it's an absolute thread-killer, in that it really cannot be refuted. This is fundamental to research, and to statistics.
I don't think anyone disagrees that if the SNR gets too low you can't measure differentiability, but this is hardly a thread killer as it is apparent here, at least from the data I have looked at, that the SNR isn't too low.

We do not know who the samples are and thus to what populations they may be generalizable. We know that there is no consistency in what tasters may have or may not have been drinking prior to the triangle tests, and it's clear that there is quite the potential for palate fatigue, or taste-bud numbing.
This brings us back to ¶7.2 of the standard "Choose assessors in accordance with test objectives." If the objective is to determine whether fermentation temperature makes a detectable difference over a demographic of tasters trained and untrained, those with palate fatigue or not, drunk or not... then these data accurately (except for the other errors) represent that demographic.


Tasters are "qualified" even if they just guessed the right answer, and then treated as if they are effective in distinguishing differences.
Yes, and this is exactly what you need to have panelists do in order to detect that the null hypothesis is true, which is sometimes the thing one is interested in. The power of the triangle test relative to duo-trio and paired comparison tests (which also require guessing for the same reason) derives from the fact that the probability of a 'correct' guess under the null hypothesis is 1/3 as opposed to 1/2. That's why it tends to be used rather than those tests. You really need to understand this.
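To put a number on that power argument, here's a minimal sketch under the same guessing model (exact one-sided binomial tests at alpha = 0.05; the panel size and Pd in the example are arbitrary choices of mine):

```python
from math import comb

def tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def power(n, pd, chance, alpha=0.05):
    """Power of a forced-choice difference test.

    chance = guessing rate under the null (1/3 for triangle, 1/2 for duo-trio);
    true P(correct) = pd + (1 - pd) * chance under the guessing model.
    """
    # Smallest number of correct answers that reaches significance...
    k_crit = next(k for k in range(n + 1) if tail(k, n, chance) <= alpha)
    # ...and the probability of reaching it when a fraction pd really can tell.
    return tail(k_crit, n, pd + (1 - pd) * chance)

n, pd = 30, 0.30
print(round(power(n, pd, 1/3), 2))   # triangle test
print(round(power(n, pd, 1/2), 2))   # duo-trio: noticeably lower power at the same n and Pd
```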


We simply do not know who they are, how they are prepared, and that is not science, it's something else entirely.
Now if they reported in detail who the panel members were, whether squid and garlic sandwiches were served before the test, etc. then it would be science and that's what we are trying to get them around to.

I have little doubt there will be another attempt to convince people with statistical hand-waving, but in the end, you can largely ignore that effort. It's just a way to distract from the fundamental problem with the brulosophy approach, something that the reliance on statistics cannot overcome.
So we don't accept the results of statistical analysis if they provide strong support for a position we don't like.

That triangle testing is widely used in the food, beverage, cosmetic and any other industry where sensory perception is of interest proves that it is a valid technique, because if it weren't it would have been dropped. There. No statistics - just common sense. There is no fundamental problem with the Brulosophy approach, as it is basically what they are doing in breweries, candy factories, soft drink companies, drug companies, the audio industry (where they call it an A, B, C test), etc. Now there are some problems with Brulosophy's implementation. They do not adhere to the accepted standard protocols exactly. But examination of the data seems to indicate that the shortcomings do not completely impair our ability to estimate differentiability.

Of course, people can believe what they want to believe/
So it seems.
 
Slightly off topic, but I'd be interested in your thoughts (and others') on:

http://fivethirtyeight.com/features...-agree-on-its-time-to-stop-misusing-p-values/

Yes, absolutely. I was so relieved to see that it said that even scientists have trouble explaining it. I have to think about it every time. I was also intrigued by the statement that p is not the whole story. I made that same assertion in a relatively recent post and have found myself in these investigations to be more inclined to be interested in the maximum likelihood estimate of differentiability than in p.
 
Have we had jelly beans yet?
Have some jelly beans...

[attached image: significant.png, the xkcd "Significant" jelly bean comic]
 
I just reread the article on p more carefully. Many good points. With respect to the current discussion, the most grievous error committed by Brulosophy in the temperature experiments was accepting the null hypothesis because they didn't have the 'requisite' p < 0.05. According to the article, they have plenty of company among respected scientists.

[attached image: frequentists_vs_bayesians.png, the xkcd "Frequentists vs. Bayesians" comic]


I'm a Bayesian (by experience - I used Bayes' theorem to find the most likely differentiability) and there is definitely a Bayesian tone to ASTM E1885 (as there is bound to be when money is involved), though the frequentist approach is emphasized. Perhaps this is recognition that, as the p paper suggests, both are needed to get a fuller picture, and I guess that summarizes my thinking at this point.

Taking another look at the first Brulosophy experiment on temperature:
H0(Pd=0): Panelists: 21; 3-ary test; 9 Correct Choices; P(>= 9) = 0.23988
H1(Pd=0.20): Panelists: 21; 3-ary test; 9 Correct Choices; P(< 9) = 0.28653; 0.00 < Pd < 0.35 with conf. 0.90 Most Likely Pd: 0.14

There's data there, but is it 'actionable', as mongoose likes to say? Should a brewery confronted with such data buy a refrigeration plant? Looking just at p we might decide it should not, as there is enough support for the null hypothesis that we can't reject it. But looking at the likelihood results we see that the percentage of our demographic that can tell the difference is most likely 14%. Probably not high enough to justify the cost. But it's also possible that as many as 35% of our customers could tell a difference. Or as few as none of them. I wouldn't want to have to make a decision based on those results.

Now let's look at the most recent wheat beer test results again:
H0(Pd=0): Panelists: 32; 3-ary test; 19 Correct Choices; P(>= 19) = 0.00222
H1(Pd=0.20): Panelists: 32; 3-ary test; 19 Correct Choices; P(< 19) = 0.89677; 0.22 < Pd < 0.56 with conf. 0.90 Most Likely Pd: 0.39

This is more like it. p = 0.0022, and the likelihood computations show 39% as the most likely fraction of the demographic that would note the difference, with the range going up to 56% in the 90% confidence band. I'd feel comfortable exploring the purchase of refrigeration gear based on that. This is actionable information.

Thanks to AZ_IPA for posting the link to the p paper. Very timely.
 

Good stuff. The cartoon reminds me of the two old-school behavioral psychologists with a total disregard for internal experience. After making love, the woman asks: "It was good for you, was it good for me?".

I am enjoying the discussion, even though I'm reminded how little I've retained from the couple of under-graduate courses I took in statistics.
 
1) I gotcha. I think. So, if a test shows that the probability is low, and thus should be thrown out, yet none of these ferment temps experiments (except for perhaps one) have reached that threshold of low enough to throw out, then maybe we should be *at the very least* retesting?

I think you're still missing something that I completely missed until AJ said it about 2-3 times lol...

The hypothesis being tested is the null hypothesis: "These two beers produced with different methods are indistinguishable." That is the hypothesis we're TRYING to throw out during all this testing.

So if something doesn't achieve significance, we don't throw out the positive hypothesis of "fermentation temps matter", we fail to throw out the hypothesis "fermentation temps have no impact on the finished beer." It's hard to wrap your mind around that distinction, but once you do, it all falls into place.

When we say something "didn't achieve significance", it's referring to a commonly accepted threshold of p<0.05, which is actually a pretty stringent criterion. When we say that it didn't achieve significance, we are not throwing out the positive hypothesis, i.e. we are not proving they're distinguishable, but we are not declaring them indistinguishable. We're failing to prove beyond a certain confidence that they're distinguishable.

If you have, say, 36 tasters, you would expect pure guessing to result in 12 (33%) tasters selecting the odd beer. You would require 17 to achieve p=0.058 (close to significance) and 18 (50%) to achieve p<0.05 (actually p=0.028) to declare that the test was "statistically significant".

But what if you only get 16? That's p=0.109. We would declare that not statistically significant. But it's not 12 (or even less). So you're in a quandary. That result isn't enough to achieve significance, but it's also hard to accept the null hypothesis as true. You have to weigh the odds that the variance of 4 tasters is random chance (null hypothesis true) vs the odds that the variance of 4 tasters is real (null hypothesis false, but the beers are not distinguishable enough to achieve p<0.05).

The simple fact is that p<0.05 is to some degree an arbitrary threshold. And that sample size is important, because a sample of 20 tasters requires 11 (55%) to get to p<0.05 but a sample of 200 tasters requires only 79 (39.5%) to get to p<0.05. The percentage difference relative to guessing to declare significance is dependent on sample size.
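To see how that threshold moves with panel size, here's a small sketch using the same pure-guessing binomial model; it just searches for the minimum number of correct picks that gets under p = 0.05 for each panel size mentioned above:

```python
from math import comb

def p_value(n: int, k: int, p_guess: float = 1/3) -> float:
    """One-tailed probability of k or more correct picks out of n under pure guessing."""
    return sum(comb(n, j) * p_guess**j * (1 - p_guess)**(n - j) for j in range(k, n + 1))

def min_correct_for_significance(n: int, alpha: float = 0.05) -> int:
    """Smallest number of correct picks that drives the p-value below alpha."""
    return next(k for k in range(n + 1) if p_value(n, k) < alpha)

for n in (20, 36, 200):
    k = min_correct_for_significance(n)
    print(f"{n} tasters: need {k} correct ({k / n:.1%}), p = {p_value(n, k):.3f}")
# Expected: 20 -> 11 (55%), 36 -> 18 (50%), 200 -> 79 (39.5%)
```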

This was my point with the meta-analysis. I looked at two things across a bunch of experiments testing the same variable:

1) The "error", i.e. the results relative to chance, always pointed in the same direction. It was ALWAYS >=33% of tasters selecting the odd beer, but the results weren't strong enough to reliably achieve significance. If the "error" relative to guessing was both positive and negative, I would have a lot more difficult time improving the significance by adding the experiments together.

2) If the error is always the same direction but the experiments don't achieve significance, I assume that's a sample size problem. I.e. "these beers are different but not different enough that it's easy to detect." By aggregating the experiments, I "create" a larger sample size, and with a larger sample size, I need a lower percentage correctly selecting the odd beer to declare p<0.05.
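Mechanically, that aggregation is just adding the tasters and the correct picks across experiments and re-running the same binomial test on the pooled totals. A sketch with made-up counts (deliberately not the actual Brulosophy numbers) shows why several individually non-significant panels leaning the same way can pool into a significant result:

```python
from math import comb

def p_value(n: int, k: int, p_guess: float = 1/3) -> float:
    """One-tailed probability of k or more correct picks out of n under pure guessing."""
    return sum(comb(n, j) * p_guess**j * (1 - p_guess)**(n - j) for j in range(k, n + 1))

# Hypothetical panels: each leans above chance, none clears p < 0.05 on its own.
experiments = [(21, 9), (24, 11), (20, 9), (25, 11)]   # (panelists, correct)
for n, k in experiments:
    print(f"n={n}, correct={k}: p = {p_value(n, k):.3f}")

pooled_n = sum(n for n, _ in experiments)
pooled_k = sum(k for _, k in experiments)
print(f"pooled n={pooled_n}, correct={pooled_k}: p = {p_value(pooled_n, pooled_k):.4f}")
```

With these made-up counts the pooled p drops below 0.05 even though no single panel gets there. It's crude fixed-effect pooling, which is only defensible because the experiments used the same protocol; a fuller meta-analysis would weight the studies and check for heterogeneity.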

2) Ok, so how improbable do they need to be before you decide that for practical purposes you should say they are not indistinguishable?

Ay, there's the rub...

I'm personally convinced based upon what I've done here in the meta-analysis. I have rejected the null hypothesis, that ferm temp doesn't matter. (Preference is something I am still unsure of, but that's a different issue).

applescrap is not. He thinks I'm basically searching for numbers in support of my bias (that ferm temp matters), which is certainly possible. And I suspect he thinks it affects the beer, but that it's not a sizable enough effect to worry about. He's also pointed to the preference numbers in some tests for warm ferment as evidence that even if the two are different we can't categorically say "cool ferment is better" in all instances, which is truly what we're trying to discover.

mongoose IMHO may not be convinced. He has legitimate concerns about the quality of the tasting panels and the methodology of the study itself that make him doubt that the original studies are good enough. As I pointed out elsewhere, even if my meta-analysis was perfectly conducted, it doesn't correct for methodological errors in the original studies.

I'm not sure of AJ's position, actually, because as he's said he hasn't really read the bulk of the experiments. He's been invaluable to all of us by providing background on testing methodology. I would say that he's likely rejected the null hypothesis as well based on what he's said, but trying to say anything beyond that statement isn't supported by what I've read that he's posted in this thread.

So there you go. 4 different posters who have been arguing this stuff vociferously, and we could have as many as 4 different interpretations of the data.

-----------------------------------

BUT, and this is the important thing, AJ made a very useful point. In all these things we're talking about confidence and error, and how they are balanced against the cost of being wrong (either way; the cost of additional equipment/process to do things a certain way vs the cost of making sub-par beer).

I use fermentation temperature control. I really like the beer I make. I've been doing this 10+ years and based on my process, I believe I'm making commercial-quality beer. I've already got sunk costs into the fridge and temp controller, so the only marginal cost of temp control is electricity and floor space in my garage. Even if I'm wrong, I see no reason to change my process because whatever I'm doing is working. So I'm just geeking out on statistics, numbers, and debate, not actively looking to change my own process anyway.

To answer your questions about how improbable it needs to be, the answer is up to you. If you're not using temp control, do the results of the experiments justify in your own mind that you should start using temp control? If you're already using temp control, do the results of the experiments justify in your own mind that the electricity and space savings are justified to get rid of your temp control setup? That's all that matters.
 
The hypothesis being tested is the null hypothesis: "These two beers produced with different methods are indistinguishable." That is the hypothesis we're TRYING to throw out during all this testing.

Sometimes we want to throw out the null hypothesis, for example if we are trying a new, more expensive malt in the hope that it will make the beer so much more delicious that new customers will flock. But sometimes we want to accept the null hypothesis, for example if we are trying a new, cheaper malt in the hope that customers won't notice and sales will be unaffected. Or, more applicable to the home brewing scenario, if a triangle test revealed, to statistical significance, that a panel of experienced tasters couldn't tell the difference between single-decocted and triple-decocted beers, I could save myself a lot of work.

But in either case we make our decision based on testing the null hypothesis and to do that we must require panelists to guess if we are to obtain scores when the null hypothesis is true or nearly so. This is the point that mongoose seems to be unable to grasp.

It's hard to wrap your mind around that distinction, but once you do, it all falls into place.
I dunno. I still have to be very careful with it.

When we say something "didn't achieve significance", it's referring to a commonly accepted threshold of p<0.05, which is actually a pretty stringent criterion.
But often, if the consequences of accepting the null hypothesis are expensive enough, it is lower than that. 0.01 and 0.001 are the other 'popular' values. Unfortunately, as discussed in the paper referenced by AZ_IPA, people who don't really understand what it means often grab 0.05 simply because it is a number that has become a sort of maximum allowable.

When we say that it didn't achieve significance, we are not throwing out the positive hypothesis, i.e. we are not proving they're distinguishable, but we are not declaring them indistinguishable. We're failing to prove beyond a certain confidence that they're distinguishable.
We can't prove anything with statistics. When we say that a test does not achieve a desired level of significance, it means the data we observed are still consistent enough with the null hypothesis that we don't feel comfortable dismissing it.


If you have, say, 36 tasters, you would expect pure guessing to result in 12 (33%) tasters selecting the odd beer. You would require 17 to achieve p=0.058 (close to significance)
If you got 17 you would say "The test was significant at the 0.058 level".

and 18 (50%) to achieve p<0.05 (actually p=0.028) to declare that the test was "statistically significant".
The test was significant at the 0.028 level. For 19, "The test was significant at the 0.0125 level." The level at which the test is deemed sufficiently significant is up to the person making the decision. For example, AZ_IPA's article implies that p < 0.05 is required to get a research paper published. p*cost_of_false_alarm is the expected cost of Type I errors. Obviously we want that to be as low as possible, but as the ROC discussions of earlier posts show, when we set the threshold for lower p we are also reducing the probability of detection for a particular test and increasing the probability of false dismissal (Type II error). There is a cost associated with this too: p_FD*cost_of_false_dismissal. It's pretty clear that we want to set the threshold to the value which minimizes
p*cost_of_false_alarm + p_FD*cost_of_false_dismissal
Think about an air defense radar when pondering this.
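AJ's cost expression can be turned into a toy threshold search: for each candidate decision threshold (the number of correct picks at which you declare the beers different), compute the false-alarm probability under guessing and the false-dismissal probability under an assumed alternative Pd, weight each by an assumed cost, and keep the threshold with the lowest expected cost. The panel size, Pd, and costs below are placeholders for illustration, not anything taken from the thread:

```python
from math import comb

def tail_prob(n: int, k: int, p: float) -> float:
    """P(k or more correct out of n) when each panelist is correct with probability p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def best_threshold(n: int, pd: float, cost_false_alarm: float, cost_false_dismissal: float,
                   p_guess: float = 1/3) -> tuple[int, float]:
    """Threshold (minimum correct picks to reject H0) that minimizes expected cost.

    Expected cost = P(false alarm)*cost_false_alarm + P(false dismissal)*cost_false_dismissal,
    where the alternative assumes a fraction `pd` of the population can tell the beers apart,
    so P(correct | H1) = p_guess + (1 - p_guess) * pd.
    """
    p_alt = p_guess + (1 - p_guess) * pd
    best = None
    for k in range(n + 2):                       # k = n+1 means "never reject"
        p_fa = tail_prob(n, k, p_guess) if k <= n else 0.0
        p_fd = 1 - (tail_prob(n, k, p_alt) if k <= n else 0.0)
        cost = p_fa * cost_false_alarm + p_fd * cost_false_dismissal
        if best is None or cost < best[1]:
            best = (k, cost)
    return best

# Placeholder costs: a false alarm (buying refrigeration for nothing) hurts 5x more
# than a false dismissal (shipping slightly worse beer), with 36 tasters and Pd = 0.25.
print(best_threshold(36, pd=0.25, cost_false_alarm=5.0, cost_false_dismissal=1.0))
```

In this toy model, raising the cost of a false alarm pushes the optimal threshold up, i.e. demands a smaller p before acting, which is exactly the trade-off being described.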

But what if you only get 16? That's p=0.109. We would declare that not statistically significant. But it's not 12 (or even less). So you're in a quandary. That result isn't enough to achieve significance, but it's also hard to accept the null hypothesis as true. You have to weigh the odds that the variance of 4 tasters is random chance (null hypothesis true) vs the odds that the variance of 4 tasters is real (null hypothesis false, but the beers are not distinguishable enough to achieve p<0.05).
All you can say here is that the probability of getting 16 or more hits out of 36 tasters under the null hypothesis is 0.109. This means that if you did 1000 panels just like this one, giving them identical beers, you'd expect 16 or more hits in 109 of them. This isn't vanishingly small support for the null hypothesis, so most people wouldn't feel comfortable rejecting it, but then it isn't glaringly strong support for it either. Were the beers indistinguishable you would expect 12 of the tasters to get the right answer, and you got 16, which is quite a few more, so the beers probably are distinguishable, but not enough to be able to say "no way we could have gotten this many hits if they were indistinguishable". In a case like this we look for other information from the data, i.e. the probable differentiability of the beers (on a scale of 0, which represents the null hypothesis, to 1, which means that 100% of the population of interest can tell them apart). Sixteen out of 36 right suggests that the differentiability lies, with 90% probability, between 0.01 and 0.33 and has most likely value 0.17. This is not a terribly strong signal that the beers, as evaluated by this panel, are differentiable, but it is a signal to this effect. It tells you that another test needs to be done with a larger number of panelists to see if we can tighten that band around the most likely value. But p = 0.109 says the same thing.


The simple fact is that p<0.05 is to some degree an arbitrary threshold.
It's completely arbitrary. The required value depends on the application.


And that sample size is important,
You want a sensitive test and sensitivity depends on panel size. The other big factor in sensitivity is the probability of a correct guess under the null hypothesis. That's why a triangle test (pc = 1/3) is more sensitive than a duo-trio test (pc = 1/2) and why a quadrangle test (pc = 1/4) is more sensitive than a triangle test.
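One way to see that sensitivity claim numerically: fix a panel size and a true Pd, work out the correct-answer probability for each test format, and compare the probability that each format reaches p < 0.05. A rough sketch under the assumption that panelists who truly perceive the difference always answer correctly; the panel size and Pd are mine, not from the thread:

```python
from math import comb

def tail_prob(n: int, k: int, p: float) -> float:
    """P(k or more correct out of n) when each answer is correct with probability p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def power(n: int, p_guess: float, pd: float, alpha: float = 0.05) -> float:
    """Probability of rejecting H0 when a fraction `pd` of panelists truly perceive a difference."""
    # Smallest count of correct answers that is significant at level alpha under guessing.
    k_crit = next(k for k in range(n + 1) if tail_prob(n, k, p_guess) < alpha)
    p_true = p_guess + (1 - p_guess) * pd        # distinguishers answer correctly, the rest guess
    return tail_prob(n, k_crit, p_true)

n, pd = 36, 0.30
for name, p_guess in (("duo-trio", 1/2), ("triangle", 1/3), ("quadrangle", 1/4)):
    print(f"{name:10s} (chance {p_guess:.2f}): power = {power(n, p_guess, pd):.2f}")
```

With the same panel and the same true Pd, the power ordering comes out quadrangle > triangle > duo-trio, which is the point about the chance-guess probability driving sensitivity.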


1) The "error", i.e. the results relative to chance, always pointed in the same direction....

2) If the error is always the same direction but the experiments don't achieve significance, I assume that's a sample size problem.
I think you are on pretty solid ground here.


mongoose IMHO may not be convinced. He has legitimate concerns about the quality of the tasting panels and the methodology of the study itself that make him doubt that the original studies are good enough.
He is not on board because he doesn't understand that the basis for any of these discrimination tests (paired comparison, duo-trio, triangle, quadrangle...) is testing of the null hypothesis and that requires guessing. Despite the wide acceptance of these test methods he thinks them seriously flawed. His concerns about Brulosophy's procedural and analysis errors are, of course, valid. I indicated in my response to his post that I believe the flaws to obscure the signal from the beers to the point where the observed differentiation value (14 - 16) may be attenuated. IOW with those flaws removed we might find the beers differentiable at an appreciably higher level.

I'm not sure of AJ's position, actually, because as he's said he hasn't really read the bulk of the experiments. He's been invaluable to all of us by providing background on testing methodology. I would say that he's likely rejected the null hypothesis as well based on what he's said, but trying to say anything beyond that statement isn't supported by what I've read that he's posted in this thread.
I'm definitely on board with respect to rejection of the null hypothesis, though I am now looking at it through the differentiability parameter estimates rather than consideration of p. A signal at the level Pd = 0.14 (14% of the population can distinguish the beers) doesn't exactly blast the alternative hypothesis at you, but as the null hypothesis corresponds to Pd = 0 (0% of the population can distinguish them), it seems quite unlikely that the null hypothesis applies to these beers.
 
Marshall came and spoke at my homebrew club meeting about a year ago. I wasn't able to make it to that meeting, but I can tell you that testers were instructed to clear their palate with salt-free crackers and water before and between samples.
 
I am not sure (etc.)

AJ, I get it. You don't understand how measurement figures into all this. That doesn't make you a bad person--but it does mean you're missing a basic element of research, one whose failure invalidates conclusions.

Here's a quick and dirty example. Suppose we want to measure people's math aptitudes. We ask them, as our measure, to step on a scale which measures....something.

We ask them to step on, and off, and on, and off, and it returns a consistent figure each and every time. Very consistent, very reliable. The numbers show about 167, which is their math aptitude, right?

Of course, not right. If what you are measuring is not a reasonable measure of the concept (what we call operationalization of the concept), then the figures--and the results!--are meaningless.

Why? Because the measure doesn't measure what it purports to measure. In other words, what we've been talking about all along, such as when you "qualify" tasters based on a lucky guess. Who would ever intentionally do such a thing?

************

Or try this. We're measuring weight, not math aptitude. We have a person step on, and off, and on, and off, a scale 10 times. The scale returns 167, 155, 111, 211, 106, 92, 47, 113, 197, 175.

So is that a good measure of their weight? Of course not. It's unreliable. An unreliable measure cannot be a valid measure.
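For what it's worth, the scale example can be quantified: a measure whose repeated readings have a huge spread relative to their mean is one simple signature of unreliability. A toy calculation on the numbers listed above:

```python
from statistics import mean, stdev

readings = [167, 155, 111, 211, 106, 92, 47, 113, 197, 175]

avg = mean(readings)
spread = stdev(readings)      # sample standard deviation
cv = spread / avg             # coefficient of variation

print(f"mean = {avg:.1f}, sd = {spread:.1f}, CV = {cv:.0%}")
# A coefficient of variation this large (tens of percent) means repeated measurements
# of the same person disagree badly, so no single reading can be trusted as their weight.
```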

In the triangle tests using humans as testers, we have no indication of the reliability of their abilities. In fact, there's a lot of reason to suggest they may be unreliable. And of course, if their reliability is suspect, so too are the results pertaining thereto.

*************

This is basic research and measurement. No amount of statistical handwaving can overcome that.

None of this makes you a bad person. But you cannot take measures that are unreliable, whose validity as a result is suspect, whose generalizability is uncertain at best, and draw conclusions that are meaningful. You don't obtain actionable intelligence. You keep trying, and your doggedness in that is admirable, but without quality measurement, there's nothing to say.

As I tell my students, measurement is where the rubber meets the road. If you can't measure effectively, the rest is meaningless.
 
Mongoose welcome back! I'm stuck because I see what both you and AJ are saying. But with any testing that uses human opinion as the measurement, is there a way to truly dial it in to an acceptable level? I did read the posts on panel selection, but that was a long time ago (three or four days) and it starts to get a little fuzzy after so many beers :mug:

I was forced to take stats once in pursuit of a BA in Psych and once for my MA. I made it through both, but just barely. I have enjoyed the spirited debate and still have no idea what a "p" is. LOL :ban:
 
Man, if I brewed beer according to what's "not statistically significant" in Brulosophy's tests, disregarding this and that during brewing, I know I'd be making pretty ****ty beer. Those experiments "debunk" so many things which I myself know are true that they have become like reading a comic.

At first they were interesting reads, but when one experiment after another tells you that "it doesn't matter", when you've been experimenting with the same thing yourself and gotten pretty terrible results.. well. Then the rest of the experiments get tarred with the same brush. To be honest it's pretty LOL when someone uses Brulosophy as a reference. I'm pretty sure he got some sponsors and "must" continue to try things out, but please. A mash pH test in an IPA that's also using gelatine?
 
I'm not knocking *anyone* who happens to be a BJCP judge, but I'm not real impressed by the quality and consistency of them, either.

Case in point: Last beer comp I entered, I sent in a Foreign Extra Stout with my other entries. A buddy of mine paid for an entry but didn't have any finished beer to submit, so I gave him two bottles of *the exact same beer.*

One came back with a score in the low 30's with notes of 'too much caramel, mouthfeel too light.'

The other came back with a low 40's score and no significant flaws noted.

These were the *same* beers, in the *same* flight, by the *same* judges.

More or less killed my desire to enter competitions and renders the feedback to be horribly suspect.

Not to derail, but as a BJCP judge, I'm amazed at the difference sometimes from bottle to bottle of an entry, from the first judging to a mini-BOS, or sometimes when you ask for a second bottle. With the "rough" handling that some competitions or drop-off points subject the beer to before it comes to the table, I'm suspecting more and more that a lot of homebrew competition success comes down to how well you bottle (because I know that the brown wet cardboard that arrived at the table saying it's an APA was probably fine when it left the brewer's keg). So that could just be variance there. Or crappy judges, that happens too. Or the judges got a flight of 14 and one was #2 and one was #14.
 
Wow, Smellyglove, now I get why scientists mainly publish successful experiments.

The Brulosophers did not invent the triangle test, which is clearly an accepted tool in sensory analysis. Yes, they make some compromises in design and administration that might be used to throw some question on the results, yet I'm not seeing anyone doing it better in our community. I admit I don't see much professional brewing literature, but I did read a paper today, published in a serious journal and authored by brewers from Rock Bottom, that compared four different late-hop techniques and was pretty interesting. But it turns out they brewed the four batches at different breweries, using different ingredients, and apparently amazingly different waters. One batch had 1200 ppm sulfate compared to beers below 100... And because this is professional brewing, published in a serious, presumably peer-reviewed journal, it must be well done and valid, but the homebrewers are done in by failure to randomize AAB to ABA in their sensory test?
 