How much does Brulosopher affect your brewing?

On a slightly different note:

One thing I think the xBmts show is that, by and large, a lot of the edicts we're taught are "critical" really are not. Unless you have terrible sanitation and infect your beer, or mistreat your yeast by fermenting at extreme temperatures, you're going to make something decent.

Whether it's as malty, hoppy, crisp, alcoholic, carbonated, clear, or any other adjective matching your expectations is another story. But you probably won't have destroyed it.

So if anything, a lot of the work on Brulosophy shows that brewing is pretty forgiving. And that's a good thing - because, just look in the Beginner's forum for all of the "Did I screw up?" type posts. Brewing is fraught with scary no-no's... and perhaps not all of them are such a big deal after all.
 
They are interesting, but almost always I would say the sample size is too small. If we could do this with hundreds of trained judges I'd listen more.

This is where his statistical significance calculations, which some don't like, come into play. He accounts for the small sample size by saying he would need a calculated number of people to detect a difference for it to be important. Regardless of whether his sample population is 10 or 100, you'll need the same percentage of people to detect the difference for it to actually matter.
 
Yeah, sample size in statistics is largely irrelevant, what matters is that your test population is representative of your overall population.

Also, as he is doing triangle tests this makes his results much more robust.
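For anyone who wants to see how those triangle-test thresholds fall out, here is a minimal Python sketch, assuming the standard one-tailed exact binomial test against a 1/3 chance rate; the function name and panel sizes are mine, purely for illustration:

```python
# Minimal sketch: fewest correct picks a triangle test needs for p < 0.05,
# assuming the usual null of random guessing (1/3 chance of success).
from scipy.stats import binom

def min_correct_for_significance(n_tasters, alpha=0.05, p_chance=1/3):
    """Smallest number of correct identifications reaching significance."""
    for k in range(n_tasters + 1):
        p_value = binom.sf(k - 1, n_tasters, p_chance)  # P(X >= k) by chance
        if p_value < alpha:
            return k, p_value
    return None, None

for n in (10, 20, 30):
    k, p = min_correct_for_significance(n)
    print(f"{n} tasters: need {k} correct (p = {p:.3f})")
```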
 
1. In my mind, you aren't in a position to critique how the work was done if you aren't willing to get off your fat ass and do the work yourself.

2. This guy is brewing beer and trying to apply some scientific method to the process (and by his own admission there are holes in the experiments). This isn't a NIH/NSF funded peer-reviewed study.

1. That is not how research works. They are trying to frame what they do as research - therefore, it gets critiqued. Sorry/not sorry.

2. You are right, the brewers on that site publish their findings without peer review. That then means that the readers have to self-filter the good stuff from the irrelevant.
 
1. That is not how research works. They are trying to frame what they do as research - therefore, it gets critiqued. Sorry/not sorry.

2. You are right, the brewers on that site publish their findings without peer review. That then means that the readers have to self-filter the good stuff from the irrelevant.

1. Most of the people critiquing his work have no idea what a confidence interval is.
2. You always have to self-filter the good stuff. Critical thinking is a life skill that the vast majority of us sorely lack.

Eh, even peer-reviewed research is a crapshoot. I've seen systematic reviews get indexed in MEDLINE where 2, TWO, articles fit the inclusion criteria, and one was a case report. Even Cochrane reviews can have bias, assumptions, and conclusions unsupported by evidence.

Don't get me wrong - the scientific method is still the best we've got. In the homebrew world, Brulosophy's experiments are actually the most rigorous ones being done. Certainly they're not perfect, and there have been a few where I've shaken my head and wondered, "Next time, propose your study and get feedback on your methodology before you go through with it." BUT: I think one of the problems people have with Marshall's work is that they're taking the perspective that he's challenging "gold standard" methods with novel ones, i.e., he's challenging orthodoxy.

That's the problem. The homebrew world's orthodoxy is...well, it doesn't stand up to a lot of scrutiny. How many people think Starsan is a good general purpose sanitizer? It's not. How many people think mashing at a high temperature produces a maltier/sweeter beer? It doesn't. How many people think dry hopping more than 24 hours helps? It doesn't. How many people think filtering out hop trub and hot break proteins results in a clearer beer? It doesn't. How many people think that to avoid DMS character in modern Pilsner malt you need to boil for 90 minutes? You don't. How many people think you should avoid electric heating elements because they scorch the wort? They don't. How many people think you get Maillard reactions by boiling down a gallon of wort to half or less for a decoction? You don't.

The homebrew world, of which HBT is a big part, contains far more lore than actual fact. Brulosopher's work isn't perfect by any means, but man, it's head and shoulders above the free advice we get and give here.
 
1. That is not how research works. They are trying to frame what they do as research - therefore, it gets critiqued. Sorry/not sorry.

2. You are right, the brewers on that site publish their findings without peer review. That then means that the readers have to self-filter the good stuff from the irrelevant.

1. What Marshall and his crew are doing is NOT research. They are doing confirming experiments to prove or disprove some of the 'brewing facts' that get constantly repeated on forums such as this one. It is common in research for independent groups to repeat experiments or devise some of their own to challenge or confirm a hypothesis which has been presented. In no way has he passed off the work that they do as 'research'. For those that cite the consistent write-up they publish, that is called 'documentation' and is critical to providing a basis for drawing whatever conclusions people take away from their experiments. If they did not provide such documentation, people would be even more critical of how they did their experiments. I applaud their efforts at working to be consistent in their presentation.

2. Many brewers on this site and others do quote their 'discoveries' without peer review and, at times, as established fact. Over the past three years, I have seen many people, mostly those that are new to brewing, start repeating what they have been told as fact, which perpetuates many fallacies. While people such as yourself are savvy enough to understand that what is stated is an observation and not a fact, we have other people here who are not in technical fields and do not understand the difference. Having someone put some boundaries and framework around these 'brewing facts' is not a bad thing.

In short, believe what you want. If you distrust the information, then stop reading it. If you have questions, I believe that the Brulosophy crew has been pretty open to answering challenges and in providing disclaimers to the work that they do.
 
1. In my mind, you aren't in a position to critique how the work was done if you aren't willing to get off your fat ass and do the work yourself.

2. This guy is brewing beer and trying to apply some scientific method to the process (and by his own admission there are holes in the experiments). This isn't a NIH/NSF funded peer-reviewed study.

Take the results for what they are worth.

Jesus.
Way to make this personal with insults.
Real nice.
 
I like a lot of his stuff. I think him trying to do one a week has led to some dilution in quality, though. His mash temp one reaffirmed thoughts I had already sold my HERMS rig over. I don't get why there would be animosity towards him.
 
I love the articles. They help me relax a little about brewing, and try new processes without worry.

^^^^^ Amen brother....I was a wreck before I started reading his stuff....Always wondering how much better the beer would have been if I mashed at 152 instead of 150....or 5 vs 7 days of dry hopping. Not to mention the $$ I'm saving on equipment that his experiments have convinced me I don't need for where I am at with this hobby.
 
1. That is not how research works. They are trying to frame what they do as research - therefore, it gets critiqued. Sorry/not sorry.

2. You are right, the brewers on that site publish their findings without peer review. That then means that the readers have to self-filter the good stuff from the irrelevant.

Wait... Why isn't star san any good? I have *never* seen anything to support that at all.

Quoted from https://www.homebrewersassociation.org/forum/index.php?topic=24725.15

"It does not matter how well one cleans when the microbe is yeast or mold. Star San cannot kill these microorganisms due to its mode of action. Dodecylbenzenesulfonic acid (the active ingredient in Star San) kills via attraction to positively charged cells (hence, the anionic part of acid-anionic). Once inside of a bacteria cell, the surfactant goes about wreaking havoc on cellular function. Yeast and mold cells are negatively charged; therefore, Star San is not effective against these organisms."

From Five Star: http://www.fivestarchemicals.com/breweries/craft-brewers/products/

They list IOStar conspicuously as being effective against most moulds and yeast. Starsan's just an acid anionic sanitizer.
 
That being said, there are huge amounts of beer (and wine, etc) being made with Star San used to sanitize the equipment. The act of properly cleaning will remove wild yeast and mold, then the sanitizing takes care of the rest.
 
Quoted from https://www.homebrewersassociation.org/forum/index.php?topic=24725.15

"It does not matter how well one cleans when the microbe is yeast or mold. Star San cannot kill these microorganisms due to its mode of action. Dodecylbenzenesulfonic acid (the active ingredient in Star San) kills via attraction to positively charged cells (hence, the anionic part of acid-anionic). Once inside of a bacteria cell, the surfactant goes about wreaking havoc on cellular function. Yeast and mold cells are negatively charged; therefore, Star San is not effective against these organisms."

From Five Star: http://www.fivestarchemicals.com/breweries/craft-brewers/products/

They list IOStar conspicuously as being effective against most moulds and yeast. Starsan's just an acid anionic sanitizer.

Not sure I agree with this. Granted, this is just an abstract, but it's good enough for me as far as StarSan's efficacy.
http://www.ncbi.nlm.nih.gov/pubmed/17995615
Essentially the main ingredient in StarSan was effective against E. coli, Saccharomyces, and Listeria within 36 seconds at 40 °C.
 
I'd also like to point out that your reference point is a guest account, so that's not exactly a reliable source.
 
With the stuff I've read (although admittedly I haven't read a LOT), I've almost always had some issue with the scientific process. I don't think he takes things far enough, although I like that he's recently partnered with a lab to be more "evidential" vs. anecdotal.
 
I'd also like to point out that your reference point is a guest account, so that's not exactly a reliable source.

That person wasn't always a guest account. I'm making a guess here that, for whatever reason, his forum account was downgraded. I've exchanged PMs with him before.

I think the fact that even Five Star doesn't advertise it as an all-purpose sanitizer is key here. Just like Onestep is controversial, by the mechanism of action, Star-san is not a general purpose sanitizer. I for one can verify this - In my fermentation freezer I would use a starsan spray to, in theory, suppress the drips and drops of wort here and there to prevent mold growth without having to move EVERYTHING. It straight out didn't work - I've now had to do deep cleaning of my freezer to remove mold a few times. 3 gallons of boiling PBW solution to an empty freezer plus a broom to slosh it around.
 
I absolutely love the site and look forward to reading something new from them every week. No changes yet but I don't stress out quite as much now if my mash temp is a few degrees off.
 
I for one can verify this - In my fermentation freezer I would use a starsan spray to, in theory, suppress the drips and drops of wort here and there to prevent mold growth without having to move EVERYTHING. It straight out didn't work

How long do you think any sanitizer is effective??

If I sprayed the wall of my kegerator with bleach, I'd still end up with mold over time....it's an open system, and one that's pretty hospitable to mold growth.

Beer is not...
 
This is where his statistical significance calculations, which some don't like, come into play. He accounts for the small sample size by saying he would need a calculated number of people to detect a difference for it to be important. Regardless of whether his sample population is 10 or 100, you'll need the same percentage of people to detect the difference for it to actually matter.

Actually, that turns out not to be the case. The larger the sample size, the smaller the difference that can be deemed to be statistically significant.

Often people confuse statistical significance and size of difference. If you have, say, 10,000 people in your sample, almost any difference becomes statistically significant.

The problem is the magnitude of the difference deemed statistically significant. At some point, a sample size becomes large enough that 34 percent choosing the correct sample (out of 3) will be statistically significant. But the size of the effect is so small that who would care? Most people (assuming a reasonable sample size) can't tell the difference.

What Marshall has going for him (several things, actually, but this is one) is that he's using an objective criterion, one that is reproducible by others.
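That point is easy to demonstrate numerically. A quick sketch with assumed numbers (a 36 percent correct rate, barely above the 33.3 percent chance level, at made-up panel sizes):

```python
# Sketch: the same slightly-above-chance rate (36% correct vs. 33.3%
# expected from guessing) is nowhere near significant on a small panel
# but highly significant on a huge one. One-tailed exact binomial.
from scipy.stats import binom

for n in (25, 100, 1000, 10000):
    correct = round(0.36 * n)
    p = binom.sf(correct - 1, n, 1/3)  # P(at least this many by chance)
    print(f"n={n:>6}: {correct} correct ({correct/n:.1%}), p = {p:.4f}")
```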
 
Yeah, sample size in statistics is largely irrelevant, what matters is that your test population is representative of your overall population.

It turns out this is not the case. Sample size is just as important as the representativeness of population. Make a mistake in sample size, or make a mistake in representativeness of your population, and you have results that are compromised.

Most people, in my experience, do not understand what statistical significance means. In a nutshell, it tells us the probability of seeing a result at least as extreme as the one observed, assuming the null hypothesis is true--that is, assuming the result is due only to random chance.

In Marshall's case, for instance, it tells us the likelihood of getting the result assuming each rater is randomly choosing the odd man out beer sample.

The result, when statistically significant at, say, the .05 level, tells us that if people were choosing randomly we'd only see such a result less than 5 percent of the time. When the result, due to chance, is very unlikely (say a P value of .001), we're saying we no longer believe that the result is due to chance, but more likely represents a real difference.

***************

As you may have surmised, I teach this subject. One example I use to get this across is to imagine a local city council race. Martha Jones, candidate for city council, says she's going to win.

You take a random sample of 20 likely voters (small sample, but it helps illustrate the point).

How few Martha supporters would you have to see in a random sample of 20 likely voters for you to not believe Martha's claim?

If you had 10 out of 20, you'd probably say that this was consistent with her claim, for if she truly had 51 percent support in the voting population, 10 out of 20 would be....reasonable to expect.

But what if you had NO voters in favor of Martha in your sample of 20? Would you still believe her claim? I would not--I'd expect, in a sample of 20, to get at least SOME voters favoring her if she truly was destined to win.

So then, when I'm teaching this, I'll ask students at what threshold they'd start to doubt Martha's claim that she's going to win: is it 1 vote in favor of Martha? Two votes? Perhaps 3 or 4 or 5 votes?

Obviously each of us has a different sense of how few is enough to doubt her claim of victory. That's why we use statistical significance to make those determinations, rather than our gut. Significance allows an objective standard that others can reproduce, rather than a gut feeling the basis of which is impossible to know.

**************

I use 20 voters because I can also have students flip coins to simulate random votes for Martha or against her. When I do this I typically get 7, 8, 9, 10, 11, 12, 13 votes for or against her, plus the occasional 6 or 14 votes, and even 5 or 15. It shows how easily a sample of 20 drawn from a population split 50-50 can land far from the expected value of 10.

But the kicker is, I'll do this 10 times. Students are flipping coins, I'm joking with them that this is what their tuition buys, i.e., flipping coins. After doing it 10 times I have 10 samples of 20--but also one sample of 200 flips. The variation in percentages evident in samples of 20 disappears in a sample of 200. It's almost always within a couple percent of 50 percent.

It helps them see the effect of sample size on results--and why they're paying tuition to flip coins. :)
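For anyone who wants to run the classroom exercise without a bag of coins, here's a small simulation sketch; the seed and sample counts are arbitrary:

```python
# Simulating the exercise above: ten samples of 20 fair-coin flips,
# then the pooled sample of 200. The spread across samples of 20 is
# wide; the pooled percentage hugs 50%.
import random

random.seed(42)  # fixed seed for repeatability; remove for fresh draws
samples = [[random.random() < 0.5 for _ in range(20)] for _ in range(10)]

for i, s in enumerate(samples, 1):
    print(f"sample {i:>2}: {sum(s):>2}/20 heads ({sum(s)/20:.0%})")

heads = sum(sum(s) for s in samples)
print(f"pooled:    {heads}/200 heads ({heads/200:.1%})")
```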
 
The problem with your process is that it doesn't sound as if you're cleaning the drips and drops of wort, and you're just spraying them with star-san. That's not how it's intended to be used.

Star san is only effective when it's wet. If you spray and allow to dry, it becomes ineffective.
 
That person wasn't always a guest account. I'm making a guess here that, for whatever reason, his forum account was downgraded. I've exchanged PMs with him before.

I think the fact that even Five Star doesn't advertise it as an all-purpose sanitizer is key here. Just like Onestep is controversial, by the mechanism of action, Star-san is not a general purpose sanitizer. I for one can verify this - In my fermentation freezer I would use a starsan spray to, in theory, suppress the drips and drops of wort here and there to prevent mold growth without having to move EVERYTHING. It straight out didn't work - I've now had to do deep cleaning of my freezer to remove mold a few times. 3 gallons of boiling PBW solution to an empty freezer plus a broom to slosh it around.

I hope you're not saying that StarSan doesn't work. You can't sanitize dirt. Spraying StarSan on wort spills doesn't sanitize it. Not wanting to derail the thread here.
I do also like the xbmt's, and he gives a different perspective that gets people thinking.
 
I used to do short boils (down to 15 minutes) but stopped, following common advice on the internet. Short boils worked out just fine and produced clear beer without fining (or about as clear as my longer boils). After reading xBmts I've gone back to the occasional short boil.
 
It's helped me relax when mistakes happen, and to consider alternatives when those alternatives might benefit my needs. I recently went to a vitality starter when a brew day popped up unexpectedly, and it turned out wonderfully. The experiment commentaries are also quite enjoyable.
 
Actually, now that I think about it more, the site has inspired me to NEVER try any sort of brewing experiment and report the results, since people will $h!t on it no matter how deep an understanding of statistics I have.

I have not read all of it, but this sums up the thread: there are fixed beliefs that disregard the long history and the many different ways beer has been made.

But looking at the thread, one sees conservative and liberal votes.

For me, I look at his scientific methodology. He bases his conclusions on a described methodology; if he had to submit a master's thesis at a university, he would pass because of his methodology, regardless of his sample size.

Does he challenge convention? Yes, and his conclusions bring merit to his challenge.

For the liberals, try his conclusions. For the conservatives, your processes and input are valuable; describe them as alternatives to Bru's findings.
 
It turns out this is not the case. Sample size is just as important as the representativeness of population. Make a mistake in sample size, or make a mistake in representativeness of your population, and you have results that are compromised.

Most people, in my experience, do not understand what statistical significance means. In a nutshell, it tells us the probability of seeing a result at least as extreme as the one observed, assuming the null hypothesis is true--that is, assuming the result is due only to random chance.

In Marshall's case, for instance, it tells us the likelihood of getting the result assuming each rater is randomly choosing the odd man out beer sample.

The result, when statistically significant at, say, the .05 level, tells us that if people were choosing randomly we'd only see such a result less than 5 percent of the time. When the result, due to chance, is very unlikely (say a P value of .001), we're saying we no longer believe that the result is due to chance, but more likely represents a real difference.

***************

As you may have surmised, I teach this subject. One example I use to get this across is to imagine a local city council race. Martha Jones, candidate for city council, says she's going to win.

You take a random sample of 20 likely voters (small sample, but it helps illustrate the point).

How few Martha supporters would you have to see in a random sample of 20 likely voters for you to not believe Martha's claim?

If you had 10 out of 20, you'd probably say that this was consistent with her claim, for if she truly had 51 percent support in the voting population, 10 out of 20 would be....reasonable to expect.

But what if you had NO voters in favor of Martha in your sample of 20? Would you still believe her claim? I would not--I'd expect, in a sample of 20, to get at least SOME voters favoring her if she truly was destined to win.

So then, when I'm teaching this, I'll ask students at what threshold they'd start to doubt Martha's claim that she's going to win: is it 1 vote in favor of Martha? Two votes? Perhaps 3 or 4 or 5 votes?

Obviously each of us has a different sense of how few is enough to doubt her claim of victory. That's why we use statistical significance to make those determinations, rather than our gut. Significance allows an objective standard that others can reproduce, rather than a gut feeling the basis of which is impossible to know.

**************

I use 20 voters because I can also have students flip coins to simulate random votes for Martha or against her. When I do this I typically get 7, 8, 9, 10, 11, 12, 13 votes for or against her, plus the occasional 6 or 14 votes, and even 5 or 15. It shows how easily a sample of 20 drawn from a population split 50-50 can land far from the expected value of 10.

But the kicker is, I'll do this 10 times. Students are flipping coins, I'm joking with them that this is what their tuition buys, i.e., flipping coins. After doing it 10 times I have 10 samples of 20--but also one sample of 200 flips. The variation in percentages evident in samples of 20 disappears in a sample of 200. It's almost always within a couple percent of 50 percent.

It helps them see the effect of sample size on results--and why they're paying tuition to flip coins. :)


You're conflating different scenarios there though. Your coin flipping situation assuming a fair coin should give you a 50:50 result, and the larger your sample size the closer you get to the true result. You get a normal distribution around the mean, and your standard deviation drops as your sample size increases.

This is what statistics is all about, trying to determine a fact about a general population using a sample of that population.

Imagine doing the same coin flipping experiments with an unfair coin. You'd still see some scatter in the results but they would be distributed around a different mean, and when you summed your sets of 20 results you'd see the true mean a lot more clearly. However I'd hazard that your students would very quickly pick up that you'd given them unfair coins too.

What brulosophy is doing is evaluating whether enough people can select the different beer from a triangle test at p=0.05. His null hypothesis is that the beers under test are not different. Therefore if he hits significance there is less than a 5% chance that people selected the different beer by guessing randomly from the three samples presented.

So if the beers are not different, then as the sample size goes up we would expect the results to get closer to an even 33.3% split for each option, which is where your coin-flipping example comes in, as the experimental result gets closer to the true result. If there really was a different beer, you'd expect the distribution to narrow around the different sample.

Obviously you can have too small a sample: if you just flipped a coin twice you'd have a pretty good chance of getting two heads or two tails, and from that you could reject the null hypothesis that the coin is fair. However, even six flips is enough of a sample to hit significance, and ten allows fairly solid conclusions to be drawn. With more degrees of freedom you need a larger sample, but not nearly as large as people think.
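As a sanity check on those flip counts, here's a tiny sketch of the exact one-tailed p-value for an all-heads run of each length (my own illustration, not data from the thread):

```python
# For a fair coin, p-value of seeing all n flips come up heads:
# P(X >= n) = 0.5**n under the null that the coin is fair (one-tailed).
from scipy.stats import binom

for n in range(2, 11):
    p = binom.sf(n - 1, n, 0.5)  # equals 0.5**n
    print(f"{n} flips, all heads: p = {p:.4f}")
```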
 
Martha sucks at votes

So basically in the end the p value that he introduces is trying to 'adjust' the sample size to what the expected results would be if he had a large sample size? (Obviously adjust your response to someone who never took statistics and hates math.) So in the dry hop vs. whirlpool there was very little chance that people were randomly selecting the odd beer out, and if we had a large sample size it's highly likely that we would get the same results?

Is it only when the results are very close that the P value breaks down? Say if he needed twenty people to select the odd beer out, and only 19 did. Would it be harder to extend the same results over a larger sample size?

If your answer is "it depends" I'll remember why you math nerds are mean people.
 
So basically in the end the p value that he introduces is trying to 'adjust' the sample size to what the expected results would be if he had a large sample size? (Obviously adjust your response to someone who never took statistics and hates math.) So in the dry hop vs. whirlpool there was very little chance that people were randomly selecting the odd beer out, and if we had a large sample size it's highly likely that we would get the same results?

Is it only when the results are very close that the P value breaks down? Say if he needed twenty people to select the odd beer out, and only 19 did. Would it be harder to extend the same results over a larger sample size?

If your answer is "it depends" I'll remember why you math nerds are mean people.

Sort of. The point of statistics is to characterise a population from a sample of that population. For example, you can't ask everyone how they plan to vote in any given election, but if you pick a sample which is representative of the population you can ask them and use the data to predict what the population will do within a certain margin of error.

Therefore, when Brulosophy experiments hit statistical significance, there is less than a 5% chance that people were simply guessing which beer was different; more likely, they were selecting based on something other than guessing. So yes, we would assume that if we asked 100 times more people we would still get the same result.

If a result fails to hit significance then from a purely statistical standpoint, it is somewhat more likely that increasing the sample size will in fact push the result away from significance.

As you increase your sample size, the closer your result will get to the real mean value. When we start talking about sensory differences then this does get complicated, as you are introducing a much more complex and subjective measure into the numbers.
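To put rough numbers on the "needed twenty, got 19" scenario: with an assumed 50-taster panel (my invented numbers, not an actual xBmt), the p < 0.05 line sits near 23 correct, and one taster either side moves the p-value like this:

```python
# Sketch: how one taster either side of the significance threshold moves
# the p-value. Panel of 50, triangle-test null of 1/3 correct by chance.
from scipy.stats import binom

n = 50  # assumed panel size, for illustration only
for correct in (22, 23, 24):
    p = binom.sf(correct - 1, n, 1/3)  # P(at least this many by chance)
    print(f"{correct}/{n} correct: p = {p:.3f}")
```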
 
You're conflating different scenarios there though. Your coin flipping situation assuming a fair coin should give you a 50:50 result, and the larger your sample size the closer you get to the true result. You get a normal distribution around the mean, and your standard deviation drops as your sample size increases.

This is what statistics is all about, trying to determine a fact about a general population using a sample of that population.

Imagine doing the same coin flipping experiments with an unfair coin. You'd still see some scatter in the results but they would be distributed around a different mean, and when you summed your sets of 20 results you'd see the true mean a lot more clearly. However I'd hazard that your students would very quickly pick up that you'd given them unfair coins too.

What brulosophy is doing is evaluating whether enough people can select the different beer from a triangle test at p=0.05. His null hypothesis is that the beers under test are not different. Therefore if he hits significance there is less than a 5% chance that people selected the different beer by guessing randomly from the three samples presented.

So if the beers are not different, then as the sample size goes up we would expect the results to get closer to an even 33.3% split for each option, which is where your coin-flipping example comes in, as the experimental result gets closer to the true result. If there really was a different beer, you'd expect the distribution to narrow around the different sample.

Obviously you can have too small a sample: if you just flipped a coin twice you'd have a pretty good chance of getting two heads or two tails, and from that you could reject the null hypothesis that the coin is fair. However, even six flips is enough of a sample to hit significance, and ten allows fairly solid conclusions to be drawn. With more degrees of freedom you need a larger sample, but not nearly as large as people think.

No, I'm not conflating anything. I'm trying to explain how inferential statistics works.

Not everyone is clear on it. When I was learning this all those years ago, I almost had to go to a dark and quiet closet to think about it. The logic is pure and beautiful, but it's not something that necessarily connects to everyday experience.

For instance, when you say above "and the larger your sample size the closer you get to the true result" that is what you'd expect over many trials, perhaps, but it's not true as said--more correctly would be to say "and the larger your sample size the closer you are likely to get to the true result."

There are no guarantees--all you can do is speak in terms of likelihood.

You go on to note some things that are not really relevant to the issue here, such as deciding to change the illustration to an unfair coin. My example uses what it uses to illustrate the point. What you're describing--and it's ok how you do it--is noting what a standard error illustrates. Repeated and large numbers of samples from any population will produce sample statistics that tend to cluster around the real population parameter.

You also note this: "Therefore if he hits significance there is less than a 5% chance that people selected the different beer by guessing randomly from the three samples presented." That is not really true either. What it indicates is that if people were guessing randomly, such a result would have been obtained less than 5 percent of the time. Thus we are faced with the choice: do we believe that there truly is a discernible difference, or that the result is due to chance?

Brulosopher is doing a nice job in trying to bring as much analysis to bear as he can--and no more.

And FWIW: I checked his results on one of the exbeeriments to see if he was calculating it right. He was.
 
So basically in the end the p value that he introduces is trying to 'adjust' the sample size to what the expected results would be if he had a large sample size? (Obviously adjust your response to someone who never took statistics and hates math.) So in the dry hop vs. whirlpool there was very little chance that people were randomly selecting the odd beer out, and if we had a large sample size it's highly likely that we would get the same results?

Is it only when the results are very close that the P value breaks down? Say if he needed twenty people to select the odd beer out, and only 19 did. Would it be harder to extend the same results over a larger sample size?

If your answer is "it depends" I'll remember why you math nerds are mean people.

The formula for calculating the z-value--which you then look up in a table of normal curve values--has two "pieces" which affect the size of that z-value.

Here it is (forgive my poor graphics abilities):

z = (P-hat - P) / sqrt(P*Q / n)

In the formula, P-hat (the one w/ the little hat over it :) ) is the sample proportion; P is the expected proportion if the null hypothesis were true (in this case .3333), Q is 1-P, and n is the size of the sample.

So, the actual difference in proportions influences the size of Z--the larger the difference, the bigger the Z.

Also, the sample size n influences the size of the Z--the larger the sample size, the larger the Z.

So when you ask "So basically in the end the p value that he introduces is trying to 'adjust' the sample size to what the expected results would be if he had a large sample size?" you're in a way describing what happens, but Brulosopher isn't doing it--the formula is.

In essence the formula is accounting for this: everything else being equal, the larger the sample size, the larger the Z value. It's saying, if you want to use that term, that it's harder to get a particular difference--just by random chance--the larger the sample size.

So when you ask--and it's not a terrible question--"Is it only when the results are very close that the P value breaks down?", the answer is that it's really not the right question. The p-value is what it is. The formula automatically corrects for the size of the sample, and the p-value is what it is.
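That formula is a one-liner to transcribe. A sketch with invented numbers (the same 40 percent correct rate at three sample sizes) shows the sample-size effect directly:

```python
# Direct transcription of the z formula above: P-hat is the observed
# proportion, P the null proportion (1/3 for a triangle test), Q = 1 - P.
import math

def z_value(correct, n, p_null=1/3):
    """z per the formula above: (P-hat - P) / sqrt(P*Q/n)."""
    p_hat = correct / n
    q = 1 - p_null
    return (p_hat - p_null) / math.sqrt(p_null * q / n)

# Same 40% correct rate at three sample sizes: z grows with n.
for correct, n in ((10, 25), (40, 100), (400, 1000)):
    print(f"{correct}/{n}: z = {z_value(correct, n):.2f}")
```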
 
I won't drop another long post into this thread. Simply put what you have described reinforces my original point that sample size is relatively inconsequential.
 
I won't drop another long post into this thread. Simply put what you have described reinforces my original point that sample size is relatively inconsequential.

Well, you're welcome to believe whatever you wish.

Oddly, everybody seems to want to focus on the statistics; almost nobody is talking about the representativeness of the sample, which probably deserves more attention.

That's why I keep describing this in terms of if the sample were guessing randomly--not about in terms of the population, because I don't know what that population is. It's an agglomeration of friends and acquaintances of Brulosopher, a convenience sample if you will.

Some are beer judges, some are craft brew nuts, some are regular joes. To what population may we say they are generalizable? The samples, when they include beer judges, include a far larger proportion of beer judges than in the general population of beer drinkers or even craft brewers.

This isn't to denigrate what he's doing--far from it. Just that he's doing what he can with what he has, and using an objective criterion to decide if whatever he's testing matters or not.

And frankly, there's another element of this I don't recall seeing discussed: a lot of his testers can't discern any difference. I'd love to see a breakdown by who can tell and who cannot. Are beer judges more successful at discerning differences? Regular joes? No difference? I'm not what is known as a "super taster"; can we tell how many of the testers are super tasters? Maybe they're the only ones who can tell? More questions..... :)
 
Well, you're welcome to believe whatever you wish.

Oddly, everybody seems to want to focus on the statistics; almost nobody is talking about the representativeness of the sample, which probably deserves more attention.

That's why I keep describing this in terms of if the sample were guessing randomly--not about in terms of the population, because I don't know what that population is. It's an agglomeration of friends and acquaintances of Brulosopher, a convenience sample if you will.

Some are beer judges, some are craft brew nuts, some are regular joes. To what population may we say they are generalizable? The samples, when they include beer judges, include a far larger proportion of beer judges than in the general population of beer drinkers or even craft brewers.

This isn't to denigrate what he's doing--far from it. Just that he's doing what he can with what he has, and using an objective criterion to decide if whatever he's testing matters or not.

And frankly, there's another element of this I don't recall seeing discussed: a lot of his testers can't discern any difference. I'd love to see a breakdown by who can tell and who cannot. Are beer judges more successful at discerning differences? Regular joes? No difference? I'm not what is known as a "super taster"; can we tell how many of the testers are super tasters? Maybe they're the only ones who can tell? More questions..... :)


That last part has been discussed. It's even on some of his experiments. No difference between the judges and the joes.
 
I haven't read all the xbeeriments yet, but I don't mind anymore if some of the kettle trub enters my fermentor. So I must admit he has affected my brewing.
 
I usually skip almost the entire article and go straight to the testers' opinions. (I skip most of the process, the triangle test stuff, etc.)
This made me much more laid-back about stuff like mashing and simplified my brewdays. Water treatment made my brewday complicated enough anyway (boil softening, pH measurements, etc.).
 