In search of...predictable mash efficiency for differing batch sizes & gravities


stealthfixr

I am relatively new to competitions and often try very hard to 'hit' the Brewfather-predicted OG & FG. It makes me very happy when a beer turns out pretty close to, or exactly as, predicted--and brewers often mention that predictability and consistency are cornerstones of great brewing. I was lucky enough to brew a beer for the second round of NHC, and was greatly dismayed when my mash efficiency came in off-the-chart high. Couple that with an Imperial Stout a few months later that did not turn out so imperial.

Despite switching mashing equipment around a few times, I noticed, frustratingly, that higher gravities were hard to nail accurately, while lower-gravity batches would come in higher than predicted. And that frustration remained whether I was using a Brewzilla 3.1 65L, a Brewzilla Gen4 65L, or a homemade BIAB eRIMS/propane setup for mashing. Big batches had lower efficiency, which is nothing new; the problem was that I had no way to predict mash efficiency accurately.

So, I started tracking in Excel to see what correlated with what, hoping to find 'something' to predict with. While it's not perfect, below is about the last year-ish of brewing:
[Attached image: Mash Efficiency Table]


And, if I take the grain weight/batch size ratio and graph it with mash efficiency, it looks like this:
[Attached image: Mash Efficiency Graph]


Not perfect, but using that line to predict efficiency has helped tremendously, and I am now 'hitting' my OGs with much more success. The real test was a recent American Strong Ale with 20.25 lb of malt, which in the past would have meant missing the OG by 5+ points; this time around I hit it exactly. I don't mind the lower efficiency so much as not knowing how to get to my OG accurately despite the batch differences.
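If it helps anyone, here is a minimal sketch of the same idea in Python rather than Excel--the numbers below are made-up placeholders, not my actual table:

import numpy as np

# Hypothetical placeholder data -- substitute your own brew log values.
ratio = np.array([1.4, 1.8, 2.1, 2.5, 2.9, 3.3])    # lb of grain per gallon of batch
efficiency = np.array([82, 80, 77, 74, 70, 66])      # mash efficiency, %

# Least-squares straight line: efficiency = slope * ratio + intercept
slope, intercept = np.polyfit(ratio, efficiency, 1)

# Predict efficiency for the next recipe, e.g. 20.25 lb into a (hypothetical) 6.5 gal batch
next_ratio = 20.25 / 6.5
print(f"predicted mash efficiency: {slope * next_ratio + intercept:.1f}%")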

Along the way, I found that (1) consistent and effective milling is critical (I'm anxiously awaiting a Spike Mill for that reason), and (2) enzymes like Glucabuster in the mash help a great deal, especially with an AIO unit and little to no batch sparging (just pouring water over the pulled malt pipe). There are almost two distinct groupings of efficiency because of Glucabuster. And I'm almost starting to see mashing equipment as all fairly the same--it is, after all, holding malted grain and water at a certain temp for a length of time, and all of them do this with relatively 'minor' variations to me. I'll pay for consistency and an easier brew day, but otherwise most mashing equipment seems to produce the same or similar result. YMMV.

Just curious if anyone else has struggled with this challenge and how they have solved it. Or, if one of you has a better methodology to start tracking. What say you, brew experts?

And, most of all, I hope this helps some other brewer with the common "My efficiency is low!" question.
 
You need to crack the grain properly. Hit gelatinization temp, nail pH and lauter efficiently. A recipe will not give you this information, a COA might not tell you this, but experience will with good notes and accurate measurements.
 
That "Glucabuster" stuff is interesting - it's not what I expected at all.

I agree with ^that^ - I totally owe where I am today wrt brewing to keeping records early on. I've been using Beersmith since 2007 (v1.4), started milling my own grains at the same time, and have always been fairly rigorous about recording each batch's metrics. And when I built my current 3v2p single tier, I kept even tighter track of volumes so I could tune the Beersmith Equipment Profile, which is at the core of how it works.

At this point, years (and now version 3.something) later, and after creating modified Equipment Profiles for low/medium/high-hopped recipes to account for the corresponding losses to hops and get even tighter results for each batch, I routinely hit the predicted OG into the fermentors within a point, and the final packaged volume within a quart across a 10 gallon batch.

As a (retired) engineer I find that all quite satisfying :)

Cheers!
 
Your efficiencies are affected by quite a few things. The accuracy of your volume measurements at every stage of your brew and the size of your grain crush are some of the big ones. The methods you use to brew and your equipment, as well as your skill and artistry, also play a part.

I am honestly surprised there were no comments to this, good/bad or indifferent. Not useful, or no other thoughts on this?

The way your initial post seemed to want to reduce it all to a simple formula or spreadsheet did sort of make me not want to respond initially.
 
I will address your figure. It would be helpful to have your grain-weight/batch-size ratio on the horizontal axis and the mash efficiency on the vertical axis. It's just customary for the independent and dependent variables, and we could also see how well the fitted equation fits from its R-squared value. You wouldn't have to rearrange the equation either; as it stands, the one produced by Excel is predicting the ratio rather than the efficiency.

Next, ratios can be a little harder to work with because changes in the value of the ratio are driven by two measurements, here weight over volume. You can increase the value of the ratio by increasing the numerator or by decreasing the denominator. I think perhaps you were trying to make it two-dimensional to fit the line, but you could instead fit a multiple linear regression with both batch size and grain weight, and even include an interaction term between them. A significant interaction term would mean that the effect of grain weight on efficiency depends on the value of batch size. If it's not significant, the effect of grain weight on efficiency is independent of batch size (the two effects are simply additive).
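A rough sketch of what that multiple regression would look like in Python with statsmodels (hypothetical column names and values standing in for your spreadsheet):

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for your spreadsheet -- replace with the real columns.
df = pd.DataFrame({
    "grain_lb":   [10.0, 12.5, 15.0, 18.0, 20.25, 11.0, 22.0, 14.0],
    "batch_gal":  [5.5,  6.0,  6.5,  11.0, 6.5,   5.5,  11.0, 6.0],
    "efficiency": [82,   79,   76,   78,   68,    81,   70,   77],
})

# "grain_lb * batch_gal" expands to both main effects plus their interaction term.
model = smf.ols("efficiency ~ grain_lb * batch_gal", data=df).fit()
print(model.summary())      # coefficients, p-values for each term, R-squared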

But, ignoring the positioning of the axes, you would still see what is called a megaphone pattern in the data cloud. That pattern suggests heteroscedasticity of the variance. It's minor, but notice that for high values of your ratio you have more scatter around your fitted line. There is more variability in your observations at greater ratio values; your point cloud tightens up around the line for ratio values of 1.5-2.5. I also think a curve might fit a little better, but it would be really helpful to have your independent variable (the ratio) on the horizontal axis and the efficiency on the vertical. There's actually a lot of variability you may be missing. For instance, your ratio values close to 2 range from 65-85% efficiency.
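If you want to check that more formally than by eye, a Breusch-Pagan test on the residuals is one option; here is a small sketch (again with placeholder numbers, not your data):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Placeholder ratio (lb/gal) and efficiency (%) values -- not your real numbers.
ratio = np.array([1.4, 1.6, 1.9, 2.1, 2.4, 2.7, 3.0, 3.3])
efficiency = np.array([83, 80, 79, 77, 75, 72, 69, 65])

X = sm.add_constant(ratio)                  # design matrix with an intercept column
results = sm.OLS(efficiency, X).fit()

# Null hypothesis: the residual variance is constant (no megaphone).
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(results.resid, X)
print(f"Breusch-Pagan p-value: {lm_pvalue:.3f}")    # a small p-value suggests heteroscedasticity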

Please do exchange the axes. It's hard to visualize because it takes a rotation and mirror flip in my head to properly understand.
 
I spent some time with your data...

I find efficiency on both your system and mine to follow a bent-line relationship (like an L shape but not as sharp an angle), with a corner at about 1.065. Below an intended or actual OG of 1.065, average efficiency is constant (for you, about 77%). Beyond that corner, efficiency falls by a few percent for every 5 gravity points past 1.065 (you lose about 4% per 5 points, so at about 1.070 expect 73% efficiency, at 1.075 expect 69%, and so on). On my own system, I lose closer to 5% for every 5 points, or 1 for 1.
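If it helps, that rule of thumb is easy to write down as a little hinge function; the Python sketch below just encodes the 77% plateau, the 1.065 corner, and the 4%-per-5-points slope estimated above (your own numbers would differ):

def predicted_efficiency(og, base_eff=77.0, corner=1.065, loss_per_5pts=4.0):
    """Bent-line estimate: flat below the corner OG, falling linearly above it."""
    if og <= corner:
        return base_eff
    points_past_corner = (og - corner) * 1000     # gravity points past the corner
    return base_eff - loss_per_5pts * points_past_corner / 5.0

print(f"{predicted_efficiency(1.050):.1f}%")   # 77.0%
print(f"{predicted_efficiency(1.070):.1f}%")   # 73.0%
print(f"{predicted_efficiency(1.075):.1f}%")   # 69.0%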

The other thing I noticed about your data is that you have a few batches that are outliers and don't follow the general rules above. Something must have gone wrong with a few batches. Could be any number of reasons. But for purposes of analyzing trends, outliers need to be thrown out and ignored entirely, when they do not make any sense among the rest of the data. Common thing with data analysis.

Hope this helps. For those interested, I recommend analyzing your own data using a bent line with a corner at roughly 1.065 to 1.067, the corner is in there somewhere, just the way it is. Below that, your efficiency should be pretty constant, on average. Above that, if you don't change your process at all... efficiency is going to fall as your intended or actual OG increases. But if you want, you can take special actions to avoid this, such as by sparging a lot extra and boiling off a lot longer (several hours).

Cheers all.

EDIT:

My own data looks a little something like this. Others are similar.

[Attached chart: my own efficiency vs. OG data]
 

The pattern you are suggesting is a segmented linear model. It may be a good fit. Deciding the join point (corner) ahead of time isn't always necessary; it could be fit by adding another parameter to the estimation. I would suggest trying that first, although sometimes it creates problems with the fitting algorithms.
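For illustration, here's a rough sketch of fitting the corner as a free parameter with SciPy's curve_fit (placeholder data standing in for a real brew log; as noted, the non-smooth corner can sometimes trip up the optimizer):

import numpy as np
from scipy.optimize import curve_fit

def bent_line(og, base_eff, corner, slope):
    """Flat at base_eff below the corner, dropping `slope` % per gravity point above it."""
    return np.where(og <= corner, base_eff, base_eff - slope * (og - corner) * 1000)

# Placeholder (OG, efficiency %) observations -- not anyone's real brew log.
og = np.array([1.045, 1.050, 1.055, 1.060, 1.065, 1.072, 1.080, 1.090])
eff = np.array([78.0, 77.0, 76.0, 78.0, 77.0, 71.0, 65.0, 57.0])

# Initial guesses: 77% plateau, corner near 1.065, ~0.8% lost per gravity point.
params, _ = curve_fit(bent_line, og, eff, p0=[77.0, 1.065, 0.8])
base_eff, corner, slope = params
print(f"plateau {base_eff:.1f}%, corner {corner:.3f}, {slope:.2f}% lost per point")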

Outliers should not just be thrown out without good consideration. There are diagnostic tests that can be run to determine if outliers are present or not. (Maybe you did that here but you didn't even present the figure for the OP's data.) Doing that for someone else's data based on your own preconceived estimation of the underlying relationship is not common. Also, consider that you may have overlooked doing so for your own data! There are multiple questionable data points in your figure. For instance, there is a point at ~1.042, 0.96, the two points in the low 50's in efficiency, and perhaps even the point close to 1.100. I would also note there are 6 stacked points at about 1.056 that range from about 75-95%. Why are those efficiencies so far apart? The model is not going to predict well given the wide range of points around the segments. A prediction for the efficiency using this model should also include a confidence interval around the estimate, and I would think it would be a bit wider than what people would want.
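As an example of the kind of diagnostics I mean, Cook's distance and studentized residuals can flag influential points in a few lines (Python/statsmodels sketch, placeholder data only):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence

# Placeholder (OG, efficiency %) observations -- swap in a real brew log.
og = np.array([1.045, 1.050, 1.056, 1.056, 1.060, 1.065, 1.072, 1.080, 1.100])
eff = np.array([78.0, 77.0, 75.0, 95.0, 78.0, 77.0, 71.0, 65.0, 90.0])

results = sm.OLS(eff, sm.add_constant(og)).fit()
influence = OLSInfluence(results)

cooks_d, _ = influence.cooks_distance               # how much each point pulls on the fit
student = influence.resid_studentized_external      # externally studentized residuals
for i, (d, r) in enumerate(zip(cooks_d, student)):
    flag = "  <-- worth a closer look" if abs(r) > 2 or d > 4 / len(og) else ""
    print(f"batch {i}: Cook's D = {d:.2f}, studentized residual = {r:.2f}{flag}")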

Your data is particularly sparse above 1.080, and that makes the fit there not particularly strong. While a segmented linear model may be the best fit, it's also possible that a different type of function/curve could fit better; for instance, that last data point to the right would surely have pulled up the regression line for a polynomial of degree greater than one. Another thing to consider is that there is more than one important variable at play. Forcing the relationship to be determined by a single variable may not be the best model, particularly given the many different parts of the process.
 
Oh boy, a great opportunity for a DMAIC project! :p
 
Sorry... Here's the OP's data with a segmented linear model:

[Attached chart: the OP's data with a segmented linear fit]


With my own data shown previously, I'll admit it's just a quick dump of 16.5 years of data. I didn't throw out all the outliers, of which, yes, there are many. My brewing process has also definitely changed multiple times over that long period, and I'm not going to bother right now with breaking it down into smaller pieces. I was just trying to show that while, yes, it looks a little like a shotgun pattern, it also generally stays within a certain band up until around 1.065, which seems to be a breaking point--and not just for me but for others as well. I find that a curve just doesn't work very well for this relationship IMO; it seems to be more of a segmented linear thing. Not sure exactly why, but I've seen the same thing in other data sets over the years.

Yep, the data in both cases is sparse at high gravities; I just don't brew that many high-gravity beers anymore. And the one really high point near 1.100 OG really shouldn't be on my graph--it's an outlier because I mashed 2 hours, then sparged a lot extra and boiled 2 hours (instead of just 1 hour each). Not sure why I left that one on the graph, oops; it's obviously not typical of my other brew days.

Maybe the OP's data set is better than mine. So now you have it, using the same format as mine. Enjoy.
 
It fits the OP's data better, and as you said you have changed methods over time, which would likely have influenced variability (if you have improved, which I am totally fine with as an assumption). One type of function I was thinking of that might also fit is something asymptotic, like a growth function. They can be useful when there are natural bounds, which exist here at 0 and 100%. The data isn't bumping up against 100% (or 0), but the models don't have to have intrinsic measurement bounds to use them, since the asymptote can be solved for.
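As a sketch of the asymptotic idea, something like an exponential decay toward an estimated floor could be fit with curve_fit (toy numbers only, and the 1.040 reference point is arbitrary):

import numpy as np
from scipy.optimize import curve_fit

def decay_to_floor(og, top, floor, rate):
    """Efficiency decays from `top` toward an asymptotic `floor` as OG climbs."""
    points = (og - 1.040) * 1000          # gravity points above an arbitrary 1.040 reference
    return floor + (top - floor) * np.exp(-rate * points)

# Toy (OG, efficiency %) values, purely illustrative.
og = np.array([1.045, 1.050, 1.055, 1.060, 1.065, 1.072, 1.080, 1.090, 1.100])
eff = np.array([78.0, 77.0, 76.0, 75.0, 73.0, 70.0, 66.0, 62.0, 60.0])

params, _ = curve_fit(decay_to_floor, og, eff, p0=[78.0, 50.0, 0.03])
top, floor, rate = params
print(f"starts near {top:.1f}%, asymptote around {floor:.1f}%")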

The relationship between efficiency and OG is more constant in the figure you made than the OP felt they had observed. The first three points, with the lowest gravities, are on the lower side of efficiency; they may still have fit a straight line, however.

Taking a second look at what the OP asked, I think they were simply trying to standardize the grain bill with a grain-per-gallon value. I was mainly looking at this from a modeling perspective given what was suggested; I'm not well versed enough in all the mechanics to suggest which variables would be best to consider. Another possibility that might take specific conditions into account is something like decision trees / random forests. Something like that might better address the categorical nature of the batch sizes and possibly other variables. It's more of a black box, but if the OP just wants an efficiency value, it's a possible way to get it! One variable not mentioned is mash temperature. Were these all the same, or were you trying for full body, medium body, or light body for different brews? This is likely confounded with style, as I tend to aim for the same body by style in most cases, but not always.
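A rough sketch of that last idea with scikit-learn (hypothetical column names and values; the OP would swap in their actual spreadsheet):

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical brew log -- substitute the OP's actual spreadsheet columns.
log = pd.DataFrame({
    "grain_lb":    [10.0, 12.5, 15.0, 18.0, 20.25, 11.0, 22.0, 14.0, 13.0, 16.5],
    "batch_gal":   [5.5,  6.0,  6.5,  11.0, 6.5,   5.5,  11.0, 6.0,  5.5,  6.5],
    "mash_temp_f": [152,  154,  150,  152,  148,   156,  152,  150,  154,  151],
    "efficiency":  [82,   79,   76,   78,   68,    81,   70,   77,   80,   74],
})

X = log[["grain_lb", "batch_gal", "mash_temp_f"]]
y = log["efficiency"]

# Tiny data set, so treat the output as a rough guide rather than gospel.
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

next_batch = pd.DataFrame({"grain_lb": [20.25], "batch_gal": [6.5], "mash_temp_f": [152]})
print(model.predict(next_batch))                              # predicted efficiency, %
print(dict(zip(X.columns, model.feature_importances_)))       # which inputs matter most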
 