Shouldn't the spreadsheet be automatic?

It is possible to automate the acid addition, and my spreadsheet Q-Water does that. You can find it on this forum. It is still in development though.

There are a number of problems with automating the mineral additions.
First, multiple minerals can be used to increase a given ion. Second, all the ions together should produce a net charge of 0.

It might be possible to bypass that if you leave Ca2+ and HCO3- unconstrained.

Then the algorithm could be like this:
MgSO4 will be used to increase Mg2+
CaSO4 will be used to increase SO4-
CaCl2 will be used to increase Cl-
NaHCO3 or NaOH will be used to increase Na+

Whatever HCO3- resulted after this would be taken care of by acid addition, most likely phosphoric. Ca2+ is hardly a problem even at high concentrations.
The only clash may occur here if you want SO4- lower than what the MgSO4 adds. But then it can tell you your minimum SO4- for your chosen level of Mg2+.
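Just to make the idea concrete, here is a rough sketch of that greedy assignment in Python (my own illustration, not Q-Water or any existing spreadsheet), assuming anhydrous salts and approximate molar masses:

```python
# Greedy salt assignment: each target ion increase (ppm = mg/L) is mapped to
# one salt in the order described above; Ca2+ and HCO3- are left unconstrained.
MOLAR = {"Mg": 24.31, "Ca": 40.08, "Na": 22.99,
         "SO4": 96.06, "Cl": 35.45, "HCO3": 61.02}

def salt_plan(target):
    """target: desired ppm increases for Mg, SO4, Cl and Na."""
    plan = {}

    # 1) MgSO4 sets Mg2+; note how much SO4 it drags along.
    mgso4_mw = MOLAR["Mg"] + MOLAR["SO4"]
    plan["MgSO4"] = target["Mg"] * mgso4_mw / MOLAR["Mg"]
    so4_from_mg = plan["MgSO4"] * MOLAR["SO4"] / mgso4_mw

    # 2) CaSO4 makes up the remaining sulfate, if any.
    so4_left = target["SO4"] - so4_from_mg
    if so4_left < 0:
        raise ValueError(f"minimum SO4 for this Mg level is {so4_from_mg:.1f} ppm")
    plan["CaSO4"] = so4_left * (MOLAR["Ca"] + MOLAR["SO4"]) / MOLAR["SO4"]

    # 3) CaCl2 sets chloride.
    plan["CaCl2"] = target["Cl"] * (MOLAR["Ca"] + 2 * MOLAR["Cl"]) / (2 * MOLAR["Cl"])

    # 4) NaHCO3 sets sodium; the bicarbonate it brings is later removed with acid.
    plan["NaHCO3"] = target["Na"] * (MOLAR["Na"] + MOLAR["HCO3"]) / MOLAR["Na"]

    return plan  # mg of each salt per litre of water

print(salt_plan({"Mg": 10, "SO4": 50, "Cl": 50, "Na": 20}))
```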
 
I'd argue that choosing something like the sum of squared errors as the function of interest here is more a matter of mathematical convenience and historical precedent than something with much physical meaning for the problem. The squared error makes an inherent statistical assumption that the noise in the model is Gaussian.
I think you're confusing the two types of problems we have discussed. In the first we have a set of parameters, x (the amounts of some number of salts), a model that shows how they relate to a set of quantities y which are of interest (concentrations of ions), and a set of desired concentrations for those ions, call it d. If we choose an algorithm that minimizes |Ax - d| we get the mmse solution, the one that minimizes the Euclidean distance between the ion concentrations we want and the closest ones we can achieve.

The Euclidean distance has obvious intuitive appeal but it has limitations. If we have two points on the surface of the earth at (x1,y1) and (x2,y2) and wish to proceed from one to the other, the minimum distance we would have to traverse is sqrt( (x1-x2)^2 + (y1-y2)^2 ), the Euclidean distance, but if we are forced to walk on streets laid out in a N-S, E-W grid we would have to walk farther than the Euclidean distance and another measure, |x1 - x2| + |y1 - y2|, is more suitable. If we want to compare the color of two batches of beer we measure their colors in L*a*b* space and compute the Euclidean distance between them. This is good for beer (as its colors are restricted to one quadrant of Lab space) but if we were in, say, the paint business we might decide that the Euclidean measure wasn't the best and use one of the modified (near Euclidean but not quite) metrics.

If we want 5 chloride and 300 sulfate and come up with a solution that gives us 10 chloride and 305 sulfate, each of those ions will contribute an equal amount to the error (the Euclidean distance between target and realization). That just doesn't seem right, as in one case the amount of the ion is off by 50% and in the other by 1.7%. Suppose further that we really don't care very much about the chloride, wanting mainly to focus on the sulfate. This suggests that perhaps the scalar f = e'*W*e, where e is the vector e = log(Ax) - log(d) and W is a diagonal matrix of desired weights, might be a better function to minimize. The point is that it is up to the analyst to pick whatever error function he feels gives him the most meaningful measure of the error and the most meaningful solution. I have mentioned earlier that this is the error function I prefer for the ion matching problem.
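For anyone who wants to see that objective written out, a minimal sketch (not anyone's actual spreadsheet; the array names are placeholders) might be:

```python
# Weighted log-error objective f = e'*W*e with e = log(Ax) - log(d).
import numpy as np

def weighted_log_error(A, x, d, w):
    """A: ppm of each ion per mg/L of each salt; x: mg/L of each salt;
    d: desired ppm of each ion; w: per-ion weights (the diagonal of W)."""
    e = np.log(A @ x) - np.log(d)   # errors on a logarithmic (relative) scale
    return e @ (w * e)              # scalar e'We to hand to a minimizer
```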

The other type of problem we have discussed is where one tries to estimate a parameter, such as the pH, from a set of observables (or in this case the single observable of proton deficit) where the observables are known to be corrupted by noise. Estimating x, with elements of latitude, longitude, altitude and clock offset, from a series, r, of pseudoranges to GPS satellites would be a good example. Here the relevant equation is r = Ax + n where r is the vector of pseudoranges and n the vector of noise samples. Again we may consider the mmse solution, the one that minimizes |r - Ax|. If n is Gaussian then the solution is the maximum likelihood solution and that solution has tremendous appeal (it tells you where you most likely are given the set of measurements your iPhone was able to get from the satellites). If the noise isn't Gaussian then the solution isn't the MLE solution; it is just the mmse solution. So even here the choice of the mmse solution makes no assumption about the noise. It's just that every engineer worth his salt knows that if the noise is indeed Gaussian then the solution is not only the mmse solution but the MLE solution.

But the point here is that in the optimization problem (which salts to use) there is no assumption made about noise if mmse is chosen because there is no noise.


For something like least squares regression, where you're finding one set of parameters to fit a large number of data points to a model, the Gaussian assumption is certainly reasonable due to central limit theorem arguments.

Regression is the case where mmse is used and the coefficients are often deemed to be MLE estimates, though the Gaussian assumption may in fact be shaky. Consider estimating the parameters of an antenna pattern using received power measurements as a function of azimuth and elevation. The received noise is indeed Gaussian but the power meter will produce a sequence of numbers that are Rayleigh distributed. Nevertheless it is common practice to weight the observations by their signal to noise ratios before doing the regression. It's a valid technique because the better measurements contribute more to the result than the poor ones, but we can't really say that the estimates we obtain are MLE.

When you only have a single 'data point' such as a target ion concentration you're trying to match this isn't necessarily the case.
It isn't the case at all because there is no noise.

What you're describing will sometimes work and sometimes won’t depending on the geometry of the constraint set. A quadratic objective function such as |Ax-b|^2 certainly can be minimized in closed form using the Moore-Penrose pseudoinverse if the system is unconstrained.

I won't comment further here because there are indeed cases where the simple Moore-Penrose approach doesn't work - the underdetermined one, which can be relevant to the current discussion (consider the chlorides and sulfates of Na, K, Mg, and Ca - 8 salts to choose from, 6 ion concentrations), for example.
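For concreteness, here is a small sketch (illustrative numbers of my own) of the unconstrained closed-form solution; in the underdetermined case the pseudoinverse returns the minimum-norm fit, which is not guaranteed to keep every salt amount non-negative:

```python
# Two target ions (SO4, Cl), four candidate salts. Entries of A are ppm of the
# ion per mg/L of the salt (anhydrous, approximate values).
import numpy as np

A = np.array([[0.706, 0.798, 0.000, 0.000],   # SO4 from CaSO4, MgSO4, CaCl2, NaCl
              [0.000, 0.000, 0.639, 0.607]])  # Cl  from CaSO4, MgSO4, CaCl2, NaCl
b = np.array([150.0, 60.0])                   # desired ppm of SO4 and Cl

x = np.linalg.pinv(A) @ b                     # minimum-norm least-squares solution
print(dict(zip(["CaSO4", "MgSO4", "CaCl2", "NaCl"], x.round(1))))
```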


I’m not totally sure what you’re defining as your error function, but assuming it’s sum_i (p[Ci]/wi – p[Ti]/wi)^2 where Ci is the concentration we’re trying to fit and Ti is the target, then you’ll have to be a bit careful here if you try to apply Newton’s method as this is no longer convex wrt the Ci variables. As a result, the Hessian may not be positive definite and the Newton direction may not give a descent direction. The Excel solver is likely using a version of gradient descent, which should at least give a local minimum, but not necessarily a global minimum.
Yes, that's the function and I'm sure what you are saying is true. Solver, as supplied with Excel, gives the options of GRG Nonlinear, Simplex LP and 'Evolutionary'. I'm trying to remember whether it (Solver) has ever driven me to a bizarre solution. If it has, the fix is to simply try another starting point.

The only downside to them is that the Hessian grows by the number of variables squared, so for large scale problems second order methods aren’t feasible.
We don't have that kind of problem here and I think it's important to recognize that. You are obviously very much into this and I'm sure deal with some really hairy problems that require special techniques. Those aren't the problems we have here. We're pretty much limited to picking mass values for a dozen or fewer salts in order to set the concentration of a dozen or fewer ions. The pH estimation problem is so simple that one doesn't really even need Solver. You have a single variable (proton deficit) that is monotonic in pH. A root bisection scheme (which you can set up on the spreadsheet itself without VBasic or any other kind of macro) will solve it.
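For what it's worth, the root bisection being described can be sketched in a few lines (proton_deficit here is a stand-in for whatever the spreadsheet computes, assumed monotonic in pH):

```python
# Bare-bones bisection: find the pH at which the proton deficit crosses zero,
# assuming proton_deficit(pH) is monotonic and changes sign on [lo, hi].
def solve_ph(proton_deficit, lo=4.0, hi=8.0, tol=1e-4):
    f_lo = proton_deficit(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if proton_deficit(mid) * f_lo > 0:   # same sign as at lo: root lies above mid
            lo, f_lo = mid, proton_deficit(mid)
        else:
            hi = mid
    return 0.5 * (lo + hi)
```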

I'll close by pointing out that there is, apparently, a lot more to Solver than what comes with Excel. There is, for example, an SDK of some sort so that a skilled programmer could access the 'engine' within it. This would probably be the quickest way to an elegant solution, i.e. one in which all you do is type in your desired profile and salt quantities automatically appear in the cells. Solver is made by a company called Frontline Systems. Google them for more details.
 
It is possible to automate the acid addition, and my spreadsheet Q-Water does that. You can find it on this forum. It is still in development though.

There are a number of problems with automating the mineral additions.
First, multiple minerals can be used to increase a given ion. Second, all the ions together should produce a net charge of 0.

These are not actually problems. The approach I have taken is to allow the user to specify arbitrary quantities of 16 salts and acids (one of which is water). From the mineral content and pH of the starting water, the mineral content and pH of the dilution water, and the other 15 additions, the spreadsheet calculates the concentrations of all ions and the total electrical charge at a chosen pH. The user also specifies a desired ion profile. The logs (for the reason described in my last post) of the desired and realized concentrations are computed and the differences (errors) taken. A weight is applied to each. The rms error is computed from the weighted individual errors. Excel is asked to minimize the rms error by varying any number of the 16 additions subject to the constraint that the total charge be 0.
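Translated out of spreadsheet form, that Solver setup amounts to roughly the following (a sketch only; the function and array names and the choice of SLSQP are mine, not the spreadsheet's):

```python
# Minimize the weighted rms of log-concentration errors subject to zero net
# charge and non-negative additions.
import numpy as np
from scipy.optimize import minimize

def fit_additions(A, charge_per_mg, source_ppm, source_charge, target_ppm, weights):
    """A[i, j]: ppm of ion i per mg/L of addition j; charge_per_mg[j]: mEq/L of
    charge per mg/L of addition j; the rest are per-ion vectors."""
    n = A.shape[1]

    def rms_log_error(x):
        realized = source_ppm + A @ x                    # ppm of each ion
        e = weights * (np.log(realized) - np.log(target_ppm))
        return np.sqrt(np.mean(e ** 2))

    charge_zero = {"type": "eq",
                   "fun": lambda x: source_charge + charge_per_mg @ x}
    res = minimize(rms_log_error, x0=np.full(n, 1.0), method="SLSQP",
                   bounds=[(0, None)] * n, constraints=[charge_zero])
    return res.x                                         # mg/L of each addition
```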


The only clash may occur here if you want SO4- lower than what the MgSO4 adds. But then it can tell you your minimum SO4- for your chosen level of Mg2+.

What I have described is very powerful. Dilution water is treated just like any other addition. If your starting water has more SO4-- than your target the algorithm simply adds more dilution water to get that down while simultaneously adding other salts to make up for what you have diluted away. You can impose other constraints as well as long as the basic spreadsheet computes them. For example, if the spreadsheet computes saturation pH you can constrain Solver to produce solutions that are restricted to those above a desired saturation pH. If you want at least 50 ppm calcium you can constrain it to produce solutions that have at least 50 ppm calcium.
 
What I have described is very powerful.

Oh yes!
In fact I was using Excel 2007 before and what I was referring to as Solver was Goal Seek. I just installed Excel 2013 and this Solver thing blows my mind.
It can surely automate mineral additions.
But how do I minimize both charge for pH calculations and difference between the desired and available water profile? Do I assign some sort of weight to them and add them within 1 cell?
Or am I supposed to leave HCO3- out of the desired water profile and ... do something else about it?
I have not given it a lot of thought at the moment I'm writing this
 
Personally, it might be that the easiest solution is to let the alkalinity/bicarbonate free rise and then leverage dilution as a strategy to minimize acid additions. Complicated array probably.

Or limit to specific salts and reserve any alkali to manage a low mash pH estimate. I would love to see one that gives a reading for target pH, and recommendations to move mash pH +/- 0.1 pH units should the mash require a small correction.
 
But how do I minimize both charge for pH calculations and difference between the desired and available water profile? Do I assign some sort of weight to them and add them within 1 cell?

You ask it to minimize the ion content error under the constraint that the charge must be 0. It lets you optimize for one cell (minimize, maximize, or set to a specified value) but you can ask it to honor several constraints. In older versions, for example, you would have to demand that it only allow 0 or positive values for the cells it is adjusting (salt amounts). Now there is a check box that imposes that requirement on all cells being varied, but you can add additional constraints (charge = 0, etc.).

Or am I supposed to leave HCO3- out of the desired water profile and ... do something else about it?
If your users do not care about the bicarbonate (and I've been trying to educate them to focus on alkalinity instead) then you can leave bicarbonate out of the error computation, but you must compute the charge on bicarbonate (and phosphate and lactate etc.) in order to ensure that the net charge, after adjustments, is 0.
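For anyone following along, the charge check is just a ppm-to-mEq/L conversion and a signed sum, something along these lines (equivalent weights approximate; a real sheet also needs phosphate, lactate, etc. at the relevant pH):

```python
# Net charge in mEq/L: ppm divided by equivalent weight, cations counted
# positive and anions negative. Solver is then constrained to hold this at 0.
EQ_WEIGHT = {"Ca": 20.04, "Mg": 12.15, "Na": 22.99, "K": 39.10,
             "HCO3": 61.02, "SO4": 48.03, "Cl": 35.45}
CATIONS = {"Ca", "Mg", "Na", "K"}

def net_charge_meq(ppm):
    """ppm: dict mapping ion name to concentration in mg/L."""
    return sum(c / EQ_WEIGHT[ion] * (1 if ion in CATIONS else -1)
               for ion, c in ppm.items())
```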

I have not given it a lot of thought at the moment I'm writing this
Experimenting is the best way to get familiar with it.
 
Personally, it might be that the easiest solution is to let the alkalinity/bicarbonate free rise and then leverage dilution as a strategy to minimize acid additions. Complicated array probably.

There are, ultimately, many ways to handle this. I have a very elaborate spreadsheet which will allow me to match any desired water profile that is physically realizable to about 1% error in any of the ions. This uses the Solver to adjust the salts to match the desired output under the constraint of electrical balance.

Then I have a separate spreadsheet that models the malts. I put the relevant water profile data (alkalinity, pH, calcium and magnesium) numbers in along with the amounts of malts and water to grist ratio and it calculates proton deficit which Solver (or a root bisector) then zeroes by varying pH. If I want a given pH I can let Solver vary the amount of acid or acid malt or base to 0 out proton deficit. The two could be merged but as I consider profile chasing a little silly I have never seen the need to do that and so probably never will.
 
The two could be merged but as I consider profile chasing a little silly I have never seen the need to do that and so probably never will
We already knew this!

Honestly I am split in mind over the Easy Button approach. There are already so-called color-based pre-packaged salt packages one can buy and add to RO/DI water. I am truly blind as to whether my beers have dramatically improved simply because of the focus on water quality - OR, more likely, whether I am paying far more attention to the critical things and the water chem over-analysis has fed those very improvements. I suspect it is somewhere in the middle.

At this point, this is an interesting exercise in a software-driven approach and how it might mesh with real-world application in a brewery... not unlike designing a test, measurement and DSP correction tool for acoustic phase correction in a cinema, and figuring out how to deliver the solution successfully and elegantly to engineers/techs with just enough knowledge to totally screw it up.
 
I think you're confusing the two types of problems we have discussed. In the first we have a set of parameters, x (the amounts of some number of salts), a model that shows how they relate to a set of quantities y which are of interest (concentrations of ions), and a set of desired concentrations for those ions, call it d. If we choose an algorithm that minimizes |Ax - d| we get the mmse solution, the one that minimizes the Euclidean distance between the ion concentrations we want and the closest ones we can achieve.

I wouldn’t say I’m “confusing” the two problems, but yes, perhaps making the analogy to statistics was not the best one to make as this seems to have spurred a tangent discussion about various statistical estimators (as an aside, the MMSE estimator is Bayesian, so if you’re talking about problems like y=Ax-b+e the MMSE estimator is trying to minimize |x-x_true|^2 and only reduces to minimizing |Ax-b|^2 if either a solution Ax=b exists or we assume that the entries of x are uncorrelated, x is uncorrelated with e, e is iid Gaussian, etc…, but as you mention this is probably not a helpful analogy to make as we’re more in a deterministic setting here).

The Euclidean distance has obvious intuitive appeal but it has limitations. If we have two points on the surface of the earth at (x1,y1) and (x2,y2) and wish to proceed from one to the other, the minimum distance we would have to traverse is sqrt( (x1-x2)^2 + (y1-y2)^2 ), the Euclidean distance, but if we are forced to walk on streets laid out in a N-S, E-W grid we would have to walk farther than the Euclidean distance and another measure, |x1 - x2| + |y1 - y2|, is more suitable. … If we want 5 chloride and 300 sulfate and come up with a solution that gives us 10 chloride and 305 sulfate, each of those ions will contribute an equal amount to the error (the Euclidean distance between target and realization). That just doesn't seem right, as in one case the amount of the ion is off by 50% and in the other by 1.7%. Suppose further that we really don't care very much about the chloride, wanting mainly to focus on the sulfate. This suggests that perhaps the scalar f = e'*W*e, where e is the vector e = log(Ax) - log(d) and W is a diagonal matrix of desired weights, might be a better function to minimize. The point is that it is up to the analyst to pick whatever error function he feels gives him the most meaningful measure of the error and the most meaningful solution. I have mentioned earlier that this is the error function I prefer for the ion matching problem.

This is essentially the point I was trying to make from the beginning. There are dozens of potentially useful objective functions one could propose for this problem. The Euclidean distance often gets used because people shy away from things like non-differentiable objective functions, since they can’t be solved with something like Newton’s method or gradient descent, but there are many standard optimization methods (beyond Newton’s method) which can be employed to exactly solve many of these problems.

As an example, I agree that working with some sort of normalized scale is a good idea, but suppose I don’t like the log transformation you’ve proposed above because it makes a convex problem non-convex and I don’t want to deal with issues like having multiple local-minima which aren’t globally optimal (I realized this log-transformed problem probably can’t be solved in closed form as I mentioned earlier unless A is invertible, which it typically won’t be). So instead, suppose I propose defining

e = (Ax-b)/b
where here the division is done element wise for each entry of the vector. Now I could solve

min_x |e|^2 subject to x>=0

with a standard Euclidean distance, but instead, maybe I want to solve

min_x max_i {|e_i|} subject to x>=0

i.e. minimize the maximum absolute value of the %error across all ion concentrations. The second problem will, essentially by definition, find a solution where the ‘worst’ %error across the various ion concentrations will be better than or equal to what you get with the Euclidean solution, at the expense of a potentially higher average rms %error. Whether I prefer problem 1 or problem 2 is of course a personal preference, but you’d be hard pressed to solve problem 2 with something like basic Newton’s method unless you get lucky and there exists a solution Ax=b. Nevertheless, getting back to the point I was making in my first post, there are plenty of standard optimization techniques that can easily handle problem 2 (standard enough that there’s a good chance they’ve been built into Excel).

Didn't pay attention to this before but actually DOP = (J_t*J)^-1 i.e. the inverse of the 'square' of the Jacobian.
You’re kind of mincing terms here. If we’re talking about minimizing the function

f(x) = |Ax-b|^2

the Jacobian of f(x) is J(x) = [A^T (Ax-b)]^T. What you have for DOP above is using the Jacobian of the linear map x-->Ax, which is just J(x) = A. The Hessian of f(x) is just the constant matrix H(x) = A^T*A.
 
I wouldn’t say I’m “confusing” the two problems, but yes, perhaps making the analogy to statistics was not the best one to make as this seems to have spurred a tangent discussion about various statistical estimators (as an aside, the MMSE estimator is Bayesian,

In the estimation case we have the set of equations J*x = r where r are observations which are corrupted by noise and J, the Jacobian, is the 'geometry' of the problem (in the case of GPS, for example, J contains the direction cosines between the dead reckoning position and each of the satellites, with the fourth column all 1's as the first derivative of pseudorange WRT delay is 1 ns/foot). An estimator which picks x to minimize |J*x - r|, i.e. the MMSE estimator, is not a maximum likelihood estimator (Bayesian estimator) unless the noise is Gaussian, each noise component is independent from the others, and each has equal standard deviation. This derives from the quadratic in the exponent of the multivariate Gaussian distribution. If the probability distribution of the noise is not Gaussian then the log of the probability of the observations conditioned on the hypotheses does not contain the quadratic form of the Gaussian, and minimizing the quadratic would be neither the Bayesian estimate nor the maximum likelihood estimate.

...so if you’re talking about problems like y=Ax-b+e the MMSE estimator is trying to minimize |x-x_true|^2 and only reduces to minimizing |Ax-b|^2 if either a solution Ax=b exists or we assume that the entries of x are uncorrelated, x is uncorrelated with e, e is iid Gaussian,
The MLE estimate is the x that minimizes

x'*J'*Q*J*x - 2*x'*J'*Q*r.

where Q is the inverse of the covariance matrix of the Gaussian, 0 mean (it has to be Gaussian and 0 mean) noise. I've changed notation here a bit, using J' to represent the transpose of J. Easier to type than J_T. The minimizing x is

x = (J'*Q*J)^-1*J'*Q*r = (B'*B)^-1*B'*R

where B = (√Q)*J and R = (√Q)*r

Thus for the MMSE estimator to be the Bayesian estimator we have the requirements that
1)The noise be Gaussian
2)The noise be 0 mean
3)The square root of the covariance matrix must exist.

Covariance matrices being covariance matrices, they have square roots (and so too do their inverses). Thus as long as the noise is 0 mean and Gaussian the MLE can be obtained by minimizing |B*x - R|. There is NO requirement that the noises be identically distributed and NO requirement that they be independent. This is not to say that an MMSE estimate may not be a useful one in cases where the noise is not Gaussian. Certainly the idea that noisy measurements be weighted less than good quality ones is intuitively pleasing. Lots of (most, nearly all) curve fitting routines minimize
|(√Q)*J*x - (√Q)*r| whether you are fitting to experimental data with noisy measurements or to the ASBC table whose numbers are, by fiat, 'correct'. IOW they let you choose the weights (diagonal elements of Q) you want to put on particular parts of the data set. Perhaps it is from this that your confusion stems.
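In code, the whitening step being described is just this (a sketch, with scipy's matrix square root standing in for √Q):

```python
# Weighted least squares by whitening: with Q the inverse noise covariance,
# B = sqrt(Q)*J and R = sqrt(Q)*r turn the problem into ordinary least squares,
# so x = (J'QJ)^-1 J'Q r falls out of a plain lstsq call.
import numpy as np
from scipy.linalg import sqrtm

def weighted_ls(J, r, Q):
    W = np.real(sqrtm(Q))                     # symmetric square root of Q
    B, R = W @ J, W @ r
    x, *_ = np.linalg.lstsq(B, R, rcond=None)
    return x
```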

…etc., but as you mention this is probably not a helpful analogy to make as we’re more in a deterministic setting here).
Then why keep bringing it up?



This is essentially the point I was trying to make from the beginning. There are dozens of potentially useful objective functions one could propose for this problem. The Euclidean distance often gets used because people shy away from things like non-differentiable objective functions, since they can’t be solved with something like Newton’s method or gradient descent, but there are many standard optimization methods (beyond Newton’s method) which can be employed to exactly solve many of these problems.
I would say those are very good reasons for shying away from such metrics. If you can think of a better metric for expressing the difference between one set of ion concentrations and another then propose it and let people decide whether they think that metric is somehow better than the Euclidean metric or, in particular, enough so to justify the computational complexities.


As an example, I agree that working with some sort of normalized scale is a good idea, but suppose I don’t like the log transformation you’ve proposed above because it makes a convex problem non-convex and I don’t want to deal with issues like having multiple local-minima which aren’t globally optimal (I realized this log-transformed problem probably can’t be solved in closed form as I mentioned earlier unless A is invertible, which it typically won’t be).
A, or J as I am calling it now, doesn't have to be invertible and beyond that I don't think the log transformation makes J any less invertible than not doing the transformation. I've been fiddling with some matrices based on the ratios in common salts (NaCl, K2SO4, etc.) and find in the linear model a J matrix condition number of about 20. Doing the log transformation, that actually drops to 7. If convexity were a problem with the log transformation then Solver, which uses a gradient method, shouldn't be able to find solutions, and I've done hundreds of ion profiles using the weighted log metric with Solver. Nor should I be able to fit curves to power measurements expressed in dB, and I've done thousands of those with SVD.


So instead, suppose I propose defining ...

e = (Ax-b)/b
where here the division is done element wise for each entry of the vector. Now I could solve

min_x |e|^2 subject to x>=0

with a standard Euclidean distance, but instead, maybe I want to solve

min_x max_i {|e_i|} subject to x>=0

i.e. minimize the maximum absolute value of the %error across all ion concentrations... Whether I prefer problem 1 or problem 2 is of course a personal preference, but you’d be hard pressed to solve problem 2 with something like basic Newton’s method unless you get lucky and there exists a solution Ax=b.
I would certainly never say that there aren't applications where a metric like that might be useful but it would certainly be of no interest to me in this application except perhaps...


Nevertheless, getting back to the point I was making in my first post, there are plenty of standard optimization techniques that can easily handle problem 2 (standard enough that there’s a good chance they’ve been built into Excel).

... as a curiosity. Were any such available in Excel I might want to play with them. AFAIK they aren't. Solver does the three algorithms I have mentioned in an earlier post in the standard Excel release. I've also mentioned that Frontline does have other/augmented products which may well do some of this stuff. As I have no applications for it I certainly would not pay the big bucks they demand for one of those packages.


What you have for DOP above is using the Jacobian of the linear map x-->Ax, which is just J(x) = A.

Yes!! DOP = (J'*J)^-1. No second derivatives involved. The covariance of x is cov(x) = (J'*Q*J)^-1 and if the noises are Gaussian and IID then
cov(x) = sigma^2*(J'*J)^-1 where sigma^2 is the common variance of the observables. Thus DOP is the covariance per unit of measurement noise variance and reflects the 'geometry' of the problem, i.e. it depends solely on how the observables relate to the parameters being estimated. Again, I think there is some confusion between estimation and optimization. In optimization (setting an ion profile) there is no noise and we can pick any function we want as the merit/penalty function. In estimation (trying to figure out what the pH of a solution is based on calculated and probably corrupted proton deficits) we use MMSE because if the noise is Gaussian then the estimate is the MLE estimate, and as we have no idea what the distribution of the noise might be, Gaussian is as good an assumption as any.
 
Oy! I cannot tell you how I am regretting that fourth glass of wine. Statistics is difficult enough sober!
 
In the estimation case we have the set of equations […]

Well, ok, we could continue to go back and forth about whether or not we should call the MMSE Bayesian with a Gaussian prior since it “approximates” things with just the covariance matrix, which is the sufficient statistic of a zero-mean Gaussian, when/how this relates to the MLE or MVUE, what exactly we should be referring to as the Hessian/Jacobian, etc., but I think (and I’m assuming you would agree) that 1) we are both well aware of these various subtleties and the only reason we’re talking about them is because we’re not writing textbook chapters here (although one may not know it from how long this post ended up being), so we are both bound to make incomplete/slightly imprecise statements which the other then tries to correct, and 2) aside from the guy who stumbled into this after 4 glasses of wine and didn’t know any better, most people tuned out from this many posts ago :)

So, returning to the topic at hand…

I would say those [computational complexities] are very good reasons for shying away from such metrics.

In this case actually, the computational complexity is basically the same and both problems can be solved in Excel (to answer your curiosity question). Before I get into things, I’ll just define a few norms for notational convenience. Given a vector x, let

|x|_1 = sum_i abs(x_i) – the sum of the absolute values of x
|x|_2 = sqrt(sum_i x_i^2) – the standard Euclidean distance
|x|_inf = max_i {abs(x_i)} – the maximum absolute value of any element of x

If we define e(x)=(Ax-b)/b like in my last post, the two problems I described before are just

min_x (|e(x)|_2)^2 subject to x>=0
min_x |e(x)|_inf subject to x>=0

The second problem, by definition of |e(x)|_inf, is equivalent to

min_{t,x} t subject to t>=0, -t <= e(x), and t>= e(x)

which is just a linear program in standard form (which Excel apparently can do). Alternatively, you can replace the t with a t squared in the objective function and you now have a weighted least squares problem with linear inequality constraints, which is exactly the type of problem we were discussing before.
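For concreteness, here is one way that linear program could be set up outside of Excel (a sketch with scipy's linprog; the variable names are mine):

```python
# Solve min_x max_i |e_i(x)| with e(x) = (Ax - b)/b and x >= 0 as the LP
# min_{x,t} t subject to -t <= e(x) <= t. The LP variables are z = [x, t].
import numpy as np
from scipy.optimize import linprog

def minimax_fit(A, b):
    m, n = A.shape
    An = A / b[:, None]                         # so that e(x) = An @ x - 1
    A_ub = np.block([[ An, -np.ones((m, 1))],   # e(x) <= t
                     [-An, -np.ones((m, 1))]])  # -e(x) <= t
    b_ub = np.concatenate([np.ones(m), -np.ones(m)])
    c = np.zeros(n + 1)
    c[-1] = 1.0                                 # minimize t
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (n + 1))
    return res.x[:n], res.x[-1]                 # salt amounts, worst |relative error|
```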

If you can think of a better metric for expressing the difference between one set of ion concentrations and another then propose it and let people decide whether they think that metric is somehow better than the Euclidean metric.

Well, off the top of my head, in addition to the forms I discussed before (which won’t really give drastically different answers from each other), I think most of us would agree that, at the end of the day, if we’re talking about ion concentrations in our brewing water, only the most OCD of us want to hit the target water profile at 1e-6 precision, so instead we can define a range of errors we’re willing to accept:

L <= e(x) <= U

for some lower and upper bound vectors {L,U} (or alternatively, a constraint like |e(x)|<=epsilon for some choice of norm and epsilon>0). Now we could just find any x that satisfies this requirement, but of course as long as we have picked physically realizable bounds, there is now possibly an entire set of feasible solution vectors to choose from (similarly, if A is underdetermined then we can have a set of multiple solution vectors even if we shrink the bounds to an equality constraint).

Of course, so far we haven’t really bought ourselves anything over just using a least squares fit, but now, suppose I’m lazy and don’t want to make 16 different manipulations to my water (I think another post mentioned something like solving for 16 variables in a spreadsheet). This suggests we should pick feasible solutions with as many zeros in x as possible (as a 0 in x means I don’t need to add that “ingredient”), so we’d want to solve something like

Minimize the number of non-zero entries of x, subject to L <= e(x) <= U and x>=0

The problem with this is that we now have a combinatorial objective function, so we’d have to first test if a feasible solution exists using just one of the ingredients, then for all pairs of ingredients, all triplets, etc. For 16 parameters/ingredients, as a worst case we might have to solve 2^16 least-squares/feasibility problems; if we really wanted to brute force this, that isn’t too many problems to solve, but obviously we wouldn’t want to do it in Excel and it keeps growing exponentially if we add more variables.
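The brute-force version of that search is easy to state, if not to recommend (a sketch only; fits_within_bounds is a hypothetical feasibility check, e.g. a bounded least-squares solve restricted to the chosen subset):

```python
# Try subsets of increasing size and return the first (hence smallest) subset
# of salts for which a feasible solution exists. Worst case: 2^n subsets.
from itertools import combinations

def smallest_feasible_subset(salts, fits_within_bounds):
    for k in range(1, len(salts) + 1):
        for subset in combinations(salts, k):
            if fits_within_bounds(subset):
                return subset
    return None
```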

Instead, a much more friendly objective function would be the closest convex relaxation of the above problem (where ‘closest’ has a specific mathematical meaning that I won’t get into), which is given by

min_x |x|_1 subject to L <= e <= U and x>=0

Here, from the definition of |x|_1 and the fact that x>=0, this is identical to

min_x sum(x) subject to L <= e <= U and x>=0

This is now a convex problem (and again a standard form linear program), so it can be easily solved. Further, if A satisfies certain properties it can be proven that the solution to this convex problem will give the exact solution to the combinatorial problem from above. I’d suspect that for the brewing water problem A will not satisfy these properties, but we will still empirically get solutions for x which promote using just a few non-zero elements in x, and as we relax the bounds we get solutions with fewer and fewer non-zero elements.
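That relaxed problem is again an ordinary linear program; as a sketch (same e(x) = (Ax - b)/b as before, with L and U the per-ion relative-error bounds):

```python
# min_x sum(x) subject to L <= (Ax - b)/b <= U and x >= 0, via scipy's linprog.
import numpy as np
from scipy.optimize import linprog

def sparse_fit(A, b, L, U):
    """L, U: arrays of per-ion lower/upper relative-error bounds."""
    m, n = A.shape
    An = A / b[:, None]                        # e(x) = An @ x - 1
    A_ub = np.vstack([An, -An])                # An x <= U + 1  and  -An x <= -(L + 1)
    b_ub = np.concatenate([U + 1.0, -(L + 1.0)])
    res = linprog(np.ones(n), A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * n)
    return res.x                               # tends to favor few non-zero salts
```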

As an aside, this is also a good example of where the choice of norm becomes interesting. Suppose we want to solve an underdetermined system Ax=b (assuming A has full row rank) for x. Without any other constraints, the solution picked by the Moore-Penrose inverse is the solution to the problem

min_x |x|_2 subject to Ax=b

If x=0 is not a feasible solution, this will in general give solutions where every entry of x is non-zero, because the Euclidean norm is geometrically isotropic, whereas using the |x|_1 norm in the problem above promotes finding a solution with a small number of non-zeros, due to the fact that the unit ball of the |x|_1 norm is, from a geometric perspective, a polytope with vertices along the coordinate axes.

A, or J as I am calling it now, doesn't have to be invertible and beyond that I don't think the log transformation makes J any less invertible than not doing the transformation. I've been fiddling with some matrices based on the ratios in common salts (NaCl, K2SO4, etc.) and find in the linear model a J matrix condition number of about 20. Doing the log transformation, that actually drops to 7. If convexity were a problem with the log transformation then Solver, which uses a gradient method, shouldn't be able to find solutions, and I've done hundreds of ion profiles using the weighted log metric with Solver. Nor should I be able to fit curves to power measurements expressed in dB, and I've done thousands of those with SVD.

Yes, I’m certainly not saying the log transformation will for sure cause problems. In fact, because this problem is “almost” convex (-log(Ax) is convex wrt x and the Euclidean distance is certainly convex, but the composition of the two is not convex), it wouldn’t surprise me if it satisfies one of the more exotic forms of convexity like quasi-convexity or pseudo-convexity (real names, I promise). Rather, my point is more of a general statement that one should tread carefully when moving to non-convex optimization problems.

The analogy I’d make is that it’s similar to the jump from linear-time-invariant systems to non-linear, time varying, whatever systems. Sure, people can do some nice things with certain non-LTI problems, but clearly the toolbox is much bigger in LTI land and if a given problem can be modeled as a LTI system that’s the way most people would go unless there were major deficiencies with that choice.

The same thing happens in optimization. Certainly things can be done with non-convex problems, but methods to verify true global optimality, robustness against poor initialization, etc. which you have in the convex world start to go away in the non-convex world and the performance guarantees that you get from various solvers become much weaker.
 
I think it's well past time to try to restore some perspective here. We have a very simple problem which most people, including me, aren't really that interested in solving as experienced brewers know that getting the calcium into a certain range, the chloride into a certain range and the sulfate into a certain range with perhaps a twist of magnesium and sodium in special cases is more than adequate to make a good beer and that tweaking within those ranges to get the best beer depends more on palate and preference than on matching a particular ion profile.

Being of a practical bent I figured I'd try the cost functions you suggest against a real problem, so I took pH 8 water with 100 ppm alkalinity, 40 ppm calcium hardness and 3.94 mg/L sodium and studied preparation of water at pH 7 with alkalinity 100, calcium hardness of 174, sulfate content of 50, chloride content of 50 and sodium content of 22. To do this, out of the 24 or so choices my spreadsheet allows I chose to use the common salts CaCl2, NaCl, CaSO4, NaHCO3 and CO2. Lots of experience has shown me that CO2 is almost always necessary to get a good match whenever bicarbonate or carbonate are added or whenever pH adjustment is part of the problem. This is:
1)Overlooked by most of the spreadsheets in which sense discussion of 'optimization' using them is at least partly mooted and our discussions become largely academic
2)A PITA and one of the main reasons why profile matching isn't often undertaken by the savvy brewer. See Note at the end.

In any case using these salts I am able to achieve my goal, minimizing mmse (your |x|_2) on p(conc), to the following error levels:

Ca: -5.2%
HCO3: -9.3%
SO4: 1.6%
Cl: 2.1%
Na: 4.5%

RMSE: 4.86%
Taxi: 22.76%
Max: 9.33%

'Taxi' is your |x|_1 norm, which I'm calling 'taxi' by analogy to the distance (taxi fare) measured on an orthogonal street grid that I mentioned in a previous post, and, of course, Max is the maximum error (your |x|_∞). If, instead of minimizing the rmse, I minimize the taxi norm then I get

Ca: -6.6%
HCO3: -11.5%
SO4: 0%
Cl: 0%
Na: 0%

RMSE: 5.41%
Taxi: 18.08%
Max: 11.5%

And if I shoot for the smallest maximum error I get:

Ca: -6.6%
HCO3: -7.8%
SO4: 0%
Cl: 0%
Na: 7.8%

RMSE: 5.22%
Taxi: 22.76%
Max: 7.76%

So which of these metrics should I use? If I'm trying to minimize my transportation costs between point A and point B on the prairie with a dirt bike, clearly I should go the minimum Euclidean distance and use the mmse cost function. If, however, I'm in a city with the rectangular grid of streets I mentioned in an earlier post and I have to use a taxi, I should use the taxi (1-norm) cost function. If I am the owner of some financial derivative which calculates my gains or losses based on the maximum difference between LIBOR and Moody's BAA then I should use the maximum metric. In the ion concentration problem I can advance the argument that geometric error (p(target) - p(desired)) is more meaningful than just (target - desired), because the effects of ion concentrations (or other stimuli) in nature tend to depend on the log, so that my true cost in being wrong is better reflected by the log concentrations, but I don't have any basis for arguing that (sum (p(target) - p(desired))^n)^(1/n) is better for any particular n (except that if n != 2 the math can get hairy), or that any of that family is better than a maximum-based cost function or a minimax-based error function or any other such cost function.

Looking at the results from any of the three metrics it is clear that there isn't enough calcium bicarbonate. The path to a better solution is not in choosing a different metric but in modifying the salt set to provide more calcium bicarbonate. This is easily done by adding lime (Ca(OH)2) to the list of available salts, as it contains the needed calcium and the hydroxyl can react with supplied CO2 to provide the bicarbonate. Adding lime to the salt selection, the rmse goes down to 0.006% and the maximum error goes down to 0.01%. Thus understanding the chemistry is much more important than understanding the nuances of the optimization algorithm. It's been my experience that any real (electrically balanced) water report that I've ever been asked to reproduce can be reproduced with a handful of salts to better than 1% concentration error in each ion using mmse on log concentrations (I have also been asked to reproduce ion profiles which cannot exist). What would be the advantage in switching to something different?

I think the points you are trying to make are:
1) You know a lot about optimization algorithms
2) There are optimization schemes that may have benefit beyond what are available from the simple weighted Euclidian cost function.

We can say that you have demonstrated 1) and acknowledge that 2) is also unquestionably true if you are doing things like weather forecasting, modeling the economy of the US or trying to market a derivative. While some of these benefits may accrue to the accountants at InBev, there do not appear to be any for small or, in particular, home brewers. For them the capabilities of Solver seem to be more than adequate and mmse seems to do the job.

I have, in my career, run into situations involving impassioned possessors of solutions in search of problems and this reminds me of those days. We never want to turn our backs on new ideas but in this case it just seems that you are going after flies with cannons.

Again, I encourage you to explore any method you think might give better results to brewers than what is currently available, post spreadsheets or other programs which use them, write papers for JASBC, MBAA..., post your findings here, etc.

Now when it comes to the related estimation problem:
Two applications come to mind: pH prediction and pH meter calibration. The former is trivial; the latter less so. In a nutshell, one makes multiple readings of the voltage and temperature reported by an electrode in at least two standardized buffers (pH(T) is known) over a range of temperatures and, from that data, estimates the electrode's slope, offset and isoelectric pH, the three parameters that a calibrated meter needs to convert sample electrode voltage and temperature to a pH value. Here we definitely want the maximum likelihood estimates of the three parameters and would also like to look at the DOP in order to determine whether we are looking at the proper number of readings in the proper temperature ranges. As we are justified in the Gaussian assumption WRT both voltage and temperature error we can do this and must thus minimize the mse.

I don't really know that there is much else to say. You can keep mentioning other norms if you want to but unless you can put some solid support under them I don't see much point. It was kind of fun checking out the 1 and ∞ norms. To be honest I didn't think Solver would handle them but it did. I'm not prepared to check everything you can come up with and so ask that you do that yourself and present the results.

Note: In case anyone is actually reading this and wondering how one measures out CO2 the answer is that one doesn't. One runs Solver which tells you that, in the example discussed, you'd need, per liter, 78.16 mg CaCl2, 89.61 mg gypsum, 65.84 mg NaHCO3, 30.29 mg CO2 and 8.52 mg Ca(OH)2. You measure out all the salts, add them to the water and then bubble CO2 through the water until the desired pH (7 in this case) is attained.
 