In the estimation case we have the set of equations [...]
Well, ok, we could continue to go back and forth about whether or not we should call the MMSE Bayesian with a Gaussian prior (since it approximates things with just the covariance matrix, which is the sufficient statistic of a zero-mean Gaussian), when/how this relates to the MLE or MVUE, what exactly we should be referring to as the Hessian/Jacobian, etc. But I think (and I'm assuming you would agree) that 1) we are both well aware of these various subtleties, and the only reason we're talking about them is that we're not writing textbook chapters here (although one may not know it from how long this post ended up being), so we are both bound to make incomplete or slightly imprecise statements which the other then tries to correct, and 2) aside from the guy who stumbled into this after 4 glasses of wine and didn't know any better, most people tuned out from this many posts ago.
So, returning to the topic at hand
I would say those [computational complexities] are very good reasons for shying away from such metrics.
In this case, actually, the computational complexity is basically the same and both problems can be solved in Excel (to answer your curiosity question). Before I get into things, I'll just define a few norms for notational convenience. Given a vector x, let
|x|_1 = sum_i abs(x_i), the sum of the absolute values of the elements of x
|x|_2 = sqrt(sum_i x_i^2), the standard Euclidean distance
|x|_inf = max_i {abs(x_i)}, the maximum absolute value of any element of x
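If it helps to make the notation concrete, here's a quick numpy illustration of the three norms (nothing brewing-specific about it, just the definitions above):

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])
print(np.abs(x).sum())        # |x|_1   = 8
print(np.sqrt((x**2).sum()))  # |x|_2   = sqrt(26) ~ 5.10
print(np.abs(x).max())        # |x|_inf = 4
```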
If we define e(x) = (Ax-b)/b (with the division taken element-wise) like in my last post, the two problems I described before are just
min_x (|e(x)|_2)^2 subject to x>=0
min_x |e(x)|_inf subject to x>=0
The second problem, by definition of |e(x)|_inf, is equivalent to
min_{t,x} t subject to t>=0, -t <= e(x), and t>= e(x)
which is just a linear program in standard form (which Excel apparently can do). Alternatively, you can replace the t with a t squared in the objective function and you now have a weighted least squares problem with linear inequality constraints, which is exactly the type of problem we were discussing before.
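In case anyone wants to try this outside of Excel, here's a rough Python/scipy sketch of that linear program; the A and b below are made-up placeholder numbers, not real salt/ion data, and packing the variables as z = [x, t] is just one convenient way to write it:

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 2.0],
              [3.0, 0.0, 1.0]])      # hypothetical ion-contribution matrix
b = np.array([50.0, 100.0, 150.0])   # hypothetical target concentrations

m, n = A.shape
Ae = A / b[:, None]                  # rows scaled by b, so e(x) = Ae @ x - 1

c = np.zeros(n + 1)                  # variables z = [x_1 ... x_n, t]
c[-1] = 1.0                          # minimize t

# e(x) <= t and -e(x) <= t, written as A_ub @ z <= b_ub
A_ub = np.block([[ Ae, -np.ones((m, 1))],
                 [-Ae, -np.ones((m, 1))]])
b_ub = np.concatenate([np.ones(m), -np.ones(m)])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))  # x >= 0, t >= 0
x, t = res.x[:n], res.x[-1]
print("additions:", x, "worst-case relative error:", t)
```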
If you can think of a better metric for expressing the difference between one set of ion concentrations and another then propose it and let people decide whether they think that metric is somehow better than a Euclidean metric.
Well, off the top of my head, in addition to the forms I discussed before (which won't really give drastically different answers from each other), I think most of us would agree that, at the end of the day, if we're talking about ion concentrations in our brewing water, only the most OCD of us want to hit the target water profile to 1e-6 precision, so instead we can define a range of errors we're willing to accept:
L <= e(x) <= U
for some lower and upper bound vectors {L,U} (or alternatively, a constraint like |e(x)|<=epsilon for some choice of norm and epsilon>0). Now we could just find any x that satisfies this requirement, but of course as long as we have picked physically realizable bounds, there is now possibly an entire set of feasible solution vectors to choose from (similarly, if A is underdetermined then we can have a set of multiple solution vectors even if we shrink the bounds to an equality constraint).
Of course, so far we haven't really bought ourselves anything over just using a least squares fit, but now suppose I'm lazy and don't want to make 16 different manipulations to my water (I think another post mentioned something like solving for 16 variables in a spreadsheet). This suggests we should pick feasible solutions with as many zeros in x as possible (a 0 in x means I don't need to add that ingredient), so we'd want to solve something like
Minimize the number of non-zero entries of x, subject to L <= e(x) <= U and x>=0
The problem with this is that we now have a combinatorial objective function, so we'd have to first test whether a feasible solution exists using just one of the ingredients, then for all pairs of ingredients, all triplets, etc. For 16 parameters/ingredients, in the worst case we might have to solve 2^16 least-squares/feasibility problems; if we really wanted to brute force this, that isn't too many problems to solve, but obviously we wouldn't want to do it in Excel, and it keeps growing exponentially if we add more variables.
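Just to make the combinatorial blow-up concrete, here's a brute-force sketch in Python/scipy: try supports of size 1, 2, 3, ... and stop at the first one that admits a feasible x. Everything here (the helper name, using a zero-objective linear program as the feasibility test) is my own illustration, not anything from an actual water spreadsheet:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def smallest_feasible_support(A, b, L, U):
    """Find the smallest set of ingredients that can satisfy L <= e(x) <= U, x >= 0."""
    m, n = A.shape
    Ae = A / b[:, None]                       # e(x) = Ae @ x - 1
    A_ub = np.vstack([Ae, -Ae])               # encodes e(x) <= U and e(x) >= L
    b_ub = np.concatenate([U + 1.0, -(L + 1.0)])
    for k in range(1, n + 1):                 # supports of size 1, 2, 3, ...
        for support in itertools.combinations(range(n), k):
            bounds = [(0, 0)] * n             # only chosen ingredients may be non-zero
            for j in support:
                bounds[j] = (0, None)
            res = linprog(np.zeros(n), A_ub=A_ub, b_ub=b_ub, bounds=bounds)
            if res.success:
                return support, res.x
    return None, None
```

The nested loops are exactly the problem: for n ingredients this can visit up to 2^n supports before giving up.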
Instead, a much friendlier objective function would be the closest convex relaxation of the above problem (where "closest" has a specific mathematical meaning that I won't get into), which is given by
min_x |x|_1 subject to L <= e(x) <= U and x>=0
Here, from the definition of |x|_1 and the fact that x>=0, this is identical to
min_x sum(x) subject to L <= e(x) <= U and x>=0
This is now a convex problem (and again a standard form linear program), so it can be solved easily. Further, if A satisfies certain properties, it can be proven that the solution to this convex problem gives the exact solution to the combinatorial problem above. I'd suspect that for the brewing water problem A will not satisfy these properties, but empirically we will still get solutions with just a few non-zero elements in x, and as we relax the bounds we get solutions with fewer and fewer non-zero elements.
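For completeness, here's the same kind of scipy sketch for the relaxed problem; again, the function name and placeholder inputs are just mine for illustration:

```python
import numpy as np
from scipy.optimize import linprog

def sparse_additions(A, b, L, U):
    """min sum(x) subject to L <= e(x) <= U and x >= 0, with e(x) = (Ax - b)/b."""
    m, n = A.shape
    Ae = A / b[:, None]                        # e(x) = Ae @ x - 1
    c = np.ones(n)                             # sum(x) equals |x|_1 since x >= 0
    A_ub = np.vstack([Ae, -Ae])                # e(x) <= U  and  -e(x) <= -L
    b_ub = np.concatenate([U + 1.0, -(L + 1.0)])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
    return res.x

# Looser bounds (e.g. +/- 10% on every ion) generally come back with more
# zeros in x, i.e. fewer ingredients to weigh out.
```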
As an aside, this is also a good example of where the choice of norm becomes interesting. Suppose we want to solve an underdetermined system Ax=b (assuming A has full row rank) for x. Without any other constraints, the solution picked by the Moore-Penrose inverse is the solution to the problem
min_x |x|_2 subject to Ax=b
If x=0 is not a feasible solution, this will in general give solutions where every entry of x is non-zero, because the Euclidean norm is geometrically isotropic, whereas using the |x|_1 norm in the problem above promotes finding a solution with a small number of non-zeros, because the unit ball of the |x|_1 norm is, geometrically, a polytope whose vertices lie on the coordinate axes.
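A quick numerical illustration of that point on a small made-up underdetermined system (2 equations, 5 unknowns); the |x|_1 problem is again turned into a linear program by the standard trick of splitting x into positive and negative parts:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 5))     # underdetermined: 2 equations, 5 unknowns
b = rng.standard_normal(2)

# Minimum |x|_2 solution from the Moore-Penrose inverse: typically all non-zero.
x_l2 = np.linalg.pinv(A) @ b
print("least-norm solution:", np.round(x_l2, 3))

# Minimum |x|_1 solution: write x = p - q with p, q >= 0 and minimize sum(p + q).
n = A.shape[1]
res = linprog(np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=b, bounds=(0, None))
x_l1 = res.x[:n] - res.x[n:]
print("min |x|_1 solution: ", np.round(x_l1, 3))   # typically only a couple of non-zeros
```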
A, or J as I am calling it now, doesn't have to be invertible and beyond that I don't think the log transformation makes J any less invertible than not doing the transformation. I've been fiddling with some matrices based on the ratios in common salts (NaCl, K2SO4 etc) and find in the linear model a J matrix condition number of about 20. Doing the log transformation that actually drops to 7. If convexity were a problem with the log transformation, Solver, which uses a gradient method, shouldn't be able to find solutions, and I've done hundreds of ion profiles using the weighted log metric with Solver. Nor should I be able to fit curves to power measurements expressed in dB and I've done thousands of those with SVD.
Yes, I'm certainly not saying the log transformation will for sure cause problems. In fact, because this problem is almost convex (-log(Ax) is convex with respect to x, and the Euclidean distance is certainly convex, but the composition of the two is not convex), it wouldn't surprise me if it satisfies one of the more exotic forms of convexity like quasi-convexity or pseudo-convexity (real names, I promise). Rather, my point is more of a general statement that one should tread carefully when moving to non-convex optimization problems.
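For what it's worth, here's a tiny one-dimensional check of that "the composition is not convex" claim, using f(x) = (log(x) - log(target))^2 with target = 1 (pure illustration, not the water problem itself):

```python
import numpy as np

f = lambda x: (np.log(x) - np.log(1.0)) ** 2

x1, x2 = 10.0, 100.0
mid = 0.5 * (x1 + x2)
# For a convex function, f(mid) <= 0.5 * (f(x1) + f(x2)). Here that fails:
print(f(mid), 0.5 * (f(x1) + f(x2)))   # ~16.1 vs ~13.3
```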
The analogy I'd make is that it's similar to the jump from linear time-invariant systems to non-linear, time-varying, whatever systems. Sure, people can do some nice things with certain non-LTI problems, but clearly the toolbox is much bigger in LTI land, and if a given problem can be modeled as an LTI system, that's the way most people would go unless there were major deficiencies with that choice.
The same thing happens in optimization. Certainly things can be done with non-convex problems, but methods to verify true global optimality, robustness against poor initialization, etc., which you have in the convex world, start to go away in the non-convex world, and the performance guarantees you get from various solvers become much weaker.