Certainly the functions of interest here are continuous and differentiable, and the non-negativity constraint is easily implemented.
I'd argue that taking the function of interest here to be something like the sum of squared errors is more a matter of mathematical convenience and historical precedent than something with much physical meaning for the problem. The squared error makes an inherent statistical assumption that the noise in the model is Gaussian. For something like least squares regression, where you're finding one set of parameters to fit a large number of data points to a model, the Gaussian assumption is certainly reasonable on central limit theorem grounds. When you only have a single 'data point', such as a target ion concentration you're trying to match, this isn't necessarily the case.
If your system is Ax = b, with A the matrix of partials at x0 (your initial guess at a solution), then your correction to x0 is ∆x = A#(b - Ax0), where A# is the Moore-Penrose pseudoinverse of A. If ∆x carries any element of x below 0, you limit that element to 0 and carry on. I won't say there aren't situations where that might back you into a corner distant from the best solution, but it's always worked for me on these problems (and on any other I've applied it to, for that matter). Maybe I shouldn't call this Newton's method, but Newton's method is what you get if you do it in one dimension.
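In case it helps, here's a minimal sketch of that step in Python (the function name and setup are mine, assuming the linearized system Ax = b described above):

```python
import numpy as np

def clamped_pinv_step(A, b, x0):
    # Correction toward the least-squares solution of A x = b,
    # using the Moore-Penrose pseudoinverse.
    dx = np.linalg.pinv(A) @ (b - A @ x0)
    x1 = x0 + dx
    # If any element goes below 0, limit it to 0 and carry on.
    return np.maximum(x1, 0.0)
```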
What you're describing will sometimes work and sometimes won't, depending on the geometry of the constraint set. For minimizing a quadratic objective function such as |Ax - b|^2, the unconstrained problem certainly can be solved in closed form using a Moore-Penrose pseudoinverse. Because the objective function is quadratic, the Newton direction always points from the current value of x toward the unconstrained solution. As a result, when you add constraints you'll move from the initial point toward the unconstrained solution until you hit a constraint boundary (or arrive at the unconstrained solution, if it lies in the feasible set).

Assuming you hit a constraint boundary, you then need to proceed along it. For simple constraints like x >= 0 you can remove variables from the 'active set' of variables being optimized over once they hit the boundary and continue with a reduced-dimensionality problem, but for more complicated constraints this won't necessarily work. For example, if we want to limit the maximum concentrations of some ions/acids, you'll have a constraint of the form Mx <= y. Once this is combined with the constraint x >= 0, you have 'corners' in the feasible set where you can get stuck, so you'll potentially need to return variables to the active set and move along a new facet of the constraint set. But at that point you're basically just doing a bizarre form of quadratic programming.
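For what it's worth, here's a sketch of that QP formulation using SciPy's trust-constr solver; the constraint data M and y are hypothetical stand-ins for the concentration limits described above:

```python
import numpy as np
from scipy.optimize import minimize, LinearConstraint, Bounds

def constrained_lsq(A, b, M, y, x0):
    # Minimize |Ax - b|^2 subject to M x <= y and x >= 0.
    # A generic QP formulation; a dedicated QP solver would do the
    # active-set bookkeeping described above internally.
    fun = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
    jac = lambda x: A.T @ (A @ x - b)
    cons = LinearConstraint(M, -np.inf, y)
    res = minimize(fun, x0, jac=jac, method="trust-constr",
                   constraints=[cons], bounds=Bounds(0, np.inf))
    return res.x
```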
In general, for ion concentration matching, I prefer the error function to be the rms of p[Ci]/wi; in other words, to minimize the weighted geometric error (Ci is the concentration of the ion, wi the weight assigned to it, and p the -log operator). As this function is clearly differentiable with respect to each Ci, Moore-Penrose is well suited. More to the point, Solver can certainly handle it. If I want to minimize the error with respect to one of the ions, I just put a bigger weight on it.
Solver is capable of doing problems like this. You can, for example, ask it to minimize proton deficit (predict mash pH) while getting the calcium ion concentration as close to 100 as possible, trimming any unneutralized alkalinity with sauermalz.
I'm not totally sure what you're defining as your error function, but assuming it's sum_i (p[Ci]/wi - p[Ti]/wi)^2, where Ci is the concentration we're trying to fit and Ti is the target, then you'll have to be a bit careful here if you try to apply Newton's method, as this is no longer convex with respect to the Ci variables (the log transform destroys convexity in the concentrations). As a result, the Hessian may not be positive definite and the Newton direction may not give a descent direction. The Excel Solver is likely using a version of gradient descent, which should at least give a local minimum, but not necessarily a global one.
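As a quick illustration (my own construction, not anything Solver does), here's the objective I'm assuming, plus a numerical check that a single term is non-convex in the concentration:

```python
import numpy as np

def weighted_p_error(C, T, w):
    # The assumed objective: sum_i ((p[Ci] - p[Ti]) / wi)^2, p = -log10.
    return np.sum(((-np.log10(C) + np.log10(T)) / w) ** 2)

# Non-convexity check for a single term f(c) = (ln(c/t))^2 (natural log;
# the base only rescales by a positive constant). The second derivative
# f''(c) = 2*(1 - ln(c/t))/c**2 changes sign at c = e*t, so the Hessian
# need not be positive definite in the Ci variables.
t = 1.0
c = np.linspace(0.5, 10.0, 20)
f2 = 2.0 * (1.0 - np.log(c / t)) / c**2
print(f2.min() < 0 < f2.max())  # True: curvature changes sign
```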
edit: Rereading this, I'm assuming you mean you do a change of variables yi = p[Ci], solve an unconstrained weighted least squares problem for the yi variables, and then just take Ci = 10^(-yi). If so, then sure, this can obviously be done in closed form and an optimization method isn't needed.
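Something like the following minimal sketch, where the design matrix B mapping parameters to p-values is hypothetical:

```python
import numpy as np

def fit_in_p_space(B, pT, w):
    # Weighted least squares in y-space (y = p[C] = -log10 C):
    # minimize |W (B z - pT)|^2 in closed form via the pseudoinverse.
    W = np.diag(1.0 / np.asarray(w, dtype=float))  # divide-by-weight convention
    z = np.linalg.pinv(W @ B) @ (W @ pT)
    y = B @ z
    return 10.0 ** (-y)   # map back: C = 10^(-y)
```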
The beauty of Moore-Penrose (Roger P is on my mind as I just saw 'The Theory of Everything') is that the error analysis is built in. A# = GDOP·A_transpose, where GDOP = (A_transpose·A)^-1 is the dilution of precision matrix, so you have to compute it anyway to get to A# (you don't really, of course, if you used SVD on A, but it's trivial to do so). If you have estimates for errors, even for consider parameters (parameters you don't solve for but whose uncertainty you account for), it is possible to get covariance estimates for the things you are solving for.
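A sketch of that error analysis under the usual assumptions (A has full column rank, independent measurement noise with standard deviation sigma; the names are mine):

```python
import numpy as np

def pinv_with_covariance(A, sigma):
    # GDOP = (A^T A)^-1; the pseudoinverse is A# = GDOP @ A^T,
    # so the error analysis falls out of the same computation.
    gdop = np.linalg.inv(A.T @ A)      # assumes full column rank
    A_pinv = gdop @ A.T
    cov_x = sigma**2 * gdop            # covariance of the solved-for parameters
    return A_pinv, cov_x
```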
Right, the fact that second-order methods like Newton's, or interior point methods (which are basically an augmentation of Newton's method to handle non-smooth constraints/functions), include the Hessian information (your GDOP is, up to a constant factor, the inverse of the Hessian) is why they will typically be orders of magnitude faster than a first-order method like gradient descent. The only downside is that the Hessian grows as the square of the number of variables, so for large-scale problems second-order methods aren't feasible.
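A toy illustration of the gap on a quadratic (my own example, not tied to any particular solver):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)
grad = lambda x: 2 * A.T @ (A @ x - b)   # gradient of |Ax - b|^2
H = 2 * A.T @ A                          # Hessian (constant for a quadratic)

# One Newton step from the origin lands on the minimizer exactly.
x_newton = -np.linalg.solve(H, grad(np.zeros(5)))

# Gradient descent needs many iterations to get there.
x_gd = np.zeros(5)
step = 1.0 / np.linalg.eigvalsh(H).max()
for _ in range(2000):
    x_gd -= step * grad(x_gd)

print(np.linalg.norm(x_newton - x_gd))   # ~0: both reach the same minimizer
```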