Ed, I'll get to your questions in a day or so when I have more time. If I space it out, please PM me to remind me!
Thank you, sir. I have read about a study of similar design, published in Brauwelt, in Fix's "Principles of Brewing Science," though that study compared FWH to late additions, whereas yours compared it to a 60-minute addition.
In your experiment, 7 of 18 tasters picked out the different beer. In a triangle test a pure guess is right one time in three, so random choices would give an expected 6 correct out of 18, and 7 is only slightly better than that. In fact, if these were purely random choices, you would expect 7 or more successes out of 18 about 39% of the time.
In the 1995 Brauwelt study, which compared FWH to late additions, 23 of 25 tasters correctly identified the different beer. If the tasters had simply been guessing, the chance of this level of success would be only about 0.00000015% (roughly 1.5 in a billion). Thus, we have good evidence that FWH can be distinguished from late additions, even though "middle additions" were used in both the FWH beer and the reference beer.
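For anyone who wants to check these two figures, here is a minimal Python sketch. It assumes each taster independently has a 1-in-3 chance of picking the odd beer by guessing, and sums the binomial tail for "this many correct or more." The function name `triangle_p_value` is just an illustrative label, not something from any brewing or statistics package.

```python
from fractions import Fraction
from math import comb


def triangle_p_value(correct: int, tasters: int) -> float:
    """Chance of `correct` or more right answers out of `tasters`
    if every taster is guessing (1-in-3 per taster in a triangle test)."""
    p = Fraction(1, 3)  # exact arithmetic avoids rounding in tiny tails
    tail = sum(
        comb(tasters, k) * p**k * (1 - p) ** (tasters - k)
        for k in range(correct, tasters + 1)
    )
    return float(tail)


# Your experiment: 7 of 18 correct
print(f"7 of 18:  {triangle_p_value(7, 18):.1%}")  # prints 7 of 18:  39.1%
# Brauwelt study: 23 of 25 correct
print(f"23 of 25: {triangle_p_value(23, 25) * 100:.8f}%")  # prints 23 of 25: 0.00000015%
```

Using `Fraction` keeps the sum exact until the final conversion, which matters when the tail probability is as small as the Brauwelt result.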
Ironically, given how popular the opinion is that FWH is perceived much like a 20-minute addition in both bitterness and flavor, I don't know of any study designed to compare FWH directly to a 20-minute addition. Such a study seems overdue. Would tasters pick out the different beer in a triangle test at about the rate expected by chance, as in your study comparing FWH to a 60-minute addition, or would there be a clearly detectable difference, as in the Brauwelt study?