A longtime reader of the blog writes (slightly edited):
The author of the comment cites calibration as the problem: the rule was tested on a different data set from the one it was derived from, etc.
However, he uses the example of smokers versus non-smokers (5x risk) and downplays the impact of the above, and of all the attention the “flawed” model received. At least I think that’s what he’s saying.
So, my question: is he saying the model discriminates well, but is poorly calibrated?
To which our other reader responds:
That’s right. The entire argument was that the new score (which was derived from 4 large cohort studies) calibrated poorly to 3 large RCTs. Yesterday it was impossible to know, because the NYTimes article was referring to a Lancet article that has only now been e-published. (How’d that happen? Isn’t this what embargo rules are about?) Previous studies (cf. D’Agostino, JAMA 2001) have also found this: poor calibration between datasets, good discrimination. There’s no “error,” just complicated findings.
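A quick illustration of the distinction. Discrimination asks whether a score ranks people correctly (higher predicted risk for the people who go on to have events), usually summarized by the c-statistic. Calibration asks whether the predicted risks match the event rates actually observed; the simplest check is the ratio of observed to expected events. The Python sketch below uses made-up numbers (not data from the Lancet paper, the cohorts, or the RCTs) and hypothetical names `model_a` and `model_b` to show how a score can keep its ranking, and therefore its c-statistic, while its absolute predictions are off by a constant factor.

```python
def c_statistic(risks, outcomes):
    """Discrimination: probability that a randomly chosen person with an event
    was assigned a higher risk than a randomly chosen person without one
    (ties count as half)."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    score = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                score += 1.0
            elif c == n:
                score += 0.5
    return score / (len(cases) * len(controls))


def observed_to_expected(risks, outcomes):
    """Calibration-in-the-large: observed events divided by predicted events.
    1.0 means the score predicts the right total number of events."""
    return sum(outcomes) / sum(risks)


# Hypothetical data invented for illustration only (1 = had an event).
outcomes = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]
# model_a: predicted risks that add up to the 4 observed events.
risks_model_a = [0.05, 0.10, 0.15, 0.30, 0.25, 0.35, 0.60, 0.50, 0.80, 0.90]
# model_b: identical ranking, but every prediction is halved.
risks_model_b = [r / 2 for r in risks_model_a]

for name, risks in [("model_a", risks_model_a), ("model_b", risks_model_b)]:
    print(f"{name}: c-statistic = {c_statistic(risks, outcomes):.3f}, "
          f"O/E = {observed_to_expected(risks, outcomes):.2f}")
```

Both hypothetical scores have the same c-statistic (22 of 24 case/non-case pairs ranked correctly, about 0.917), but the first has an O/E ratio of 1.00 and the second 2.00: twice as many events occur as it predicts. That is the “good discrimination, poor calibration” pattern in miniature.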
I find this discussion fascinating, but I admit I’m an observer here, not an expert. Still, why isn’t the NYT having this debate?