We analyze fifteen Twitter user geolocation models and two baselines, comparing how they are evaluated. Our results demonstrate that the choice of effectiveness metric can substantially affect the conclusions drawn from an experiment. We show that general evaluations should report a range of metrics so that a complete picture of system effectiveness is conveyed.
Mourad et al. studied this question.