It is well known that k-means has the major drawback of not being able to separate data points that are not linearly separable in the given feature space (e.g., see Dhillon et al. (2004)).
Source: Mine the Easy, Classify the Hard: A Semi-Supervised Approach to Automatic Sentiment Classification
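The k-means limitation quoted above can be illustrated with a small experiment. The following is a minimal sketch, not taken from the cited paper: it runs plain Lloyd-style k-means (not the kernel variant discussed by Dhillon et al.) on two concentric rings, a standard example of classes that no straight line can separate. The function names `make_rings`, `nearest`, `kmeans`, and `agreement`, as well as the ring radii, noise level, and seeds, are assumptions chosen purely for illustration.

```python
import math
import random


def make_rings(n_per_ring=200, radii=(1.0, 5.0), noise=0.1, seed=0):
    """Sample noisy points on concentric circles; the label is the ring index."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, radius in enumerate(radii):
        for _ in range(n_per_ring):
            angle = rng.uniform(0.0, 2.0 * math.pi)
            r = radius + rng.gauss(0.0, noise)
            points.append((r * math.cos(angle), r * math.sin(angle)))
            labels.append(label)
    return points, labels


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: (point[0] - centroids[j][0]) ** 2
        + (point[1] - centroids[j][1]) ** 2,
    )


def kmeans(points, k=2, n_iter=100, seed=0):
    """Plain Lloyd iterations: assign to nearest centroid, then recompute means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    assign = None
    for _ in range(n_iter):
        new_assign = [nearest(p, centroids) for p in points]
        if new_assign == assign:  # assignments stable: converged
            break
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return assign


def agreement(assign, labels):
    """Fraction of points whose cluster matches the ring, up to label swap (k=2)."""
    acc = sum(a == b for a, b in zip(assign, labels)) / len(labels)
    return max(acc, 1.0 - acc)


points, labels = make_rings()
clusters = kmeans(points)
score = agreement(clusters, labels)
print(f"agreement with true rings: {score:.2f}")
```

Because the boundary between two centroids is always a straight line, the resulting clusters cut across both rings, so the agreement score stays well below 1.0 regardless of initialisation.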
For example, Foster et al. (2011) report a drastic drop in performance when moving from the Wall Street Journal (WSJ) domain (training set) to the Twitter dataset (used for evaluation).
Citation: Foster, Jennifer, Özlem Çetinoğlu, Joachim Wagner, Joseph Le Roux, Stephen Hogan, Joakim Nivre, Deirdre Hogan, & Josef van Genabith. 2011. #hardtoparse: POS Tagging and Parsing the Twitterverse. In Proceedings of the Workshop on Analyzing Microtext (AAAI 2011), pp. 20-25.