```python
from string import punctuation as pnc

tokens = {':)', 'cool', 'happy', 'fun'}
tweets = ['this has been a fun day :)', 'i find python cool! it makes me happy']
for tweet in tweets:
    # One True/False flag per whitespace-separated word in the tweet.
    s = [(word in tokens or word.strip(pnc) in tokens) for word in tweet.split()]
    print(' '.join('1' if t else '0' for t in s))
```
Output:

```
0 0 0 0 1 0 1
0 0 0 1 0 0 0 1
```
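As @EOL pointed out, the `or` in the list comprehension is what handles `:)`: stripping punctuation from `:)` leaves an empty string, so the emoticon only matches through the unstripped `word in tokens` test.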
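A small check, reusing the answer's `tokens` set, that runs the two halves of the `or` separately on `:)` and `cool!` and shows that each test alone misses one of them:

```python
from string import punctuation as pnc

tokens = {':)', 'cool', 'happy', 'fun'}

for word in [':)', 'cool!']:
    exact = word in tokens                # matches ":)" but not "cool!"
    stripped = word.strip(pnc) in tokens  # matches "cool!" but not ":)"
    print(repr(word), repr(word.strip(pnc)), exact, stripped, exact or stripped)

# ':)' '' True False True
# 'cool!' 'cool' False True True
```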
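There are still cases it does not handle correctly, for example `cool :), I like it`. Here the word `:),` fails both tests: it is not itself a token, and stripping its punctuation removes the emoticon along with the comma. The problem is inherent in the requirements, which do not say whether punctuation attached to a word belongs to a token or to the surrounding sentence.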
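One possible mitigation for the `cool :),` case, sketched under the assumption that stray punctuation only ever sits at the ends of a word: instead of stripping all punctuation at once, try every amount of leading and trailing trimming and accept the word if any trimmed form is a token. `is_token` and `flags` are hypothetical helper names, not part of the original answer.

```python
from string import punctuation as pnc

tokens = {':)', 'cool', 'happy', 'fun'}


def is_token(word, tokens):
    """True if `word`, with some leading/trailing punctuation trimmed, is in `tokens`."""
    # Length of the punctuation run at each end of the word.
    lead = len(word) - len(word.lstrip(pnc))
    trail = len(word) - len(word.rstrip(pnc))
    # Try trimming 0..lead characters from the front and 0..trail from the
    # back, so punctuation that belongs to a token (the ":)" in ":),") can
    # survive while stray punctuation (the trailing ",") is dropped.
    for i in range(lead + 1):
        for j in range(trail + 1):
            if word[i:len(word) - j] in tokens:
                return True
    return False


def flags(tweet, tokens):
    """One 0/1 flag per whitespace-separated word, like the original loop."""
    return [1 if is_token(word, tokens) else 0 for word in tweet.split()]


print(' '.join(map(str, flags('cool :), I like it', tokens))))  # 1 1 0 0 0
print(' '.join(map(str, flags('happy:) day', tokens))))         # 1 0
```

This narrows the problem rather than removing it: `happy:)` still gets a single flag even though it contains two tokens, because the output has one flag per whitespace-separated word and the requirements give no rule for splitting such a word.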