Finding proper nouns using NLTK WordNet

I don't think you need WordNet to find proper nouns. I suggest using the part-of-speech tagger, pos_tag, and looking for the NNP tag:

from nltk.tag import pos_tag

sentence = "Michael Jackson likes to eat at McDonalds"
tagged_sent = pos_tag(sentence.split())
# [('Michael', 'NNP'), ('Jackson', 'NNP'), ('likes', 'VBZ'), ('to', 'TO'), ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP')]

propernouns = [word for word,pos in tagged_sent if pos == 'NNP']
# ['Michael','Jackson', 'McDonalds']

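You might not be fully satisfied with this, since Michael and Jackson are split into 2 tokens. In that case you might need something more complex, such as a named entity tagger.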
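
If a full named entity tagger is more than you need, one lightweight option is to join adjacent NNP tokens yourself. The following is only a minimal sketch under stated assumptions: merge_proper_nouns is a hypothetical helper name, not part of NLTK, and the tagged tokens are hard-coded from the pos_tag output above so the snippet runs without any NLTK models.

```python
from itertools import groupby

# Tagged tokens copied from the pos_tag output shown above, so this
# sketch runs without downloading any NLTK models.
tagged_sent = [('Michael', 'NNP'), ('Jackson', 'NNP'), ('likes', 'VBZ'),
               ('to', 'TO'), ('eat', 'VB'), ('at', 'IN'), ('McDonalds', 'NNP')]

def merge_proper_nouns(tagged):
    """Join runs of adjacent NNP tokens into single multi-word names.

    Hypothetical helper for illustration; not an NLTK function.
    """
    names = []
    # groupby splits the sentence into consecutive runs of NNP / non-NNP tokens.
    for is_nnp, group in groupby(tagged, key=lambda pair: pair[1] == 'NNP'):
        if is_nnp:
            names.append(' '.join(word for word, _ in group))
    return names

merged = merge_proper_nouns(tagged_sent)
print(merged)  # ['Michael Jackson', 'McDonalds']
```

Note that this simple grouping would also glue together two different names that happen to be adjacent, so treat it as a heuristic rather than a replacement for a named entity tagger.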

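As documented in the Penn Treebank tagset (http://www.mozart-oz.org/mogul/doc/lager/brill-tagger/penn.html), possessive nouns can in principle be found simply by looking for the POS tag. In practice, however, the tagger often does not assign POS when the word is an NNP, so checking the word's suffix is more reliable: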

from nltk.tag import pos_tag

sentence = "Michael Jackson took Daniel Jackson's hamburger and Agnes' fries"
tagged_sent = pos_tag(sentence.split())
# [('Michael', 'NNP'), ('Jackson', 'NNP'), ('took', 'VBD'), ('Daniel', 'NNP'), ("Jackson's", 'NNP'), ('hamburger', 'NN'), ('and', 'CC'), ("Agnes'", 'NNP'), ('fries', 'NNS')]

# Iterate over the tokens, not the raw string (which would yield single characters)
possessives = [word for word in sentence.split() if word.endswith("'s") or word.endswith("s'")]
# ["Jackson's", "Agnes'"]
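
To recover the owner's name rather than the possessive form, the suffix can be stripped after filtering on the NNP tag. This is a rough sketch, assuming the tagged output shown above; strip_possessive is a hypothetical helper introduced here for illustration and is not an NLTK function.

```python
# Tagged tokens copied from the pos_tag output shown above.
tagged_sent = [('Michael', 'NNP'), ('Jackson', 'NNP'), ('took', 'VBD'),
               ('Daniel', 'NNP'), ("Jackson's", 'NNP'), ('hamburger', 'NN'),
               ('and', 'CC'), ("Agnes'", 'NNP'), ('fries', 'NNS')]

def strip_possessive(word):
    """Return the word without a trailing 's or ' marker (hypothetical helper)."""
    if word.endswith("'s"):
        return word[:-2]   # "Jackson's" -> "Jackson"
    if word.endswith("s'"):
        return word[:-1]   # "Agnes'" -> "Agnes"
    return word

# Keep only proper nouns whose surface form carried a possessive marker.
owners = [strip_possessive(word) for word, pos in tagged_sent
          if pos == 'NNP' and word != strip_possessive(word)]
print(owners)  # ['Jackson', 'Agnes']
```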

Alternatively, you can use NLTK's ne_chunk, but it doesn't seem to add much unless you care about what kind of proper noun you get from the sentence:

>>> from itertools import chain
>>> from nltk.tree import Tree; from nltk.chunk import ne_chunk
>>> [chunk for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]
[Tree('PERSON', [('Michael', 'NNP')]), Tree('PERSON', [('Jackson', 'NNP')]), Tree('PERSON', [('Daniel', 'NNP')])]
>>> [i[0] for i in list(chain(*[chunk.leaves() for chunk in ne_chunk(tagged_sent) if isinstance(chunk, Tree)]))]
['Michael', 'Jackson', 'Daniel']

Using ne_chunk is a little verbose, and it doesn't get you the possessives.

dotnet 2022/1/1 18:35:31, 349 views
