I ran your code and got the same error. A working solution is below, but first the explanation:
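`LazyCorpusLoader` is a proxy object that stands in for a corpus object until the corpus is loaded. (This keeps NLTK from loading huge corpora into memory before they are needed.) However, the first time the proxy object is accessed, it *becomes* the corpus you intended to load. That is, the `LazyCorpusLoader` proxy object replaces its own `__dict__` and `__class__` with the `__dict__` and `__class__` of the corpus you are loading.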
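To make the mechanism concrete, here is a minimal sketch of a proxy that turns itself into the object it stands for. `LazyProxy` and `RealCorpus` are hypothetical names; this only mimics the idea and is not NLTK's actual `LazyCorpusLoader` code.

```python
class RealCorpus:
    """Hypothetical stand-in for an expensive-to-build corpus reader."""

    def __init__(self, name):
        self.name = name
        self.words = ["dog", "cat", "tree"]  # pretend this was costly to load

    def lemmas(self, word):
        return [w for w in self.words if w == word]


class LazyProxy:
    """Sketch of the lazy-loading idea (hypothetical, not NLTK's code)."""

    def __init__(self, name, reader_cls):
        self.__name = name              # stored as self._LazyProxy__name
        self.__reader_cls = reader_cls  # stored as self._LazyProxy__reader_cls

    def __load(self):
        corpus = self.__reader_cls(self.__name)  # the expensive step
        # Become the real object: adopt its attribute dict and its class.
        self.__dict__ = corpus.__dict__
        self.__class__ = corpus.__class__

    def __getattr__(self, attr):
        # Only called when normal lookup fails, i.e. while still a proxy.
        self.__load()
        return getattr(self, attr)


corpus = LazyProxy("demo", RealCorpus)
print(type(corpus).__name__)  # LazyProxy
print(corpus.lemmas("dog"))   # ['dog'] (this first access triggers the load)
print(type(corpus).__name__)  # RealCorpus
```

After the first attribute access the object no longer has any of the proxy's state: both its class and its instance dictionary belong to the real reader.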
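If you compare your code with the errors above, you will see that you got 9 errors when you tried to create 10 instances of your class. The first conversion of the `LazyCorpusLoader` proxy object into a `WordNetCorpusReader` object succeeded. It was triggered when you accessed wordnet for the first time: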
```python
from nltk.corpus import wordnet as wn

def is_good_word(word):
    ...
    wn.ensure_loaded()  # `LazyCorpusLoader` conversion into `WordNetCorpusReader` starts
```
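However, when you started running `is_good_word` in a second thread, the first thread had not yet finished converting the `LazyCorpusLoader` proxy object into a `WordNetCorpusReader`. `wn` was still a `LazyCorpusLoader` proxy object, so the second thread started the `__load` process again. But by the time it reached the point where it tries to convert its `__class__` and `__dict__` into those of a `WordNetCorpusReader` object, the first thread had already converted the `LazyCorpusLoader` proxy object into a `WordNetCorpusReader`. My guess is that you are hitting the error on the line I commented below: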
```python
class LazyCorpusLoader(object):
    ...
    def __load(self):
        ...
        corpus = self.__reader_cls(root, *self.__args, **self.__kwargs)  # load corpus
        ...
        # self.__args == self._LazyCorpusLoader__args
        args, kwargs = self.__args, self.__kwargs  # most likely the line throwing the error
```
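Once the first thread has converted the `LazyCorpusLoader` proxy object into a `WordNetCorpusReader` object, the mangled names no longer work. The `WordNetCorpusReader` object has no attribute with `LazyCorpusLoader` in its mangled name. (Inside the methods of `LazyCorpusLoader`, `self.__args` is compiled to `self._LazyCorpusLoader__args`, and that attribute only exists while the object is still a `LazyCorpusLoader`.) So you get the following error: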
```
AttributeError: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
```
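The same failure can be reproduced deterministically, without threads, by performing the "other thread's" swap in the middle of a load. This is again a hypothetical sketch: `Lazy`, `Reader` and the `interleave` hook are made up for illustration and only imitate the structure of `LazyCorpusLoader.__load`.

```python
class Reader:
    """Hypothetical stand-in for WordNetCorpusReader."""

    def __init__(self):
        self.data = "loaded"


class Lazy:
    """Hypothetical stand-in for LazyCorpusLoader."""

    def __init__(self):
        self.__args = ()  # actually stored as self._Lazy__args

    def load(self, interleave=None):
        corpus = Reader()  # the slow part; a thread switch can happen here
        if interleave is not None:
            interleave()  # simulate another thread finishing its own load now
        # Compiled as self._Lazy__args, which only exists while self is a Lazy.
        args = self.__args
        self.__dict__ = corpus.__dict__
        self.__class__ = corpus.__class__
        return args


obj = Lazy()


def other_thread_wins():
    # What the first thread does: swap obj's dict and class for the reader's.
    finished = Reader()
    obj.__dict__ = finished.__dict__
    obj.__class__ = finished.__class__


try:
    obj.load(interleave=other_thread_wins)
except AttributeError as e:
    print(e)  # 'Reader' object has no attribute '_Lazy__args'
```

The method body still refers to the proxy's mangled name, but the object it is running on has already become something else, which is the situation described above.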
Given this, you need to access the `wn` object *before* entering your threads. Here is your code, modified accordingly:
```python
import sys
import threading
import time

from nltk.corpus import stopwords
from nltk.corpus import wordnet as wn
from nltk.corpus.reader.wordnet import WordNetError

cachedStopWords = stopwords.words("english")


def is_good_word(word):
    word = word.strip()
    if len(word) <= 2:
        return 0
    if word in cachedStopWords:
        return 0
    try:
        if len(wn.lemmas(str(word), lang='en')) == 0:  # no longer the first access of wn
            return 0
    except WordNetError:
        print("WordNetError on concept {}".format(word))
    except AttributeError as e:
        print("Attribute error on concept {}: {}".format(word, e))
    except Exception:
        print("Unexpected error on concept {}: {}".format(word, sys.exc_info()[0]))
    else:
        return 1
    return 1


class ProcessMetaThread(threading.Thread):
    def __init__(self):
        super().__init__()

    def run(self):
        is_good_word('dog')


def process_Meta(numberOfThreads):
    print(wn.__class__)  # <class 'nltk.corpus.util.LazyCorpusLoader'>
    wn.ensure_loaded()   # first access to wn transforms it
    print(wn.__class__)  # <class 'nltk.corpus.reader.wordnet.WordNetCorpusReader'>
    threadsList = []
    for i in range(numberOfThreads):
        start = time.perf_counter()
        t = ProcessMetaThread()
        print(time.perf_counter() - start)
        t.daemon = True
        t.start()
        threadsList.append(t)

    numComplete = 0
    while numComplete < numberOfThreads:
        # Iterate over the threads not yet marked as finished
        for processNum in range(numberOfThreads):
            thread = threadsList[processNum]
            # If the thread is finished, count it once and clear its slot
            if thread is not None and not thread.is_alive():
                numComplete += 1
                threadsList[processNum] = None
        time.sleep(5)
    print('Processes Finished')


if __name__ == '__main__':
    process_Meta(10)
```
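Loading `wn` in the main thread before starting the workers, as above, is the simplest fix. If you control a lazy proxy yourself, a different technique (not something NLTK is claimed to do here) is to guard the one-time transformation with a lock and re-check the object's class after acquiring it. The sketch below reuses the hypothetical `RealCorpus` idea; `LockedLazyProxy` and its class-level `_lock` are assumptions made for illustration.

```python
import threading


class RealCorpus:
    """Hypothetical stand-in for an expensive corpus reader."""

    loads = 0  # counts how many times the "expensive" constructor ran

    def __init__(self, name):
        RealCorpus.loads += 1
        self.name = name

    def lemmas(self, word):
        return [word]


class LockedLazyProxy:
    """Sketch of a lazy proxy whose one-time self-transformation holds a lock."""

    _lock = threading.Lock()  # one lock shared by every proxy instance

    def __init__(self, name, reader_cls):
        self.__name = name              # stored as _LockedLazyProxy__name
        self.__reader_cls = reader_cls  # stored as _LockedLazyProxy__reader_cls

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, i.e. while still a proxy.
        with LockedLazyProxy._lock:
            # Re-check under the lock: another thread may already have
            # transformed this object while we were waiting.
            if type(self) is LockedLazyProxy:
                corpus = self.__reader_cls(self.__name)
                self.__dict__ = corpus.__dict__
                self.__class__ = corpus.__class__
        return getattr(self, attr)


proxy = LockedLazyProxy("demo", RealCorpus)
errors = []


def worker():
    try:
        proxy.lemmas("dog")
    except Exception as exc:  # record failures instead of losing them in the thread
        errors.append(exc)


threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(type(proxy).__name__, len(errors), RealCorpus.loads)  # RealCorpus 0 1
```

Because the mangled names are only read while holding the lock and after confirming the object is still a proxy, a waiting thread never reaches them once the swap has happened.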