
How To Convert Token List Into Wordnet Lemma List Using Nltk?

I have a list of tokens extracted from a PDF source. I am able to preprocess the text and tokenize it, but I want to loop through the tokens and convert each token in the list to its corresponding WordNet lemma.

Solution 1:

You are calling wordnet.synsets(text) with a list of words (check what text is at that point), but it should be called with a single word. Internally, wordnet.synsets tries to apply .lower() to its argument, hence the error (AttributeError: 'list' object has no attribute 'lower').
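For reference, a minimal sketch that reproduces the error (the word list here is just an illustrative example):

from nltk.corpus import wordnet

# wordnet.synsets() expects a single word; it calls .lower() on its argument,
# so passing a list fails immediately:
wordnet.synsets(["grass", "green"])
# AttributeError: 'list' object has no attribute 'lower'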

Below is a working version of clean_text with a fix for this problem:

import string
import re
import nltk
from nltk.corpus import wordnet

# Requires the 'stopwords' and 'wordnet' corpora:
# nltk.download('stopwords'); nltk.download('wordnet')
stopwords = nltk.corpus.stopwords.words('english')
wn = nltk.WordNetLemmatizer()

def clean_text(text):
    # Drop punctuation characters, then split on runs of non-word characters
    text = "".join([char for char in text if char not in string.punctuation])
    tokens = re.split(r"\W+", text)
    # Lemmatize each token, skipping English stopwords
    text = [wn.lemmatize(word) for word in tokens if word not in stopwords]
    # For each remaining token, collect the first lemma name of every synset
    lemmas = []
    for token in text:
        lemmas += [synset.lemmas()[0].name() for synset in wordnet.synsets(token)]
    return lemmas


text = "The grass was greener."

print(clean_text(text))

Returns:

['grass', 'Grass', 'supergrass', 'eatage', 'pot', 'grass', 'grass', 'grass', 'grass', 'grass', 'denounce', 'green', 'green', 'green', 'green', 'fleeceable']
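The list contains duplicates because each synset independently contributes its first lemma name. If you only want distinct lemmas, one option (not part of the original answer) is to deduplicate while preserving order:

unique_lemmas = list(dict.fromkeys(clean_text(text)))
print(unique_lemmas)
# ['grass', 'Grass', 'supergrass', 'eatage', 'pot', 'denounce', 'green', 'fleeceable']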
