Example usage
This example demonstrates how to use clevercloud in a project to create a meaningful word cloud.
Imports
from clevercloud.CleverClean import CleverClean
from clevercloud.CleverLemStem import CleverLemStem
from clevercloud.CleverStopwords import CleverStopwords
from clevercloud.CleverWordCloud import CleverWordCloud
Create a Pandas series
We first create a Pandas series to pass to CleverClean:
import pandas as pd
text = ["is is a feet feet crying beautiful123", "maximum feet RUNNING!!", "BEAUTIFUL feet beautiful crying"]
test_text = pd.Series(text)
test_text
0 is is a feet feet crying beautiful123
1 maximum feet RUNNING!!
2 BEAUTIFUL feet beautiful crying
dtype: object
CleverClean
CleverClean is a preprocessor that converts all letters to lower case, removes punctuation and digits, and joins the series into a single string.
clean_text = CleverClean(test_text)
clean_text
'is is a feet feet crying beautiful maximum feet running beautiful feet beautiful crying '
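To make the cleaning step concrete, here is a minimal sketch of the same kind of preprocessing using only the standard library. It is not clevercloud's implementation; the function name simple_clean is hypothetical, and the rules (lower-casing, dropping every non-letter, joining all entries with spaces) are inferred from the output shown above.

```python
import re


def simple_clean(texts):
    """Hypothetical stand-in for a cleaning step, not the clevercloud code.

    Joins an iterable of strings, lower-cases it, drops every character
    that is not a letter or whitespace, and collapses repeated spaces.
    """
    joined = " ".join(texts).lower()
    letters_only = re.sub(r"[^a-z\s]", "", joined)
    return re.sub(r"\s+", " ", letters_only).strip()


text = [
    "is is a feet feet crying beautiful123",
    "maximum feet RUNNING!!",
    "BEAUTIFUL feet beautiful crying",
]
cleaned = simple_clean(text)
print(cleaned)
```

Because the function only iterates over its argument, a Pandas series of strings such as test_text could be passed in place of the list.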
CleverLemStem
CleverLemStem is a preprocessor that applies lemmatization and stemming to the cleaned text.
final_text = CleverLemStem(clean_text)
final_text
[nltk_data] Downloading package omw-1.4 to /home/docs/nltk_data...
[nltk_data] Unzipping corpora/omw-1.4.zip.
[nltk_data] Downloading package wordnet to /home/docs/nltk_data...
[nltk_data] Unzipping corpora/wordnet.zip.
'is is a foot foot cry beauty maxim foot run beauty foot beauty cry'
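The idea of lemmatizing first and stemming second can be illustrated with a deliberately tiny toy. This sketch assumes nothing about how CleverLemStem works internally (the log above suggests it relies on NLTK's WordNet data); the lookup table, suffix list, and function names below are all hypothetical, and the toy covers far fewer cases than a real lemmatizer or stemmer, so its output will not match the library's.

```python
# Toy lookup of irregular forms; a real lemmatizer uses a full dictionary.
IRREGULAR_LEMMAS = {"feet": "foot"}

# Toy suffix list; a real stemmer applies many more rules.
SUFFIXES = ("ing", "ed", "s")


def toy_lemma(word):
    """Map an irregular form to its dictionary form, if we know it."""
    return IRREGULAR_LEMMAS.get(word, word)


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo a doubled final consonant, e.g. "runn" -> "run".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


def toy_lem_stem(text):
    """Lemmatize each word, then stem the result."""
    return " ".join(toy_stem(toy_lemma(w)) for w in text.split())


normalized = toy_lem_stem("a feet crying running")
print(normalized)
```

The ordering matters in this toy: the lookup runs first so that irregular forms such as "feet" are resolved before any suffix rule is tried.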
CleverStopwords
CleverStopwords returns a comprehensive set of English stopwords and lets you add custom words to it.
new_stopwords = CleverStopwords({"foot", "cry"})
new_stopwords
[nltk_data] Downloading package stopwords to /home/docs/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package stopwords to /home/docs/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
{'a',
'about',
'above',
'after',
'again',
'against',
'ain',
'all',
'am',
'an',
'and',
'any',
'are',
'aren',
"aren't",
'as',
'at',
'be',
'because',
'been',
'before',
'being',
'below',
'between',
'both',
'but',
'by',
'can',
'couldn',
"couldn't",
'cry',
'd',
'did',
'didn',
"didn't",
'do',
'does',
'doesn',
"doesn't",
'doing',
'don',
"don't",
'down',
'during',
'each',
'few',
'foot',
'for',
'from',
'further',
'had',
'hadn',
"hadn't",
'has',
'hasn',
"hasn't",
'have',
'haven',
"haven't",
'having',
'he',
'her',
'here',
'hers',
'herself',
'him',
'himself',
'his',
'how',
'i',
'if',
'in',
'into',
'is',
'isn',
"isn't",
'it',
"it's",
'its',
'itself',
'just',
'll',
'm',
'ma',
'me',
'mightn',
"mightn't",
'more',
'most',
'mustn',
"mustn't",
'my',
'myself',
'needn',
"needn't",
'no',
'nor',
'not',
'now',
'o',
'of',
'off',
'on',
'once',
'only',
'or',
'other',
'our',
'ours',
'ourselves',
'out',
'over',
'own',
're',
's',
'same',
'shan',
"shan't",
'she',
"she's",
'should',
"should've",
'shouldn',
"shouldn't",
'so',
'some',
'such',
't',
'than',
'that',
"that'll",
'the',
'their',
'theirs',
'them',
'themselves',
'then',
'there',
'these',
'they',
'this',
'those',
'through',
'to',
'too',
'under',
'until',
'up',
've',
'very',
'was',
'wasn',
"wasn't",
'we',
'were',
'weren',
"weren't",
'what',
'when',
'where',
'which',
'while',
'who',
'whom',
'why',
'will',
'with',
'won',
"won't",
'wouldn',
"wouldn't",
'y',
'you',
"you'd",
"you'll",
"you're",
"you've",
'your',
'yours',
'yourself',
'yourselves'}
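Extending a stopword set and applying it to text comes down to a set union followed by a membership filter. The sketch below shows that pattern under stated assumptions: BASE_STOPWORDS is a tiny hypothetical stand-in for the full list shown above, and extend_stopwords is an illustrative helper, not part of clevercloud.

```python
# Tiny stand-in for the full English stopword list (assumption for brevity).
BASE_STOPWORDS = frozenset({"a", "is", "the", "and"})


def extend_stopwords(extra_words, base=BASE_STOPWORDS):
    """Return a new set containing the base stopwords plus the extras."""
    return set(base) | {word.lower() for word in extra_words}


stopwords = extend_stopwords({"foot", "cry"})

final_text = "is is a foot foot cry beauty maxim foot run beauty foot beauty cry"
# Keep only the words that are not stopwords.
kept = [word for word in final_text.split() if word not in stopwords]
print(kept)
```

Using a set keeps each membership check constant-time on average, which matters once the stopword list grows to a few hundred entries.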
CleverWordCloud
CleverWordCloud is a function that generates a meaningful word cloud and accepts customized stopwords.
image = CleverWordCloud(final_text, new_stopwords, 3)
image
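A word cloud typically sizes each word by how often it occurs, so the data behind the image is a frequency table computed after stopwords are removed. The following sketch shows that counting step only; it does not draw anything and is not how CleverWordCloud is implemented. The helper top_words and its max_words parameter are hypothetical, and no claim is made here about what the third argument in the call above means.

```python
from collections import Counter


def top_words(text, stopwords, max_words):
    """Count non-stopword tokens and return the most frequent ones."""
    counts = Counter(word for word in text.split() if word not in stopwords)
    return counts.most_common(max_words)


final_text = "is is a foot foot cry beauty maxim foot run beauty foot beauty cry"
stopwords = {"is", "a", "foot", "cry"}  # small illustrative set

ranked = top_words(final_text, stopwords, 3)
for word, count in ranked:
    print(word, count)
```

With "foot" and "cry" treated as stopwords, "beauty" becomes the most frequent remaining word, which is the effect custom stopwords are meant to have on the final cloud.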