How To Work With N-grams For Classification Tasks?
I'm going to train a classifier on a sample dataset using n-grams. I searched for related content and wrote the code below. As I'm a beginner in Python, I have two questions. 1- Why
Solution 1:
Before training, you need to transform your n-grams into a numeric feature matrix of size <number_of_documents, max_document_representation_length>, that is, one row per document and one column per feature. For example, in a bag-of-words (or bag-of-n-grams) representation, each word or n-gram in the corpus vocabulary gets its own column, and each cell holds that term's frequency in the document.
The Naive Bayes classifier is one of the simplest classifiers, but it can perform poorly on noisy data and is sensitive to an imbalanced class distribution in the training data. You can try a different classifier instead, for example a gradient boosting machine (a boosting method) or a support vector machine.
All of these classifiers and transformers are available in the scikit-learn library.
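A rough sketch of trying the classifiers mentioned above on bag-of-n-gram features, assuming scikit-learn's `MultinomialNB`, `GradientBoostingClassifier` and `LinearSVC`. The tiny labelled corpus is hypothetical, so the predictions say nothing about which model is better; a real comparison needs a held-out test set or cross-validation.

```python
# Compare several classifiers on the same n-gram features.
# Training/test texts and labels are hypothetical illustration data.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_docs = [
    "the movie was good",
    "a really good film",
    "great acting and a good story",
    "the movie was not good",
    "a really bad film",
    "terrible acting and a boring story",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
test_docs = ["a good story", "a boring film"]

classifiers = {
    "naive_bayes": MultinomialNB(),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "linear_svm": LinearSVC(),
}

predictions = {}
for name, clf in classifiers.items():
    # Each pipeline: unigram + bigram counts -> classifier
    model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), clf)
    model.fit(train_docs, train_labels)
    predictions[name] = list(model.predict(test_docs))
    print(name, predictions[name])
```

Wrapping the vectorizer and classifier in one pipeline ensures the vocabulary is learned only from the training texts. If the classes are imbalanced, `LinearSVC` also accepts `class_weight="balanced"`.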