Python - Pickling a trained classifier yields different results from those obtained directly from a newly but identically trained classifier
I'm trying to pickle a trained SVM classifier from the scikit-learn library so that I don't have to train it over and over again. When I pass the test data to the classifier loaded from the pickle, I get unusually high values for accuracy, F-measure, etc. If the test data is passed directly to the classifier that was not pickled, it gives lower values. I don't understand why pickling and unpickling the classifier object would change the way it behaves. Can someone please help me out with this?
I'm doing this:

    from sklearn.externals import joblib
    joblib.dump(grid, 'grid_trained.pkl')

Here, grid is the trained classifier object. When I unpickle it, it acts differently from when it is used directly.
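A quick way to check whether unpickling changes behaviour is to compare the predictions of the same fitted object before and after a round trip, on the same held-out data. The sketch below assumes that `grid` is a `GridSearchCV` over an `SVC` (the question only says "trained classifier object") and uses scikit-learn's bundled iris data as a stand-in for the asker's data, which is not shown.

```python
import pickle

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Stand-in data (assumption): the bundled iris set, split into train and test parts
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Assumption: "grid" is a grid search over an SVM classifier
grid = GridSearchCV(SVC(), {'C': [0.1, 1, 10]}, cv=3)
grid.fit(X_train, y_train)

# Round-trip the fitted object through pickle (in memory rather than via a file)
restored = pickle.loads(pickle.dumps(grid))

before = grid.predict(X_test)
after = restored.predict(X_test)

# Same fitted state and same inputs should give the same predictions and score
assert np.array_equal(before, after)
assert grid.score(X_test, y_test) == restored.score(X_test, y_test)
print("identical predictions on", len(X_test), "test samples")
```

If the two objects agree here but the scores you report still differ, compare the inputs and the evaluation code of the two paths: an SVC's `predict` is deterministic, so identical model state on identical inputs gives identical predictions.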
There should not be a difference, as @andreasmueller stated. Here's a modified example using pickle:
    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.naive_bayes import MultinomialNB

    # Set labels and data
    categories = ['alt.atheism', 'soc.religion.christian', '', '']
    twenty_train = fetch_20newsgroups(subset='train', categories=categories,
                                      shuffle=True, random_state=42)

    # Vectorize data
    count_vect = CountVectorizer()
    x_train_counts = count_vect.fit_transform(twenty_train.data)

    # TF-IDF transformation
    tf_transformer = TfidfTransformer(use_idf=False).fit(x_train_counts)
    x_train_tf = tf_transformer.transform(x_train_counts)
    tfidf_transformer = TfidfTransformer()
    x_train_tfidf = tfidf_transformer.fit_transform(x_train_counts)

    # Train the classifier
    clf = MultinomialNB().fit(x_train_tfidf, twenty_train.target)

    # Tag new data
    docs_new = ['god love', 'opengl on gpu fast']
    x_new_counts = count_vect.transform(docs_new)
    x_new_tfidf = tfidf_transformer.transform(x_new_counts)
    predicted = clf.predict(x_new_tfidf)
    answers = [(doc, twenty_train.target_names[category])
               for doc, category in zip(docs_new, predicted)]

    # Pickle the classifier
    import pickle
    with open('', 'wb') as fout:
        pickle.dump(clf, fout)

    # Let's clear the classifier
    clf = None

    with open('', 'rb') as fin:
        clf = pickle.load(fin)

    # Retag new data
    docs_new = ['god love', 'opengl on gpu fast']
    x_new_counts = count_vect.transform(docs_new)
    x_new_tfidf = tfidf_transformer.transform(x_new_counts)
    predicted = clf.predict(x_new_tfidf)
    answers_from_loaded_clf = [(doc, twenty_train.target_names[category])
                               for doc, category in zip(docs_new, predicted)]

    assert answers_from_loaded_clf == answers
    print("Answers from the freshly trained classifier and the loaded pre-trained classifier are the same!!!")
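The example above pickles only `clf`, so the reloaded classifier still depends on `count_vect` and `tfidf_transformer` having been fitted in the same session. One way to persist every fitted step together is to wrap them in a scikit-learn `Pipeline` and pickle that single object. The sketch below assumes a tiny hypothetical in-memory corpus (`train_docs`, `train_labels`, `target_names`) instead of downloading 20 Newsgroups.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for twenty_train.data / twenty_train.target
train_docs = [
    "god is love and faith",
    "church prayer and faith in god",
    "opengl renders triangles on the gpu",
    "gpu shaders make graphics fast",
]
train_labels = [0, 0, 1, 1]
target_names = ["religion", "graphics"]

# Vectorizer, TF-IDF step and classifier are fitted (and later saved) as one object
text_clf = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),
])
text_clf.fit(train_docs, train_labels)

docs_new = ["god love", "opengl on gpu fast"]
answers = [(doc, target_names[c])
           for doc, c in zip(docs_new, text_clf.predict(docs_new))]

# The whole pipeline survives the round trip, so raw text can be passed directly
loaded = pickle.loads(pickle.dumps(text_clf))
answers_from_loaded = [(doc, target_names[c])
                       for doc, c in zip(docs_new, loaded.predict(docs_new))]

assert answers_from_loaded == answers
print(answers_from_loaded)
```

Because the vocabulary and IDF weights travel inside the pickled pipeline, a fresh process only needs the saved file to classify raw text.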
It's the same when using sklearn.externals.joblib:
    # Pickle the classifier
    from sklearn.externals import joblib
    joblib.dump(clf, '')

    # Let's clear the classifier
    clf = None

    # Load the pre-trained classifier
    clf = joblib.load('')

    # Retag new data
    docs_new = ['god love', 'opengl on gpu fast']
    x_new_counts = count_vect.transform(docs_new)
    x_new_tfidf = tfidf_transformer.transform(x_new_counts)
    predicted = clf.predict(x_new_tfidf)
    answers_from_loaded_clf = [(doc, twenty_train.target_names[category])
                               for doc, category in zip(docs_new, predicted)]

    assert answers_from_loaded_clf == answers
    print("Answers from the freshly trained classifier and the loaded pre-trained classifier are the same!!!")
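`sklearn.externals.joblib` was deprecated and later removed from scikit-learn; in recent versions the standalone `joblib` package provides the same `dump` and `load` functions. The sketch below repeats the dump, clear, load sequence against a file on disk, assuming a stand-in `MultinomialNB` fitted on the bundled iris data and a hypothetical file name inside a temporary directory.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import MultinomialNB

# Stand-in data and model (assumption); iris features are non-negative,
# which MultinomialNB requires
X, y = load_iris(return_X_y=True)
clf = MultinomialNB().fit(X, y)
expected = clf.predict(X)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "clf.joblib")  # hypothetical file name
    joblib.dump(clf, path)   # write the fitted estimator to disk
    clf = None               # drop the in-memory copy
    clf = joblib.load(path)  # read it back from disk

# The reloaded estimator predicts exactly what the original did
assert np.array_equal(clf.predict(X), expected)
```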