python - Pickling a trained classifier yields different results from the results obtained directly from a newly but identically trained classifier


I'm trying to pickle a trained SVM classifier from the scikit-learn library so that I don't have to train it over and over again. When I pass the test data to the classifier loaded from the pickle, I get unusually high values for accuracy, F-measure, etc. If the test data is passed directly to the classifier that was not pickled, it gives lower values. I don't understand why pickling and unpickling the classifier object changes the way it behaves. Can someone please help me out with this?

I'm doing something like this:

    # On scikit-learn 0.23+, sklearn.externals.joblib is gone; use "import joblib" instead
    from sklearn.externals import joblib
    joblib.dump(grid, 'grid_trained.pkl')

Here, grid is the trained classifier object. When I unpickle it, it acts differently from when it is used directly.

There should not be any difference, as @AndreasMueller stated. Here's a modified example from http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html#loading-the-20-newgroups-dataset using pickle:

    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.naive_bayes import MultinomialNB

    # Set labels and data
    categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med']
    twenty_train = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=42)

    # Vectorize data
    count_vect = CountVectorizer()
    x_train_counts = count_vect.fit_transform(twenty_train.data)

    # TF-IDF transformation
    tf_transformer = TfidfTransformer(use_idf=False).fit(x_train_counts)
    x_train_tf = tf_transformer.transform(x_train_counts)
    tfidf_transformer = TfidfTransformer()
    x_train_tfidf = tfidf_transformer.fit_transform(x_train_counts)

    # Train classifier
    clf = MultinomialNB().fit(x_train_tfidf, twenty_train.target)

    # Tag new data
    docs_new = ['God is love', 'OpenGL on the GPU is fast']
    x_new_counts = count_vect.transform(docs_new)
    x_new_tfidf = tfidf_transformer.transform(x_new_counts)
    predicted = clf.predict(x_new_tfidf)

    answers = [(doc, twenty_train.target_names[category]) for doc, category in zip(docs_new, predicted)]

    # Pickle the classifier
    import pickle
    with open('clf.pk', 'wb') as fout:
        pickle.dump(clf, fout)

    # Let's clear the classifier
    clf = None

    # Load the pre-trained classifier back from disk
    with open('clf.pk', 'rb') as fin:
        clf = pickle.load(fin)

    # Retag new data
    docs_new = ['God is love', 'OpenGL on the GPU is fast']
    x_new_counts = count_vect.transform(docs_new)
    x_new_tfidf = tfidf_transformer.transform(x_new_counts)
    predicted = clf.predict(x_new_tfidf)

    answers_from_loaded_clf = [(doc, twenty_train.target_names[category]) for doc, category in zip(docs_new, predicted)]

    assert answers_from_loaded_clf == answers
    print("Answers from freshly trained classifier and loaded pre-trained classifier are the same !!!")

It's the same when using sklearn.externals.joblib too:

    # Pickle the classifier
    # (on scikit-learn 0.23+, sklearn.externals.joblib is gone; use "import joblib" instead)
    from sklearn.externals import joblib
    joblib.dump(clf, 'clf.pk')

    # Let's clear the classifier
    clf = None

    # Load the pre-trained classifier
    clf = joblib.load('clf.pk')

    # Retag new data
    docs_new = ['God is love', 'OpenGL on the GPU is fast']
    x_new_counts = count_vect.transform(docs_new)
    x_new_tfidf = tfidf_transformer.transform(x_new_counts)
    predicted = clf.predict(x_new_tfidf)

    answers_from_loaded_clf = [(doc, twenty_train.target_names[category]) for doc, category in zip(docs_new, predicted)]

    assert answers_from_loaded_clf == answers
    print("Answers from freshly trained classifier and loaded pre-trained classifier are the same !!!")
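
The same kind of check can be run against a setup closer to the one in the question, where grid is a grid-searched SVM. The following is only a minimal sketch under assumptions: the question shows neither its data nor its parameter grid, so the iris dataset, the param_grid values, and the roundtrip_check helper are all made up for illustration. It also uses the standalone joblib package rather than sklearn.externals.joblib. It fits the classifier, evaluates it on a held-out test set, saves and reloads it, and evaluates the loaded copy on the same test set.

```python
import os
import tempfile

import joblib  # standalone package, used here in place of sklearn.externals.joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC


def roundtrip_check():
    """Fit a grid-searched SVM, save and reload it, and evaluate both copies on one test set."""
    # Assumption: iris stands in for the question's data, which is not shown
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y
    )

    # Hypothetical parameter grid, chosen only for illustration
    param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
    grid = GridSearchCV(SVC(), param_grid, cv=3)
    grid.fit(X_train, y_train)

    # Results from the classifier that was never pickled
    pred_before = grid.predict(X_test)
    score_before = grid.score(X_test, y_test)

    # Save to a temporary directory and load the classifier back
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "grid_trained.pkl")
        joblib.dump(grid, path)
        loaded = joblib.load(path)

    # Results from the reloaded classifier on the exact same test data
    pred_after = loaded.predict(X_test)
    score_after = loaded.score(X_test, y_test)
    return pred_before, pred_after, score_before, score_after


pred_before, pred_after, score_before, score_after = roundtrip_check()
assert np.array_equal(pred_before, pred_after)
assert score_before == score_after
print("Predictions and accuracy match after the round trip")
```

If a check like this passes but the numbers in the real project still differ, the difference is probably coming from somewhere other than the pickle round trip itself, for example from the data that is passed to each copy of the classifier.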
