csv - Column separation in Python
I am working on my bachelor thesis, using Python to analyze data. Unfortunately I am not a programming expert, nor do I know anyone who works with Python.
I have code that separates the columns in CSV files by a comma. I want the code to separate the columns by | instead.
I have tried to replace the comma in line 58 with |, but that did not work, surprise surprise. Because I am such a noob in the programming field, my Google search did not make sense to me at all. Any help would be greatly appreciated!
from sklearn.feature_extraction.text import CountVectorizer
from sklearn import linear_model
import csv
import cPickle
from sklearn.metrics import accuracy_score


def main():
    train_file = "train.csv"
    test_file = "test.csv"

    # Read the documents
    train_docs, y = read_docs(train_file)

    # Define which features to extract (character bigrams in this case)
    extract = CountVectorizer(lowercase=False, ngram_range=(2,2), analyzer="char")
    extract.fit(train_docs)  # create the vocabulary from the training data

    # Extract features from the train data
    x = extract.transform(train_docs)

    # Initialize the model
    model = linear_model.LogisticRegression()

    # Train the model
    model.fit(x, y)

    # Write the model to a file so it can be reused
    cPickle.dump((extract,model),open("model.pickle","w"))

    # Print the coefficients to see which features are important
    for i,f in enumerate(extract.get_feature_names()):
        print f, model.coef_[0][i]

    # Testing

    # Read the test data
    test_docs, y_test = read_docs(test_file)

    # Extract features from the test data
    x_test = extract.transform(test_docs)

    # Apply the model to the test data
    y_predict = model.predict(x_test)

    # Evaluation
    print accuracy_score(y_test, y_predict)


def read_docs(filename):
    '''
    Return x,y where x is the list of documents and y is the list of labels.
    '''
    x = []
    y = []
    with open(filename) as f:
        r = csv.reader(f)
        for row in r:
            text,label = row
            x.append(text)
            y.append(int(label))
    return x,y

main()
At the moment I have got as far as this:
csv.register_dialect('pipes', delimiter='|')

with open(filename) as f:
    r = csv.reader(f, dialect='pipes')
    for row in r:
        text,label = row
        x.append(text)
        y.append(int(label))
return x,y
But I keep getting this error now:
Traceback (most recent call last):
  File "d:/python/logreggwen.py", line 67, in <module>
    main()
  File "d:/python/logreggwen.py", line 11, in main
    train_docs, y = read_docs(train_file)
  File "d:/python/logreggwen.py", line 61, in read_docs
    text,label = row
ValueError: need more than 1 value to unpack
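An unpacking error of this kind is raised when a row contains only one field, so it cannot be split into text and label. One possible cause is a blank line or a line without any | in the data file. The sketch below is only an illustration, written for Python 3 and using a made-up in-memory sample instead of train.csv; the names sample, rows and skipped are hypothetical. It collects the rows that do not have exactly two fields rather than crashing on them:

```python
import csv
import io

# Hypothetical sample data: line 2 is blank and line 3 has no '|'.
sample = io.StringIO("good text|1\n\nno separator here\nbad text|0\n")

rows = []     # well-formed (text, label) pairs
skipped = []  # (row number, raw row) for anything that is not two fields

for row_no, row in enumerate(csv.reader(sample, delimiter="|"), start=1):
    if len(row) != 2:
        # Unpacking this row into text, label would raise ValueError.
        skipped.append((row_no, row))
        continue
    text, label = row
    rows.append((text, int(label)))

print(rows)
print(skipped)
```

Printing the skipped rows shows which lines of the file need attention before the labels can be parsed.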
You need to tell the csv reader which delimiter your data file uses:
csv.reader(f, delimiter='|')
But actually, you need to read the corresponding documentation.
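As a rough sketch of how the delimiter argument could fit into the read_docs function from the question: the version below assumes Python 3, takes an already open file object instead of a filename so that it can be tried without a file on disk, and uses a made-up sample in place of train.csv. Those choices are assumptions for illustration, not part of the original code.

```python
import csv
import io


def read_docs(f, delimiter="|"):
    """Return (texts, labels) parsed from rows of the form text<delimiter>label."""
    texts, labels = [], []
    for row in csv.reader(f, delimiter=delimiter):
        text, label = row
        texts.append(text)
        labels.append(int(label))
    return texts, labels


# Hypothetical sample data standing in for train.csv.
sample = io.StringIO("hello, world|1\ngoodbye|0\n")
docs, y = read_docs(sample)
print(docs)  # ['hello, world', 'goodbye']
print(y)     # [1, 0]
```

Note that the comma inside the first document stays part of the text, because only | is treated as the column separator.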