python - How to deal with "None" when using sklearn's DecisionTreeClassifier?


When I use sklearn to build a decision tree, for example:

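    from sklearn import tree

    clf = tree.DecisionTreeClassifier()  # build an unfitted decision tree
    clf = clf.fit(x, y)                  # x: training samples, y: class labels
    result = clf.predict(testdata)       # predict labels for unseen samples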

x is the training input samples. If there is a "None" in x, how do I deal with it?

Decision trees, and ensemble methods such as random forests (which are based on such trees), only accept numerical data, since they perform a split at each node of the tree in order to minimize a given impurity function (entropy, Gini index, ...).
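To make the impurity idea concrete, here is a minimal pure-Python sketch of how a candidate split can be scored. It is only an illustration, not scikit-learn's actual implementation; the function names `gini`, `entropy` and `split_impurity` and the toy data are made up for this example.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of the class proportions."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def split_impurity(values, labels, threshold, impurity=gini):
    """Weighted impurity of the two children produced by `value <= threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    # Empty children contribute nothing to the weighted sum.
    return sum(len(part) / n * impurity(part) for part in (left, right) if part)


# Hypothetical toy feature column and class labels.
values = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b"]

parent = gini(labels)                              # impurity before splitting
perfect = split_impurity(values, labels, 2.5)      # separates the classes
imperfect = split_impurity(values, labels, 1.5)    # leaves one mixed child
```

The comparison `v <= threshold` is the reason the inputs have to be numbers: in this sketch it has no meaning for a `None` entry or a string category.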

If you have categorical features or NaN values in your data, the learning step will usually throw an error.

To circumvent this:

  • Transform categorical data into numerical data: use, for example, a one-hot encoder. See sklearn's documentation on OneHotEncoder.

Warning: if you have a feature with a lot of categories (e.g. an ID feature), one-hot encoding may lead to memory issues. Try to avoid encoding such features.

  • Impute values for the missing ones. Many strategies exist (mean, median, most frequent, ...). See sklearn's documentation on imputation.
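The two steps above can be sketched with scikit-learn's `OneHotEncoder` and `SimpleImputer`. The arrays `colors` and `ages` are hypothetical toy data, and the mean strategy is just one of the options mentioned.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column: one string category per sample.
colors = np.array([["red"], ["green"], ["red"]])

# One-hot encoding creates one 0/1 column per category
# (categories are ordered alphabetically: green, red).
encoder = OneHotEncoder(handle_unknown="ignore")
colors_encoded = encoder.fit_transform(colors).toarray()

# Hypothetical numeric column with a missing value.
ages = np.array([[20.0], [np.nan], [40.0]])

# Replace each NaN with the mean of the observed values in its column.
imputer = SimpleImputer(strategy="mean")
ages_imputed = imputer.fit_transform(ages)

print(colors_encoded)
print(ages_imputed)
```

`strategy` can also be `"median"` or `"most_frequent"`. Which one fits best depends on the data, so treat the choice here as a placeholder.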

Once you've done this preprocessing, you can fit your decision tree to your data.
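Putting it together for the case in the question, a rough sketch could look like the following. It assumes the features are numeric and that missing entries appear as Python `None`; the data values, the median strategy and `random_state=0` are illustrative choices, not part of the original question.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data where None marks a missing entry.
x = [[1.0, None], [2.0, 10.0], [8.0, 30.0], [None, 40.0]]
y = [0, 0, 1, 1]
testdata = [[1.5, None], [9.0, 35.0]]

# Converting with dtype=float turns every None into np.nan,
# which is the missing-value marker SimpleImputer looks for by default.
x = np.array(x, dtype=float)
testdata = np.array(testdata, dtype=float)

# The pipeline learns the column medians on the training data and
# reuses them to fill gaps in any data passed to predict().
clf = make_pipeline(
    SimpleImputer(strategy="median"),
    DecisionTreeClassifier(random_state=0),
)
clf.fit(x, y)
result = clf.predict(testdata)
print(result)
```

Wrapping the imputer and the tree in one pipeline keeps the fill values tied to the training set, so the test data is never used to compute them.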

