python - How to deal with "None" when using sklearn's DecisionTreeClassifier?


When I use sklearn to build a decision tree, for example:

    from sklearn import tree

    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(X, y)
    result = clf.predict(testdata)

X is the training input samples. If there is a "None" in X, how do I handle it?

Decision trees, and ensemble methods such as random forests (which are built from such trees), only accept numerical data, since they perform splits on each node of the tree in order to minimize a given impurity function (entropy, Gini index, ...).
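To make the "splits on each node" point concrete, here is a minimal sketch on hypothetical toy data (one numeric feature, two classes). It fits a tree and inspects the fitted `tree_` structure: the root node compares feature 0 against a numeric threshold, and its Gini impurity is `1 - (0.5**2 + 0.5**2) = 0.5` because the two classes are balanced.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: one numeric feature, two balanced classes.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [0, 0, 1, 1]

clf = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)

# Node 0 is the root: it tests "feature <= threshold", which only makes
# sense for numbers, so every input value must be numeric.
print("feature:", clf.tree_.feature[0], "threshold:", clf.tree_.threshold[0])

# Gini impurity at the root (before splitting): 1 - (0.5**2 + 0.5**2).
print("root impurity:", clf.tree_.impurity[0])
print("nodes:", clf.tree_.node_count)  # root + two pure leaves
```

The threshold lands halfway between the neighbouring values 2.0 and 3.0, which separates the classes perfectly.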

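If you have categorical features or NaN in your data, the learning step will throw an error. (Recent scikit-learn releases, 1.3 and later, let DecisionTreeClassifier handle NaN natively with the default splitter, but categorical strings still have to be encoded.)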
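A minimal sketch of that failure, assuming a hypothetical dataset whose first column holds raw category strings: the tree tries to convert X to floats, and `"red"` cannot be converted, so `fit` raises a `ValueError`.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: column 0 is a raw categorical (string) feature.
X = np.array([["red", 1.0], ["blue", 2.0], ["red", 3.0]], dtype=object)
y = [0, 1, 0]

try:
    DecisionTreeClassifier().fit(X, y)
    fit_failed = False
except ValueError as exc:
    # Raised while converting X to a float array ("could not convert string...").
    fit_failed = True
    print("fit failed:", exc)
```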

To circumvent this:

  • Transform categorical data into numerical data: use, for example, a one-hot encoder (see sklearn's documentation on `OneHotEncoder`).

Warning: if you have a feature with a lot of categories (e.g. an ID feature), one-hot encoding may lead to memory issues, since it creates one new column per category. Try to avoid encoding such features.
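A minimal sketch of one-hot encoding with sklearn's `OneHotEncoder`, on a hypothetical single "color" column. `handle_unknown="ignore"` is a choice made here so that categories unseen during `fit` become an all-zero row instead of an error. The last lines illustrate the memory warning: an ID-like column with 1000 distinct values turns into 1000 columns.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: a single categorical column.
colors = np.array([["red"], ["blue"], ["green"], ["blue"]])

# handle_unknown="ignore": categories not seen in fit encode as all zeros.
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(colors).toarray()  # output is sparse by default

print(encoder.categories_)  # learned categories, sorted: blue, green, red
print(encoded)              # one 0/1 column per category

unseen = encoder.transform([["purple"]]).toarray()
print(unseen)               # unseen category -> row of zeros

# Memory caveat: 1000 distinct IDs -> 1000 one-hot columns.
ids = np.arange(1000).reshape(-1, 1)
n_id_columns = OneHotEncoder().fit_transform(ids).shape[1]
print(n_id_columns)
```

Keeping the encoder's default sparse output (and only calling `.toarray()` on small data) softens, but does not remove, the cost of high-cardinality features.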

  • Impute values for the missing ones. Many strategies exist (mean, median, most frequent, ...); see sklearn's documentation on `SimpleImputer`.
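A minimal sketch of median imputation with `SimpleImputer`, on hypothetical numeric data that uses Python `None` for missing entries. Building a float NumPy array turns each `None` into `np.nan`, which is `SimpleImputer`'s default `missing_values` marker; each column's NaNs are then replaced by that column's median.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical numeric data with Python None for missing entries.
raw = [[1.0, 10.0],
       [None, 20.0],
       [3.0, None],
       [5.0, 40.0]]

# dtype=float converts None to np.nan.
X = np.array(raw, dtype=float)

# Other strategies: "mean", "most_frequent", "constant".
imputer = SimpleImputer(strategy="median")
X_filled = imputer.fit_transform(X)

print(imputer.statistics_)  # per-column medians learned in fit
print(X_filled)
```

Column 0's median is 3.0 (from 1, 3, 5) and column 1's is 20.0 (from 10, 20, 40), so those values fill the gaps.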

Once you've done this preprocessing, you can fit your decision tree to the data.
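Putting both steps together, here is a minimal end-to-end sketch under stated assumptions: the data is hypothetical and already split into a numeric part and a categorical part, `None` marks missing values in both, and `preprocess` is a helper invented for this example. Numbers are imputed with the median, categories with the most frequent value (`missing_values=None` tells `SimpleImputer` to treat `None` as missing in the object array), the categories are one-hot encoded, and the tree is fitted on the result.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data, split by column type; None marks a missing value.
num_train = np.array([[25.0], [None], [47.0], [33.0]], dtype=float)  # None -> nan
cat_train = np.array([["red"], ["blue"], ["red"], [None]], dtype=object)
y_train = np.array([0, 1, 0, 1])

num_imputer = SimpleImputer(strategy="median")
cat_imputer = SimpleImputer(missing_values=None, strategy="most_frequent")
encoder = OneHotEncoder(handle_unknown="ignore")


def preprocess(num, cat, fit=False):
    """Hypothetical helper: impute both parts, one-hot encode categories, stack."""
    if fit:
        num_out = num_imputer.fit_transform(num)
        cat_out = encoder.fit_transform(cat_imputer.fit_transform(cat))
    else:
        num_out = num_imputer.transform(num)
        cat_out = encoder.transform(cat_imputer.transform(cat))
    return np.hstack([num_out, cat_out.toarray()])


X_train = preprocess(num_train, cat_train, fit=True)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Test rows reuse the imputers/encoder fitted on the training data.
num_test = np.array([[None], [30.0]], dtype=float)
cat_test = np.array([["green"], [None]], dtype=object)  # "green" unseen in training
result = clf.predict(preprocess(num_test, cat_test))
print(result)
```

Fitting the imputers and encoder on the training data only, then calling `transform` on the test data, keeps the columns aligned between the two; sklearn's `ColumnTransformer` and `Pipeline` can bundle the same steps into a single estimator.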

