python - How to deal with "None" when using sklearn's DecisionTreeClassifier?
When I use sklearn to build a decision tree, for example:
    from sklearn import tree
    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(X, y)
    result = clf.predict(testdata)
X is the training input samples. If there is a "None" in X, how do I deal with it?
Decision trees, and ensemble methods such as random forests (which are based on such trees), only accept numerical data, since the algorithm performs splits on each node of the tree in order to minimize a given impurity function (entropy, Gini index, ...).
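To illustrate why numerical input matters, here is a minimal sketch of an impurity-based split on a single feature. It is a simplified illustration, not scikit-learn's actual implementation; the helper names `gini` and `split_impurity` and the toy data are made up for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after splitting on `value <= threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical toy feature and labels.
values = [1.0, 2.0, 3.0, 4.0]
labels = [0, 0, 1, 1]

# Pick the candidate threshold with the lowest weighted impurity.
best = min([1.5, 2.5, 3.5], key=lambda t: split_impurity(values, labels, t))
```

The comparison `v <= threshold` is the key point: it is only meaningful for numbers, so a string category or a None in `values` has no place in this kind of split.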
If you have categorical features or NaN values in your data, the learning step will throw an error.
To circumvent this:
- Transform the categorical data into numerical data: use, for example, a one-hot encoder. See sklearn's documentation for OneHotEncoder.
  Warning: if you have a feature with a lot of categories (e.g. an ID feature), one-hot encoding may lead to memory issues. Try to avoid encoding such features.
- Impute values for the missing ones. Many strategies exist (mean, median, most frequent, ...). See sklearn's documentation on imputation.
Once you've done this preprocessing, you can fit your decision tree to your data.
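The two preprocessing steps above can be combined into one pipeline. This is a minimal sketch, assuming a pandas DataFrame whose missing entries are NaN (pandas turns None into NaN in numeric columns); the column names `age` and `color` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: "age" has a missing value, "color" is categorical.
X = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 31.0, 52.0, 38.0],
    "color": ["red", "blue", "red", "green", "blue", "green"],
})
y = [0, 1, 0, 1, 1, 0]

preprocess = ColumnTransformer([
    # Replace missing numeric values with the column median.
    ("num", SimpleImputer(strategy="median"), ["age"]),
    # One-hot encode the categorical column; categories unseen during
    # fitting are encoded as all zeros instead of raising an error.
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["color"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("tree", DecisionTreeClassifier(random_state=0)),
])
model.fit(X, y)

# The same preprocessing is applied automatically at prediction time.
testdata = pd.DataFrame({"age": [np.nan, 40.0], "color": ["red", "purple"]})
result = model.predict(testdata)
```

Wrapping the steps in a Pipeline means the imputation statistics and the category list are learned from the training data only and then reused for `testdata`.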