python - How to deal with "None" when using sklearn's DecisionTreeClassifier?


When I use sklearn to build a decision tree, for example:

    from sklearn import tree

    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(X, y)
    result = clf.predict(testdata)

X is the training input samples. If there is a "None" in X, how do I handle it?

Decision trees, and ensemble methods such as random forests (which are built from such trees), only accept numerical data, since they perform a split at each node of the tree in order to minimize a given impurity function (entropy, Gini index, ...).
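To make the splitting idea concrete, here is a minimal sketch in plain Python of the textbook Gini and entropy formulas and a brute-force threshold search on one numeric feature. The helper names (`gini`, `entropy`, `split_impurity`, `best_threshold`) are hypothetical and this is not sklearn's internal implementation (sklearn, for instance, places thresholds between observed values). The point is that a split is a comparison `value <= t`, which only makes sense for numbers; in Python 3, `None <= 1.0` raises a `TypeError`, and NaN compares false with everything.

```python
from collections import Counter
import math


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum_k p_k * log2(p_k)."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_impurity(left, right, impurity=gini):
    """Impurity of a candidate split, weighting each child by its share of samples."""
    n = len(left) + len(right)
    return len(left) / n * impurity(left) + len(right) / n * impurity(right)


def best_threshold(values, labels, impurity=gini):
    """Try each observed value t as a threshold (values <= t go left) and keep the best."""
    best = (float("inf"), None)
    # The largest value is skipped because it would leave the right child empty.
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = split_impurity(left, right, impurity)
        if score < best[0]:
            best = (score, t)
    return best


labels = ["a", "a", "b", "b"]
print(gini(labels))     # 0.5
print(entropy(labels))  # 1.0
print(best_threshold([1.0, 2.0, 3.0, 4.0], labels))  # (0.0, 2.0): a pure split
```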

If you have categorical features or NaN values in your data, the learning step will throw an error.
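A minimal sketch of the categorical case, assuming a hypothetical one-column array of color strings: fitting raises a `ValueError` because the strings cannot be converted to floats. Whether NaN is also rejected depends on your scikit-learn version (newer releases added native missing-value support to trees), so only the categorical failure is shown.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: a single string-valued (categorical) feature.
X_cat = np.array([["red"], ["green"], ["red"], ["blue"]], dtype=object)
y = [0, 1, 0, 1]

try:
    DecisionTreeClassifier().fit(X_cat, y)
    fit_failed = False
except ValueError as exc:
    # The tree casts X to float internally, and "red" is not a number.
    fit_failed = True
    print("fit rejected the data:", exc)
```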

To circumvent this:

  • Transform categorical data into numerical data: use, for example, a one-hot encoder (see the `OneHotEncoder` page in sklearn's documentation).

Warning: if you have a feature with a lot of categories (e.g. an ID feature), one-hot encoding may lead to memory issues. Try to avoid encoding such features.

  • Impute values for the missing ones. Many strategies exist (mean, median, most frequent, ...); see sklearn's documentation on imputation of missing values (`SimpleImputer`).
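Both steps can be sketched as follows, assuming hypothetical toy arrays. Casting a list that contains `None` to a float array turns each `None` into NaN, which is what `SimpleImputer` treats as missing by default; `OneHotEncoder` then maps each category to its own 0/1 column.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical numeric feature matrix that uses None as the missing marker.
X_num_raw = [[1.0, None], [2.0, 4.0], [None, 6.0]]
# dtype=float converts every None into NaN.
X_num = np.array(X_num_raw, dtype=float)

# Replace each NaN with its column mean ("median" and "most_frequent" also exist).
imputer = SimpleImputer(strategy="mean")
X_num_filled = imputer.fit_transform(X_num)
print(X_num_filled)

# Hypothetical categorical column.
X_cat = np.array([["red"], ["green"], ["red"]])
encoder = OneHotEncoder(handle_unknown="ignore")
# fit_transform returns a sparse matrix; .toarray() is only for printing here.
X_cat_encoded = encoder.fit_transform(X_cat).toarray()
print(encoder.categories_)  # categories are sorted: green, red
print(X_cat_encoded)
```

Because `OneHotEncoder` returns a sparse matrix by default, high-cardinality features cost less memory than a dense encoding would, though the warning about ID-like features still applies.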

Once you've done this preprocessing, you can fit a decision tree to your data.
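One way to keep the preprocessing and the tree together is a `Pipeline` with a `ColumnTransformer`, so the same imputation and encoding are applied again in `predict`. This is a sketch under assumptions: the mixed-type array, the column layout (column 0 numeric with `None`, column 1 categorical) and the helper `to_float` are hypothetical.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder
from sklearn.tree import DecisionTreeClassifier


def to_float(a):
    """Hypothetical helper: cast an object column to float, turning None into NaN."""
    return np.asarray(a, dtype=float)


# Hypothetical training data: column 0 is numeric with a None, column 1 is a color.
X = np.array([[1.0, "red"], [None, "green"], [3.0, "red"], [4.0, "blue"]], dtype=object)
y = np.array([0, 1, 0, 1])

preprocess = ColumnTransformer([
    # Numeric column: None -> NaN, then fill NaN with the training median.
    ("num", make_pipeline(FunctionTransformer(to_float), SimpleImputer(strategy="median")), [0]),
    # Categorical column: one 0/1 column per color; unseen colors become all zeros.
    ("cat", OneHotEncoder(handle_unknown="ignore"), [1]),
])
clf = make_pipeline(preprocess, DecisionTreeClassifier(random_state=0))
clf.fit(X, y)

# New samples go through exactly the same preprocessing before reaching the tree.
testdata = np.array([[None, "red"], [2.0, "purple"]], dtype=object)
result = clf.predict(testdata)
print(result)
```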

