python - How to deal with "None" when using sklearn's DecisionTreeClassifier?


When I use sklearn to build a decision tree, for example:

    from sklearn import tree

    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(X, y)
    result = clf.predict(testdata)

X is the set of training input samples. If there is a "None" in X, how should I deal with it?

Decision trees, and ensemble methods such as random forests (which are based on such trees), accept only numerical data, since they perform a split at each node of the tree in order to minimize a given impurity function (entropy, Gini index, ...).
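To make the point about impurity concrete, here is a minimal sketch of how a candidate split on a numeric threshold can be scored with the Gini index. The helper names (`gini`, `split_impurity`) and the toy data are assumptions made for illustration; this is not sklearn's actual implementation, only the idea behind it.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after splitting on `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical toy feature and labels.
values = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b"]

impurity_before = gini(labels)                         # mixed node
best_impurity = split_impurity(values, labels, 2.5)    # separates the classes
worse_impurity = split_impurity(values, labels, 1.5)   # leaves one side mixed
```

Every step relies on comparing a feature value with a threshold, which is why each feature has to be a number: a comparison such as `None <= 2.5` is not defined in Python 3, and a string category has no meaningful threshold.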

If you have categorical features or NaN values in your data, the learning step will generally throw an error.

To circumvent this:

  • Transform categorical data into numerical data: use, for example, a one-hot encoder. See sklearn's documentation on encoding categorical features.

Warning: if you have a feature with a lot of categories (e.g. an ID feature), one-hot encoding may lead to memory issues. Try to avoid encoding such features.

  • Impute values for the missing ones. Many strategies exist (mean, median, most frequent, ...). See sklearn's documentation on imputation.

Once you've done this preprocessing, you can fit your decision tree to the data.
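The two preprocessing steps above can be sketched as follows. This assumes a reasonably recent scikit-learn that provides `SimpleImputer` and `OneHotEncoder`; the toy data and variable names are hypothetical, and the mean strategy is only one of the possible choices.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: one numeric feature containing None and one
# categorical feature.
raw_numeric = [1.0, None, 3.0, 4.0]
raw_category = ["red", "blue", "red", "green"]
y = [0, 1, 0, 1]

# Replace None with NaN so the numeric column becomes a plain float array.
X_num = np.array([[np.nan if v is None else v] for v in raw_numeric], dtype=float)
X_cat = np.array([[c] for c in raw_category])

# Fill the missing value with the mean of the observed values.
imputer = SimpleImputer(strategy="mean")
X_num_filled = imputer.fit_transform(X_num)

# One column per category; categories unseen during fit are encoded as zeros.
encoder = OneHotEncoder(handle_unknown="ignore")
X_cat_encoded = encoder.fit_transform(X_cat).toarray()

# Stack the preprocessed columns and fit the tree.
X = np.hstack([X_num_filled, X_cat_encoded])
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# New data must go through the same fitted transformers before predict().
test_num = np.array([[np.nan]])
test_cat = np.array([["red"]])
test_X = np.hstack([imputer.transform(test_num),
                    encoder.transform(test_cat).toarray()])
result = clf.predict(test_X)
```

The imputer and encoder are fitted on the training data only and then reused on the test data, so both sets end up with the same columns and the same fill values.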

