Python - Predicting crime in San Francisco, ValueError
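I ran into an error while working on this project: ValueError: Found arrays with inconsistent numbers of samples: [878049 884262]
It happens when I try to run the KNN classifier at the bottom. I've been reading about it, and I know it's because x and y do not have the same number of samples: the shape of x is (878049, 2) and the shape of y is (884262,).
How can I fix this error so that they match?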
Code:

    import pandas as pd
    from sklearn.metrics import accuracy_score
    from sklearn.neighbors import KNeighborsClassifier

    # train and test are the DataFrames loaded earlier.
    # Drop the features that won't be used
    # train.head()
    df = train.drop(['Descript', 'Resolution', 'Address'], axis=1)
    df2 = test.drop(['Address'], axis=1)

    # Trying to see at which times of day a particular crime occurs; for example,
    # rapes occur more often from 12am-4am during the weekend.
    # Example below
    dow = {
        'Monday': 0,
        'Tuesday': 1,
        'Wednesday': 2,
        'Thursday': 3,
        'Friday': 4,
        'Saturday': 5,
        'Sunday': 6
    }
    df['dow'] = df.DayOfWeek.map(dow)

    # Add a column containing the time of day
    df['hour'] = pd.to_datetime(df.Dates).dt.hour

    # Making the feature columns
    feature_cols = ['dow', 'hour']
    x = df[feature_cols]

    df2['dow'] = df2.DayOfWeek.map(dow)
    y = df2['dow']

    # The shapes of x and y don't match
    print(x.shape)
    print(y.shape)
    print(y.head())
    print(x.head())

    # KNN classifier
    k = 5
    my_knn_for_cs4661 = KNeighborsClassifier(n_neighbors=k)
    my_knn_for_cs4661.fit(x, y)

    # KNN (with k=5), decision tree accuracy
    y_predict = my_knn_for_cs4661.predict(x)
    print('\n')
    score = accuracy_score(y, y_predict)
    print("k=", k, "has ", score, "accuracy")

    results = pd.DataFrame()
    results['actual'] = y
    results['prediction'] = y_predict
    print(results.head(10))
Stack trace:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-11-5a002c1fd668> in <module>()
          7 k = 5
          8 my_knn_for_cs4661 = KNeighborsClassifier(n_neighbors=k)
    ----> 9 my_knn_for_cs4661.fit(x, y)
         10 #knn (with k=5), decision tree accuracy
         11 y_predict = my_knn_for_cs4661.predict(x)

    c:\users\michael\anaconda3\lib\site-packages\sklearn\neighbors\base.py in fit(self, X, y)
        776         """
        777         if not isinstance(X, (KDTree, BallTree)):
    --> 778             X, y = check_X_y(X, y, "csr", multi_output=True)
        779
        780         if y.ndim == 1 or y.ndim == 2 and y.shape[1] == 1:

    c:\users\michael\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
        518         y = y.astype(np.float64)
        519
    --> 520     check_consistent_length(X, y)
        521
        522     return X, y

    c:\users\michael\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
        174     if len(uniques) > 1:
        175         raise ValueError("Found arrays with inconsistent numbers of samples: "
    --> 176                          "%s" % str(uniques))
        177
        178

    ValueError: Found arrays with inconsistent numbers of samples: [878049 884262]
Check the shapes of x and y using x.shape and y.shape. The stack trace says that x and y have different numbers of instances (samples). That is why the fit function is throwing the ValueError.
Refer to the documentation of fit, which states:

    """Fit the model using X as training data and y as target values

    Parameters
    ----------
    X : {array-like, sparse matrix, BallTree, KDTree}
        Training data. If array or matrix, shape [n_samples, n_features],
        or [n_samples, n_samples] if metric='precomputed'.

    y : {array-like, sparse matrix}
        Target values, array of float values, shape = [n_samples]
        or [n_samples, n_outputs]
    """
In simple words:
x (878049, 2) -> n_samples = 878049, n_features = 2
y (884262,) -> here, n_samples = 884262
You are passing more target values than there are samples. Reduce the number of target values in y: since n_samples for x is 878049, you must pass the same number of target values (878049).
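To see the requirement in isolation, here is a minimal sketch that reproduces the mismatch on toy data. The arrays are made up for illustration (6 feature rows against 8 target values, standing in for 878049 and 884262), and the exact wording of the error message may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy stand-ins for the real data: 6 feature rows but 8 target values.
x = np.array([[0, 1], [1, 5], [2, 9], [3, 13], [4, 17], [5, 21]])
y = np.array([0, 1, 2, 3, 4, 5, 6, 0])

n_samples_x = x.shape[0]  # number of rows in the feature matrix
n_samples_y = y.shape[0]  # number of target values
print(n_samples_x, n_samples_y)

knn = KNeighborsClassifier(n_neighbors=3)
try:
    knn.fit(x, y)
    fit_failed = False
except ValueError as err:
    # fit refuses to run when the two sample counts differ
    fit_failed = True
    print("ValueError:", err)
```

Comparing x.shape[0] with y.shape[0] before calling fit is a cheap way to catch this early.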
You can try:
my_knn_for_cs4661.fit(x, y[:878049])
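Note that slicing only makes the lengths agree: in the question, x comes from the training file and y from the test file, so the rows still would not correspond to each other. If the target is meant to come from the training data, taking x and y from the same DataFrame keeps them aligned without any slicing. The sketch below assumes a small made-up DataFrame and a target column named Category; that column name is an assumption, since the question never says which column is the target.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical miniature training set; the real one has 878049 rows.
train = pd.DataFrame({
    "dow": [0, 1, 2, 3, 4, 5, 6, 0],
    "hour": [1, 13, 2, 14, 3, 15, 4, 16],
    "Category": ["A", "B", "A", "B", "A", "B", "A", "B"],  # assumed target column
})

feature_cols = ["dow", "hour"]
x = train[feature_cols]  # features taken from the training frame
y = train["Category"]    # target taken from the same frame, so rows line up

# Both come from the same DataFrame, so the sample counts always match.
assert len(x) == len(y)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(x, y)
predictions = knn.predict(x)
print(predictions[:5])
```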
Also refer to: sklearn error ValueError: Input contains NaN, infinity or a value too large for dtype('float64')
The accepted answer there states: "The dimensions of my input array were skewed, as my input csv had empty spaces."
Check your source file.
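One way to check the source file is to count missing values per column after loading it. This is only a sketch: the CSV text below is invented for illustration (the second data row has an empty DayOfWeek field), and with the real data you would pass the file path to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real source file.
csv_text = (
    "Dates,DayOfWeek\n"
    "2015-05-13 23:53:00,Wednesday\n"
    "2015-05-13 23:33:00,\n"
    "2015-05-12 22:00:00,Tuesday\n"
)

frame = pd.read_csv(io.StringIO(csv_text))

# Empty fields are read as NaN; count them per column.
missing_per_column = frame.isnull().sum()
print(frame.shape)
print(missing_per_column)

# Dropping incomplete rows is one possible cleanup, not the only one.
cleaned = frame.dropna()
print(cleaned.shape)
```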