Python - Predicting crime in San Francisco, ValueError

I ran into an error while working on a project: ValueError: Found arrays with inconsistent numbers of samples: [878049 884262].

I get it when I try to run the KNN classifier at the bottom. I've been reading, and I know it's because x and y are not the same length. The shape of x is (878049, 2) and the shape of y is (884262,).

How can I fix this error so that they match?

Code:

    import pandas as pd
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import accuracy_score

    # train and test are DataFrames loaded earlier (not shown)

    # Drop features I won't be using
    # train.head()
    df = train.drop(['Descript', 'Resolution', 'Address'], axis=1)
    df2 = test.drop(['Address'], axis=1)

    # Trying to see what times during the day a particular crime occurs, for example
    # whether rapes occur more from 12am-4am during the weekend.
    # Example below
    dow = {
        'Monday': 0,
        'Tuesday': 1,
        'Wednesday': 2,
        'Thursday': 3,
        'Friday': 4,
        'Saturday': 5,
        'Sunday': 6
    }
    df['dow'] = df.DayOfWeek.map(dow)

    # Add a column containing the time of day
    df['hour'] = pd.to_datetime(df.Dates).dt.hour

    # Making the feature columns
    feature_cols = ['dow', 'hour']
    x = df[feature_cols]

    df2['dow'] = df2.DayOfWeek.map(dow)
    y = df2['dow']

    # The row counts of x and y don't match
    print(x.shape)
    print(y.shape)
    print(y.head())
    print(x.head())

    # KNN classifier
    k = 5
    my_knn_for_cs4661 = KNeighborsClassifier(n_neighbors=k)
    my_knn_for_cs4661.fit(x, y)

    # KNN (with k=5), decision tree accuracy
    y_predict = my_knn_for_cs4661.predict(x)
    print('\n')
    score = accuracy_score(y, y_predict)

    print("k=", k, "has ", score, "accuracy")
    results = pd.DataFrame()
    results['actual'] = y
    results['prediction'] = y_predict

    print(results.head(10))

Stack trace:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-11-5a002c1fd668> in <module>()
          7 k = 5
          8 my_knn_for_cs4661 = KNeighborsClassifier(n_neighbors=k)
    ----> 9 my_knn_for_cs4661.fit(x, y)
         10 #knn (with k=5), decision tree accuracy
         11 y_predict = my_knn_for_cs4661.predict(x)

    c:\users\michael\anaconda3\lib\site-packages\sklearn\neighbors\base.py in fit(self, X, y)
        776         """
        777         if not isinstance(X, (KDTree, BallTree)):
    --> 778             X, y = check_X_y(X, y, "csr", multi_output=True)
        779
        780         if y.ndim == 1 or y.ndim == 2 and y.shape[1] == 1:

    c:\users\michael\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
        518         y = y.astype(np.float64)
        519
    --> 520     check_consistent_length(X, y)
        521
        522     return X, y

    c:\users\michael\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
        174     if len(uniques) > 1:
        175         raise ValueError("Found arrays with inconsistent numbers of samples: "
    --> 176                          "%s" % str(uniques))
        177
        178

    ValueError: Found arrays with inconsistent numbers of samples: [878049 884262]

Check the shapes of x and y using x.shape and y.shape. The stack trace says that x and y contain different numbers of instances (samples). That is why the fit function is throwing a ValueError.

Refer to the documentation of fit, which states:
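A quick way to see the mismatch before calling fit is to compare the first dimension of both arrays. The sketch below uses small made-up NumPy arrays in place of the real data; the sizes 6 and 8 are placeholders for 878049 and 884262.

```python
import numpy as np

# Stand-ins for the real data: 6 feature rows, 8 target values (made-up sizes)
x = np.zeros((6, 2))
y = np.zeros(8)

print(x.shape)
print(y.shape)

# fit() needs one target value per feature row
same_n_samples = x.shape[0] == y.shape[0]
print("same number of samples:", same_n_samples)
```

The same comparison works on a pandas DataFrame and Series, since both expose .shape.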

    """Fit the model using X as training data and y as target values

    Parameters
    ----------
    X : {array-like, sparse matrix, BallTree, KDTree}
        Training data. If array or matrix, shape [n_samples, n_features],
        or [n_samples, n_samples] if metric='precomputed'.

    y : {array-like, sparse matrix}
        Target values, array of float values, shape = [n_samples]
         or [n_samples, n_outputs]
    """

In simple words,

    x (878049, 2) -> n_samples = 878049, n_features = 2
    y (884262,)   -> here, n_samples = 884262

You are passing more target values than there are samples. Reduce the number of target values in y. Since n_samples for x is 878049, you must pass the same number of target values (878049).

You can try:

    my_knn_for_cs4661.fit(x, y[:878049])
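Slicing makes the lengths agree, but note that in the question x is built from df (the train data) while y is built from df2 (the test data), so the truncated targets do not belong to the same rows as the features. Below is a minimal sketch with made-up arrays, assuming the intent was to predict a column of the training data; the column layout (dow, hour, label) is hypothetical.

```python
import numpy as np

# Made-up stand-ins: train has 6 rows (dow, hour, label), test has 8 rows (dow, hour)
train = np.array([[0, 1, 10], [1, 5, 11], [2, 9, 10],
                  [3, 13, 12], [4, 17, 11], [5, 21, 10]])
test = np.arange(16).reshape(8, 2)

x = train[:, :2]           # features taken from train
y_from_test = test[:, 0]   # targets taken from test, as in the question

# Truncating only fixes the length; the values still come from unrelated rows
y_truncated = y_from_test[: x.shape[0]]

# Taking the target from the same array keeps features and labels row-aligned
y_aligned = train[:, 2]

print(len(x), len(y_from_test), len(y_truncated), len(y_aligned))
```

Both y_truncated and y_aligned would pass the length check, but only y_aligned pairs each feature row with a label from that same row.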

Also refer to: sklearn error ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

The accepted answer there states: "The dimensions of my input array were skewed, as my input csv had empty spaces."

Check your source file.
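One way to check a source file for blank or empty rows is to count them while reading. This is only a sketch using the standard library and an in-memory CSV with invented contents; for a real file you would pass an opened file object (for example open(path, newline=''), where path is your own file) instead of the StringIO.

```python
import csv
import io

# Made-up CSV text with one blank line and one all-empty row
raw = "dow,hour\n0,1\n\n1,5\n,\n2,9\n"

reader = csv.reader(io.StringIO(raw))
header = next(reader)

good_rows = []
empty_rows = 0
for row in reader:
    # csv.reader yields [] for a blank line; treat all-empty fields as empty too
    if not row or all(field.strip() == "" for field in row):
        empty_rows += 1
    else:
        good_rows.append(row)

print(header, len(good_rows), empty_rows)
```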
