Pandas: substitute NA's with shadow values obtained from contemporary non missing values in other column -
suppose have following prices, asset x:
2004 nan 2005 nan 2006 246313.490770 2007 245557.678822 2008 nan 2009 nan 2010 246221.051892
the values of x missing of years.
and have price comparable asset, asset y (asset y never has nan prices):
2004 249008.038563 2005 248558.125114 2006 249644.313926 2007 249381.243924 2008 248739.371177 2009 249514.093672 2010 251120.103454
for every nan in price of asset x (top matrix), replace nan last non nan value of asset x multiplied growth rate of y (botom matrix) in intermediate years.
so example substitution value in 2009 be:
245557.678822/249381.243924*249514.093672
(value 2007 of x times growth rate of y between 2007 , 2009)
the values 2004 , 2005 remain nan's though.
i can code using ugly nested loops wondering if there elegant 2 liners in pandas (i have dataframe several columns x.1, x.2,... scattered missing values , column y no missing values)
i think need first divide div
both columns, forward fill nan
ffill
, last multiplicate mul
:
print (x) col1 2004 nan 2005 nan 2006 246313.490770 2007 245557.678822 2008 nan 2009 nan 2010 246221.051892 print (y) col1 2004 249008.038563 2005 248558.125114 2006 249644.313926 2007 249381.243924 2008 248739.371177 2009 249514.093672 2010 251120.103454
print (x.col1.div(y.col1)) 2004 nan 2005 nan 2006 0.986658 2007 0.984668 2008 nan 2009 nan 2010 0.980491 name: col1, dtype: float64 print (x.col1.div(y.col1).ffill()) 2004 nan 2005 nan 2006 0.986658 2007 0.984668 2008 0.984668 2009 0.984668 2010 0.980491 name: col1, dtype: float64 x['new'] = (x.col1.div(y.col1).ffill().mul(y.col1)) print (x) col1 new 2004 nan nan 2005 nan nan 2006 246313.490770 246313.490770 2007 245557.678822 245557.678822 2008 nan 244925.647401 2009 nan 245688.491690 2010 246221.051892 246221.051892 print (245557.678822/249381.243924*249514.093672) 245688.4916900315
Comments
Post a Comment