TimeSeries Resampling with binary variables

User 2493 | 10/27/2015, 9:15:30 AM

I am trying to resample a TimeSeries which has binary input. No NaNs or NULLs exist in the original TimeSeries, but after resampling I get quite a few None values. How is this possible? The command I use is

nrminutes=60
ts_resample = ts.resample(datetime.timedelta(minutes=nrminutes), downsample_method='avg', upsample_method='none')

I was thinking of rounding the values afterward to ensure the output is all binary again.

Comments

User 15 | 10/27/2015, 9:30:53 PM

Hi,

Do you have consecutive entries in your TimeSeries that are more than 60 minutes apart? If so, resample uses the upsample method to make sure there is an entry every 60 minutes. You chose "none" as the upsample method, so those gaps are filled with None. You can use "nearest" to copy the values from the data point closest in time, "ffill" to use the closest previous data point, or "bfill" to use the closest following data point. For example, suppose I have a TimeSeries that looks like this:

<pre>
In [12]: ts
Out[12]:
+---------------------+------------+------------+
| date                | aoindex    | timestamp  |
+---------------------+------------+------------+
| 1950-01-31 00:00:00 | -0.06031   | -628560000 |
| 1950-02-28 00:00:00 | 0.62681    | -626140800 |
| 1950-03-31 00:00:00 | -0.0081275 | -623462400 |
| 1950-04-30 00:00:00 | 0.5551     | -620870400 |
| 1950-05-31 00:00:00 | 0.071577   | -618192000 |
| 1950-06-30 00:00:00 | 0.53857    | -615600000 |
| 1950-07-31 00:00:00 | -0.80248   | -612921600 |
| 1950-08-31 00:00:00 | -0.85101   | -610243200 |
| 1950-09-30 00:00:00 | 0.35797    | -607651200 |
| 1950-10-31 00:00:00 | -0.3789    | -604972800 |
+---------------------+------------+------------+
[789 rows x 3 columns]
Note: Only the head of the TimeSeries is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

The index column of the TimeSeries is: date
</pre>

Using the "none" upsample method will result in this, as you describe:

<pre>
In [13]: ts.resample(datetime.timedelta(minutes=60), downsample_method='avg', upsample_method='none')
Out[13]:
+---------------------+----------+--------------+
| date                | aoindex  | timestamp    |
+---------------------+----------+--------------+
| 1950-01-31 00:00:00 | -0.06031 | -628560000.0 |
| 1950-01-31 01:00:00 | None     | None         |
| 1950-01-31 02:00:00 | None     | None         |
| 1950-01-31 03:00:00 | None     | None         |
| 1950-01-31 04:00:00 | None     | None         |
| 1950-01-31 05:00:00 | None     | None         |
| 1950-01-31 06:00:00 | None     | None         |
| 1950-01-31 07:00:00 | None     | None         |
| 1950-01-31 08:00:00 | None     | None         |
| 1950-01-31 09:00:00 | None     | None         |
+---------------------+----------+--------------+
[575593 rows x 3 columns]
Note: Only the head of the TimeSeries is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

The index column of the TimeSeries is: date
</pre>

and using "ffill" will result in this:

<pre>
In [15]: ts.resample(datetime.timedelta(minutes=60), downsample_method='avg', upsample_method='ffill')
Out[15]:
+---------------------+----------+--------------+
| date                | aoindex  | timestamp    |
+---------------------+----------+--------------+
| 1950-01-31 00:00:00 | -0.06031 | -628560000.0 |
| 1950-01-31 01:00:00 | -0.06031 | -628560000.0 |
| 1950-01-31 02:00:00 | -0.06031 | -628560000.0 |
| 1950-01-31 03:00:00 | -0.06031 | -628560000.0 |
| 1950-01-31 04:00:00 | -0.06031 | -628560000.0 |
| 1950-01-31 05:00:00 | -0.06031 | -628560000.0 |
| 1950-01-31 06:00:00 | -0.06031 | -628560000.0 |
| 1950-01-31 07:00:00 | -0.06031 | -628560000.0 |
| 1950-01-31 08:00:00 | -0.06031 | -628560000.0 |
| 1950-01-31 09:00:00 | -0.06031 | -628560000.0 |
+---------------------+----------+--------------+
[575593 rows x 3 columns]
Note: Only the head of the TimeSeries is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

The index column of the TimeSeries is: date
</pre>
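For a binary series specifically, the idea can be sketched in plain Python. This is only an illustration of the logic, not how resample is implemented: `resample_binary`, its `fill` parameter, and the sample data are hypothetical, and it assumes buckets start at the first timestamp and that a bucket average of exactly 0.5 rounds up to 1.

```python
import datetime


def resample_binary(points, step, fill="ffill"):
    """Average 0/1 values per bucket, round back to 0/1, and fill empty buckets.

    points: list of (datetime, 0 or 1) sorted by time.
    step:   bucket width as a datetime.timedelta.
    fill:   "ffill" repeats the last known value; anything else leaves None.
    """
    if not points:
        return []
    start = points[0][0]

    # Group the values by the index of the bucket they fall into.
    buckets = {}
    for t, v in points:
        idx = (t - start) // step
        buckets.setdefault(idx, []).append(v)

    out = []
    last = None
    for idx in range(max(buckets) + 1):
        if idx in buckets:
            vals = buckets[idx]
            mean = sum(vals) / len(vals)
            # Explicit threshold, because round(0.5) is 0 in Python 3.
            value = 1 if mean >= 0.5 else 0
            last = value
        elif fill == "ffill":
            value = last   # gap: repeat the closest previous value
        else:
            value = None   # gap: left empty, as with the "none" upsample method
        out.append((start + idx * step, value))
    return out


start = datetime.datetime(2015, 10, 27, 9, 0)
minutes = datetime.timedelta(minutes=1)
points = [
    (start, 1),
    (start + 20 * minutes, 0),
    (start + 40 * minutes, 1),
    (start + 190 * minutes, 0),  # more than 60 minutes after the previous entry
]
step = datetime.timedelta(minutes=60)

filled = [v for _, v in resample_binary(points, step, fill="ffill")]
gaps = [v for _, v in resample_binary(points, step, fill="none")]
print(filled)
print(gaps)
```

The two empty hourly buckets between the third and fourth entries are where the None values come from. With forward fill they take the last rounded value instead, so the output stays binary.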

Hope this helps.

Evan