Time series analysis involves analyzing and modeling data that is indexed and ordered based on time. Pandas and NumPy provide powerful tools for working with time series data in Python.
Overview of Pandas and NumPy
Here is an overview of how Pandas and NumPy can be used for time series analysis:
- Date and Time Handling:
  - Pandas has a built-in DatetimeIndex class, which provides efficient indexing and manipulation of time series data.
  - You can create a DatetimeIndex in several ways: by specifying a range of dates, by parsing dates from strings, or by converting other time representations to dates.
- Data Alignment:
  - Pandas allows you to align time series data from different sources or with different frequencies.
  - You can use the resample() method to change the frequency of the data, for example aggregating daily data to monthly, or expanding monthly data to daily.
- Indexing and Slicing:
  - Pandas provides convenient methods for indexing and slicing time series data based on dates or time intervals.
  - You can use date-based indexing with the loc[] accessor, keep only the rows before or after given dates with truncate(), or select rows that fall within a time-of-day window with between_time().
- Time-based Operations:
  - Pandas supports various time-based operations, such as shifting data forward or backward in time, calculating time differences, or generating date ranges.
  - You can use the shift() method to shift data values, or the diff() method to calculate differences between consecutive values.
- Rolling and Expanding Windows:
  - Pandas allows you to compute rolling statistics, such as moving averages or rolling sums, using the rolling() method.
  - The expanding() method lets you calculate statistics over all values seen so far, such as a running sum or running mean.
- Time Series Plotting:
  - Pandas plots through Matplotlib by default, and Pandas objects can also be passed to libraries such as Seaborn to visualize time series data.
  - You can use the plot() method to create line plots, bar plots, scatter plots, or other types of visualizations.
- Time Series Analysis:
  - Pandas itself provides basic statistics such as autocorrelation; decomposition and stationarity testing come from companion libraries such as StatsModels.
  - You can use the Series.autocorr() method to compute autocorrelation, or the seasonal_decompose() function from statsmodels.tsa.seasonal to decompose a time series into trend, seasonal, and residual components.
- Time Series Modeling:
  - NumPy and Pandas can be used together with other libraries like StatsModels or scikit-learn for time series modeling and forecasting.
  - You can apply various models, such as ARIMA, SARIMA, or machine learning algorithms, to build predictive models based on time series data.
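Several of the operations listed above can be tried on a small series. The following is a minimal sketch, not a recommended workflow: the ten daily values are made up for illustration, and the variable names are arbitrary.

```python
import numpy as np
import pandas as pd

# Ten days of made-up daily values (0.0, 1.0, ..., 9.0), for illustration only.
idx = pd.date_range("2023-01-01", periods=10, freq="D")
s = pd.Series(np.arange(10, dtype=float), index=idx)

weekly_mean = s.resample("W").mean()            # downsample: daily -> weekly mean
lagged = s.shift(1)                             # each value moves one day later
change = s.diff()                               # difference between consecutive values
moving_avg = s.rolling(window=3).mean()         # 3-day moving average
running_sum = s.expanding().sum()               # sum of all values seen so far
first_week = s.loc["2023-01-01":"2023-01-07"]   # date-based slice, both ends included
lag1_autocorr = s.autocorr(lag=1)               # correlation of the series with its own lag

print(weekly_mean)
print(moving_avg.head())
```

The first value of lagged and change, and the first two values of moving_avg, are NaN because there is not yet enough history to compute them.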
Dates and Times
Generate a series of times:
A series of timestamps can be generated with the ‘date_range’ function. In the code below, ‘periods’ is the total number of samples, and freq = ‘M’ generates one value per month. The time of day given in the start value (10:15) is kept for every generated value.
By default, pandas treats ‘M’ as the end of the month; use ‘MS’ for the start of the month. Other aliases are available for calendar days (‘D’), business days (‘B’), hours (‘H’) and so on. Note that pandas 2.2 and later deprecate ‘M’ and ‘H’ as offset aliases in favour of ‘ME’ and ‘h’.
import pandas as pd
import numpy as np
rng = pd.date_range('2023-05-14 10:15', periods = 10, freq = 'M')
rng
DatetimeIndex(['2023-05-31 10:15:00', '2023-06-30 10:15:00',
'2023-07-31 10:15:00', '2023-08-31 10:15:00',
'2023-09-30 10:15:00', '2023-10-31 10:15:00',
'2023-11-30 10:15:00', '2023-12-31 10:15:00',
'2024-01-31 10:15:00', '2024-02-29 10:15:00'],
dtype='datetime64[ns]', freq='M')
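The other frequency aliases mentioned above behave in the same way. The sketch below is only an illustration, with arbitrary variable names; it uses ‘MS’, ‘D’ and ‘B’ on the same start date. 14 May 2023 is a Sunday, so the business-day range starts on the following Monday.

```python
import pandas as pd

# Same start date, three different frequency aliases.
month_starts = pd.date_range("2023-05-14", periods=3, freq="MS")   # first day of each month
days = pd.date_range("2023-05-14", periods=3, freq="D")            # consecutive calendar days
business_days = pd.date_range("2023-05-14", periods=3, freq="B")   # skips Saturday and Sunday

print(month_starts)    # 2023-06-01, 2023-07-01, 2023-08-01
print(days)            # 2023-05-14, 2023-05-15, 2023-05-16
print(business_days)   # 2023-05-15, 2023-05-16, 2023-05-17
```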
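Similarly, we can generate the series using the ‘start’ and ‘end’ parameters, as below. Because the end value '2023 July 12' means midnight at the start of 12 July, the last generated value is 22:15 on 11 July.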
rng = pd.date_range(start = '2023 Jul 2 10:15', end = '2023 July 12', freq = '12H')
rng
DatetimeIndex(['2023-07-02 10:15:00', '2023-07-02 22:15:00',
'2023-07-03 10:15:00', '2023-07-03 22:15:00',
'2023-07-04 10:15:00', '2023-07-04 22:15:00',
'2023-07-05 10:15:00', '2023-07-05 22:15:00',
'2023-07-06 10:15:00', '2023-07-06 22:15:00',
'2023-07-07 10:15:00', '2023-07-07 22:15:00',
'2023-07-08 10:15:00', '2023-07-08 22:15:00',
'2023-07-09 10:15:00', '2023-07-09 22:15:00',
'2023-07-10 10:15:00', '2023-07-10 22:15:00',
'2023-07-11 10:15:00', '2023-07-11 22:15:00'],
dtype='datetime64[ns]', freq='12H')
A time zone can be specified with the ‘tz’ parameter when generating the series.
rng = pd.date_range(start = '2023 Jul 2 10:15', end = '2023 July 12', freq = '12H', tz='Asia/Kolkata')
rng
DatetimeIndex(['2023-07-02 10:15:00+05:30', '2023-07-02 22:15:00+05:30',
'2023-07-03 10:15:00+05:30', '2023-07-03 22:15:00+05:30',
'2023-07-04 10:15:00+05:30', '2023-07-04 22:15:00+05:30',
'2023-07-05 10:15:00+05:30', '2023-07-05 22:15:00+05:30',
'2023-07-06 10:15:00+05:30', '2023-07-06 22:15:00+05:30',
'2023-07-07 10:15:00+05:30', '2023-07-07 22:15:00+05:30',
'2023-07-08 10:15:00+05:30', '2023-07-08 22:15:00+05:30',
'2023-07-09 10:15:00+05:30', '2023-07-09 22:15:00+05:30',
'2023-07-10 10:15:00+05:30', '2023-07-10 22:15:00+05:30',
'2023-07-11 10:15:00+05:30', '2023-07-11 22:15:00+05:30'],
dtype='datetime64[ns, Asia/Kolkata]', freq='12H')
Each element of a DatetimeIndex is a pandas Timestamp:
type(rng[0])
pandas._libs.tslibs.timestamps.Timestamp
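A time zone can also be attached to an existing index or changed afterwards. The sketch below is an illustrative assumption about typical usage rather than part of the original example: ‘tz_localize’ attaches a zone to naive timestamps, and ‘tz_convert’ expresses the same instants in another zone.

```python
import pandas as pd

# Naive timestamps (no time zone attached).
naive = pd.date_range("2023-07-02 10:15", periods=3, freq="D")

# Interpret the wall-clock times as Indian Standard Time (UTC+05:30).
kolkata = naive.tz_localize("Asia/Kolkata")

# Express the same instants in UTC: 10:15 in Kolkata is 04:45 UTC.
utc = kolkata.tz_convert("UTC")

print(kolkata[0])
print(utc[0])
```

Converting changes only how the instants are displayed; kolkata[0] and utc[0] compare as equal.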
Convert string to dates
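Date strings can be converted to timestamps with the ‘to_datetime’ function, as below. By default, an ambiguous string such as '08/12/2023' is parsed month first (American style), giving 12 August.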
dd = ['07/07/2023', '08/12/2023', '12/04/2023']
# American style
list(pd.to_datetime(dd))
[Timestamp('2023-07-07 00:00:00'),
Timestamp('2023-08-12 00:00:00'),
Timestamp('2023-12-04 00:00:00')]
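When the strings are not in month-first order, the layout can be stated explicitly. The following sketch assumes the same three strings and passes a ‘format’ argument to ‘to_datetime’, so that the same text is read either month first or day first.

```python
import pandas as pd

dd = ["07/07/2023", "08/12/2023", "12/04/2023"]

# Explicit month/day/year layout (same result as the default above).
month_first = pd.to_datetime(dd, format="%m/%d/%Y")

# Explicit day/month/year layout: '08/12/2023' now means 8 December.
day_first = pd.to_datetime(dd, format="%d/%m/%Y")

print(list(month_first))
print(list(day_first))
```

Stating the format avoids silent misreads when a file mixes conventions, and it raises an error if a string does not match.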
Periods
A period represents a span of time, such as a day, a month, a quarter or a year. The Period class in pandas makes it easy to convert between frequencies.
Generating periods and frequency conversion
In the following code, a period is created with ‘Period’ and frequency ‘M’. Because '2023' with a monthly frequency resolves to January 2023, ‘asfreq’ with ‘start’ returns the first day of that month ('01'), whereas ‘end’ returns the last day ('31').
pr = pd.Period('2023', freq='M')
pr.asfreq('D', 'start')
Period('2023-01-01', 'D')
pr.asfreq('D', 'end')
Period('2023-01-31', 'D')
Period arithmetic
We can perform arithmetic on periods. Each operation works in units of the period's ‘freq’, so adding 1 to an annual period moves it forward by one year.
pr = pd.Period('2023', freq='A') # Annual
pr
Period('2023', 'A-DEC')
pr + 1
Period('2024', 'A-DEC')
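The same rule applies at other frequencies. The sketch below is illustrative, with arbitrary variable names; it adds to a monthly and a daily period and subtracts two periods of the same frequency. In recent pandas versions the difference is an offset object whose ‘n’ attribute holds the count.

```python
import pandas as pd

month = pd.Period("2023-11", freq="M")
later = month + 3            # three months later, rolling over into 2024

day = pd.Period("2023-12-31", freq="D")
next_day = day + 1           # one day later, the first day of 2024

gap = later - month          # difference between two monthly periods

print(later)       # 2024-02
print(next_day)    # 2024-01-01
print(gap.n)       # 3
```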
Creating period range
A range of periods can be created using the ‘period_range’ function.
prg = pd.period_range('2010', '2015', freq='A')
prg
PeriodIndex(['2010', '2011', '2012', '2013', '2014', '2015'], dtype='period[A-DEC]')
Convert periods to timestamps
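A PeriodIndex can be converted to a DatetimeIndex using ‘to_timestamp’. By default, each period is mapped to its starting timestamp.
prg.to_timestamp()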
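The conversion can target either edge of each period. The following sketch assumes a small monthly range made up for illustration; it uses the ‘how’ argument of ‘to_timestamp’ and converts back with ‘to_period’.

```python
import pandas as pd

months = pd.period_range("2023-01", "2023-03", freq="M")

starts = months.to_timestamp(how="start")   # first instant of each month
ends = months.to_timestamp(how="end")       # last instant of each month

# Converting the timestamps back to monthly periods recovers the original index.
round_trip = starts.to_period("M")

print(starts)
print(ends.normalize())   # drop the time of day to show only the dates
print(round_trip)
```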
Time offsets
A time offset (a fixed duration) can be defined with ‘Timedelta’, as follows. Offsets can then be used in arithmetic, for example added to or subtracted from timestamps.
# generate time offset
pd.Timedelta('3 days')
Timedelta('3 days 00:00:00')
# use 'min' for minutes; months are not a valid Timedelta unit
pd.Timedelta('4 days 3min')
Timedelta('4 days 00:03:00')
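Adding and subtracting offsets can be sketched as follows. The timestamp and the duration are arbitrary values chosen for illustration.

```python
import pandas as pd

ts = pd.Timestamp("2023-07-02 10:15")
delta = pd.Timedelta(days=3, hours=2)

later = ts + delta            # 2023-07-05 12:15:00
earlier = ts - delta          # 2023-06-29 08:15:00
elapsed = later - earlier     # subtracting two timestamps gives a Timedelta

print(later)
print(earlier)
print(elapsed)                # 6 days 04:00:00
```

The same arithmetic works on a whole DatetimeIndex, which shifts every element by the offset.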
Conclusion
Pandas, built on NumPy, provides the core tools for time series work in Python: generating date ranges, parsing dates, converting between periods and timestamps, and applying time offsets. Combined with resampling, shifting and window functions, and with libraries such as StatsModels for modeling, these tools cover most steps from raw time-stamped data to analysis and forecasting.