How to Use Time Series in Pandas?

Time series data represents observations recorded over time intervals and is crucial for analyzing trends, patterns, and temporal relationships. Pandas provides comprehensive functionality for working with time series data, from basic manipulation to advanced analysis and visualization.

Creating Sample Time Series Data

Let's start by creating sample time series data to demonstrate the concepts:

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# Create sample time series data
dates = pd.date_range('2023-01-01', periods=100, freq='D')
values = np.random.randn(100).cumsum() + 100

data = pd.DataFrame({
    'value': values
}, index=dates)

print(data.head())
print(f"\nData shape: {data.shape}")
print(f"Index type: {type(data.index)}")
                value
2023-01-01  99.496714
2023-01-02  98.358148
2023-01-03  99.706946
2023-01-04  98.302979
2023-01-05  99.146402

Data shape: (100, 1)
Index type: <class 'pandas.core.indexes.datetimes.DatetimeIndex'>

Converting Strings to DateTime Index

When working with time series data from files, you often need to convert date columns to a proper datetime format:

# Create DataFrame with string dates
data_str = pd.DataFrame({
    'date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04'],
    'sales': [100, 120, 95, 110]
})

print("Before conversion:")
print(data_str.dtypes)

# Convert date column to datetime and set as index
data_str['date'] = pd.to_datetime(data_str['date'])
data_str.set_index('date', inplace=True)

print("\nAfter conversion:")
print(data_str.dtypes)
print(data_str.head())
Before conversion:
date     object
sales     int64
dtype: object

After conversion:
sales    int64
dtype: object
            sales
date             
2023-01-01    100
2023-01-02    120
2023-01-03     95
2023-01-04    110
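When the data comes from a CSV file, the conversion and index assignment can be done in a single step with the parse_dates and index_col parameters of read_csv. A minimal sketch, using an in-memory string in place of a real file (the CSV content here is made up for illustration):

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """date,sales
2023-01-01,100
2023-01-02,120
2023-01-03,95
"""

# parse_dates converts the column; index_col makes it the index in one step
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'], index_col='date')
print(df.index.dtype)  # datetime64[ns]
```

This avoids the separate to_datetime and set_index calls when the file layout is known up front.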

Indexing and Slicing Time Series

Pandas provides intuitive ways to select data based on time periods:

# Create larger sample dataset
dates = pd.date_range('2022-01-01', '2023-12-31', freq='D')
ts_data = pd.DataFrame({
    'value': np.random.randn(len(dates)).cumsum() + 100
}, index=dates)

# Select data for a specific year using partial-string indexing
year_2023 = ts_data.loc['2023']
print(f"Year 2023 data points: {len(year_2023)}")

# Select specific date range
jan_2023 = ts_data['2023-01-01':'2023-01-31']
print(f"January 2023 data points: {len(jan_2023)}")

# Select by month across all years
march_data = ts_data[ts_data.index.month == 3]
print(f"All March data points: {len(march_data)}")

print("\nFirst few rows of January 2023:")
print(jan_2023.head())
Year 2023 data points: 365
January 2023 data points: 31
All March data points: 62

First few rows of January 2023:
                value
2023-01-01  98.875932
2023-01-02  99.543839
2023-01-03  99.178784
2023-01-04  99.177369
2023-01-05  98.871758
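Partial-string indexing through .loc also works at month granularity, which is often simpler than comparing index attributes by hand. A small self-contained sketch:

```python
import pandas as pd
import numpy as np

dates = pd.date_range('2023-01-01', '2023-12-31', freq='D')
ts = pd.DataFrame({'value': np.arange(len(dates))}, index=dates)

# A bare 'YYYY-MM' string selects every row in that month
march = ts.loc['2023-03']
print(len(march))  # 31 (days in March)
```

The same string shortcut works with slices, e.g. ts.loc['2023-02':'2023-04'] for a three-month span.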

Handling Missing Values in Time Series

Time series often contain missing values that need proper handling:

# Create time series with missing values
dates = pd.date_range('2023-01-01', periods=10, freq='D')
values_with_nan = [10, np.nan, 12, 13, np.nan, np.nan, 16, 17, np.nan, 19]

ts_missing = pd.DataFrame({
    'value': values_with_nan
}, index=dates)

print("Original data with missing values:")
print(ts_missing)

# Forward fill
ts_ffill = ts_missing.copy()
ts_ffill['value'] = ts_ffill['value'].ffill()
print("\nAfter forward fill:")
print(ts_ffill)

# Interpolation
ts_interp = ts_missing.copy()
ts_interp['value'] = ts_interp['value'].interpolate()
print("\nAfter interpolation:")
print(ts_interp)
Original data with missing values:
            value
2023-01-01   10.0
2023-01-02    NaN
2023-01-03   12.0
2023-01-04   13.0
2023-01-05    NaN
2023-01-06    NaN
2023-01-07   16.0
2023-01-08   17.0
2023-01-09    NaN
2023-01-10   19.0

After forward fill:
            value
2023-01-01   10.0
2023-01-02   10.0
2023-01-03   12.0
2023-01-04   13.0
2023-01-05   13.0
2023-01-06   13.0
2023-01-07   16.0
2023-01-08   17.0
2023-01-09   17.0
2023-01-10   19.0

After interpolation:
            value
2023-01-01   10.0
2023-01-02   11.0
2023-01-03   12.0
2023-01-04   13.0
2023-01-05   14.0
2023-01-06   15.0
2023-01-07   16.0
2023-01-08   17.0
2023-01-09   18.0
2023-01-10   19.0
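The default interpolation is position-based: it treats consecutive rows as equally spaced, which is fine for the daily index above. When timestamps are irregular, method='time' weights by the actual gaps instead. A sketch illustrating the difference:

```python
import pandas as pd
import numpy as np

# Irregularly spaced timestamps: a 4-day gap between the known endpoints
idx = pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-05'])
s = pd.Series([10.0, np.nan, 19.0], index=idx)

# Missing value at 2023-01-02; neighbors at 01-01 (10.0) and 01-05 (19.0)
linear = s.interpolate()                   # position-based: (10 + 19) / 2 = 14.5
time_based = s.interpolate(method='time')  # gap-weighted: 10 + 9 * (1/4) = 12.25
print(linear.iloc[1], time_based.iloc[1])
```

For evenly spaced data the two methods agree, so the distinction only matters with irregular indexes.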

Resampling Time Series Data

Resampling allows you to change the frequency of your time series data:

# Create daily data
daily_dates = pd.date_range('2023-01-01', periods=30, freq='D')
daily_data = pd.DataFrame({
    'sales': np.random.randint(50, 150, 30)
}, index=daily_dates)

print("Daily data (first 7 days):")
print(daily_data.head(7))

# Resample to weekly frequency
weekly_sum = daily_data.resample('W').sum()
weekly_mean = daily_data.resample('W').mean()

print("\nWeekly sum:")
print(weekly_sum)

print("\nWeekly mean:")
print(weekly_mean.round(2))

# Resample to monthly frequency
monthly_stats = daily_data.resample('M').agg({
    'sales': ['sum', 'mean', 'max', 'min']
})

print("\nMonthly statistics:")
print(monthly_stats.round(2))
Daily data (first 7 days):
            sales
2023-01-01    106
2023-01-02     71
2023-01-03     89
2023-01-04     73
2023-01-05    146
2023-01-06     50
2023-01-07     88

Weekly sum:
            sales
2023-01-08    623
2023-01-15    722
2023-01-22    685
2023-01-29    659
2023-02-05     88

Weekly mean:
              sales
2023-01-08    89.00
2023-01-15   103.14
2023-01-22    97.86
2023-01-29    94.14
2023-02-05    88.00

Monthly statistics:
           sales           
             sum  mean   max min
2023-01-31  2777  89.6   149  50
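Resampling also works in the other direction. Upsampling to a finer frequency introduces new rows that are NaN until you choose a fill rule. A sketch, with made-up weekly totals (2023-01-01 is a Sunday, so 'W' week-end labels fall on the 1st and 8th):

```python
import pandas as pd

# Hypothetical weekly totals
weekly = pd.Series([70, 140],
                   index=pd.date_range('2023-01-01', periods=2, freq='W'))

# Upsample to daily; ffill propagates each weekly value forward
daily = weekly.resample('D').ffill()
print(daily)
```

Without the ffill() (e.g. using .asfreq() instead), the intermediate days would simply be NaN.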

Basic Time Series Analysis

Perform basic statistical analysis on time series data:

# Create sample time series
dates = pd.date_range('2023-01-01', periods=365, freq='D')
trend = np.linspace(100, 120, 365)
seasonal = 10 * np.sin(2 * np.pi * np.arange(365) / 365.25 * 4)
noise = np.random.normal(0, 2, 365)
ts_values = trend + seasonal + noise

ts_analysis = pd.DataFrame({
    'value': ts_values
}, index=dates)

# Calculate rolling statistics
ts_analysis['rolling_mean_7'] = ts_analysis['value'].rolling(window=7).mean()
ts_analysis['rolling_std_7'] = ts_analysis['value'].rolling(window=7).std()

print("Time series with rolling statistics (first 10 rows):")
print(ts_analysis.head(10).round(2))

# Calculate monthly aggregates
monthly_agg = ts_analysis.resample('M').agg({
    'value': ['mean', 'std', 'min', 'max']
}).round(2)

print("\nMonthly aggregates:")
print(monthly_agg.head())
Time series with rolling statistics (first 10 rows):
              value  rolling_mean_7  rolling_std_7
2023-01-01   102.48             NaN            NaN
2023-01-02    95.61             NaN            NaN
2023-01-03    96.70             NaN            NaN
2023-01-04   102.58             NaN            NaN
2023-01-05    96.45             NaN            NaN
2023-01-06   101.59             NaN            NaN
2023-01-07    99.14           99.22           3.11
2023-01-08    98.53           98.67           2.49
2023-01-09    95.42           98.54           2.83
2023-01-10    99.33           99.04           2.38

Monthly aggregates:
            value                    
             mean   std    min    max
2023-01-31  99.71  4.04  88.92  108.59
2023-02-28  99.30  3.86  91.14  108.12
2023-03-31 101.42  4.21  91.05  111.76
2023-04-30 107.49  3.83  97.85  115.48
2023-05-31 110.15  3.67 100.43  118.73
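Rolling windows summarize a fixed backward-looking span; another common basic statistic is the period-over-period change, computed with diff() or pct_change(). A small sketch:

```python
import pandas as pd

s = pd.Series([100.0, 102.0, 101.0],
              index=pd.date_range('2023-01-01', periods=3, freq='D'))

# Day-over-day absolute and percentage change
change = s.diff()            # s - s.shift(1): [NaN, 2.0, -1.0]
pct = s.pct_change() * 100   # [NaN, 2.0, -0.98] percent
print(change.tolist())
print(pct.round(2).tolist())
```

The first element is NaN in both cases because there is no prior observation to compare against.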

Comparison of Time Series Methods

Operation          Method           Use Case                      Result
Missing Values     ffill()          Forward fill missing values   Uses last valid observation
Missing Values     interpolate()    Estimate missing values       Linear interpolation between points
Frequency Change   resample('W')    Convert to weekly data        Aggregated weekly values
Frequency Change   resample('M')    Convert to monthly data       Aggregated monthly values

Conclusion

Pandas provides powerful tools for time series analysis, from basic indexing and slicing to advanced resampling and statistical operations. These capabilities make it easy to analyze temporal data, handle missing values, and extract meaningful insights from time-based datasets.

Updated on: 2026-03-27T08:46:40+05:30
