Time Series Data Transformations for Better Forecasting

When dealing with time series data, it is essential to find ways to simplify the patterns by removing superfluous variation: the more consistent the pattern, the more accurate the forecasts of future values.

Calendar effects are a common source of such variation: months differ in length, so monthly totals rise and fall simply because of how many days each month contains. The graph of historical monthly crime counts in Baltimore illustrates this. The grey line shows raw monthly totals, while the red line shows the same series after the calendar effect has been removed by converting each month's total to a daily average. The adjusted series gives a clearer view of the underlying trend, free of calendar-related distortion, and keeps forecasting models from being misled by these irregularities.

library(readr)
library(dplyr)
library(lubridate)
library(tsibble)

crime <- readr::read_csv("../data/baltimore_crime.csv")

# Aggregate incidents by day; parse the dates before sorting and filtering
tb_crime <- tibble(crime) %>%
  select(CrimeDate) %>%
  mutate(CrimeDate = as_date(CrimeDate, format = "%m/%d/%Y")) %>%
  group_by(CrimeDate) %>%
  summarise(total = n()) %>%
  arrange(CrimeDate) %>%
  filter(between(year(CrimeDate), 2011, 2015))

ts_crime <- tb_crime %>% as_tsibble(index = CrimeDate)

######## Monthly Average ########
ts_crime_monthly_avg <- ts_crime %>%
  index_by(Month = ~ floor_date(.x, "month")) %>%
  filter(between(year(Month), 2011, 2015)) %>%
  summarise(Monthly_Total = sum(total)) %>%
  mutate(Num_Days = days_in_month(Month),        # calendar days per month (lubridate)
         Avg_perDay = Monthly_Total / Num_Days)  # daily average removes the calendar effect
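A minimal sketch of how the grey/red comparison described above could be drawn is shown below, assuming the ggplot2 and tidyr packages are available (neither is loaded in the original pipeline). The two series are placed in separate panels because monthly totals and daily averages sit on very different scales.

library(ggplot2)
library(tidyr)

ts_crime_monthly_avg %>%
  as_tibble() %>%
  pivot_longer(c(Monthly_Total, Avg_perDay),
               names_to = "Series", values_to = "Value") %>%
  ggplot(aes(x = Month, y = Value, colour = Series)) +
  geom_line() +
  facet_wrap(~ Series, ncol = 1, scales = "free_y") +
  scale_colour_manual(values = c(Monthly_Total = "grey60", Avg_perDay = "red")) +
  labs(title = "Baltimore crime: monthly totals vs. daily averages",
       x = "Month", y = NULL)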

Adjusting a time series for population yields per-capita figures, which allow more meaningful comparisons across time or geographic regions. Raw counts often rise simply because the population grows, so dividing by population reveals whether the underlying activity is genuinely increasing. This kind of normalization is vital for informed decision-making in policy, economic assessments, and understanding societal trends.

library(readr)
library(dplyr)
library(tsibble)

auto <- readr::read_csv('../data/us_car_reg.csv', col_names = c('year', 'total'))
us_pop <- readr::read_csv('../data/us_pop.csv', col_names = c('year', 'total'))

auto_tsb <- tsibble(auto, index = year)
us_pop_tsb <- tsibble(us_pop, index = year)

# Join car registrations with population by year
auto_pop <- left_join(auto_tsb, us_pop_tsb, by = "year")

auto_pop <- auto_pop %>%
  rename(car_regs_total = total.x, population = total.y) %>%
  mutate(cars_per_1000 = (car_regs_total / population) * 1000)  # registrations per 1,000 people
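To see what the adjustment changes, a similar sketch (again assuming ggplot2 and tidyr are installed) plots the raw registrations next to the per-1,000 series:

library(ggplot2)
library(tidyr)

auto_pop %>%
  as_tibble() %>%
  pivot_longer(c(car_regs_total, cars_per_1000),
               names_to = "Series", values_to = "Value") %>%
  ggplot(aes(x = year, y = Value)) +
  geom_line() +
  facet_wrap(~ Series, ncol = 1, scales = "free_y") +
  labs(x = "Year", y = NULL)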

For financial time series involving dollar amounts, adjusting for inflation lets you examine patterns in real rather than nominal terms. Removing inflationary effects gives a clearer view of the underlying economic trends.

To adjust for inflation, select a Consumer Price Index (CPI) and gather its historical data. The Bureau of Labor Statistics in the U.S. provides this information, and the FRED database is a reliable source for economic indicators.

Revenue adjusted to the price level of 2017 can be calculated using the following formula:
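real_revenue_t = (nominal_revenue_t / CPI_t) * CPI_2017

where CPI_t is the CPI value in period t and CPI_2017 is the CPI value in the 2017 base period (the constant 240.01 used in the code below).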

library(readr)
library(dplyr)

# Load dataset
watch_sales <- readr::read_csv("../data/watch_sales.csv")

# Adjust for inflation using the CPI, rebased to 2017 price levels
watch_sales_adjusted <- watch_sales %>%
  mutate(revenue_in_2017_prices = (nominal / cpi) * 240.01)

Forecasting models also tend to perform better when the variance of the series is stable over time. The air passenger traffic dataset from Kaggle illustrates the opposite behaviour: the seasonal fluctuations in passenger numbers grow larger as the level of the series rises.

The Box-Cox transformation employs a parameter known as lambda to stabilize variance. For the Passengers dataset, I utilized the Guerrero method to determine an optimal lambda value. This derived lambda is then applied to transform the data, resulting in more consistent variance.
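In its standard form, the transformation maps each observation y_t to

w_t = log(y_t)                   if lambda = 0
w_t = (y_t^lambda - 1) / lambda  otherwise

so lambda = 0 corresponds to a log transform, while lambda = 1 leaves the shape of the series essentially unchanged.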

library(readr)
library(tsibble)
library(dplyr)
library(feasts)

air <- readr::read_csv("../data/air_passengers.csv")

# Use a year-month index so tsibble treats the series as monthly
air <- air %>%
  mutate(Month = yearmonth(Month)) %>%
  as_tsibble(index = Month)

# Apply Guerrero method to derive lambda
lambda <- air %>%
  features(Passengers, features = guerrero) %>%
  pull(lambda_guerrero)

# Apply Box-Cox transformation
air <- air %>%
  mutate(Passengers_bc = box_cox(Passengers, lambda))
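To inspect the result, one can plot the series before and after the transformation and check the round trip. This sketch assumes ggplot2 is installed; autoplot() and inv_box_cox() come from fabletools, which is attached along with feasts.

library(ggplot2)

# Compare the original and transformed series
autoplot(air, Passengers) + labs(title = "Original series")
autoplot(air, Passengers_bc) + labs(title = "Box-Cox transformed series")

# Back-transforming should recover the original values
all.equal(inv_box_cox(air$Passengers_bc, lambda), air$Passengers)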

For those interested in diving deeper into the Box-Cox transformation, I recommend reading the article by Egor Howell.
