Title: | Routines for Descriptive and Model-Based APC Analysis |
---|---|
Description: | Age-Period-Cohort (APC) analyses are used to differentiate relevant drivers for long-term developments. The 'APCtools' package offers visualization techniques and general routines to simplify the workflow of an APC analysis. Sophisticated functions are available both for descriptive and regression model-based analyses. For the former, we use density (or ridgeline) matrices and (hexagonally binned) heatmaps as innovative visualization techniques building on the concept of Lexis diagrams. Model-based analyses build on the separation of the temporal dimensions based on generalized additive models, where a tensor product interaction surface (usually between age and period) is utilized to represent the third dimension (usually cohort) on its diagonal. Such tensor product surfaces can also be estimated while accounting for further covariates in the regression model. See Weigert et al. (2021) <doi:10.1177/1354816620987198> for methodological details. |
Authors: | Alexander Bauer [aut, cre] , Maximilian Weigert [aut] , Hawre Jalal [aut] |
Maintainer: | Alexander Bauer |
License: | MIT + file LICENSE |
Version: | 1.0.6 |
Built: | 2024-11-26 04:38:19 UTC |
Source: | https://github.com/bauer-alex/apctools |
Internal helper function that is called in plot_density to calculate the density of a metric variable. If plot_density is called from within plot_densityMatrix (i.e., when some of the columns c("age_group","period_group","cohort_group") are part of the dataset), the density is computed individually for all respective APC groups.
calc_density(dat, y_var, weights_var = NULL, ...)
dat |
Dataset with columns |
y_var |
Character name of the main variable to be plotted. |
weights_var |
Optional character name of a weights variable used to project the results in the sample to some population. |
... |
Additional arguments passed to |
Dataset with the calculated densities.
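As a rough illustration of the computation described for calc_density, the following base R sketch estimates an optionally weighted kernel density separately for each group. The function name sketch_groupDensity, the column names and the toy data are hypothetical, and the package's internal implementation may well differ.

```r
# Hypothetical helper (not part of APCtools): kernel density of a metric
# variable, optionally weighted, computed separately for each group.
sketch_groupDensity <- function(dat, y_var, group_var, weights_var = NULL) {
  dat_list  <- split(dat, dat[[group_var]], drop = TRUE)
  dens_list <- lapply(names(dat_list), function(g) {
    d <- dat_list[[g]]
    y <- d[[y_var]]
    # stats::density() expects weights that sum to 1
    if (is.null(weights_var)) {
      w <- rep(1 / length(y), length(y))
    } else {
      w <- d[[weights_var]] / sum(d[[weights_var]])
    }
    # the bandwidth is fixed by an unweighted rule of thumb
    dens <- stats::density(y, weights = w, bw = stats::bw.nrd0(y))
    data.frame(group = g, x = dens$x, y = dens$y)
  })
  do.call(rbind, dens_list)
}

# toy data with hypothetical column names
set.seed(1)
toy <- data.frame(period_group = rep(c("1990 - 1999", "2000 - 2009"), each = 50),
                  distance     = c(rnorm(50, 500, 100), rnorm(50, 800, 150)),
                  weight       = runif(100, 0.5, 1.5))
dens_dat <- sketch_groupDensity(toy, "distance", "period_group", "weight")
head(dens_dat)
```

The sketch does not handle missing values; these would have to be removed before calling stats::density().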
Internal helper function to capitalize the first letter of a character value. The use case is to create a plot label like 'Age' from a variable name like 'age'.
capitalize_firstLetter(char)
char |
Character value whose first letter should be capitalized |
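The idea behind capitalize_firstLetter can be sketched in a few lines of base R. The function name capitalize_first is hypothetical, and this is only one possible way to implement it.

```r
# Hypothetical re-implementation: upper-case the first character and keep the rest
capitalize_first <- function(char) {
  paste0(toupper(substr(char, 1, 1)), substr(char, 2, nchar(char)))
}

capitalize_first("age")                   # "Age"
capitalize_first(c("period", "cohort"))   # "Period" "Cohort"
```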
Internal helper function to add lower and upper confidence boundaries pointwise
compute_marginalAPCeffects(dat, model, variable, plot_CI = FALSE)
dat |
Dataset containing predicted effects for a grid of all APC dimensions and covariates used in the model. |
model |
|
variable |
One of |
plot_CI |
Indicator if 95% confidence intervals for marginal APC effects should be computed. Defaults to FALSE. |
If the model was estimated with a log or logit link, the function automatically performs an exponential transformation of the effect.
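One plausible way to think about marginal APC effects is to average a grid of predicted effects over the respective other temporal dimensions. The base R sketch below follows this reading with hypothetical toy values. It is an assumption made for illustration, not a description of the package's internal code.

```r
# Toy grid of predicted effects over age and period (hypothetical values)
grid <- expand.grid(age = 20:22, period = 2000:2002)
grid$effect <- (grid$age - 21) + 0.5 * (grid$period - 2001)
grid$cohort <- grid$period - grid$age

# Marginal effect of one dimension: mean of the surface over the other dimensions
marginal_age    <- tapply(grid$effect, grid$age,    mean)
marginal_period <- tapply(grid$effect, grid$period, mean)
marginal_cohort <- tapply(grid$effect, grid$cohort, mean)

marginal_age
marginal_period

# For a model with a log or logit link, effects on the linear predictor scale
# could be exponentiated to obtain multiplicative effects
exp(marginal_age)
```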
Internal helper function to be called in plot_APChexamap, to tilt the x-axis for the hexamap plot.
compute_xCoordinate(period_vec)
period_vec |
Numeric vector of period values. |
Internal helper function to be called in plot_APChexamap, to tilt the x-axis for the hexamap plot.
compute_yCoordinate(period_vec, age_vec)
period_vec |
Numeric vector of period values. |
age_vec |
Numeric vector of age values. |
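The tilted coordinate system of a hexamap can be sketched as follows. The formulas are the usual construction for hexamaps (x proportional to period, y equal to age minus half the period). Whether compute_xCoordinate and compute_yCoordinate use exactly these expressions is an assumption, and the names hex_x, hex_y and step_length are hypothetical.

```r
# Hypothetical coordinate transformation for a hexamap
hex_x <- function(period_vec) period_vec * sqrt(3) / 2
hex_y <- function(period_vec, age_vec) age_vec - period_vec / 2

# Euclidean length of a step between two (period, age) points in the tilted system
step_length <- function(p0, a0, p1, a1) {
  sqrt((hex_x(p1) - hex_x(p0))^2 + (hex_y(p1, a1) - hex_y(p0, a0))^2)
}

len_age    <- step_length(2000, 30, 2000, 31)  # one year along age (period fixed)
len_period <- step_length(2000, 30, 2001, 30)  # one year along period (age fixed)
len_cohort <- step_length(2000, 30, 2001, 31)  # one year along a cohort diagonal
c(age = len_age, period = len_period, cohort = len_cohort)
```

Under this transformation a one-year step has the same length along all three temporal dimensions, which is what keeps the diagonal dimension from being visually underrepresented.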
Create a table to summarize the overall effect strengths of the age, period and cohort effects for models fitted with gam or bam. The output format can be adjusted by passing arguments to kable via the ... argument.
create_APCsummary( model_list, dat, digits = 2, apc_range = NULL, kable = TRUE, ... )
model_list |
A list of regression models estimated with
|
dat |
Dataset with columns |
digits |
Number of digits for numeric columns. Defaults to 2. |
apc_range |
Optional list with one or multiple elements with names |
kable |
Should the output be a table in kable style? Defaults to
|
... |
Optional additional arguments passed to |
If the model was estimated with a log or logit link, the function automatically performs an exponential transformation of the effect.
Table created with kable.
Alexander Bauer
library(APCtools)
library(mgcv)

data(travel)

# create the summary table for one model
model_pure <- gam(mainTrip_distance ~ te(age, period), data = travel)
create_APCsummary(model_pure, dat = travel)

# create the summary table for multiple models
model_cov <- gam(mainTrip_distance ~ te(age, period) + s(household_income),
                 data = travel)
model_list <- list("pure model"      = model_pure,
                   "covariate model" = model_cov)
create_APCsummary(model_list, dat = travel)
Internal helper function to create a group variable based on the categorization of either age, period or cohort. To be called from within plot_densityMatrix.
create_groupVariable(dat, APC_var, groups_list)
dat |
Dataset with a column |
APC_var |
One of |
groups_list |
A list with each element specifying the borders of one
row or column in the density matrix. E.g., if the period should be visualized
in decade columns from 1980 to 2009, specify
|
Vector for the grouping that can be added as additional column to the data.
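The grouping step can be illustrated with a small base R sketch that maps values to the groups defined by a list of borders, in the same list format used for age_groups, period_groups and cohort_groups. The function name assign_groups and the label format are hypothetical and not taken from the package.

```r
# Hypothetical helper: assign each value to the group whose borders contain it
assign_groups <- function(x, groups_list) {
  labels <- vapply(groups_list, function(b) paste(b[1], "-", b[2]), character(1))
  out    <- rep(NA_character_, length(x))
  for (i in seq_along(groups_list)) {
    b <- groups_list[[i]]
    out[!is.na(x) & x >= b[1] & x <= b[2]] <- labels[i]
  }
  # values outside all groups stay NA
  factor(out, levels = labels)
}

period_groups <- list(c(1980, 1989), c(1990, 1999), c(2000, 2009))
grp <- assign_groups(c(1985, 1999, 2010), period_groups)
grp
```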
Internal helper function to create a dataset for ggplot2 that can be used to highlight specific diagonals in a density matrix.
create_highlightDiagonalData(dat, highlight_diagonals)
dat |
Dataset with columns |
highlight_diagonals |
Optional internal parameter which is only
specified when |
Create publication-ready summary tables of all linear and nonlinear effects for models fitted with gam or bam. The output format of the tables can be adjusted by passing arguments to kable via the ... argument.
create_modelSummary( model_list, digits = 2, method_expTransform = "simple", ... )
model_list |
list of APC models |
digits |
number of displayed digits |
method_expTransform |
One of |
... |
additional arguments to |
If the model was estimated with a log or logit link, the function automatically performs an exponential transformation of the effects.
The table for linear coefficients includes the estimated coefficient (coef), the corresponding standard error (se), lower and upper limits of 95% confidence intervals (CI_lower, CI_upper) and the p-values for all coefficients apart from the intercept. The table for nonlinear coefficients includes the estimated degrees of freedom (edf) and the p-value for each estimate.
List of tables created with kable.
Alexander Bauer
library(APCtools)
library(mgcv)

data(travel)

model <- gam(mainTrip_distance ~ te(age, period) + residence_region +
               household_size + s(household_income), data = travel)

create_modelSummary(list(model), dat = travel)
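The columns of the linear-coefficient table can be mimicked with plain arithmetic. The sketch below uses hypothetical coefficient values and a Wald-type normal approximation for the 95% interval. Exponentiating the estimate and the interval limits is one simple option for log or logit links, and it is not confirmed here that this corresponds to any particular method_expTransform setting.

```r
# Hypothetical coefficients and standard errors on the linear predictor scale
coef_tab <- data.frame(param = c("x1", "x2"),
                       coef  = c(0.20, -0.10),
                       se    = c(0.05, 0.04))

# Wald-type 95% confidence limits and two-sided p-values
z <- qnorm(0.975)
coef_tab$CI_lower <- coef_tab$coef - z * coef_tab$se
coef_tab$CI_upper <- coef_tab$coef + z * coef_tab$se
coef_tab$pvalue   <- 2 * pnorm(-abs(coef_tab$coef / coef_tab$se))

# Exponentiated version, e.g. for a model with log link
exp_tab <- data.frame(param        = coef_tab$param,
                      coef_exp     = exp(coef_tab$coef),
                      CI_lower_exp = exp(coef_tab$CI_lower),
                      CI_upper_exp = exp(coef_tab$CI_upper))
coef_tab
exp_tab
```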
Internal helper function to be called in create_APCsummary. This function creates the summary table for one model estimated with gam or bam.
create_oneAPCsummaryTable(model, dat, apc_range = NULL)
model |
Optional regression model estimated with |
dat |
Dataset with columns |
apc_range |
Optional list with one or multiple elements with names
|
data.frame containing aggregated information on the individual effects.
Dataset on the number of unintentional drug overdose deaths in the United States for each age group between 1999 and 2019, retrieved from the CDC WONDER Online Database. The data only cover white men.
data(drug_deaths)
A data frame containing the following columns:
period |
Calendar year. |
age |
Age group. |
deaths |
Number of observed unintentional drug overdose deaths in the respective age group and calendar year. |
population |
Number of white men in the respective age group and calendar year in the U.S. population. |
mortality_rate |
Drug overdose mortality rate for the respective age group and calendar year, reported as the number of deaths per 100,000 people. Calculated as 100000 * deaths / population. |
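The mortality rate formula stated for this dataset can be checked with a short calculation. The rows below are hypothetical toy values, not actual CDC WONDER figures.

```r
# Toy rows with hypothetical counts (not real data)
toy_deaths <- data.frame(period     = c(1999, 2019),
                         age        = c(40, 40),
                         deaths     = c(50, 120),
                         population = c(250000, 240000))

# deaths per 100,000 people
toy_deaths$mortality_rate <- 100000 * toy_deaths$deaths / toy_deaths$population
toy_deaths
```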
The data were exported from the CDC WONDER Online Database (see link in references down below), based on the following settings:
Group by Year and by Single-Year Ages
Demographics: Gender Male; Ethnicity White
Cause of death: Drug / Alcohol Induced Causes. Then select the more specific category Drug poisonings (overdose) Unintentional (X40-X44).
Jalal, H., & Burke, D. S. (2020). Hexamaps for Age-Period-Cohort Data Visualization and Implementation in R. Epidemiology (Cambridge, Mass.), 31(6), e47. doi:10.1097/EDE.0000000000001236.
Centers for Disease Control and Prevention, National Center for Health Statistics. Underlying Cause of Death 1999-2019 on CDC WONDER Online Database, released in 2020. Data are from the Multiple Cause of Death Files, 1999-2019, as compiled from data provided by the 57 vital statistics jurisdictions through the Vital Statistics Cooperative Program. Accessed at wonder.cdc.gov/ucd-icd10.html on 6 December 2021.
Internal helper function to be called from within gg_addReferenceLines. This function takes the dataset prepared for adding diagonal reference lines to the plot, checks whether any diagonals exceed the plot limits, cuts them accordingly if necessary, and returns the corrected dataset.
ensure_segmentsInPlotRange(dat_segments, plot_dat)
dat_segments |
Dataset containing information on the diagonal reference lines. |
plot_dat |
Dataset used for creating the heatmap. |
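Cutting a diagonal reference line to the plot limits can be sketched as follows for a heatmap with period on the x-axis and age on the y-axis, where a cohort corresponds to the diagonal age = period - cohort. The function clip_cohort_line and its output columns are hypothetical and only illustrate the idea.

```r
# Hypothetical helper: restrict the cohort diagonal to the plotted rectangle
clip_cohort_line <- function(cohort, period_range, age_range) {
  # on the diagonal, age = period - cohort, so the age limits imply period limits
  p_start <- max(period_range[1], age_range[1] + cohort)
  p_end   <- min(period_range[2], age_range[2] + cohort)
  if (p_start > p_end) return(NULL)   # the diagonal lies entirely outside the plot
  data.frame(x = p_start, xend = p_end,
             y = p_start - cohort, yend = p_end - cohort)
}

seg_1950 <- clip_cohort_line(1950, period_range = c(1970, 2020), age_range = c(20, 80))
seg_1980 <- clip_cohort_line(1980, period_range = c(1970, 2020), age_range = c(20, 80))
seg_1850 <- clip_cohort_line(1850, period_range = c(1970, 2020), age_range = c(20, 80))
seg_1950
seg_1980
```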
Internal helper function to create a data.frame containing the linear effects summary of a model fitted with gam or bam.
extract_summary_linearEffects(model, method_expTransform = "simple")
model |
|
method_expTransform |
One of |
If the model was estimated with a log or logit link, the function automatically performs an exponential transformation of the effect, see argument method_expTransform.
Internal helper function to extract the values returned by plot.gam while suppressing creation of the plot.
get_plotGAMobject(model)
model |
Internal helper function to add reference lines in an APC heatmap (vertically, horizontally or diagonally). The function takes an existing list of ggplot objects, adds the specified reference lines in each plot and returns the edited ggplot list again. To be called from within plot_APCheatmap.
gg_addReferenceLines( gg_list, dimensions, plot_dat, markLines_list, markLines_displayLabels )
gg_list |
Existing list of ggplot objects where the reference lines should be marked in each individual ggplot. |
dimensions |
Character vector specifying the two APC dimensions that
should be visualized along the x-axis and y-axis. Defaults to
|
plot_dat |
Dataset used for creating the heatmap. |
markLines_list |
Optional list that can be used to highlight the borders
of specific age groups, time intervals or cohorts. Each element must be a
numeric vector of values where horizontal, vertical or diagonal lines should
be drawn (depends on which APC dimension is displayed on which axis).
The list can maximally have three elements and must have names out of
|
markLines_displayLabels |
Optional character vector defining for which
dimensions the lines defined through |
Internal helper function to highlight diagonals in a density matrix. The function takes an existing ggplot object, adds the diagonal highlighting and returns the edited ggplot object again.
gg_highlightDiagonals(gg, dat, dat_highlightDiagonals)
gg |
Existing ggplot object to which the diagonal highlighting should be added. |
dat |
Dataset with columns |
dat_highlightDiagonals |
Dataset created by
|
Plots 1D smooth effects for a GAM model fitted with gam or bam.
plot_1Dsmooth( model, plot_ci = TRUE, select, alpha = 0.05, ylim = NULL, method_expTransform = "simple", return_plotData = FALSE )
model |
|
plot_ci |
If |
select |
Index of smooth term to be plotted. |
alpha |
|
ylim |
Optional limits of the y-axis. |
method_expTransform |
One of |
return_plotData |
If TRUE, the dataset prepared for plotting is returned. Defaults to FALSE. |
If the model was estimated with a log or logit link, the function automatically performs an exponential transformation of the effect, see argument method_expTransform.
ggplot object
Alexander Bauer
library(APCtools)
library(mgcv)

data(travel)

model <- gam(mainTrip_distance ~ te(age, period) + residence_region +
               household_size + s(household_income), data = travel)

plot_1Dsmooth(model, select = 2)
Plot the heatmap of an APC structure. The function can be used in two ways: either to plot the observed mean structure of a metric variable, by specifying dat and the variable y_var, or to plot some mean structure represented by an estimated two-dimensional tensor product surface, by specifying dat and the model object. The model must be estimated with gam or bam.
plot_APCheatmap( dat, y_var = NULL, model = NULL, dimensions = c("period", "age"), apc_range = NULL, bin_heatmap = TRUE, bin_heatmapGrid_list = NULL, markLines_list = NULL, markLines_displayLabels = c("age", "period", "cohort"), y_var_logScale = FALSE, plot_CI = TRUE, method_expTransform = "simple", legend_limits = NULL, legend_title = NULL )
dat |
Dataset with columns |
y_var |
Optional character name of a metric variable to be plotted. |
model |
Optional regression model estimated with |
dimensions |
Character vector specifying the two APC dimensions that
should be visualized along the x-axis and y-axis. Defaults to
|
apc_range |
Optional list with one or multiple elements with names
|
bin_heatmap , bin_heatmapGrid_list
|
|
markLines_list |
Optional list that can be used to highlight the borders
of specific age groups, time intervals or cohorts. Each element must be a
numeric vector of values where horizontal, vertical or diagonal lines should
be drawn (depends on which APC dimension is displayed on which axis).
The list can maximally have three elements and must have names out of
|
markLines_displayLabels |
Optional character vector defining for which
dimensions the lines defined through |
y_var_logScale |
Indicator if |
plot_CI |
Indicator if the confidence intervals should be plotted.
Only used if |
method_expTransform |
One of |
legend_limits |
Optional numeric vector passed as argument |
legend_title |
Optional character legend title. |
See also plot_APChexamap to plot a hexagonal heatmap with adapted axes.
If the plot is created based on the model object and the model was estimated with a log or logit link, the function automatically performs an exponential transformation of the effect.
Plot grid created with ggarrange (if plot_CI is TRUE) or a ggplot2 object (if plot_CI is FALSE).
Alexander Bauer, Maximilian Weigert
Weigert, M., Bauer, A., Gernert, J., Karl, M., Nalmpatian, A., Küchenhoff, H., and Schmude, J. (2021). Semiparametric APC analysis of destination choice patterns: Using generalized additive models to quantify the impact of age, period, and cohort on travel distances. Tourism Economics. doi:10.1177/1354816620987198.
library(APCtools)
library(mgcv)

data(travel)

# variant A: plot observed mean structures
# observed heatmap
plot_APCheatmap(dat = travel, y_var = "mainTrip_distance",
                bin_heatmap = FALSE, y_var_logScale = TRUE)

# with binning
plot_APCheatmap(dat = travel, y_var = "mainTrip_distance",
                bin_heatmap = TRUE, y_var_logScale = TRUE)

# variant B: plot some smoothed, estimated mean structure
model <- gam(mainTrip_distance ~ te(age, period) + residence_region +
               household_size + s(household_income), data = travel)

# plot the smooth tensor product surface
plot_APCheatmap(dat = travel, model = model, bin_heatmap = FALSE, plot_CI = FALSE)

# ... same plot including the confidence intervals
plot_APCheatmap(dat = travel, model = model, bin_heatmap = FALSE)

# the APC dimensions can be flexibly assigned to the x-axis and y-axis
plot_APCheatmap(dat = travel, model = model, dimensions = c("age","cohort"),
                bin_heatmap = FALSE, plot_CI = FALSE)

# add some reference lines
plot_APCheatmap(dat = travel, model = model, bin_heatmap = FALSE, plot_CI = FALSE,
                markLines_list = list(cohort = c(1910,1939,1955,1980)))

# default binning of the tensor product surface in 5-year-blocks
plot_APCheatmap(dat = travel, model = model, plot_CI = FALSE)

# manual binning
manual_binning <- list(period = seq(min(travel$period, na.rm = TRUE) - 1,
                                    max(travel$period, na.rm = TRUE), by = 5),
                       cohort = seq(min(travel$period - travel$age, na.rm = TRUE) - 1,
                                    max(travel$period - travel$age, na.rm = TRUE), by = 10))
plot_APCheatmap(dat = travel, model = model, plot_CI = FALSE,
                bin_heatmapGrid_list = manual_binning)
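The observed mean structure that plot_APCheatmap displays for a metric variable, binned into 5-year blocks, can be approximated in base R as sketched below. The toy data, the column names and the break points are hypothetical, and the package's own binning logic may differ in its details.

```r
# Toy data with hypothetical columns
set.seed(42)
toy <- data.frame(age    = sample(20:39, 200, replace = TRUE),
                  period = sample(2000:2009, 200, replace = TRUE))
toy$y <- 100 + 2 * toy$age + rnorm(200)

# bin age and period into 5-year blocks
toy$age_bin    <- cut(toy$age,    breaks = seq(19, 39, by = 5))
toy$period_bin <- cut(toy$period, breaks = seq(1999, 2009, by = 5))

# mean of y in each age-by-period cell, i.e. the values a heatmap would show
mean_mat <- tapply(toy$y, list(toy$age_bin, toy$period_bin), mean)
round(mean_mat, 1)
```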
Plot the heatmap of an APC structure using a hexagon-based plot with adapted axes. In this way, the one temporal dimension that is represented by the diagonal structure is visually not underrepresented compared to the other two dimensions on the x-axis and y-axis.
The function can be used in two ways: either to plot the observed mean structure of a metric variable, by specifying dat and the variable y_var, or to plot some mean structure represented by an estimated two-dimensional tensor product surface, by specifying dat and the model object. The model must be estimated with gam or bam.
plot_APChexamap( dat, y_var = NULL, model = NULL, apc_range = NULL, y_var_logScale = FALSE, obs_interval = 1, iso_interval = 5, color_vec = NULL, color_range = NULL, line_width = 0.5, line_color = gray(0.5), label_size = 0.5, label_color = "black", legend_title = NULL )
dat |
Dataset with columns |
y_var |
Optional character name of a metric variable to be plotted. |
model |
Optional regression model estimated with |
apc_range |
Optional list with one or multiple elements with names
|
y_var_logScale |
Indicator if |
obs_interval |
Numeric specifying the interval width based on which the
data is spaced. Only used if |
iso_interval |
Numeric specifying the interval width between the isolines along each axis. Defaults to 5. |
color_vec |
Optional character vector of color names, specifying the color continuum. |
color_range |
Optional numeric vector with two elements, specifying the ends of the color scale in the legend. |
line_width |
Line width of the isolines. Defaults to 0.5. |
line_color |
Character color name for the isolines. Defaults to gray. |
label_size |
Size of the labels along the axes. Defaults to 0.5. |
label_color |
Character color name for the labels along the axes. |
legend_title |
Optional character title for the legend. |
See also plot_APCheatmap to plot a regular heatmap.
If the plot is created based on the model object and the model was estimated with a log or logit link, the function automatically performs an exponential transformation of the effect.
Creates a plot with base R functions (not ggplot2).
Hawre Jalal, Alexander Bauer
Jalal, H., Burke, D. (2020). Hexamaps for Age–Period–Cohort Data Visualization and Implementation in R. Epidemiology, 31 (6), e47-e49. doi: 10.1097/EDE.0000000000001236.
library(APCtools)
library(mgcv)
library(dplyr)

data(drug_deaths)

# restrict to data where the mortality rate is available
drug_deaths <- drug_deaths %>% filter(!is.na(mortality_rate))

# hexamap of an observed structure
plot_APChexamap(dat = drug_deaths, y_var = "mortality_rate",
                color_range = c(0,40))

# hexamap of a smoothed structure
model <- gam(mortality_rate ~ te(age, period, bs = "ps", k = c(8,8)),
             data = drug_deaths)
plot_APChexamap(dat = drug_deaths, model = model)
Create a density plot or a boxplot of one metric variable or a barplot of one categorical variable, based on a specific subset of the data.
plot_density( dat, y_var, plot_type = "density", apc_range = NULL, highlight_diagonals = NULL, y_var_cat_breaks = NULL, y_var_cat_labels = NULL, weights_var = NULL, log_scale = FALSE, xlab = NULL, ylab = NULL, legend_title = NULL, ... )
dat |
Dataset with columns |
y_var |
Character name of the main variable to be plotted. |
plot_type |
One of |
apc_range |
Optional list with one or multiple elements with names
|
highlight_diagonals |
Optional internal parameter which is only
specified when |
y_var_cat_breaks |
Optional numeric vector of breaks to categorize
|
y_var_cat_labels |
Optional character vector for the names of the
categories that were defined based on |
weights_var |
Optional character name of a weights variable used to project the results in the sample to some population. |
log_scale |
Indicator if the main variable should be log10 transformed.
Only used if the |
xlab , ylab , legend_title
|
Optional plot annotations. |
... |
Additional arguments passed to |
If plot_density is called internally from within plot_densityMatrix (i.e., if the dataset contains some of the columns c("age_group","period_group","cohort_group")), this function will calculate the metric densities individually for these groups.
ggplot object
Alexander Bauer, Maximilian Weigert
library(APCtools)

data(travel)

plot_density(dat = travel, y_var = "mainTrip_distance")
Internal helper function to plot one categorical density, to be called from within plot_density.
plot_density_categorical( dat, y_var, dat_highlightDiagonals = NULL, weights_var = NULL, xlab = NULL, ylab = NULL )
dat |
Dataset with columns |
y_var |
Character name of the main variable to be plotted. |
dat_highlightDiagonals |
Optional dataset created by
|
weights_var |
Optional character name of a weights variable used to project the results in the sample to some population. |
xlab , ylab
|
Optional plot annotations. |
Internal helper function to plot one metric density, to be called from within plot_density.
plot_density_metric( dat, y_var, plot_type = "density", dat_highlightDiagonals = NULL, y_var_cat_breaks = NULL, y_var_cat_labels = NULL, weights_var = NULL, log_scale = FALSE, xlab = NULL, ylab = NULL, legend_title = NULL, ... )
dat |
Dataset with columns |
y_var |
Character name of the main variable to be plotted. |
plot_type |
One of |
dat_highlightDiagonals |
Optional dataset created by
|
y_var_cat_breaks |
Optional numeric vector of breaks to categorize
|
y_var_cat_labels |
Optional character vector for the names of the
categories that were defined based on |
weights_var |
Optional character name of a weights variable used to project the results in the sample to some population. |
log_scale |
Indicator if the main variable should be log10 transformed.
Only used if the |
xlab , ylab , legend_title
|
Optional plot annotations. |
... |
Additional arguments passed to |
This function creates a matrix of individual density plots (i.e., a ridgeline matrix) or boxplots (for metric variables) or of individual barplots (for categorical variables). The age, period or cohort information can each either be plotted on the x-axis or the y-axis.
plot_densityMatrix( dat, y_var, dimensions = c("period", "age"), age_groups = NULL, period_groups = NULL, cohort_groups = NULL, plot_type = "density", highlight_diagonals = NULL, y_var_cat_breaks = NULL, y_var_cat_labels = NULL, weights_var = NULL, log_scale = FALSE, legend_title = NULL, ... )
dat |
Dataset with columns |
y_var |
Character name of the main variable to be plotted. |
dimensions |
Character vector specifying the two APC dimensions that
should be visualized along the x-axis and y-axis. Defaults to
|
age_groups , period_groups , cohort_groups
|
Each a list. Either containing
purely scalar values or with each element specifying the two borders of one
row or column in the density matrix. E.g., if the period should be visualized
in decade columns from 1980 to 2009, specify
|
plot_type |
One of |
highlight_diagonals |
Optional list to define diagonals in the density that should be highlighted with different colors. Each list element should be a numeric vector stating the index of the diagonals (counted from the top left) that should be highlighted in the same color. If the list is named, the names are used as legend labels. |
y_var_cat_breaks |
Optional numeric vector of breaks to categorize
|
y_var_cat_labels |
Optional character vector for the names of the
categories that were defined based on |
weights_var |
Optional character name of a weights variable used to project the results in the sample to some population. |
log_scale |
Indicator if the main variable should be log10 transformed.
Only used if the |
legend_title |
Optional plot annotation. |
... |
Additional arguments passed to |
ggplot object
Alexander Bauer, Maximilian Weigert
Weigert, M., Bauer, A., Gernert, J., Karl, M., Nalmpatian, A., Küchenhoff, H., and Schmude, J. (2021). Semiparametric APC analysis of destination choice patterns: Using generalized additive models to quantify the impact of age, period, and cohort on travel distances. Tourism Economics. doi:10.1177/1354816620987198.
library(APCtools)

# define categorizations for the main trip distance
dist_cat_breaks <- c(1,500,1000,2000,6000,100000)
dist_cat_labels <- c("< 500 km","500 - 1,000 km", "1,000 - 2,000 km",
                     "2,000 - 6,000 km", "> 6,000 km")

age_groups    <- list(c(80,89),c(70,79),c(60,69),c(50,59),c(40,49),c(30,39),c(20,29))
period_groups <- list(c(1970,1979),c(1980,1989),c(1990,1999),c(2000,2009),c(2010,2019))
cohort_groups <- list(c(1980,1989),c(1970,1979),c(1960,1969),c(1950,1959),c(1940,1949),
                      c(1930,1939),c(1920,1929))

plot_densityMatrix(dat = travel, y_var = "mainTrip_distance",
                   age_groups = age_groups, period_groups = period_groups,
                   log_scale = TRUE)

# highlight two cohorts
plot_densityMatrix(dat = travel, y_var = "mainTrip_distance",
                   age_groups = age_groups, period_groups = period_groups,
                   highlight_diagonals = list(8, 10), log_scale = TRUE)

# also mark different distance categories
plot_densityMatrix(dat = travel, y_var = "mainTrip_distance",
                   age_groups = age_groups, period_groups = period_groups,
                   log_scale = TRUE,
                   y_var_cat_breaks = dist_cat_breaks,
                   y_var_cat_labels = dist_cat_labels,
                   highlight_diagonals = list(8, 10),
                   legend_title = "Distance category")

# flexibly assign the APC dimensions to the x-axis and y-axis
plot_densityMatrix(dat = travel, y_var = "mainTrip_distance",
                   dimensions = c("period","cohort"),
                   period_groups = period_groups, cohort_groups = cohort_groups,
                   log_scale = TRUE,
                   y_var_cat_breaks = dist_cat_breaks,
                   y_var_cat_labels = dist_cat_labels,
                   legend_title = "Distance category")

# use boxplots instead of densities
plot_densityMatrix(dat = travel, y_var = "mainTrip_distance",
                   plot_type = "boxplot",
                   age_groups = age_groups, period_groups = period_groups,
                   log_scale = TRUE, highlight_diagonals = list(8, 10))

# plot categorical variables instead of metric ones
plot_densityMatrix(dat = travel, y_var = "household_size",
                   age_groups = age_groups, period_groups = period_groups,
                   highlight_diagonals = list(8, 10))
This function creates a joint plot of the marginal APC effects of multiple estimated models. It creates a plot with one pane per age, period and cohort effect, each containing one line for each estimated model.
plot_jointMarginalAPCeffects( model_list, dat, vlines_list = NULL, ylab = NULL, ylim = NULL, plot_CI = FALSE )
model_list |
A list of regression models estimated with
|
dat |
Dataset with columns |
vlines_list |
Optional list that can be used to highlight the borders of specific age groups, time intervals or cohorts. Each element must be a numeric vector of values on the x-axis where vertical lines should be drawn. The list can have at most three elements and must have names out of |
ylab , ylim
|
Optional ggplot2 styling arguments. |
plot_CI |
Indicator if 95% confidence intervals should be plotted. Defaults to FALSE. |
If the model was estimated with a log or logit link, the function automatically performs an exponential transformation of the effect.
Since the plot output created by the function is not a ggplot2 object but an object created with ggpubr::ggarrange, the overall theme of the plot cannot be changed by adding a theme in the form of 'plot_jointMarginalAPCeffects(...) + theme_minimal(...)'. Instead, call theme_set(theme_minimal(...)) as a separate call before calling plot_jointMarginalAPCeffects(...). The latter function will then use this global plotting theme.
Plot grid created with ggarrange.
Alexander Bauer, Maximilian Weigert
library(APCtools)
library(mgcv)

data(travel)

# plot marginal effects of one model
model_pure <- gam(mainTrip_distance ~ te(age, period), data = travel)
plot_jointMarginalAPCeffects(model_pure, dat = travel)

# plot marginal effects of multiple models
model_cov <- gam(mainTrip_distance ~ te(age, period) + s(household_income),
                 data = travel)
model_list <- list("pure model"      = model_pure,
                   "covariate model" = model_cov)
plot_jointMarginalAPCeffects(model_list, dat = travel)

# mark specific cohorts
plot_jointMarginalAPCeffects(model_list, dat = travel,
                             vlines_list = list("cohort" = c(1966.5,1982.5,1994.5)))
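Because the joint plot is assembled with ggarrange, a theme has to be set globally before the plot is created. The following sketch shows one way to do this while restoring the previous theme afterwards. It assumes that the ggplot2 package is attached; the object names old_theme and p_joint are chosen for this illustration only.

```r
library(APCtools)
library(mgcv)
library(ggplot2)

data(travel)
model_pure <- gam(mainTrip_distance ~ te(age, period), data = travel)

# theme_set() returns the previously active theme, so it can be restored later
old_theme <- theme_set(theme_minimal())

# the plot grid is created while theme_minimal() is the global theme
p_joint <- plot_jointMarginalAPCeffects(list("pure model" = model_pure),
                                        dat = travel)

# restore the previous global theme for subsequent plots
theme_set(old_theme)

p_joint
```

Restoring the old theme keeps the change local to this one figure, which can be useful in longer scripts or reports that contain other ggplot2 figures.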
Create an effect plot of linear effects of a model fitted with gam or bam.
plot_linearEffects( model, variables = NULL, return_plotData = FALSE, refCat = FALSE, ... )
model |
|
variables |
Optional character vector of variable names specifying which effects should be plotted. The order of the vector corresponds to the order in the effect plot. If the argument is not specified, all linear effects are plotted according to the order of their appearance in the model output. |
return_plotData |
If TRUE, the dataset prepared for plotting is returned. Defaults to FALSE. |
refCat |
If TRUE, reference categories are added to the output for categorical covariates. Defaults to FALSE. |
... |
Additional arguments passed to
|
If the model was estimated with a log or logit link, the function automatically performs an exponential transformation of the effect.
ggplot object
Alexander Bauer
library(APCtools)
library(mgcv)

data(travel)
model <- gam(mainTrip_distance ~ te(age, period) + residence_region +
               household_size + s(household_income), data = travel)

plot_linearEffects(model)
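To illustrate what an exponential transformation of an effect means for a model with a log link, the following base R sketch fits a Poisson regression to simulated data and exponentiates a coefficient and its interval bounds. This is a generic illustration under assumed, simulated data, not the package's internal code; all object names are hypothetical.

```r
# Simulated data (hypothetical): group "b" has a rate exp(0.5) times that of "a"
set.seed(1)
n     <- 2000
group <- factor(sample(c("a", "b"), n, replace = TRUE))
y     <- rpois(n, lambda = exp(1 + 0.5 * (group == "b")))

# Poisson regression with log link: coefficients are on the log scale
fit      <- glm(y ~ group, family = poisson(link = "log"))
coef_log <- coef(fit)["groupb"]
se_log   <- sqrt(vcov(fit)["groupb", "groupb"])
ci_log   <- coef_log + c(-1, 1) * qnorm(0.975) * se_log

# Exponentiating gives a multiplicative effect (rate ratio) and its interval
effect_exp <- exp(coef_log)
ci_exp     <- exp(ci_log)

print(effect_exp)
print(ci_exp)
```

On the transformed scale, a value of 1 corresponds to no effect, and values above or below 1 indicate a multiplicative increase or decrease relative to the reference category.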
Plot the marginal effect of age, period or cohort, based on an APC model estimated as a semiparametric additive regression model with gam or bam. This function is a simple wrapper to plot_partialAPCeffects, called with argument hide_partialEffects = TRUE.
plot_marginalAPCeffects( model, dat, variable = "age", vlines_vec = NULL, plot_CI = FALSE, return_plotData = FALSE )
model |
Optional regression model estimated with |
dat |
Dataset with columns |
variable |
One of |
vlines_vec |
Optional numeric vector of values on the x-axis where vertical lines should be drawn. Can be used to highlight the borders of specific age groups, time intervals or cohorts. |
plot_CI |
Indicator if 95% confidence intervals should be plotted. Defaults to FALSE. |
return_plotData |
If TRUE, a list of the datasets prepared for plotting is returned instead of the ggplot object. The list contains one dataset each for the overall effect (= evaluations of the APC surface to plot the partial effects) and for each marginal APC effect (no matter the specified value of the argument |
ggplot object
Alexander Bauer, Maximilian Weigert
Weigert, M., Bauer, A., Gernert, J., Karl, M., Nalmpatian, A., Küchenhoff, H., and Schmude, J. (2021). Semiparametric APC analysis of destination choice patterns: Using generalized additive models to quantify the impact of age, period, and cohort on travel distances. Tourism Economics. doi:10.1177/1354816620987198.
library(APCtools)
library(mgcv)

data(travel)
model <- gam(mainTrip_distance ~ te(age, period), data = travel)

plot_marginalAPCeffects(model, dat = travel, variable = "age")

# mark specific cohorts
plot_marginalAPCeffects(model, dat = travel, variable = "cohort",
                        vlines_vec = c(1966.5,1982.5,1994.5))
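Since plot_marginalAPCeffects is described as a wrapper around plot_partialAPCeffects, the two calls in the following sketch are expected to produce the same kind of plot. The object names p_marginal and p_partial are chosen for this illustration only.

```r
library(APCtools)
library(mgcv)

data(travel)
model <- gam(mainTrip_distance ~ te(age, period), data = travel)

# marginal period effect via the wrapper
p_marginal <- plot_marginalAPCeffects(model, dat = travel,
                                      variable = "period")

# the same request expressed through the underlying function
p_partial <- plot_partialAPCeffects(model, dat = travel,
                                    variable = "period",
                                    hide_partialEffects = TRUE)

p_marginal
p_partial
```

Both results are documented as ggplot objects, so further ggplot2 layers or themes can be added to either of them with the usual + syntax.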
Create the partial APC plots based on an APC model estimated as a semiparametric additive regression model with gam or bam.
plot_partialAPCeffects( model, dat, variable = "age", hide_partialEffects = FALSE, vlines_vec = NULL, plot_CI = FALSE, return_plotData = FALSE )
model |
Optional regression model estimated with |
dat |
Dataset with columns |
variable |
One of |
hide_partialEffects |
If TRUE, only the marginal effect will be plotted. Defaults to FALSE. |
vlines_vec |
Optional numeric vector of values on the x-axis where vertical lines should be drawn. Can be used to highlight the borders of specific age groups, time intervals or cohorts. |
plot_CI |
Indicator if 95% confidence intervals for marginal APC effects should be plotted. Only used if |
return_plotData |
If TRUE, a list of the datasets prepared for plotting is returned instead of the ggplot object. The list contains one dataset each for the overall effect (= evaluations of the APC surface to plot the partial effects) and for each marginal APC effect (no matter the specified value of the argument |
If the model was estimated with a log or logit link, the function automatically performs an exponential transformation of the effect.
ggplot object (if hide_partialEffects is TRUE) or a plot grid created with ggarrange (if FALSE).
Alexander Bauer, Maximilian Weigert
Weigert, M., Bauer, A., Gernert, J., Karl, M., Nalmpatian, A., Küchenhoff, H., and Schmude, J. (2021). Semiparametric APC analysis of destination choice patterns: Using generalized additive models to quantify the impact of age, period, and cohort on travel distances. Tourism Economics. doi:10.1177/1354816620987198.
library(APCtools)
library(mgcv)

data(travel)
model <- gam(mainTrip_distance ~ te(age, period), data = travel)

plot_partialAPCeffects(model, dat = travel, variable = "age")

# mark specific cohorts
plot_partialAPCeffects(model, dat = travel, variable = "cohort",
                       vlines_vec = c(1966.5,1982.5,1994.5))
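The idea behind evaluating the APC surface can be sketched with mgcv alone: the fitted tensor product surface is evaluated on a grid of age and period values, the cohort is obtained on the diagonal as period minus age, and a marginal effect is approximated by averaging the surface over the remaining dimensions. The following is a simplified illustration on simulated data with hypothetical object names; the computations inside APCtools may differ in detail.

```r
library(mgcv)

# Simulated data (hypothetical) with an age and a cohort effect
set.seed(42)
n       <- 1500
dat_sim <- data.frame(age    = sample(20:80, n, replace = TRUE),
                      period = sample(1980:2015, n, replace = TRUE))
dat_sim$cohort <- dat_sim$period - dat_sim$age
dat_sim$y <- 0.05 * dat_sim$age + 0.02 * (dat_sim$cohort - 1950) + rnorm(n)

# APC model with a tensor product surface over age and period
model_sim <- gam(y ~ te(age, period), data = dat_sim)

# Evaluate the surface on a full age x period grid;
# the cohort dimension lies on the diagonals of this grid
grid        <- expand.grid(age = 20:80, period = 1980:2015)
grid$cohort <- grid$period - grid$age
grid$effect <- predict(model_sim, newdata = grid, type = "terms")[, 1]

# Approximate marginal effects as averages of the surface
marginal_age    <- aggregate(effect ~ age,    data = grid, FUN = mean)
marginal_cohort <- aggregate(effect ~ cohort, data = grid, FUN = mean)

head(marginal_age)
head(marginal_cohort)
```

In this sketch, each row of marginal_cohort summarizes one diagonal of the age-period grid, which is why early and late cohorts are based on fewer grid cells than cohorts in the middle of the range.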
Plot the distribution of one variable in the data against age, period or cohort. Creates a bar plot for categorical variables (see argument geomBar_position) and boxplots or a line plot of median values for metric variables (see plot_type).
plot_variable( dat, y_var, apc_dimension = "period", log_scale = FALSE, plot_type = "boxplot", geomBar_position = "fill", legend_title = NULL, ylab = NULL, ylim = NULL )
dat |
Dataset containing columns |
y_var |
Character name of the variable to plot. |
apc_dimension |
One of |
log_scale |
Indicator if the visualized variable should be log10 transformed. Only used if the variable is numeric. Defaults to FALSE. |
plot_type |
One of |
geomBar_position |
Value passed to |
legend_title |
Optional character title for the legend which is drawn for categorical variables. |
ylab , ylim
|
Optional arguments for styling the ggplot. |
ggplot object
Alexander Bauer
library(APCtools)
data(travel)

# plot a metric variable
plot_variable(dat = travel, y_var = "mainTrip_distance",
              apc_dimension = "period", log_scale = TRUE)
plot_variable(dat = travel, y_var = "mainTrip_distance",
              apc_dimension = "period", log_scale = TRUE, plot_type = "line")

# plot a categorical variable
plot_variable(dat = travel, y_var = "household_size",
              apc_dimension = "period")
plot_variable(dat = travel, y_var = "household_size",
              apc_dimension = "period", geomBar_position = "stack")
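For metric variables, the line plot variant summarizes the data by median values. The following base R sketch shows the kind of summary such a plot is based on, using simulated data and hypothetical column names. It only illustrates the summary step and makes no claim about how plot_variable computes or transforms the values internally.

```r
# Simulated data (hypothetical): 50 right-skewed distances per period
set.seed(7)
dat_sim <- data.frame(
  period   = rep(2000:2004, each = 50),
  distance = rlnorm(250, meanlog = 6, sdlog = 1)
)

# one median value per period, as would be connected by a line
medians <- aggregate(distance ~ period, data = dat_sim, FUN = median)

# log10 transformation of the summarized values
medians$log10_distance <- log10(medians$distance)

print(medians)
```

For right-skewed variables such as distances, the median is less sensitive to a few very large values than the mean, and the log10 scale makes differences between periods easier to compare.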
This dataset from the Reiseanalyse survey comprises travel information on German travelers between 1971 and 2018. Data were collected in a yearly repeated cross-sectional survey of German pleasure travel, based on a sample representative of (West) German citizens (until 2009) or of all German-speaking residents (starting 2010). Travelers from former East Germany have only been included since 1990. Note that the sample only contains trips lasting at least five days. For details see Weigert et al. (2021).
data(travel)
A dataframe containing
Year in which the respondent traveled.
Age of the respondent.
Individual weight of each respondent to account for a not perfectly representative sample and project the sample results to the population of German citizens (until 2009) or of German-speaking residents (starting 2010). Only available since 1974.
Indicator if the respondent is a German citizen or not. Only available since 2010. Until 2009, all respondents were German citizens.
Indicator if the respondent's main residence is in a federal state in the former area of West Germany or in the former area of East Germany.
Categorized size of the respondent's household.
Joint income (in €) of the respondent's household.
Categorized trip length of the respondent's main trip. The main trip is the trip which the respondent stated was his/her most important trip in the respective year.
Distance (in km) between the center of the respondent's federal state and the center of the country of destination, for the main trip. The main trip is the trip which the respondent stated was his/her most important trip in the respective year.
The data are a 10% random sample of all respondents who undertook at least one trip in the respective year, between 1971 and 2018. We thank the Forschungsgemeinschaft Urlaub und Reisen e.V. for allowing us to publish this sample.
Weigert, M., Bauer, A., Gernert, J., Karl, M., Nalmpatian, A., Küchenhoff, H., and Schmude, J. (2021). Semiparametric APC analysis of destination choice patterns: Using generalized additive models to quantify the impact of age, period, and cohort on travel distances. Tourism Economics. doi:10.1177/1354816620987198.
Forschungsgemeinschaft Urlaub und Reisen e.V. (FUR) (2020b) Survey of tourist demand in Germany for holiday travel and short breaks. Available at: https://reiseanalyse.de/wp-content/uploads/2022/11/RA2020_First-results_EN.pdf (accessed 13 January 2023).
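As an illustration of how individual weights can be used to project sample results to a population, and of how a cohort variable follows from period and age, the following base R sketch works on simulated data. The data frame svy and its column names (in particular the weight column w) are hypothetical and do not correspond to the actual column names of the travel dataset.

```r
# Simulated survey data (hypothetical column names)
set.seed(123)
svy <- data.frame(
  period   = sample(2010:2012, 300, replace = TRUE),
  age      = sample(20:80, 300, replace = TRUE),
  distance = rlnorm(300, meanlog = 6.5, sdlog = 0.8),
  w        = runif(300, min = 0.5, max = 2)   # hypothetical individual weight
)

# birth cohort follows from the other two temporal dimensions
svy$cohort <- svy$period - svy$age

# weighted and unweighted mean distance per period
weighted_by_period <- sapply(split(svy, svy$period),
                             function(d) weighted.mean(d$distance, w = d$w))
unweighted_by_period <- tapply(svy$distance, svy$period, mean)

print(weighted_by_period)
print(unweighted_by_period)
```

Comparing the weighted and unweighted summaries gives a quick impression of how strongly the weights affect a given result.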