There are many functions contained in this package and it can get
annoying to have to keep checking documentation. The purpose of this
article is to provide templates for the basic features of each function.
That way, you can simply copy the template and edit where need be.
Installation and Preparations
This package comes pre-installed on the GVSU Posit Workbench. All you
need to do is load the package to access its functions.
Note: The Software Investigation starter program will always include
a code chunk that loads the gvsu215
package that includes
the functions we need for STA215.
Templates
The package’s functions can be broken down into a few general
categories: utilities, tables, plots, and inference. The templates below
are broken up by GVSU’s STA 215 textbook chapter and will include a
generic example template. To evaluate, simply copy the code chunk (click
on the copy icon that appears when you hover your mouse over the code
chunk), double click on the “filler code” (the text in all capital
letters and surrounded by underscores, _EXAMPLE_
) and
replace them with your respective code.
Utilities
Reading in Data
To read in data files, you will need to know the file path to the
file. Then use
_DATANAME_ <- read.csv("_FILEPATH_", header = TRUE)
Note: The Software Investigation starter program will include the
code to read in the data.
Subset Observations
_NEWDATANAME_ <- _OLDDATANAME_ %>%
filter(_CONDITION_)
Subset Variables
_NEWDATANAME_ <- _OLDDATANAME_ %>%
select(_VARSTOKEEP_)
Chapter 2: Categorical Data
Frequency Table
tbl_1var(_DATANAME_, ~_VARIABLE_)
Bar Graph Using Percent
plot_bar(_DATANAME_, ~_VARIABLE_, type = "percent", na_rm = FALSE)
Note: Change to na_rm = TRUE
to eliminate missing values
from plot.
Bar Graph Using Counts
plot_bar(_DATANAME_, ~_VARIABLE_, type = "count", na_rm = FALSE)
Note: Change to na_rm = TRUE
to eliminate missing values
from plot.
Two-Way Table
tbl_2var(_DATANAME_, _RESPONSE_~_EXPLANATORY_)
Clustered Bar Graph
plot_bar(_DATANAME_, ~_RESPONSE_, fill = ~_EXPLANATORY_, na_rm = FALSE)
Chapter 3: One Quantitative
Basic Numerical Summaries
tbl_num_sum(_DATANAME_, ~_VARIABLE_, na_rm = TRUE)
Percentile
tbl_pctile(_DATANAME_, ~_VARIABLE_, probs = c(_PERCENTILES_))
Note: Replace percentiles with the values you want separated by
commas. For example, c(0.80, 0.90, 0.95)
Boxplot
plot_box(_DATANAME_, ~_VARIABLE_)
Histogram
plot_hist(_DATANAME_, ~_VARIABLE_)
Note: You can use breaks
to control how many bars there
are.
Basic Numerical Summaries By Group
tbl_num_sum(_DATANAME_, _RESPONSE_~_GROUPVARIABLE_, na_rm = TRUE)
Percentiles By Group
tbl_pctile(_DATANAME_, _RESPONSE_~_GROUPVARIABLE_)
Boxplot By Group
plot_box(_DATANAME_, _RESPONSE_~_GROUPVARIABLE_)
Histogram By Group
plot_hist(_DATANAME_, ~_VARIABLE_, group = ~_GROUPVARIABLE_)
Note: You can use breaks
to control how many bars there
are.
Chapter 5: Estimation
Note: Confidence levels default to 95% but can be overridden with
conf_lvl = _DECIMAL_
(e.g.,
conf_lvl = 0.9
).
Confidence Interval on
infer_1prop_int(_DATANAME_, ~_VARIABLE_, success = "_SUCCESSCATEGORY_", conf_lvl = _CONFIDENCELEVEL_)
Confidence Interval on μ
infer_1mean_int(_DATANAME_, ~_VARIABLE_, conf_lvl = _CONFIDENCELEVEL_)
Chapter 6: Two Quantitative
Scatterplot
plot_scatter(_DATANAME_, _RESPONSE_~_EXPLANATORY_, axis_lines = "none", ls_line = "hide")
Note: Change to axis_lines = "both"
to grid the
scatterplot. Note: Change to ls_line = "show"
to plot the
regression line.
Linear Correlation
tbl_corr(_DATANAME_, _RESPONSE_~_EXPLANATORY_, na_rm = TRUE)
Linear Regression
infer_reg(_DATANAME_, _RESPONSE_~_EXPLANATORY_, reduced = "yes")
Note: Change to reduced = "no"
to get test statistic and
p-value.
Scatterplot By Group
plot_scatter(_DATANAME_, _RESPONSE_~_EXPLANATORY_, fill = ~_GROUPVARIABLE_, legend_title = "_LEGEND_")
Chapter 7: Hypothesis Testing Introduction
-Test
# standard test
infer_chisq(_DATANAME_, _EXPLANATORY_~_RESPONSE_, type = "test")
# expected counts
infer_chisq(_DATANAME_, _EXPLANATORY_~_RESPONSE_, type = "expected")
# observed counts
infer_chisq(_DATANAME_, _EXPLANATORY_~_RESPONSE_, type = "observed")
Confidence Interval for the Difference in Two Proportions
Note: Confidence levels default to 95% but can be overridden with
conf_lvl = _DECIMAL_
(e.g.,
conf_lvl = 0.90
).
infer_2prop_int(_DATANAME_, _RESPONSE_~_EXPLANATORY_, success = "_SUCCESSCATEGORY_", conf_lvl = _CONFIDENCELEVEL_)
Note: For this code to work the explanatory variable must only have
two categories.
Chapter 8: Hypothesis Testing Means
Paired
-Test
and Confidence Interval
Note: Confidence levels default to 95% but can be overridden with
conf_lvl = _DECIMAL_
(e.g.,
conf_lvl = 0.90
).
infer_paired(_DATANAME_, var1 = ~_VARIABLE1_, var2 = ~_VARIABLE2_, conf_lvl = _CONFIDENCELEVEL_)
Use the mu0
argument to specify the value of the null
hypothesis, if different from 0. E.g., mu0 = 3
.
Independent
-Test
infer_2mean_test(_DATANAME_, _VARIABLE_~_GROUPVARIABLE_)
Use the null
argument to specify the value of the null
hypothesis, if different from 0. E.g., null = 3
.
Independent 2 Groups Confidence Interval
infer_2mean_int(_DATANAME_, _VARIABLE_~_GROUPVARIABLE_, conf_lvl = _CONFIDENCELEVEL_)
ANOVA
infer_anova(_DATANAME_, _VARIABLE_~_GROUPVARIABLE_)
Optional Functions / Topics
Hypothesis Test on
infer_1prop_test(_DATANAME_, ~_VARIABLE_, success = "_SUCCESSCATEGORY_", p0 = _NULL_)
Use the p0
argument to specify the value of the null
hypothesis, if different from 0.5. E.g., p0 = 0.3
.
Hypothesis Test on μ
infer_1mean_test(_DATANAME_, ~_VARIABLE_, mu0 = _NULL_)
Use the mu0
argument to specify the value of the null
hypothesis, if different from 0. E.g., p0 = 3
.