Cheatsheet

This package contains many functions, and repeatedly checking the documentation gets tedious. This article provides a template for the basic features of each function, so you can copy the template and edit it as needed.

Installation and Preparations

This package comes pre-installed on the GVSU Posit Workbench. All you need to do is load the package to access its functions.

library(gvsu215)

Note: The Software Investigation starter program will always include a code chunk that loads the gvsu215 package, which provides the functions we need for STA 215.

Templates

The package’s functions fall into a few general categories: utilities, tables, plots, and inference. The templates below are organized by chapter of GVSU’s STA 215 textbook, and each one is a generic example. To use a template, copy the code chunk (click the copy icon that appears when you hover your mouse over the chunk), double click each piece of “filler code” (the text in all capital letters surrounded by underscores, such as _EXAMPLE_), and replace it with your own code.

Utilities

Reading in Data

To read in a data file, you need to know its file path. Then use

_DATANAME_ <- read.csv("_FILEPATH_", header = TRUE)
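As a minimal sketch of the template above with the placeholders filled in, the example below first writes a tiny CSV to a temporary file so that it runs anywhere. The file contents and the names csv_path and scores are made up for illustration; in a real investigation you would point read.csv at the path of your own data file.

```r
# Write a small, made-up CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Ana,90", "Ben,85"), csv_path)

# header = TRUE tells read.csv that the first row holds the column names
scores <- read.csv(csv_path, header = TRUE)

# Inspect the structure of the resulting data frame
str(scores)
```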

Note: The Software Investigation starter program will include the code to read in the data.

Subset Observations

_NEWDATANAME_ <- _OLDDATANAME_ %>% 
  filter(_CONDITION_)
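For illustration, here is the row-subsetting template filled in with R's built-in mtcars data. It assumes that filter and %>% come from the dplyr package, which is loaded explicitly here in case it is not already attached; the name six_cyl is hypothetical.

```r
library(dplyr)

# Keep only the rows where the car has 6 cylinders
six_cyl <- mtcars %>%
  filter(cyl == 6)

# Number of observations left after subsetting
nrow(six_cyl)
```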

Subset Variables

_NEWDATANAME_ <- _OLDDATANAME_ %>% 
  select(_VARSTOKEEP_)
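A similar sketch for the column-subsetting template, again using the built-in mtcars data and assuming select comes from dplyr. List the variables to keep separated by commas; the name mpg_wt is hypothetical.

```r
library(dplyr)

# Keep only the mpg and wt columns; all rows are retained
mpg_wt <- mtcars %>%
  select(mpg, wt)

# Column names of the reduced data set
names(mpg_wt)
```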

Chapter 2: Categorical Data

Frequency Table

tbl_1var(_DATANAME_, ~_VARIABLE_)

Bar Graph Using Percent

plot_bar(_DATANAME_, ~_VARIABLE_, type = "percent", na_rm = FALSE)

Note: Change to na_rm = TRUE to exclude missing values from the plot.

Bar Graph Using Counts

plot_bar(_DATANAME_, ~_VARIABLE_, type = "count", na_rm = FALSE)

Note: Change to na_rm = TRUE to exclude missing values from the plot.

Two-Way Table

tbl_2var(_DATANAME_, _RESPONSE_~_EXPLANATORY_)

Clustered Bar Graph

plot_bar(_DATANAME_, ~_RESPONSE_, fill = ~_EXPLANATORY_, na_rm = FALSE)

Chapter 3: One Quantitative

Basic Numerical Summaries

tbl_num_sum(_DATANAME_, ~_VARIABLE_, na_rm = TRUE)

Percentile

tbl_pctile(_DATANAME_, ~_VARIABLE_, probs = c(_PERCENTILES_))

Note: Replace _PERCENTILES_ with the values you want, written as decimals and separated by commas. For example, c(0.80, 0.90, 0.95) requests the 80th, 90th, and 95th percentiles.
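To see what a probs vector of this form means, the sketch below uses base R's quantile function on the built-in mtcars data. This is a base R analog rather than a description of how tbl_pctile works internally, so the package's output may be formatted or computed slightly differently.

```r
# Request the 80th, 90th, and 95th percentiles of miles per gallon
pcts <- quantile(mtcars$mpg, probs = c(0.80, 0.90, 0.95))

# Each value is labeled with the percentile it corresponds to
pcts
```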

Boxplot

plot_box(_DATANAME_, ~_VARIABLE_)

Histogram

plot_hist(_DATANAME_, ~_VARIABLE_)

Note: You can use breaks to control how many bars there are.

Basic Numerical Summaries By Group

tbl_num_sum(_DATANAME_, _RESPONSE_~_GROUPVARIABLE_, na_rm = TRUE)

Percentiles By Group

tbl_pctile(_DATANAME_, _RESPONSE_~_GROUPVARIABLE_)

Boxplot By Group

plot_box(_DATANAME_, _RESPONSE_~_GROUPVARIABLE_)

Histogram By Group

plot_hist(_DATANAME_, ~_VARIABLE_, group = ~_GROUPVARIABLE_)

Note: You can use breaks to control how many bars there are.

Chapter 5: Estimation

Note: Confidence levels default to 95% but can be overridden with conf_lvl = _DECIMAL_ (e.g., conf_lvl = 0.9).
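The effect of changing the confidence level can be sketched with base R's t.test, whose conf.level argument plays a role similar to conf_lvl. This is only an analog on the built-in mtcars data, not the package's own code; the names ci_95 and ci_90 are hypothetical.

```r
# Default 95% confidence interval for the mean of mpg
ci_95 <- t.test(mtcars$mpg)$conf.int

# Same data, but with a 90% confidence level
ci_90 <- t.test(mtcars$mpg, conf.level = 0.90)$conf.int

# Lowering the confidence level gives a narrower interval
diff(ci_95)
diff(ci_90)
```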

Confidence Interval on $p$

infer_1prop_int(_DATANAME_, ~_VARIABLE_, success = "_SUCCESSCATEGORY_", conf_lvl = _CONFIDENCELEVEL_)

Confidence Interval on μ

infer_1mean_int(_DATANAME_, ~_VARIABLE_, conf_lvl = _CONFIDENCELEVEL_)

Chapter 6: Two Quantitative

Scatterplot

plot_scatter(_DATANAME_, _RESPONSE_~_EXPLANATORY_, axis_lines = "none", ls_line = "hide")

Note: Change to axis_lines = "both" to add grid lines to the scatterplot.

Note: Change to ls_line = "show" to plot the regression line.

Linear Correlation

tbl_corr(_DATANAME_, _RESPONSE_~_EXPLANATORY_, na_rm = TRUE)

Linear Regression

infer_reg(_DATANAME_, _RESPONSE_~_EXPLANATORY_, reduced = "yes")

Note: Change to reduced = "no" to get the test statistic and p-value.
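For comparison, base R shows a similar split between a short and a full regression summary. The sketch below uses lm on the built-in mtcars data, with mpg as the response and wt as the explanatory variable. It is a base R analog under that assumption and does not reproduce the layout of infer_reg.

```r
# Fit a simple linear regression of mpg (response) on wt (explanatory)
fit <- lm(mpg ~ wt, data = mtcars)

# Short view: just the intercept and slope estimates
coef(fit)

# Full view: estimates plus standard errors, t statistics, and p-values
reg_table <- summary(fit)$coefficients
reg_table
```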

Scatterplot By Group

plot_scatter(_DATANAME_, _RESPONSE_~_EXPLANATORY_, fill = ~_GROUPVARIABLE_, legend_title = "_LEGEND_")

Chapter 7: Hypothesis Testing Introduction

$\chi^2$-Test

# standard test
infer_chisq(_DATANAME_, _EXPLANATORY_~_RESPONSE_, type = "test")

# expected counts
infer_chisq(_DATANAME_, _EXPLANATORY_~_RESPONSE_, type = "expected")

# observed counts
infer_chisq(_DATANAME_, _EXPLANATORY_~_RESPONSE_, type = "observed")
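The three pieces named above (test, expected counts, observed counts) can be sketched with base R's chisq.test on a small made-up table. The counts and labels are hypothetical, and this is a base R analog rather than the output of infer_chisq.

```r
# Made-up two-way table of counts: 2 groups by 3 answer categories
counts <- matrix(
  c(20, 30, 25, 25, 30, 20),
  nrow = 2,
  dimnames = list(group = c("A", "B"), answer = c("no", "maybe", "yes"))
)

res <- chisq.test(counts)

# Test statistic, degrees of freedom, and p-value
res

# Observed counts (the table that was passed in)
res$observed

# Expected counts under independence: row total * column total / grand total
res$expected
```

Because the row and column totals in this table are balanced, every expected count works out to 75 * 50 / 150 = 25.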

Confidence Interval for the Difference in Two Proportions

Note: Confidence levels default to 95% but can be overridden with conf_lvl = _DECIMAL_ (e.g., conf_lvl = 0.90).

infer_2prop_int(_DATANAME_, _RESPONSE_~_EXPLANATORY_, success = "_SUCCESSCATEGORY_", conf_lvl = _CONFIDENCELEVEL_)

Note: For this code to work, the explanatory variable must have exactly two categories.

Chapter 8: Hypothesis Testing Means

Paired T-Test and Confidence Interval