Title: | Data Cleaning for Psychological Analyses |
---|---|
Description: | Useful for preparing and cleaning data. It includes functions to center data, reverse-code items, dummy-code and effect-code data, and more. |
Authors: | Jason Moy [aut, cre] |
Maintainer: | Jason Moy <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.1 |
Built: | 2025-02-04 04:55:34 UTC |
Source: | https://github.com/jasonmoy28/psyccleaning |
This function will compute grand-mean-centered scores.
center_grand_mean(data, cols, keep_original = TRUE)
data |
A data.frame or a data.frame extension (e.g. a tibble). |
cols |
Columns that need to be centered. See 'dplyr::dplyr_tidy_select' for available options. |
keep_original |
default is 'TRUE'. Set to 'FALSE' to remove original columns |
An object of the same type as 'data'. The output has the following properties: 1. Columns from 'data' will be preserved. 2. Columns with grand-mean-centered scores.
center_grand_mean(iris,where(is.numeric))
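To show what grand-mean centering computes, here is a minimal base-R sketch; it is not the package's implementation. It subtracts each numeric column's overall mean using `scale()` with `scale = FALSE`. The `_gmc` suffix on the new columns is a hypothetical naming choice, and the package may name its output columns differently.

```r
# Minimal base-R sketch of grand-mean centering (not psyccleaning's code)
num_cols <- names(iris)[vapply(iris, is.numeric, logical(1))]

# scale(center = TRUE, scale = FALSE) subtracts each column's grand mean
centered <- as.data.frame(scale(iris[num_cols], center = TRUE, scale = FALSE))
names(centered) <- paste0(num_cols, "_gmc")  # hypothetical suffix

# Keep the original columns, mirroring keep_original = TRUE
iris_centered <- cbind(iris, centered)
```

Each centered column has a mean of (numerically) zero, and adding the grand mean back recovers the raw score.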
This function will compute group-mean-centered scores.
center_group_mean(data, cols, group, keep_original = TRUE)
data |
A data.frame or a data.frame extension (e.g. a tibble). |
cols |
Columns that need to be centered. See 'dplyr::dplyr_tidy_select' for available options. |
group |
character. The grouping variable. |
keep_original |
default is 'TRUE'. Set to 'FALSE' to remove original columns |
An object of the same type as 'data'. The output has the following properties: 1. Columns from 'data' will be preserved. 2. Columns with group-mean-centered scores.
center_group_mean(iris,where(is.numeric), group = Species)
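Group-mean centering subtracts each group's own mean instead of the grand mean. A minimal base-R sketch, assuming the same per-column logic the package describes (this is not its actual code): `ave()` returns each row's group mean, so subtracting it centers every column within each species. The `_gmc` suffix is hypothetical.

```r
# Base-R sketch of group-mean centering (not psyccleaning's code)
num_cols <- names(iris)[vapply(iris, is.numeric, logical(1))]

# ave() returns each row's group mean, so subtracting it centers within groups
gmc <- lapply(iris[num_cols], function(x) x - ave(x, iris$Species, FUN = mean))
names(gmc) <- paste0(num_cols, "_gmc")  # hypothetical suffix

iris_gmc <- cbind(iris, as.data.frame(gmc))
```

After centering, every species has a mean of (numerically) zero on each new column.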
This function will group-mean-center the scores at level 1 (L1) and create a mean score for each group at level 2 (L2).
center_mlm(data, cols, group, keep_original = TRUE)
data |
A data.frame or a data.frame extension (e.g. a tibble). |
cols |
Columns that need to be centered. See 'dplyr::dplyr_tidy_select' for available options. |
group |
The grouping variable. Must be a character string. |
keep_original |
default is 'TRUE'. Set to 'FALSE' to remove original columns |
An object of the same type as 'data'. The output has the following properties: 1. Columns from 'data' will be preserved. 2. Columns with group-mean-centered L1 scores. 3. Columns with L2 aggregated means.
center_mlm(iris,dplyr::ends_with('Length'),group = 'Species')
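The L1/L2 split can be sketched in base R for a single column. This is an assumed reading of the description above, not the package's code, and the `_L1`/`_L2` suffixes are hypothetical. The L1 deviation and the L2 group mean add back up to the raw score, which is what lets a multilevel model separate within-group from between-group effects.

```r
# Sketch of the L1/L2 decomposition described for center_mlm() (assumed logic)
df <- iris
grp_mean <- ave(df$Petal.Length, df$Species, FUN = mean)

df$Petal.Length_L1 <- df$Petal.Length - grp_mean  # within-group (L1) deviation
df$Petal.Length_L2 <- grp_mean                    # group (L2) mean, repeated on every row
```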
This function performs a row-wise aggregation: it sums the selected columns in each row and divides the sum by the number of columns.
composite_score( data, cols = dplyr::everything(), na.rm = FALSE, composite_col_name = "composited_column" )
data |
A data.frame or a data.frame extension (e.g. a tibble). |
cols |
Columns that need to be composited. See 'dplyr::dplyr_tidy_select' for available options. |
na.rm |
Whether to ignore 'NA' values. The default is 'FALSE', in which case the composite score will be 'NA' if there is one or more 'NA' in any of the selected columns. Set to 'TRUE' to ignore 'NA' values. |
composite_col_name |
Name for the new composite column. Default is 'composited_column'. |
An object of the same type as 'data'. The output has the following properties: 1. Columns from 'data' will be preserved. 2. A column with the composite scores.
test_df = data.frame(col1 = c(1,2,3,4),col2 = c(1,2,3,4), col3 = c(1,2,NA,4))
composite_df = composite_score(data = test_df)
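The arithmetic behind the composite can be checked in base R on the same test data. The first line follows the description above (row sum divided by the number of columns), so a single 'NA' makes that row 'NA'. The second line shows one common way to ignore 'NA', averaging only the non-missing values; whether the package's 'na.rm = TRUE' divides by the non-missing count or by the total column count is an assumption not confirmed here.

```r
test_df <- data.frame(col1 = c(1, 2, 3, 4), col2 = c(1, 2, 3, 4), col3 = c(1, 2, NA, 4))

# Row sum divided by the number of columns; any NA makes that row NA
composite_strict <- unname(rowSums(test_df) / ncol(test_df))

# One common way to ignore NA (assumption): average only the non-missing values
composite_na_rm <- unname(rowMeans(test_df, na.rm = TRUE))
```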
Create dummy-coded columns, supporting tidyselect syntax to process multiple columns simultaneously.
dummy_coding(data, cols)
data |
data.frame object |
cols |
Columns that need to be dummy-coded. See 'dplyr::dplyr_tidy_select' for available options. |
An object of the same type as 'data'. The output has the following properties: 1. Columns from 'data' will be preserved. 2. Columns that are dummy-coded.
dummy_coding(iris,Species)
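For comparison, base R can build dummy codes with `model.matrix()`. The help page does not say whether 'dummy_coding' creates one column per level or k - 1 columns with a reference level; this sketch assumes the k - 1 scheme with the first level ('setosa') as the reference, and it is not the package's implementation.

```r
# k - 1 dummy columns; the first level (setosa) is the reference coded 0 on both
dummies <- model.matrix(~ Species, data = iris)[, -1]  # drop the intercept column
head(dummies, 3)
```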
Create effect-coded columns, supporting tidyselect syntax to process multiple columns simultaneously.
effect_coding(data, cols, factor = FALSE, ref_group = NULL)
data |
A data.frame or a data.frame extension (e.g. a tibble). |
cols |
Columns that need to be effect-coded. See 'dplyr::dplyr_tidy_select' for available options. |
factor |
The default is 'FALSE'. If factor is set to 'TRUE', this function returns a tibble with effect-coded factors. If factor is set to 'FALSE', this function returns a tibble with effect-coded columns. |
ref_group |
Reference group. Optional. |
An object of the same type as 'data'. The output has the following properties: 1. Columns from 'data' will be preserved. 2. Columns that are effect-coded.
effect_coding(iris,Species)
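Effect (sum-to-zero) coding can be reproduced in base R with `contr.sum()`. In this sketch, which is not the package's code, the last factor level ('virginica') is the group coded -1 on every column. Reordering the levels so a chosen group comes last would make that group the -1 group; how 'ref_group' maps onto this is an assumption.

```r
species <- iris$Species
# contr.sum(): each non-reference level gets a column; the last level is coded -1
contrasts(species) <- contr.sum(nlevels(species))
effects <- model.matrix(~ species)[, -1]  # drop the intercept column
```

With balanced groups, as in 'iris' (50 rows per species), each effect-coded column has a mean of zero.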
Perform listwise deletion: an entire row is removed if it has an 'NA' value in any of the selected columns.
listwise_deletion(data, cols = dplyr::everything())
data |
A data.frame or a data.frame extension (e.g. a tibble). |
cols |
Columns to check for 'NA' values when performing listwise deletion. See 'dplyr::dplyr_tidy_select' for available options. |
An object of the same type as 'data', with rows removed if they contain an 'NA' value in any of the selected columns.
test_df = data.frame(col1 = c(1,2,3),col2 = c(1,NA,3),col3 = c(1,2,NA))
listwise_deletion(test_df,col1:col2) # the row with NA in col3 is not deleted because col3 is not selected
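The same selective listwise deletion can be sketched in base R with `complete.cases()`, restricting the check to the chosen columns. This is an illustration, not the package's code. By contrast, `na.omit(test_df)` would drop every row with an 'NA' in any column.

```r
test_df <- data.frame(col1 = c(1, 2, 3), col2 = c(1, NA, 3), col3 = c(1, 2, NA))

# Keep only rows that are complete in the selected columns (col1 and col2)
kept <- test_df[complete.cases(test_df[c("col1", "col2")]), ]
```

Row 2 is dropped because of the 'NA' in 'col2', while row 3 is kept even though 'col3' is missing.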
A classic dataset from Snijders, Tom A. B., and Bosker, Roel J., Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling, second edition.
mlbook_data
A data frame with 3758 rows and 34 variables:
School ID
Student Identifier (Level 1 units)
Student language score
Student socioeconomic score, grand-mean centered (in points, M = 0)
Student verbal IQ, grand-mean centered (in points, M = 0)
Student binary gender, 1 = female, 0 = not female
Student minority status, 1 = minoritized, 0 = not minoritized
School-level religious denominations, 5 categories
Dummy coded sex
Effect-coded sex
Group-mean-centered female_eff
Aggregated mean of female_dum for each school
Z-scored aggregated mean of female_dum for each school
Group-mean-centered SES
Z-scored group-mean-centered SES
Aggregated mean SES for each school
Z-scored aggregated mean SES for each school
https://www.stats.ox.ac.uk/~snijders/mlbook.htm
Recode values of a data frame
recode_item(data, cols, code_from = NULL, code_to = NULL, retain_code = NULL)
data |
A data.frame or a data.frame extension (e.g. a tibble). |
cols |
Columns that need to be recoded. See 'dplyr::dplyr_tidy_select' for available options. |
code_from |
vector. Values to be recoded; the order must match the vector for 'code_to'. |
code_to |
vector. New values; the order must match the vector for 'code_from'. |
retain_code |
vector. Specify the values to be retained. |
An object of the same type as 'data'. The output has the following properties: 1. Columns from 'data' other than the recoded columns will be preserved. 2. Recoded columns.
pre_recoded_df = tibble::tibble(x1 = 1:5, x2 = 5:1)
recoded_df = recode_item(pre_recoded_df, cols = dplyr::contains('x'), code_from = 1:5, code_to = 5:1)
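Recoding with paired 'code_from'/'code_to' vectors amounts to a positional lookup, which base R can do with `match()`. In this sketch, which is not the package's implementation, values absent from 'code_from' become 'NA'. How the package treats such values, and how 'retain_code' interacts with them, is not covered here.

```r
pre_recoded_df <- data.frame(x1 = 1:5, x2 = 5:1)
code_from <- 1:5
code_to <- 5:1  # reverse-codes a 1-5 scale

recoded_df <- pre_recoded_df
# match() finds each value's position in code_from; code_to supplies the replacement
recoded_df[] <- lapply(pre_recoded_df, function(x) code_to[match(x, code_from)])
```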
It counts the number of missing (i.e., 'NA') values in each column.
summarize_missing_values( data, cols = dplyr::everything(), group = NULL, verbose = TRUE, return_result = FALSE )
data |
A data.frame or a data.frame extension (e.g. a tibble). |
cols |
Columns that need to be checked for missing values. See 'dplyr::dplyr_tidy_select' for available options. |
group |
character. Count missing values by group. |
verbose |
default is 'TRUE'. Print the data frame of missing-value counts |
return_result |
default is 'FALSE'. Set to 'TRUE' to return the data frame of missing-value counts |
An object of the same type as 'data' that specifies the number of 'NA' values in each column (only when 'return_result = TRUE').
df1 = data.frame(col1 = c(1,2,3),col2 = c(1,NA,3),col3 = c(1,2,NA))
summarize_missing_values(df1,everything())
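Per-column 'NA' counts are easy to reproduce in base R, which is handy for checking the printed summary. The grouped version below uses a hypothetical grouping column 'g' to mimic the 'group' argument; it is a sketch, not the package's code.

```r
df1 <- data.frame(col1 = c(1, 2, 3), col2 = c(1, NA, 3), col3 = c(1, 2, NA))
na_counts <- colSums(is.na(df1))  # col1 = 0, col2 = 1, col3 = 1

# Grouped counts with a hypothetical grouping column g (one row per group)
df1$g <- c("a", "a", "b")
na_by_group <- t(sapply(split(df1[c("col1", "col2", "col3")], df1$g),
                        function(d) colSums(is.na(d))))
```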
This function will compute z-scores with respect to the grand mean.
z_scored_grand_mean(data, cols, keep_original = TRUE)
data |
A data.frame or a data.frame extension (e.g. a tibble). |
cols |
Columns that need to be centered. See 'dplyr::dplyr_tidy_select' for available options. |
keep_original |
default is 'TRUE'. Set to 'FALSE' to remove original columns |
An object of the same type as 'data'. The output has the following properties: 1. Columns from 'data' will be preserved. 2. Columns with z-scores.
z_scored_grand_mean(iris,where(is.numeric))
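A grand-mean z-score subtracts the column mean and divides by the column standard deviation. In base R, `scale()` does both, using the sample SD (n - 1 denominator); whether the package uses the same denominator is an assumption. The `_z` suffix is hypothetical.

```r
num_cols <- names(iris)[vapply(iris, is.numeric, logical(1))]

# scale() centers on the grand mean and divides by the sample SD
z <- as.data.frame(scale(iris[num_cols]))
names(z) <- paste0(num_cols, "_z")  # hypothetical suffix

iris_z <- cbind(iris, z)  # keeps the original columns
```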
This function will compute group-mean-centered scores and then z-score the group-mean-centered scores with respect to the grand mean.
z_scored_group_mean(data, cols, group, keep_original = TRUE)
data |
A data.frame or a data.frame extension (e.g. a tibble). |
cols |
Columns that need to be centered. See 'dplyr::dplyr_tidy_select' for available options. |
group |
The grouping variable. To pass multiple grouping variables, try wrapping them in quos(); passing multiple grouping variables has not been tested. |
keep_original |
default is 'TRUE'. Set to 'FALSE' to remove original columns |
Returns a data frame with group-mean-centered columns that are z-scored with respect to the grand mean.
z_scored_group_mean(iris, dplyr::ends_with("Petal.Width"), "Species")
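The two steps described above can be sketched in base R for 'Petal.Width'. This is an assumed reading of the description, not the package's code: center within each species, then standardize the centered values against their grand mean and overall SD.

```r
# Step 1: group-mean center within each species
centered <- iris$Petal.Width - ave(iris$Petal.Width, iris$Species, FUN = mean)

# Step 2: z-score the centered values with respect to the grand mean
z_gmc <- as.vector(scale(centered))
```

Because the centered values already average to zero overall, step 2 mainly rescales them to unit variance, and each species still averages to zero.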
This function will group-mean-center the scores at level 1 (L1) and create an aggregated mean score for each group at level 2 (L2). The group-mean-centered L1 scores and the mean L2 scores are then z-scored with respect to the grand mean. See 'center_mlm' for the version without z-scoring.
z_scored_mlm(data, cols, group, keep_original = TRUE)
data |
A data.frame or a data.frame extension (e.g. a tibble). |
cols |
Columns that need to be centered. See 'dplyr::dplyr_tidy_select' for available options. |
group |
The grouping/cluster variable. |
keep_original |
default is 'TRUE'. Set to 'FALSE' to remove original columns |
An object of the same type as 'data'. The output has the following properties: 1. Columns from 'data' will be preserved. 2. Columns with L1 scores that are group-mean centered and then grand-mean z-scored. 3. Columns with L2 aggregated means that are z-scored.
z_scored_mlm(iris,dplyr::ends_with('Length'),group = 'Species')
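A hypothetical helper, 'z_mlm_sketch', shows the combined steps for several columns at once; it is not the package's implementation, and the `_L1_z`/`_L2_z` suffixes are illustrative. One assumption worth flagging: the L2 means are z-scored across rows (each row carries its group's mean), not across one value per group. The two choices give different results when groups are unbalanced.

```r
# Hypothetical helper sketching L1/L2 centering followed by z-scoring
z_mlm_sketch <- function(data, cols, group) {
  for (col in cols) {
    grp_mean <- ave(data[[col]], data[[group]], FUN = mean)
    # L1: within-group deviation, then z-scored across all rows
    data[[paste0(col, "_L1_z")]] <- as.vector(scale(data[[col]] - grp_mean))
    # L2: group mean repeated on every row, then z-scored across all rows
    data[[paste0(col, "_L2_z")]] <- as.vector(scale(grp_mean))
  }
  data
}

out <- z_mlm_sketch(iris, c("Sepal.Length", "Petal.Length"), "Species")
```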
This is a specialized function for mean centering categorical variables. There are two cases where this function should be used instead of the generic 'center_mlm': 1. When you need group-mean centering for non-dummy-coded (e.g., effect-coded) variables at L1. Variables at L2 are always dummy-coded because their means represent the percentage of subjects in each group. 2. Whenever you want to z-score the aggregated L2 means.
z_scored_mlm_categorical( data, cols, dummy_coded = NA, group, keep_original = TRUE )
data |
A data.frame or a data.frame extension (e.g. a tibble). |
cols |
Dummy-coded or effect-coded columns for group-mean centering. Supports 'dplyr::dplyr_tidy_select' options. |
dummy_coded |
Dummy-coded variables (cannot be effect-coded) for L2 aggregated means. Supports 'dplyr::dplyr_tidy_select' options. |
group |
The grouping variable. Must be a character string. |
keep_original |
default is 'TRUE'. Set to 'FALSE' to remove original columns |
An object of the same type as 'data'. The output has the following properties: 1. Columns from 'data' will be preserved. 2. Columns with group-mean-centered L1 scores. 3. Columns with z-scored L2 aggregated means (i.e., percentages).
z_scored_mlm_categorical(mlbook_data, cols = 'female_eff', dummy_coded = 'female_dum', group = 'schoolnr')
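The reason the L2 aggregate must come from a dummy-coded column is that the mean of a 0/1 variable is the proportion of 1s in each group. The sketch below uses hypothetical toy data (three schools with four students each) and assumes effect coding of 1 = female and -1 = not female. It illustrates the described logic and is not the package's implementation.

```r
# Hypothetical toy data: 3 schools, 4 students each
toy <- data.frame(
  school = rep(c("A", "B", "C"), each = 4),
  female_dum = c(1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0)
)
# Assumed effect coding: female = 1, not female = -1
toy$female_eff <- ifelse(toy$female_dum == 1, 1, -1)

# L1: group-mean-center the effect-coded column within each school
toy$female_eff_gmc <- toy$female_eff - ave(toy$female_eff, toy$school, FUN = mean)

# L2: proportion of female students per school (mean of the dummy), then z-scored
prop_female <- ave(toy$female_dum, toy$school, FUN = mean)
toy$female_prop_z <- as.vector(scale(prop_female))
```

Here the school proportions are 0.75, 0.25 and 0.5. School C sits exactly at the overall mean, so its z-score is 0.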