Title: | Generation of Full Rank Design Matrix |
---|---|
Description: | Creates a full rank matrix out of a given matrix. The intended use is for one-hot encoded design matrices that should be used in linear models to ensure that significant associations can be correctly interpreted. However, 'fullRankMatrix' can be applied to any matrix to make it full rank. It removes columns with only 0's, merges duplicated columns and discovers linearly dependent columns and replaces them with linearly independent columns that span the space of the original columns. Columns are renamed to reflect those modifications. This results in a full rank matrix that can be used as a design matrix in linear models. The algorithm and some functions are inspired by Kuhn, M. (2008) <doi:10.18637/jss.v028.i05>. |
Authors: | Paula Weidemueller [aut, cre, cph] (<https://orcid.org/0000-0003-3867-3131>, @PaulaH_W), Constantin Ahlmann-Eltze [aut] (<https://orcid.org/0000-0002-3762-068X>, @const_ae) |
Maintainer: | Paula Weidemueller <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.0.9000 |
Built: | 2025-02-25 04:32:21 UTC |
Source: | https://github.com/pweidemueller/fullrankmatrix |
The function performs a depth-first search to find all connected components.
find_connected_components(connections)
connections | a list where each element is a vector with connected nodes. Each node must be either a character or an integer. |
a list where each element is a set of connected items.
find_connected_components(list(c(1,2), c(1,3), c(4,5)))
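The depth-first search mentioned above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name `dfs_components` is hypothetical, and nodes are coerced to character, so the return type may differ from that of `find_connected_components`.

```r
# Hypothetical helper (not part of fullRankMatrix): connected components
# via an iterative depth-first search.
dfs_components <- function(connections) {
  # Build an adjacency list keyed by node name (nodes coerced to character).
  adj <- list()
  for (conn in connections) {
    nodes <- as.character(conn)
    for (n in nodes) {
      adj[[n]] <- union(adj[[n]], setdiff(nodes, n))
    }
  }
  visited <- character(0)
  components <- list()
  for (start in names(adj)) {
    if (start %in% visited) next
    # Explicit stack: the most recently discovered node is explored first.
    stack <- start
    comp <- character(0)
    while (length(stack) > 0) {
      node <- stack[length(stack)]
      stack <- stack[-length(stack)]
      if (node %in% visited) next
      visited <- c(visited, node)
      comp <- c(comp, node)
      stack <- c(stack, setdiff(adj[[node]], visited))
    }
    components[[length(components) + 1]] <- sort(comp)
  }
  components
}

comps <- dfs_components(list(c(1, 2), c(1, 3), c(4, 5)))
print(comps)
```

Nodes 1, 2 and 3 end up in one component because 1 is shared between the first two connections; 4 and 5 form a second component.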
Find linearly dependent columns in a design matrix
find_linear_dependent_columns(mat, tol = 1e-12)
mat | a matrix |
tol | a double that specifies the numeric tolerance |
a list with vectors containing the indices of linearly dependent columns
The algorithm and function are inspired by the internalEnumLC function in the 'caret' package (GitHub).
mat <- matrix(rnorm(3 * 10), nrow = 10, ncol = 3)
mat <- cbind(mat, mat[,1] + 0.5 * mat[,3])
find_linear_dependent_columns(mat) # returns list(c(1,3,4))
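One way to detect such dependencies with base R is a pivoted QR decomposition, in the spirit of the 'caret' approach referenced above. The sketch below is an assumption about how this could be done, not the package's code: `find_dependent_sketch` and its `coef_tol` argument are hypothetical, and the function returns one index set per dependent column without merging overlapping sets.

```r
# Hypothetical helper (not part of fullRankMatrix): find linearly dependent
# columns from a pivoted QR decomposition.
find_dependent_sketch <- function(mat, tol = 1e-12, coef_tol = 1e-8) {
  decomp <- qr(mat, tol = tol)
  rank <- decomp$rank
  p <- ncol(mat)
  if (rank == p) return(list())      # already full column rank
  R <- qr.R(decomp)
  pivot <- decomp$pivot              # original column index of each pivoted column
  indep <- seq_len(rank)             # pivoted positions of independent columns
  dep <- seq(rank + 1, p)            # pivoted positions of dependent columns
  # Solve R1 %*% B = R2: column k of B expresses dependent column k
  # as a combination of the independent columns.
  B <- backsolve(R[indep, indep, drop = FALSE], R[indep, dep, drop = FALSE])
  lapply(seq_along(dep), function(k) {
    involved <- indep[abs(B[, k]) > coef_tol]
    sort(pivot[c(involved, dep[k])])
  })
}

set.seed(1)
mat <- matrix(rnorm(3 * 10), nrow = 10, ncol = 3)
mat <- cbind(mat, mat[, 1] + 0.5 * mat[, 3])
deps <- find_dependent_sketch(mat)
print(deps)
```

Column 2 does not contribute to column 4, so its coefficient is numerically zero and it is left out of the reported set.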
First, remove empty columns. Then discover linearly dependent columns. For each set of linearly dependent columns, create orthogonal vectors that span the same space. Add these vectors as columns to the final matrix to replace the linearly dependent columns.
make_full_rank_matrix(mat, verbose = FALSE)
mat | A matrix. |
verbose | Print how column numbers change with each operation. |
a list containing:
matrix: A matrix of full rank. Column headers will be renamed to reflect how columns depend on each other.
(c1_AND_c2): If multiple columns are exactly identical, only a single instance is retained.
SPACE_<i>_AXIS<j>: For each set of linearly dependent columns, a space i with max(j) dimensions was created using orthogonal axes to replace the original columns.
space_list: A named list where each element corresponds to a space and contains the names of the original linearly dependent columns that are contained within that space.
# Create a 1-hot encoded (zero/one) matrix
c1 <- rbinom(10, 1, .4)
c2 <- 1-c1
c3 <- integer(10)
c4 <- c1
c5 <- 2*c2
c6 <- rbinom(10, 1, .8)
c7 <- c5+c6
# Turn into matrix
mat <- cbind(c1, c2, c3, c4, c5, c6, c7)
# Turn the matrix into full rank, this will:
# 1. remove empty columns (all zero)
# 2. merge columns with the same entries (duplicates)
# 3. identify linearly dependent columns
# 4. replace them with orthogonal vectors that span the same space
result <- make_full_rank_matrix(mat)
# verbose=TRUE will give details on how many columns are removed in every step
result <- make_full_rank_matrix(mat, verbose=TRUE)
# look at the created full rank matrix
mat_full <- result$matrix
# check which linearly dependent columns spanned the identified spaces
spaces <- result$space_list
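To make the four steps concrete, here is a base R sketch that walks through them on a small, deterministic variant of the matrix above. It is an illustration only and does not reproduce the package's internals: the set of dependent columns (`dep`) is hard-coded here rather than discovered, the axes come from a plain QR decomposition, and the column naming only imitates the SPACE_<i>_AXIS<j> scheme.

```r
# Deterministic stand-in for the random example (values chosen by hand).
c1 <- c(1, 0, 1, 0, 0, 1)
c2 <- 1 - c1
c3 <- numeric(6)
c4 <- c1
c5 <- 2 * c2
c6 <- c(1, 1, 0, 1, 0, 1)
c7 <- c5 + c6
mat <- cbind(c1, c2, c3, c4, c5, c6, c7)

# 1. Remove empty columns (drops c3).
mat <- mat[, colSums(mat != 0) > 0, drop = FALSE]

# 2. Keep a single instance of exactly identical columns (drops c4).
mat <- mat[, !duplicated(t(mat)), drop = FALSE]

# 3. Assumed set of linearly dependent columns: c5 = 2 * c2, c7 = c5 + c6.
dep <- c("c2", "c5", "c6", "c7")

# 4. Replace the set with orthonormal axes spanning the same space.
sub_qr <- qr(mat[, dep, drop = FALSE])
axes <- qr.Q(sub_qr)[, seq_len(sub_qr$rank), drop = FALSE]
colnames(axes) <- paste0("SPACE_1_AXIS", seq_len(sub_qr$rank))

full <- cbind(mat[, setdiff(colnames(mat), dep), drop = FALSE], axes)

# The result has as many columns as its rank, i.e. it is full rank.
print(colnames(full))
print(qr(full)$rank == ncol(full))
```

The four dependent columns span only a two-dimensional space, so they are replaced by two axes, and the untouched column c1 keeps its name and its interpretation.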
This function checks a vector of column names to ensure they are valid. It performs the following checks:
The column names must not be NULL.
The column names must not contain empty strings.
The column names must not contain NA values.
The column names must be unique.
validate_column_names(names)
names | A character vector of column names to validate. |
Returns TRUE if all checks pass. If any check fails, the function stops with an error message.
validate_column_names(c("name", "age", "gender"))
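The four checks listed above could be written in base R roughly as follows. This is a sketch under assumptions: `validate_names_sketch` is a hypothetical name, and the error messages are illustrative rather than the ones the package emits.

```r
# Hypothetical re-implementation of the documented checks.
validate_names_sketch <- function(names) {
  if (is.null(names)) stop("Column names must not be NULL.")
  # Check NA before the empty-string comparison, since NA == "" is NA.
  if (anyNA(names)) stop("Column names must not contain NA values.")
  if (any(names == "")) stop("Column names must not contain empty strings.")
  if (anyDuplicated(names) > 0) stop("Column names must be unique.")
  TRUE
}

ok <- validate_names_sketch(c("name", "age", "gender"))

# A failing input raises an error, caught here so the script keeps running.
dup <- tryCatch(validate_names_sketch(c("age", "age")),
                error = function(e) conditionMessage(e))
print(ok)
print(dup)
```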