Tuesday, 25 July 2017

paste - Anonymize data for each distinct row in R




Example



Value




15
15
15
4
37
37
37


There are three distinct values but 7 rows. I want to anonymize my data so that every row with the same value gets the same label; the desired output is shown after the code below. I keep getting the error "replacement has 3 rows, data has 7".




This is the code I'm using



final_df$Value <- paste("Value",seq(1:length(unique(final_df$Value))))


Value



Value 1
Value 1
Value 1
Value 2
Value 3
Value 3
Value 3

Answer



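Create a function that does the job. Note that rle() numbers runs of consecutive equal values, so the labels match distinct values only when equal values sit in adjacent rows, as they do in the example: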



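anon <- function(x) {
  # lengths of each run of consecutive equal values
  rl <- rle(x)$lengths
  # repeat the run index once per element of that run
  ans <- paste("Value", rep(seq_along(rl), rl))
  return(ans)
}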


Call the function:



anon(final_df$Value)



Result:



# [1] "Value 1" "Value 1" "Value 1" "Value 2" "Value 3" "Value 3" "Value 3"
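For context, the attempt in the question fails because it builds one label per distinct value (3 labels), which cannot be assigned to a 7-row column. If labels should follow distinct values rather than consecutive runs, one possible sketch indexes each row by the position of its value in unique(x). The helper name anon_distinct and the sample vector x are illustrative and not part of the original post.

```r
# Hypothetical helper: label each element by the first-appearance order
# of its distinct value, wherever that value occurs in the vector.
anon_distinct <- function(x) {
  paste("Value", match(x, unique(x)))
}

# Sample data mirroring the example in the question.
x <- c(15, 15, 15, 4, 37, 37, 37)

# The original attempt produces one label per distinct value only:
n_labels <- length(paste("Value", seq_along(unique(x))))
n_labels  # 3, while the column has 7 rows

res <- anon_distinct(x)
res
# [1] "Value 1" "Value 1" "Value 1" "Value 2" "Value 3" "Value 3" "Value 3"

# A value that reappears later keeps its label, unlike with rle():
res2 <- anon_distinct(c(15, 4, 15))
res2
# [1] "Value 1" "Value 2" "Value 1"
```

Which variant fits depends on the data: the rle() version is enough when equal values are always adjacent, while the match() version is safer when the column is unsorted.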


Generalization to every column of a data frame (column and row names are replaced as well):



df1 <- mtcars
df1[] <- lapply(df1, anon)

names(df1) <- paste0("V", seq_along(names(df1)))
rownames(df1) <- NULL

df1
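One way to sanity-check the generalization is to compare the run structure of each column before and after anonymizing. The sketch below repeats anon() so that it runs on its own, and uses a small made-up data frame (toy) instead of mtcars; both toy and the same_runs check are assumptions added for illustration.

```r
# anon() as defined in the answer, repeated so this block is self-contained.
anon <- function(x) {
  rl <- rle(x)$lengths
  paste("Value", rep(seq_along(rl), rl))
}

# Made-up data for illustration only. Strings are kept as character
# because rle() does not accept factors.
toy <- data.frame(id  = c(7, 7, 9, 9),
                  grp = c("a", "b", "b", "b"),
                  stringsAsFactors = FALSE)

# Same steps as the generalization above, applied to the toy data.
anon_toy <- toy
anon_toy[] <- lapply(anon_toy, anon)
names(anon_toy) <- paste0("V", seq_along(names(anon_toy)))

# Run lengths are preserved column by column, so row groupings survive.
same_runs <- mapply(function(a, b) identical(rle(a)$lengths, rle(b)$lengths),
                    toy, anon_toy)
all(same_runs)  # TRUE
```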
