When reading a data file into R, I can read it in either as a data.frame or as a data.table (using the data.table package). I would prefer to use data.table in the future since it handles large data better. However, I have run into issues with both methods (read.table for data.frames, fread for data.tables), and I'm wondering if there's a simple fix.
When I use read.table to produce a data.frame, if my column names include colons or spaces, they are replaced by periods, which I do not want. I want the column names to be read in "as is."
Alternatively, when I use fread to produce a data.table, my column names are not read in at all, which is obviously not desired.
Check out this gist below for a reproducible example:
https://gist.github.com/jeffbruce/b966d41eedc2662bbd4a
Cheers
Answer
By default, read.table converts column names into syntactically valid variable names (it passes them through make.names), which is why spaces and colons are replaced with periods. If you don't want that, pass check.names = FALSE to read.table:
# Keep the header exactly as it appears in the file
df1 <- read.table("data.txt", check.names = FALSE)
sample(colnames(df1), 10)
[1] "simple lobule white matter"
[2] "anterior lobule white matter"
[3] "hippocampus"
[4] "lateral olfactory tract"
[5] "lobules 1-2: lingula and central lobule (ventral)"
[6] "Medial parietal association cortex"
[7] "Primary somatosensory cortex: trunk region"
[8] "midbrain"
[9] "Secondary auditory cortex: ventral area"
[10] "Primary somatosensory cortex: forelimb region"
As you can see, the column names are kept as is.
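To see the difference side by side, here is a minimal, self-contained sketch. It does not use the file from the gist: the temporary file, its three column names, and the values are made up for illustration, and header = TRUE and sep = "\t" are assumptions about that toy file, not about the original data. The fread call at the end runs only if data.table is installed. It shows that fread accepts a header argument too, but it is not a diagnosis of why the header went missing in the question.

```r
# Write a small tab-separated file whose header contains spaces and a colon.
# The file and its contents are hypothetical, for illustration only.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "hippocampus\tlateral olfactory tract\tPrimary somatosensory cortex: trunk region",
  "1.2\t3.4\t5.6",
  "7.8\t9.0\t1.1"
), path)

# Default behaviour: column names are passed through make.names().
df_default <- read.table(path, header = TRUE, sep = "\t")

# With check.names = FALSE the header is kept verbatim.
df_asis <- read.table(path, header = TRUE, sep = "\t", check.names = FALSE)

print(colnames(df_default))
print(colnames(df_asis))

# Optional: fread() also has a header argument; header = TRUE asks it to
# treat the first line as column names instead of auto-detecting.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::fread(path, header = TRUE, sep = "\t")
  print(colnames(dt))
}
```

Keep in mind that names containing spaces or colons cannot be used with $ unquoted. Use backticks or double brackets instead, for example df_asis[["lateral olfactory tract"]] or df_asis$`lateral olfactory tract`.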