Maho Takahashi

Linguistics PhD

research

CV

code

mtakahas[at]ucsd[dot]edu

How to alter character values in a dataframe using regular expression

here is a sample dataset

Name <- c("Jon_new", "Bill", "maria", "Ben_1107", "tina")
Age <- c(23, 41, 32, 58, 26)

df <- data.frame(Name, Age)
df
##       Name Age
## 1  Jon_new  23
## 2     Bill  41
## 3    maria  32
## 4 Ben_1107  58
## 5     tina  26

the code below extracts just names and formats them so that each of them begins with a capital letter.

df$Name <- gsub("_.*$", "", df$Name)
df$Name <- str_to_title(df$Name)
df
##    Name Age
## 1   Jon  23
## 2  Bill  41
## 3 Maria  32
## 4   Ben  58
## 5  Tina  26

here is another dataset. I want to extract the second and the third pieces of information separated by a period (e.g., phon.1) under the Trial column.

Trial <- c("crit.phon.1.a", "crit.morph.3.c", "fill.G.4.b", "crit.phon.3.b", "fill.UG.2.a")
RT <- c(580, 1230, 780, 633, 269) #stands for reaction time

df1 <- data.frame(Trial, RT)
df1
##            Trial   RT
## 1  crit.phon.1.a  580
## 2 crit.morph.3.c 1230
## 3     fill.G.4.b  780
## 4  crit.phon.3.b  633
## 5    fill.UG.2.a  269

here is one way to make it happen.

#don't forget to escape the period with TWO backslashes
df1 = df1 %>% mutate(Trial = sapply(strsplit(Trial, '\\.'), function(x) paste(x[2:3], collapse = '.')))
df1
##     Trial   RT
## 1  phon.1  580
## 2 morph.3 1230
## 3     G.4  780
## 4  phon.3  633
## 5    UG.2  269