Linguistics PhD Student at UC San Diego
here is a sample dataset
Name <- c("Jon_new", "Bill", "maria", "Ben_1107", "tina")
Age <- c(23, 41, 32, 58, 26)
df <- data.frame(Name, Age)
df
## Name Age
## 1 Jon_new 23
## 2 Bill 41
## 3 maria 32
## 4 Ben_1107 58
## 5 tina 26
the code below extracts just names and formats them so that each of them begins with a capital letter.
df$Name <- gsub("_.*$", "", df$Name)
df$Name <- str_to_title(df$Name)
df
## Name Age
## 1 Jon 23
## 2 Bill 41
## 3 Maria 32
## 4 Ben 58
## 5 Tina 26
here is another dataset. I want to extract the second and the third pieces of information separated by a period (e.g., phon.1) under the Trial
column.
Trial <- c("crit.phon.1.a", "crit.morph.3.c", "fill.G.4.b", "crit.phon.3.b", "fill.UG.2.a")
RT <- c(580, 1230, 780, 633, 269) #stands for reaction time
df1 <- data.frame(Trial, RT)
df1
## Trial RT
## 1 crit.phon.1.a 580
## 2 crit.morph.3.c 1230
## 3 fill.G.4.b 780
## 4 crit.phon.3.b 633
## 5 fill.UG.2.a 269
here is one way to make it happen.
#don't forget to escape the period with TWO backslashes
df1 = df1 %>% mutate(Trial = sapply(strsplit(Trial, '\\.'), function(x) paste(x[2:3], collapse = '.')))
df1
## Trial RT
## 1 phon.1 580
## 2 morph.3 1230
## 3 G.4 780
## 4 phon.3 633
## 5 UG.2 269