rflashtext can be used to find and replace words in a given text with only one pass over the document.
It is an R implementation of the FlashText algorithm, inspired by the Python library flashtext.
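To make the single-pass idea concrete, here is a minimal base R sketch of a trie-based keyword search. It is an illustration only, not the package's implementation: `insert_key()`, `build_trie()` and `find_in_sentence()` are hypothetical helper names, matching is case-sensitive, and keys are assumed to contain only word characters (no spaces).

```r
# Illustrative sketch only; these helpers are hypothetical, not part of rflashtext.

# Insert one key (split into characters) into a nested-list trie.
insert_key <- function(node, chars, word) {
  if (length(chars) == 0L) {
    node[["_word_"]] <- word          # mark the end of a key
    return(node)
  }
  child <- node[[chars[1]]]
  if (is.null(child)) child <- list()
  node[[chars[1]]] <- insert_key(child, chars[-1], word)
  node
}

build_trie <- function(keys, words) {
  trie <- list()
  for (i in seq_along(keys)) {
    trie <- insert_key(trie, strsplit(keys[i], "")[[1]], words[i])
  }
  trie
}

# Visit each character once, following the trie while the current token
# still matches, and report a hit at every word boundary.
find_in_sentence <- function(trie, sentence) {
  chars <- c(strsplit(sentence, "")[[1]], " ")  # trailing space acts as a final boundary
  node <- trie
  start <- 1L
  on_track <- TRUE
  word <- character()
  st <- integer()
  en <- integer()
  for (i in seq_along(chars)) {
    ch <- chars[i]
    if (grepl("[[:alnum:]_]", ch)) {
      if (on_track) {
        nxt <- node[[ch]]
        if (is.null(nxt)) on_track <- FALSE else node <- nxt
      }
    } else {
      if (on_track && !is.null(node[["_word_"]])) {
        word <- c(word, node[["_word_"]])
        st <- c(st, start)
        en <- c(en, i - 1L)
      }
      node <- trie                    # reset for the next token
      start <- i + 1L
      on_track <- TRUE
    }
  }
  list(word = word, start = st, end = en)
}

trie <- build_trie(c("NY", "LA"), c("New York", "Los Angeles"))
res <- find_in_sentence(trie, "I live in LA and I like NY")
res$word   # "Los Angeles" "New York"
res$start  # 11 25
```

The search cost depends on the length of the text rather than on the number of keys, which is the property the FlashText approach is built around.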
Installation
You can install the released version of rflashtext from CRAN with:
install.packages("rflashtext")
And the development version from GitHub with:
install.packages("devtools")
devtools::install_github("AbrJA/rflashtext")
Example
This is a basic example which shows you how to use the API:
New processor
library(rflashtext)
processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))
processor$show_trie()
#> [1] "{\"L\":{\"A\":{\"_word_\":\"Los Angeles\"}},\"N\":{\"Y\":{\"_word_\":\"New York\"}}}"
Add keys
processor$add_keys_words(keys = "TX", words = "Texas")
Find keys in a sentence
words_found <- processor$find_keys(sentences = c("I live in LA and I like NY", "Have you been in TX?"))
words_found
#> [[1]]
#> [[1]]$word
#> [1] "Los Angeles" "New York"
#>
#> [[1]]$start
#> [1] 11 25
#>
#> [[1]]$end
#> [1] 12 26
#>
#>
#> [[2]]
#> [[2]]$word
#> [1] "Texas"
#>
#> [[2]]$start
#> [1] 18
#>
#> [[2]]$end
#> [1] 19
data.table::rbindlist(words_found)
#>           word start end
#> 1: Los Angeles    11  12
#> 2:    New York    25  26
#> 3:       Texas    18  19
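The introduction mentions replacing keys as well as finding them. The sketch below assumes the processor exposes a `replace_keys()` method that takes the same `sentences` argument as `find_keys()`; check the package reference for the exact signature and return value before relying on it.

```r
library(rflashtext)

processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))

# Assumption: replace_keys() returns the input sentences with each key
# swapped for its associated word.
replaced <- processor$replace_keys(sentences = c("I live in LA and I like NY"))
replaced
```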