rflashtext


rflashtext can be used to find and replace words in a given text with only one pass over the document.

It is an R implementation of the FlashText algorithm, inspired by the Python library flashtext.

Installation

You can install the released version of rflashtext from CRAN with:

install.packages("rflashtext")

And the development version from GitHub with:

install.packages("devtools")
devtools::install_github("AbrJA/rflashtext")

Example

This basic example shows how to use the API:

New processor

library(rflashtext)

processor <- KeywordProcessor$new(keys = c("NY", "LA"), words = c("New York", "Los Angeles"))
processor$show_trie()
#> [1] "{\"L\":{\"A\":{\"_word_\":\"Los Angeles\"}},\"N\":{\"Y\":{\"_word_\":\"New York\"}}}"

Add key-word pairs to the processor

processor$add_keys_words(keys = c("TX", "CA"), words = c("Texas", "California"))
processor$show_trie()
#> [1] "{\"C\":{\"A\":{\"_word_\":\"California\"}},\"L\":{\"A\":{\"_word_\":\"Los Angeles\"}},\"N\":{\"Y\":{\"_word_\":\"New York\"}},\"T\":{\"X\":{\"_word_\":\"Texas\"}}}"
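
The JSON printed by show_trie() describes a nested structure: one level per character of a key, with the replacement word stored under "_word_" at the end of each key. As a rough illustration of that shape, the following base R sketch builds a similar nested list. The helpers trie_insert and build_trie are hypothetical names made up for this example; they are not part of rflashtext, and the package's internal representation may differ.

```r
# Hypothetical helper (not part of rflashtext): insert one key into a
# nested-list trie, one character per level.
trie_insert <- function(node, chars, word) {
  if (length(chars) == 0) {
    # End of the key: store the replacement word under the terminal marker.
    node[["_word_"]] <- word
    return(node)
  }
  ch <- chars[1]
  child <- node[[ch]]
  if (is.null(child)) child <- list()
  node[[ch]] <- trie_insert(child, chars[-1], word)
  node
}

# Hypothetical helper: build a trie from parallel vectors of keys and words.
build_trie <- function(keys, words) {
  trie <- list()
  for (i in seq_along(keys)) {
    trie <- trie_insert(trie, strsplit(keys[i], "")[[1]], words[i])
  }
  trie
}

trie <- build_trie(c("NY", "LA"), c("New York", "Los Angeles"))
trie[["N"]][["Y"]][["_word_"]]
#> [1] "New York"
```

Keys that share a prefix share the same branch, which is what lets a lookup follow a single path through the structure instead of testing every key separately.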

Find keys in a sentence

words_found <- processor$find_keys(sentences = c("I live in LA and I like NY", "Have you been in TX?"))
words_found
#> [[1]]
#> [[1]]$word
#> [1] "Los Angeles" "New York"   
#> 
#> [[1]]$start
#> [1] 11 25
#> 
#> [[1]]$end
#> [1] 12 26
#> 
#> 
#> [[2]]
#> [[2]]$word
#> [1] "Texas"
#> 
#> [[2]]$start
#> [1] 18
#> 
#> [[2]]$end
#> [1] 19

data.table::rbindlist(words_found)
#>           word start end
#> 1: Los Angeles    11  12
#> 2:    New York    25  26
#> 3:       Texas    18  19
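
The start and end values appear to be 1-based, inclusive character positions of the matched key within each sentence. The sketch below checks that reading with base R's substring() and builds a table that also records which sentence each match came from. To keep it runnable without the package, words_found is copied by hand from the output above rather than produced by find_keys(); the column names sentence and key are choices made for this example.

```r
sentences <- c("I live in LA and I like NY", "Have you been in TX?")

# Copied from the printed output above (assumption: positions are integers).
words_found <- list(
  list(word = c("Los Angeles", "New York"), start = c(11L, 25L), end = c(12L, 26L)),
  list(word = "Texas", start = 18L, end = 19L)
)

# One row per match, tagged with the index of the sentence it was found in.
matches <- do.call(rbind, lapply(seq_along(words_found), function(i) {
  hit <- words_found[[i]]
  data.frame(
    sentence = i,
    key = substring(sentences[i], hit$start, hit$end),
    word = hit$word,
    start = hit$start,
    end = hit$end,
    stringsAsFactors = FALSE
  )
}))

matches$key
#> [1] "LA" "NY" "TX"
```

Unlike a plain row bind of the list, this keeps the sentence index, which matters because positions are relative to each sentence and not to the whole input.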

Replace keys in a sentence

processor$replace_keys(sentences = c("I live in LA and I like NY", "Have you been in TX?"))
#> [1] "I live in Los Angeles and I like New York"
#> [2] "Have you been in Texas?"
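
To give an idea of how a trie supports replacement in a left-to-right scan, here is a simplified base R sketch. It is not the package's implementation: replace_keys_sketch and is_word_char are hypothetical names, the trie is written out by hand to mirror the show_trie() output, and the word-boundary rule (letters, digits and underscore count as word characters) is an assumption made for this example.

```r
# Hand-written trie mirroring the structure printed by show_trie().
trie <- list(
  L = list(A = list(`_word_` = "Los Angeles")),
  N = list(Y = list(`_word_` = "New York")),
  T = list(X = list(`_word_` = "Texas"))
)

# Assumed boundary rule: letters, digits and underscore are word characters.
is_word_char <- function(ch) grepl("^[A-Za-z0-9_]$", ch)

# Hypothetical sketch: scan left to right, walking the trie at each word start.
replace_keys_sketch <- function(sentence, trie) {
  chars <- strsplit(sentence, "")[[1]]
  n <- length(chars)
  out <- character(0)
  i <- 1
  while (i <= n) {
    at_word_start <- i == 1 || !is_word_char(chars[i - 1])
    match_end <- 0
    match_word <- NULL
    if (at_word_start) {
      node <- trie
      j <- i
      while (j <= n && !is.null(node[[chars[j]]])) {
        node <- node[[chars[j]]]
        # Accept a match only if the key ends at a word boundary;
        # later (longer) matches overwrite earlier ones.
        if (!is.null(node[["_word_"]]) &&
            (j == n || !is_word_char(chars[j + 1]))) {
          match_end <- j
          match_word <- node[["_word_"]]
        }
        j <- j + 1
      }
    }
    if (match_end > 0) {
      out <- c(out, match_word)   # emit the replacement, skip past the key
      i <- match_end + 1
    } else {
      out <- c(out, chars[i])     # no key here, copy the character as is
      i <- i + 1
    }
  }
  paste(out, collapse = "")
}

replace_keys_sketch("I live in LA and I like NY", trie)
#> [1] "I live in Los Angeles and I like New York"
```

The work per character depends on the length of the keys, not on how many keys are stored, which is the property that makes this approach attractive for large keyword lists.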

To see more details about the performance of the algorithm, click here.