
How To Search For Multiple Search Terms Across Multiple Rows In A Pandas Dataframe?

So my previous, more simplified question is here - How to search for text across multiple rows in a pandas dataframe? What I want to do is basically to be able to feed a text docum

Solution 1:

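The idea is to join all the subtitles into one string and record, for each row, the character offsets at which its word starts and ends in that string. The code below keeps these offsets in two Series, 'start' and 'end' (not to be confused with the DataFrame's own 'start' and 'end' timestamp columns), and uses them to map each regex match back to the rows it spans.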

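import re

import pandas as pd

# One search term per line; blank lines are skipped
with open("terms.txt") as f:
    terms = [line.strip() for line in f if line.strip()]

word = df["subtitle"].str.strip()
# Offset where each word ends in the joined text (one space per preceding word)
end = word.apply(len).cumsum() + pd.RangeIndex(len(df))
# Each word starts one character after the previous word's end
start = end.shift(fill_value=-1) + 1
text = " ".join(word)
df["match"] = False
for term in terms:
    # re.escape keeps any regex metacharacters in a term literal
    for match in re.finditer(fr"\b{re.escape(term)}\b", text, re.IGNORECASE):
        idx1 = start[start == match.start()].index[0]
        idx2 = end[end == match.end()].index[0]
        # .loc slices by label and includes idx2; only the match column is set
        df.loc[idx1:idx2, "match"] = True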
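
The offset arithmetic behind 'start' and 'end' can be checked by hand on the five sample subtitles. This is a minimal sketch in plain Python rather than pandas; the names words, starts and ends are illustrative and do not appear in the answer.

```python
from itertools import accumulate

words = ["new", "jersey", "grew", "up", "believing"]  # sample subtitles
text = " ".join(words)

# Cumulative word lengths, plus one space for every preceding word.
ends = [total + i for i, total in enumerate(accumulate(len(w) for w in words))]
# Each word starts one character past the previous word's end (0 for the first).
starts = [0] + [e + 1 for e in ends[:-1]]

for w, s, e in zip(words, starts, ends):
    assert text[s:e] == w  # every offset pair slices out its own word

print(list(zip(starts, ends)))
# → [(0, 3), (4, 10), (11, 15), (16, 18), (19, 28)]
```

A match for "new jersey" spans characters 0 to 10, so it begins at the start offset of row 0 and finishes at the end offset of row 1.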

Output:

$ cat terms.txt
new jersey
hello

>>> df
   id   subtitle   start     end  duration  match
0  14        new  71.986  72.096      0.11   True
1  15     jersey  72.106  72.616      0.51   True
2  16       grew  72.696  73.006      0.31  False
3  17         up  73.007  73.147      0.14  False
4  18  believing  73.156  73.716      0.56  False
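
One caveat: the exact-equality lookup on the offsets assumes every match begins and ends exactly on a row boundary. A subtitle such as "jersey's" would break that, because \b also matches before the apostrophe. A possible variant, sketched here under the assumption that each row holds one space-free token, swaps the equality lookup for numpy.searchsorted so that any character offset maps to the row containing it. The function name mark_matches and its column parameter are hypothetical, not part of the answer.

```python
import re

import numpy as np
import pandas as pd


def mark_matches(df, terms, column="subtitle"):
    """Return a boolean Series: True for rows covered by any term match."""
    words = df[column].str.strip()
    lengths = words.str.len().to_numpy()
    # End offset of each word in the joined text (one space between words).
    ends = np.cumsum(lengths) + np.arange(len(words))
    text = " ".join(words)
    hit = np.zeros(len(words), dtype=bool)
    for term in terms:
        if not term:
            continue  # an empty term would match at every word boundary
        pattern = rf"\b{re.escape(term)}\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            # Row holding the first matched character.
            first = np.searchsorted(ends, m.start(), side="right")
            # Row holding the last matched character.
            last = np.searchsorted(ends, m.end() - 1, side="right")
            hit[first:last + 1] = True
    return pd.Series(hit, index=df.index)


# Sample data: only the columns the search needs.
sample = pd.DataFrame({
    "id": [14, 15, 16, 17, 18],
    "subtitle": ["new", "jersey", "grew", "up", "believing"],
})
sample["match"] = mark_matches(sample, ["new jersey", "hello"])
```

With the sample terms, rows 0 and 1 are flagged and the rest stay False, the same as the output above.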
