How To Search For Multiple Search Terms Across Multiple Rows In A Pandas Dataframe?
So my previous, more simplified question is here - How to search for text across multiple rows in a pandas dataframe? What I want to do is basically to be able to feed a text docum
Solution 1:
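The idea is to compute two Series, `start` and `end`. They hold the character offsets where each subtitle begins and ends once all subtitles are joined with single spaces. Each regex match in that joined text is then mapped back to the rows whose offsets coincide with the match boundaries.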
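The offset arithmetic can be checked by hand on the first three subtitles of the example output. This is a plain-Python sketch in which lists stand in for the answer's pandas Series. Each row's exclusive end is the running total of word lengths plus one separating space per earlier row. Each start is one past the previous end, which is what `end.shift(fill_value=-1) + 1` computes in the answer's code.

```python
from itertools import accumulate

words = ["new", "jersey", "grew"]  # one subtitle per row, as in the example output
text = " ".join(words)

# exclusive end of row i = total length of rows 0..i plus i separating spaces
ends = [total + i for i, total in enumerate(accumulate(len(w) for w in words))]
# start of row i = previous row's end + 1 (skipping the space); row 0 starts at 0
starts = [0] + [e + 1 for e in ends[:-1]]

for w, s, e in zip(words, starts, ends):
    assert text[s:e] == w  # each span slices back to its own subtitle
print(starts, ends)  # [0, 4, 11] [3, 10, 15]
```

Because a match of "new jersey" spans offsets 0 to 10, it starts at row 0's `start` and ends at row 1's `end`, so both rows get flagged.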
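```python
import re
import pandas as pd

# one search term per line, blank lines skipped
with open("terms.txt") as f:
    terms = [line.strip() for line in f if line.strip()]

word = df["subtitle"].str.strip()
# character offset where each subtitle ends (exclusive) / starts in the joined text
end = word.apply(len).cumsum() + pd.RangeIndex(len(df))
start = end.shift(fill_value=-1) + 1
text = " ".join(word)

df["match"] = False
for term in terms:
    # re.escape treats each term literally; \b keeps matches on word boundaries
    for match in re.finditer(fr"\b{re.escape(term)}\b", text, re.IGNORECASE):
        idx1 = start[start == match.start()].index[0]
        idx2 = end[end == match.end()].index[0]
        # label-based .loc slice is inclusive, so rows idx1..idx2 are flagged
        df.loc[idx1:idx2, "match"] = True
```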
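The equality lookups only work when a match begins exactly at some row's `start` and ends exactly at some row's `end`. If a subtitle row holds several words and a term matches only part of it, `start[start == match.start()]` is empty and `.index[0]` raises `IndexError`. A possible variant is sketched below. It assumes the desired behaviour is to flag every row the match overlaps, and it replaces the lookups with `numpy.searchsorted` on the sorted offsets. `mark_matches` is a hypothetical helper name, not part of the answer.

```python
import re

import numpy as np
import pandas as pd


def mark_matches(df, terms, col="subtitle"):
    """Hypothetical helper: return a copy of df with a boolean 'match' column set
    for every row that overlaps a whole-word, case-insensitive match of any term."""
    words = df[col].str.strip()
    lengths = words.str.len().to_numpy()
    # exclusive end offset of each row in the space-joined text, then its start offset
    ends = lengths.cumsum() + np.arange(len(df))
    starts = ends - lengths
    text = " ".join(words)

    hit = np.zeros(len(df), dtype=bool)
    for term in terms:
        for m in re.finditer(rf"\b{re.escape(term)}\b", text, re.IGNORECASE):
            # first row that ends after the match starts ...
            first = np.searchsorted(ends, m.start(), side="right")
            # ... through the last row that starts before the match ends
            last = np.searchsorted(starts, m.end(), side="left") - 1
            hit[first:last + 1] = True
    return df.assign(match=hit)


df = pd.DataFrame({"subtitle": ["new", "jersey", "grew", "up", "believing"]})
print(mark_matches(df, ["new jersey", "hello"])["match"].tolist())
# [True, True, False, False, False]
```

With rows such as "grew up in new" and "jersey city", the term "new jersey" starts mid-row. In that case this version flags both rows instead of raising.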
Output:
```
$ cat terms.txt
new jersey
hello
>>> df
   id   subtitle   start     end  duration  match
0  14        new  71.986  72.096      0.11   True
1  15     jersey  72.106  72.616      0.51   True
2  16       grew  72.696  73.006      0.31  False
3  17         up  73.007  73.147      0.14  False
4  18  believing  73.156  73.716      0.56  False
```