How To Calculate Dictionaries Of Lists Using Pandas Dataframe?
Solution 1:
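Use the following:
import pandas as pd

def dict_op(x):
    # Map each character index of column1 to its absolute position.
    string1 = x['column1']
    string2 = x['column2']
    start_pos = x['start']
    x['val'] = {i: i + start_pos for i, _ in enumerate(string1)}
    return x

def zip_dict(x):
    # x is a list of dicts; each DataFrame column holds one character index.
    b = pd.DataFrame(x)
    return {i: b.loc[:, i].tolist() for i in b.columns}

op = df.apply(dict_op, axis=1).groupby('column1')['val'].apply(list).apply(zip_dict)
print(op)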
Output
column1
LJNVTJOY      {0: [31, 52, 84], 1: [32, 53, 85], 2: [33, 54,...
MXRBMVQDHF    {0: [79], 1: [80], 2: [81], 3: [82], 4: [83], ...
WHLAOECVQR    {0: [18], 1: [19], 2: [20], 3: [21], 4: [22], ...
Name: val, dtype: object
Explanation
dict_op reuses your code to create the dict for every row, and the .apply(list) after the groupby collects each group's dicts into a list of dicts.
zip_dict() then creates the output dict out of that interim list by loading it into a DataFrame and reading each column back as a list.
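To make the interim structure concrete, here is a minimal pandas-free sketch of what zip_dict does to a single group. The helper name merge_dicts is hypothetical, and the sample values are the first three positions of the LJNVTJOY row shown in the output above.

```python
def merge_dicts(dicts):
    """Turn a list of dicts that share the same keys into one dict of lists."""
    merged = {}
    for d in dicts:
        for key, value in d.items():
            merged.setdefault(key, []).append(value)
    return merged

# One dict per row of the group, as dict_op would produce them
# (only the first three character positions, for brevity).
rows = [
    {0: 31, 1: 32, 2: 33},
    {0: 52, 1: 53, 2: 54},
    {0: 84, 1: 85, 2: 86},
]

result = merge_dicts(rows)
print(result)
# {0: [31, 52, 84], 1: [32, 53, 85], 2: [33, 54, 86]}
```

This is only an illustration of the reshaping step, not a replacement for the DataFrame-based version.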
The last piece that I haven't included is collapsing single-element lists: if the length of a list is 1, you can keep just its first element, taking the output from {0: [79], 1: [80], 2: [81], 3: [82], 4: [83], ...
to {0: 79, 1: 80, 2: 81, 3: 82, 4: 83, ...
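One possible way to fill in that missing piece is a small post-processing helper. This is a sketch only; the name unwrap_singletons is made up for illustration and is not part of the original answer.

```python
def unwrap_singletons(d):
    """Replace one-element lists with their only element; keep longer lists."""
    return {k: v[0] if len(v) == 1 else v for k, v in d.items()}

single = unwrap_singletons({0: [79], 1: [80], 2: [81]})
multi = unwrap_singletons({0: [31, 52, 84], 1: [32, 53, 85]})

print(single)  # {0: 79, 1: 80, 2: 81}
print(multi)   # {0: [31, 52, 84], 1: [32, 53, 85]}
```

Assuming op is the Series produced above, it could then be applied with op.apply(unwrap_singletons).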
Solution 2:
First, apply groupby to aggregate the "start" column into a list for each "column1" value:
df2 = df.groupby("column1")["start"].apply(list).reset_index()
Now, you can write a function to create the new dictionary column
def create_dict(row):
    new_dict = {}
    for i, j in enumerate(row["column1"]):
        if len(row["start"]) == 1:
            # Single start position: store a plain value.
            new_dict[i] = row["start"][0] + i
        else:
            # Several start positions: collect them in a list.
            for k in row["start"]:
                if i in new_dict:
                    new_dict[i].append(k + i)
                else:
                    new_dict[i] = [k + i]
    return new_dict
Finally, apply this function to all the rows of df2
df2["new_column"] = df2.apply(create_dict, axis = 1)
Solution 3:
Here's a slightly different approach using a lambda and two zips.
df2 = df.groupby('column1')['start'].agg([('s', list)]).reset_index()
df2['l'] = df2.column1.str.len()  # take lengths from df2 so they align with the grouped rows
df2.apply(lambda x: dict(zip(range(x['l'] + 1), zip(*[range(s, s + x['l'] + 1) for s in x['s']]))), axis = 1)
The truncated output of that can be seen here (note that it returns tuples rather than lists):
0    {0: (31, 52, 84), 1: (32, 53, 85), 2: (33, 54,...
1    {0: (79,), 1: (80,), 2: (81,), 3: (82,), 4: (8...
2    {0: (18,), 1: (19,), 2: (20,), 3: (21,), 4: (2...
First, to cut down on the length of the apply step, create a DataFrame with the column1 values and the associated starting positions. In addition, add a column with the length of column1 (assuming that the equal length assertion holds).
After that, it's a matter of combining the range of column1 letter indices (0 through len(column1)), which serves as the keys, with the same range offset by the start value(s). Note that because of the + 1 the upper bound is inclusive, so each dict has one more key than the string has characters; drop the + 1 if the keys should stop at the last character.
Things get a little dicey with the second zip because [range(s, s + x['l'] + 1) for s in x['s']] returns something that looks like this (for 'LJNVTJOY'):
[[31, 32, 33, 34, 35, 36, 37, 38, 39],
[52, 53, 54, 55, 56, 57, 58, 59, 60],
[84, 85, 86, 87, 88, 89, 90, 91, 92]]
What we really want is to group the elements that are aligned vertically, so we use the 'splat' or 'unpacking' operator (*) to feed these lists into zip. Once we've combined those lists, we have a list of keys and a list (of tuples) of values, which can be zipped into a dict.
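The two zips can be traced in plain Python, outside of pandas. This is a rough sketch for a single row, using the 'LJNVTJOY' start positions shown above; the variable names are chosen here for illustration.

```python
word = "LJNVTJOY"
starts = [31, 52, 84]
n = len(word)

# One range per start position, with the same inclusive upper bound
# (n + 1 values) as the lambda above.
offsets = [range(s, s + n + 1) for s in starts]

# zip(*offsets) transposes rows into columns:
# (31, 52, 84), (32, 53, 85), ...
columns = list(zip(*offsets))

# Pair each letter index with its column of positions.
result = dict(zip(range(n + 1), columns))

print(result[0])    # (31, 52, 84)
print(result[8])    # (39, 60, 92)
print(len(result))  # 9
```

If lists are preferred over tuples, each column could be wrapped in list() before building the dict.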