How To Parse Labeled Values Of Columns Into A Pandas Dataframe (some Column Values Are Missing)?
The follow are two rows from my unlabeled dataset, a small subset: random1 147 sub1    95  34  dewdfa3 15000   -1238   SBAASBAQSBARSBATSBAUSBAXBELAAX  AAA:COL:UVTWUVWDUWDUWDWW    B
Solution 1:
This will do it:
text = """random1 147 sub1    95  34  dewdfa3 15000   -1238   SBAASBAQSBARSBATSBAUSBAXBELAAX  AAA:COL:UVTWUVWDUWDUWDWW    BBB:COL:F   CCC:COL:GTATGTCA    DDD:COL:K20 EEE:COL:54T GGG:COL:-30.5    HHH:COL:000.1   III:COL:2  JJJ:COL:0   
random2 123 sub1    996 12  kwnc239 10027    144        LBPRLBPSLBRDLBSDLBSLLBWB    AAA:COL:UWTTUTUVVUWWUUU BBB:COL:F   DDD:COL:CACGTCGG    EEE:COL:K19 FFF:COL:HCC16   GGG:COL:873 III:COL:-77 JJJ:COL:0   KKK:COL:0   LLL:COL:1   MMM:COL:212"""data = [line.split() for line in text.split('\n')]
data1 = [line[:9] for line in data]
data2 = [line[9:] for line in data]
# list of dictionaries from data2, where I parse the columnsdict2 = [[dict([d.split(':COL:') for d in d1]) for d1 in data2]
result = pd.concat([pd.DataFrame(data1),
                    pd.DataFrame(dict2)],
                   axis=1)
result.iloc[:, 9:]

Post a Comment for "How To Parse Labeled Values Of Columns Into A Pandas Dataframe (some Column Values Are Missing)?"