Skip to content Skip to sidebar Skip to footer

Delete Or Remove Unexpected Records And Strings Based On Multiple Criteria By Python Or R Script

I have a .csv file named fileOne.csv that contains many unnecessary strings and records. I want to delete unnecessary records / rows and strings based on multiple condition / crite

Solution 1:

This can be achieved with the following Python script:

import csv
import re
import string

output_header = ['a_id', 'b_id', 'CC', 'DD', 'EE', 'FF', 'GG']

sanitise_table = string.maketrans("","")
nodigits_table = sanitise_table.translate(sanitise_table, string.digits)

defsanitise_cell(cell):
    return cell.translate(sanitise_table, nodigits_table)       # Keep digitswithopen('fileOne.csv') as f_input, open('resultFile.csv', 'wb') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)

    input_header = next(f_input)
    csv_output.writerow(output_header)

    for row in csv_input:
        bb = re.match(r'(\d+)_(\d+)\.csv', row[1])

        if bb and row[2] notin ['No Bi', 'less']:
            # Remove all columns after 'Mi' if presenttry:
                mi = row.index('Mi')
                row[:] = row[:mi] + [''] * (len(row) - mi)
            except ValueError:
                pass

            row[:] = [sanitise_cell(col) for col in row]
            row[0] = bb.group(1)
            row[1] = bb.group(2)
            csv_output.writerow(row)

To simply remove Mi columns from an existing file the following can be used:

import csv

withopen('input.csv') as f_input, open('output.csv', 'wb') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)

    for row in csv_input:
        try:
            mi = row.index('Mi')
            row[:] = row[:mi] + [''] * (len(row) - mi)
        except ValueError:
            pass

        csv_output.writerow(row)

Tested using Python 2.7.9

Post a Comment for "Delete Or Remove Unexpected Records And Strings Based On Multiple Criteria By Python Or R Script"