Using Python To Remove Duplicated Contents In Cells In Excel

I am trying to use Python to remove the duplicated contents in the cells of an Excel Spreadsheet. The data is in 1 column, in the original file. (names separated by “, ” in eac

Solution 1:

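dict.fromkeys() treats a string as a sequence of characters, so passing the cell text directly removes duplicate characters rather than duplicate names. Split the text into a list of names first.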

Try this:

for row_index in range(0, old_sheet.nrows):
    column_con = old_sheet.cell(row_index, 0).value

    # First split the cell text into a list of names
    column_con = column_con.split(', ')

    # One key per distinct name (insertion order is guaranteed only on Python 3.7+)
    aaa = dict.fromkeys(column_con).keys()

    # aaa is a collection of keys, so join them back into a single string
    aaa = ', '.join(aaa)
    new_sheet.write(row_index, 0, aaa)
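
To see the difference without opening a workbook, here is a minimal sketch of the same idea on a plain string. The helper name dedupe_cell and the sample names are made up for illustration, and keeping the names in their original order assumes Python 3.7 or later.

```python
def dedupe_cell(text, sep=", "):
    """Return text with repeated names removed, keeping first occurrences.

    Hypothetical helper, not part of xlrd or xlwt.
    """
    names = text.split(sep)
    # dict.fromkeys() keeps one key per distinct name
    return sep.join(dict.fromkeys(names))


# Passing a raw string deduplicates characters, not names
print(list(dict.fromkeys("aab")))   # ['a', 'b']

cell = "Tom, Ann, Tom, Bob, Ann"    # made-up sample cell content
print(dedupe_cell(cell))            # Tom, Ann, Bob
```

Inside the loop above, the result of dedupe_cell(column_con) could be passed straight to new_sheet.write().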

Solution 2:

Use a set to hold the names you read from each cell. Note that a set does not keep the names in their original order.

import xlrd
import xlwt

# Note: xlrd 2.0+ reads only .xls files; an .xlsx input needs an older xlrd
data = xlrd.open_workbook("C:\\Users\\I307658\\Desktop\\test.xlsx")
old_sheet = data.sheet_by_index(0)
new_file = xlwt.Workbook(encoding='utf-8', style_compression=0)
new_sheet = new_file.add_sheet('Result', cell_overwrite_ok=True)

for row_index in range(0, old_sheet.nrows):
    column_con = old_sheet.cell(row_index, 0).value
    print(column_con)
    # Split on ", " (comma plus space) so "Tom" and " Tom" are not treated as different names
    aaa = set(column_con.split(", "))

    print(', '.join(aaa))
    new_sheet.write(row_index, 0, ', '.join(aaa))

new_file.save("C:\\Users\\I307658\\Desktop\\Book New 1.xls")
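
Because a set has no defined order, the names can come out in a different order on each run. If a stable result matters, one option is to sort the set before joining. The sketch below is only an illustration: unique_sorted is a hypothetical helper, the sample string is made up, and alphabetical order is an assumption about what is acceptable in the output.

```python
def unique_sorted(text, sep=","):
    """Return the distinct names in text, sorted alphabetically.

    Hypothetical helper, not part of xlrd or xlwt.
    """
    # Strip whitespace so "Tom" and " Tom" count as the same name,
    # and skip empty pieces left by stray separators
    names = {part.strip() for part in text.split(sep) if part.strip()}
    return ", ".join(sorted(names))


print(unique_sorted("Tom,Ann, Tom , Bob"))  # Ann, Bob, Tom
```

Stripping each piece also makes the helper tolerant of cells that mix "," and ", " as separators.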
