I would like to do statistical analysis on an Excel file. Rather than learn a new programming language VBA (Visual Basic for Applications), I would rather work with Excel files in Python. This post shows how to read and write Excel files with Python.
The Python library
openpyxl is designed for reading and writing Excel
xlsx/xlsm/xltx/xltm files. Refer to openpyxl documentation for its usage.
Use the following command to install
$ sudo pip install openpyxl
xlwt are for reading and writing spreadsheet files compatible with older Microsoft Excel files (i.e.,
2. Read an Excel file as lists
2.1 Get a worksheet
load_workbook(...) is used to open an Excel file and return a workbook.
from openpyxl import load_workbook load_workbook(filename, read_only=False, keep_vba=False, data_only=False, guess_types=False) # for instance wb = load_workbook(file_workbook, read_only=True)
In gerneral, a workbook contains several worksheet.
wb.get_sheet_names() returns a list of sheet names. With a given name,
wb.get_sheet_by_name(sheet_name) returns a worksheet. For instance,
ws = wb.get_sheet_by_name(sheet_name)
2.2 Read to a list of lists
ws.iter_rows(...) is used to iterate over cells in the worksheet
ws. For instance,
iter_rows(range_string=None, row_offset=0, column_offset=0) # read the worksheet to a list of lists lists = list() for row in ws.iter_rows(row_offset=1): # row_offset=1, skip the header # select fields selected_fields = [ row[IDX_CARD_ID], row[IDX_NAME], row[IDX_TIME], row[IDX_IN_OUT] ] # deal with blank fields; some fields might contain Chinese characters lists.append([item.value.encode('utf8') if item.value else None for item in selected_fields])