How to merge multiple Excel files using Python
Luca Liu
Posted on December 11, 2023
In today's data-driven landscape, consolidating information scattered across multiple Excel files is a common challenge. Fortunately, Python provides powerful tools to streamline this process, offering a more efficient way to merge and organize data. In this guide, we'll explore how to leverage Python's capabilities to effortlessly combine multiple Excel files into a cohesive and structured dataset.
In the world of data, things get messy when you have multiple Excel files with the same columns. If you're looking to tidy up and merge them all into one file, especially when you need to go through them one by one, you're in the right place. This guide will show you an easy way to make sense of it all and keep your data organized.
Python Solution
Merge multiple Excel files effortlessly with this Python code! π Using pandas
and os
, the script navigates through files in a specified folder, combining them into a neat merged_excel.xlsx
file. Just plug in your folder path, run the code, and voila β streamlined data! ππ»
# import packages
import pandas as pd
import os
# Define a function 'append' to merge Excel files in a specified path
def append(path):
# Create an empty list to store individual DataFrames
frames = []
for root, dirs, files in os.walk(path):
for file in files:
file_with_path = os.path.join(root, file)
df = pd.read_excel(file_with_path)
frames.append(df)
df = pd.concat(frames, axis=0)
return df
# pathοΌThe folder path where storage all the excel files
df = append(path)
df.to_excel("merged_excel.xlsx")
In this code snippet, we're using two powerful tools: pandas
and os
(a module for working with the operating system).
The append
function is the star here. It digs through all the Excel files in a specified folder ('path') and collects them into a DataFrame
, which is like a neat table for our data.
Now, for the magic moment: the last two lines! They use our append
function to merge all the Excel data in the specified folder into one consolidated file called merged_excel.xlsx.
Bonus: Handling Unique Data Situations
While we've explored solutions for straightforward data merging, let's navigate a slight detour to address more nuanced scenarios.
Picture this: your Excel files carry distinct dates in their names, demanding not just consolidation but a thoughtful integration into a final summary so that you can know the date of data in the final version.
Situation 1: Date-Driven Excel Filenames: Crafting a Comprehensive Summary Excel
For example, if the file name follows a pattern like "filename_YYYY-MM-DD.xlsx", and you want to extract both the date and the filename, you can adapt the code as follows:
import pandas as pd
import os
def append(path):
frames = [] # Create an empty list to store individual DataFrames
for root, dirs, files in os.walk(path):
for file in files:
file_with_path = os.path.join(root, file)
# Extract filename and date information from the file name
file_info = os.path.splitext(file)[0].split('_')
# Read each Excel file using pandas
df = pd.read_excel(file_with_path)
# Add filename and date columns to the DataFrame
df["filename"] = file_info[0] # Assuming filename is the first part
df["date"] = pd.to_datetime(file_info[1]) # Assuming date is the second part
# Append the DataFrame to the list
frames.append(df)
# Concatenate all DataFrames in the list along the rows
df = pd.concat(frames, axis=0)
return df
Situation 2: Streamlining Data: Merging Multiple Excel Files into One with Distinct Sheets Named After Each File
Now, let's pivot to another scenario. Suppose your goal is a consolidated master file, but with a twist β each file contributes to a separate sheet, named after its original file.
import pandas as pd
import os
def combine(path):
with pd.ExcelWriter("merged_excel.xlsx") as writer:
for root, dirs, files in os.walk(path):
for file in files:
filename = os.path.join(root, file)
df = pd.read_excel(filename)
# Delete the file name suffix, sometimes it could be xlsv/.xlsx
df.to_excel(writer, sheet_name=file.strip(
'.csv'))
return df
Explore more
Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.
Posted on December 11, 2023
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.