Automating my workflow with Python #1: File Organizer
Gagan Deep Singh
Posted on August 25, 2020
This is my first article on Automating my workflow with Python.
In this (beginner-friendly) article, you will learn how you can make your own Python π script for organizing files for your own workflow just the way I did.
What problem did I fix? π€
I tweet a lot of Python π snippets, make videos, and work on projects which require me to download a lot of Images, Videos, and whatnot!
I don't get time to organize my downloads π and if I want to look for a file...
So, I wrote a Python script to automatically organize my files in one go. β¨β¨
Before the script π°
This is how my Downloads folder usually looks like...
After the (magical β¨) script ππ₯°
This is how my Downloads folder looks like now...
As you can see that all files are now organized πβ¨!!
Cool, Right!? ππ
After writing the script, it's a lot easier for me to look for anything, cause every file is organized and belongs to their own ππ, which saves me a lot of time now.
The IDEA π₯
A Script which Moves files to their respective πβ¨
For implementing that, here's what I did (abstract view):
Created a
dictionary
π which contains the file types (extensions) and paired them with their respective π namesTook input (
path
) from the User via the command-line argument π»Looked for files in that
path
and made a mechanism (function
) for getting their respective π names.Moved πΌ, π½, π΅,ποΈ etc to their πππ
So Let's go from this π°...
To this πβ¨β¨
Steps for Writing our (Magical β¨) Script
Step #1: Creating a dictionary
π of Extensions (suffix) and Directories π
This part is the most time consuming (and the easiest) π, we just need to
Create a dictionary
π (Hashmap) where keys are extensions of the files and their corresponding values are the name of Directories we want to move them in.
Note: You can relate Dictionaries
in Python π to Objects
in JavaScript.
So, Every multimedia file has an extension:
Image files have extensions .jpeg, .png, .jpg, .gif...
Video files have extensions .mp4, .mov, .mkv, .webm...
Document files have extensions .doc, .docx, .csv, .pdf...
And so on...
So, we can Google the extensions of Images, Videos, Documents, Program Files, and whatever kind of files we need to organize and store them in a dictionary
in {"extension": "directory"}
pair in a variable named dirs
like this...
dirs = {
# Images
"jpeg": "Images",
"png": "Images",
"jpg": "Images",
"tiff": "Images",
"gif": "Images",
# Videos
"mp4": "Videos",
"mkv": "Videos",
"mov": "Videos",
"webm": "Videos",
"flv": "Videos",
# Music
"mp3": "Music",
"ogg": "Music",
"wav": "Music",
"flac": "Music",
# Program Files
"py": "Program Files",
"js": "Program Files",
"cpp": "Program Files",
"html": "Program Files",
"css": "Program Files",
"c": "Program Files",
"sh": "Program Files",
# Documents
"pdf": "Documents",
"doc": "Documents",
"docx": "Documents",
"txt": "Documents",
"ppt": "Documents",
"ods": "Documents",
"csv": "Documents"
}
Step #2: Taking input (path) from the user π§π»βπ»π©π»βπ» via a command-line argument π»
What are command-line arguments?
Here's a simplified version for those who don't know what a command-line argument is:
when you run cd
command for changing directory and give it a path like this...
cd Documents/TOP_SECRET_STUFF
OR
ls Downloads
that path is called a command-line argument.
Basically, command-line arguments are values that are passed onto the programs or scripts being executed.
Now, getting back to our script πππ»
Python has a module named sys
which has a function argv
which returns a list
of the command-line arguments given while executing the script.
Here's how it works...
When you run a python script like this...
python helloWorld.py
argv
returns a list that contains all the command line arguments given.
In the above example, the first (and only) command-line argument is "helloWorld.py"
which is the name of the script file.
argv returns
['helloWorld.py']
.
If we give command-line arguments like this...
python add.py 2 3
argv
will return['add.py', '2', '3']
.
Let's use it for taking a path from the user in our script file.
from sys import argv
# argv = ['our_script_file.py', 'path/to/dir']
# Select second element (at index 1) as path
path = argv[1] # 'path/to/dir'
There's a tiny bug in it.
If the user runs the script without providing the path
, then our script will throw Index error (cause we are trying to access 2nd element but there won't be any 2nd element to access)
Though, we can easily fix that bug by
adding a condition before accessing the path
from argv
That simple condition is to check if the command-line arguments are exactly 2 (name of the script file and the path).
If this isn't the case then display the Error message along with Usage like this...
from pathlib import Path
if len(argv) != 2:
print("=" * 35)
print("[ERROR] Invalid number of arguments were given")
print(f"[Usage] python {Path(__file__).name} <dir_path>")
print("=" * 35)
exit(1) # terminate the execution of the script with code 1
# Path to Organize
PATH = argv[1]
And... It's fixed! πππ
Note:
__file__
contains the absolute path of the script file (being executed)Path(__file__)
returns aPath
object (the next section digs more into it), and with this object, we can simply print the name of the script file by using thename
data member with thisPath(__file__).name
Now we know how we can take input from the command-line!! π₯π₯π₯
Step #3: Scan the path for files and move them into their folders
Python has an AWESOME π module called pathlib
which provides Classes
for working with paths
.
It's so cool that it can handle paths from different Operating Systems! π€―
For organizing files, we need to...
- π Look for files in the path π£
- Check each file's extension π§ (file type)
- Select a destination πΊοΈ based on that extension
- Move βοΈ files to their destination directory
π Looking for files in the path π£
For doing that, we gonna convert the path (provided via command-line argument) into a Path Object
with the help of Path
class.
For doing that, we import Path class like this...
from pathlib import Path
and then use the path stored in the 1st index of argv
(2nd element) and call the Path
class using that
from pathlib import Path
PATH = Path(argv[1]) # argv[1] = 'path/to/organize'
For looking up the files in that directory, we gonna iterate it with the help of the iterdir
method.
from pathlib import Path
PATH = Path(argv[1]) # 'path/to/organize'
for filename in PATH.iterdir():
# Do Magic (Organize)
This is gonna help us iterate files and folders in any path! π
Cause we want our script to organize only files, we need to check if the filename
is a file or a directory. We can easily do that with the is_file
method.
from pathlib import Path
PATH = Path(argv[1]) # argv[1] = 'path/to/organize'
for filename in PATH.iterdir():
if filename.is_file(): # If filename is a file
# Do Magic (Organize)
Checking each file's extension π§ (file type)
Cause we are working with Path
objects, it makes it pretty simple for us to get the extension of a file.
Let's say filename
is a Path
object. To get the extension name, It's stored in the suffix
member of the Path
object.
# filename = Path object with path value "AWESOME.jpeg"
extension = filename.suffix # .jpeg
cause we need that extension without '.' as a prefix, we gonna remove that with string slicing.
# filename = Path object with path value "AWESOME.jpeg"
extension = filename.suffix[1:] # 'jpeg'
Now, we know how to get extension names! ππ
Selecting a destination πΊοΈ based on that extension
The heading says it all.
For selecting a destination, we will use the dictionary
we created earlier (long key: value pair thingy) and use it for getting the right destination name for every single file.
# filename = Path object with path value "AWESOME.jpeg"
extension = filename.suffix[1:] # 'jpeg'
directory_name = dirs.get(extension) # 'Images'
Cool π! But what if the extension is not in our dictionary
π It's gonna return None
and we don't want None
. Do we?
So, let's add a default value "Miscellaneous"
as a directory name for files that are not in our dictionary
.
# filename = Path object with path value "New_IMG.ppm"
extension = filename.suffix[1:] # 'ppm'
directory_name = dirs.get(extension, 'Miscellaneous') # 'Miscellaneous'
We are almost done
cause we gonna use this code, again and again, we can wrap it in a function.
Gonna name it get_dir
cause it's gonna return directory name for every file it receives as input. Makes sense. Right!?
Also, adding a little docstring never hurts...
def get_dir(filename):
"""
This function takes filename and returns name of the
parent directory of the respective filename
Returns Miscellaneous if Parent is not found
"""
ext = filename.suffix[1:]
return dirs.get(ext, "Miscellaneous")
Move βοΈ files to their destination directory (finally! π)
We gonna use themove
function from shutil
(shell utilities..?) module (Yes, another dope module in Python) for moving files from one place to another.
Moving files is super easy!. Here's an example.
from shutil import move
source_path = '/home/gagan/Downloads/Awesome.jpeg'
destination_path = '/home/gagan/Downloads/Images'
move(source_path, destination_path) # self explanatory
It's similar to executing this command in Bash
mv ~/Downloads/Awesome.jpeg ~/Downloads/Images
For getting path
from a Path
object, we use str
function which will return path
in string
instead of a Path
object.
We gotta use it cause move
function takes in
move(
str(source_path),
str(destination_path)
)
Create a destination directory for files to be moved in
We gonna use the get_dir
function we defined earlier for creating a destination directory for each file by doing the following:
destination = PATH / get_dir(filename)
if not destination.exists():
destination.mkdir()
We made a new Path
object with the path we gave via the command-line and joined that with our selected directory name using PATH / get_dir(filename)
and stored it in destination
variable.
Also, we need to make it if it doesn't exist, which is done by exists
and mkdir
method.
Note:
- "~" contains the path to the home directory (Mine is
/home/gagan/
) - We need an absolute path for moving files (Absolute Path refers to the full path to a file/directory)
- For getting an absolute path, we use the
absolute
method ofPath
object. -
filename.absolute()
returnsPath
object with absolute source path of the file
- For getting an absolute path, we use the
-
pathlib
(kinda) supports moving files too! But we used shutil cause it mimics the behavior ofmv
command and It doesn't have any issues doing its job.
Now, we know everything we need for implementing the (Magical β¨) Script! ππ₯
π₯π₯π₯π₯
LET'S PUT EVERYTHING TOGETHER
WE HAVE LEARNED/DID SO FAR
π₯π₯π₯π₯
from pathlib import Path
from sys import argv
from shutil import move
def get_dir(filename):
"""
This function takes filename and returns name of the
parent directory of the respective filename
Returns Miscellaneous if Parent is not found
"""
ext = filename.suffix[1:]
return dirs.get(ext, "Miscellaneous")
dirs = {
# Images
"jpeg": "Images",
"png": "Images",
"jpg": "Images",
"tiff": "Images",
"gif": "Images",
# Videos
"mp4": "Videos",
"mkv": "Videos",
"mov": "Videos",
"webm": "Videos",
"flv": "Videos",
# Music
"mp3": "Music",
"ogg": "Music",
"wav": "Music",
"flac": "Music",
# Program Files
"py": "Program Files",
"js": "Program Files",
"cpp": "Program Files",
"html": "Program Files",
"css": "Program Files",
"c": "Program Files",
"sh": "Program Files",
# Documents
"pdf": "Documents",
"doc": "Documents",
"docx": "Documents",
"txt": "Documents",
"ppt": "Documents",
"ods": "Documents",
"csv": "Documents"
}
if len(argv) != 2:
print("=" * 35)
print("[ERROR] Invalid number of arguments were given")
print(f"[Usage] python {Path(__file__).name} <dir_path>")
print("=" * 35)
exit(1)
# Directory Path
PATH = Path(argv[1])
for filename in PATH.iterdir():
path_to_file = filename.absolute()
if path_to_file.is_file():
destination = PATH / get_dir(filename)
if not destination.exists():
destination.mkdir()
move(str(path_to_file), str(destination))
Finally, Running the (magical) script π₯
For organizing my downloads folder, I can now simply run the following command...
python organize.py ~/Downloads
Making this long command shorter with alias
I created an alias
that executes the above command so I don't have to write python ~/projects/File-Organizer/organize.py
for executing the script from any directory.
Here's how I now organize files in a path
organize ~/Downloads
SO CLEAN! π
This is how you can create an alias ππ»ππ»ππ»
Paste the following command in .bashrc
file
alias organize="python /path/to/script_file.py $1"
Replace /path/to/script_file.py
with the path where you saved your script file. Make sure that it's an absolute path of your script file.
Note:
$1
is the first argument that will be passed to our script file.
For activating it in your terminal, execute the following command
source ~/.bashrc
And... we are done! πππππ
This was so simple! Right!?
I hope you guys found this article (which I realized is some sort of memeful tutorial π€¨) useful.
If I missed something or you would like to add more to it, feel free to mention it in the comments.
If you guys wanna support me, Please give me some genuine feedback on how I can improve myself and my articles.
Thanks for reading, Stay Vibrant! β¨
Posted on August 25, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.