ENHANCE FILE MANIPULATION WITH PYTHON

Switch From Just Being Busy To Being Productive

Neither your boss nor your client cares how hard you work or how many hours you put in if you are not being productive.

It’s amazing what we can do with Python as programmers. Better still, these techniques can make our daily work much easier and boost productivity.

In this short tutorial we'll learn some interesting Python file handling techniques. Yes, you heard right: you can do all of this with just a few lines of code.

My Assumptions:

You already know the basics of Python, such as lists, importing libraries/modules/packages, scope, indentation, input, while loops, functions and a few similar concepts.

The Take-away:

- How to open and use a file with Python.
- How to use the os module to scan directories for a particular file and do something with the file(s).
- How to use the shutil module to move files from one directory to the other.
- How to use the send2trash module to safely delete file(s).
- How to use the glob module to find files with a specific extension (like .py, .txt, .csv, etc.) in directories.
- How to do bulk editing of large files and more.

Let's Get Down To Business:

Let’s start by learning how to open a file with Python. It’s actually super easy, and we don’t need to import any module. We can do it with the code below:

f = open('/path/file1.txt', 'w+')
f.write('This file will be deleted later')
f.close()

With the code above we have written the text inside the parentheses after f.write to the file “file1.txt”. Bear in mind that the “w+” after the file path tells Python to create the file if it doesn’t exist already, and to empty it first if it does. So in essence you can use w+ to create a new file or overwrite an existing one, and to read back what you have written.

There are other options you can use while handling files, like a+ (reading and appending extra content), r (reading), rb (reading in binary mode), etc. To understand them fully and see the other options, check the Python documentation for the open() function.
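To make the difference between these modes concrete, here is a minimal sketch. The function name demo_modes and the file name demo.txt are made up for illustration, and the with statement (which closes the file automatically) is my own choice here rather than something the earlier code uses.

```python
import os
import tempfile


def demo_modes(path):
    """Write, append to, and read back a small text file."""
    # 'w+' creates the file, or empties it if it already exists.
    with open(path, 'w+') as f:
        f.write('first line\n')

    # 'a+' keeps the existing content and adds to the end of it.
    with open(path, 'a+') as f:
        f.write('second line\n')

    # 'r' opens the file for reading only.
    with open(path, 'r') as f:
        return f.read()


# demo.txt is a hypothetical file name inside a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    content = demo_modes(os.path.join(tmp, 'demo.txt'))

print(content)
```

Running it prints both lines, which shows that a+ added to the file instead of replacing what w+ had written.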

Now enough of the warm-up, let’s have some real fun:

Let’s import some useful modules.

import os
import glob
import shutil
import send2trash

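You may visit this site (http://python.org/) to learn more about the modules imported above. Note that send2trash is a third-party package, so it has to be installed first (for example with pip install Send2Trash), while os, glob and shutil ship with Python. In this tutorial we are only going to use them for some quick file handling functions.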

Now let’s create a function to scan our entire document directory.

import os
import glob
import shutil
import send2trash


# This function scans the directory entered for a specific file extension and does something with each file found
def scan_ext():
    path = input('Enter path\n')
    gen = glob.iglob(path + "/**/*.py", recursive=True)
    for py in gen:
        f = open(py, 'a+')
        f.write('\n#moved')
        f.close()

#Now let's call the function
scan_ext()

The function above scans the directory entered, searches recursively for all the .py files (Python files), opens each one and writes ‘#moved’ inside it. We have to use a+ because w+ would overwrite the entire file, while a+ appends ‘#moved’ to the end of the content.

The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order.

The glob module is handy for operations like the one above. We could have also used glob.glob, but glob.iglob returns an iterator which yields the same values as glob.glob without storing them all in memory at once. You can think of it in this function as a generator. To learn more about glob, check out the documentation.
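Here is a small sketch of that difference, run against a throwaway folder so it doesn’t touch your real files. The helper make_sample_tree and the file names in it are hypothetical, made up just for this example.

```python
import glob
import os
import tempfile


def make_sample_tree(root):
    """Create a few empty files to search through (hypothetical names)."""
    os.makedirs(os.path.join(root, 'sub'))
    for name in ('a.py', 'notes.txt', os.path.join('sub', 'b.py')):
        open(os.path.join(root, name), 'w').close()


with tempfile.TemporaryDirectory() as tmp:
    make_sample_tree(tmp)
    pattern = os.path.join(tmp, '**', '*.py')

    # glob.glob builds the complete list of matches before returning it.
    as_list = glob.glob(pattern, recursive=True)
    found = sorted(os.path.basename(p) for p in as_list)

    # glob.iglob hands the matches over one at a time as we loop.
    lazy_count = 0
    for path in glob.iglob(pattern, recursive=True):
        lazy_count += 1

print(found, lazy_count)
```

Both calls find the same files. The list version is convenient for small folders, while the iterator version is the safer pick when a directory tree may hold a very large number of matches.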

Remember that the path has to be a string, for instance on Linux: ‘/home/you/Documents/projects/’. When you type it at the input prompt, leave out the quotes.

Now let’s do more fun stuff with Python file handling. Let’s add the following code on the next line, just below the f.close() and at the same indentation:

shutil.move(py, '/home/you/Documents/python_projects/')

The line above will move each of the discovered Python files to the folder “python_projects”. You need to have created this folder before moving files into it.
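If you would rather not create the folder by hand, one option is to let Python create it first. The sketch below is just one way to do it: move_into is a hypothetical helper name, and the file script.py is made up and lives in a temporary folder.

```python
import os
import shutil
import tempfile


def move_into(src_file, dest_dir):
    """Move src_file into dest_dir, creating dest_dir first if needed."""
    os.makedirs(dest_dir, exist_ok=True)
    # shutil.move returns the new location of the file.
    return shutil.move(src_file, dest_dir)


with tempfile.TemporaryDirectory() as tmp:
    # script.py is a made-up file used only for this demonstration.
    src = os.path.join(tmp, 'script.py')
    with open(src, 'w') as f:
        f.write('print("hello")\n')

    new_path = move_into(src, os.path.join(tmp, 'python_projects'))
    moved_ok = os.path.exists(new_path) and not os.path.exists(src)
    new_name = os.path.basename(new_path)

print(new_name, moved_ok)
```

Passing exist_ok=True means os.makedirs does nothing when the folder is already there, so the helper is safe to call for every file in the loop.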

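In case you prefer to delete those discovered Python files instead, you can add the code below in place of the shutil.move line. It sends each file to the trash (recycle bin) rather than deleting it permanently: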

send2trash.send2trash(py)

To see all the Python files discovered, just print them by adding the code below as the last line of the loop:

print(py)

Now let’s have some even more fun. We’ll remove the #moved string added to those Python files with another function, because we don’t want to get confused when we see that comment in those files in the future. Remember we used # to make it a comment, so that it wouldn’t cause an error when running those files later.

import os
import glob

#let's remove the '#moved' string
def replace():
    path = input('Enter path\n')

    gen = glob.iglob(path + "/**/*.py", recursive=True)
    for py in gen:
        f = open(py, 'r')
        f_data = f.read()
        f.close()  # close the read handle before reopening the file for writing
        f_data = f_data.replace('#moved', '')

        ff = open(py, 'w')
        ff.write(f_data)
        ff.close()

    print('Check removal completed!')

replace()

Note that recursive is set to False by default in glob, so we have to set it to True explicitly for ‘**’ to search through subdirectories.
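To see what the recursive flag changes, here is a quick sketch on a throwaway folder. The folder and file names (one, two, top.py, mid.py, deep.py) are invented for the demonstration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a small tree: top.py, one/mid.py and one/two/deep.py.
    os.makedirs(os.path.join(tmp, 'one', 'two'))
    for rel in ('top.py',
                os.path.join('one', 'mid.py'),
                os.path.join('one', 'two', 'deep.py')):
        open(os.path.join(tmp, rel), 'w').close()

    pattern = os.path.join(tmp, '**', '*.py')

    # Without recursive=True, '**' behaves like a plain '*'
    # and matches exactly one folder level.
    default_hits = sorted(
        os.path.basename(p) for p in glob.glob(pattern))

    # With recursive=True, '**' matches zero or more folder levels.
    recursive_hits = sorted(
        os.path.basename(p) for p in glob.glob(pattern, recursive=True))

print(default_hits)
print(recursive_hits)
```

With the default setting only mid.py is found, while recursive=True finds all three files, including the one sitting directly in the starting folder.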

Let’s explain the code further: you enter the path where you moved those .py files to, then the function opens each file, reads it and assigns the content to a variable (here f_data), then replaces the ‘#moved’ string with an empty string.

It then opens the same .py file again in w mode, overwrites it with the edited data (f_data) and closes it. Finally, we print a message saying that the removal has been completed.
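The same read, edit and overwrite steps can also be sketched for a single file using the with statement, which closes the file for us. This is a variation and not the function above: remove_marker is a hypothetical name, and I am assuming the marker was written as '\n#moved', so the leading newline is removed along with it.

```python
import os
import tempfile


def remove_marker(path, marker='\n#moved'):
    """Rewrite the file at path with every occurrence of marker removed."""
    with open(path, 'r') as f:
        data = f.read()
    with open(path, 'w') as f:
        f.write(data.replace(marker, ''))


with tempfile.TemporaryDirectory() as tmp:
    # sample.py is a made-up file that imitates one tagged earlier.
    sample = os.path.join(tmp, 'sample.py')
    with open(sample, 'w') as f:
        f.write('x = 1\n#moved')

    remove_marker(sample)

    with open(sample) as f:
        cleaned = f.read()

print(repr(cleaned))
```

Including the newline in the marker means no empty line is left behind at the end of each file.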

The next Python file handling function we’ll write uses just the os module. We’ll use the os.walk function to scan for Jupyter notebook files in a given path.

import os

def scan_dirs():
    #Get the file path
    f_path = input('Please enter your file path')
    print(f_path)

    #Make a loop that will scan all the folders, subfolders and files
    for folders, subfolders, files in os.walk(f_path):
        for f in files:
            if f.endswith('.ipynb'):
                print(f'{f}\n')



scan_dirs()

Let’s briefly discuss the function above:

First we imported os as usual and got the path with an input. Then we used os.walk to scan the path entered, from folder to subfolder down to the files. It descends into subfolders by default, so no extra option is needed.

Then, instead of using glob, we used the .endswith method to pick out the .ipynb files, which is the usual Jupyter notebook extension.

Finally we print the name of each .ipynb file. Of course we call the function as well.
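One thing worth knowing is that os.walk gives you bare file names, so if you want to open or move a notebook you need to join the name with its folder. Below is a minimal sketch of that idea. The name find_notebooks and the sample files are hypothetical, and it returns the paths in a list instead of printing them.

```python
import os
import tempfile


def find_notebooks(root):
    """Return the full path of every .ipynb file under root."""
    matches = []
    for folder, subfolders, files in os.walk(root):
        for name in files:
            if name.endswith('.ipynb'):
                # os.walk yields bare file names, so join with the folder.
                matches.append(os.path.join(folder, name))
    return sorted(matches)


with tempfile.TemporaryDirectory() as tmp:
    # Made-up sample files: two notebooks and one text file.
    os.makedirs(os.path.join(tmp, 'nested'))
    for rel in ('a.ipynb', 'b.txt', os.path.join('nested', 'c.ipynb')):
        open(os.path.join(tmp, rel), 'w').close()

    notebooks = [os.path.relpath(p, tmp) for p in find_notebooks(tmp)]

print(notebooks)
```

Returning a list makes it easy to reuse the result, for example by passing each path to shutil.move as we did with the .py files.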

OK guys, I hope you had fun with Python file handling in this tutorial. In my next Python file handling tutorial we’ll write code for opening, manipulating, converting and doing more with .csv, .pdf and other important file types, using the modules used here and a few others like pandas and PyPDF2.

Meanwhile, hitting the "Follow" button will keep you in the loop for upcoming tutorials.

To see more tutorials like this, please visit Pythgenie.
