Python script to delete files older than 1 day, delete empty sub-directories recursively and create a log file for records!

Who doesn't like to automate repetitive manual tasks, right? But a question comes to mind: why do we need to automate this particular file and folder deletion process, and what benefit will we get?

Don't worry, I've got an answer for you!

It is often necessary to delete files within a directory that are older than a particular number of days. This is especially the case with archive directories. Without routine maintenance, these archives can begin consuming large amounts of disk space on the server or your local machine. Cleaning the file system regularly by hand is time-consuming and painful. Creating a script to remove older archives makes the maintenance process fast and painless.

Here comes Python to make our lives easier. Python is an excellent programming language for scripting. You can use the same code on different operating systems, e.g. Windows, Linux (Ubuntu, for example) or macOS.

So, let's jump into our scripting!

In this automated process, we'll list all files and folders recursively inside a directory. Among those, older files and empty subdirectories will be deleted. Last but not least, a log file will be created to keep a record of the execution.
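As a rough illustration of the recursive listing step, the sketch below wraps os.walk in a small generator. The function name list_tree is made up for this example and is not part of the script that follows.

```python
import os


def list_tree(root_dir):
    """Yield (directory, sub-folder names, file names) for every directory under root_dir."""
    # os.walk visits root_dir first, then each sub-directory in turn
    for current, folders, files in os.walk(root_dir):
        # sorted() returns copies, so the traversal itself is not affected
        yield current, sorted(folders), sorted(files)
```

Each yielded tuple corresponds to one "Current path" entry of the kind the script writes to its log.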

Here are the requirements we need to consider first:
  1. Create a script that recursively deletes files older than a day, and empty subfolders, inside a directory.
  2. List each file inside the directory before deletion and put the list in a log file.
  3. List all subfolders inside the directory before deletion and put the list in a log file.
  4. If a subfolder has a recent file, do not delete the folder. Delete the older files only.
  5. But if a subfolder is empty, then delete the subfolder.
  6. Keep a record of the deletions, and of the script's execution start and stop date and time, in the log file.
  7. The log has to be a rolling log, i.e. every day a new log file is created with the date in its name (not appended to a single log file).
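
The rolling-log requirement could be met by putting the calendar date into the file name. This is only a sketch: the function name daily_log_path and the cleanup_log_ prefix are assumptions made for illustration, and the script further down names its log with a timestamp instead.

```python
from datetime import date
from pathlib import Path


def daily_log_path(log_dir, day=None):
    """Return a log file path that changes once per calendar day."""
    day = day or date.today()
    # Hypothetical naming scheme: cleanup_log_YYYY-MM-DD.txt
    return Path(log_dir) / f"cleanup_log_{day.isoformat()}.txt"
```

With a date-based name, two runs on the same day share one file, so whether to open it in "w" or "a" mode is a choice to make up front. A timestamp-based name gives every run its own file.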

Let's break down our to-do list to solve this problem.
  1. We need to input the directory where the script will run.
  2. We need to input the directory where the log file will be saved.
  3. Create a log file in the given directory (from step 2) and name it uniquely with a timestamp.
  4. Inside the given directory (from step 1), search all files and folders recursively.
  5. Get the latest modification date of each file and compare it with the reference date.
  6. Check how old the files are; files older than one day need to be deleted.
  7. Check whether the subfolders are empty. Any empty subfolder also needs to be deleted.
  8. Finally, keep a record of every file and folder, some metadata and the deletion status in the log file.
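
Steps 5 and 6 boil down to comparing a file's modification time with the current time. Here is a minimal sketch of that check, assuming a hypothetical helper named is_older_than; the script below does the same comparison with datetime objects instead of raw seconds.

```python
import os
import time

SECONDS_PER_DAY = 24 * 60 * 60


def is_older_than(path, days=1, now=None):
    """Return True if the file at `path` was last modified more than `days` ago."""
    # `now` can be passed in so that every file is compared with the same instant
    now = time.time() if now is None else now
    age_seconds = now - os.stat(path).st_mtime
    return age_seconds > days * SECONDS_PER_DAY
```

Passing the same `now` value for every file keeps the cut-off consistent across a long run.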

...

Now we are ready to write the script.

Please go through all the comments in the script to understand every step!

# Python 3
import os
import sys
from datetime import datetime


# Directory validation function
def valid_dir(path):
    #Checking if the path exists or not, and then if it is a directory or not
    if not os.path.exists(path):
        reason, hint, exit_code = "This is not a valid path!", "Please provide valid path ", 1
    elif not os.path.isdir(path):
        reason, hint, exit_code = "This is not a valid directory path!", "Please provide directory path ", 2
    else:
        #The path is an existing directory, nothing to report
        return

    #Create .txt file for log in the directory where the script has been executed
    log_path = os.path.join(os.getcwd(), "log_createdAt_"+str(datetime.now().timestamp())+".txt")

    #Write in the .txt file; the with block closes the file before the script exits
    #currentDate is the global set in the __main__ block
    with open(log_path, 'w') as f:
        f.write("* Script execution started at : " + str(currentDate) +"\n"+"\n")
        f.write(reason+"\n"+"\n")
        f.write("* Script execution stopped at : " + str(currentDate) +"\n"+"\n")
    print(hint)

    #exit
    sys.exit(exit_code)

# Function to convert list into string 
def listToString(s): 

    # separator placed between the list items
    separator = " , " 

    # return the joined string  
    return separator.join(s)

# Function to list all files and folders recursively inside a directory 
def search_filesNFolders(root_dir,log_dir):

    #Date to compare with file modification date
    compareDate = datetime.today()

    #Iteration integer
    i = 0

    #Create .txt file for log
    f = open(log_dir+"/log_createdAt_"+str(datetime.now().timestamp())+".txt",'w')

    f.write("* Script execution started at : " + str(currentDate) +"\n"+"\n")
    f.write("* Script execution Directory : " + root_dir +"\n")
    f.write("* Log file Directory : " + log_dir+"\n"+"\n")

    f.write("* Date to check with how older the file is : " + str(compareDate)+"\n"+"\n")

    #Loop to search all files and folders in the given directory recursively
    for currentpath, folders, files in os.walk(root_dir):

        f.write("* Current path : "+ currentpath +"\n")

        #Counter used to number the files of the current directory in the log, starting at 1
        i = 1

        #Check whether there are any folders in each path or not i.e length of folders list
        #Here there are no folders inside the current directory
        if(len(folders) == 0):

            #Writing the number of files and folders in the log file
            f.write("   Number of Folders : 0"+"\n")
            f.write("   Number of Files: " + str(len(files))+"\n")

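            #Check whether there are any files in the current directory or not i.e length of files list
            if(len(files)==0):

                #Delete the subfolder as it is empty, no files and no folders inside.
                #The root directory itself is kept even when it is empty.
                #Note: os.walk runs top-down, so a folder emptied during this run is only removed on a later run.
                if currentpath != root_dir:
                    os.rmdir(currentpath)
                    f.write("   Note: This empty directory has been deleted!"+"\n")
            else:
                f.write("   Filenames: "+"\n")
            print("Folders : 0")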

        #Here there are subfolders inside the current directory
        else:
            f.write("   Number of Folders: " + str(len(folders))+"\n")
            f.write("   Foldernames: " + listToString(folders)+"\n")
            f.write("   Number of Files: " + str(len(files))+"\n")

            #If there are files inside the current directory 
            if(len(files)!=0):
                f.write("   Filenames: "+"\n")

            print(folders)

        #Loop to get the metadata and check each file inside current directory
        for file in files:

            #Get the modification time of each file
            t = os.stat(os.path.join(currentpath,file)).st_mtime

            #Check how much older the file is than compareDate (the difference is negative for files modified in the past)
            filetime = datetime.fromtimestamp(t) - compareDate
            print(filetime.days)

            #Log the record of file modification date time
            f.write("       "+str(i)+". "+file +"\n"+"          Modification date :"+str(datetime.fromtimestamp(t))+"\n"+"          File path : " +os.path.join(currentpath,file)+ "\n")

            i = i+1

            #Check if file is older than 1 day: timedelta.days rounds down, so a file
            #modified more than 24 hours ago gives -2 or less
            if filetime.days < -1:

                #Remove the file
                os.remove(os.path.join(currentpath,file))

                #Write the delete status in log file
                f.write("       Note: This file has been deleted!"+"\n"+"\n")
                print('Deleted')
            else:
                print('Not older than 1 day!')
            print(file)

        f.write("\n"+"\n")

    #Execution stopped time recorded in log file
    f.write("* Script execution stopped at : " + str(datetime.today().strftime("%Y-%m-%d %H:%M:%S")) +"\n"+"\n")

    #Close the log file so that everything is written to disk
    f.close()


if __name__=="__main__":

    #Define the directory where you want to run this script
    #root_dir = 'C:/Users/Zeaul.Shuvo/Music/Test'
    root_dir = input("Enter the directory path here for script execution - ")


    #Define the directory where you want to log the records
    #log_dir = 'C:/Users/Zeaul.Shuvo/Music'
    log_dir = input("Enter the directory path here for log file - ")

    #Current date
    currentDate = datetime.today().strftime("%Y-%m-%d %H:%M:%S")

    #Calling the function to validate the root directory
    valid_dir(root_dir)

    #Calling the function to validate the log file directory
    valid_dir(log_dir)

    #Calling the function to search files and folders, delete the older files and empty folders and keep record in log file. 
    search_filesNFolders(root_dir,log_dir)

    #exit with status 0, which signals success to schedulers and shells
    sys.exit(0)
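
One thing worth knowing about the script above: os.walk visits directories top-down by default, so a folder is checked for emptiness before its old files are removed. A folder that only becomes empty during a run therefore stays until a later run. The sketch below shows one possible way to tidy up in a single pass, by walking bottom-up after the files have been deleted. The function name prune_empty_dirs is an assumption for illustration, not part of the original script.

```python
import os


def prune_empty_dirs(root_dir):
    """Remove empty sub-directories under root_dir, deepest first.

    Returns the list of removed paths. root_dir itself is never removed.
    """
    root_dir = os.fspath(root_dir)
    removed = []
    # topdown=False yields children before their parents
    for current, _folders, _files in os.walk(root_dir, topdown=False):
        if current == root_dir:
            continue
        # Re-check on disk: the lists from os.walk may be stale after deletions
        if not os.listdir(current):
            os.rmdir(current)
            removed.append(current)
    return removed
```

It could be called once after the file-deletion loop, with the returned paths written to the log so that the deletion records stay complete.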

...

Now?

I'm Kidding!

...
...

Here is a screenshot of the log file (.txt file) where we are keeping all the records of the script execution.

Remember: we need to keep the script execution start time and stop time in the log file, in every case, including when the script stops early. This helps us track the activity of the script.
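
One way to make sure the stop time is always recorded is a try/finally block. This is a minimal sketch with a hypothetical helper named run_with_timestamps; the script above writes the two lines by hand instead.

```python
from datetime import datetime


def run_with_timestamps(log_file, task):
    """Write start and stop times around task(), even if task() raises."""
    stamp = "%Y-%m-%d %H:%M:%S"
    log_file.write("* Script execution started at : " + datetime.now().strftime(stamp) + "\n")
    try:
        return task()
    finally:
        # The finally block runs on success, on exceptions, and on sys.exit()
        log_file.write("* Script execution stopped at : " + datetime.now().strftime(stamp) + "\n")
```

Here log_file is any open, writable text file and task is a function taking no arguments, for example a lambda that calls the cleanup function with its two directories.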

We are done! Happy Scripting!

Any questions or suggestions are welcome. Waiting for your valuable words! Thank you!
