19
Python - PDF to JPG & back Converter
Quite often, I have to submit the same documents in either .pdf
or .jpg/jpeg
format. I used online conversion sites to convert between the two formats. Recently a concern had been lingering in my mind that why am I uploading my personal documents on clouds and shouldn't I just create a converter so that the process stays local on my device. Hence, I decided to make a program to do the conversions.
I'd like to include this as a documented step :) The reason is short: I do web development in my day-to-day life, so I decided that coding this small program in Python will be refreshing.
You can either implement conversions on your own or you can look for packages that are well-maintained and have a good download count. I chose following packages for conversions:
I used PAGE to build static GUI. It is a very simple and easy-to-use drag-and-drop GUI generator. I made two GUIs, one for the converter and the other for showing success message. You can also handle all in one.
Note: tkinter applications look different on different OS. I use Windows. If you want to make application same across all platforms, you can add conditional layouts.
After generating scripts from PAGE, I made the GUI dynamic on the basis of conversion_mode
:
- If the
conversion_mode == 'jpg'
, then set text accordingly and show entry field for inputting name of output PDF file. - Similarly, if the
conversion_mode == 'pdf'
, then make relevant textual changes. No need to add entry field as the output JPGs will have the PDF's name followed by index.
I used tkinter's filedialog module to upload files. I set initialdir
to the directory where script is placed. curr_dir = os.getcwd()
-
Upload PDF: Since single PDF file will be uploaded, hence
filedialog.askopenfilename
is used.
filedialog.askopenfilename(initialdir=curr_dir, title="Select PDF",
filetypes=[("PDF File", "*.pdf")])
-
Upload JPGs: Multiple JPG files can be uploaded to create a PDF file, hence
filedialog.askopenfilenames
is used.
filedialog.askopenfilenames(initialdir=curr_dir, title="Select JPG(s)",
filetypes=[("JPG File", "*.jpeg *.jpg")
To convert JPG/JPEG files to a single PDF, simply provide file_paths
, which is a list of image file paths, to img2pdf's convert
function.
import img2pdf
with open(name+".pdf", "wb+") as file:
file.write(img2pdf.convert(*file_paths))
name
is the output PDF name provided by the user.
To use pdf2image
, firstly download poppler
on your system. (Instructions). Since I'm a Windows user, I downloaded poppler and placed the bin/
folder in the same directory as the script.
poppler_path=r"./bin" if (sys.platform == "win32" or sys.platform == "cygwin") else None
Image files created are stored in output_folder
. To avoid out-of-memory error, I set paths_only=True
so that convert_from_path(...)
doesn't need to return PIL Image objects.
from pdf2image import convert_from_path
from pdf2image.exceptions import (
PDFInfoNotInstalledError,
PDFPageCountError,
PDFSyntaxError
)
convert_from_path(
file_path,
output_folder=output_folder_path,
output_file=output_folder_name,
fmt="jpg",
paths_only=True,
poppler_path=r"./bin" if (sys.platform == "win32" or sys.platform == "cygwin") else None
)
You can run the function in chunks (see) if the PDF file has hundreds of pages or you want to work on Image object and are concerned about running into OOM error.
After conversion, I shift the user to different interface where success message is displayed.
Get the full implementation from my GitHub.
So this is how I made a PDF ⇌ JPG converter that fulfills my requirements and now I don't have to worry about security/privacy.
Share your thoughts. Feedback's always welcome :)
19