PDF Tables to Excel Automation Using Python


Installation of Dependencies:

Step 1: Python 3 must be installed for this automation. To install Python 3, refer to the link below.

There are two ways to install the dependencies:

1. Using the Zappy dependency installer (PythonDependencyInstaller.exe):

Step 1 : You can download the ZappyPythonDependencyInstaller.exe from

This should install all the required dependencies automatically.

2. Manually:

Step 1: Download the ActiveTcl Community Edition from ActiveState.

Step 2: Download Ghostscript from the Ghostscript downloads page.

After installing these dependencies, which include Tkinter and Ghostscript, you can use pip to install Camelot:

Step 3: Open “cmd” and enter the following command.

$ pip install camelot-py[cv]
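To confirm that the installation worked, a quick check like the one below can help. It is a minimal sketch, not part of the Zappy sample: the function name check_dependencies is made up for illustration, and the Ghostscript executable names (gs, gswin64c, gswin32c) are the usual ones but may differ on your system.

```python
import importlib.util
import shutil

# Usual Ghostscript executable names: "gs" on Linux/macOS,
# "gswin64c" / "gswin32c" on Windows (assumed; adjust if yours differ).
GHOSTSCRIPT_NAMES = ("gs", "gswin64c", "gswin32c")


def check_dependencies():
    """Return a dict mapping each dependency name to True if it was found."""
    return {
        # find_spec locates a module without importing it.
        "tkinter": importlib.util.find_spec("tkinter") is not None,
        "camelot": importlib.util.find_spec("camelot") is not None,
        # shutil.which searches the directories listed in PATH.
        "ghostscript": any(shutil.which(name) for name in GHOSTSCRIPT_NAMES),
    }


for name, found in check_dependencies().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If Ghostscript is reported as missing even though it is installed, its bin folder is probably not on your PATH, since this check only searches PATH.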

Step 1: Select the PDF file.

You can select any PDF file. In this example, we select the following PDF file.

image

Step 2: Right-click the Zappy icon at the bottom-right corner of your screen and select Task Editor.

image

Step 3: Click the Open Folder icon, select the PdfExtractUsingPython.zappy task, and click Open.

(You can find this zappy file at: %userprofile%\documents\ZappySamples\Python on your system)

image

Step 4: Drag and drop the "Pdf data extraction using python" activity and set its properties.

In PythonExePath, provide the path to python.exe so that the editor can execute the Python script. In PythonScriptPath, provide the path to the converter script (Pdf2Excel.py) included in this sample project. You can customize this script to suit your own requirements.
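If you want to customize the converter, it may help to see roughly what such a script does. The sketch below is not the actual Pdf2Excel.py from the sample project. It is a hypothetical minimal converter, assuming Camelot and pandas are installed, and the names convert and sheet_name are made up for illustration. It reads the tables with camelot.read_pdf and writes each one to its own worksheet.

```python
def sheet_name(index):
    """Worksheet name for the table at a zero-based index, e.g. Table_1."""
    return f"Table_{index + 1}"


def convert(pdf_path, xlsx_path, flavor="lattice", pages="1"):
    """Extract tables from pdf_path and write one worksheet per table.

    Returns the number of tables written.
    """
    if flavor not in ("lattice", "stream"):
        raise ValueError(f"flavor must be 'lattice' or 'stream', got {flavor!r}")

    # Imported here so the module can be loaded even if Camelot is missing.
    import camelot
    import pandas as pd

    tables = camelot.read_pdf(pdf_path, pages=pages, flavor=flavor)
    if len(tables) == 0:
        raise ValueError(f"No tables found in {pdf_path} with flavor={flavor!r}")

    with pd.ExcelWriter(xlsx_path) as writer:
        for i, table in enumerate(tables):
            # table.df holds the extracted cells as a pandas DataFrame.
            table.df.to_excel(
                writer, sheet_name=sheet_name(i), index=False, header=False
            )
    return len(tables)
```

For example, convert("input.pdf", "output.xlsx", flavor="stream", pages="all") would process every page with the stream parser. The file names here are placeholders.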

Step 5: Click the activity as shown below, then set the following properties:

• File Type – the PDF table type, either lattice or stream.

Lattice is more deterministic in nature, and it does not rely on guesses. It can be used to parse tables that have demarcated lines between cells, and it can automatically parse multiple tables present on a page.

Stream can be used to parse tables that use whitespace between cells to simulate a table structure.
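If you are not sure which type a PDF needs, one possible approach is to try lattice first and fall back to stream when no tables are detected. The sketch below assumes Camelot is installed. The function read_tables and its reader parameter are hypothetical and exist only so the logic can be tried without a real PDF. By default it calls camelot.read_pdf.

```python
def read_tables(pdf_path, pages="1", reader=None):
    """Try the lattice flavor first, then stream.

    Returns (flavor, tables) for the first flavor that finds at least one
    table, or (None, []) if neither finds any.
    """
    if reader is None:
        import camelot  # requires camelot-py to be installed

        reader = camelot.read_pdf

    for flavor in ("lattice", "stream"):
        tables = reader(pdf_path, pages=pages, flavor=flavor)
        if len(tables) > 0:
            return flavor, tables
    return None, []
```

With a real PDF, tables[0].parsing_report on the result gives Camelot's accuracy and whitespace figures for the first table, which can help you judge whether the chosen type fits the document.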

image

• InputFileName – the PDF file path. Click the icon in front of InputFilePath, select File Path, and then click the icon.

image

image

• PythonExePath – the path to python.exe, which is used to run the script.

Note: For illustration purposes, the fields in the above screenshot are set to locations specific to our system. Please change these paths to match your own system.
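If you are unsure where python.exe lives on your machine, Python can report its own location. This is a small sketch. The helper name python_exe_path is made up for illustration, and sys.executable is the standard attribute holding the interpreter path.

```python
import os
import sys


def python_exe_path():
    """Return the absolute path of the interpreter running this script."""
    return os.path.abspath(sys.executable)


print(python_exe_path())
```

Run this with the same Python installation where Camelot was installed, then copy the printed path into PythonExePath.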

Step 6: Click the Execute icon.

While the Zappy file is running, the Zappy symbol in the taskbar appears as a green circle.

image

Step 7: You will see the “Successfully Executed Task” message from Zappy on successful execution.

image

Step 8: Check the converted Excel file, which should have the same contents as the tables in the PDF file.
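Besides opening the file in Excel, you can run a quick sanity check from Python without any extra libraries. An .xlsx file is a ZIP archive whose worksheets are stored under xl/worksheets/, so counting those entries shows how many sheets were written. This is only a sketch. The function name count_worksheets is made up for illustration, and the check does not compare cell contents.

```python
import zipfile


def count_worksheets(xlsx_path):
    """Return the number of worksheets in an .xlsx file.

    Raises ValueError if the file is missing or is not a ZIP archive.
    """
    if not zipfile.is_zipfile(xlsx_path):
        raise ValueError(f"{xlsx_path} is not a valid .xlsx (ZIP) file")
    with zipfile.ZipFile(xlsx_path) as archive:
        # Worksheets are stored as xl/worksheets/sheet1.xml, sheet2.xml, ...
        return sum(
            1
            for name in archive.namelist()
            if name.startswith("xl/worksheets/") and name.endswith(".xml")
        )
```

For example, count_worksheets("output.xlsx") should match the number of tables you expect, if the converter writes one table per worksheet. The file name is a placeholder.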

image
