Only Coders - Where knowledge meets opportunity

python (12.9k questions)

javascript (9.2k questions)

reactjs (4.7k questions)

java (4.2k questions)

c# (3.5k questions)

html (3.3k questions)

Questions - pdfminer

Tab-separated data is mistaken for tables when parsing PDF to text

I am using PDFMiner to convert a PDF to text. When the content contains tabs, the data is read column-wise instead of row-wise. For example, the below snippet in a PDF: titel1 : text1 title2: text2 titl...

A_Matar

pdf

layout

pdfminer

Votes: 0

Answers: 1

Latest Answer

I faced the same issue. Try pdfplumber (https://pypi.org/project/pdfplumber/), which is built on top of PDFMiner. This code worked perfectly fine for me: def pdf2txt(path): with pdfplumber.open(pa...
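
The answer's code is cut off above, so the following is not a reconstruction of it. It is only a minimal sketch of the underlying idea: reading row-wise means grouping words by their vertical position first and ordering them left to right second. The function name words_to_rows, the y_tolerance parameter, and the sample data are hypothetical; the dict keys "text", "x0" and "top" are assumed to resemble the word dicts a layout-aware extractor returns.

```python
def words_to_rows(words, y_tolerance=3.0):
    """Group positioned words into rows, then order each row left to right.

    `words` is assumed to be a list of dicts with "text", "x0" (left edge)
    and "top" (vertical position) keys. This is an illustrative helper,
    not part of any library.
    """
    rows = []  # each entry: {"top": float, "words": [word, ...]}
    for word in sorted(words, key=lambda w: w["top"]):
        # A word joins the current row if it sits at nearly the same height.
        if rows and abs(word["top"] - rows[-1]["top"]) <= y_tolerance:
            rows[-1]["words"].append(word)
        else:
            rows.append({"top": word["top"], "words": [word]})
    return [
        " ".join(w["text"] for w in sorted(row["words"], key=lambda w: w["x0"]))
        for row in rows
    ]


# Hypothetical two-column layout, listed in column-wise order
# (the order the question complains about).
sample = [
    {"text": "title1:", "x0": 10, "top": 100},
    {"text": "title2:", "x0": 10, "top": 120},
    {"text": "text1", "x0": 200, "top": 100.5},
    {"text": "text2", "x0": 200, "top": 120.4},
]
rows = words_to_rows(sample)
print(rows)
```

The tolerance is a judgment call: too small and slightly misaligned words on one visual line split into separate rows, too large and adjacent lines merge.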

Nico Petermann

Extracting email address, first name and last name from multiple PDF files within a folder

I am trying to extract the following information from all PDF files within a folder for a work project; the PDF files are CVs: email address, first name, last name. I have successfully managed to ext...

Berci Vagyok

python

pdf

text-extraction

pdfminer

Votes: 0

Answers: 1

Latest Answer

You can find the email address because there is logic behind it: match = re.search(r'[\w\.-]+@[a-z0-9\.-]+', text). But you also have to figure out a logic to find out the first and last names of your PDF ...
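
As a rough sketch of the email step only, the pattern quoted in the answer can be wrapped in a small helper. The function name find_first_email and the sample text are hypothetical, and the re.IGNORECASE flag is an added assumption so that upper-case letters in the domain also match; name extraction is not attempted here, since the answer leaves that logic open.

```python
import re

# Pattern taken from the answer above; IGNORECASE is an added assumption.
EMAIL_RE = re.compile(r"[\w\.-]+@[a-z0-9\.-]+", re.IGNORECASE)


def find_first_email(text):
    """Return the first email-like string in `text`, or None if there is none."""
    match = EMAIL_RE.search(text)
    return match.group(0) if match else None


# Hypothetical text as it might come out of a CV after PDF-to-text conversion.
sample_text = "Jane Doe\nEmail: Jane.Doe@Example.com\nPhone: 555-0100"
email = find_first_email(sample_text)
print(email)
```

The pattern is deliberately loose: it would also capture a trailing period if the address ends a sentence, so real CV text may need some cleanup of the match.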

pedro_bb7
