Posted on 2006-08-14 19:29:54, modified on 2006-08-14 19:35:59
Tags: Broken software
One of the jobs I have at the moment is parsing PDF files. Very easy... convert the PDF to HTML, convert the HTML to text, and then parse it. Only sometimes the PDF files are protected:
[~] edwin@k7>pdftohtml 3798b854f6245a5b98ec0344aefd44b1.pdf
Error: Copying of text from this document is not allowed.
Luckily there is an easy solution for this:
[~] edwin@k7>pdf2ps 3798b854f6245a5b98ec0344aefd44b1.pdf
[~] edwin@k7>ps2pdf 3798b854f6245a5b98ec0344aefd44b1.ps
[~] edwin@k7>pdftohtml 3798b854f6245a5b98ec0344aefd44b1.pdf
Page-1
Page-2
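Note that ps2pdf, when given only the .ps file, writes a .pdf with the same base name, so the commands above overwrite the original PDF. If you would rather keep the original, the round trip could be wrapped in a small function like the rough sketch below. The name unlock_pdf and the temporary-file handling are my own invention for illustration; the only thing it relies on is that pdf2ps and ps2pdf both accept an explicit output file as a second argument. Whether the round trip actually drops the restriction may depend on the Ghostscript version installed.

```shell
#!/bin/sh
# unlock_pdf is a hypothetical helper, not part of any of the tools used above.
# It round-trips a PDF through PostScript and writes the result to a NEW file,
# so the original PDF is left untouched.
#
# Usage: unlock_pdf input.pdf output.pdf
unlock_pdf() {
    unlock_in=$1
    unlock_out=$2

    # Temporary file for the intermediate PostScript.
    unlock_ps=$(mktemp) || return 1

    # Step 1: PDF -> PostScript, step 2: PostScript -> PDF.
    pdf2ps "$unlock_in" "$unlock_ps" && ps2pdf "$unlock_ps" "$unlock_out"
    unlock_status=$?

    # Always clean up the intermediate file, then report success or failure.
    rm -f "$unlock_ps"
    return $unlock_status
}
```

After that, something like `unlock_pdf protected.pdf copy.pdf && pdftohtml copy.pdf` should give the same result as the three commands above (the file names here are just placeholders).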
Mission accomplished! :-P