Text Manipulation from PDF  
Author Message
drmacintosh





PostPosted: Fri Apr 14 07:39:31 CDT 2006 Top

Excel >> Text Manipulation from PDF

Hi

Really weird problem I have with a report generated by a application at
work. I run off a monthly report of products completed by employees. It
shows the following info in a pdf report

User ID No of Products Date Sub
Date Last Sub Last Date Sub
Smith_0001 5 01/01/2006
10/01/2006 10/01/2006
Brown_0002 8 02/01/2006
11/01/2006 11/01/2006

Etc but it lists about 300 staff when I copy it and paste it into Excel it
lists all the names followed by numbers followed by dates all in the A
column. Then when it hits another page starts with names again dates are
blank and a * I spaced there.

I want to be able to drop it in Excel like above with the names split form
the number and that placed in another cell and the No's and dates to fill in
the rows. I cannot special paste from Adobe and Transpose I have to
pasted then re copy but then the Transpose takes a lot of mucking about.

Is there a way I can manipulate the text to do as I wish ???

Please help me with this issue showing me how to do it. I can send a sample
file if you need me to.

Cheers Thorrrr

Excel430  
 
 
Bryan





PostPosted: Fri Apr 14 07:39:31 CDT 2006 Top

Excel >> Text Manipulation from PDF
Hi,

Are you doing the Select Text, Copy & Paste, from the .pdf, or from a
Saved .txt file?

You say there is 'adjustment' before the data is Excel useable, if you
supply a small sample that could be useful, my address is my name as


Do I take it that there should be one name per line, followed by a
Number and 3 dates? or is Date Last Sub and Last Sub Date
intentionally on a second line?

--

Thorrrr Wrote:
> Hi
>
> Really weird problem I have with a report generated by a application
> at
> work. I run off a monthly report of products completed by employees.
> It
> shows the following info in a pdf report
>
> User ID No of Products Date Sub
> Date Last Sub Last Date Sub
> Smith_0001 5 01/01/2006
> 10/01/2006 10/01/2006
> Brown_0002 8 02/01/2006
> 11/01/2006 11/01/2006
>
> Etc but it lists about 300 staff when I copy it and paste it into Excel
> it
> lists all the names followed by numbers followed by dates all in the A
> column. Then when it hits another page starts with names again dates
> are
> blank and a * I spaced there.
>
> I want to be able to drop it in Excel like above with the names split
> form
> the number and that placed in another cell and the No's and dates to
> fill in
> the rows. I cannot special paste from Adobe and Transpose I have to
> pasted then re copy but then the Transpose takes a lot of mucking
> about.
>
> Is there a way I can manipulate the text to do as I wish ???
>
> Please help me with this issue showing me how to do it. I can send a
> sample
> file if you need me to.
>
> Cheers Thorrrr


--
Bryan Hessey
------------------------------------------------------------------------
Bryan Hessey's Profile: http://www.excelforum.com/member.php?action=getinfo&userid=21059
View this thread: http://www.excelforum.com/showthread.php?threadid=532719

 
 
Traveller





PostPosted: Fri Apr 14 08:52:02 CDT 2006 Top

Excel >> Text Manipulation from PDF Try the Data/Text to Columns command.



> Hi
>
> Really weird problem I have with a report generated by a application at
> work. I run off a monthly report of products completed by employees. It
> shows the following info in a pdf report
>
> User ID No of Products Date Sub
> Date Last Sub Last Date Sub
> Smith_0001 5 01/01/2006
> 10/01/2006 10/01/2006
> Brown_0002 8 02/01/2006
> 11/01/2006 11/01/2006
>
> Etc but it lists about 300 staff when I copy it and paste it into Excel it
> lists all the names followed by numbers followed by dates all in the A
> column. Then when it hits another page starts with names again dates are
> blank and a * I spaced there.
>
> I want to be able to drop it in Excel like above with the names split form
> the number and that placed in another cell and the No's and dates to fill in
> the rows. I cannot special paste from Adobe and Transpose I have to
> pasted then re copy but then the Transpose takes a lot of mucking about.
>
> Is there a way I can manipulate the text to do as I wish ???
>
> Please help me with this issue showing me how to do it. I can send a sample
> file if you need me to.
>
> Cheers Thorrrr
>
>
>
 
 
Bryan





PostPosted: Fri Apr 14 17:17:09 CDT 2006 Top

Excel >> Text Manipulation from PDF
Hi,

Received your .jpg, but of course I cannot copy / paste from that as
you can with a .pdf.

Two solutions:

1. In Adobe Acrobat, SAVEAS a .txt file and open in Excel via Data,
Import External Data, Import, locate the file and import the data. Then
walk through the text-to-columns Wizard for your format, select
Delimited and pick a delimiter that works, select each Date column and
check the date format etc.

2. Copy and Paste, if it all falls into column A, then select column A
and do Data, Text to Columns, same settings.

After this, if the date * means there is no date, then use Edit,
Replace and replace all * with nothing.

You should then have a workable file, if not, send a few lines in a
worksheet.

This looks straightforward, and should work easily for you.

Hope this helps

--

Bryan Hessey Wrote:
> Hi,
>
> Are you doing the Select Text, Copy & Paste, from the .pdf, or from a
> Saved .txt file?
>
> You say there is 'adjustment' before the data is Excel useable, if you
> supply a small sample that could be useful, my address is my name as

>
> Do I take it that there should be one name per line, followed by a
> Number and 3 dates? or is Date Last Sub and Last Sub Date
> intentionally on a second line?
>
> --


--
Bryan Hessey
------------------------------------------------------------------------
Bryan Hessey's Profile: http://www.excelforum.com/member.php?action=getinfo&userid=21059
View this thread: http://www.excelforum.com/showthread.php?threadid=532719

 
 
Bryan





PostPosted: Sun Apr 16 08:51:44 CDT 2006 Top

Excel >> Text Manipulation from PDF
Received your .txt file to view the problem, details posted here to keep
the forum updated.

The .txt file had not 'column' saved but rather was various page and
other headings followed by all the column A data, then all column B
data, then all column D and finally column E - this repeated for each
page.

The data items were in sets of 34 lines, so, adjacent to the first real
column A name, in B C D and E was put the formula =Annn where nnn was
the row + 34 for each column, ie. B = row + 34, C = row + 68, D = row +
102, E = row + 136
These four cells were then formula copied for 33 lines, (+1 line for
checking, columns B C D & E on this extra line were then cleared),
the next page was then located, and the process repeated.

The sheet now consists of 34 lines of data and about 150 lines of chaff
for each .pdf page.
This is Sheet 1 of Book 9.

The sheet was copied, and Paste Special, Values to a new sheet.
This is Sheet 2 of Book 9.

The initial headers were arranged, then all non-required lines were
deleted (required lines have names in column A and related other data
in columns B to E).

Column G was numbered from 1 onwards for final sort

Sheet was sorted over column E, and all asterisks were selected and
deleted
Sheet was sorted over column D, and all asterisks were selected and
deleted.
This is Sheet 2(2) of Book 9

Sheet was then sorted over column G, back to original sequence, and
colun G deleted.
This is Sheet Final of Book 9

Workbook ws then saved and sent to OP.
---------

As I have 20-20 vision in hindsight, it would probably have been wiser
to set the first 4 formula, and drag those to the end of the
penultimate sheet, adjust for the last sheet only, ignoring the look of
what was created until all non-useful lines had been deleted.
It would certainly appear to be quicker.
---------

Sheet 3 of Book 9 has some of these notes.

As stated in the email, you need to check the data as re-arranged, it
is YOUR responsibility.

Good luck with this

------------------------------

Bryan Hessey Wrote:
> Hi,
>
> Received your .jpg, but of course I cannot copy / paste from that as
> you can with a .pdf.
>
> Two solutions:
>
> 1. In Adobe Acrobat, SAVEAS a .txt file and open in Excel via Data,
> Import External Data, Import, locate the file and import the data. Then
> walk through the text-to-columns Wizard for your format, select
> Delimited and pick a delimiter that works, select each Date column and
> check the date format etc.
>
> 2. Copy and Paste, if it all falls into column A, then select column A
> and do Data, Text to Columns, same settings.
>
> After this, if the date * means there is no date, then use Edit,
> Replace and replace all * with nothing.
>
> You should then have a workable file, if not, send a few lines in a
> worksheet.
>
> This looks straightforward, and should work easily for you.
>
> Hope this helps
>
> --


--
Bryan Hessey
------------------------------------------------------------------------
Bryan Hessey's Profile: http://www.excelforum.com/member.php?action=getinfo&userid=21059
View this thread: http://www.excelforum.com/showthread.php?threadid=532719