How to trim a string to the last "."  
Author Message
Nessy





PostPosted: Sat Nov 24 21:13:31 PST 2007 Top

Excel Misc >> How to trim a string to the last "."

I have 11,000 long text strings in Excel... I had used LEFT operation to
limit the strings to 500 charaters, now I need to trim the strings to the
last available "." so that I can delete those uncompleted sentence after the
last "."

I tried to use "Text to Column", but it does not give me good result...

Please HELP!!

Thank you!

Excel479  
 
 
Ron





PostPosted: Sat Nov 24 21:13:31 PST 2007 Top

Excel Misc >> How to trim a string to the last "." With the original text in A1

This formula (shown in segments for readability) shortens that string to 500
characters, then truncates it after the last period:
B1: =LEFT(LEFT(A1,500),FIND(CHAR(7),
SUBSTITUTE(LEFT(A1,500),".",CHAR(7),LEN(LEFT(A1,500))-
LEN(SUBSTITUTE(LEFT(A1,500),".",""))))+1)

Copy that formula down as far as you need.

Is that something you can work with?
--------------------------

Regards,

Ron
Microsoft MVP (Excel)
(XL2003, Win XP)



>I have 11,000 long text strings in Excel... I had used LEFT operation to
> limit the strings to 500 charaters, now I need to trim the strings to the
> last available "." so that I can delete those uncompleted sentence after
> the
> last "."
>
> I tried to use "Text to Column", but it does not give me good result...
>
> Please HELP!!
>
> Thank you!
>
>




 
 
Rick





PostPosted: Sun Nov 25 00:45:36 PST 2007 Top

Excel Misc >> How to trim a string to the last "." I realize that, using a maximum of 500 characters, you probably will never
run into the situation where there are no periods in the (initially)
truncated string of text; however, if you should ever want to modify this
formula for use in a situation where that possibility exists (that is, if
you ever need to change the 500 to something much, much less), then the
following function will return the string of text, truncated to 500
characters, if there is no periods within the first 500 characters; also, if
A1 is empty, the formula returns the empty string.

=LEFT(A1,FIND(CHAR(7),SUBSTITUTE(LEFT(A1,500)&".",".",CHAR(7),NOT(ISNUMBER(FIND(".",LEFT(A1,500))))+LEN(LEFT(A1,500))-LEN(SUBSTITUTE(LEFT(A1,500),".","")))))

Although the formula is similar to the one Ron posted, it is built slightly
differently to account for the functionality I outlined above; however, I
did use the same CHAR(7) substitution character that Ron used.

Rick




>I have 11,000 long text strings in Excel... I had used LEFT operation to
> limit the strings to 500 charaters, now I need to trim the strings to the
> last available "." so that I can delete those uncompleted sentence after
> the
> last "."
>
> I tried to use "Text to Column", but it does not give me good result...
>
> Please HELP!!
>
> Thank you!
>
>

 
 
Ron





PostPosted: Sun Nov 25 04:02:48 PST 2007 Top

Excel Misc >> How to trim a string to the last "." On Sun, 25 Nov 2007 00:13:31 -0500, "Ron Coderre"


>This formula (shown in segments for readability) shortens that string to 500
>characters, then truncates it after the last period:
>B1: =LEFT(LEFT(A1,500),FIND(CHAR(7),
>SUBSTITUTE(LEFT(A1,500),".",CHAR(7),LEN(LEFT(A1,500))-
>LEN(SUBSTITUTE(LEFT(A1,500),".",""))))+1)

Your formula also returns the character following the ".". Suggest you omit
the last "+1"


--ron
 
 
VCKW





PostPosted: Sun Nov 25 05:05:00 PST 2007 Top

Excel Misc >> How to trim a string to the last "." Thanks everyone!!! :) It works the way I want and this save me hundred of
hours!!
 
 
Ron





PostPosted: Sun Nov 25 05:50:50 PST 2007 Top

Excel Misc >> How to trim a string to the last "." Thanks, Ron. Good catch.

--------------------------

Best Regards,

Ron
Microsoft MVP (Excel)
(XL2003, Win XP)




> On Sun, 25 Nov 2007 00:13:31 -0500, "Ron Coderre"

>
>>This formula (shown in segments for readability) shortens that string to
>>500
>>characters, then truncates it after the last period:
>>B1: =LEFT(LEFT(A1,500),FIND(CHAR(7),
>>SUBSTITUTE(LEFT(A1,500),".",CHAR(7),LEN(LEFT(A1,500))-
>>LEN(SUBSTITUTE(LEFT(A1,500),".",""))))+1)
>
> Your formula also returns the character following the ".". Suggest you
> omit
> the last "+1"
>
>
> --ron


 
 
Bernd





PostPosted: Sun Nov 25 14:24:04 PST 2007 Top

Excel Misc >> How to trim a string to the last "." Hello,

You already got some solutions, but just for the fun of it:
=regexpreplace(A1,"^(.*\.)?.*$","$1")

This UDF you can get here:
http://www.sulprobil.com/html/regexp.html

Regards,
Bernd
 
 
Rick





PostPosted: Sun Nov 25 15:12:36 PST 2007 Top

Excel Misc >> How to trim a string to the last "." > You already got some solutions, but just for the fun of it:
> =regexpreplace(A1,"^(.*\.)?.*$","$1")
>
> This UDF you can get here:
> http://www.sulprobil.com/html/regexp.html

Well, if you are looking for a UDF type solution, this one-liner would do
what your code does, as it applies to the OP's request (assuming you added
the part to initially truncate the text at 500 characters before looking for
the last period)...

Function TruncateAtPeriod(ByVal Source As String, _
Optional TrimAt As Long = 30000) As String
TruncateAtPeriod = Left(Source, InStrRev(Left(Source, TrimAt), "."))
End Function

where the OP would call it like this...

=TruncateAtPeriod(A1,500)

Both of our code returns the empty string if, in the unlikely event, no
periods exist within the text (in my case, prior to the 500th character), so
additional code would be needed in order to return the whole string of text
(assuming that is what the OP would want to happen when no periods are found
within the text).

Rick

 
 
Bernd





PostPosted: Tue Nov 27 09:52:56 PST 2007 Top

Excel Misc >> How to trim a string to the last "." Hello Rick,

...
> additional code would be needed in order to return the whole string of text
> (assuming that is what the OP would want to happen when no periods are found
> within the text).
...

Less code in my example: just omit the "?" :-)

Regards,
Bernd
 
 
Rick





PostPosted: Tue Nov 27 11:45:19 PST 2007 Top

Excel Misc >> How to trim a string to the last "." >> additional code would be needed in order to return the whole string of
>> text
>> (assuming that is what the OP would want to happen when no periods are
>> found
>> within the text).
> ...
>
> Less code in my example: just omit the "?" :-)

Well, I wasn't talking about a *lot* of additional code. Still a
one-liner....

Function TruncateAtPeriod(ByVal Source As String, _
Optional TrimAt As Long = 30000) As String
TruncateAtPeriod = Left(Source, InStrRev(Left(Source, TrimAt), ".") - _
Len(Source) * (Not Source Like "*.*"))
End Function

Besides, you still have to include *more* code somewhere to perform the
initial truncation (to 500 characters).<g>

Rick

 
 
Ron





PostPosted: Tue Nov 27 13:00:41 PST 2007 Top

Excel Misc >> How to trim a string to the last "."

>Hello Rick,
>
>...
>> additional code would be needed in order to return the whole string of text
>> (assuming that is what the OP would want to happen when no periods are found
>> within the text).
>...
>
>Less code in my example: just omit the "?" :-)
>
>Regards,
>Bernd

Bernd,

A couple of points.

Your "code" is not really shorter if you include the VBA code required for the
UDF.

But, more importantly, I don't believe your code will work with a multiline
text string because of the limitations of the VBScript flavor of regex.

In particular, there is no provision, in VBScript to enable <dot matches
newline>.

OK, I just downloaded the UDF and tested it and, indeed, it does not work on
multiline text strings.

If you want to use regular expressions for this exercise, I would suggest the
following UDF with a regex constructed according to the maximum length of the
string (500 in this case, but there is an optional argument to change that:

=========================================
Option Explicit
Function reTruncateAtDot(str As String, _
Optional NumChars As Long = 500) As String
Dim re As Object, mc As Object
Dim sPat As String

sPat = "^[\s\S]{1," & NumChars & "}\."
Set re = CreateObject("vbscript.regexp")
re.Pattern = sPat
Set mc = re.Execute(str)
reTruncateAtDot = mc(0)
End Function
==============================

The OP could use either:

=reTruncateAtDot(A1) which would truncate at the last dot prior to the 501st
character, or use an optional NumChars to change that.

The regex with the 500 characters "hard-coded" would look like:

"^[\s\S]{1,500}\."

If you wanted to use your formula, and have it work with multiline text
strings, then try:

=regexpreplace(A1,"^([\s\S]*\.)?[\s\S]*$","$1")

But you'd still need the second step to trim to 500 characters.

Oh, and in my contribution, since the OP did not specify what he wants to
happen if there are no dots in the first 500 characters, a VALUE error will be
returned. That, obviously, could be changed depending on the OP's wishes.
--ron
 
 
Rick





PostPosted: Tue Nov 27 13:36:10 PST 2007 Top

Excel Misc >> How to trim a string to the last "." >>> additional code would be needed in order to return the whole string of
>>> text
>>> (assuming that is what the OP would want to happen when no periods are
>>> found
>>> within the text).
>>...
>>
>>Less code in my example: just omit the "?" :-)
>
> A couple of points.
>
> Your "code" is not really shorter if you include the VBA code required for
> the
> UDF.

That was going roughly going to be my answer when I first read Bernd's
response; but I think, given the trimmed portion of the previous responses
in his posting and leaving aside the 500 character maximum truncation
requested by the OP, he was simply just addressing my comment about needing
*additional* code for my function to be able to return the entire text
string whereas to do the same with his code only required removing the
question mark. Yes, there is the question of the needed UDF code to make his
formula line work as well as the 500 character truncation issue; but I'm
pretty sure Bernd's posting was meant to focus in on this much narrower
issue without addressing anything else.

Rick


> But, more importantly, I don't believe your code will work with a
> multiline
> text string because of the limitations of the VBScript flavor of regex.
>
> In particular, there is no provision, in VBScript to enable <dot matches
> newline>.
>
> OK, I just downloaded the UDF and tested it and, indeed, it does not work
> on
> multiline text strings.
>
> If you want to use regular expressions for this exercise, I would suggest
> the
> following UDF with a regex constructed according to the maximum length of
> the
> string (500 in this case, but there is an optional argument to change
> that:
>
> =========================================
> Option Explicit
> Function reTruncateAtDot(str As String, _
> Optional NumChars As Long = 500) As String
> Dim re As Object, mc As Object
> Dim sPat As String
>
> sPat = "^[\s\S]{1," & NumChars & "}\."
> Set re = CreateObject("vbscript.regexp")
> re.Pattern = sPat
> Set mc = re.Execute(str)
> reTruncateAtDot = mc(0)
> End Function
> ==============================
>
> The OP could use either:
>
> =reTruncateAtDot(A1) which would truncate at the last dot prior to the
> 501st
> character, or use an optional NumChars to change that.
>
> The regex with the 500 characters "hard-coded" would look like:
>
> "^[\s\S]{1,500}\."
>
> If you wanted to use your formula, and have it work with multiline text
> strings, then try:
>
> =regexpreplace(A1,"^([\s\S]*\.)?[\s\S]*$","$1")
>
> But you'd still need the second step to trim to 500 characters.
>
> Oh, and in my contribution, since the OP did not specify what he wants to
> happen if there are no dots in the first 500 characters, a VALUE error
> will be
> returned. That, obviously, could be changed depending on the OP's wishes.
> --ron

 
 
Ron





PostPosted: Tue Nov 27 14:14:43 PST 2007 Top

Excel Misc >> How to trim a string to the last "." On Tue, 27 Nov 2007 16:36:10 -0500, "Rick Rothstein \(MVP - VB\)"


>That was going roughly going to be my answer when I first read Bernd's
>response; but I think, given the trimmed portion of the previous responses
>in his posting and leaving aside the 500 character maximum truncation
>requested by the OP, he was simply just addressing my comment about needing
>*additional* code for my function to be able to return the entire text
>string whereas to do the same with his code only required removing the
>question mark. Yes, there is the question of the needed UDF code to make his
>formula line work as well as the 500 character truncation issue; but I'm
>pretty sure Bernd's posting was meant to focus in on this much narrower
>issue without addressing anything else.

Oh, I see.

Of course, I don't believe his regex will work for multiline text strings, for
the reason I pointed out in my response. And it seems likely to me that a 500
character text string in Excel would likely have at least one linefeed. So he
should substitute "[\s\S]" for the "." in his regex.


--ron
 
 
Bernd





PostPosted: Tue Nov 27 14:55:12 PST 2007 Top

Excel Misc >> How to trim a string to the last "." Hi Ron, hi Rick,

let us agree that regex are quite powerful. Since they are line-
orientated (or we are facing line restrictions) we can wait and see
whether my suggesion will work for the OP.

And now let us wait for the next two dozen questions regarding string
manipulation and let's count the overall sum of UDF rows plus calls to
them (it)...

Regards,
Bernd
 
 
Rick





PostPosted: Tue Nov 27 14:53:14 PST 2007 Top

Excel Misc >> How to trim a string to the last "." >>That was going roughly going to be my answer when I first read Bernd's
>>response; but I think, given the trimmed portion of the previous responses
>>in his posting and leaving aside the 500 character maximum truncation
>>requested by the OP, he was simply just addressing my comment about
>>needing
>>*additional* code for my function to be able to return the entire text
>>string whereas to do the same with his code only required removing the
>>question mark. Yes, there is the question of the needed UDF code to make
>>his
>>formula line work as well as the 500 character truncation issue; but I'm
>>pretty sure Bernd's posting was meant to focus in on this much narrower
>>issue without addressing anything else.
>
> Oh, I see.
>
> Of course, I don't believe his regex will work for multiline text strings,
> for
> the reason I pointed out in my response. And it seems likely to me that a
> 500
> character text string in Excel would likely have at least one linefeed.
> So he
> should substitute "[\s\S]" for the "." in his regex.

I'll take your word for the regex stuff... the last time I did anything with
regular expressions was back in the mid-1980s on a UNIX based
mini-computer... I'm afraid I remember next to nothing from back then
(except that it was quite easy to write complex regular expressions whose
operation was perfectly obvious during their construction but that one could
not understand how they worked some 10 minutes or so later on<g>).

Rick

 
 
Ron





PostPosted: Tue Nov 27 15:46:46 PST 2007 Top

Excel Misc >> How to trim a string to the last "." On Tue, 27 Nov 2007 17:53:14 -0500, "Rick Rothstein \(MVP - VB\)"


>(except that it was quite easy to write complex regular expressions whose
>operation was perfectly obvious during their construction but that one could
>not understand how they worked some 10 minutes or so later on<g>).

That is still the case!
--ron
 
 
Ron





PostPosted: Tue Nov 27 15:50:17 PST 2007 Top

Excel Misc >> How to trim a string to the last "."

>Hi Ron, hi Rick,
>
>let us agree that regex are quite powerful. Since they are line-
>orientated (or we are facing line restrictions) we can wait and see
>whether my suggesion will work for the OP.
>
>And now let us wait for the next two dozen questions regarding string
>manipulation and let's count the overall sum of UDF rows plus calls to
>them (it)...
>
>Regards,
>Bernd

Why do you write they are "line-oriented"?

At least in the few flavors with which I am familiar, multiline implementations
are common. The issue here is that with VBA, setting multiline to TRUE (which
is allowed in the UDF you recommended) only changes the behavior of "^" and
"$", and not the behavior of "." as is true for other flavors. Simple enough
to work around, though.
--ron
 
 
Ron





PostPosted: Tue Nov 27 20:30:31 PST 2007 Top

Excel Misc >> How to trim a string to the last "."



>
>>Hi Ron, hi Rick,
>>
>>let us agree that regex are quite powerful. Since they are line-
>>orientated (or we are facing line restrictions) we can wait and see
>>whether my suggesion will work for the OP.
>>
>>And now let us wait for the next two dozen questions regarding string
>>manipulation and let's count the overall sum of UDF rows plus calls to
>>them (it)...
>>
>>Regards,
>>Bernd
>
>Why do you write they are "line-oriented"?
>
>At least in the few flavors with which I am familiar, multiline implementations
>are common. The issue here is that with VBA, setting multiline to TRUE (which
>is allowed in the UDF you recommended) only changes the behavior of "^" and
>"$", and not the behavior of "." as is true for other flavors. Simple enough
>to work around, though.
>--ron

I really should have included the qualification that the original Unix editors,
and regex engines were, indeed, line oriented. However, when I first worked
with Unix (actually I think it was Xenix, if I recall correctly) back in the
early '80's, vi (a screen editor) had already been developed and was pretty
widely distributed.


--ron
 
 
Ron





PostPosted: Wed Nov 28 05:01:30 PST 2007 Top

Excel Misc >> How to trim a string to the last "."



>
>>Hi Ron, hi Rick,
>>
>>let us agree that regex are quite powerful. Since they are line-
>>orientated (or we are facing line restrictions) we can wait and see
>>whether my suggesion will work for the OP.
>>
>>And now let us wait for the next two dozen questions regarding string
>>manipulation and let's count the overall sum of UDF rows plus calls to
>>them (it)...
>>
>>Regards,
>>Bernd
>
>Why do you write they are "line-oriented"?
>
>At least in the few flavors with which I am familiar, multiline implementations
>are common. The issue here is that with VBA, setting multiline to TRUE (which
>is allowed in the UDF you recommended) only changes the behavior of "^" and
>"$", and not the behavior of "." as is true for other flavors. Simple enough
>to work around, though.
>--ron

Just for kicks, if the OP does not want to return an error if there are no dots
in the string, here is a regex that will

1. Return a maximum of 500 characters and, in order of priority
a. Return up to the last "." if there is one
b. If no ".", then split at the last <space>
c. If no "." or <space>, return 500 characters.

(note that it does not return the ending <space> if that is what it is
splitting on. It should work with multiline text strings in VBA:

"^([\s\S]{1,500}\.|[\s\S]{1,500}(?= )|[\s\S]{1,500})"


--ron
 
 
Rick





PostPosted: Wed Nov 28 08:07:53 PST 2007 Top

Excel Misc >> How to trim a string to the last "." > Just for kicks, if the OP does not want to return an error if there are no
> dots
> in the string, here is a regex that will
>
> 1. Return a maximum of 500 characters and, in order of priority
> a. Return up to the last "." if there is one
> b. If no ".", then split at the last <space>
> c. If no "." or <space>, return 500 characters.
>
> (note that it does not return the ending <space> if that is what it is
> splitting on. It should work with multiline text strings in VBA:
>
> "^([\s\S]{1,500}\.|[\s\S]{1,500}(?= )|[\s\S]{1,500})"

I like the "split at last space" condition 'challenge'. A little more
"wordy" than your regex solution, but still a one-liner (my own self-imposed
restriction) VBA function...

Function Truncate(ByVal Source As String, _
Optional TrimAt As Long = 30000) As String
Truncate = RTrim$(Left(Source, InStrRev(Left(Source, TrimAt), ".") - _
(InStrRev(Left(Source, TrimAt), " ")) * _
(Not Left(Source, TrimAt) Like "*.*") - _
(Len(Left(Source, TrimAt))) * _
(Not Left(Source, TrimAt) Like "*[. ]*")))
End Function

Note that I still left the optional TrimAt argument so the OP can set it to
500 (or any other value desired) in the calling formula.

Rick

 
 
Rick





PostPosted: Wed Nov 28 10:02:59 PST 2007 Top

Excel Misc >> How to trim a string to the last "." > I like the "split at last space" condition 'challenge'. A little more
> "wordy" than your regex solution, but still a one-liner (my own
> self-imposed restriction) VBA function...
>
> Function Truncate(ByVal Source As String, _
> Optional TrimAt As Long = 30000) As String
> Truncate = RTrim$(Left(Source, InStrRev(Left(Source, TrimAt), ".") - _
> (InStrRev(Left(Source, TrimAt), " ")) * _
> (Not Left(Source, TrimAt) Like "*.*") - _
> (Len(Left(Source, TrimAt))) * _
> (Not Left(Source, TrimAt) Like "*[. ]*")))
> End Function
>
> Note that I still left the optional TrimAt argument so the OP can set it
> to 500 (or any other value desired) in the calling formula.

And this is how I would write the function in "real life" (that is, without
forcing it to a one-liner)...

Function Truncate(ByVal Text As String, _
Optional TrimAt As Long = 30000) As String
Text = Left$(Text, TrimAt)
Truncate = Left$(Text, InStrRev(Text, "."))
If Len(Truncate) = 0 Then Truncate = RTrim(Left(Text, InStrRev(Text, "
")))
If Len(Truncate) = 0 Then Truncate = Text
End Function

Note: I changed some argument names to avoid using line continuations

Rick

 
 
Rick





PostPosted: Wed Nov 28 10:16:53 PST 2007 Top

Excel Misc >> How to trim a string to the last "." >> I like the "split at last space" condition 'challenge'. A little more
>> "wordy" than your regex solution, but still a one-liner (my own
>> self-imposed restriction) VBA function...
>>
>> Function Truncate(ByVal Source As String, _
>> Optional TrimAt As Long = 30000) As String
>> Truncate = RTrim$(Left(Source, InStrRev(Left(Source, TrimAt), ".") - _
>> (InStrRev(Left(Source, TrimAt), " ")) * _
>> (Not Left(Source, TrimAt) Like "*.*") - _
>> (Len(Left(Source, TrimAt))) * _
>> (Not Left(Source, TrimAt) Like "*[. ]*")))
>> End Function
>>
>> Note that I still left the optional TrimAt argument so the OP can set it
>> to 500 (or any other value desired) in the calling formula.
>
> And this is how I would write the function in "real life" (that is,
> without forcing it to a one-liner)...
>
> Function Truncate(ByVal Text As String, _
> Optional TrimAt As Long = 30000) As String
> Text = Left$(Text, TrimAt)
> Truncate = Left$(Text, InStrRev(Text, "."))
> If Len(Truncate) = 0 Then Truncate = RTrim(Left(Text, InStrRev(Text, "
> ")))
> If Len(Truncate) = 0 Then Truncate = Text
> End Function
>
> Note: I changed some argument names to avoid using line continuations

Damn! The line wrapped at an 'unfortunate' spot (a blank) anyway. Here is
the same function renamed to Chop instead of Truncate; it should have each
statement fit on a single line without wrapping...

Function Chop(ByVal Text As String, _
Optional TrimAt As Long = 30000) As String
Text = Left$(Text, TrimAt)
Chop = Left$(Text, InStrRev(Text, "."))
If Len(Chop) = 0 Then Chop = RTrim(Left(Text, InStrRev(Text, " ")))
If Len(Chop) = 0 Then Chop = Text
End Function

Rick