Discussion:
Deleting text in bookmarks very slow...
Alex
2005-04-29 00:27:09 UTC
Hello,

I've run into a problem in my C# add-in for Word.

One of the operations the add-in has to perform is going over all bookmarks in a document and removing each bookmark and its content if certain conditions are true.
Originally, I tried the following:

bookmark.Range.Text = "";

Unfortunately, it did not work correctly in all cases (for example, if the bookmark spanned table cells).
So I used the following:

Microsoft.Office.Interop.Word.Range range = bookmark.Range;
bookmark.Delete();
range.Delete(ref missing, ref missing);

where "missing" is initialized thus:

object missing = System.Type.Missing;

That works correctly but, according to the profiler, is 4-5 times slower!

Since this operation is performed in a tight loop, on a document with a lot of bookmarks that need to be deleted, this slowdown is VERY noticeable.

The breakdown of the timing between the 3 lines is roughly:

Microsoft.Office.Interop.Word.Range range = bookmark.Range; // 3.6%
bookmark.Delete(); // 13.8%
range.Delete(ref missing, ref missing); // 82.6%

So the culprit is the range.Delete() operation.

I really need to speed this up!

Any help is appreciated.

Thanks,
Alex.
--
Address email to user "response" at domain "alexoren" with suffix "com"
Cindy M -WordMVP-
2005-04-30 13:54:43 UTC
Hi Alex,
Post by Alex
Since this operation is performed in a tight loop, on a document with a lot
of bookmarks that need to be deleted, this slowdown is VERY noticeable.
Post by Alex
Microsoft.Office.Interop.Word.Range range = bookmark.Range; // 3.6%
bookmark.Delete(); // 13.8%
range.Delete(ref missing, ref missing); // 82.6%
So the culprit is the range.Delete() operation.
Hmmm. When you delete a range, if you (or Word) are tracking any other range
objects, all these object references need to be "updated" so that you don't
lose the text to which they're pointing. And for each bookmark, Word probably
has to start calculating again from the beginning of the document to figure out
exactly which characters are the range.

The only suggestion I can make is, try looping another way. Sometimes, when
Word gives us problems, it makes sense to approach it "upside down". Rather
than For...Each (in my pseudo C# syntax):
int nrBkm = doc.Bookmarks.Count;
for (int counter = nrBkm; counter >= 1; counter--)
{
    doc.Bookmarks[counter].Range.Delete();
    doc.Bookmarks[counter].Delete();
}

Note that you may first need to check for the existence of this bookmark
(Bookmarks.Exists) since deleting the range may well delete the bookmark.
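
For anyone following along, a C# rendering of the backwards loop with the Bookmarks.Exists check might look roughly like the sketch below. It is only a sketch: the class name, the DeleteBackwards helper and the shouldDelete predicate are made up for illustration, it assumes a reference to the Word interop assembly, and it has not been timed against the other approaches in this thread.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

static class BookmarkCleanupBackwards
{
    // Hypothetical helper; shouldDelete stands in for the add-in's own condition.
    public static void DeleteBackwards(Word.Document doc, Predicate<Word.Bookmark> shouldDelete)
    {
        object missing = Type.Missing;
        for (int i = doc.Bookmarks.Count; i >= 1; i--)
        {
            // Deleting a range can also remove bookmarks nested inside it,
            // so the collection may have shrunk below the current index.
            if (i > doc.Bookmarks.Count) continue;

            object index = i;
            Word.Bookmark bm = doc.Bookmarks.get_Item(ref index);
            if (!shouldDelete(bm)) continue;

            // Remember the name before touching the content.
            string name = bm.Name;
            bm.Range.Delete(ref missing, ref missing);

            // The bookmark may or may not have gone with its content,
            // so ask the collection instead of risking an exception.
            if (doc.Bookmarks.Exists(name))
            {
                object byName = name;
                doc.Bookmarks.get_Item(ref byName).Delete();
            }
        }
    }
}
```

The name is captured first because, once the range is gone, the numeric index may already point at a different bookmark.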

Cindy Meister
INTER-Solutions, Switzerland
http://homepage.swissonline.ch/cindymeister (last update Jun 8 2004)
http://www.word.mvps.org

This reply is posted in the Newsgroup; please post any follow question or reply
in the newsgroup and not by e-mail :-)
Alex
2005-05-02 16:15:38 UTC
Hello Cindy,
Post by Cindy M -WordMVP-
Post by Alex
Since this operation is performed in a tight loop, on a document with a lot
of bookmarks that need to be deleted, this slowdown is VERY noticeable.
Microsoft.Office.Interop.Word.Range range = bookmark.Range; // 3.6%
bookmark.Delete(); // 13.8%
range.Delete(ref missing, ref missing); // 82.6%
So the culprit is the range.Delete() operation.
Hmmm. When you delete a range, if you (or Word) are tracking any other range
objects, all these object references need to be "updated" so that you don't
lose the text to which they're pointing. And for each bookmark, Word probably
has to start calculating again from the beginning of the document to figure out
exactly which characters are the range.
That sounds reasonable.
However, as I wrote in the original post, I was previously deleting the bookmarks and content using
bookmark.Range.Text = "";

That should also have caused Word to update the other references but was 4-5 times faster.
(Unfortunately, I could not use it because it did not play nicely with tables.)
Post by Cindy M -WordMVP-
The only suggestion I can make is, try looping another way. Sometimes, when
Word gives us problems, it makes sense to approach it "upside down". Rather
That is an interesting suggestion. I shall try it.
By the way, feel free to post your suggestion in the syntax you're most comfortable with.
Unless there will be significant functional differences between VB and C#, I'll figure it out :-)
Post by Cindy M -WordMVP-
int nrBkm = doc.Bookmarks.Count;
for (int counter = nrBkm; counter >= 1; counter--)
{
    doc.Bookmarks[counter].Range.Delete();
    doc.Bookmarks[counter].Delete();
}
Note that you may first need to check for the existence of this bookmark
(Bookmarks.Exists) since deleting the range may well delete the bookmark.
That is a problem that I struggled with earlier.
According to my tests, deleting the range *sometimes* (not always) deletes the bookmark.
If it does, any subsequent access to the bookmark throws an exception.
This is problematic because it seems to me that there is a significant overhead in handling .NET exceptions.

My solution was to save the range in a variable, delete the bookmark first and then delete the saved range.

Best wishes,
Alex.
Alex
2005-05-03 04:18:21 UTC
Followup:

My latest optimization follows:

using System;
using System.Collections;
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;

Word.Document doc = myWordApp.ActiveDocument;
Word.Range[] ranges = new Word.Range[doc.Bookmarks.Count];
int numRanges = 0;
foreach (Word.Bookmark bm in doc.Bookmarks)
{
    if (/* bm satisfies condition */)
    {
        ranges[numRanges++] = bm.Range; // save the content range first
        bm.Delete(); // [2]
    }
}

// Sort so that the ranges nearest the end of the document are deleted first.
Array.Sort(ranges, 0, numRanges, new RangeComparer());
for (int i = 0; i < numRanges; ++i)
{
    Word.Range range = ranges[i];
    if (range.Start != range.End)
        range.Delete(ref missing, ref missing); // [1]
    Marshal.ReleaseComObject(range);
}

Where:

private class RangeComparer : IComparer
{
    // Orders by End descending; ranges with the same End are ordered by Start ascending.
    public int Compare(object x, object y)
    {
        Word.Range left = (Word.Range) x;
        Word.Range right = (Word.Range) y;
        int left_end = left.End;
        int right_end = right.End;
        return left_end == right_end ? left.Start - right.Start : right_end - left_end;
    }
}

The sorting improved the timing slightly.
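
To make the ordering concrete, here is a small self-contained sketch that applies the same comparison rule to plain integer offsets instead of Word.Range objects. The RangeOrderDemo class, the Span type and the SortForDeletion method are hypothetical stand-ins, used only so the rule can be tried without Word running.

```csharp
using System;
using System.Collections.Generic;

public static class RangeOrderDemo
{
    // Hypothetical stand-in for Word.Range: just the two offsets the comparer reads.
    public struct Span
    {
        public int Start, End;
        public Span(int start, int end) { Start = start; End = end; }
        public override string ToString() { return Start + "-" + End; }
    }

    // Same rule as RangeComparer.Compare: End descending, ties broken by Start ascending.
    public static int Compare(Span left, Span right)
    {
        return left.End == right.End ? left.Start - right.Start : right.End - left.End;
    }

    // Returns a sorted copy; the input sequence is left untouched.
    public static List<Span> SortForDeletion(IEnumerable<Span> spans)
    {
        var sorted = new List<Span>(spans);
        sorted.Sort(Compare);
        return sorted;
    }
}
```

Given the spans 10-20, 40-50, 5-20 and 30-35, SortForDeletion returns them as 40-50, 30-35, 5-20, 10-20, so content is removed starting from the end of the document.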

Anyway, this is the best I could come up with.

[1] This is still the biggest time consumer.

[2] This line also consumes a lot of time, which is strange since it only removes the bookmarks,
not the content. Weird...


Best wishes,
Alex.
Cindy M -WordMVP-
2005-05-04 18:26:54 UTC
Hi Alex,
Post by Alex
[1] This is still the biggest time consumer.
Yes, but since you can check whether or not a bookmark exists, I'd still try
looping through the bookmark ranges, deleting them as you go, then check if the
bookmark still exists and delete it, if that's the case. My gut feeling is that
this would be fastest.
Post by Alex
[2] This line also consumes a lot of time, which is strange since it only
removes the bookmarks,
Post by Alex
not the content. Weird...
As I said, try looping through backwards in a For...Next (no Each).

Cindy Meister
Alex
2005-05-06 18:39:43 UTC
Hi Cindy,
Post by Cindy M -WordMVP-
Hi Alex,
Post by Alex
[2] This line also consumes a lot of time, which is strange since it only
removes the bookmarks,
Post by Alex
not the content. Weird...
As I said, try looping through backwards in a For...Next (no Each).
Cindy, I am not sure what it will buy me.

Please work with me on this one:

Your original reply:
http://groups.google.ca/group/microsoft.public.office.developer.com.add_ins/msg/5ad3824b78c75f56

If I understand you correctly, you said that removing a bookmark *content* is a slow operation because the following bookmarks and ranges will have to be updated. Correct?

So, in order to minimize the updating, the order of removing the bookmarked elements should be from the end of the document to the beginning.
E.g., if I have added 4 bookmarks (bm1, bm2, bm3 and bm4) in the following positions: ---[bm3]---[bm1]---[bm4]---[bm2]---
I should remove bm2 first, then bm4, bm1 and finally bm3.

However, the ActiveDocument.Bookmarks collection holds the bookmarks in the order they were *added*, not by their position in the document.
I tested it with the following macro:
For i = ActiveDocument.Bookmarks.Count To 1 Step -1
MsgBox (ActiveDocument.Bookmarks(i))
Next

and the order returned was: bm4, bm3, bm2 and bm1.

Since the order of iterating through the bookmarks is dependent on the order of their creation and unrelated to their locations, I cannot see how reversing the iteration will speed it up.

What I did instead was collect all the bookmark ranges into an array and then sort the array by position in the document.
That *did* speed it up somewhat.

I was also puzzled as to why the bookmark.Delete() operation would be slow, as it only deletes the actual bookmark, leaving the content intact, so it should not affect the layout of the document (and positions of any other ranges) at all.
Helmut Weber
2005-05-07 11:36:11 UTC
Hi Alex,

I can't check it right now, but I seem to remember that
Post by Alex
However, the ActiveDocument.Bookmarks collection holds the bookmarks in the order they were *added*, not by their position in the document.
For i = ActiveDocument.Bookmarks.Count To 1 Step -1
MsgBox (ActiveDocument.Bookmarks(i))
Next
this is not the case if you use activedocument.range.bookmarks

Greetings from Bavaria, Germany

Helmut Weber, MVP
"red.sys" & chr(64) & "t-online.de"
Word XP, Win 98
http://word.mvps.org/
Alex
2005-05-10 18:10:34 UTC
Hello Helmut,
Post by Helmut Weber
I can't check it right now, but I seem to remember that
Post by Alex
However, the ActiveDocument.Bookmarks collection holds the bookmarks in the order they were *added*, not by their position in the document.
For i = ActiveDocument.Bookmarks.Count To 1 Step -1
MsgBox (ActiveDocument.Bookmarks(i))
Next
this is not the case if you use activedocument.range.bookmarks
Thanks,
I will try that.

Best wishes,
Alex.
Alex
2005-05-10 21:45:49 UTC
"Alex" <***@online.nospam> wrote in message news:***@TK2MSFTNGP09.phx.gbl...
Hello Helmut,
Post by Helmut Weber
I can't check it right now, but I seem to remember that
Post by Alex
However, the ActiveDocument.Bookmarks collection holds the bookmarks in the order they were *added*, not by their position in the document.
My mistake, I got them in alphabetical order. Still, no good.
Post by Helmut Weber
Post by Alex
For i = ActiveDocument.Bookmarks.Count To 1 Step -1
MsgBox (ActiveDocument.Bookmarks(i))
Next
this is not the case if you use activedocument.range.bookmarks
Hmmm...
activedocument.range.bookmarks does sort them by location so using it instead of my custom sorting achieves similar results (within 2%) while keeping the code cleaner.

I still use foreach to put them into an array but instead of sorting the array, I just iterate backwards.
Directly accessing each one by index from managed C# code is murder performance-wise.
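
Roughly, the approach described above could be sketched as follows. This is an illustration rather than the exact add-in code: the class name, the Run helper and the shouldDelete predicate are hypothetical, a reference to the Word interop assembly is assumed, and the bookmarks are collected in a separate pass so that nothing is deleted while the collection is still being enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;

static class BookmarkCleanupByLocation
{
    // Hypothetical helper; shouldDelete stands in for the add-in's own condition.
    public static void Run(Word.Document doc, Predicate<Word.Bookmark> shouldDelete)
    {
        object missing = Type.Missing;

        // Per this thread, the Bookmarks collection of a Range enumerates in
        // document order, while doc.Bookmarks is ordered by name.
        var doomed = new List<Word.Bookmark>();
        foreach (Word.Bookmark bm in doc.Range(ref missing, ref missing).Bookmarks)
        {
            if (shouldDelete(bm)) doomed.Add(bm);
        }

        // Save each range, then drop the bookmark itself, so that deleting
        // content later cannot invalidate a bookmark that is still needed.
        var ranges = new List<Word.Range>(doomed.Count);
        foreach (Word.Bookmark bm in doomed)
        {
            ranges.Add(bm.Range);
            bm.Delete();
        }

        // Delete content from the end of the document towards the start.
        for (int i = ranges.Count - 1; i >= 0; i--)
        {
            Word.Range range = ranges[i];
            if (range.Start != range.End)
                range.Delete(ref missing, ref missing);
            Marshal.ReleaseComObject(range);
        }
    }
}
```

The single foreach pass keeps the number of indexed calls across the managed/COM boundary low, which is the cost complained about above.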

Best wishes,
Alex.
Alex
2005-06-02 17:57:34 UTC
Post by Alex
I was also puzzled as to why the bookmark.Delete() operation would be slow,
as it only deletes the actual bookmark, leaving the content intact, so it
should not affect the layout of the document (and positions of any other
ranges) at all.
Found the answer:
http://groups.google.ca/group/microsoft.public.mac.office.word/msg/acf4ae50fd60410a
: in the case of a physical object such as a bookmark or a hyperlink,
: if you delete it, Word instantly renumbers them all.
Cindy M -WordMVP-
2005-06-05 10:28:04 UTC
Hi Alex,
Post by Alex
Post by Alex
I was also puzzled as to why the bookmark.Delete() operation would be slow,
as it only deletes the actual bookmark, leaving the content intact, so it
should not affect the layout of the document (and positions of any other
ranges) at all.
http://groups.google.ca/group/microsoft.public.mac.office.word/msg/acf4ae50fd60410a
: in the case of a physical object such as a bookmark or a hyperlink,
: if you delete it, Word instantly renumbers them all.
Ah, interesting. Thanks for posting this :-)

OTOH, as far as I'm aware, Word doesn't actually NUMBER either of these, any more
than any other object. However, in the case of bookmarks, I'm sure Word has to rebuild
two sets of indexes, one of which is alphabetized by name. That would
certainly take some time.

Cindy Meister
Alex
2005-06-06 15:57:35 UTC
Hello Cindy,
Post by Cindy M -WordMVP-
Hi Alex,
Post by Alex
Post by Alex
I was also puzzled as to why the bookmark.Delete() operation would be slow,
as it only deletes the actual bookmark, leaving the content intact, so it
should not affect the layout of the document (and positions of any other
ranges) at all.
http://groups.google.ca/group/microsoft.public.mac.office.word/msg/acf4ae50fd60410a
: in the case of a physical object such as a bookmark or a hyperlink,
: if you delete it, Word instantly renumbers them all.
Ah, interesting. Thanks for posting this :-)
Doing what I can to share the knowledge I stumble upon.

As you probably remember, I was a card carrying member of the "huh?" club myself not so long ago.
It is a little frustrating when one encounters a newsgroup full of problems and a disproportionately small number of solutions.
Post by Cindy M -WordMVP-
OTOH, as far as I'm aware, Word doesn't actually NUMBER either of these, any more
than any other object. However, in the case of bookmarks, I'm sure Word has to rebuild
two sets of indexes, one of which is alphabetized by name. That would
certainly take some time.
I guess he used "renumber" in the figurative sense.

However, that brings me back to the original issue:
Is there a way to temporarily suppress this processing in order to speed up the deletion of a large number of bookmarks?
Cindy M -WordMVP-
2005-06-08 12:15:29 UTC
Hi Alex,
Post by Alex
Is there a way to temporarily suppress this processing in order to speed up
the deletion of a large number of bookmarks?
No change to my answer, either :-) Work from the back forwards, and don't use
the form of the Bookmarks collection that relies on the alphabetical list. That's
all you can do...

It's essential for Word to maintain the pointers on this collection
immediately, and cleanly, as bookmarks underlie a lot of Word's features (TOCs,
cross-references, links between files and OLE services, just to mention a few).

Can you remind me what you're using the bookmarks for? Maybe what we need to
look for is an alternative to using bookmarks...

Cindy Meister
Alex
2005-06-11 04:02:27 UTC
Hi Cindy,
Post by Cindy M -WordMVP-
It's essential for Word to maintain the pointers on this collection
immediately, and cleanly, as bookmarks underlie a lot of Word's features (TOCs,
cross-references, links between files and OLE services, just to mention a few).
Too bad I could not find a "delete all" operation.
Post by Cindy M -WordMVP-
Can you remind me what you're using the bookmarks for? Maybe what we need to
look for is an alternative to using bookmarks...
Adding persistent metadata to selected pieces of text in the document.
Cindy M -WordMVP-
2005-07-13 10:11:28 UTC
Hi Alex,
Post by Alex
Post by Cindy M -WordMVP-
Can you remind me what you're using the bookmarks for? Maybe what we need to
look for is an alternative to using bookmarks...
Adding persistent metadata to selected pieces of text in the document.
It sorta depends on how volatile things could be (how great the danger is, that
someone will delete it), but have you ever considered using a SET field? It might
be faster to delete a SET field, than a bookmark. (Might not be, since a SET
field defines a bookmark, but you never know.) Or possibly an XE (Index marker)
field.

Cindy Meister
Alex
2005-07-13 21:13:02 UTC
Hello Cindy,
Post by Cindy M -WordMVP-
It sorta depends on how volatile things could be (how great the danger is, that
someone will delete it), but have you ever considered using a SET field? It might
be faster to delete a SET field, than a bookmark. (Might not be, since a SET
field defines a bookmark, but you never know.) Or possibly an XE (Index marker)
field.
Thank you for the suggestion but they do not allow the flexibility of bookmarks.

Does not really matter that much...

Best wishes,
Alex.
Cindy M -WordMVP-
2005-05-04 18:26:53 UTC
Hi Alex,
Post by Alex
However, as I wrote in the original post, I was previously deleting the bookmarks and
content using
Post by Alex
bookmark.Range.Text = "";
That should also have caused Word to update the other references but was 4-5 times faster.
(Unfortunately, I could not use it because it did not play nicely with tables.)
Yes... But you aren't "fiddling" with a collection of Ranges in this case.
Post by Alex
That is a problem that I struggled with earlier.
According to my tests, deleting the range *sometimes* (not always) deletes the bookmark.
If it did, any access to the bookmark will throw an exception.
But as I said, you can check directly whether a bookmark exists; there's actually a
method for it that returns true/false:
doc.Bookmarks.Exists(sBookmarkName)

So no need to go with the overhead of an exception, or looping through the collection to
determine if it's there.

Cindy Meister
Alex
2005-05-06 16:42:39 UTC
Hi Cindy,
Post by Cindy M -WordMVP-
Hi Alex,
Post by Alex
However, as I wrote in the original post, I was previously deleting the bookmarks and
content using
Post by Alex
bookmark.Range.Text = "";
That should also have caused Word to update the other references but was 4-5 times faster.
(Unfortunately, I could not use it because it did not play nicely with tables.)
Yes... But you aren't "fiddling" with a collection of Ranges in this case.
Sorry?
Why does bookmark.Range.Delete "fiddle with a collection of Ranges" while bookmark.Range.Text = "" does not?
Post by Cindy M -WordMVP-
But as I said, you can check directly whether a bookmark exists, there's actually a
doc.Bookmarks.Exists(sBookmarkName)
I haven't timed it but I doubt that it's free.


Best wishes,
Alex.