
Mysterious Re-Formatting when Pasting in Word

Posted: Thu Aug 07, 2014 3:35 am UTC
by Jorpho
The office recently switched from WordPerfect X to Word 2013. (Evidently it was time for it to go.) I've never been quite clear on exactly how Word mucks about with "styles", and the problem has reared its ugly head again lately. Frequently, when I paste text from an old WordPerfect document into Word, paragraphs re-format themselves in strange ways: sometimes left justification becomes full justification (even though the WordPerfect documents are left-justified), or the header changes completely.

I can of course get around the problem by using Paste Special to insert the text as unformatted text, or by changing Word's options to preserve the document's existing format when pasting, but often the text I'm pasting will have italicized words spread around, and I don't want to lose that formatting.

Is there some way of telling exactly what formatting data Word is going to pull out of pasted text, and somehow suppress it?

Re: Mysterious Re-Formatting when Pasting in Word

Posted: Wed Aug 20, 2014 3:28 pm UTC
by KnightExemplar
Jorpho wrote:Is there some way of telling exactly what formatting data Word is going to pull out of pasted text, and somehow suppress it?


Have you tried turning the WordPerfect document into doc or docx, and then performing the copy/paste into Word? This technique of mine tends to work with OpenOffice...

Re: Mysterious Re-Formatting when Pasting in Word

Posted: Wed Aug 20, 2014 5:01 pm UTC
by Jorpho
Well, technically the conversion to doc/docx would happen automatically when the document is opened in Word. (I'm not opening the WordPerfect documents in WordPerfect, you see.)

Re: Mysterious Re-Formatting when Pasting in Word

Posted: Sat Aug 23, 2014 2:55 am UTC
by Jorpho
It occurs to me that when I copy and paste data from one Word document to another, the data is actually being copied in HTML form. (That's the default in the Paste Special dialog, anyway.) That doesn't help too much, though – try saving a Word document as HTML, and you get an unspeakably horrible-looking tag salad.

But that does suggest one solution to the problem: is there a tiny app out there that will look at HTML data stored on the clipboard and strip away everything but a small subset of tags?
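The core of such a "tiny app" can be sketched in a few lines. This is a minimal sketch, assuming Python's standard `html.parser` module: it keeps only a hypothetical whitelist of tags (`ALLOWED_TAGS`), drops every attribute (the `class=` and `style=` soup Word generates), and skips the contents of `<style>` and `<script>` elements. Reading HTML from the clipboard and writing it back is left out.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: everything else is dropped, keeping only the text inside.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong"}
SKIPPED_CONTENT = {"style", "script"}


class WhitelistStripper(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_CONTENT:
            self._skip_depth += 1
        elif tag in ALLOWED_TAGS and not self._skip_depth:
            self.parts.append(f"<{tag}>")  # attributes are deliberately discarded

    def handle_endtag(self, tag):
        if tag in SKIPPED_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in ALLOWED_TAGS and not self._skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(escape(data, quote=False))  # re-escape &, <, >


def strip_to_whitelist(html_text):
    parser = WhitelistStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


word_html = ('<p class="MsoNormal" style="text-align:justify"><span style="font-family:Arial">'
             'Some <i>italic</i> and <b>bold</b> text<o:p></o:p></span></p>')
print(strip_to_whitelist(word_html))
```

Comments, including the conditional `<!--[if gte mso 9]>` blocks Word likes to emit, are ignored because `HTMLParser` does nothing with them by default.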

Re: Mysterious Re-Formatting when Pasting in Word

Posted: Wed Dec 03, 2014 6:05 am UTC
by Jorpho
I found myself grappling with this problem again, and this time stumbled across http://shaunakelly.com/word/styles/styl ... tting.html . The solution is simple: paragraph formatting is not copied as long as you don't copy the line break at the end of the paragraph.

This means, of course, that if I want to copy and paste more than one paragraph, I have to join them up in one document before pasting everything into the new document, and then re-insert the line break, but that's the most convenient solution I've seen so far.

I also found http://word2cleanhtml.com/ , which is almost what I'm looking for. Is there any way I can copy HTML code to the clipboard (like <b>whatever</b>) and then have it pasted into Word as formatted text (that is, a bold "whatever")? The only way I can see is to save the code in Notepad as an .html file, open that file in a browser, and copy and paste the text from there, which is a bit too convoluted and might introduce more arbitrary HTML elements anyway.
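On Windows, programs exchange formatted HTML through the clipboard's registered "HTML Format" (often called CF_HTML), which wraps the HTML in a small header of byte offsets. A minimal sketch, assuming Python: `build_cf_html` assembles that payload for a fragment such as `<b>whatever</b>`. The clipboard-writing helper relies on the third-party pywin32 package and is not exercised here; whether Word accepts a given payload without further tweaks is an assumption worth testing.

```python
HEADER_TEMPLATE = (
    "Version:0.9\r\n"
    "StartHTML:{:010d}\r\n"
    "EndHTML:{:010d}\r\n"
    "StartFragment:{:010d}\r\n"
    "EndFragment:{:010d}\r\n"
)
PREFIX = "<html><body>\r\n<!--StartFragment-->"
SUFFIX = "<!--EndFragment-->\r\n</body></html>"


def build_cf_html(fragment: str) -> bytes:
    """Wrap an HTML fragment in the CF_HTML header; all offsets are UTF-8 byte counts."""
    # Zero-padded 10-digit fields keep the header length fixed, so it can be measured first.
    start_html = len(HEADER_TEMPLATE.format(0, 0, 0, 0).encode("ascii"))
    start_fragment = start_html + len(PREFIX.encode("utf-8"))
    end_fragment = start_fragment + len(fragment.encode("utf-8"))
    end_html = end_fragment + len(SUFFIX.encode("utf-8"))
    header = HEADER_TEMPLATE.format(start_html, end_html, start_fragment, end_fragment)
    return (header + PREFIX + fragment + SUFFIX).encode("utf-8")


def put_html_on_clipboard(fragment: str) -> None:
    """Windows only; assumes pywin32 is installed (not run in this sketch)."""
    import win32clipboard

    fmt = win32clipboard.RegisterClipboardFormat("HTML Format")
    win32clipboard.OpenClipboard()
    try:
        win32clipboard.EmptyClipboard()
        win32clipboard.SetClipboardData(fmt, build_cf_html(fragment))
    finally:
        win32clipboard.CloseClipboard()


print(build_cf_html("<b>whatever</b>").decode("utf-8"))
```

Combined with a tag stripper, this would let the cleaned HTML go straight back onto the clipboard without the detour through a browser.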

Re: Mysterious Re-Formatting when Pasting in Word

Posted: Fri Dec 12, 2014 9:04 am UTC
by WanderingLinguist
I haven't used MS Word since before they introduced the ribbon, but this might still work. The search and replace function used to have a formatting search option. So I wonder if you use a search and replace-all based on the formatting you want to preserve (italics and bold) to insert some text markers wherever that format exists. Then copy & paste without formatting and use regular search and replace to reapply the formatting you want to preserve. I don't have a copy of MS Word installed to look at, and my memory is from a really long time ago, so no guarantees.

Another option might be to save it as HTML and use a text editor with good RE functionality to strip out all the tags except <P>, <B></B> and <I></I>. If I had a copy of MS Word installed, I'd try it out and see what works, but unfortunately (fortunately?) I don't use it any more and haven't for a long time.
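The marker idea can be sketched outside Word as a round trip. This Python sketch assumes the formatted text is available as a list of `(text, formats)` runs, and the `{b}`/`{i}` marker strings are hypothetical; in Word itself steps 1 and 3 would be done with Find and Replace, not code. Step 2, the unformatted paste, is just copying the marked string.

```python
import re

# Hypothetical marker strings; any text that never appears in the document would do.
MARKERS = {"bold": ("{b}", "{/b}"), "italic": ("{i}", "{/i}")}
_LOOKUP = {}
for _fmt, (_open, _close) in MARKERS.items():
    _LOOKUP[_open] = (_fmt, True)
    _LOOKUP[_close] = (_fmt, False)
_TOKEN = re.compile("|".join(re.escape(marker) for marker in _LOOKUP))


def runs_to_marked_text(runs):
    """Step 1: wrap each formatted run of text in plain-text markers."""
    parts = []
    for text, formats in runs:
        ordered = sorted(formats)
        parts.extend(MARKERS[f][0] for f in ordered)
        parts.append(text)
        parts.extend(MARKERS[f][1] for f in reversed(ordered))
    return "".join(parts)


def marked_text_to_runs(marked):
    """Step 3: after the unformatted paste, turn the markers back into formatting."""
    runs, active, pos = [], set(), 0
    for match in _TOKEN.finditer(marked):
        if match.start() > pos:
            runs.append((marked[pos:match.start()], frozenset(active)))
        fmt, is_open = _LOOKUP[match.group()]
        if is_open:
            active.add(fmt)
        else:
            active.discard(fmt)
        pos = match.end()
    if pos < len(marked):
        runs.append((marked[pos:], frozenset(active)))
    return runs


runs = [("Plain ", frozenset()), ("italic", frozenset({"italic"})), (" text", frozenset())]
print(runs_to_marked_text(runs))  # Plain {i}italic{/i} text
```

Because the markers are ordinary characters, they survive any paste that keeps the text itself, which is the whole point of the trick.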

Re: Mysterious Re-Formatting when Pasting in Word

Posted: Sat Dec 13, 2014 3:18 pm UTC
by Jorpho
WanderingLinguist wrote:So I wonder if you use a search and replace-all based on the formatting you want to preserve (italics and bold) to insert some text markers wherever that format exists. Then copy & paste without formatting and use regular search and replace to reapply the formatting you want to preserve.
I'm pretty sure it doesn't work that way.

Another option might be to save it as HTML and use a text editor with good RE functionality to strip out all the tags except <P>, <B></B> and <I></I>.
That is precisely the purpose of http://word2cleanhtml.com/ mentioned in my previous post. The problem that remains is what to do with the stripped HTML afterwards.

Re: Mysterious Re-Formatting when Pasting in Word

Posted: Sat Dec 17, 2016 4:07 am UTC
by Jorpho
Oh my gourd.

Today I learned that you can take absolutely plain unformatted text, paste it into a left-justified paragraph, and – if the pasted text contains a line break – the paragraph will switch to full justification, for no reason at all. The "new" paragraph following the line break will stay left-justified.

What is this nonsense!? Why should pasting unformatted text result in a change of formatting!? Is this a bug? Is there any rationality behind this whatsoever?

Re: Mysterious Re-Formatting when Pasting in Word

Posted: Sat Dec 17, 2016 4:31 am UTC
by ucim
Jorpho wrote:Today I learned that you can take absolutely plain unformatted text, paste it into a left-justified paragraph, and – if the pasted text contains a line break – the paragraph will switch to full justification, for no reason at all. The "new" paragraph following the line break will stay left-justified.


I learned somewhere that the formatting instructions for a paragraph are contained inside the paragraph mark. The line break would therefore be a new paragraph mark that contains "no" formatting, for the text that is now the previous paragraph, but the text following that line break would belong to the paragraph that had the original paragraph mark that had the original formatting.

I have not verified this technically, but it explains this kind of behavior (which had puzzled me too - I wish this were documented in an easy-to-discover place)
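ucim's explanation can be made concrete with a toy model. This Python sketch is not how Word actually stores documents; it simply assumes that each paragraph's formatting lives on the mark that ends it, and that a freshly created mark falls back to a hypothetical default (`DEFAULT_MARK`, here full justification, as in Jorpho's case).

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical: the formatting a newly created, "empty" paragraph mark falls back to.
DEFAULT_MARK = {"alignment": "justify"}


@dataclass
class Paragraph:
    text: str
    mark: dict  # formatting carried by the paragraph mark that ends this text


def paste_plain_text(para: Paragraph, offset: int, pasted: str) -> list[Paragraph]:
    """Insert plain text at offset; every line break in it creates a new paragraph mark."""
    combined = para.text[:offset] + pasted + para.text[offset:]
    pieces = combined.split("\n")
    # All pieces but the last end in a new mark carrying only the default formatting.
    result = [Paragraph(piece, dict(DEFAULT_MARK)) for piece in pieces[:-1]]
    # The last piece is still ended by the original mark, so it keeps its formatting.
    result.append(Paragraph(pieces[-1], para.mark))
    return result


before = Paragraph("Hello world", {"alignment": "left"})
for p in paste_plain_text(before, 5, " there.\nNew"):
    print(repr(p.text), p.mark["alignment"])
```

The first resulting paragraph ("Hello there.") comes out justified while the second ("New world") stays left-aligned, which matches the behaviour Jorpho described.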

Jose

Re: Mysterious Re-Formatting when Pasting in Word

Posted: Sat Dec 17, 2016 7:13 am UTC
by Soupspoon
I don't have Word available to check the specifics, but could it also have something to do with the thematic autoformatting changes? "Normal" will switch to bulleted if you do something like "- Foo" at the start of a line or a numbered list if "1) Foo", double-enter may set up the next line as a Header of some kind, etc., according to some configurable but initially developer-decided 'helpful' scheme. If the pasting function passes the plaintext through as if it were typed, it might easily succumb to such strangeness.
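The "as if typed" theory can be illustrated with a toy model. The two rules below are hypothetical stand-ins for the "- Foo" and "1) Foo" examples, not Word's actual AutoFormat As You Type rule set; the point is only that the same text comes out differently depending on whether it goes through the typing rules.

```python
from __future__ import annotations

import re

# Hypothetical rules standing in for the examples above; not Word's real rule set.
AUTOFORMAT_RULES = [
    (re.compile(r"^- (.*)$"), "bulleted list"),
    (re.compile(r"^\d+\) (.*)$"), "numbered list"),
]


def autoformat_line(line: str) -> tuple[str, str]:
    """Return (text, paragraph kind) after applying the first matching rule."""
    for pattern, kind in AUTOFORMAT_RULES:
        match = pattern.match(line)
        if match:
            return match.group(1), kind
    return line, "Normal"


def insert_text(text: str, as_typed: bool) -> list[tuple[str, str]]:
    """Split text into lines; run the autoformat rules only if it is treated as typed."""
    lines = text.split("\n")
    if as_typed:
        return [autoformat_line(line) for line in lines]
    return [(line, "Normal") for line in lines]


print(insert_text("- Foo\n1) Foo", as_typed=True))
print(insert_text("- Foo\n1) Foo", as_typed=False))
```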

That said, I'm not entirely sure this would happen. I'm often unable to deactivate enough of the preconfigured autocorrection behaviour to get a single lowercase "i" into a spreadsheet cell by typing it, rather than an uppercase "I", so I either use "=CHAR(105)" (optionally followed by Copy and then Paste Special to make it literal) or type the "i" in a spare Notepad window (or even the Run dialogue) and copypasta it in, unaltered. Since pasting bypasses the autocorrection there, it should behave differently in your case too.

Re: Mysterious Re-Formatting when Pasting in Word

Posted: Sun Dec 18, 2016 3:08 am UTC
by Jorpho
ucim wrote:I learned somewhere that the formatting instructions for a paragraph are contained inside the paragraph mark. The line break would therefore be a new paragraph mark that contains "no" formatting, for the text that is now the previous paragraph, but the text following that line break would belong to the paragraph that had the original paragraph mark that had the original formatting.

I have not verified this technically, but it explains this kind of behavior (which had puzzled me too - I wish this were documented in an easy-to-discover place)
Okay, that kind of makes sense. Thank you for that.

So when a paragraph mark with "no" formatting is added, where is the "new" formatting coming from? Something in the document default style, that somehow does not match the paragraph default style, I suppose? Is there not some way to bring these styles back into coherence?

"Normal" will switch to bulleted if you do something like "- Foo" at the start of a line or a numbered list
In the more recent versions of Word, a little lightning bolt will appear to signify that "autoformatting" has been applied when that happens, and it can easily be undone with Ctrl-Z.

Re: Mysterious Re-Formatting when Pasting in Word

Posted: Sun Dec 18, 2016 3:39 am UTC
by ucim
Jorpho wrote:So when a paragraph mark with "no" formatting is added, where is the "new" formatting coming from? Something in the document default style, that somehow does not match the paragraph default style, I suppose? Is there not some way to bring these styles back into coherence?
I would assume it is the default style for all documents, overridden by the default style for this document, overridden by any larger-scope styles that had been defined. However, that's just speculation on my part.
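One way to model the layered defaults described here is a lookup chain where the narrowest scope that defines a property wins. This is a minimal Python sketch using `collections.ChainMap`; the three layers and their values are hypothetical, and Word's real inheritance is more involved (the Office Open XML format does separate document defaults, styles and direct formatting, among other layers).

```python
from collections import ChainMap

# Hypothetical values, for illustration only.
document_defaults = {"alignment": "left", "font": "Calibri", "size_pt": 11}
paragraph_style = {"alignment": "justify"}  # e.g. a Normal style someone set to justified


def effective_formatting(direct, style, defaults):
    """Resolve each property from the narrowest scope that defines it:
    direct formatting first, then the paragraph style, then the document defaults."""
    return ChainMap(direct, style, defaults)


# The original paragraph mark carried direct left alignment...
original = effective_formatting({"alignment": "left"}, paragraph_style, document_defaults)
# ...but a freshly created mark has no direct formatting, so the style wins.
fresh = effective_formatting({}, paragraph_style, document_defaults)
print(original["alignment"], fresh["alignment"])  # left justify
```

Under this model, the "new" formatting would come from the paragraph style rather than from nowhere, which would also explain why the styles seem out of step with what is on the page.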

(...and btw, your unattributed (following) quote is from Soupspoon, not me).

Jose