If you cut/paste from a Google Doc into the WordPress WYSIWYG editor, you get more than you want: mainly a bunch of inline CSS that sets the font weight (see below). This is a pain because it’s going to take priority in CSS land, and undoing it by hand is a hassle.
This looks great Tom. This might resolve some of the junk code brought across from Blogger too? Will have to investigate.
8 responses on “💬 Clean Google Doc Cut/Paste into WordPress Editor”
It could do that. Essentially, you just choose the kinds of things (code, styles, HTML) you want to remove and it’ll do it. I did a poor job of explaining how it’d work, but if you share some sample dirty code, I’ll make another post explaining how you’d do that.
Tom, here is an extract of text from one of my Blogger posts brought across to WordPress:
<div style="text-align: justify;"><span style="font-family: Trebuchet MS, sans-serif;">This was all brought to the fore with the recent passing of my mother. It is a strange experience being told that there is no more treatment that they can do, that the cancer is terminal. On the one hand, the doctor gives some indicative date, while others talk about how they were told that there was nothing that could be done for them and that was over ten years ago. Subsequently, every time that I saw my mum in the last few weeks of her life, I was never sure if it would be the final time. A part of you realises this, however I feel that there is also something inside that simply denies that it will never happen. This is something that I have written about elsewhere (<a href="http://readwriterespond.com/?p=82" target="_blank" rel="noopener nofollow">'Denial Never Worked for No-One'</a>).</span></div>
I have started adding additional microformats to some of my posts. Would the code as you have it strip those? If so, would I simply need to add them to the code?
If it just removes styling in CSS spans, it would be fine. Reading the functions, I am not sure it would touch the microformat properties, but if it sanitizes down to just the HTML tags then it probably does. What are you trying to do? (quickthoughts.jgregorymcverry.com/s/N69NN)
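On the microformats question: microformats2 names live in the class attribute (h-entry, p-name, u-url and so on), so a cleaner that removes class outright would take them along with everything else. One possible way around that is to filter the class value instead of deleting it. The sketch below is only an illustration in plain JavaScript; `keepMicroformatClasses` and `MF2_CLASS` are hypothetical names, not part of the code discussed in this post, and the prefix test is an assumption that will also let through unrelated classes that happen to look like `p-4`.

```javascript
// Hypothetical helper: keep only class names that look like microformats2
// names, i.e. h-* (roots) and p-*, u-*, dt-*, e-* (properties).
const MF2_CLASS = /^(h|p|u|dt|e)-[a-z0-9]+(-[a-z0-9]+)*$/;

function keepMicroformatClasses(classValue) {
  return classValue
    .split(/\s+/)                           // class names are space separated
    .filter((name) => MF2_CLASS.test(name)) // drop everything that is not mf2-shaped
    .join(' ');
}

// Example: theme and post classes are dropped, microformats survive.
console.log(keepMicroformatClasses('h-entry post-123 p-name wp-block e-content'));
```

If the result is an empty string, the class attribute can be removed as before; otherwise it would be written back with the filtered value.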
There are times, Greg, when I like to cut and paste from Docs. Really though, I am just wondering at this point in time. One of many itches I need to scratch. P.S. Good to have your webmentions back.
It removes/leaves whatever you want. The way it’s set up at the moment, it removes any inline styles and any HTML elements that don’t match the whitelist. I think it’d do pretty well against the HTML you posted. Note that it only works on cut/paste into the WYSIWYG editor (which I see as a feature because I could still write those elements in manually if I wanted).
The whitelist is below, but you can add whatever you’d like to it.
var whitelist = 'p,b,strong,i,em,h2,h3,h4,h5,h6,ul,li,ol,a,href';
The attributes it explicitly removes are id, class, style . . .
stripped.find('*').removeAttr('id').removeAttr('class').removeAttr('style');
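To make the two steps concrete (unwrap anything not on the whitelist, then drop id, class and style), here is a rough, string-only sketch in plain JavaScript that can be run outside the editor. It is not the code from this post: `cleanPastedHtml`, `WHITELIST` and `STRIP_ATTRS` are hypothetical names, it uses regular expressions rather than jQuery or a real HTML parser, and it leaves `href` out of the tag list because `href` is an attribute, not an element (it is simply kept on `a`). Treat it as an approximation for well-formed pasted markup only; comments and malformed tags are not handled.

```javascript
// Hypothetical, string-only approximation of the whitelist approach.
const WHITELIST = new Set(
  'p,b,strong,i,em,h2,h3,h4,h5,h6,ul,li,ol,a'.split(',')
);
const STRIP_ATTRS = ['id', 'class', 'style'];

// Matches an opening or closing tag: slash, tag name, raw attribute text.
const TAG = /<(\/?)([a-zA-Z][a-zA-Z0-9-]*)((?:"[^"]*"|'[^']*'|[^'">])*)>/g;
// Matches one name=value attribute (quoted or unquoted value).
const ATTR = /\s+([a-zA-Z-]+)\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+)/g;

function cleanPastedHtml(html) {
  return html.replace(TAG, (match, slash, tag, attrs) => {
    const name = tag.toLowerCase();
    // Not whitelisted: drop the tag itself but keep whatever it wrapped.
    if (!WHITELIST.has(name)) return '';
    if (slash) return `</${name}>`;
    // Whitelisted: keep the tag, remove only id/class/style.
    const kept = attrs.replace(ATTR, (attr, attrName) =>
      STRIP_ATTRS.includes(attrName.toLowerCase()) ? '' : attr
    );
    return `<${name}${kept.trimEnd()}>`;
  });
}

// Illustrative input in the style of a Google Docs paste (made up for this sketch).
const dirty =
  '<p dir="ltr" style="line-height:1.38;"><span style="font-weight:400;">Hello</span> ' +
  '<a href="https://example.com" class="x" style="color:#1155cc;">world</a></p>';
console.log(cleanPastedHtml(dirty));
// prints: <p dir="ltr">Hello <a href="https://example.com">world</a></p>
```

Run against a fragment like the Blogger extract above, the `div` and `span` wrappers would be dropped while the text and the link, including its `href`, `target` and `rel`, are kept.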
This Article was mentioned on brid-gy.appspot.com
Yeah, I usually download the HTML and then just run it through an HTML cleaner. Just google it; there are a ton of websites. That is what I just had to do for wmvsz.glitch.me/rules.html (quickthoughts.jgregorymcverry.com/s/dY13l)