Fixing WordPress Code Formatting

As part of building a WordPress site (and now this basic blog) I had to learn a little about WordPress (2.7.1), just enough to get it skinned and get writing. I very quickly came across what I can only assume is an extremely common problem for expert users writing posts and pages on this excellent platform: text formatting. At first I thought I was doing something wrong. When I swapped between the Visual and HTML views I lost my <p> tags (and likely countless others), and when I viewed the site's source I had gained <br /> tags. I did what most people do and hit Google for a quick solution, but I couldn't find anything acceptable.

The advice that I found was less than perfect. Some people advised not swapping between views, some recommended hacking at the core PHP code (I am capable of this, but I only wanted to blog!) and some suggested plugins to remove the WYSIWYG editor (TinyMCE). None of these solutions was really fit for purpose, and it soon became apparent that I could trawl through fora for days without finding a fix. So I decided to investigate.

My first port of call was the database, to see what was being saved vs. what was being output. The code in the database looked the same as that in the editor, so something was messing with it on output. After a few minutes in the WordPress 'wp-includes' directory I came across the 'default-filters.php' file, and on lines 105 - 109 (for me) I could see various functions being called on 'the_content'. I did what every developer does when they want to test something quickly: I commented them out. Bang, output issue solved. A small amount of investigation (common sense / trial and error) revealed that it was the 'wpautop' function that was messing things up. Out it goes. For those of you who are non-technical, open up /wp-includes/default-filters.php and change line 108 (ish) to read //add_filter('the_content', 'wpautop'); and this will plague you no more.
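To make the symptom concrete, here is a rough JavaScript sketch of the kind of thing an auto-paragraph filter does to saved text on output. It is a heavily simplified illustration and not WordPress's actual 'wpautop' code; the function name simpleAutop is made up for this example.

```javascript
// Hypothetical, heavily simplified stand-in for the idea behind an
// auto-paragraph filter. This is NOT the WordPress implementation.
function simpleAutop(text) {
  return text
    .trim()
    .split(/\n{2,}/) // a blank line separates paragraphs
    .map(function (chunk) {
      // every remaining single new line becomes a <br /> tag
      return '<p>' + chunk.replace(/\n/g, '<br />\n') + '</p>';
    })
    .join('\n');
}

console.log(simpleAutop('one\ntwo\n\nthree'));
// <p>one<br />
// two</p>
// <p>three</p>
```

Run something like this over markup you have already laid out by hand and you get extra <br /> tags in the page source, which is the behaviour described above.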

With this completed so quickly I moved on to taming the WYSIWYG editor. I started by viewing the source of the edit page screen. Wow, there is a load of JS junk included in WordPress - I had no idea it was so bloated. That aside, I quickly found the line that said:

<script type="text/javascript" src="/wp-admin/js/editor.js?ver=20081129"></script>

I fired it up and was met with a similar function to the PHP 'wpautop' from earlier (in fact, two forms of it), called 'pre_wpautop' and 'wpautop'. Again, these met with my wrath and the references to them were commented out. Immediately the code woes were gone, but in their place I had a new challenge: all formatting in the HTML view was lost, because 'pre_wpautop' was what had been adding it. So I put the 'pre_wpautop' function back, but this time removed its content and started to write my own. I didn't really want to spend too much time on this, so I just slapped some basic rules in place:

// Add two new lines after the closing tag of every block level element
var blocklist1 = 'blockquote|ul|ol|li|table|thead|tbody|tr|th|td|div|h[1-6]|p';
content = content.replace(new RegExp('\\s*</(' + blocklist1 + ')>\\s*', 'g'), '</$1>\n\n');
// Put the opening and closing tags of these elements on a line of their own
var blocklist2 = 'blockquote|ul|table|thead|tbody|tr|th|div|br';
content = content.replace(new RegExp('\\s*<(/?(' + blocklist2 + ')[^>]*)>\\s*', 'g'), '\n<$1>\n');
// Clean up masses of new lines - there is never a need for more than three
content = content.replace(new RegExp('\n{4,}', 'g'), '\n\n\n');

This doesn't give you perfectly indented HTML or anything like that, but it does make it readable. You can add as many rules as you like (in fact, post them to me and I'll improve this list).
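If you want to try the rules outside the editor first, here is a minimal standalone sketch. The function name formatHtmlSource and the final trim are my own additions for the example, not part of editor.js, and the last rule collapses any run of four or more new lines down to three.

```javascript
// Standalone sketch of the rules above. formatHtmlSource is a hypothetical
// name used only for this example; it is not a WordPress function.
function formatHtmlSource(content) {
  // Two new lines after the closing tag of every block level element
  var blocklist1 = 'blockquote|ul|ol|li|table|thead|tbody|tr|th|td|div|h[1-6]|p';
  content = content.replace(new RegExp('\\s*</(' + blocklist1 + ')>\\s*', 'g'), '</$1>\n\n');

  // Opening and closing tags of these elements go on a line of their own
  var blocklist2 = 'blockquote|ul|table|thead|tbody|tr|th|div|br';
  content = content.replace(new RegExp('\\s*<(/?(' + blocklist2 + ')[^>]*)>\\s*', 'g'), '\n<$1>\n');

  // Collapse runs of four or more new lines down to three
  content = content.replace(/\n{4,}/g, '\n\n\n');

  // Assumption for this sketch: strip leading and trailing whitespace
  return content.replace(/^\s+|\s+$/g, '');
}

console.log(formatHtmlSource('<p>One</p><p>Two</p>'));
// <p>One</p>
//
// <p>Two</p>
```

Testing it this way makes it easy to paste in a chunk of your own markup and see what a new rule does before you touch the editor script.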

WordPress 2.8/2.9, which I have since moved to, now ships the editor script packed, which is fine. I have tested this fix on that version too, and it works.