XML Sitemap Plugin

<?php
/*
Plugin Name: XML Sitemap Plugin
Plugin URI:        https://sarathy.in/xml-sitemap-plugin/
Description: A simple XML Sitemap plugin for all posts, with meta tag, canonical URL control, and robots.txt management.
Version: 1.4


Author: Sarathy
Author URI:  https://sarathy.in/
*/

// Exit if accessed directly
if (!defined('ABSPATH')) {
    exit;
}

// Hook the sitemap function to init action
add_action('init', 'generate_xml_sitemap');

function generate_xml_sitemap() {
    // Only respond when the request asks for the sitemap (?sitemap=xml)
    if (isset($_GET['sitemap']) && sanitize_text_field(wp_unslash($_GET['sitemap'])) === 'xml') {
        // Set the content type to XML
        header('Content-Type: application/xml; charset=utf-8');

        // Start the XML output
        echo '<?xml version="1.0" encoding="UTF-8"?>';
        echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';

        // Get all posts
        $args = array(
            'post_type' => 'post',
            'posts_per_page' => -1,
            'post_status' => 'publish',
        );
        $posts = new WP_Query($args);

        // Loop through the posts
        if ($posts->have_posts()) {
            while ($posts->have_posts()) {
                $posts->the_post();
                $post_url = get_permalink();
                $post_date = get_the_modified_date('c'); // lastmod is the last-modified time, not the publish date

                echo '<url>';
                echo '<loc>' . esc_url($post_url) . '</loc>';
                echo '<lastmod>' . esc_html($post_date) . '</lastmod>';
                echo '<changefreq>monthly</changefreq>';
                echo '<priority>0.8</priority>';
                echo '</url>';
            }
            wp_reset_postdata();
        }

        // End the XML output
        echo '</urlset>';

        // Update the last generated date
        update_option('xml_sitemap_last_generated', current_time('mysql'));

        // Exit to avoid the default WordPress behavior
        exit;
    }
}

// Add a rewrite rule to handle the sitemap request
function add_sitemap_rewrite_rule() {
    $domain_name = parse_url(home_url(), PHP_URL_HOST);
    $sitemap_name = str_replace('.', '-', $domain_name) . '-sitemap.xml';
    // Quote the file name so the dot in ".xml" is matched literally
    add_rewrite_rule(preg_quote($sitemap_name, '#') . '$', 'index.php?sitemap=xml', 'top');
}
add_action('init', 'add_sitemap_rewrite_rule');

// Flush rewrite rules on plugin activation
function activate_xml_sitemap_plugin() {
    add_sitemap_rewrite_rule();
    flush_rewrite_rules();
}
register_activation_hook(__FILE__, 'activate_xml_sitemap_plugin');

// Flush rewrite rules on plugin deactivation
function deactivate_xml_sitemap_plugin() {
    flush_rewrite_rules();
}
register_deactivation_hook(__FILE__, 'deactivate_xml_sitemap_plugin');

// Add admin menu item
add_action('admin_menu', 'xml_sitemap_plugin_menu');

function xml_sitemap_plugin_menu() {
    add_menu_page(
        'XML Sitemap Plugin',
        'XML Sitemap',
        'manage_options',
        'xml-sitemap-plugin',
        'xml_sitemap_plugin_admin_page',
        'dashicons-admin-site',
        20
    );
}

function xml_sitemap_plugin_admin_page() {
    // Check if user has permission
    if (!current_user_can('manage_options')) {
        return;
    }

    // Get total posts count
    $total_posts = wp_count_posts('post')->publish;

    // Get total pages count
    $total_pages = wp_count_posts('page')->publish;

    // Get sitemap URL
    $domain_name = parse_url(home_url(), PHP_URL_HOST);
    $sitemap_name = str_replace('.', '-', $domain_name) . '-sitemap.xml';
    $sitemap_url = home_url($sitemap_name);

    // Get last generated date
    $last_generated = get_option('xml_sitemap_last_generated', 'Never');

    ?>
    <div class="wrap">
        <h1>XML Sitemap Plugin</h1>
        <p>Total Posts: <strong><?php echo esc_html($total_posts); ?></strong></p>
        <p>Total Pages: <strong><?php echo esc_html($total_pages); ?></strong></p>
        <p>Your sitemap URL is: <a href="<?php echo esc_url($sitemap_url); ?>" target="_blank"><?php echo esc_html($sitemap_url); ?></a></p>
        <p>Last Generated: <strong><?php echo esc_html($last_generated); ?></strong></p>
        <form method="post" action="">
            <?php submit_button('Regenerate Sitemap'); ?>
        </form>
    </div>
    <?php
}

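// Handle the "Regenerate Sitemap" button on the plugin's admin page. Running on
// admin_init (not at file load) ensures WordPress is fully loaded first.
add_action('admin_init', function () {
    $on_plugin_page = isset($_GET['page']) && $_GET['page'] === 'xml-sitemap-plugin';
    if ($on_plugin_page && isset($_POST['submit']) && current_user_can('manage_options')) {
        generate_xml_sitemap();
    }
});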

// Add meta tags and canonical link to specific pages
function add_meta_tags_and_canonical() {
    if (is_author() || is_date() || is_category() || is_tag() || is_search() || is_attachment() || is_paged()) {
        echo '<meta name="robots" content="noindex, follow" />' . "\n";
        echo '<link rel="canonical" href="' . esc_url(home_url('/')) . '" />' . "\n";
    }
}
add_action('wp_head', 'add_meta_tags_and_canonical');

// Add robots.txt functionality
function manage_robots_txt($output, $public) {
    if ($public) {
        $output .= "Disallow: /wp-admin/\n";
        $output .= "Disallow: /wp-login.php\n";
        $output .= "Disallow: /author/\n";
        $output .= "Disallow: /category/\n";
        $output .= "Disallow: /tag/\n";
        $output .= "Disallow: /?s=\n";  // Search query parameter
        $output .= "Disallow: /feed/\n";
        $output .= "Disallow: /page/\n";  // Pagination
        $output .= "Disallow: /attachment/\n";  // Media attachment pages
        // Disallow date archives; adjust the pattern to match your permalink structure
        $output .= "Disallow: /202*/\n";  // Blocks paths starting with /202, e.g. year archives for 2020-2029
    }
    return $output;
}

add_filter('robots_txt', 'manage_robots_txt', 10, 2);

A simple XML Sitemap plugin for WordPress that automatically generates an XML sitemap for all posts. The plugin also includes functionality for managing meta tags, canonical URLs, and custom robots.txt entries to improve SEO.

Features

  • XML Sitemap Generation: Automatically generates a sitemap for all published posts.
  • Meta Tags and Canonical URLs: Adds appropriate meta tags and canonical URLs to specific pages to prevent indexing of non-essential pages.
  • Dynamic robots.txt Management: Appends Disallow rules to the robots.txt that WordPress generates, using the robots_txt filter, to steer search engine bots away from low-value pages.

What the Sitemap Does

An XML sitemap gives search engines such as Google a structured list of URLs so they can discover and crawl your content more efficiently, which can improve your site's visibility in search results. This plugin's sitemap lists published posts only; each entry carries the post's URL, a <lastmod> date, a change frequency of monthly, and a priority of 0.8.
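
To make the output format concrete, here is a minimal, standalone sketch of a single sitemap entry in the shape the plugin emits. The helper name build_sitemap_entry and the example URL and date are illustrative assumptions. The plugin itself escapes values with WordPress's esc_url() and esc_html(); plain htmlspecialchars() stands in for them here so the snippet runs outside WordPress.

```php
<?php
// Hypothetical helper, not part of the plugin: builds one <url> entry.
function build_sitemap_entry(string $loc, string $lastmod): string
{
    // Escape XML special characters (&, <, >, quotes) in the values.
    $flags = ENT_XML1 | ENT_QUOTES;

    return '<url>'
        . '<loc>' . htmlspecialchars($loc, $flags, 'UTF-8') . '</loc>'
        . '<lastmod>' . htmlspecialchars($lastmod, $flags, 'UTF-8') . '</lastmod>'
        . '<changefreq>monthly</changefreq>'
        . '<priority>0.8</priority>'
        . '</url>';
}

// Example values are assumptions for illustration only.
echo build_sitemap_entry('https://example.com/hello-world/', '2024-01-15T10:00:00+00:00'), "\n";
```

The full sitemap is a sequence of such entries wrapped in a single <urlset> element.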

Actions on Author and Archive Pages

The plugin targets author pages and several kinds of archive pages to keep them out of search indexes:

  • Meta Tags: Adds <meta name="robots" content="noindex, follow" /> to author pages, date archives, category archives, tag archives, search results, attachment pages, and paginated pages, preventing them from being indexed while allowing links to be followed.
  • Canonical Link: Adds a canonical link that points to the homepage on each of these pages.
  • robots.txt: Appends Disallow directives to robots.txt (for example /author/, /category/, /tag/, /?s=, and /page/) that ask search engines not to crawl these pages.
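
As a rough illustration of which request paths the robots.txt rules cover, the sketch below checks a path against the Disallow list from the plugin source. The helper is_disallowed is hypothetical, and the matching is a simplified approximation (prefix match, with "*" treated as a wildcard); real crawlers apply their own, more detailed matching rules.

```php
<?php
// Disallow rules the plugin appends to robots.txt (copied from its source).
const DISALLOW_RULES = [
    '/wp-admin/', '/wp-login.php', '/author/', '/category/', '/tag/',
    '/?s=', '/feed/', '/page/', '/attachment/', '/202*/',
];

// Hypothetical helper: simplified prefix matching with "*" as a wildcard.
function is_disallowed(string $path, array $rules = DISALLOW_RULES): bool
{
    foreach ($rules as $rule) {
        // Quote the rule for use in a regex, then turn the quoted "*" back into ".*".
        $pattern = str_replace('\*', '.*', preg_quote($rule, '#'));
        if (preg_match('#^' . $pattern . '#', $path) === 1) {
            return true;
        }
    }
    return false;
}

var_dump(is_disallowed('/author/jane/'));  // bool(true)
var_dump(is_disallowed('/2024/05/'));      // bool(true)
var_dump(is_disallowed('/hello-world/'));  // bool(false)
```

The example paths are made up; substitute paths from your own permalink structure to see which ones the rules would cover.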

Installation

Follow these steps to install and activate the XML Sitemap Plugin:

  1. Download the Plugin:
     • Download the plugin files and unzip them if necessary.
  2. Upload to WordPress:
     • Upload the plugin folder to the /wp-content/plugins/ directory of your WordPress installation.
  3. Activate the Plugin:
     • Navigate to the WordPress admin dashboard.
     • Go to Plugins > Installed Plugins.
     • Locate “XML Sitemap Plugin” and click Activate.
  4. Verify Installation:
     • After activation, visit Settings > Permalinks and click Save Changes to flush rewrite rules.
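
To check the result, you need the sitemap's address, which the plugin derives from your site's host name. The following standalone sketch reproduces that naming scheme outside WordPress. The helper name sitemap_url_for and the example domain are assumptions for illustration; inside the plugin the same value comes from home_url().

```php
<?php
// Hypothetical helper mirroring the plugin's naming scheme:
// dots in the host name become hyphens, then "-sitemap.xml" is appended.
function sitemap_url_for(string $home_url): string
{
    $host = parse_url($home_url, PHP_URL_HOST);
    $file = str_replace('.', '-', (string) $host) . '-sitemap.xml';

    return rtrim($home_url, '/') . '/' . $file;
}

echo sitemap_url_for('https://example.com'), "\n";
// https://example.com/example-com-sitemap.xml
```

The same URL is shown on the plugin's admin page under "Your sitemap URL is".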

.htaccess Configuration

For the sitemap URL to resolve, add the following rewrite rule to your site’s .htaccess file. It maps requests for a file name ending in -sitemap.xml to index.php?sitemap=xml, which the plugin answers with the sitemap:

```apache
# BEGIN XMLSitemap
RewriteEngine On
RewriteRule ^([a-zA-Z0-9-]+)-sitemap\.xml$ index.php?sitemap=xml [L]
# END XMLSitemap
```
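
If you want to sanity-check which request paths the rule would catch, the pattern can be tried with PHP's preg_match(). This is only a sketch: the helper matches_sitemap_rule and the sample paths are hypothetical, and it assumes the rule sits in a per-directory .htaccess, where Apache strips the leading slash before matching.

```php
<?php
// Same pattern as the RewriteRule, with the dot before "xml" escaped.
const SITEMAP_PATTERN = '/^([a-zA-Z0-9-]+)-sitemap\.xml$/';

// Hypothetical helper: true when the rule would rewrite this path.
// In a per-directory .htaccess the leading slash is already stripped.
function matches_sitemap_rule(string $path): bool
{
    return preg_match(SITEMAP_PATTERN, $path) === 1;
}

var_dump(matches_sitemap_rule('example-com-sitemap.xml'));  // bool(true)
var_dump(matches_sitemap_rule('sitemap.xml'));              // bool(false)
var_dump(matches_sitemap_rule('example.com-sitemap.xml'));  // bool(false)
```

The last case fails because the character class allows only letters, digits, and hyphens, which is why the plugin replaces the dots in the host name with hyphens.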