Implementing SEO using RPG, XML and Sitemaps

Posted by: bvstone

Anyone who has a site wants to make sure that web crawlers find it, and that when they do, they get the proper information so the site can be indexed.

If you dig really deep into Search Engine Optimization (SEO), things get pretty complicated.  But on the surface there are a few things web developers can do to nudge their site up in the search results.  One of them is using Sitemaps.  You can read about Sitemaps here.

Sitemaps make it easier for web crawlers to index and search your site.  There are even a few webpages that will create Sitemaps for you that you can download and place on your server for crawlers to find.  

But what if your site is dynamic, like this one?  That means you'll probably want to have a dynamic Sitemap as well.  Here's how we did it for this site.

Step 1 - Create an eRPG/CGI program that will output dynamic XML for the web crawler to use

In our case, we have a few static pages and many dynamic pages (each post will be its own page).  So we can start out by creating a template with a section for static content.

/$top
<?xml version="1.0" encoding="UTF-8"?>
<urlset
      xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">

<url>
  <loc>http://www.fieldexit.com/</loc>
</url>
<url>
  <loc>http://www.fieldexit.com/forum/list</loc>
</url>
<url>
  <loc>http://www.fieldexit.com/login.html</loc>
</url>
<url>
  <loc>http://www.fieldexit.com/signup.html</loc>
</url>
<url>
  <loc>http://www.fieldexit.com/changepw.html</loc>
</url>

Next, we will want to index our group lists, forum lists, and finally our threads (posts), so each of them gets its own entry in the Sitemap file.

/$groups
<url>
  <loc>http://www.fieldexit.com/forum/list?groupid=/%groupid%/</loc>
</url>

/$forums
<url>
  <loc>http://www.fieldexit.com/forum/thread?groupid=/%groupid%/&amp;forumid=/%forumid%/</loc>
</url>

/$threads
<url>
  <loc>http://www.fieldexit.com/forum/display?threadid=/%threadid%/</loc>
  <lastmod>/%lastmod%/</lastmod>
</url>

/$end
</urlset>

Our eRPG program will then read through the group, forum and thread files and create an entry corresponding to each record:

     H DFTACTGRP(*NO) BNDDIR('GREENBOARD')
      ****************************************************************
      * Prototypes                                                   *
      ****************************************************************
      /COPY QCOPYSRC,P.ERPGSDK
      /COPY QCOPYSRC,P.GBFORUM
      ****************************************************************
      * Copy Members                                                 *
      ****************************************************************
      /COPY QCOPYSRC,SQL
      ****************************************************************
     D GROUPDS       E DS                  EXTNAME(GROUPPF) PREFIX(g_)
     D FORUMDS       E DS                  EXTNAME(FORUMPF) PREFIX(f_)
     D THREADDS      E DS                  EXTNAME(THREADPF) PREFIX(t_)
      *
     D sysID           S                   LIKE(g_SYSID)
      ****************************************************************
      /free
       Exec Sql Set Option Datfmt=*Iso, Commit=*None, Closqlcsr=*Endmod;

       sysID = #gbf_getSysID();

       #startup();
       #writeTemplate('stdxmlheader.erpg');
       #loadTemplate('sitemap.erpg');
       #writeThisSec('top');

       EXSR $Groups;
       EXSR $Forums;
       EXSR $Threads;

       #writeThisSec('end');
       #cleanup();

       *INLR = *on;
       //-------------------------------------------------------------/
       // List Groups                                                 /
       //-------------------------------------------------------------/
       begsr $Groups;

         #loadSection('groups');

         exec sql
           declare C1 cursor for
             select GROUPID, GROUPDESC
               from GROUPPF
             where
               SYSID = :sysID and
               GROUPID <> 'test';

         exec sql open C1;
         exec sql fetch from C1 into
           :g_GROUPID, :g_GROUPDESC;

         dow (xSQLState2 = Success_On_Sql);
           #replaceData('/%groupid%/':g_GROUPID);
           #writeSection();

           exec sql fetch from C1 into
             :g_GROUPID, :g_GROUPDESC;
         enddo;

         exec sql close C1;

       endsr;
       //-------------------------------------------------------------/
       // List Forums                                                 /
       //-------------------------------------------------------------/
       begsr $Forums;

         #loadSection('forums');

         exec sql
           declare C2 cursor for
             select GROUPID, FORUMID, FORUMDESC
               from FORUMPF
             where
               SYSID = :sysID and
               GROUPID <> 'test';

         exec sql open C2;
         exec sql fetch from C2 into
           :f_GROUPID, :f_FORUMID, :f_FORUMDESC;

         dow (xSQLState2 = Success_On_Sql);
           #replaceData('/%groupid%/':f_GROUPID);
           #replaceData('/%forumid%/':f_FORUMID);
           #writeSection();

           exec sql fetch from C2 into
             :f_GROUPID, :f_FORUMID, :f_FORUMDESC;
         enddo;

         exec sql close C2;

       endsr;
       //-------------------------------------------------------------/
       // List Threads                                                /
       //-------------------------------------------------------------/
       begsr $Threads;

         #loadSection('threads');

         exec sql
           declare C3 cursor for
             select THREADID, SUBJECT, AUTHOR, POSTDATE, EDITDATE
               from THREADPF
             where
               SYSID = :sysID and
                GROUPID <> 'test' and
                ACTIVE = 'Y';

         exec sql open C3;
         exec sql fetch from C3 into
           :t_THREADID, :t_SUBJECT, :t_AUTHOR, :t_POSTDATE, :t_EDITDATE;

         dow (xSQLState2 = Success_On_Sql);
           #replaceData('/%threadid%/':t_THREADID);

           if (t_EDITDATE <> *LOVAL);
             #replaceData('/%lastmod%/':%char(%date(t_EDITDATE):*ISO-));
           else;
             #replaceData('/%lastmod%/':%char(%date(t_POSTDATE):*ISO-));
           endif;

           #writeSection();

           exec sql fetch from C3 into
             :t_THREADID, :t_SUBJECT, :t_AUTHOR, :t_POSTDATE, :t_EDITDATE;
         enddo;

         exec sql close C3;

       endsr;

This program, which uses the eRPG SDK, is fairly straightforward.  First it outputs the static section of our template, then it reads through the group, forum and thread files and outputs an entry for each of them.  We've been using SQL more and more these days, but you could just as easily use native I/O (i.e., SETLL and READE) for this as well; a rough sketch of that approach is shown below.
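
Here is what that might look like for the $Threads subroutine.  This is just an illustration, not code from the actual program; it assumes THREADPF (or a logical file over it) is keyed by SYSID:

      *---------------------------------------------------------------
      * Sketch: native I/O version of the $Threads subroutine.
      * Assumes THREADPF (or a logical over it) is keyed by SYSID.
     FTHREADPF  IF   E           K DISK    PREFIX(t_)
      /free
       begsr $Threads;

         #loadSection('threads');

         // position to the first thread for this system ID
         setll (sysID) THREADPF;
         reade (sysID) THREADPF;

         dow not %eof(THREADPF);
           // skip the "test" group and inactive threads
           if (t_GROUPID <> 'test' and t_ACTIVE = 'Y');
             #replaceData('/%threadid%/':t_THREADID);

             if (t_EDITDATE <> *LOVAL);
               #replaceData('/%lastmod%/':%char(%date(t_EDITDATE):*ISO-));
             else;
               #replaceData('/%lastmod%/':%char(%date(t_POSTDATE):*ISO-));
             endif;

             #writeSection();
           endif;

           reade (sysID) THREADPF;
         enddo;

       endsr;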

UPDATE (08/26/2014):
Our program has been updated to exclude any threads with a group id of "test".  This is because we set up a "test" group and forum where users can play around with the editor and the site.  We now exclude these from our Sitemap so that they are not indexed (or, at least, we're telling Google not to index them; we may need to set up a robots.txt file to tell crawlers to ignore those posts as well).
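
If we do go that route, a robots.txt file in the root of the server might look something like the sketch below.  It assumes the test content is reached through groupid=test on the list and thread URLs (the individual display URLs don't carry the group id, so those would need to be handled another way):

User-agent: *
Disallow: /forum/list?groupid=test
Disallow: /forum/thread?groupid=test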

The final result can be seen by clicking here.

Step 2 - Tell Google (or Bing, etc.) Where your Sitemap file is

For this example we'll focus on using Google and their Webmaster Tools.  Bing is similar, and I believe Yahoo may also have their own set of tools.

For Google, you'll go to the Webmaster Tools.  Next, select the site you want to work with (or set one up if you haven't yet!).  On the dashboard for your site you should see (currently it's on the far right) a section for Sitemaps.  If you have a Sitemap set up already, it will tell you how many pages it has indexed using your Sitemap.  If not, it will give you an option to tell it where your Sitemap is.

In our case, we stuck with the default location by specifying /sitemap.xml in the root of our server as our sitemap file.

You're probably thinking "But wait!  Your Sitemap is a CGI program, not a static file!"  That's true, but using Server Side Includes (SSI) we can populate what the web crawler thinks is a static file with dynamic content from the CGI program we created earlier.

We will create a file in our root named sitemap.xml and it will contain the following code:

<!--#include virtual="/forum/sitemap" -->

In the case of this site, we have the /forum directory mapped to run CGI programs.  A lot of times you'll see /cgi-bin used, but we decided to use something different.  Also, "sitemap" is the name of the CGI program we created earlier to dynamically produce our Sitemap XML data.
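
For those curious, that kind of mapping is normally done with a ScriptAlias (or ScriptAliasMatch) directive in the Apache configuration.  The snippet below is only a sketch of the idea, not our actual configuration; FORUMLIB is a placeholder for whatever library holds your CGI programs:

   ScriptAliasMatch ^/forum/(.*)$ /QSYS.LIB/FORUMLIB.LIB/$1.PGM
   <Directory /QSYS.LIB/FORUMLIB.LIB>
       Order Allow,Deny
       Allow From all
   </Directory>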

We also need to make sure our Apache server configuration will parse SSI directives in documents that end in .xml.  Right now, if you have an Apache server set up, it's probably only configured to look for SSI directives in HTML pages, like so:

   <FilesMatch "\.html(\..+)?$">
       Options +Includes
       SetOutputFilter Includes
   </FilesMatch>

But we can easily add files that end in .xml to this directive so they will be processed as well by changing it to this (and of course stopping and restarting the server for the changes to take effect):

   <FilesMatch "\.(html|xml)(\..+)?$">
       Options +Includes
       SetOutputFilter Includes
   </FilesMatch>

Now when our Apache server serves up a file ending in .xml (like our sitemap.xml file), it will look for SSI directives and process them.

You can see the results by clicking here, which is a link to the sitemap.xml file for this site.  You'll notice it's exactly the same as the output created by calling the CGI program directly.

Now, any time a thread is added or updated, the information in the Sitemap file for our site will be automatically updated, and when the Google crawler comes by next time, our site will (hopefully!) be reindexed.

 

 


Last edited 09/08/2014 at 11:32:45