Automatically convert external links to the current site into internal links
Mar 31, 2010
Recently I spoke to a client who had migrated to EPiServer from another CMS. They'd noticed that the number of broken internal links was going up rather than down after the migration. I thought this was strange, as EPiServer maintains all internal links by reference, so if a page is moved in the page tree its link is automatically updated.
On further investigation, it turned out that editors were not using the EPiServer "Link properties" window to add internal links in the expected manner. Rather than navigating the page tree to select the page, they were simply pasting in its public URL. Given that the site contained thousands of pages, I didn't think that was unreasonable, as overall it saved editors a lot of time.
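To make the "by reference" point concrete, here is a small C# sketch. PageTreeSketch and its methods are hypothetical and not part of the EPiServer API; the model only shows that a link stored as a page ID still resolves after the page is moved, whereas a pasted friendly URL goes stale.

```csharp
using System.Collections.Generic;

// Hypothetical, simplified page tree: each page has an ID, a URL segment and a parent.
// NOT the EPiServer API; it only illustrates why links stored by reference survive moves.
public sealed class PageTreeSketch
{
    private readonly Dictionary<int, (string Segment, int? ParentId)> pages =
        new Dictionary<int, (string Segment, int? ParentId)>();

    public void Add(int id, string segment, int? parentId) =>
        pages[id] = (segment, parentId);

    // Moving a page only changes its parent; its ID stays the same.
    public void Move(int id, int? newParentId) =>
        pages[id] = (pages[id].Segment, newParentId);

    // Builds the friendly URL for a page by walking up to the root.
    // A link stored as an ID always goes through this, so it follows the page.
    public string UrlFor(int id)
    {
        var segments = new List<string>();
        for (int? current = id; current.HasValue; current = pages[current.Value].ParentId)
        {
            segments.Insert(0, pages[current.Value].Segment);
        }
        return "/" + string.Join("/", segments) + "/";
    }

    // Resolves a friendly URL back to a page ID, or null if no page has that URL now.
    public int? Resolve(string url)
    {
        foreach (var id in pages.Keys)
        {
            if (UrlFor(id) == url)
            {
                return id;
            }
        }
        return null;
    }
}
```

For example, after adding Level1, Level2 and ThePageIWant (IDs 1 to 3) and recording UrlFor(3), moving page 3 under a new root page changes what UrlFor(3) returns, while Resolve on the recorded URL now returns null: that recorded URL is exactly what a pasted link holds.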
For example, if an editor wanted to set the value of a "URL to page/external address" property they could:
- Paste http://www.MySite.com/Level1/Level2/ThePageIWant/ directly into the property
-OR-
- Open the "Link properties" window
- Click "Page on this Web site"
- Click the "..." button
- Expand "Level1"
- Expand "Level2"
- Select "ThePageIWant"
- Click the "Select" button
- Finally click "OK".
If I was an editor, I know what I'd choose!
The solution
I wanted a simple solution that resolved this problem without asking editors to change the way they work. The requirement is pretty simple: if a link is marked as an external link but looks like an internal link, attempt to convert it to an internal link. This is easy to achieve with a simple HttpModule that hooks into the SavingPage event of the DataFactory:
using System;
using System.Text.RegularExpressions;
using System.Web;
using EPiServer.SpecializedProperties;

namespace EPiServer.Modules
{
    public class LinkChecker : IHttpModule
    {
        // ASP.NET calls Init once for every HttpApplication instance it creates (usually several),
        // but DataFactory.Instance is shared, so only attach the SavingPage handler once.
        private static readonly object attachLock = new object();
        private static bool handlerAttached;

        #region IHttpModule Members

        void IHttpModule.Dispose()
        {
        }

        void IHttpModule.Init(HttpApplication context)
        {
            lock (attachLock)
            {
                if (!handlerAttached)
                {
                    DataFactory.Instance.SavingPage += new PageEventHandler(Instance_SavingPage);
                    handlerAttached = true;
                }
            }
        }

        #endregion

        #region DataFactory event handlers

        void Instance_SavingPage(object sender, PageEventArgs e)
        {
            // Loop over the properties on the page looking for those that can contain external links
            foreach (var property in e.Page.Property)
            {
                if (property.Value != null)
                {
                    // URL to page/external address
                    if (typeof(PropertyUrl) == property.GetType())
                    {
                        property.Value = this.parsePropertyUrl((PropertyUrl)property);
                    }
                    // XHTML string (>255)
                    else if (typeof(PropertyXhtmlString) == property.GetType())
                    {
                        // Find links in the property's contents and attempt to convert them to internal links
                        property.Value = this.parseMultipleLinks(property.Value.ToString());
                    }
                    // Link Collection (a multipage in old money...)
                    else if (typeof(PropertyLinkCollection) == property.GetType())
                    {
                        foreach (var link in ((PropertyLinkCollection)property).Links)
                        {
                            string convertedUrl;
                            if (this.convertLink(link.Href, out convertedUrl))
                            {
                                link.Href = convertedUrl;
                            }
                        }
                    }
                }
            }
        }

        #endregion

        #region Private members

        private string parsePropertyUrl(PropertyUrl urlProperty)
        {
            if (urlProperty.Value == null)
            {
                return null;
            }

            string returnUrl = urlProperty.Value.ToString();
            try
            {
                // Only inspect the value if it doesn't already reference an internal page
                if (urlProperty.ReferencedPermanentLinkIds.Count == 0)
                {
                    // At this point we know that the property contains something but that it is not an internal link,
                    // so attempt to convert the external URL to an internal one in case the user has
                    // simply pasted a URL instead of using EPiServer to select the page
                    string newUrl;
                    if (this.convertLink(returnUrl, out newUrl))
                    {
                        returnUrl = newUrl;
                    }
                }
            }
            catch
            {
                // Any error leaves the original value in place rather than blocking the save
            }
            return returnUrl;
        }

        private string parseMultipleLinks(string sourceXhtml)
        {
            string returnXhtml = sourceXhtml;
            try
            {
                // Match the href value of each <a> tag (the link text is not needed)
                string regexPattern = @"<a\s[^>]*?href=[""'](?<url>[^""']*)[""']";
                MatchCollection matches = Regex.Matches(returnXhtml, regexPattern, RegexOptions.IgnoreCase);
                foreach (Match m in matches)
                {
                    // Inspect the href part of each link found
                    string originalUrl = m.Groups["url"].Value;
                    string convertedUrl;
                    if (this.convertLink(originalUrl, out convertedUrl))
                    {
                        // Only replace the quoted href value, including the closing quote, so that:
                        // - link text such as <a href="http://www.site.com/Page">http://www.site.com/Page</a>
                        //   still shows the public URL after the replacement, and
                        // - a longer URL that merely starts with originalUrl (e.g. .../Page2) is left alone
                        returnXhtml = returnXhtml.Replace("href='" + originalUrl + "'", "href='" + convertedUrl + "'");
                        returnXhtml = returnXhtml.Replace(@"href=""" + originalUrl + @"""", @"href=""" + convertedUrl + @"""");
                    }
                }
            }
            catch
            {
                // Any error keeps the XHTML as converted so far rather than blocking the save
            }
            return returnXhtml;
        }

        private bool convertLink(string originalUrl, out string convertedUrl)
        {
            convertedUrl = originalUrl;

            // Only attempt to convert to an internal URL if the URL starts with http and the host matches the site's host
            if (string.IsNullOrEmpty(originalUrl) || !originalUrl.StartsWith("http", StringComparison.OrdinalIgnoreCase))
            {
                return false;
            }

            try
            {
                UrlBuilder convertedUrlBuilder = new UrlBuilder(originalUrl);

                // Host names are case-insensitive, so http://www.MySite.com/... must still match
                if (!string.Equals(convertedUrlBuilder.Host, EPiServer.Configuration.Settings.Instance.SiteUrl.Host, StringComparison.OrdinalIgnoreCase))
                {
                    return false;
                }

                bool returnVal = Global.UrlRewriteProvider.ConvertToInternal(convertedUrlBuilder);
                if (returnVal)
                {
                    convertedUrl = convertedUrlBuilder.ToString();
                }
                return returnVal;
            }
            catch
            {
                // A URL that can't be parsed or converted is simply left as it is
                return false;
            }
        }

        #endregion
    }
}
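The XHTML handling is the fiddliest part of the module: it has to rewrite only the href value, never the visible link text. The standalone C# sketch below shows the same idea without any EPiServer dependencies, but note that it uses a different technique from the module (Regex.Replace with a MatchEvaluator rather than String.Replace). XhtmlLinkSketch and the convert delegate are hypothetical; convert stands in for convertLink and returns null when a URL should be left alone.

```csharp
using System;
using System.Text.RegularExpressions;

// Sketch: rewrite only the href values of <a> tags in an XHTML fragment, leaving the
// visible link text untouched. "convert" is a hypothetical stand-in for convertLink;
// it returns null when a URL should not be changed.
public static class XhtmlLinkSketch
{
    // Captures "<a ... href=" as prefix, the quote character, and the URL up to the matching quote.
    private static readonly Regex HrefPattern = new Regex(
        @"(?<prefix><a\s[^>]*?href\s*=\s*)(?<quote>[""'])(?<url>.*?)\k<quote>",
        RegexOptions.IgnoreCase);

    public static string RewriteLinks(string xhtml, Func<string, string> convert)
    {
        // Each match is replaced in place, so only that tag's href value can change.
        return HrefPattern.Replace(xhtml, match =>
        {
            string converted = convert(match.Groups["url"].Value);
            if (converted == null)
            {
                return match.Value; // no conversion: keep the original tag text
            }
            string quote = match.Groups["quote"].Value;
            return match.Groups["prefix"].Value + quote + converted + quote;
        });
    }
}
```

For example, with a converter that maps http://www.mysite.com/Level1/ to a placeholder internal URL such as /Templates/Page.aspx?id=42 (a made-up value, not necessarily what EPiServer produces), <a href="http://www.mysite.com/Level1/">http://www.mysite.com/Level1/</a> becomes <a href="/Templates/Page.aspx?id=42">http://www.mysite.com/Level1/</a>. Because each match is replaced in place, a longer URL that merely starts with a converted one is not affected. Like any regex-based approach, this is not a full HTML parser.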
The module inspects the values of links held in PropertyUrl, PropertyXhtmlString and PropertyLinkCollection properties and, where appropriate, attempts to convert them to internal links.
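The "where appropriate" test is the guard in convertLink: only http(s) URLs whose host matches the site's own host are handed to the URL rewrite provider. Here is that guard on its own as a plain .NET sketch; LinkCandidateSketch is a hypothetical name, and the siteUrl parameter stands in for EPiServer.Configuration.Settings.Instance.SiteUrl.

```csharp
using System;

// Sketch of the "looks like an internal link" test, independent of EPiServer.
// siteUrl is a stand-in for the configured site URL.
public static class LinkCandidateSketch
{
    public static bool IsCandidate(string href, Uri siteUrl)
    {
        // Anything that doesn't parse as an absolute URI (e.g. a relative link) is rejected.
        if (string.IsNullOrEmpty(href) || !Uri.TryCreate(href, UriKind.Absolute, out Uri parsed))
        {
            return false;
        }

        // mailto:, file: and other non-web schemes are rejected here.
        bool isHttp = parsed.Scheme == Uri.UriSchemeHttp || parsed.Scheme == Uri.UriSchemeHttps;

        // Host names are case-insensitive, so compare them ignoring case.
        return isHttp && string.Equals(parsed.Host, siteUrl.Host, StringComparison.OrdinalIgnoreCase);
    }
}
```

With a configured site URL of http://www.mysite.com/, a pasted http://www.MySite.com/Level1/Level2/ThePageIWant/ counts as a candidate, while relative links, mailto: links and links to other hosts do not.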
Conclusion
Currently the solution works with PropertyUrl, PropertyXhtmlString and PropertyLinkCollection, which covers most bases. If you have any custom properties, it'd be pretty simple to implement for those too. The module can be plugged into relatively standard EPiServer sites with no modification. It would also be simple to convert this to a scheduled job or admin tool to cover content that was published before the module was plugged in.
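As a rough illustration of the scheduled-job idea, this sketch runs an existing conversion over content that is already stored and reports only the pages whose value changed. Everything here is hypothetical: the dictionary of page IDs to stored values stands in for however a real job would load pages from EPiServer, and convert stands in for the module's conversion logic (for example parseMultipleLinks).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical backfill pass: apply a link conversion to already-stored values.
// storedValues maps a page ID to a stored property value; convert is a stand-in
// for the module's conversion logic.
public static class BackfillSketch
{
    // Returns only the pages whose value changed, mapped to their new value.
    public static Dictionary<int, string> ConvertAll(
        IReadOnlyDictionary<int, string> storedValues,
        Func<string, string> convert)
    {
        var changed = new Dictionary<int, string>();
        foreach (KeyValuePair<int, string> entry in storedValues)
        {
            string converted = convert(entry.Value);

            // Skip pages where nothing was converted so they don't need to be re-saved.
            if (!string.Equals(converted, entry.Value, StringComparison.Ordinal))
            {
                changed[entry.Key] = converted;
            }
        }
        return changed;
    }
}
```

A job built along these lines would then save only the returned pages, leaving untouched content alone.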
I’ve not converted this to work on EPiServer 6 yet but it should be a simple change.
There are no code downloads as the module code can simply be copied from the code example above.