SharePoint Online – Check for broken links

I needed to check for broken links across a lot of SharePoint sites and, as I didn’t find anything
useful on the Internet, I decided to create my own tool.

Here is the result of my work. It is far from perfect, but it works well enough for my production environment.

Check-SPOBrokenLink.ps1

Check-SPOBrokenLink.ps1 is the main script, which calls all the other functions.

  1. get a list of all sites
  2. for each site in the tenant
  3. get the contents of all the pages of the site
  4. for each page
  5. modify the content to be readable by a human being
  6. detect all URLs
  7. for each detected URL
  8. for each page
  9. test URL
  10. add an object to the array
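
The control flow above can be sketched as a set of nested loops. This is only an illustrative skeleton, not the actual Check-SPOBrokenLink.ps1: the tenant is replaced by a hypothetical in-memory table ($Sites), the URL test is stubbed with a hard-coded list ($DeadLinks), and the URL pattern is a simplified stand-in. Only the shape of the result objects (Site, Page, URL, Status) follows the snippet shown later in this post.

```powershell
# Hypothetical stand-in for the tenant: site URL -> page name -> raw page content.
$Sites = [ordered]@{
    'https://contoso.sharepoint.com/sites/INF' = [ordered]@{
        'Home.aspx' = '<a href="https://contoso.sharepoint.com/sites/INF/guide.pdf">Guide</a>'
        'News.aspx' = '<a href="https://example.com/missing">Old link</a>'
    }
}

# Stub for the real URL test: pretend that only this link is dead.
$DeadLinks = @('https://example.com/missing')

$ArrayURLStatus = foreach ($Site in $Sites.Keys) {           # for each site
    foreach ($Page in $Sites[$Site].Keys) {                   # for each page
        $Content = $Sites[$Site][$Page]                       # entity decoding would happen here
        # Detect all URLs (simplified pattern, assumed for this sketch).
        $Urls = [regex]::Matches($Content, 'https?://[^"''\s<>]+') | ForEach-Object { $_.Value }
        foreach ($Url in $Urls) {                             # for each detected URL
            $Status = if ($DeadLinks -contains $Url) { 'NOK' } else { 'OK' }   # stubbed test
            # Emit one object per link; the outer foreach collects them into the array.
            [pscustomobject]@{ Site = $Site; Page = $Page; URL = $Url; Status = $Status }
        }
    }
}

$ArrayURLStatus | Format-Table
```

Collecting the output of the outer foreach avoids growing an array with += inside the loops, which becomes slow when a tenant has thousands of links.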

Get-SPOSitePagesContent.ps1

Unfortunately, grabbing the content of an SPO page is not as simple as calling
Invoke-WebRequest, as you usually would. The content is loaded with JavaScript, so
it cannot be obtained this way.
I looked further into how I could get the content and finally arrived at the SPO
REST API.

$Web = Invoke-PnPSPRestMethod -url "$SiteURL/_api/web/lists/getbytitle('$Library')/Items" 
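
On libraries with several thousand pages, commenters below report that the REST call can fail with a maxJsonLength error, and that reading the CanvasContent1 field from the list items works better. A minimal sketch of that alternative follows. It assumes the PnP.PowerShell module is installed and that Connect-PnPOnline has already been run against the site. The function name Get-PageCanvasContent, the default library title and the page size are assumptions made for illustration, not part of the original script.

```powershell
# Assumption: an active PnP connection to the site (Connect-PnPOnline) already exists.
function Get-PageCanvasContent {
    param(
        [string]$Library = 'Site Pages',   # assumed library title
        [int]$PageSize = 500                # items requested per round trip
    )

    # -PageSize makes Get-PnPListItem fetch the items in batches.
    Get-PnPListItem -List $Library -PageSize $PageSize | ForEach-Object {
        [pscustomobject]@{
            Title      = $_.FieldValues['Title']
            FileName   = $_.FieldValues['FileLeafRef']
            ContentRaw = $_.FieldValues['CanvasContent1']   # raw content of a modern page
        }
    }
}
```

Each returned object carries the raw page content, which can then go through the same decoding and URL detection steps as the REST result.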

Encode-HumanReadability.ps1

# Numeric entities: &#58; is ":", &#123; is "{" and &#125; is "}"
$ContentHuman = $ContentRaw.replace("&#58;", ":")
$ContentHuman = $ContentHuman.replace("&quot;", '"')
$ContentHuman = $ContentHuman.replace("&#123;", "{")
$ContentHuman = $ContentHuman.replace("&#125;", "}")
$ContentHuman = $ContentHuman.replace("&gt;", ">")
$ContentHuman = $ContentHuman.replace("&lt;", "<")
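
As an alternative to chaining one replace per entity, the .NET method System.Net.WebUtility.HtmlDecode decodes named and numeric HTML entities in a single call. The following is a minimal sketch, not the script's actual implementation; the function name ConvertFrom-HtmlEntity and the sample string are assumptions made for illustration.

```powershell
function ConvertFrom-HtmlEntity {
    param([string]$ContentRaw)

    # HtmlDecode handles named (&quot;, &lt;) and numeric (&#58;, &#123;) entities alike.
    [System.Net.WebUtility]::HtmlDecode($ContentRaw)
}

$Sample = '&#123;&quot;url&quot;&#58;&quot;https&#58;//contoso.sharepoint.com/sites/INF&quot;&#125;'
ConvertFrom-HtmlEntity -ContentRaw $Sample
# → {"url":"https://contoso.sharepoint.com/sites/INF"}
```

This also covers entities that the explicit replace list does not mention, such as &amp;.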

Test-URL.ps1

With a regular expression, we can detect whether the URL points to a file or a site page and act on the result. I tried a lot of different regular expressions; all of them produced false positives and none worked perfectly. In the end I decided to keep a simple one, which still needs more attention.

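# Matches from the scheme up to and including the closing quote, so trim that quote from each match
$URLPatern = "(https|http)://.+?(`"|')"
switch -regex ($URL){
	# The dot is escaped so that it only matches a literal "."
	"\.pdf$"  {write-host "The file is a pdf" ; $IsFile = $true}
	"\.docx$" {write-host "The file is a Word document"; $IsFile = $true}
	Default  {write-host "This is a site page" ; $IsFile = $false}
}
# Raises an error when the server answers with an HTTP error status
$SiteStatus = Invoke-WebRequest $URL
Test-Path -Path $URL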
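
The classification and the web request can be combined into one function that returns a status object instead of throwing. This is a hedged sketch under several assumptions, not the original Test-URL.ps1: the function name Test-UrlStatus, the list of file extensions and the 15-second timeout are all hypothetical choices. Note also that the request is anonymous, so links that require a SharePoint login would be reported as NOK, and that some servers reject HEAD requests.

```powershell
function Test-UrlStatus {
    param([Parameter(Mandatory)][string]$Url)

    # Classify by extension, ignoring any query string or fragment.
    $PathOnly = ($Url -split '[?#]')[0]
    $IsFile = $PathOnly -match '\.(pdf|docx?|xlsx?|pptx?)$'

    try {
        # HEAD avoids downloading the body; -ErrorAction Stop makes every failure catchable.
        $null = Invoke-WebRequest -Uri $Url -Method Head -UseBasicParsing -TimeoutSec 15 -ErrorAction Stop
        $Status = 'OK'
    }
    catch {
        # DNS failures, refused connections, timeouts and HTTP error codes all end up here.
        $Status = 'NOK'
    }

    [pscustomobject]@{ URL = $Url; IsFile = $IsFile; Status = $Status }
}
```

Calling Test-UrlStatus -Url $URL for every detected link yields objects with the same "OK"/"NOK" status values that the list snippet further down filters on.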

Please feel free to improve and share your changes

I offer this script as-is; the output is good enough for my current needs. I’m aware that it needs a lot of improvement, but I don’t have the time to work on it. I would be glad to update the script if you send me your changes.

I added this snippet to the end of the script to add dead links to a SharePoint list.

foreach($URL in $ArrayURLStatus){
    if($URL.Status -eq "NOK"){
        Add-PnPListItem -list $List -Values @{"Title" = $URL.Site; "Page" = $URL.Page; "URL" = $URL.URL; "Statut" = $URL.Status}
    }
}

17 thoughts on “SharePoint Online – Check for broken links”

    • Hello,

      I haven’t tried it. I can see there is a PnP module for on-prem too. You would probably need to make a few changes.

      Can you reach that page?
      "$SiteURL/_api/web/lists/getbytitle('$Library')/Items"

      Can you run this?
      $TenantSites = Get-PnPTenantSite | ? {($_.Template -eq "SitePagePublishing#0") } | Select -ExpandProperty URL

      Yann

  1. Hi Yann,

    I appreciate your work here, as this is something I am trying to figure out as well. I don’t have a lot of experience with PowerShell, however.

    Where I am running into trouble is setting the test filters. Could you give some examples of how those are used?

    I am receiving the following error:

    Select-Object : A parameter cannot be found that matches parameter name 'and'.

    • Hi Neil,

      What filter are you using? Try running Get-PnPTenantSite in your console to understand how it works.

      The easiest way is to not use a filter. But the process could take a long time, depending on how many sites you have. That’s why I proposed adding a filter to reduce the execution time.
      $TenantSites = Get-PnPTenantSite | Select -ExpandProperty URL

      If you want to filter for communication sites only, you can add
      $TenantSites = Get-PnPTenantSite | ? {$_.Template -eq "SitePagePublishing#0"} | Select -ExpandProperty URL

      And if you want to combine two filters, you can use the operator "-and" inside the braces. In this case, only communication sites whose URL is $MyTenantURL/sites/INF are selected
      $TenantSites = Get-PnPTenantSite | ? {($_.Template -eq "SitePagePublishing#0") -and ($_.URL -eq "$MyTenantURL/sites/INF")} | Select -ExpandProperty URL

      Same thing here, you can run it without a filter
      $TenantSitePages = Get-SPOSitePagesContent -SiteURL $TenantSite -Library $Library

      Or add something
      $TenantSitePages = Get-SPOSitePagesContent -SiteURL $TenantSite -Library $Library | ? Title -like B*

      For more information, see where-object (alias “?”)

      Glad if it helps

      • Hi Yann,

        Thanks for your post, it’s really helpful. How can this script be used for a single communication site only? I need to get the information for one communication site which has many site pages. Is there a way to do that?

        • Hi Krishna,

          To be honest with you, I don’t do SPO anymore and I don’t have an environment available. I’m not sure about the answer I’m providing here. I think there is no problem if you only have 1 site. Maybe you can add a filter to select only your communication site. It shouldn’t be too complicated.

          Hope it helps you

  2. Thank you Yann, this definitely helps. I was not formatting that -and section correctly. I am hesitant to bother you with my next issue, as I believe it is my limited understanding with API application IDs and secrets. I registered an app in Azure AD and generated client secret. But I get Connect-PnPOnline : Token request failed. I wonder if I don’t have the correct API permissions applied. Anyway, I understand this is not relevant to your scripts, but if you have any suggestions or can point me to something easily please do. Otherwise, thanks again. I will continue to research on my own regarding basics of API applications.

  3. Hi Yann,

    I am trying to add the snippet that adds dead links to a list, but it is not working. I have created the list in the site contents at the top of the tenant, with the corresponding columns. It could be that I don’t have the snippet in the correct place in the script. Where did you add it? I set the path to the list as $List = "" along with the other variables at the top of Check-SPOBrokenLink, but otherwise I am not sure where to put the snippet itself.

    Thanks again for your time!

    Neil

  4. Hey Neil,

    Simply add the snippet at the end of the file. You first need to connect to the site which hosts the list.

    $ArrayURLStatus | ft

    # connect
    Connect-PnPOnline -url "$MyTenantURL/sites/config" -ClientID $ClientID -ClientSecret $ClientSecret

    # Remove all items
    $List = "Vérification liens sites"
    $Items = Get-PnPListItem -List $List
    foreach($Item in $Items){
        Remove-PnPListItem -List $List -Identity $Item.id -Force
    }

    # Add items
    foreach($URL in $ArrayURLStatus){
        if($URL.Status -eq "NOK"){
            Add-PnPListItem -list $List -Values @{"Title" = $URL.Site; "Page" = $URL.Page; "URL" = $URL.URL; "Statut" = $URL.Status}
        }
    }

    • Just what I needed. This is working for me now. Part of my problem is I was setting $List to the path rather than list name. Thank you yet again Yann!

  5. Hi Yann,

    I hope you will not mind that I write here about a third-party tool that can be used for exactly what you described above.

    We have a tool called ReplaceMagic (www.replacemagic.com) which can scan any SharePoint location, report all the links it finds and indicate whether they are broken.

    ReplaceMagic can also read the content of documents stored in a document library, so it is a full package: it identifies links in documents (Office formats, PDF, email, any text files) and also in SharePoint pages with hard-coded links, or in Wiki or Canvas pages. Recently we also added support for checking SharePoint list item fields of URL type, and we have started to add support for Web Parts (the first will be the Summary Links web part).

    Scanning a selected location and identifying broken links comes without any limitations. If you would like us to fix the links, you’ll need a license, but as the topic of your article is how to check for broken links, this is fully covered by ReplaceMagic for free, without any limits.

    Btw. RM is multi-threaded, which gives great performance (we are only limited by the power of the computer where RM runs or by SharePoint throttling).

    I hope that it is ok to post this reply here as it might help people around.

    BR,
    Oliver

    • Can replacemagic work with SharePoint Online?
      Cognillo Broken Link checker is another tool that can find and replace broken links.

  6. Hi all,

    I’m not sure if anyone else is running into this issue, but in this part:
    > $Web = Invoke-PnPSPRestMethod -url "$SiteURL/_api/web/lists/getbytitle('$Library')/Items"

    On several of our sites, it is reporting:
    "Invoke-PnPSPRestMethod : Error during serialization or deserialization using the JSON JavaScriptSerializer. The length of the string exceeds the value set on the
    maxJsonLength property."

    I have also tried adding `$top=100 to the query, which does limit the results to 100, but then I’m unsure how to get items 101-200.

    Has anyone encountered this or know of a way around this? We have several thousand pages on a single site, so it would be great if Invoke-PnPSPRestMethod could be used for this.

    Thanks!

      • Dear Yann,

        Thank you for your reply, despite being so busy. It seems it is possible to take a few hundred at a time, although the more reliable way I found of referencing this data was to use the $TenantSitePage.FieldValues.CanvasContent1 property as the -ContentRaw value instead of needing to use Invoke-PnPSPRestMethod. This way, we are referencing the data directly from SharePoint without relying on the REST API and the limitations it has.
        It also means you have all fields available to pull from within the library, not just the fields that are available from the API.

        This also turns out to be much faster, especially when there are thousands of pages throughout your tenant.

        Thanks again for your help and all the work gone into this.
