I needed to check for broken links across a lot of SharePoint sites, and since I didn’t find anything
useful on the Internet, I decided to create my own tool.
Here is the result. It is far from perfect, but it works well enough for my production environment.
Check-SPOBrokenLink.ps1
Check-SPOBrokenLink.ps1 is the main script, which calls all the other functions.
- get a list of all sites
- for each site in the tenant
  - get the contents of all the pages of the site
  - for each page
    - modify the content to be readable by a human being
    - detect all URLs
    - for each detected URL
      - test URL
      - add an object to the array
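The steps above can be sketched roughly as follows. This is only an illustration of the loop, not the actual script: the function name Get-BrokenLinkReport, the -Pages and -Tester parameters and the "OK" status value are assumptions made for the example, while the Site, Page, URL and Status properties and the "NOK" value follow the objects the script stores in its array.

```powershell
# Hypothetical driver. $Pages holds objects with Site, Page and Content
# properties; $Tester is a script block returning $true for a working URL.
function Get-BrokenLinkReport {
    param(
        [Parameter(Mandatory)] [object[]] $Pages,
        [Parameter(Mandatory)] [scriptblock] $Tester
    )

    # Simple pattern: http(s):// followed by anything up to the next quote
    $URLPatern = "(https|http)://.+?(`"|')"
    $ArrayURLStatus = @()

    foreach ($Page in $Pages) {
        foreach ($Match in [regex]::Matches($Page.Content, $URLPatern)) {
            # The pattern captures the closing quote, so strip it
            $URL = $Match.Value.TrimEnd('"', "'")
            $Status = if (& $Tester $URL) { 'OK' } else { 'NOK' }

            # One object per detected URL
            $ArrayURLStatus += [pscustomobject]@{
                Site   = $Page.Site
                Page   = $Page.Page
                URL    = $URL
                Status = $Status
            }
        }
    }
    $ArrayURLStatus
}
```

Passing the test as a script block keeps the loop independent of how a URL is actually checked, which also makes it easy to try against a few hand-written pages.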
Get-SPOSitePagesContent.ps1
Unfortunately, grabbing the content of an SPO page is not as simple as calling
Invoke-WebRequest, as you usually would. The content is loaded with JavaScript, so
it cannot be obtained this way.
After looking further into how to get the content, I ended up at the SPO REST API.
$Web = Invoke-PnPSPRestMethod -url "$SiteURL/_api/web/lists/getbytitle('$Library')/Items"
Encode-HumanReadability.ps1
$ContentHuman = $ContentRaw.replace("&#58;", ":")
$ContentHuman = $ContentHuman.replace("&quot;", '"')
$ContentHuman = $ContentHuman.replace("{", "<")
$ContentHuman = $ContentHuman.replace("}", ">")
$ContentHuman = $ContentHuman.replace("&gt;", ">")
$ContentHuman = $ContentHuman.replace("&lt;", "<")
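As a possible alternative to chained replacements, and assuming the raw page content is HTML-encoded (character references such as &quot; or &#58;), the .NET method [System.Net.WebUtility]::HtmlDecode can resolve all of them in one call. The sample value of $ContentRaw below is made up for the illustration.

```powershell
# Made-up sample of encoded page content
$ContentRaw = '&#123;&quot;title&quot;&#58;&quot;Home&quot;&#125; &lt;a href=&quot;https://contoso.example/doc.pdf&quot;&gt;'

# HtmlDecode resolves named and numeric character references in one pass
$ContentHuman = [System.Net.WebUtility]::HtmlDecode($ContentRaw)
$ContentHuman
```

This only decodes character references; any other rewriting the script applies to the content would still have to be done separately.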
Test-URL.ps1
With a regular expression, we can detect whether the URL points to a file or a site and act accordingly. I tried lots of different regular expressions; all of them produced false positives and none worked perfectly. In the end I kept a simple one, which still needs more attention.
$URLPatern = "(https|http)://.+?(`"|')"
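To show where false positives can come from, here is a small comparison on a made-up piece of content. The second pattern is only an assumed alternative, not something the script uses: it stops at whitespace, quotes or angle brackets instead of running lazily to the next quote.

```powershell
# Made-up content: one URL in plain text, one inside an href attribute
$Content = 'Visit https://contoso.example/page for details, or open <a href="https://contoso.example/a.pdf">the PDF</a>.'

# Simple pattern: lazy match up to the next quote, so a URL in plain text
# swallows everything until a quote shows up later in the content
$Loose = [regex]::Matches($Content, "(https|http)://.+?(`"|')") |
    ForEach-Object { $_.Value }

# Assumed alternative: stop at whitespace, quotes or angle brackets
$Strict = [regex]::Matches($Content, 'https?://[^\s"''<>]+') |
    ForEach-Object { $_.Value }

$Loose
$Strict
```

The stricter pattern has its own weak spot: a URL directly followed by a comma or a full stop keeps that character, so it is a trade-off rather than a fix.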
switch -regex ($URL){
    # The dot is escaped so that it only matches a literal "."
    '\.pdf$' {write-host "The file is a pdf" ; $IsFile = $true}
    '\.docx$' {write-host "The file is a Word"; $IsFile = $true}
    Default {write-host "This is site page" ; $IsFile = $false}
}
$SiteStatus = Invoke-WebRequest $URL
Test-Path -Path $URL
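Invoke-WebRequest raises an error when the server answers with a 4xx or 5xx status or cannot be reached, so the call is easier to use inside a try/catch. The sketch below is an assumption about how the test could be wrapped, not the content of Test-URL.ps1: the function name Test-LinkStatus, the IsFile property and the "OK" value are hypothetical, and it assumes the URL can be requested without extra authentication.

```powershell
function Test-LinkStatus {
    param([Parameter(Mandatory)] [string] $Url)

    # A known file extension at the end of the URL means "file"
    $IsFile = $Url -match '\.(pdf|docx)$'

    try {
        # -ErrorAction Stop makes sure any failure lands in the catch block
        $null = Invoke-WebRequest -Uri $Url -UseBasicParsing -TimeoutSec 15 -ErrorAction Stop
        $Status = 'OK'
    }
    catch {
        # HTTP error status, DNS failure, refused connection or timeout
        $Status = 'NOK'
    }

    [pscustomobject]@{ URL = $Url; IsFile = $IsFile; Status = $Status }
}
```

For example, Test-LinkStatus -Url $URL returns one object whose Status can be compared with "NOK" later on.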
Please feel free to improve and share your changes
I offer this script as is; the output is good enough for my current needs. I’m aware that it needs a lot of improvement, but I don’t have the time to work on it. I would be glad to update the script if you send me your changes.
I added this snippet to the end of the script to add dead links to a SharePoint list.
foreach($URL in $ArrayURLStatus){
    if($URL.Status -eq "NOK"){
        Add-PnPListItem -list $List -Values @{"Title" = $URL.Site; "Page" = $URL.Page; "URL" = $URL.URL; "Statut" = $URL.Status}
    }
}
Can I use this for on-prem SharePoint sites?
Hello,
I haven’t tried it. I can see there is a PnP module for on-prem too. You would probably need to make a few changes.
Can you reach that page?
"$SiteURL/_api/web/lists/getbytitle('$Library')/Items"
Can you run this?
$TenantSites = Get-PnPTenantSite | ? {($_.Template -eq "SitePagePublishing#0") } | Select -ExpandProperty URL
Yann
Hi Yann,
I appreciate your work here as this is something I am trying to figure as well. I don’t have a lot of experience with PowerShell however.
Where I am running into trouble is setting the test filters. Could you give some examples of how those are used?
I am receiving an error as such:
Select-Object : A parameter cannot be found that matches parameter name ‘and’.
Hi Neil,
What filter are you using? Try running Get-PnPTenantSite in your console to understand how it works.
The easiest way is to not use a filter, but the process could take a long time, depending on how many sites you have. That’s why I proposed adding a filter to reduce the execution time.
$TenantSites = Get-PnPTenantSite | Select -ExpandProperty URL
If you want to filter for communication sites only, you can add
$TenantSites = Get-PnPTenantSite | ? {$_.Template -eq "SitePagePublishing#0"} | Select -ExpandProperty URL
And if you want to combine two filters, you can use the operator “-and”. In this case, only communication sites whose URL is $MyTenantURL/sites/INF are returned
$TenantSites = Get-PnPTenantSite | ? {($_.Template -eq "SitePagePublishing#0") -and ($_.URL -eq "$MyTenantURL/sites/INF")} | Select -ExpandProperty URL
Same thing here, you can run it without a filter
$TenantSitePages = Get-SPOSitePagesContent -SiteURL $TenantSite -Library $Library
Or add something
$TenantSitePages = Get-SPOSitePagesContent -SiteURL $TenantSite -Library $Library | ? Title -like B*
For more information, see where-object (alias “?”)
Glad if it helps
Hi Yann,
Thanks for your post, it’s really helpful. How would this script work for a single communication site? I need to get the information for one communication site that has many site pages. Is there a way to do that?
Hi Krishna,
To be honest with you, I don’t do SPO anymore and I don’t have an environment available. I’m not sure about the answer I’m providing here. I think there is no problem if you only have 1 site. Maybe you can add a filter to select only your communication site. It shouldn’t be too complicated.
Hope it helps you
Thank you Yann, this definitely helps. I was not formatting that -and section correctly. I am hesitant to bother you with my next issue, as I believe it is my limited understanding with API application IDs and secrets. I registered an app in Azure AD and generated client secret. But I get Connect-PnPOnline : Token request failed. I wonder if I don’t have the correct API permissions applied. Anyway, I understand this is not relevant to your scripts, but if you have any suggestions or can point me to something easily please do. Otherwise, thanks again. I will continue to research on my own regarding basics of API applications.
Client ID and Client secret are useful if you schedule the script. Otherwise, you could simply connect with Connect-PnPOnline -Credentials. See the official doc: https://docs.microsoft.com/en-us/sharepoint/dev/solution-guidance/security-apponly-azureacs
And as is, my notes:
https://YourTenant.sharepoint.com/_layouts/15/appregnew.aspx
https://YourTenant-admin.sharepoint.com/_layouts/15/appinv.aspx
Connect-PnPOnline -url https://YourTenant.sharepoint.com/ -AppId "" -AppSecret ""
Connect-PnPOnline -url https://YourTenant.sharepoint.com/ -ClientId "" -ClientSecret ""
Yann
Thank you Yann, this is just what I needed. Very much appreciated!
I am connected now and running this in a dev environment to test. I will reach out again with any other questions or if I come up with anything that could be helpful.
Neil
Hi Yann,
I am trying to add the snippet to add dead links to a list, but it is not working. I have the list created in the contents of the top of the tenant, with corresponding columns. It could be that I don’t have the snippet in the correct place in the script. Where did you add it? I set the path to the list as $List = “” along with the other variables at the top of Check-SPOBrokenLink, but otherwise not sure where to put the snippet itself.
Thanks again for your time!
Neil
Hey Neil,
Simply add the snippet at the end of the file. You first need to connect to the site that hosts the list.
$ArrayURLStatus | ft
# connect
Connect-PnPOnline -url "$MyTenantURL/sites/config" -ClientID $ClientID -ClientSecret $ClientSecret
# Remove all items
$List = "Vérification liens sites"
$Items = Get-PnPListItem -List $List
foreach($Item in $Items){
    Remove-PnPListItem -List $List -Identity $Item.id -Force
}
# Add items
foreach($URL in $ArrayURLStatus){
    if($URL.Status -eq "NOK"){
        Add-PnPListItem -list $List -Values @{"Title" = $URL.Site; "Page" = $URL.Page; "URL" = $URL.URL; "Statut" = $URL.Status}
    }
}
Just what I needed. This is working for me now. Part of my problem is I was setting $List to the path rather than list name. Thank you yet again Yann!
Hi Yann,
I hope you will not mind that I write here about a third-party tool that can be used for exactly what you described above.
We have a tool called ReplaceMagic (www.replacemagic.com) which can scan any SharePoint location, report all the links it finds, and indicate whether they are broken or not.
ReplaceMagic can also read the content of documents stored in document libraries, so it is a full package: it identifies links in documents (Office formats, PDF, email, any text files) as well as in SharePoint pages with hard-coded links and in Wiki or Canvas pages. We recently added support for checking SharePoint list item fields of URL type, and we have started adding support for web parts (the first will be the Summary Links web part).
Scanning a selected location and identifying broken links has no limitations. If you would like ReplaceMagic to fix the links, you’ll need a license, but as the topic of your article is how to check for broken links, that part is covered by ReplaceMagic for free, without any limits.
By the way, RM is multi-threaded, which gives great performance (we are only limited by the power of the computer RM runs on, or by SharePoint throttling).
I hope that it is ok to post this reply here as it might help people around.
BR,
Oliver
Can replacemagic work with SharePoint Online?
Cognillo Broken Link checker is another tool that can find and replace broken links.
Hi all,
I’m not sure if anyone else is running into this issue but in this part:
> $Web = Invoke-PnPSPRestMethod -url “$SiteURL/_api/web/lists/getbytitle(‘$Library’)/Items”
On several of our sites, it is reporting:
“Invoke-PnPSPRestMethod : Error during serialization or deserialization using the JSON JavaScriptSerializer. The length of the string exceeds the value set on the
maxJsonLength property.”
I have also tried limiting it by putting `$top=100 to limit the amount of results it returns which works by limiting it to 100 but then I’m unsure how to get items 101-200.
Has anyone encountered this or know of a way around this? We have several thousand pages on a single site, so it would be great if Invoke-PnPSPRestMethod could be used for this.
Thanks!
Hi Patrick,
First, sorry for the delay in approving your message, I’m travelling right now. Unfortunately, I’m not able to help you with this issue as I no longer have access to an SPO environment.
Have you googled this? I found this https://github.com/pnp/PnP-PowerShell/issues/2111 which looks like an open case since 2019 🙁
Hope someone can help
Yann
Dear Yann,
Thank you for your reply, despite being so busy. It seems it is possible to take a few hundred at a time, although the more reliable way I found of referencing this data was to use the $TenantSitePage.FieldValues.CanvasContent1 property as the -ContentRaw value instead of needing to use Invoke-PnPSPRestMethod. This way, we are referencing the data directly from SharePoint without relying on the REST API and the limitations it has.
It also means you have all fields available to pull from within the library, not just the fields that are available from the API.
This also turns out to be much faster, especially when there are thousands of pages throughout your tenant.
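For anyone who wants to try the same approach, here is a rough sketch of what I mean. The helper name Get-PageCanvasContent and the shape of the returned object are made up for the example; it only assumes that each list item exposes a FieldValues collection containing Title and CanvasContent1, as the items returned by Get-PnPListItem do.

```powershell
# Hypothetical helper: takes list items from the pipeline and returns the
# raw canvas content of every page that has some
function Get-PageCanvasContent {
    param([Parameter(Mandatory, ValueFromPipeline)] $Item)

    process {
        $Content = $Item.FieldValues.CanvasContent1
        if ($Content) {
            [pscustomobject]@{
                Title      = $Item.FieldValues.Title
                ContentRaw = $Content
            }
        }
    }
}

# Assumed usage on a connected site, reading the library in pages of 500:
# Get-PnPListItem -List $Library -PageSize 500 | Get-PageCanvasContent
```

The ContentRaw value of each returned object can then be passed on as the -ContentRaw value. Pages without canvas content are simply skipped.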
Thanks again for your help and all the work gone into this.