Overview
Inspired by Usability.gov and by Kristina Halvorson and Melissa Rach’s book, Content Strategy for the Web, NIC assists in running two separate scans of your site, producing a single Content Inventory. This inventory provides a large number of metrics, all of which can aid in making decisions during your redesign.
This, however, is not to be confused with performing a Content Audit, which is a process that produces a separate piece of required documentation for the website redesign process. The inventory may be used to produce the audit, which would include additional columns to score, note, and audit each piece of content for relevancy, accuracy, consolidation, rewrite, etc.
Four inventories in one
1. Storage Inventory
The first scan (which produces the Storage Inventory) is an internal inventory of all documents and pages stored on your SharePoint site, which will provide the following information about each:
- File Name
- The name of the file
- URL
- Fully qualified URL to the file
- Created By
- Who the file was created by. In some cases, this may be blank, which could have occurred due to a migration. In this case, no additional information is available.
- Created Date
- When the file was created.
- Last Modified By
- Who the file was last modified by. Individual version information is not available through this report; however, more information may be found in the file’s Version History, through SharePoint’s menu system.
- Last Modified Date
- When the file was last modified. Individual version information is not available through this report; however, more information may be found in the file’s Version History, through SharePoint’s menu system.
This is a useful report for getting exact totals of what content your agency hosts, the structure of your site, and which users/authors have created and/or edited content. You can also identify content which has not been updated recently.
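As a minimal sketch of how content that has not been updated recently might be pulled from the Storage Inventory, the example below filters rows by Last Modified Date. The CSV layout, the date format, the sample rows, and the two-year threshold are all assumptions made for illustration; the actual export may differ.

```python
import csv
import io
from datetime import datetime, timedelta

# Hypothetical sample of a Storage Inventory export. The column names follow
# the field list above; the CSV layout and date format are assumptions.
SAMPLE_CSV = """File Name,URL,Last Modified By,Last Modified Date
about.aspx,https://example.gov/about.aspx,jdoe,2015-03-10
news.aspx,https://example.gov/news.aspx,asmith,2023-11-02
"""

def find_stale_files(csv_text, as_of, max_age_days=730):
    """Return rows whose Last Modified Date is older than max_age_days."""
    cutoff = as_of - timedelta(days=max_age_days)
    stale = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        modified = datetime.strptime(row["Last Modified Date"], "%Y-%m-%d")
        if modified < cutoff:
            stale.append(row)
    return stale

# A fixed "as of" date keeps the example reproducible.
stale_files = find_stale_files(SAMPLE_CSV, as_of=datetime(2024, 1, 1))
for row in stale_files:
    print(row["File Name"], row["Last Modified Date"])
```

The threshold is a parameter so that each agency can choose what "recently" means for its own content.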
2. Metrics Inventory (Pages/Documents/Overview)
The second scan (which produces the Metrics Inventory) is an external inventory, performed using a combination of third-party SEO crawling tools, which have access to the statewide Google Analytics data, and an additional URL data mining tool.
This scan first crawls the site much like a search bot would (such as those employed by Google), gathering titles, headings, word counts, keywords, etc. Next, it gathers Google Analytics metrics for each page or document. Finally, it uses various third-party services to gather page-level metrics, such as readability scores and topics.
3. Baseline Metrics
The Metrics Inventory provides the following baseline metrics (based on the current state of the pages and documents scanned). The Spider will collect data from the first two elements it encounters in the source code. For example, h1-2 is data from the second h1 heading on the page:
- Address
- The URI crawled
- Content
- The content type of the URI
- Status Code
- HTTP response code (e.g. 200, 301, etc.)
- Status
- The HTTP header response
- Title
- The page title
- Title Length
- The character length of the page title
- Title Pixel Width
- The pixel width of the page title, used in determining truncation in a search result
- Meta Description
- The meta description
- Meta Description Length
- The character length of the meta description
- Meta Description Pixel Width
- The pixel width of the meta description, used in determining truncation in a search result
- Meta Keyword
- The meta keywords
- Meta Keywords Length
- The character length of the meta keywords
- H1
- The first two H1s on the page
- H1 Length
- The character length of the first two H1s on the page
- Meta Robots
- Meta robots data
- Meta Refresh
- Meta refresh data
- Canonical Link Element
- The canonical link element data
- Size
- Size in bytes; divide by 1024 to convert to kilobytes
- Word Count
- Approximately all words inside the body tag. This does not include HTML markup
- Text Ratio
- Calculates the text to HTML ratio
- Level
- Depth of the page from the start page (which is always your homepage)
- Inlinks
- Number of internal inlinks to the URI. Internal inlinks are links pointing to a given URI from the same subdomain that is being crawled
- Outlinks
- Number of internal outlinks from the URI. Internal outlinks are links from a given URI to another URI on the same subdomain that is being crawled
- External Outlinks
- Number of external outlinks from the URI. External outlinks are links from a given URI to another subdomain
- Hash
- Hash value of the page. This is a duplicate content check
- Response Time
- Time in seconds to download the URI
- Last Modified
- Read from the Last-Modified header in the server’s HTTP response (the Last Modified Date column in the Storage Inventory is more accurate)
- Redirect URI
- If the address URI redirects, this column will include the redirect URI target. The status code above will display the type of redirect, 301, 302 etc.
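Two of the columns above lend themselves to a quick worked example: Hash (the duplicate content check) and Size (bytes to kilobytes). The sketch below groups addresses that share a hash value and converts a size. The sample rows and hash values are invented, and the dictionary layout is an assumption about how the inventory might be loaded.

```python
from collections import defaultdict

# Hypothetical rows from the Baseline Metrics; the hash values are made up.
rows = [
    {"Address": "https://example.gov/a", "Hash": "abc123", "Size": 20480},
    {"Address": "https://example.gov/b", "Hash": "abc123", "Size": 20480},
    {"Address": "https://example.gov/c", "Hash": "def456", "Size": 5120},
]

def group_duplicates(rows):
    """Map each hash shared by more than one page to the addresses using it."""
    by_hash = defaultdict(list)
    for row in rows:
        by_hash[row["Hash"]].append(row["Address"])
    return {h: urls for h, urls in by_hash.items() if len(urls) > 1}

def size_in_kb(size_bytes):
    """Convert the Size column from bytes to kilobytes."""
    return size_bytes / 1024

duplicates = group_duplicates(rows)
for hash_value, urls in duplicates.items():
    print(hash_value, urls)
print(size_in_kb(rows[0]["Size"]), "KB")
```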
4. Google Analytics Metrics
The Metrics Inventory provides the following Google Analytics (GA) metrics (based on the previous year and on the Landing Page Path dimension):
- Sessions
- A total count of all sessions which include this page (a session being a group of interactions that take place on your website within a given time frame)
- % New Sessions
- An estimated percentage of first time visits to this page, which excludes returning visitors
- New Users
- A total count of first time visits to this page
- Bounce Rate
- The percentage of visits in which the visitor views only this page of your website before leaving
- Page Views Per Session
- The average number of pages viewed in a session which includes this page
- Avg Session Duration
- The average time spent on the site, during a session which includes this page
- Page Views
- The total count of how many times the page has been viewed. If a user reloads a page, that action will be counted as an additional page view
- Unique Page Views
- The total count of how many times a page has been viewed, only counting one view per session
- Avg Time on Page
- The average amount of time a user spends on the page.
- Entrances
- The total number of times this page was the first page visited by a user.
- Entrance Rate
- The percentage of visitors for whom this page was the first one they visited during their session.
- Exits
- The total number of times that this page was the final page visited before leaving.
- Exit Rate
- The percentage of visitors for whom this page was the last one they visited during their session.
- Page Value
- The average value for a page that a user visited before landing on the goal page or completing an Ecommerce transaction. This will almost always have a value of 0.
- Avg Page Load Time
- The average time, in seconds, it takes the page to load
- Avg Redirection Time
- If there are redirects, this will show the time it takes for redirects to start and end
- Avg Server Response Time
- The average amount of time, in seconds, that it takes for your site’s server to respond to what a user is doing on your site such as clicking through pages and loading content
- Avg Page Download Time
- This will always be less than Avg Page Load time because it represents a portion of the total page load; specifically, the portion after the DOM loads, where the browser is downloading all necessary images, stylesheets, or other elements referenced.
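Several of the metrics above are ratios of simpler counts. The sketch below shows how such rates could be derived, assuming the commonly documented Google Analytics definitions (for example, Exit Rate as exits divided by page views). The function names and the bounces input are hypothetical, and the figures in the report come directly from GA rather than from this calculation.

```python
def percent(numerator, denominator):
    """Return numerator/denominator as a percentage, or 0.0 when empty."""
    return 100.0 * numerator / denominator if denominator else 0.0

def derived_rates(sessions, new_users, bounces, page_views, entrances, exits):
    """Derive ratio metrics from raw counts (assumed GA definitions)."""
    return {
        "% New Sessions": percent(new_users, sessions),
        "Bounce Rate": percent(bounces, sessions),
        "Page Views Per Session": page_views / sessions if sessions else 0.0,
        "Entrance Rate": percent(entrances, page_views),
        "Exit Rate": percent(exits, page_views),
    }

# Invented counts for a single landing page.
rates = derived_rates(
    sessions=200, new_users=50, bounces=80,
    page_views=500, entrances=200, exits=125,
)
for name, value in rates.items():
    print(name, value)
```

Guarding against a zero denominator matters here because many pages in an inventory will have no sessions at all in the reporting period.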
Readability Scores, Reference Counts, Social Shares, & Other Details
The Metrics Inventory provides the following Readability, Counts, and Social Share metrics (some of which are repetitive):
- URL
- The URL the profile was performed against. This could be different from the original URL in your list if it redirects to a different URL
- DNS Safe URL
- The URL the profile was performed against.
- Path
- The folder or directory of the URL, relative to the domain
- Domain
- The domain (or subdomain) for the URL. This is the portion between the http protocol and the root directory (e.g. www.example.com in http://www.example.com/myurl.html)
- Root
- The root domain for the URL (e.g. example.com in http://www.example.com/myurl.html)
- Domain
- The domain (or subdomain) for the URL. This is the portion between the http protocol and the root directory
- TLD
- The top level domain for the URL (e.g. .gov, .us, etc.)
- Scheme
- The protocol of the URL (http or https)
- HTTP Status Code
- Numerical representation of the HTTP status code (e.g. 200, 301, 404, etc.)
- HTTP Status
- Text representation of the HTTP status code (e.g. Ok, Permanent Redirect, Not Found, etc.)
- Original URL
- The original URL requested (whether or not it was redirected)
- Original HTTP Status Code
- Numerical representation of the HTTP status code of the original URL
- Original HTTP Status
- Text representation of the HTTP status code of the original URL
- Content Type
- The type of data format returned by the server
- Content Length
- The total size of the URL data in bytes
- Charset
- The character set attribute, taken from the HTTP header
- Encoding
- The method of encoding used on the page
- Hash
- A fixed length hash code that uniquely identifies data. If two hash values match, then the pages are exactly the same in content
- HTML Length
- The total size of the HTML data in bytes
- Text Length
- The total size of the text data in bytes
- Text to HTML Ratio
- The physical size (bytes) of the text content divided by the physical size of the HTML content
- Title
- The first page title found on the page
- Title Length
- The character length of the title
- Description
- The meta description of the page
- Description Length
- The character length of the meta description
- Word Count
- The approximate number of words found on the page
- Sentence Count
- The approximate number of sentences found on the page
- Header Count
- The number of headers (h1-h6) found on the page
- Paragraph Count
- The approximate number of paragraphs (as in <p> tags) found on the page
- Reading Time
- Estimated reading time at 300wpm.
- Word (1-10)
- The 10 or fewer most important words on the page (taking into account frequency and context of use), in order of importance.
- Sentiment
- Measured by an algorithm that scores English words classified as positive or negative
- Sentiment Score
- The summed score of each word’s sentiment across the page
- Dale-Chall Score
- The Dale–Chall readability formula is a readability test that provides a numeric gauge of the comprehension difficulty that readers will have when reading a text
- Flesch Kincaid Grade Level
- The Flesch–Kincaid Grade Level Formula translates the 0–100 score to a U.S. grade level. This readability test is used extensively in the field of education
- Flesch Kincaid Reading Ease Score
- In the Flesch Reading Ease test, higher scores indicate material that is easier to read; lower numbers mark passages that are more difficult to read
- Flesch Kincaid Reading Ease
- The Flesch Kincaid Reading Ease Score in Plain English: Very Confusing, Difficult, Fairly Difficult, Standard, Fairly Easy, Easy, Very Easy. The two other possible results are Unknown (no text found) and Not Checked (URL was not available to check)
- Gunning Fog Score
- The Gunning Fog Index estimates the years of formal education needed to understand the text on a first reading.
- Smog Index
- The SMOG grade is a measure of readability that estimates the years of education needed to understand a piece of writing
- Images
- The number of images found on the page
- Images with Alt
- The number of images with alt text set
- Images without Alt
- The number of images with no alt text set.
- Videos
- The number of videos found on the page
- External Link Count
- The number of external links found on the page (pointing off the current domain)
- Internal Link Count
- The number of internal links found on the page (pointing to the current domain)
- Total Link Count
- The total number of links found on the page (internal and external)
- Author
- The name of the author, found using ‘rel=author’
- Author URL
- The corresponding author URL on Google+, found using ‘rel=author’
- uClassify Topic (1-5)
- An algorithm categorizes the page text into a topic (Arts, Business, Computers, Games, Health, Home, Recreation, Science, Society, and Sports)
- URL Google Plus Ones
- Total number of Google Plus shares
- URL Facebook Likes
- Total number of Facebook likes
- URL Facebook Shares
- Total number of Facebook shares
- URL Facebook Comments
- Total number of Facebook comments
- URL Facebook Total
- Total number of Facebook references
- URL LinkedIn Shares
- Total number of LinkedIn shares
- URL Pinterest Pins
- Total number of Pinterest pins
- URL Total Shares
- Total number of social media shares
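To make a few of the calculated columns above more concrete, the sketch below applies the published Flesch Reading Ease and Flesch-Kincaid Grade Level formulas, the 300wpm reading time estimate, and the text to HTML ratio. It takes word, sentence, and syllable counts as inputs; how the profiling tool counts syllables is not documented here, so that input is an assumption, and the tool's own scores may differ slightly.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate U.S. school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def reading_time_minutes(word_count, wpm=300):
    """Estimated reading time in minutes at the given words per minute."""
    return word_count / wpm

def text_to_html_ratio(text_length, html_length):
    """Size of the text content divided by the size of the HTML content."""
    return text_length / html_length if html_length else 0.0

# Invented counts for one page: 100 words, 5 sentences, 150 syllables.
ease = flesch_reading_ease(100, 5, 150)
grade = flesch_kincaid_grade(100, 5, 150)
print(round(ease, 3), round(grade, 2))
print(reading_time_minutes(600), "minutes")
print(text_to_html_ratio(2500, 10000))
```

Both Flesch formulas use the same two inputs (average sentence length and average syllables per word), which is why the two scores tend to move together.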
Orphan File Inventory
The Storage Inventory typically has a larger total than the Metrics Inventory because there is often more content stored on your site than is linked to from your pages and documents. Documents and pages found in the Storage Inventory but not in the Metrics Inventory are called orphaned files. These may be prime candidates for deletion or consolidation, as they are currently not (easily) accessible and may be unused. Please note that although this is a good indicator of orphaned files, there may be other sites linking to these files. If any file is in question, cross-reference it with metrics through the Google Analytics website.
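The comparison described above amounts to a set difference between the two URL lists. A minimal sketch follows, assuming both inventories are available as plain lists of URLs. The normalization step (lowercasing and dropping a trailing slash) is an assumption about how the two scans might differ in formatting, and the sample URLs are invented.

```python
def normalize(url):
    """Lowercase and strip a trailing slash so equivalent URLs compare equal."""
    return url.strip().lower().rstrip("/")

def find_orphans(storage_urls, metrics_urls):
    """Return stored URLs that the external crawl never reached."""
    crawled = {normalize(u) for u in metrics_urls}
    return sorted(u for u in storage_urls if normalize(u) not in crawled)

# Invented examples: three stored files, two of which were crawled.
storage_urls = [
    "https://example.gov/Pages/About.aspx",
    "https://example.gov/Documents/old-report.pdf",
    "https://example.gov/Pages/Contact.aspx",
]
metrics_urls = [
    "https://example.gov/pages/about.aspx",
    "https://example.gov/Pages/Contact.aspx/",
]

orphans = find_orphans(storage_urls, metrics_urls)
print(orphans)
```

A file flagged this way is only a candidate: as noted above, other sites may still link to it, so it is worth checking Google Analytics before removing anything.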
Metrics Overview
The Metrics Overview will provide totals on most Baseline Metrics previously mentioned.
Combined, these inventories are your Quantitative Content Inventory. From here, you can use the inventory to locate out-of-date material, shallow content (low word count), pages with low readability scores, images missing alt text, pages missing metadata, and much more.
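As one illustration of this kind of review, the sketch below flags a page against a few of the criteria just listed. The column names follow the lists above, but the thresholds (300 words, a reading ease score of 50) and the function name are hypothetical and should be adjusted to your own standards.

```python
def flag_page(row, min_words=300, min_reading_ease=50.0):
    """Return a list of review flags for one inventory row (a dict)."""
    flags = []
    if row.get("Word Count", 0) < min_words:
        flags.append("shallow content")
    if row.get("Flesch Kincaid Reading Ease Score", 100.0) < min_reading_ease:
        flags.append("low readability")
    if row.get("Images without Alt", 0) > 0:
        flags.append("images missing alt text")
    if not row.get("Description"):
        flags.append("missing meta description")
    return flags

# Invented row: a short page with one image lacking alt text and no description.
page = {
    "URL": "https://example.gov/permits",
    "Word Count": 120,
    "Flesch Kincaid Reading Ease Score": 62.4,
    "Images without Alt": 1,
    "Description": "",
}
page_flags = flag_page(page)
print(page["URL"], page_flags)
```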
Further Reading