
An official website of the State of Oregon

Understand Your Inventory

Overview

Inspired by Usability.gov and by Kristina Halvorson and Melissa Rach’s book, Content Strategy for the Web, NIC assists in running two separate scans of your site, which together produce a single Content Inventory. This inventory provides a large number of metrics, all of which can help inform decisions during your redesign.

This inventory should not be confused with a Content Audit, a process that produces a separate, required piece of documentation for the website redesign process. The inventory may be used to produce the audit, which adds columns to score, annotate, and audit each piece of content for relevancy, accuracy, and whether it needs consolidation or a rewrite.

Four inventories in one

1. Storage Inventory

The first scan (which produces the Storage Inventory) is an internal inventory of all documents and pages stored on your SharePoint site, which will provide the following information about each:

File Name
The name of the file
URL
Fully qualified URL to the file
Created By
The user who created the file. In some cases this field may be blank, for example as a result of a migration; if so, no additional information is available.
Created Date
When the file was created.
Last Modified By
The user who last modified the file. Individual version information is not available through this report; however, more information may be found in the file’s Version History, through SharePoint’s menu system.
Last Modified Date
When the file was last modified. Individual version information is not available through this report; however, more information may be found in the file’s Version History, through SharePoint’s menu system.

This report is useful for getting exact totals of the content your agency hosts, understanding the structure of your site, and identifying which users/authors have created and/or edited content. You can also identify content that has not been updated recently.
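
As a rough illustration of finding content that has not been updated recently, the sketch below assumes the Storage Inventory has been exported to CSV with the column headings listed above and that dates are formatted like 01/15/2015. The date format, the one-year cutoff, the function name stale_files, and the sample rows are all hypothetical choices for the example, not part of the report itself.

```python
import csv
import io
from datetime import datetime, timedelta

# Assumption: dates in the exported CSV look like 01/15/2015.
DATE_FORMAT = "%m/%d/%Y"


def stale_files(rows, as_of, max_age_days=365):
    """Return the URL of every file last modified more than max_age_days ago."""
    cutoff = as_of - timedelta(days=max_age_days)
    stale = []
    for row in rows:
        modified = datetime.strptime(row["Last Modified Date"], DATE_FORMAT)
        if modified < cutoff:
            stale.append(row["URL"])
    return stale


# Hypothetical two-row export, read the same way a real CSV file would be.
sample = io.StringIO(
    "File Name,URL,Last Modified Date\n"
    "a.pdf,https://www.example.com/docs/a.pdf,01/15/2015\n"
    "b.aspx,https://www.example.com/pages/b.aspx,03/01/2024\n"
)
rows = list(csv.DictReader(sample))
print(stale_files(rows, as_of=datetime(2024, 6, 1)))
```

To use it on a real export, replace the io.StringIO sample with open("storage_inventory.csv", newline="") (a hypothetical filename) and adjust DATE_FORMAT to match the file.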

2. Metrics Inventory (Pages/Documents/Overview)

The second scan (which produces the Metrics Inventory) is an external inventory, performed using a combination of third-party SEO crawling tools, which have access to the statewide Google Analytics data, and an additional URL data-mining tool.

This scan first crawls the site much like a search bot would (such as those employed by Google), gathering titles, headings, word counts, keywords, etc. Next, it gathers Google Analytics metrics for each page or document. Finally, it uses various third-party services to gather page-level metrics, such as readability scores and topics.
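
The crawl step can be pictured as a breadth-first walk outward from the homepage, which is also how the Level, Inlinks, and Outlinks metrics in the baseline list can be derived. The sketch below is an illustration only: the crawling tool’s actual logic is not documented here, and the in-memory link graph LINKS and the function crawl are hypothetical stand-ins for fetching real pages.

```python
from collections import deque

# Hypothetical internal link graph: page path -> internal paths it links to.
# A real crawler would build this by fetching each page and reading its links.
LINKS = {
    "/": ["/about", "/services"],
    "/about": ["/", "/contact"],
    "/services": ["/contact", "/services/permits"],
    "/contact": [],
    "/services/permits": ["/contact"],
}


def crawl(links, start="/"):
    """Breadth-first crawl from the homepage, recording depth and link counts."""
    level = {start: 0}      # clicks from the start page
    inlinks = {}            # internal links pointing at each page
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            inlinks[target] = inlinks.get(target, 0) + 1
            if target not in level:  # first discovery is the shortest path in BFS
                level[target] = level[page] + 1
                queue.append(target)
    # Pages that no crawled page links to never enter `level`, so they are absent.
    return {
        page: {
            "Level": depth,
            "Inlinks": inlinks.get(page, 0),
            "Outlinks": len(links.get(page, [])),
        }
        for page, depth in level.items()
    }


for page, stats in crawl(LINKS).items():
    print(page, stats)
```

Because breadth-first search visits pages in order of distance, the first time a page is discovered gives its shortest click depth from the homepage, which matches how Level is described.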

3. Baseline Metrics

The Metrics Inventory provides the following baseline metrics (based on the current state of the pages and documents scanned). For elements that can appear more than once, such as headings, the spider (crawler) collects data from the first two it encounters in the source code; for example, h1-2 is data from the second h1 heading on the page:

Address
The URI crawled
Content
The content type of the URI
Status Code
HTTP response code (e.g. 200, 301, etc.)
Status
The HTTP status text from the response header (e.g. OK, Moved Permanently)
Title
The page title
Title Length
The character length of the page title
Title Pixel Width
The pixel width of the page title, used in determining truncation in a search result
Meta Description
The meta description
Meta Description Length
The character length of the meta description
Meta Description Pixel Width
The pixel width of the meta description, used in determining truncation in a search result
Meta Keyword
The meta keywords
Meta Keywords Length
The character length of the meta keywords
H1
The first two H1s on the page
H1 Length
The character length of the first two H1s on the page
Meta Robots
Meta robots data
Meta Refresh
Meta refresh data
Canonical Link Element
The canonical link element data
Size
Size in bytes; divide by 1,024 to convert to kilobytes
Word Count
The approximate number of words inside the body tag, excluding HTML markup
Text Ratio
The text-to-HTML ratio: the amount of text on the page relative to the total size of its HTML
Level
Depth of the page from the start page (which is always your homepage)
Inlinks
Number of internal inlinks to the URI. Internal inlinks are links pointing to a given URI from the same subdomain that is being crawled
Outlinks
Number of internal outlinks from the URI. Internal outlinks are links from a given URI to another URI on the same subdomain that is being crawled
External Outlinks
Number of external outlinks from the URI. External outlinks are links from a given URI to another subdomain
Hash
Hash value of the page, used as a duplicate content check: pages with identical hashes have identical content
Response Time
Time in seconds to download the URI
Last Modified
Read from the Last-Modified header in the server’s HTTP response (the Last Modified Date column in the Storage Inventory is more accurate)
Redirect URI
If the address URI redirects, this column contains the redirect target URI. The Status Code column shows the type of redirect (301, 302, etc.).
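
Several of the baseline metrics above (h1-1/h1-2, Size, Word Count, Text Ratio, and Hash) can be approximated from a page’s HTML alone. The sketch below is an assumption-laden approximation using Python’s standard html.parser, not the crawling tool’s actual method: skipping script and style content, splitting words on whitespace, measuring the ratio in UTF-8 bytes, and using MD5 for the hash are all choices made for the example, and the names PageStats and baseline_metrics are hypothetical.

```python
import hashlib
from html.parser import HTMLParser


class PageStats(HTMLParser):
    """Collects h1 text and visible body text from a single HTML page."""

    def __init__(self):
        super().__init__()
        self.h1s = []         # text of each h1, in source order
        self.body_text = []   # text chunks found inside <body>
        self._in_h1 = False
        self._in_body = False
        self._skip = 0        # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.h1s.append("")
        elif tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False
        elif tag == "body":
            self._in_body = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._skip:
            return
        if self._in_h1:
            self.h1s[-1] += data
        if self._in_body:
            self.body_text.append(data)


def baseline_metrics(html):
    raw = html.encode("utf-8")
    parser = PageStats()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.body_text).split())  # collapse whitespace
    text_bytes = len(text.encode("utf-8"))
    metrics = {
        "Size": len(raw),                      # bytes
        "Size (KB)": len(raw) / 1024,          # divide by 1024 for kilobytes
        "Word Count": len(text.split()),
        "Text Ratio": 100 * text_bytes / len(raw) if raw else 0.0,  # percent
        "Hash": hashlib.md5(raw).hexdigest(),  # same hash => same content
    }
    # Keep only the first two h1s, as the h1-1 and h1-2 columns do.
    for i, heading in enumerate(parser.h1s[:2], start=1):
        metrics[f"h1-{i}"] = heading.strip()
    return metrics


page = (
    "<html><head><title>Permits</title></head><body>"
    "<h1>Building Permits</h1><p>Apply for a permit online.</p>"
    "<h1>Fees</h1><script>var x = 1;</script></body></html>"
)
print(baseline_metrics(page))
```

Comparing the Hash values of two pages is a quick way to spot exact duplicates; any difference in the markup, even whitespace, produces a different hash.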

4. Google Analytics Metrics

The Metrics Inventory provides the following Google Analytics (GA) metrics (based on the previous year and on the Landing Page Path dimension):
Sessions
A total count of all sessions which include this page (a session being a group of interactions that take place on your website within a given time frame)
% New Sessions
The estimated percentage of this page’s sessions that were first-time visits (sessions by returning visitors are excluded)
New Users
A total count of first-time visits to this page
Bounce Rate
The percentage of visits in which the visitor views only this page of your website before leaving
Page Views Per Session
The average number of pages viewed in a session that includes this page
Avg Session Duration
The average time spent on the site, during a session which includes this page
Page Views
The total count of how many times the page has been viewed. If a user reloads a page, that action will be counted as an additional page view
Unique Page Views
The total count of how many times a page has been viewed, only counting one view per session
Avg Time on Page
The average amount of time a user spends on the page.
Entrances
The total number of times this page was the first page visited by a user.
Entrance Rate
The percentage of this page’s views in which it was the first page of the session.
Exits
The total number of times that this page was the final page visited before leaving.
Exit Rate
The percentage of this page’s views in which it was the last page of the session.
Page Value
The average value for a page that a user visited before landing on the goal page or completing an Ecommerce transaction. This will almost always have a value of 0.
Avg Page Load Time
The average time, in seconds, it takes the page to load
Avg Redirection Time
If there are redirects, the average time, in seconds, it takes for the redirects to start and end
Avg Server Response Time
The average amount of time, in seconds, that it takes for your site’s server to respond to what a user is doing on your site such as clicking through pages and loading content
Avg Page Download Time
This will always be less than Avg Page Load time because it represents a portion of the total page load; specifically, the portion after the DOM loads, where the browser is downloading all necessary images, stylesheets, or other elements referenced.
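
The rate metrics above are ratios of raw counts. The inventory reports them directly from Google Analytics, so the sketch below is only meant to show how they relate, assuming the Universal Analytics conventions of dividing entrances and exits by page views and bounces by sessions. The bounces input and the function name ga_rates are hypothetical; there is no Bounces column in this inventory.

```python
def ga_rates(page_views, entrances, exits, sessions, bounces):
    """Derive percentage metrics from raw counts for one landing page.

    Assumes Universal Analytics conventions; `bounces` is a hypothetical input.
    """
    def pct(part, whole):
        return 100 * part / whole if whole else 0.0

    return {
        "Entrance Rate": pct(entrances, page_views),  # entrances per view of the page
        "Exit Rate": pct(exits, page_views),          # exits per view of the page
        "Bounce Rate": pct(bounces, sessions),        # single-page sessions per session
    }


print(ga_rates(page_views=200, entrances=50, exits=80, sessions=50, bounces=20))
# prints {'Entrance Rate': 25.0, 'Exit Rate': 40.0, 'Bounce Rate': 40.0}
```

A high Exit Rate on a page that should lead visitors onward (such as a form landing page) is often worth a closer look during the audit.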

Readability Scores, Reference Counts, Social Shares, & Other Details

The Metrics Inventory provides the following Readability, Counts, and Social Share metrics (some of which repeat metrics listed above):

URL
The URL the profile was performed against. This could be different from the original URL in your list if it redirects to a different URL
DNS Safe URL
The URL the profile was performed against.
Path
The folder or directory of the URL, relative to the domain
Domain
The domain (or subdomain) for the URL. This is the portion between the http protocol and the root directory (e.g. www.example.com in http://www.example.com/myurl.html)
Root
The root domain for the URL (e.g. example.com in http://www.example.com/myurl.html)
TLD
The top level domain for the URL (e.g. .gov, .us, etc.)
Scheme
The protocol of the URL: http or https
HTTP Status Code
Numerical representation of the HTTP status code (e.g. 200, 301, 404, etc.)
HTTP Status
Text representation of the HTTP status code (e.g. Ok, Permanent Redirect, Not Found, etc.)
Original URL
The original URL requested (whether or not it was redirected)
Original HTTP Status Code
Numerical representation of the HTTP status code of the original URL
Original HTTP Status
Text representation of the HTTP status code of the original URL
Content Type
The type of data format returned by the server
Content Length
The total size of the URL data in bytes
Charset
The character set attribute, taken from the HTTP header
Encoding
The method of encoding used on the page
Hash
A fixed-length hash code computed from the page data. If two hash values match, the pages have identical content
HTML Length
The total size of the HTML data in bytes
Text Length
The total size of the text data in bytes
Text to HTML Ratio
The physical size (bytes) of the text content divided by the physical size of the HTML content
Title
The first page title found on the page
Title Length
The character length of the title
Description
The meta description of the page
Description Length
The character length of the meta description
Word Count
The approximate number of words found on the page
Sentence Count
The approximate number of sentences found on the page
Header Count
The number of headers (h1-h6) found on the page
Paragraph Count
The approximate number of paragraphs (as in <p> elements) found on the page
Reading Time
Estimated reading time at 300wpm.
Word (1-10)
Up to 10 of the most important words on the page (taking into account frequency and context of use), in order of importance.
Sentiment
The overall tone of the page, measured by an algorithm that scores English words as positive or negative
Sentiment Score
The summed score of each word’s sentiment across the page
Dale-Chall Score
The Dale–Chall readability formula is a readability test that provides a numeric gauge of the comprehension difficulty that readers will have when reading a text
Flesch Kincaid Grade Level
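The Flesch–Kincaid Grade Level Formula expresses readability as a U.S. school grade level, computed from the same inputs as the Reading Ease score (words per sentence and syllables per word). This readability test is used extensively in the field of education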
Flesch Kincaid Reading Ease Score
In the Flesch Reading Ease test, higher scores indicate material that is easier to read; lower numbers mark passages that are more difficult to read
Flesch Kincaid Reading Ease
The Flesch Kincaid Reading Ease Score in plain English: Very Confusing, Difficult, Fairly Difficult, Standard, Fairly Easy, Easy, or Very Easy. Two other possible results are Unknown (no text found) and Not Checked (the URL was not available to check)
Gunning Fog Score
The Gunning Fog Index estimates the years of formal education needed to understand the text on a first reading.
Smog Index
The SMOG grade is a measure of readability that estimates the years of education needed to understand a piece of writing
Images
The number of images found on the page
Images with Alt
The number of images with alt text set
Images without Alt
The number of images with no alt text set.
Videos
The number of videos found on the page
External Link Count
The number of external links found on the page (pointing off the current domain)
Internal Link Count
The number of internal links found on the page (pointing to the current domain)
Total Link Count
The total number of links found on the page (internal and external)
Author
The name of the author, found using ‘rel=author’
Author URL
The corresponding author URL on Google+, found using ‘rel=author’
uClassify Topic (1-5)
An algorithm categorizes the page text into topics (Arts, Business, Computers, Games, Health, Home, Recreation, Science, Society, and Sports)
URL Google Plus Ones
Total number of Google +1s
URL Facebook Likes
Total number of Facebook likes
URL Facebook Shares
Total number of Facebook shares
URL Facebook Comments
Total number of Facebook comments
URL Facebook Total
Total number of Facebook references
URL LinkedIn Shares
Total number of LinkedIn shares
URL Pinterest Pins
Total number of Pinterest pins
URL Total Shares
Total number of social media shares
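
For reference, the Reading Time estimate and the standard published formulas behind several of the readability scores above can be written out as below. This is a sketch: it takes word, sentence, and syllable counts as given (syllable counting, which tools do in different ways, is left out), Dale–Chall is omitted because it needs its familiar-word list, and the label bands in ease_label follow Flesch’s original table, which may not match the thresholds the profiling tool actually uses.

```python
import math


def reading_time_minutes(word_count, wpm=300):
    """Estimated reading time at the report's 300 words per minute."""
    return word_count / wpm


def flesch_reading_ease(words, sentences, syllables):
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words, sentences, complex_words):
    # complex_words: words of three or more syllables
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def smog_grade(sentences, polysyllables):
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291


def ease_label(score):
    """Map a Reading Ease score to Flesch's original plain-English bands."""
    bands = [
        (90, "Very Easy"),
        (80, "Easy"),
        (70, "Fairly Easy"),
        (60, "Standard"),
        (50, "Fairly Difficult"),
        (30, "Difficult"),
    ]
    for floor, label in bands:
        if score >= floor:
            return label
    return "Very Confusing"


# Hypothetical page: 100 words, 5 sentences, 150 syllables.
score = flesch_reading_ease(100, 5, 150)
print(score, ease_label(score))
print(flesch_kincaid_grade(100, 5, 150))
print(reading_time_minutes(900))
```

Both Flesch formulas rise and fall with the same two ratios, so shortening sentences and preferring shorter words improves the Reading Ease score and lowers the Grade Level together.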

Orphan File Inventory

The Storage Inventory typically has a larger total than the Metrics Inventory, because there is often more content stored on your site than is linked to from your pages and documents. Documents and pages found in the Storage Inventory but not in the Metrics Inventory are called orphaned files. They may be prime candidates for deletion or consolidation, since they are not currently (easily) accessible and may be unused. Note that although this is a good indicator of orphaned files, other sites may still link to these files. If any file is in question, cross-reference it with its metrics on the Google Analytics website.
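
Finding orphaned files amounts to a set difference: URLs in the Storage Inventory minus URLs in the Metrics Inventory. The sketch below assumes you have both URL columns as lists of strings, and it makes a loud assumption in normalize: that scheme (http vs. https), query strings, trailing slashes, and letter case are not meaningful when matching these URLs, which should be checked against your own site before trusting the result. Both function names are hypothetical.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to (host, path) so minor differences don't hide a match.

    Assumption: scheme, query string, fragment, trailing slashes and letter
    case are not meaningful when matching these URLs.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return parts.netloc.lower(), path.lower()


def orphaned_files(storage_urls, metrics_urls):
    """Files in the Storage Inventory that the crawl never reached."""
    crawled = {normalize(u) for u in metrics_urls}
    return sorted(u for u in storage_urls if normalize(u) not in crawled)


storage = [
    "https://www.example.com/Docs/Report.pdf",
    "https://www.example.com/about.aspx",
    "https://www.example.com/old/flyer.pdf",
]
metrics = [
    "http://www.example.com/docs/report.pdf",
    "https://www.example.com/about.aspx",
]
print(orphaned_files(storage, metrics))
# prints ['https://www.example.com/old/flyer.pdf']
```

Without normalization, the Report.pdf entry would also be reported as orphaned purely because of its capitalization and scheme, so it is worth spot-checking a few matches by hand.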

Metrics Overview

The Metrics Overview will provide totals on most Baseline Metrics previously mentioned.

Combined, these inventories form your Quantitative Content Inventory. From here, you can use the inventory to locate out-of-date material, shallow content (low word count), pages with low readability scores, images missing alt text, pages missing metadata, and much more.
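
As one possible starting point for that review, the sketch below flags rows that show the problems just listed. It assumes the baseline and readability columns for each page have already been merged into a single row keyed by URL (for example, read with csv.DictReader, so values arrive as strings). The thresholds of 300 words and a Reading Ease score of 60 are arbitrary placeholders rather than agency standards, and flag_issues is a hypothetical name.

```python
def flag_issues(row, min_words=300, min_reading_ease=60):
    """List content problems for one merged inventory row (values are strings)."""
    issues = []
    if int(row["Word Count"]) < min_words:
        issues.append("shallow content")
    if not row.get("Meta Description", "").strip():
        issues.append("missing meta description")
    if int(row["Images without Alt"]) > 0:
        issues.append("images missing alt text")
    if float(row["Flesch Kincaid Reading Ease Score"]) < min_reading_ease:
        issues.append("low readability")
    return issues


# Hypothetical merged rows.
rows = [
    {"Address": "https://www.example.com/permits", "Word Count": "120",
     "Meta Description": "", "Images without Alt": "2",
     "Flesch Kincaid Reading Ease Score": "45.2"},
    {"Address": "https://www.example.com/about", "Word Count": "850",
     "Meta Description": "About the agency.", "Images without Alt": "0",
     "Flesch Kincaid Reading Ease Score": "64.0"},
]
for row in rows:
    print(row["Address"], flag_issues(row))
```

The flagged list can then feed the scoring and notes columns of a Content Audit.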

Further Reading