Enterprise content storage planning (SharePoint Server 2010)

 

Applies to: SharePoint Server 2010

This article describes how to plan an enterprise content storage solution that uses Microsoft SharePoint Server 2010. Although the examples in this article are primarily relevant for solutions that are based on SharePoint Server 2010, the prescriptive guidance information that is provided here applies to both SharePoint Server 2010 and SharePoint Foundation 2010 unless noted otherwise.

Information and guidance in this topic are meant to serve as an introduction to enterprise content storage concepts. Certain information in this topic is derived from other more detailed documents about performance and capacity testing performed at Microsoft and from other articles providing detailed guidance about particular concepts. We strongly recommend that you use all these resources when planning your enterprise content storage solution. For more information and links, see Additional resources later in this article.

In this article:

  • Understanding enterprise content storage

  • Typical large-scale content management scenarios

  • Storage levels: content storage benefits and considerations

  • Routing and storing enterprise content based on metadata

  • Navigating and filtering enterprise content by using metadata

  • List views

  • Additional resources

Understanding enterprise content storage

A document management solution is about much more than only providing a location for documents. A complete enterprise-level document management solution addresses document storage at multiple levels, including storage within site collections, sites, libraries, and folders. It also enables companies to efficiently and effectively manage their growing volumes of enterprise documents and ensure that versions of documents from each stage of their life cycle can be retained for reference or legal reasons.

SharePoint Server 2010 supports high-capacity document storage. A document library can contain millions of documents. However, depending on how the content is used, the performance of sites that contain many documents can decrease. The prescriptive guidance provided in this article can help you design large-scale content management solutions that scale out to the requirements of your enterprise while providing the users of your solution with a high-performance environment in which to create and use documents.

Decisions you make about the capacities of site collections, sites, and libraries should allow for not only the physical storage constraints of your environment but also the content usage and viewing patterns of users. For example, if users view or query a set of documents in a document library that contains thousands of documents, performance can decrease if the site is not configured correctly. Or if a service-level agreement requires that content be backed up two times a day, the service might not perform satisfactorily if the set of content is too large.

Typical large-scale content management scenarios

Typically, large-scale content management scenarios are variants of one of the following scenarios:

  • Large-scale authoring environment

  • Large-scale content archive

  • Extremely large-scale content archive

The scenario descriptions provided here are intended to clarify what we mean by large-scale solutions and to provide high-level examples that hopefully reflect your content management goals. Of course, these descriptions do not include all aspects of a particular scenario. There are dozens, even hundreds, of unique aspects of a particular scenario that are beyond the scope of this article.

Large-scale authoring environment

In a large-scale authoring environment, for example, a site can contain a library in which users edit 50,000 or more documents across 500 or more folders. Versioning is enabled, and typically multiple versions of each document exist. Documents are checked in and out frequently, and workflows are used to control their life cycles. A typical database for this kind of site contains approximately 150 gigabytes (GB) of data. Library settings can be used to limit the number of versions saved, reducing database consumption. (Note that each version of a document is stored separately in the database.) Typically, in a large-scale authoring environment, 80 percent of site users are authors who have access to major and minor versions of documents, whereas 20 percent of site users have read-only permissions and can only view major versions of the content.
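Because each version is stored separately, the effect of a version limit can be approximated with simple arithmetic. The following is a minimal sketch, not a sizing tool: the function name, the average file size of 0.5 MB, and the version counts are hypothetical inputs chosen for illustration and are not measurements from this article.

```python
def estimate_library_storage_gb(document_count, versions_per_document, average_file_size_mb):
    """Estimate storage in GB, assuming every version is stored as a full copy."""
    total_mb = document_count * versions_per_document * average_file_size_mb
    return total_mb / 1024  # 1 GB = 1,024 MB


# Hypothetical inputs: 50,000 documents that average 0.5 MB each.
unlimited = estimate_library_storage_gb(50_000, 6, 0.5)  # six versions kept
limited = estimate_library_storage_gb(50_000, 3, 0.5)    # version limit of three
print(f"{unlimited:.1f} GB vs. {limited:.1f} GB")  # prints "146.5 GB vs. 73.2 GB"
```

Under these assumed inputs, halving the number of retained versions halves the estimated storage. Real consumption also depends on factors that this sketch ignores, such as metadata and database overhead.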

A large-scale authoring environment site can be based on the SharePoint Server 2010 Document Center site template, which includes a single, large document library that is optimized for large-scale authoring.

Large-scale content archive

A large-scale content archive is a document repository in which users either view documents or upload new documents. Little or no authoring occurs in the site. There are two primary large-scale content archive scenarios: knowledge base and records management.

In a knowledge base site, there is only a single version of most documents, so that the site can scale out to easily hold millions of documents (recommended maximum of 30,000,000 documents). The content is typically stored in a single database as large as 1 terabyte. In a typical scenario, such as an enterprise's technical support center, 10,000 users might access the content, primarily to read it. A subset of users (three or four thousand) might upload new content to the site. A knowledge base site can be based on the Document Center site template.

Another kind of large-scale content archive is a records center, based on the Records Center site template. Using the Records Center site template is recommended for sites that contain one million or more documents. This site template contains features that you can use to manage the retention and disposition of records (documents that serve as evidence of activities or transactions performed by the organization and that must be retained for some time period). Similar to a knowledge base site, a records center contains a single version of each document and could typically hold millions of documents. Many more users submit content to a records center than view or read it.

Extremely large-scale content archive

An extremely large-scale content archive can be used as a reference library or content repository. To provide scale beyond that of a large-scale content archive, an extremely large-scale content archive might contain 50,000,000 or more documents distributed across multiple site collections. Content in these site collections can be stored as binary large object (BLOB) data across multiple content databases or by using Remote BLOB Storage (RBS). RBS enables data to be stored outside SQL Server, which allows less expensive storage options and reduces content database size. SharePoint Search or FAST Search for SharePoint is used to find content across multiple site collections.

Storage levels: content storage benefits and considerations

Site collections

A site collection is a set of web sites that has the same owner and shares administration settings. Each site collection contains a top-level web site and can contain one or more subsites. A site collection usually has a shared navigation structure.

The sites in a site collection are usually interrelated by purpose. To maximize your solution's usability, store all related data and content in a single site collection. Benefits of doing this include the following:

  • Content types and columns managed in a site collection can be shared across sites in the site collection. The managed metadata service can be used to syndicate content types and column definitions across multiple site collections.

  • Information management policies managed in the site collection can be made available to content in all sites in the site collection.

  • Search can be used across content in multiple site collections.

  • Some views list documents from multiple sites in a single site collection (for example, a view enumerating all tasks assigned to a user across a site collection). Also, developers can create cross-site database queries in a site collection, but cross-site queries are not supported across multiple site collections.

  • Content quotas and other quotas can only be managed at the site-collection level.

Consider the following limits when planning how to allocate your content across one or more site collections:

  • All sites in a site collection share the same back-end resources. In particular, all content in a site collection must be stored in the same content database. Because of this, the performance of database operations, such as backing up and restoring content, depends on the amount of content across the site collection, the size of the database, the speed of the servers hosting the database, and other factors. Depending on the amount of content and the configuration of the database, you might have to segment a site collection into multiple site collections to meet service-level agreements for backing up and restoring, throughput, or other requirements. It is beyond the scope of this article to provide prescriptive guidance about how to manage the size and performance of databases.

  • Particularly, keep very active sites in separate site collections. For example, a knowledge base site on the Internet that enables anonymous browsing could generate lots of database activity. If other sites use the same database, their performance could be affected. By putting the knowledge base site in a separate site collection with its own database, you can free resources for other sites that no longer have to compete with it for database resources.

Note

SharePoint Foundation and SharePoint Server 2010 include several features that reduce the need to have the IT department restore content. The Recycle Bin and the Site Collection Recycle Bin provide a double safety mechanism for restoring unintentionally deleted items. Document versioning also provides a safety net of sorts: if a document is lost, at least its previous version will be available. To better ensure the availability of previous versions, an administrator can remove an author's Delete Versions permission; this can help guarantee that previous versions of content are available without having to restore them from the database.

Sites

A web site is the primary way to organize related content in SharePoint Server 2010 and SharePoint Foundation.

Storing content in the same site has the following benefits:

  • It is easier to create pages that display views of multiple libraries and lists when they are in the same site.

  • You can use the Document Center site template to create a site that is optimized for creating and using many documents.

  • The site navigation user interface is optimized to make it easy to find and locate libraries within the same site.

  • You can define a set of content types and site columns for use in a site.

Libraries

Storing content in the same library provides the following benefits:

  • It is easier for users to add new documents or find existing documents in a single library.

  • Many document management settings, such as permissions, content versioning, and approval, are applied at the library level.

  • Views created by using the user interface are bound to a particular library.

  • Information management policies, such as content auditing and retention settings, can be applied to a library. For certain libraries, only retention policies can be used.

Think about the following limits when you plan how to organize content into the same library:

  • Settings such as required checkouts or versioning are specified at the document library level. If some documents need different settings, you must put those documents in a different library that has the required settings.

  • Views that contain columns that are used only on one content type may not be useful because no metadata value will be displayed for items of other content types.

  • View performance is limited when the number of items viewed exceeds the list view threshold of 5,000 items (default). In addition, queries are prevented when they exceed the list view threshold. Organize content in the library into folders that contain 5,000 or fewer items, or create views that take advantage of metadata navigation and indexed columns to return sets of 5,000 or fewer items.

Folders

A folder is a named subdivision of the content in a library similar to folders in a file system. The main purpose of folders is to logically organize content to match the expected functionality of the library. For example, if a library is intended to provide product specifications, the set of folders in the library could be named for each feature area in the product or for each team member who writes product specifications.

When you divide content across multiple folders, each of which contains 5,000 or fewer items (the default list view threshold), views on the folders can perform well. To take advantage of this, configure the views that are available within folders to show only the items inside those folders (this option is available in the default view-creation interface). If folders contain 5,000 or fewer items, views in the folders do not have to be filtered by using indexed columns. For folders that contain more than 5,000 items, you can improve performance by using metadata navigation, indexed columns, or both, and then filtering the views to return fewer than 5,000 items.
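The folder guidance can be checked with a few lines of arithmetic. The following is a minimal planning sketch that assumes the default threshold of 5,000; the function names and the sample folder layout are hypothetical and do not correspond to any SharePoint API.

```python
import math

LIST_VIEW_THRESHOLD = 5000  # default value cited in this article


def folders_needed(item_count, max_items_per_folder=LIST_VIEW_THRESHOLD):
    """Return the minimum number of folders so that no folder exceeds the limit."""
    if item_count <= 0:
        return 0
    return math.ceil(item_count / max_items_per_folder)


def folders_needing_indexed_views(folder_item_counts, threshold=LIST_VIEW_THRESHOLD):
    """Return the names of folders whose item count exceeds the threshold."""
    return sorted(name for name, count in folder_item_counts.items() if count > threshold)


# Hypothetical layout of a product specification library.
layout = {"Specs": 4200, "Archive": 12000, "Drafts": 5000}
print(folders_needed(50_000))                 # prints 10
print(folders_needing_indexed_views(layout))  # prints ['Archive']
```

In this example, only the Archive folder would need views that are filtered through metadata navigation or indexed columns; the other two folders stay at or under the threshold.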

Consider creating folders as part of a content routing and storage solution that is based on metadata. By using Content Organizer, you can configure settings that automatically create folders when a target folder becomes too large, or that automatically create a folder for each value of a metadata property. For more information, see Routing and storing enterprise content based on metadata later in this article.

Routing and storing enterprise content based on metadata

SharePoint Server 2010 introduces metadata-based routing and storage through Content Organizer. Content Organizer adds site-level features that make it easier for administrators and users to classify, route, and store content by using rules based on metadata.

Based on a document's metadata, Content Organizer can route a document to a specified folder or automatically create a new folder. Folders can be created as a child of the target folder because the number of items in the target folder exceeds a specified limit, or new folders can be created for each new value in a field. New folders will inherit settings from the parent folder. New folders can then also have additional rules that define additional parameters such as permissions, additional metadata, retention policies, and workflows that the documents in them will inherit.
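The routing behavior described above can be pictured with a small model. The following sketch is a toy illustration only: it does not use the SharePoint object model, and the class name, the "Unclassified" fallback folder, and the "Overflow N" child folder naming are all hypothetical choices made for this example.

```python
from collections import defaultdict


class MetadataRouter:
    """Toy model of metadata-based routing; not the SharePoint object model."""

    def __init__(self, field_name, max_items_per_folder):
        self.field_name = field_name
        self.max_items_per_folder = max_items_per_folder
        self.folders = defaultdict(list)  # folder path -> document names

    def route(self, name, metadata):
        """Store the document and return the folder path that was chosen."""
        # One top-level folder per distinct value of the routing field.
        base = str(metadata.get(self.field_name, "Unclassified"))
        path, overflow = base, 0
        # When the target folder is full, spill into a numbered child folder.
        while len(self.folders[path]) >= self.max_items_per_folder:
            overflow += 1
            path = f"{base}/Overflow {overflow}"
        self.folders[path].append(name)
        return path


router = MetadataRouter(field_name="Department", max_items_per_folder=2)
routed = [router.route(f"spec-{i}.docx", {"Department": "Finance"}) for i in range(3)]
print(routed)  # prints ['Finance', 'Finance', 'Finance/Overflow 1']
```

The model combines the two folder-creation triggers named in the text: a new folder for each new value in a field, and a child folder when the target folder reaches its item limit. Inherited settings, permissions, and retention policies are outside the scope of this sketch.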

For more information, see Metadata-based routing and storage overview (SharePoint Server 2010).

Navigating and filtering enterprise content by using metadata

Metadata Navigation and Filtering is a new feature in SharePoint Server 2010 that enables users to filter and find content by using metadata. The feature includes a simple user interface that builds on the SharePoint Tree view hierarchy control and combines it with a new Key Filters control, which gives users a powerful tool for finding content based on metadata.

List owners can configure metadata navigation settings that promote fields on a list as key navigation fields. Users viewing those lists can then additionally filter the current list view to show only items with the desired values in those fields.

Automatic indexing features can create list indexes automatically depending on the fields promoted as navigational fields for the list. Automatic indexing can improve query results and improve performance.
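To see why an index on a navigation field helps, consider the following minimal sketch. It is a conceptual model in plain Python, not a description of how SharePoint or SQL Server implements list indexes; the function names, the "Product" field, and the sample items are hypothetical.

```python
from collections import defaultdict


def build_index(items, field_name):
    """Map each value of field_name to the positions of the items that carry it."""
    index = defaultdict(list)
    for position, item in enumerate(items):
        index[item.get(field_name)].append(position)
    return index


def filter_with_index(items, index, value):
    """Return matching items by index lookup instead of scanning the whole list."""
    return [items[position] for position in index.get(value, [])]


# Hypothetical list items with "Product" promoted as a key navigation field.
items = [
    {"Title": "A", "Product": "X"},
    {"Title": "B", "Product": "Y"},
    {"Title": "C", "Product": "X"},
]
product_index = build_index(items, "Product")
matches = filter_with_index(items, product_index, "X")
print([item["Title"] for item in matches])  # prints ['A', 'C']
```

The index is built once, and each later filter touches only the matching items. This is the general reason that filtering on an indexed field can return a small result set without processing every item in a large list.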

For more information about how you can integrate metadata navigation into your enterprise content storage solution, see Metadata navigation overview (SharePoint Server 2010).

List views

At the heart of every enterprise content management solution is the ability for users to easily search for and find the content they are looking for. When moving through a library or folder, tree views and list views provide a simple interface for users to visually navigate through content storage taxonomy. At the same time, when a library or folder contains too many items, the ability for the list to query and quickly display results can require considerable system resources. SharePoint Server 2010 can maximize list view performance while minimizing system resource consumption by using Resource Throttling. Resource Throttling properties are set for a web application in General Settings in Central Administration and affect resources allocated to querying and displaying lists within that web application.

When you configure storage so that viewing the contents of a library or folder does not exceed the list view threshold, you avoid resource throttling and maximize list view performance.

Resource Throttling includes the following properties that relate to list view performance:

Property: List View Threshold
Description: The maximum number of list or library items that a database operation, such as a query, can process at one time outside the daily time window that the administrator sets for unrestricted queries. We recommend that you do not change this setting.
Default value: 5,000

Property: Object Model Override
Description: Specifies whether users who are granted special permission can override the List View Threshold programmatically for particular queries.
Default value: Yes

Property: List View Threshold for Auditors and Administrators
Description: The maximum number of list or library items that a database operation, such as a query, can process at one time when it is performed by an auditor or administrator with appropriate permissions. This setting works together with the Object Model Override property.
Default value: 20,000

Property: List View Lookup Threshold
Description: The maximum number of joins allowed per query, such as those based on lookup, Person/Group, or workflow status columns. If a query uses more than eight joins, the operation is blocked. This limit does not apply to single-item operations. When you use the maximal view through the object model (by not specifying any view fields), SharePoint returns up to the first eight lookups. We recommend that you do not change this setting.
Default value: 8

Property: Daily Time Window for Large Queries
Description: A time period in which large queries can be executed. Set this period outside regular working hours, because large queries can put a heavy load on the server.
Default value: Disabled
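The way these properties interact can be summarized as a decision function. The following is a simplified model built only from the descriptions above; it is not the SharePoint object model, and the class, field, and function names are hypothetical. It also assumes that the lookup limit still applies during the daily time window, which this article does not state.

```python
from dataclasses import dataclass


@dataclass
class ThrottlingSettings:
    """Default values taken from the property descriptions above."""
    list_view_threshold: int = 5000
    object_model_override: bool = True
    auditor_admin_threshold: int = 20000
    lookup_threshold: int = 8
    in_daily_time_window: bool = False  # the window is disabled by default


def is_query_allowed(settings, items_processed, lookup_joins,
                     is_auditor_or_admin=False, requests_override=False):
    """Return True if this simplified model would let the query run."""
    # Assumption: the join limit is checked first and always applies.
    if lookup_joins > settings.lookup_threshold:
        return False
    # Large queries are unrestricted during the daily time window.
    if settings.in_daily_time_window:
        return True
    limit = settings.list_view_threshold
    # The higher limit applies only when override is allowed and requested.
    if (settings.object_model_override and is_auditor_or_admin
            and requests_override):
        limit = settings.auditor_admin_threshold
    return items_processed <= limit


defaults = ThrottlingSettings()
print(is_query_allowed(defaults, 5001, 0))  # prints False
print(is_query_allowed(defaults, 15000, 0, is_auditor_or_admin=True,
                       requests_override=True))  # prints True
```

With the default settings, a query that processes 5,001 items is blocked for a regular user, while an auditor or administrator who requests the override programmatically can process up to 20,000 items.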

Additional resources

In addition to the information in this article, the following resources can help you understand and plan an enterprise content storage solution.