Survivor-Bias-Free US Mutual Fund Guide for SAS and ASCII


The CRSP Mutual Fund Database is designed to facilitate research on the historical performance of open-end mutual funds by using survivor-bias-free data.

The CRSP Survivor-Bias-Free US Mutual Fund Database includes a history of each mutual fund’s name, investment style, fee structure, holdings, and asset allocation. Also included are monthly total returns, monthly total net assets, monthly and daily net asset values, and dividends. Additionally, schedules of front and rear load fees, asset class codes, and management company contact information are provided. All data items are for publicly traded open-end mutual funds and begin at varying times between 1962 and 2008, depending on availability. The database is updated quarterly and distributed with a monthly lag. It is delivered in ASCII and SAS formats.

File Overview

Data Accuracy for the CRSP Survivor-Bias-Free Mutual Fund Database

The CRSP Mutual Fund files are designed for research and educational use. CRSP expends considerable resources in the ongoing effort to check and improve data quality, both historically and in each current update. Corrections to historical information are made as errors are identified and are detailed in the release notes that accompany each data cut.

Using Lipper and other data as sources for the mutual fund database, CRSP is able to do extensive data cross-checking. Quality Assurance and Quality Control procedures have been used throughout the process of updating the CRSP mutual fund database with data from new sources. These procedures included, but were not limited to, developing and carrying out testing plans based on process requirements and design, and ensuring that all steps of the process were documented and executed accordingly. A dedicated group of database researchers independently verified the results, using random sample selection when appropriate.

Known Biases in Mutual Fund Data

Return histories are sometimes duplicated in the database. For example, if a fund started in 1962 and split into four share classes in 1993, each new share class of the fund is permitted to inherit the entire return/performance history. This can create a bias when averaging returns across mutual funds, because the same pre-split history is counted once for each share class.
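
The effect of duplicated histories on a cross-sectional average can be illustrated with a minimal Python sketch. The field names (portfolio_id, share_class_id) and the return values are hypothetical and are not CRSP variable names or CRSP data. Collapsing share classes to the portfolio level before averaging is one common way to handle the duplication, not a method prescribed by this guide.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (portfolio_id, share_class_id, month, monthly_return).
# Portfolio P1 has four share classes that all inherited the same history;
# portfolio P2 has a single share class. All values are made up.
records = [
    ("P1", "P1-A", "1990-01", 0.02),
    ("P1", "P1-B", "1990-01", 0.02),
    ("P1", "P1-C", "1990-01", 0.02),
    ("P1", "P1-D", "1990-01", 0.02),
    ("P2", "P2-A", "1990-01", -0.01),
]


def naive_average(rows):
    """Equal-weighted mean across all share classes, per month.

    A portfolio with N share classes is counted N times.
    """
    by_month = defaultdict(list)
    for _, _, month, ret in rows:
        by_month[month].append(ret)
    return {month: mean(rets) for month, rets in by_month.items()}


def portfolio_average(rows):
    """Average share classes within each portfolio first, then across portfolios.

    Each portfolio is counted once per month regardless of its class count.
    """
    by_portfolio = defaultdict(list)
    for portfolio_id, _, month, ret in rows:
        by_portfolio[(month, portfolio_id)].append(ret)

    by_month = defaultdict(list)
    for (month, _), rets in by_portfolio.items():
        by_month[month].append(mean(rets))
    return {month: mean(rets) for month, rets in by_month.items()}


if __name__ == "__main__":
    print("naive:    ", naive_average(records))
    print("portfolio:", portfolio_average(records))
```

In this toy data the naive average is pulled toward P1 because its return appears four times, while the portfolio-level average weights P1 and P2 equally. Weighting share classes by total net assets within a portfolio is another option when class returns differ.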

A selection bias exists in favor of the best-performing private funds that later became public. The SEC has recently begun permitting some funds (and will probably eventually permit all funds) that have prior return histories as private funds to add those returns to the beginning of their public histories. As a result, only the histories of successful private funds are included in the database.

File Development and Data Sources

The CRSP Mutual Fund Database was created in three stages.

The original CRSP Mutual Fund Database contained open-end mutual fund data from December 1961 through December 1995. Mark M. Carhart developed the database for his 1995 dissertation submitted to the Graduate School of Business, “Survivor Bias and Persistence in Mutual Fund Performance,” to fill a gap in available data coverage. Eugene F. Fama and the Center for Research in Security Prices funded the original project.

The Center for Research in Security Prices continued Mr. Carhart’s work after his graduation. Historical data in the database were collected from printed sources, including the Fund Scope Monthly Investment Company Magazine, the Investment Dealers Digest Mutual Fund Guide, Investor’s Mutual Fund Guide, the United and Babson Mutual Fund Selector, and the Wiesenberger Investment Companies Annual Volumes.

The data were compiled into an annual list of active mutual fund names and attributes, along with organizational history such as name changes, mergers, and liquidations. Monthly returns were calculated back to January 1962. Funds that were not in the Wiesenberger Investment Companies Annual Volumes or other printed materials were added, although instances of this were rare. As the last step in this second stage, the data were checked against original and secondary sources for any unusual entries and typographical errors.

Beginning with the December 2007 iteration of the database, current and historical data back to August 1998 are provided electronically by Lipper and Thomson Reuters. New fund style data items have also been added to the original database.