Alternative Data 101 for Students and Professors

I recently met with a friend, an economics professor, who agreed with me that one of the best ways for rising academics to make a mark is to acquire and mine new data sets. I prepared for him a summary of resources to help students source and understand new, alternative data sets. If you’re looking for research ideas, I suggest perusing Versatile VC’s public research agenda.

Depending on the researcher’s reputation, goals with the data, and other parameters, you can get access to very useful data at no charge. But Alexey Loganchuk, co-founder, Augvest and Sidera Labs, warns:

“There is a selection bias at play here: the companies most willing to share data with academics tend to have some of the least useful / worst data out there. To get around this, I would advise academics interested in alt data research to contact resellers like Earnest Analytics or System2 to see if there are datasets they are actively looking to bring to market and could use help with; the resellers then act as a pre-screen on data quality.”

Jordan Hauer, Co-Founder & CEO, Amass Insights, said,

“Generally, if you can prove you are only planning to use the data for academic purposes, and you offer to share the research and insights you create back to the data provider, the provider will likely be open to providing their data products to you free of charge. Most data providers are looking to publish additional examples of the use cases for their data products. The best strategy is to propose a use case for that provider’s offerings that the provider is interested in further marketing. Should the provider decide to utilize/publish your research, your research can attain much greater exposure. I’m willing to give complimentary advice on sourcing data to academics if they are interested.”

I suggest you start by looking through the data offered by your university’s library system (much of which is expensive, commercially available data) and then the major searchable databases of alternative data, e.g., Wharton Research Data Services (WRDS, which specializes in academic datasets), Amass Insights, and numerous others. The data platforms/aggregators (Snowflake Marketplace, Knoema, AWS Data Exchange, Google Cloud Platform (GCP), Bloomberg Enterprise Access Point (EAP), Dawex) also provide access to many datasets. You could also contact some of the firms that track and provide access to a wide array of data vendors, and that can help identify new data sources for your specific needs, e.g., BattleFin, Eagle Alpha, and Neudata. The buy-side community has published a set of standards and guidelines that may help researchers pursue their research efficiently and ethically; see FISD Alternative Data Council.

To learn more, join some of the major conferences and communities in the space, e.g.: 

Jim McVeigh, CEO, Cyndx, observed, “We have developed sophisticated algorithms, for example, our Cyndx score, which predicts likelihood of success for capital raise or exit. 200 vectors influence it, summarized into 6 main vectors. We’d be interested in partnering with academics who could mine this to develop signals to help CEOs make better managerial decisions about their business.”


I’m an investor in about 100 private companies. Many of them sit on data sets that are mineable for academic purposes. Two are explicitly in the data business:

  • Earnest Analytics: They source, clean, and analyze a range of proprietary data sets. For example, Jacob A. Robbins of the Economics department at the University of Illinois at Chicago wrote a paper, “COVID-19 and Real-Time Consumer Spending”. Zach Amsel, an executive at Earnest, guest lectures annually at Columbia Business School on applying data to business cases. They also have a relationship with Emory Professor Dan McCarthy, who uses and cites their data often (examples).
  • Drop Technologies: They have directly opted-in credit and debit card transaction data, detailed at Cardify.ai.

Some of the companies in which I’m an investor via my prior VC funds are generating and sometimes selling large data sets as an organic byproduct of their core business, e.g.:

For further reading:

Thanks to the following for their contributions to this essay: Max Colas, Director of Marketing, RavenPack; Chris Petrescu, Founder and CEO, CP Capital; Michael Beal, Co-Chair, FISD Alternative Data Council; and Ken Perry, Founder, Slashrisk, and Adjunct Professor at NYU Tandon School of Engineering.
