
Composed by Anonymous Walrus

November 07, 2024

best website for scraping financial data

I explored several discussions on Reddit and one article from AI Multiple's research section to collect information on the best websites for scraping financial data. These discussions spanned a range of topics, from general advice on web scraping techniques to specific recommendations for financial data sources. There was a broad consensus that web scraping is a valuable tool for financial analysis, although the legality and ethics of the practice were also discussed. Despite the wide-ranging nature of the discussions, a few websites and services were consistently highlighted as useful for financial data scraping. My analysis of this information led me to focus on a few key options, including Finviz, Financial Modeling Prep, the SEC EDGAR database, and several web scraping service providers. However, due to the legal and ethical considerations involved, the choice of website or service will depend on the specific needs and comfort level of the user.


Finviz

Finviz, an online stock screener, was recommended by Reddit users for its comprehensive statistics, news, and ratings. It was seen as a great source of financial data for scraping. Some users suggested that Finviz offers unique features that make it stand out among other financial data sources. However, it's important to note that one user pointed out that the Finviz news API only provides a subset of the news. To get all the news on Finviz, they suggested learning Python and using it to call the API and parse the data.
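
Since Finviz serves its news as plain HTML tables rather than through a documented public API, "calling the API and parsing the data" in practice means fetching a page and walking its markup. The sketch below uses only Python's standard-library html.parser on an invented sample table; real Finviz markup uses different tags and class names, so treat the structure here as an assumption to adapt.

```python
from html.parser import HTMLParser

class NewsTableParser(HTMLParser):
    """Collect (link, headline) pairs from anchor tags in a news table."""
    def __init__(self):
        super().__init__()
        self.in_anchor = False
        self.current_href = None
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor = True
            self.current_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_anchor = False

    def handle_data(self, data):
        if self.in_anchor and data.strip():
            self.items.append((self.current_href, data.strip()))

# Invented markup mimicking a news table; the real page's markup will differ.
sample = '<table><tr><td><a href="https://example.com/story">Fed holds rates</a></td></tr></table>'
parser = NewsTableParser()
parser.feed(sample)
print(parser.items)  # [('https://example.com/story', 'Fed holds rates')]
```

In a real scraper the `sample` string would come from an HTTP fetch of the page, rate-limited and with the site's terms of service checked first.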
Financial Modeling Prep

Financial Modeling Prep was another website recommended by Reddit users. This platform offers simple APIs with many endpoints, which allow users to access a wide range of data. The pricing was also seen as quite reasonable, with one user estimating a cost of around $20 per month. However, it's worth noting that the specific data you need and how you plan to use it will determine whether this is the best option for you.
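
As a sketch of what "simple APIs with many endpoints" looks like in practice, the snippet below builds a quote URL in the general shape of FMP's REST API and parses a JSON payload shaped like its response. The exact endpoint path, parameter names, and response fields are assumptions from memory; check FMP's current documentation before relying on them.

```python
import json
from urllib.parse import urlencode

BASE = "https://financialmodelingprep.com/api/v3"

def quote_url(symbol: str, apikey: str) -> str:
    # Endpoint path and parameter name are assumed from FMP's documented
    # pattern; verify against the current API reference.
    return f"{BASE}/quote/{symbol}?{urlencode({'apikey': apikey})}"

# In practice you would fetch quote_url(...) with urllib.request or requests.
# Here we parse a payload shaped like a typical quote response (an assumption).
payload = '[{"symbol": "AAPL", "price": 189.95, "marketCap": 2950000000000}]'
quotes = json.loads(payload)
print(quotes[0]["symbol"], quotes[0]["price"])  # AAPL 189.95
```

Because the response is plain JSON, downstream use is just dictionary access, which is what makes this kind of API cheaper to work with than scraping HTML.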
SEC EDGAR Database

One Reddit user shared their experience of scraping roughly 12 years of financial data, covering more than 3,000 stocks, from the SEC EDGAR database. Because the data came from EDGAR's XBRL data files, it was already structured: the files tag US-GAAP common accounting terms that can be extracted with standard XML parsing. This data was useful for long-term plays, such as identifying under- or overvalued stocks and trends, but less so for algorithmic day trading. The scraped dataset is available on Kaggle and can be downloaded by anyone with a Kaggle account.
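
Because XBRL instance documents are plain XML, extracting the US-GAAP facts takes nothing more than a standard XML parser. A minimal sketch, using an invented miniature instance document: real EDGAR filings are far larger, carry per-context and per-unit metadata, and use namespace URIs that vary by taxonomy year, so the namespace below is an assumption.

```python
import xml.etree.ElementTree as ET

# Invented XBRL-instance-like snippet; real filings have many contexts/units.
SAMPLE = """<xbrl xmlns:us-gaap="http://fasb.org/us-gaap/2023">
  <us-gaap:Revenues contextRef="FY2023" unitRef="usd">383285000000</us-gaap:Revenues>
  <us-gaap:NetIncomeLoss contextRef="FY2023" unitRef="usd">96995000000</us-gaap:NetIncomeLoss>
</xbrl>"""

NS = "{http://fasb.org/us-gaap/2023}"  # namespace varies by taxonomy year

def extract_facts(xml_text):
    """Return {concept_name: value} for every us-gaap fact in the document."""
    root = ET.fromstring(xml_text)
    facts = {}
    for el in root:
        if el.tag.startswith(NS):
            facts[el.tag[len(NS):]] = float(el.text)
    return facts

print(extract_facts(SAMPLE))
# {'Revenues': 383285000000.0, 'NetIncomeLoss': 96995000000.0}
```

A full pipeline would also resolve each fact's contextRef to a reporting period, which is what makes quarterly time series possible.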

Web Scraping Service Providers

AI Multiple's research article listed numerous web scraping service providers, such as Bright Data, Oxylabs, and Nimble, which offer various services like proxies, scraper APIs for specific pages, and a generalist scraper API. These services could be especially useful for those who need more customized scraping requirements or who prefer not to build and manage their own scraping tools. Although these services come with a cost, they may provide a more efficient and reliable way to access and collect financial data.


Research

"https://research.aimultiple.com/web-scraping-for-finance/"

  • Web scraping is a valuable tool for financial and market research, with spending projected to reach $2B by 2020.
  • A list of the top financial data scrapers of 2024 is provided, including their pricing and services:
    • Bright Data offers proxies, an API and IDE for web scraping, keyword automation, and click-through testing.
    • Oxylabs offers proxies, scraper APIs for specific pages and a generalist scraper API.
    • Nimble offers proxies, scraper APIs for specific pages and a generalist scraper API.
    • NetNut offers proxies and a search engine results page (SERP) scraper API.
    • Smartproxy offers proxies, scraper APIs and a generalist no-code scraper.
    • Diffbot offers a generalist scraper API.
    • Octoparse offers a generalist no-code scraper.
    • Scraper API offers a generalist scraper API.
    • SOAX offers proxies, scraper APIs for specific pages and a generalist scraper API.
    • Zyte offers a generalist scraper API.
  • Web scraping is important for finance as it helps to analyze the current status of the financial market, uncover market changes and trends, monitor news that may affect stocks and economics, and evaluate consumer sentiment and behavior.
  • Some of the use cases of web scraping in finance include:
    • Equity research: Web scrapers gather data about businesses and companies, such as market prices, inventory data, clients’ portfolios, product data, product reviews, and company news, to be used for analysis by an equity researcher.
    • Credit ratings: Web scrapers can aggregate data about a business’ financial status from company online resources as well as online public records in order to calculate a data-driven credit rating score which is especially useful for institutional investors, banks, and asset managers.
    • Venture capital funding: Venture capitalists can leverage web scraping to create start-up lists and collect data about their funding from websites such as TechCrunch or CrunchBase.
    • Compliance: Scraping government and news outlets enables financial institutions to keep track of regulations and policy changes to ensure compliance.
    • Market sentiment analysis: Automating the extraction of relevant data using web scrapers provides businesses with constant updates about the general population’s sentiment about specific products or brands, and enables financial leaders to predict the success or failure of certain stocks or ETFs in the market.
  • Web scrapers can collect a variety of financial data which includes industry data, stock market data, company data, news data and alternative data.

"I scraped ~12 years of Financials Data from the SEC EDGAR database for 3000+ stocks"

  • Someone scraped ~12 years of financial data from the SEC EDGAR database which covers 3000+ stocks using Python.
  • The data was scraped from the SEC EDGAR’s XBRL data files and was therefore already in a structured format.
  • SEC should be fine with people scraping their XBRL data because it is essentially public information.
  • The scraped data is from quarterly reports publicly available and easily accessible from the SEC.
  • This type of data is useful for long-term plays, possibly identifying under/overvalued stocks or trends, but less so for algo day trading.
  • The purpose of this scrape was for backtesting day trading strategies, identifying trends, and identifying under/overvalued stocks.
  • The scraped data is available on Kaggle and anyone with a Kaggle account can download multiple versions.
  • The scrape took around 10 hours and the frequency of the scraped data is quarterly.
  • The data has missing fields, for example, the revenue column has many empty rows.
  • Each SEC financials data vendor has the same data because they are from the same source, and hence, there is too much competition in the market for this data.
  • The purpose of the scrape is not for financial gain or selling the data, as there are already many vendors for this data. Rather it is an attempt to share the dataset with other investors or traders to help innovate their strategies.
  • There is a comment asking if SEC approves scraping of their data.
  • The scraped XBRL data contains US-GAAP common accounting terms that can be parsed by XML parsing.
  • Someone requested the link to the script that was used to scrape the data from SEC EDGAR.
  • Some other information includes comments like “huge crash. puts on TSLA @200 aug12th” and “You son of a bitch, I’m in” which are not relevant to the query!

"best websites to scrape financial news from"

  • Reddit users in r/algotrading discussed scraping financial news for building a sentiment trader and paper trading.
  • One user suggested subscribing to RSS feeds for collecting financial data into a database to analyze.
  • Several financial news APIs were suggested, including Finviz, Yahoo, and Financial Modeling Prep.
  • Some users suggested using Selenium and document digitalization solutions as potential alternatives to scraping.
  • Users shared their opinions on different websites/platforms for financial data and bot building tools.
  • Some users suggested Finviz as a great website with comprehensive statistics, news, and ratings. It was also suggested as an easy source of financial data for scraping.
  • Financial Modeling Prep was recommended since it provides simple APIs with many endpoints that are relatively cheap.
  • Some users recommended Gurufocus, Investing.com, and Tradexchange for general market data.
  • Users shared their tips on using signals from news sentiment, particularly in the context of building a profitable trading bot.
  • One user suggested learning programming first, noting that the choice of language doesn’t really matter and to use what you are most comfortable with, such as functional programming, Python, or MATLAB.
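
The RSS-feed suggestion above is straightforward to sketch with the standard library: parse the feed XML and insert items into a local SQLite table, deduplicating on the link. The feed payload and table schema below are toy assumptions; a real pipeline would fetch live feed URLs on a schedule.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Toy RSS payload; a real feed would be fetched with urllib.request.
RSS = """<rss version="2.0"><channel><title>Demo</title>
  <item><title>Fed holds rates steady</title><link>https://example.com/a</link></item>
  <item><title>TSLA earnings beat</title><link>https://example.com/b</link></item>
</channel></rss>"""

def store_items(rss_text, conn):
    """Insert feed items, skipping links already seen."""
    conn.execute("CREATE TABLE IF NOT EXISTS news (title TEXT, link TEXT UNIQUE)")
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        conn.execute("INSERT OR IGNORE INTO news VALUES (?, ?)",
                     (item.findtext("title"), item.findtext("link")))
    conn.commit()

conn = sqlite3.connect(":memory:")
store_items(RSS, conn)
print(conn.execute("SELECT COUNT(*) FROM news").fetchone()[0])  # 2
```

The UNIQUE constraint plus INSERT OR IGNORE is what makes repeated polling of the same feed idempotent, so the database accumulates only new headlines for sentiment analysis.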

"My ultimate guide to web scraping"

  • A Reddit user shared their ultimate guide to web scraping for a data science project, covering everything from basic news scraping to large-scale data analysis and NLP.
  • Tips were shared on how to organize HTML content for large web scraping projects and how to save HTML files in case something goes wrong with the scraping.
  • Other users in the comments requested additional guidance on scheduling scraper jobs and merging the resulting data.
  • A debate in the comments arose about the ethics of scraping websites that prohibit web scraping through robots.txt files.
  • Users also brought up the prevalence of JavaScript on modern websites and the use of Selenium for scraping such sites.
  • Questions were asked about scraping different tables from a website that uses the same element tags for different tables but without any distinct features.
  • Examples were shared on how to scrape information from a sports page with a bracket system, and on accessing or scraping the underlying data in spreadsheets behind the numerous data visualizations.
  • Questions were also asked about how people find and connect with web scraping experts for help.

"Is Web Scraping Legal? A Comprehensive Review Of The Legality Of Web Scraping In 2021"

  • The title of the video that the transcript describes is “Is Web Scraping Legal? A Comprehensive Review Of The Legality Of Web Scraping In 2021”.
  • One user believes that if web scraping was illegal, we would be stuck in a technological time warp, as search engines, information exchange, and progress in general are largely based on web scraping.
  • Another user says that judges have ruled in favor of companies being legally unable to prevent web scraping, even of semi-private data, although they also suggest that this may be an encroachment on an individual’s right to privacy.
  • A third user posits that if web scraping was illegal, Google would no longer exist.
  • A fourth user argues that if data is made publicly accessible, it should be “scrapable”. They suggest that complaining about others reading information that was deliberately put in public view is illogical.
  • One user comments that they love fishing, despite the subreddit not being about the topic.
  • Another user shares a link to a blog post that they believe might be helpful to those seeking to evaluate the legal risk of web scraping projects. They also advertise a data extraction service.
  • One user takes issue with the blog post, suggesting that it is not a definitive guide and does not cover the relevant legislation in enough detail. They also question how the technology would be able to guarantee customers that data is not being scraped illegally.
  • This user further takes interest in the accuracy of identifying “data” as “personal data” as opposed to “non-personal data”.
  • The same user provides a sci-hub link to an article about the legal framework surrounding data protection in the EU.
  • Another user comments that they would be out of a job if web scraping were illegal.
  • Overall, the transcript does not provide any specific information about the best website(s) to scrape financial data from. Rather, it primarily focuses on the legality of web scraping and related issues.

"[deleted by user]"

  • A Reddit user says web scraping could be against a website’s Terms of Service (ToS) if the data is scraped after login and then resold as a business.
  • Another Reddit user confirms that web scraping is mostly against ToS, but not strictly illegal, as long as the data is collected in an ethical manner and the target website is not overloaded or accessed behind logins.
  • A separate Reddit user claims that web scraping cannot be illegal, as the collected data is not that sensitive.
  • A Reddit user recommends reading and respecting the robots.txt and legal framework to avoid any legal trouble when web scraping. They provide links to PromptCloud’s articles about the legality of web crawling and how to read and respect robots.txt.
  • To answer a question about scraping financial data, one Reddit user suggests taking one of three approaches: using your own scraper, using web scraping tools available in the market, or opting for web scraping service providers for more customized scraping requirements.
  • One Reddit user who works for SerpApi promotes their service that provides data scraping for stock data on their “answer_box” field, with a link to their legal and Terms of Service.
  • A Reddit user warns that web scraping, although not illegal, is a grey area because each website has its own ToS.
  • Another Reddit user admits that web scraping excites them because it is illegal, and they claim to have bypassed some Cloudflare protections, such as email addresses and other content shielded from scraping.
  • A Reddit user challenges the previous user to verify their claim. The challenged user refers the challenger to their blog post about how to decode Cloudflare email addresses.
  • Another Reddit user clarifies that just because a website does not want their data scraped, it does not necessarily make scraping illegal.

Overall, the transcript provides information about the legality and ethical considerations of web scraping, as well as some recommendations for how to approach scraping financial data. The discussion also touched on users’ experiences and opinions, including some who see web scraping as a grey area and/or an exciting challenge.

"Legal risk from scraping?"

  • The text discusses the legal risks associated with web scraping and provides some general guidelines for avoiding legal issues. Some comments also touch on the specific legal and ethical considerations around data privacy, IP, and PII.
  • Some tips for scraping ethically and responsibly include: checking the robots.txt file, not overloading the server, respecting terms of service, staying up to date with changing page structures, leveraging APIs or alternate means of accessing data, and safeguarding any data you scrape, particularly sensitive or private data covered by data-protection laws.
  • Some specific tips for avoiding legal issues include avoiding request patterns that resemble DDoS attacks, respecting copyright laws, and approaching the owner of a website to find a compromise or more manageable approach for scraping their data.
  • One user warns that many data companies intentionally poison their databases with fake data to catch and sue scrapers, even if they are not violating provisions within robots.txt files.
  • A user suggests that website policies and click-wrap terms of service agreements may have different levels of legal binding for scrapers than for typical website users.
  • A legal professional suggests in a separate blog post linked within the comments that scrapers may also be subject to liability if the scraped data is used for marketing or if the content of the data is altered by the scraper.
  • The comments contain assertions for and against whether scraping public data is intrinsically legal or illegal, depending on the specific situation.
  • While the transcript does not provide any financial data scraping tips specifically, the general advice on legal and ethical considerations for scraping may be helpful for anyone scraping any type of data.
  • Finally, the easiest way to sidestep the legal grey area of web scraping may be to leverage accessible public APIs or other sanctioned means of accessing data, where possible.
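
For the robots.txt checks recommended above, Python ships urllib.robotparser, which answers can-fetch and crawl-delay questions against a site's rules. The sketch below parses an inline example file; a real crawler would instead point set_url() at the live robots.txt and call read(). The bot name and rules here are invented.

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# Parse an inline robots.txt; in production, use rp.set_url(...) + rp.read().
rp.parse("""User-agent: *
Disallow: /private/
Crawl-delay: 5
""".splitlines())

print(rp.can_fetch("my-finance-bot", "https://example.com/quotes/AAPL"))   # True
print(rp.can_fetch("my-finance-bot", "https://example.com/private/data"))  # False
print(rp.crawl_delay("my-finance-bot"))  # 5
```

Honoring the returned crawl delay between requests addresses the "don't overload the server" advice as well as the robots.txt one.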
