Note: The following post is a significant step up in difficulty from the previous Selenium post, Automate Your Browser: A Guided Selenium Adventure. Please see the start of that post for links on getting Selenium set up if this is your first time using it. If you really do need financial data, there are likely easier ways to obtain it than scraping Nasdaq, Yahoo, or Morningstar with Selenium: Quandl, Yahoo's finance API, or perhaps a scraper built with scrapy and splash. There are also many proprietary (and expensive) databases that provide such data. In any case, I hope this post is helpful in demonstrating a few more of the practices involved in real-life webscraping. The full script is at the end of the post for your convenience.
One fine Monday morning, Todd is sipping a hot cup of decaf green tea, gazing out the office window in a state of Zen oneness as a Selenium script does his work for him. But just as he is on the brink of enlightenment, his boss, Mr. Peabody, bursts into his cubicle and barks, “TODD, quit daydreaming. I just got word from the CEO: we need quarterly financials on some of our competitors.” “Oh? What for?” “Some competitive analysis or something. We’ll be doing it on a regular basis. In any case, we need that data TODAY or YOU’RE FIRED!”
As Mr. Peabody stomps away, Todd lets out a sigh. His morning had been going so well, but now it seems he has to actually do some work. He decides, though, that if he's going to do work, he's going to do everything in his power to make sure he never has to do that work again. Brainstorming sources of financial data, Todd figures he could get it from nasdaq.com as easily as anywhere else. He navigates to the quarterly income statement of the first company on the list, Apple (ticker symbol: AAPL).
http://www.nasdaq.com/symbol/aapl/financials?query=income-statement&data=quarterly
The first thing Todd notices is that the actual financial data table is being generated via JavaScript (look for the <script> tags in the HTML). This means that Python packages such as lxml and Beautiful Soup, which don't execute JavaScript, won't be much help here. Todd knows that Selenium doesn't make for the fastest webscraper, but because he only needs data on 5 companies (Amazon, Apple, Facebook, IBM, Microsoft), he still decides to write up another quick Selenium script.
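To confirm this, one can fetch the raw page without a browser and run the same sort of xpath against it; since the figures are filled in client-side, the query should come up empty. Here is a minimal sketch using requests and lxml:

import requests
from lxml import html

url = "http://www.nasdaq.com/symbol/aapl/financials?query=income-statement&data=quarterly"
tree = html.fromstring(requests.get(url).content)

## the same pattern the scraper will use to grab dollar figures;
## it finds nothing in the static HTML because the JavaScript hasn't run
cells = tree.xpath("//tbody/tr/th[text() = 'Total Revenue']/../td[contains(text(), '$')]")
print(len(cells))  # expect 0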
To start, he knows he needs to make some imports, initialize a dataframe to store his scraped data in, and launch the browser.
import pandas as pd
from numpy import nan
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

## create a pandas dataframe to store the scraped data
df = pd.DataFrame(index=range(20),
                  columns=['company', 'quarter', 'quarter_ending',
                           'total_revenue', 'gross_profit', 'net_income',
                           'total_assets', 'total_liabilities',
                           'total_equity', 'net_cash_flow'])

## launch the Chrome browser
my_path = "C:\\Users\\gstanton\\Downloads\\chromedriver.exe"
browser = webdriver.Chrome(executable_path=my_path)
browser.maximize_window()
Next, Todd thinks about how he’s going to get from company page to company page. Observing the current page’s url, he sees that substituting in the company’s ticker symbol and desired financial statement at the appropriate places should allow him to navigate to all the pages he needs, no simulated-clicking required. He also sees a common pattern in the xpath for the financial data he’ll be scraping.
url_form = "http://www.nasdaq.com/symbol/{}/financials?query={}&data=quarterly"
financials_xpath = "//tbody/tr/th[text() = '{}']/../td[contains(text(), '$')]"

## company ticker symbols
symbols = ["amzn", "aapl", "fb", "ibm", "msft"]

for i, symbol in enumerate(symbols):
    ## navigate to income statement quarterly page
    url = url_form.format(symbol, "income-statement")
    browser.get(url)
The first thing he wants to grab is the company ticker symbol, just so he can verify he’s scraping the correct page.
for i, symbol in enumerate(symbols):
    ## navigate to income statement quarterly page
    url = url_form.format(symbol, "income-statement")
    browser.get(url)

    ## wait (up to 10 seconds) for the page header before scraping anything
    company_xpath = "//h1[contains(text(), 'Company Financials')]"
    try:
        company = WebDriverWait(browser, 10).until(
            EC.presence_of_element_located((By.XPATH, company_xpath))).text
    except TimeoutException:
        company = nan
Notice the assignment of the company variable. The WebDriverWait tells the browser to check whether the element is present, just as it normally would. If the element isn't present, the browser checks again every half second (the default polling interval) until the specified 10 seconds are up, at which point it raises a TimeoutException. This sort of precaution can be very useful for making your scrapers more reliable.
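For reference, both the timeout and the polling interval can be set explicitly, and the timeout can be caught. A minimal sketch, reusing the browser, By, and EC names from the script above:

from selenium.common.exceptions import TimeoutException

## poll every 0.25 seconds instead of the default 0.5, for up to 10 seconds total
wait = WebDriverWait(browser, 10, poll_frequency=0.25)
try:
    header = wait.until(EC.presence_of_element_located(
        (By.XPATH, "//h1[contains(text(), 'Company Financials')]")))
except TimeoutException:
    header = None  # element never appeared; handle the miss gracefully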
Examining the xpaths for the rest of the financial info, Todd sees that he will be collecting data points in groups of 4 (one data point for each quarter). To account for the possibility that some data might be missing, and to efficiently extract the text from the web elements, Todd writes the following function to simplify the scraping code.
## return nan values if elements not found, and convert the webelements to text
def get_elements(xpath):
    elements = browser.find_elements_by_xpath(xpath)  # find the elements
    if len(elements) != 4:  # if any are missing, return all nan values
        return [nan] * 4
    else:  # otherwise, return just the text of each element
        text = []
        for e in elements:
            text.append(e.text)
        return text
Todd then finishes the code to loop through each of the company symbols and get the quarterly financial data from each of the financial statements.
## company ticker symbols
symbols = ["amzn", "aapl", "fb", "ibm", "msft"]

for i, symbol in enumerate(symbols):
    ## navigate to income statement quarterly page
    url = url_form.format(symbol, "income-statement")
    browser.get(url)

    company_xpath = "//h1[contains(text(), 'Company Financials')]"
    try:
        company = WebDriverWait(browser, 10).until(
            EC.presence_of_element_located((By.XPATH, company_xpath))).text
    except TimeoutException:
        company = nan

    quarters_xpath = "//thead/tr[th[1][text() = 'Quarter:']]/th[position()>=3]"
    quarters = get_elements(quarters_xpath)

    quarter_endings_xpath = "//thead/tr[th[1][text() = 'Quarter Ending:']]/th[position()>=3]"
    quarter_endings = get_elements(quarter_endings_xpath)

    total_revenue = get_elements(financials_xpath.format("Total Revenue"))
    gross_profit = get_elements(financials_xpath.format("Gross Profit"))
    net_income = get_elements(financials_xpath.format("Net Income"))

    ## navigate to balance sheet quarterly page
    url = url_form.format(symbol, "balance-sheet")
    browser.get(url)

    total_assets = get_elements(financials_xpath.format("Total Assets"))
    total_liabilities = get_elements(financials_xpath.format("Total Liabilities"))
    total_equity = get_elements(financials_xpath.format("Total Equity"))

    ## navigate to cash flow quarterly page
    url = url_form.format(symbol, "cash-flow")
    browser.get(url)

    net_cash_flow = get_elements(financials_xpath.format("Net Cash Flow"))
So for each iteration of the loop, Todd is collecting these data points. But he needs somewhere to store them. That’s where the pandas dataframe comes in. The following for loop ensures that the data is placed appropriately in the dataframe.
    ## fill the dataframe with the scraped data, 4 rows per company
    for j in range(4):
        row = (i * 4) + j
        df.loc[row, 'company'] = company
        df.loc[row, 'quarter'] = quarters[j]
        df.loc[row, 'quarter_ending'] = quarter_endings[j]
        df.loc[row, 'total_revenue'] = total_revenue[j]
        df.loc[row, 'gross_profit'] = gross_profit[j]
        df.loc[row, 'net_income'] = net_income[j]
        df.loc[row, 'total_assets'] = total_assets[j]
        df.loc[row, 'total_liabilities'] = total_liabilities[j]
        df.loc[row, 'total_equity'] = total_equity[j]
        df.loc[row, 'net_cash_flow'] = net_cash_flow[j]
After remembering to close the browser and write his dataframe to a .csv file, Todd has his scraper. Kicking his feet back up on his desk, he breathes a sigh of relief and continues his deep meditations on the nature of being while selenium once again does his work for him.
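Those finishing touches are just two lines (they appear again at the end of the full script):

browser.quit()

## create a csv file in our working directory with our scraped data
df.to_csv("test.csv", index=False)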
If you enjoyed this post be sure to subscribe, and let me know if you have any other topics you’d like to see covered. Full script below.
import pandas as pd
from numpy import nan
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

## return nan values if elements not found, and convert the webelements to text
def get_elements(xpath):
    elements = browser.find_elements_by_xpath(xpath)  # find the elements
    if len(elements) != 4:  # if any are missing, return all nan values
        return [nan] * 4
    else:  # otherwise, return just the text of each element
        text = []
        for e in elements:
            text.append(e.text)
        return text

## create a pandas dataframe to store the scraped data
df = pd.DataFrame(index=range(20),
                  columns=['company', 'quarter', 'quarter_ending',
                           'total_revenue', 'gross_profit', 'net_income',
                           'total_assets', 'total_liabilities',
                           'total_equity', 'net_cash_flow'])

## launch the Chrome browser
my_path = "C:\\Users\\gstanton\\Downloads\\chromedriver.exe"
browser = webdriver.Chrome(executable_path=my_path)
browser.maximize_window()

url_form = "http://www.nasdaq.com/symbol/{}/financials?query={}&data=quarterly"
financials_xpath = "//tbody/tr/th[text() = '{}']/../td[contains(text(), '$')]"

## company ticker symbols
symbols = ["amzn", "aapl", "fb", "ibm", "msft"]

for i, symbol in enumerate(symbols):
    ## navigate to income statement quarterly page
    url = url_form.format(symbol, "income-statement")
    browser.get(url)

    company_xpath = "//h1[contains(text(), 'Company Financials')]"
    try:
        company = WebDriverWait(browser, 10).until(
            EC.presence_of_element_located((By.XPATH, company_xpath))).text
    except TimeoutException:
        company = nan

    quarters_xpath = "//thead/tr[th[1][text() = 'Quarter:']]/th[position()>=3]"
    quarters = get_elements(quarters_xpath)

    quarter_endings_xpath = "//thead/tr[th[1][text() = 'Quarter Ending:']]/th[position()>=3]"
    quarter_endings = get_elements(quarter_endings_xpath)

    total_revenue = get_elements(financials_xpath.format("Total Revenue"))
    gross_profit = get_elements(financials_xpath.format("Gross Profit"))
    net_income = get_elements(financials_xpath.format("Net Income"))

    ## navigate to balance sheet quarterly page
    url = url_form.format(symbol, "balance-sheet")
    browser.get(url)

    total_assets = get_elements(financials_xpath.format("Total Assets"))
    total_liabilities = get_elements(financials_xpath.format("Total Liabilities"))
    total_equity = get_elements(financials_xpath.format("Total Equity"))

    ## navigate to cash flow quarterly page
    url = url_form.format(symbol, "cash-flow")
    browser.get(url)

    net_cash_flow = get_elements(financials_xpath.format("Net Cash Flow"))

    ## fill the dataframe with the scraped data, 4 rows per company
    for j in range(4):
        row = (i * 4) + j
        df.loc[row, 'company'] = company
        df.loc[row, 'quarter'] = quarters[j]
        df.loc[row, 'quarter_ending'] = quarter_endings[j]
        df.loc[row, 'total_revenue'] = total_revenue[j]
        df.loc[row, 'gross_profit'] = gross_profit[j]
        df.loc[row, 'net_income'] = net_income[j]
        df.loc[row, 'total_assets'] = total_assets[j]
        df.loc[row, 'total_liabilities'] = total_liabilities[j]
        df.loc[row, 'total_equity'] = total_equity[j]
        df.loc[row, 'net_cash_flow'] = net_cash_flow[j]

browser.quit()

## create a csv file in our working directory with our scraped data
df.to_csv("test.csv", index=False)
Taylor says
Could you either point me to an XPath tutorial or create an XPath tutorial?
Thanks!
Grayson Stanton says
Hi there. Sure thing, here are a few. And here’s an overview of all the different options for selecting elements in selenium. I do plan to do an xpath tutorial at some point in the near future, so look for that. Best of luck!
Jay says
Just wanted to say thank you for providing such clear and helpful tutorials. I was stuck for a week on trying to understand selenium before I found your site.
Grayson Stanton says
Hi Jay. Thanks for your comment, I'm glad you found the tutorials helpful. Let me know if there are any other subjects you're stuck on; I'm always open to suggestions on what to cover.
Jay says
For some reason I'm only retrieving one quarter of data for Amazon, Apple, Facebook, and IBM. But for Microsoft, I'm getting all four quarters. I tried to play around with the code but I haven't been successful.
Grayson Stanton says
Ah, thanks for pointing that out, there was indeed a bug in the code. The last section of code, where the data is copied into the dataframe, wasn't indexing rows properly, so each company's data was being overwritten by the next company's. Microsoft's data survived only because Microsoft came last. I also noticed I made the initial dataframe too big (I think originally I was going to do 10 companies, so 40 rows were needed), but that's changed now too.
Note that there is also a popup that sometimes appears, which can result in a page's data not being scraped. To get around this, you could try clicking to close the window, or even reload the page if the expected elements aren't detected (assuming the popup isn't likely to appear twice in a row). I'm planning to write another financial-scraping post that shouldn't have this problem and should improve on this whole strategy, though hopefully this is still useful as an exercise in selenium.
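A rough sketch of that fallback, in the style of the script above (the popup's close-button xpath is purely a guess here; you'd need to inspect the actual popup to find the right selector):

from selenium.common.exceptions import NoSuchElementException, TimeoutException

def get_page_with_retry(url, check_xpath):
    ## load the page and wait for a known element; if it never appears,
    ## try to dismiss a popup and reload once
    browser.get(url)
    try:
        WebDriverWait(browser, 10).until(
            EC.presence_of_element_located((By.XPATH, check_xpath)))
    except TimeoutException:
        try:
            ## hypothetical close button; adjust to the real popup
            browser.find_element_by_xpath("//button[contains(@class, 'close')]").click()
        except NoSuchElementException:
            pass
        browser.get(url)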
Thanks again, hope this helps.
Jay says
Thank you for getting back to me, everything works now. I think I'm going to try to add on to this and eventually create a discounted cash flow model. Thanks again for the help, I appreciate it.
michel Dupuis says
Thank you, this is the best tutorial I've found on the subject!
https://www.nasdaq.com/symbol/dcix/financials
Just a small bug when I tried it with "dcix" (last Period Ending: 12/31/2016)!
Get Quarterly Data =>
There is currently no data for this symbol.
Thanks (Michel from France)
Grayson Stanton says
Hi Michel. It looks like, for whatever reason, Nasdaq is currently missing that data for DCIX. However, Yahoo, Marketwatch, and Morningstar all appear to have it, so you might check those out:
https://finance.yahoo.com/quote/DCIX/financials?p=DCIX
https://www.marketwatch.com/investing/stock/dcix/financials/income/quarter
http://financials.morningstar.com/income-statement/is.html?t=DCIX&region=usa&culture=en-US
Michel says
Morningstar is the fastest to update the data.
For example, ADXS reported March 12 AMC => updated March 14, nothing on the other sites.
You can also go directly to the source:
https://www.sec.gov/edgar/searchedgar/companysearch.html
search_bar = browser.find_element_by_xpath("//label[@for='cik']")
search_bar.send_keys('ACY')  # ACY reported March 13 AMC
Bug!!!!
Maybe you can help!
Grayson Stanton says
Hi Michel. Assuming you’re sending keys to the “Fast Search” bar, this worked for me:
search_bar = browser.find_element_by_xpath("//input[@id='cik']")
search_bar.send_keys('ACY')
If you still get an error from this, make sure you have the most up-to-date gecko/chrome driver, browser, and selenium module. Let me know how this goes for you.
Michel says
Hello,
after some trial and error:
search_bar = browser.find_element_by_xpath("//*[@id='fast-search']/fieldset/input[@type='text']")
search_bar.send_keys('ACY')
search_button = WebDriverWait(browser, 5).until(EC.presence_of_element_located((By.XPATH, "//*[@id='fast-search']/fieldset/input[@id='cik_find']"))).click()
dernier = WebDriverWait(browser, 5).until(EC.presence_of_element_located((By.XPATH, "//*[@id='interactiveDataBtn']"))).click()
F_stat = WebDriverWait(browser, 5).until(EC.presence_of_element_located((By.XPATH, "//*[@id='menu_cat2']"))).click()  # expand the menu
cons_bal = WebDriverWait(browser, 5).until(EC.presence_of_element_located((By.XPATH, "//*[@id='r2']/a"))).click()
Have a good day
Grayson Stanton says
Awesome, I’m glad you got that figured out. Best of luck!
Michel Dupuis says
I tried to do the same with Yahoo, without success…!
https://finance.yahoo.com/quote/adsk/financials?p=adsk
Could you help me?
f_xpath_yahoo = "//tbody/tr/td[text() = 'Total Revenue']/????
#copy xpath => //*[@id="Col1-1-Financials-Proxy"]/section/div[3]/table/tbody/tr[2]/td[1]/span
#copy selector => #Col1-1-Financials-Proxy > section > div.Mt\28 10px\29.Ovx\28 a\29.W\28 100\25 \29 > table > tbody > tr:nth-child(2) > td.Fz\28 s\29.H\28 35px\29.Va\28 m\29 > span
#copy element => Total Revenue
I am completely lost …!
Grayson Stanton says
Hi Michel. Here’s part of the script which I’ve revised to work on Yahoo:
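The gist of it: on Yahoo the row labels and figures are wrapped in spans, so the xpath needs to drill into those. This is just a sketch reconstructed from the devtools output you pasted; treat the exact xpath as an assumption to verify against the live page:

## sketch only: assumes each cell's text sits inside a <span>,
## as in the xpath Chrome's devtools copied above
yahoo_url = "https://finance.yahoo.com/quote/{}/financials?p={}"
f_xpath_yahoo = "//table/tbody/tr[td[1]/span[text() = '{}']]/td[position()>1]/span"

browser.get(yahoo_url.format("adsk", "adsk"))
total_revenue = [e.text for e in browser.find_elements_by_xpath(
    f_xpath_yahoo.format("Total Revenue"))]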
Does this make any sense? See if you can figure the rest out from here. If not, let me know. Best of luck!
Michel says
Thanks,
it works perfectly!
The browser windows are just a bit slow to open…
But since PhantomJS no longer works,
there's no other solution.
Thanks again
Joshua Thompson says
Hello, is there a way to adapt this code to Google Colab? Thank you very much!
Grayson Stanton says
See if this stackoverflow question helps at all: https://stackoverflow.com/questions/51046454/how-can-we-use-selenium-webdriver-in-colab-research-google-com
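The short version from that thread (Colab's preinstalled packages change over time, so treat this as a starting point rather than a guarantee):

## in a Colab cell: install chromium and its matching driver
!apt-get update
!apt-get install -y chromium-chromedriver

from selenium import webdriver

## Colab has no display, so Chrome must run headless
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
browser = webdriver.Chrome('chromedriver', options=options)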