All summary statistics by sector
You can apply the summary statistics that you learned about in the last chapter to a groupby
object to obtain the result on a per-category basis. This includes the .describe()
method, which provides several insights all at once!
Here, you will practice this with the NASDAQ listings. pandas has been imported as pd, and the NASDAQ stock exchange listings data is available in your workspace in the nasdaq DataFrame.
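To make the idea of per-category statistics concrete, here is a minimal sketch on a made-up table. The listings DataFrame, its price column, and all values are hypothetical stand-ins, not the real nasdaq data.

```python
import pandas as pd

# Hypothetical stand-in for a listings table; all values are made up.
listings = pd.DataFrame({
    "Sector": ["Technology", "Technology", "Finance", "Finance", "Finance"],
    "price": [100.0, 50.0, 20.0, 30.0, 40.0],
})

# Grouping returns a groupby object; nothing is computed yet.
by_sector = listings.groupby("Sector")

# A single summary statistic, computed once per sector.
mean_price = by_sector["price"].mean()
median_price = by_sector["price"].median()

# Several summary statistics at once for the same column.
price_summary = by_sector["price"].describe()

print(mean_price)
print(price_summary)
```

Each result is indexed by the sector labels, so a single value can be looked up with, for example, mean_price["Finance"].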
This exercise is part of the course Importing and Managing Financial Data in Python.
Exercise instructions
- Inspect the nasdaq data using .info().
- Create a new column market_cap_m that contains the market cap in millions of USD. On the next line, drop the column 'Market Capitalization'.
- Group your nasdaq data by 'Sector' and assign to nasdaq_by_sector.
- Call the method .describe() on nasdaq_by_sector, assign to summary, and print the result.
- This works, but the result is in long format and uses a pd.MultiIndex() that you saw earlier. Convert summary to wide format by calling .unstack().
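The first four steps above can be sketched on a tiny table. This is only an illustration under stated assumptions: nasdaq_toy and its values are hypothetical, and the real nasdaq DataFrame contains more columns and rows.

```python
import pandas as pd

# Hypothetical miniature of the listings data; values are made up.
nasdaq_toy = pd.DataFrame({
    "Sector": ["Technology", "Technology", "Finance", "Finance"],
    "Market Capitalization": [3.0e9, 1.0e9, 5.0e8, 1.5e9],
})

# Inspect column names, dtypes, and non-null counts.
nasdaq_toy.info()

# Express market cap in millions of USD, then drop the original column.
nasdaq_toy["market_cap_m"] = nasdaq_toy["Market Capitalization"].div(1e6)
nasdaq_toy = nasdaq_toy.drop("Market Capitalization", axis=1)

# Group by sector and compute summary statistics per group.
toy_by_sector = nasdaq_toy.groupby("Sector")
toy_summary = toy_by_sector.describe()
print(toy_summary)
```

The layout of the .describe() result depends on the pandas version. Recent releases appear to return one row per sector with the statistics in the columns, so it is worth checking the printed output before deciding whether .unstack() is still needed.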
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Inspect NASDAQ data
nasdaq.____()
# Create market_cap_m
nasdaq['market_cap_m'] = ____[____].div(1e6)
# Drop the Market Capitalization column
nasdaq.drop('Market Capitalization', axis=1, inplace=True)
# Group nasdaq by Sector
nasdaq_by_sector = ____.____(____)
# Create summary statistics by sector
summary = ____.____()
# Print the summary
print(summary)
# Unstack
summary = ____.____()
# Print the summary again
print(summary)
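To show what the final unstacking step does to a long-format result, the sketch below builds a small MultiIndex Series by hand and converts it to wide format. The index labels and numbers are invented for illustration and are not taken from the NASDAQ data.

```python
import pandas as pd

# Long format: one row per (sector, statistic) pair, held in a MultiIndex.
index = pd.MultiIndex.from_product(
    [["Finance", "Technology"], ["count", "mean"]],
    names=["Sector", "statistic"],
)
long_format = pd.Series(
    [2.0, 150.0, 3.0, 400.0], index=index, name="market_cap_m"
)
print(long_format)

# .unstack() moves the innermost index level ("statistic") into the columns,
# leaving one row per sector.
wide_format = long_format.unstack()
print(wide_format)
```

In wide format each sector occupies a single row, which makes it easier to compare the same statistic across sectors.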