Median market capitalization by sector
Aggregate data is data combined from several measurements. As you learned in the video, the .groupby() method is helpful for aggregating your data by a specific category.
You have seen previously that the market capitalization data has large outliers. To get a more robust summary of the market value of companies in each sector, you will calculate the median market capitalization by sector. pandas as pd and matplotlib.pyplot as plt have been imported, and the NYSE stock exchange listings are available in your workspace as the DataFrame nyse.
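To see why the median is the more robust summary here, the following minimal sketch compares the mean and the median per sector on a small made-up table. The DataFrame toy and its values are hypothetical and are not taken from the NYSE listings; only the column names 'Sector' and 'Market Capitalization' follow the exercise.

```python
import pandas as pd

# Hypothetical toy listings; the values are invented for illustration only.
toy = pd.DataFrame({
    "Sector": ["Energy", "Energy", "Energy",
               "Technology", "Technology", "Technology"],
    "Market Capitalization": [2e9, 3e9, 400e9, 1e9, 5e9, 6e9],
})

# Group by sector and select the column to summarize.
by_sector = toy.groupby("Sector")["Market Capitalization"]

# Put mean and median side by side for comparison.
summary = pd.DataFrame({"mean": by_sector.mean(), "median": by_sector.median()})
print(summary)
```

In this toy data, the single very large Energy company pulls the sector mean far above the typical company, while the median stays at the middle value.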
This exercise is part of the course
Importing and Managing Financial Data in Python
Exercise instructions
- Inspect nyse using .info().
- With broadcasting and .div(), create a new column market_cap_m that contains the market capitalization in million USD.
- Omit the column 'Market Capitalization' with .drop().
- Apply the .groupby() method to nyse, using 'Sector' as the column to group your data by.
- Calculate the median of the market_cap_m column as median_mcap_by_sector.
- Plot the result as a horizontal bar chart with the title 'NYSE - Median Market Capitalization'. Use plt.xlabel() with 'USD mn' to add a label.
- Show the result.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Inspect NYSE data
nyse.____()
# Create market_cap_m
nyse['market_cap_m'] = ____[____].div(1e6)
# Drop market cap column
nyse = ____.____('Market Capitalization', axis=1)
# Group nyse by sector
mcap_by_sector = ____.____(____)
# Calculate median
median_mcap_by_sector = mcap_by_sector.____.____()
# Plot and show as horizontal bar chart
median_mcap_by_sector.plot(____=____, title='NYSE - Median Market Capitalization')
# Add the label
plt.____('USD mn')
# Show the plot
plt.show()
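The steps in the exercise instructions can be sketched end to end on a small stand-in table. This is one possible approach under stated assumptions, not the course's reference solution: toy_nyse and its values are hypothetical, the 'Stock Symbol' column is invented, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the nyse DataFrame; values are invented.
toy_nyse = pd.DataFrame({
    "Stock Symbol": ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"],
    "Sector": ["Energy", "Energy", "Energy",
               "Technology", "Technology", "Technology"],
    "Market Capitalization": [2e9, 3e9, 400e9, 1e9, 5e9, 6e9],
})

# Inspect column names, dtypes, and non-null counts.
toy_nyse.info()

# Divide every value by one million (broadcasting a scalar over the column).
toy_nyse["market_cap_m"] = toy_nyse["Market Capitalization"].div(1e6)

# Drop the original column; axis=1 means "drop a column, not a row".
toy_nyse = toy_nyse.drop("Market Capitalization", axis=1)

# Group by sector, then take the median of the converted column.
mcap_by_sector = toy_nyse.groupby("Sector")
median_mcap_by_sector = mcap_by_sector["market_cap_m"].median()

# Horizontal bar chart with a title and an x-axis label.
ax = median_mcap_by_sector.plot(kind="barh",
                                title="NYSE - Median Market Capitalization")
plt.xlabel("USD mn")
plt.show()
```

The result of the groupby-and-median step is a Series indexed by sector, which is why it can be plotted directly with kind="barh".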