An API request requires variable codes of the selected variables and ucgid (geographic code). This notebook is used to retrieve variable codes in ACS tables.
The U.S. Census Bureau provides variable metadata in JSON format. These JSON format documentations were downloaded and saved as .json files in project folder:
- b_variables.json was downloaded from Detailed Tables Variables.
- s_variables.json was downloaded from Subject Tables Variables.
- dp_variables.json was downloaded from Data Profiles Variables.
These 3 files are used to look up variable codes based on table IDs and variable labels.
import json
import pandas as pd
# Load the feature metadata for Detailed Tables
with open ('b_variables.json', 'r') as file:
b_variables = json.load(file)
b_var = b_variables['variables']
b_var = pd.DataFrame(b_var).transpose().reset_index()
b_var.rename(columns={'index':'var_code'}, inplace=True)
b_var.tail()
var_code | label | concept | predicateType | group | limit | predicateOnly | hasGeoCollectionSupport | attributes | required | |
---|---|---|---|---|---|---|---|---|---|---|
36719 | B20005I_071E | Estimate!!Total:!!Female:!!Worked full-time, y... | Sex by Work Experience in the Past 12 Months b... | int | B20005I | 0 | NaN | NaN | B20005I_071EA,B20005I_071M,B20005I_071MA | NaN |
36720 | B08113_054E | Estimate!!Total:!!Worked from home:!!Speak oth... | Means of Transportation to Work by Language Sp... | int | B08113 | 0 | NaN | NaN | B08113_054EA,B08113_054M,B08113_054MA | NaN |
36721 | B06009_006E | Estimate!!Total:!!Graduate or professional degree | Place of Birth by Educational Attainment in th... | int | B06009 | 0 | NaN | NaN | B06009_006EA,B06009_006M,B06009_006MA | NaN |
36722 | B05009_012E | Estimate!!Total:!!Under 6 years:!!Living with ... | Age and Nativity of Own Children Under 18 Year... | int | B05009 | 0 | NaN | NaN | B05009_012EA,B05009_012M,B05009_012MA | NaN |
36723 | B02021_001E | Estimate!!Total Groups Tallied: | American Indian and Alaska Native Alone or in ... | int | B02021 | 0 | NaN | NaN | B02021_001EA,B02021_001M,B02021_001MA | NaN |
# Load the feature metadata for Subject Tables
with open ('s_variables.json', 'r') as file:
s_variables = json.load(file)
s_var = s_variables['variables']
s_var = pd.DataFrame(s_var).transpose().reset_index()
s_var.rename(columns={'index':'var_code'}, inplace=True)
s_var.tail()
var_code | label | concept | predicateType | group | limit | predicateOnly | hasGeoCollectionSupport | attributes | required | |
---|---|---|---|---|---|---|---|---|---|---|
18642 | S2402_C02_035E | Estimate!!Male!!Full-time, year-round civilian... | Occupation by Sex for the Full-Time, Year-Roun... | int | S2402 | 0 | NaN | NaN | S2402_C02_035EA,S2402_C02_035M,S2402_C02_035MA | NaN |
18643 | S1002_C04_004E | Estimate!!60 years and over!!Percent distribut... | Grandparents | float | S1002 | 0 | NaN | NaN | S1002_C04_004EA,S1002_C04_004M,S1002_C04_004MA | NaN |
18644 | S0601_C02_009E | Estimate!!Native; born in state of residence!!... | Selected Characteristics of the Total and Nati... | float | S0601 | 0 | NaN | NaN | S0601_C02_009EA,S0601_C02_009M,S0601_C02_009MA | NaN |
18645 | S2411_C01_012E | Estimate!!Median earnings (dollars)!!Civilian ... | Occupation by Sex and Median Earnings in the P... | int | S2411 | 0 | NaN | NaN | S2411_C01_012EA,S2411_C01_012M,S2411_C01_012MA | NaN |
18646 | S2302_C04_014E | Estimate!!Percent Families with own children u... | Employment Characteristics of Families | int | S2302 | 0 | NaN | NaN | S2302_C04_014EA,S2302_C04_014M,S2302_C04_014MA | NaN |
# Load the feature metadata for Data Profiles
with open ('dp_variables.json', 'r') as file:
dp_variables = json.load(file)
dp_var = dp_variables['variables']
dp_var = pd.DataFrame(dp_var).transpose().reset_index()
dp_var.rename(columns={'index':'var_code'}, inplace=True)
dp_var.tail()
var_code | label | concept | predicateType | group | limit | predicateOnly | hasGeoCollectionSupport | attributes | required | |
---|---|---|---|---|---|---|---|---|---|---|
1384 | DP02PR_0001PE | Percent!!HOUSEHOLDS BY TYPE!!Total households | Selected Social Characteristics in Puerto Rico | int | DP02PR | 0 | NaN | NaN | DP02PR_0001PEA,DP02PR_0001PM,DP02PR_0001PMA | NaN |
1385 | DP03_0039PE | Percent!!INDUSTRY!!Civilian employed populatio... | Selected Economic Characteristics | float | DP03 | 0 | NaN | NaN | DP03_0039PEA,DP03_0039PM,DP03_0039PMA | NaN |
1386 | DP02_0098E | Estimate!!YEAR OF ENTRY!!Population born outsi... | Selected Social Characteristics in the United ... | int | DP02 | 0 | NaN | NaN | DP02_0098EA,DP02_0098M,DP02_0098MA | NaN |
1387 | DP04_0095PE | Percent!!SELECTED MONTHLY OWNER COSTS (SMOC)!!... | Selected Housing Characteristics | float | DP04 | 0 | NaN | NaN | DP04_0095PEA,DP04_0095PM,DP04_0095PMA | NaN |
1388 | DP02_0036PE | Percent!!MARITAL STATUS!!Females 15 years and ... | Selected Social Characteristics in the United ... | float | DP02 | 0 | NaN | NaN | DP02_0036PEA,DP02_0036PM,DP02_0036PMA | NaN |
Variables in this project are gathered from 3 different types of tables: Subject Tables, Detailed Tables, Data Profiles.
A variable code starts with a table ID with the first letter representing the table type: "S" for Subject Tables, "B" for Detailed Tables, "DP" for Data Profiles. For examples: S1701_C03_001E, DP03_0009PE, etc.
Relevant variables are selected by browsing tables and variable labels on https://data.census.gov/.
To retrieve variable code of a selected variable, run the below code and enter table ID, column names, and labels as prompted (case insensitive; leading and trailing spaces are accepted).
Example: to get variable code for poverty rate:
- Input:
Table ID: s1701
Parent column: percent below poverty level
Child column: estimate
Parent label: Population for whom poverty status is determined
Child label 1:Press Enter to skip
Child label 2:Press Enter to skip
Child label 3:Press Enter to skip
- Output:
Table: S1701
Variable label: Estimate!!Percent below poverty level!!Population for whom poverty status is determined
Variable code: S1701_C03_001E
def input_prompt ():
'''Prompt user to input table ID and label's components'''
table = input('Table ID:')
parent_col = input('Parent column:')
child_col = input('Child column:')
parent_label = input('Parent label:')
child_label_1 = input('Child label 1:')
child_label_2 = input('Child label 2:')
child_label_3 = input('Child label 3:')
# A list of label's components
components = [child_col, parent_col, parent_label, child_label_1, child_label_2, child_label_3]
return table, components
def get_var_label (components):
''' Return variable's label by assembling user's inputs'''
label = []
for i in components:
# Filter out empty elements
if i != '':
label.append(i.strip())
if len(label) <= 1: # A label should have at least 2 components
var_label = None
else:
var_label = '!!'.join(label) # Components are assembled and separated by "!!"
return var_label
def get_var_id (table, components):
'''Call get_var_label function to retrieve variable label.
Validate the format of table ID and variable label.
If the table ID and variable label are valid, look up variable code based on table ID and variable label.'''
var_label = get_var_label(components)
if not table or not table.lower().startswith(('b','s','dp')) or var_label == None:
raise ValueError('Invalid inputs!')
else:
if table.lower().startswith('b'):
result = b_var.loc[(b_var['group'].str.contains(table, case=False)) & (b_var['label'].str.lower() == var_label.lower())]
elif table.lower().startswith('s'):
result = s_var.loc[(s_var['group'].str.contains(table, case=False)) & (s_var['label'].str.lower() == var_label.lower())]
elif table.lower().startswith('dp'):
result = dp_var.loc[(dp_var['group'].str.contains(table, case=False)) & (dp_var['label'].str.lower() == var_label.lower())]
print(f'Table: {result['group'].values[0]}')
print(f'Variable label: {result['label'].values[0]}')
print(f'Variable code: {result['var_code'].values[0]}')
if __name__=='__main__':
try:
table, components = input_prompt()
get_var_id(table, components)
except ValueError as excpt:
print(excpt)
except:
print('Variable not found! No matching table or label.')
Table: S1701 Variable label: Estimate!!Percent below poverty level!!Population for whom poverty status is determined Variable code: S1701_C03_001E