At rOpenSci’s annual unconf, I suggested a project to work on functions that convert physical growth measurements into z-scores and percentiles. For example, researchers in the United States studying childhood obesity often calculate the BMI z-score or percentage of the 95th percentile of BMI from nationally representative survey data.

During the process of brainstorming and working on this project, the team1 I was working with found out some related R packages that do this. To help us move forward with the project and identify next steps, I round up these packages below.2 If I’m missing something, please comment so I can add it to this post!

Below is an overview of 4 packages: AGD, growthstandards/hgbd, zscorer, and childsds.

Anthropometric data

The National Health and Nutrition Examination Survey (NHANES) has data on height, weight, blood pressure, and other measurements for adolescents from 0 to 239 months that I will use to explore these packages. I’m converting the original variables names to names that are easier to use. The data is in an .xpt file, which I’ll read in using the haven package.

library(tidyverse)
## ── Attaching packages ─────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.6
## ✔ tidyr   0.8.1     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## ── Conflicts ────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(haven)

demo_data <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DEMO_I.XPT"))

demo_need <- demo_data %>%
  select(id = SEQN, 
         agemos = RIDEXAGM, 
         sex_num = RIAGENDR)

exam_data <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/BMX_I.XPT"))

exam_need <- exam_data %>%
  select(id = SEQN, 
         bmi = BMXBMI, 
         height_cm = BMXHT, 
         length_cm = BMXRECUM, 
         weight_kg = BMXWT)

df <- demo_need %>%
  left_join(exam_need, by = "id") %>%
  filter(!is.na(agemos)) %>%
  mutate(sex_mf = if_else(sex_num == 1, "M", "F"),
         sex = if_else(sex_num == 1, "Male", "Female"),
         # recumbent length is used if under 2 years old
         height_length_cm = if_else(agemos < 24, length_cm, height_cm),
         ageyears = agemos/12)

# xpt file had column labels, remove column labels 
child_df <- df[1:nrow(df), ]

AGD

The Analysis of Growth Data (AGD) package is developed by Stef van Buuren. More info here.

Tools for the analysis of growth data: to extract an LMS table from a gamlss object, to calculate the standard deviation scores and its inverse, and to superpose two wormplots from different models. The package contains a some varieties of reference tables, especially for The Netherlands.

It has functions for reference tables from the CDC 2000 growth charts, Third Dutch Growth Study (1980), Fourth Dutch Growth Study (1997), and WHO growth charts. The function y2z() converts measurements into standard deviation scores (z-scores) and z2y() does the opposite, convert standard deviation scores into measurements.

The package is available on CRAN.

install.packages("AGD")

To calculate the age- and sex-adjusted BMI z-scores for the children in the NHANES dataset, we can do the following:

library(AGD)

bmi_levels <- c("Underweight", "Healthy weight", "Overweight", 
                "Obesity", "Severe obesity")

agd_df <- child_df %>%
  # the CDC growth charts are typically used for children 2 years or older
  filter(agemos >= 24) %>%
  mutate(bmiz =  AGD::y2z(y = bmi, x = ageyears, sex = sex_mf, ref = cdc.bmi),
         bmipct = pnorm(bmiz),
         # what is the 95th percentile for the child's age and sex?,
         z = qnorm(.95),
         p95 = AGD::z2y(z = z, x= ageyears, sex = sex_mf, ref = cdc.bmi),
         # percentage of the 95th percentile
         bmipct95 = bmi/p95,
         # bmi category
         bmicat = factor(case_when(bmipct < .05 ~ "Underweight",
                                   bmipct < .85 ~ "Healthy weight",
                                   bmipct < .95 ~ "Overweight",
                                   bmipct95 < 1.2 ~ "Obesity",
                                   bmipct95 >= 1.2 ~ "Severe obesity"),
                         levels = bmi_levels))

agd_df %>%
  select(agemos, bmi, bmiz, bmipct, bmipct95) %>%
  skimr::skim()
## Skim summary statistics
##  n obs: 3426 
##  n variables: 5 
## 
## ── Variable type:numeric ───────────────────────────────────────────
##  variable missing complete    n   mean    sd       p0   p25    p50    p75
##    agemos       0     3426 3426 121.86 60.89 24       70    118    173   
##       bmi      86     3340 3426  20.32  5.71 11.5     16.2   18.6   22.8 
##    bmipct      86     3340 3426   0.66  0.29  2.3e-06  0.44   0.72   0.93
##  bmipct95      86     3340 3426   0.88  0.18  0.51     0.76   0.85   0.96
##      bmiz      86     3340 3426   0.59  1.13 -4.58    -0.16   0.58   1.45
##    p100     hist
##  239    ▇▇▇▇▆▆▆▅
##   54.7  ▇▇▃▂▁▁▁▁
##    1    ▂▂▂▂▂▃▃▇
##    1.82 ▂▇▇▂▁▁▁▁
##    4.67 ▁▁▁▅▇▆▁▁

A plot of the category values:

agd_df %>%
  filter(!is.na(bmicat)) %>%
  ggplot() +
  stat_count(mapping = aes(x = bmicat, y=..prop.., group=1)) +
  r_ladies_theme() +
  labs(title = "BMI categories for U.S. children 2 years and older",
       subtitle = "NHANES 2015-2016; Calculated using the AGD package",
       y = "Proportion",
       x = "BMI Category")

Note: I haven’t taken the complex survey design into account here, so these proportions are not an accurate estimate of the prevalence in the U.S. See this report for those estimates.

growthstandards / hgbd

The growthstandards package, developed by Ryan Hafen, is an offshoot from a larger package, hbgd, and some docs for the growth standards calculations can be found here. It provides calculations for:

  • WHO height, weight, head circumference, muacc, subscalpular skinfold, triceps skinfold
  • INTERGROWTH fetal head circumference, biparietel diameter, occipito-frontal diameter, abdominal circumference, femur length
  • INTERGROWTH newborn (including very preterm) weight, length, and head circumference

The package has some very useful functions for converting units such as:

  • months2days() for converting age in months to age in days, and
  • lb2kg() for pounds to kilograms.
devtools::install_github("HBGDki/growthstandards")

The Center for Disease Control (CDC) and American Academy of Pediatrics (AAP) recommend using the WHO growth standards for children in the United States under 2 years old. To calculate the age- and sex-adjusted BMI z-scores for the children in the NHANES dataset using the WHO reference charts we can do the following:

child_who <- child_df %>%
  filter(agemos < 24) %>%
  mutate(
    bmi = weight_kg/((length_cm/100)^2) ,
    agedays = growthstandards::months2days(agemos),
    # measurements to bmi z-score
    bmiz = growthstandards::who_bmi2zscore(
      agedays = agedays,
      bmi = bmi,
      sex = sex),
    # measurements to percentiles
    bmipct = growthstandards::who_bmi2centile(
      agedays = agedays, 
      bmi = bmi, 
      sex = sex),
    # find the 95th percentile for age and sex
    p95 = growthstandards::who_centile2bmi(
      agedays = agedays, 
      p = 95, 
      sex = sex),
    # calculate the percentage of the 95th
    bmipct95 = (bmi/p95)*100,
    # obesity?
    obesity = if_else(bmipct >= 95, 1, 0))


child_who %>%
  select(bmi, bmiz, bmipct, bmipct95, obesity) %>%
  skimr::skim()
## Skim summary statistics
##  n obs: 634 
##  n variables: 5 
## 
## ── Variable type:numeric ───────────────────────────────────────────
##  variable missing complete   n  mean    sd     p0    p25   p50   p75
##       bmi       4      630 634 17.24  1.67 11.63  16.17  17.17 18.34
##    bmipct       4      630 634 65.78 26.93  0.015 47.82  71.54 89.13
##  bmipct95       4      630 634 91.47  8.25 64.85  86.21  90.89 96.41
##      bmiz       4      630 634  0.57  1.03 -3.62  -0.055  0.57  1.23
##   obesity       4      630 634  0.14  0.35  0      0      0     0   
##    p100     hist
##   27.77 ▁▂▇▅▁▁▁▁
##  100    ▁▂▂▂▃▅▅▇
##  142.17 ▁▃▇▅▁▁▁▁
##    5.56 ▁▁▃▇▅▁▁▁
##    1    ▇▁▁▁▁▁▁▁

Density plots of our new values:

vars <- c("bmi", "bmiz", "bmipct", "bmipct95")
sub <- "Using WHO growth charts from growthstandards package"

density_fun <- function(xvar, df, sub) {
  myplot <- ggplot(df, aes_string(x = xvar)) +
    geom_density() +
    labs(title = paste0("Density plot of ", xvar),
         subtitle = sub) +
    r_ladies_theme() 
  
  plot(myplot)
}

purrr::walk(.x = vars, .f = density_fun, df = child_who, sub = sub)
## Warning: Removed 4 rows containing non-finite values (stat_density).

## Warning: Removed 4 rows containing non-finite values (stat_density).

## Warning: Removed 4 rows containing non-finite values (stat_density).

## Warning: Removed 4 rows containing non-finite values (stat_density).

zscorer

The zscorer package is developed by Mark Myatt and Ernest Guevarra.

zscorer refers to the results of the WHO Multicentre Growth Reference Study as standard for calculating the z-scores hence it comes packaged with this reference data.

The WHO Multicentre Growth Reference Study combined longitudinal data for children from birth to 24 months and cross-sectional data for children up to 71 months

install.packages("zscorer")

The authors note that:

The function fails messily when secondPart is outside of the range given in the WGS reference (i.e. 45 to 120 cm for height and 0 to 60 months for age). It is up to you to check the ranges of your data.

Because this function is meant for a specific age and height range, I’ll filter the data first.

zscorer_df <- child_df %>%
  filter(agemos < 60) %>%
  filter(height_length_cm < 120,
         height_length_cm > 45) %>%
  select(id, sex_num, height_length_cm, weight_kg, agemos) %>%
  drop_na() %>%
  as.data.frame() # cannot be type tbl_df

library(zscorer)

# height for age z-score
zscorer_df$haz <- getCohortWGS(data = zscorer_df,
                                 sexObserved = "sex_num",
                                 firstPart = "height_length_cm",
                                 secondPart = "agemos",
                                 index = "hfa")
# weight for age z-score
zscorer_df$waz <- getCohortWGS(data = zscorer_df,
                                 sexObserved = "sex_num",
                                 firstPart = "weight_kg",
                                 secondPart = "agemos",
                                 index = "wfa")
# weight fot height z-score
zscorer_df$whz <- getCohortWGS(data = zscorer_df,
                                 sexObserved = "sex_num",
                                 firstPart = "weight_kg",
                                 secondPart = "height_length_cm",
                                 index = "wfh")

zscorer_df %>%
  select(haz, waz, whz) %>%
  skimr::skim()
## Skim summary statistics
##  n obs: 1246 
##  n variables: 3 
## 
## ── Variable type:numeric ───────────────────────────────────────────
##  variable missing complete    n  mean   sd    p0    p25    p50  p75 p100
##       haz       0     1246 1246 -0.04 1.05 -3.64 -0.72  -0.057 0.61 3.6 
##       waz       0     1246 1246  0.43 1.07 -3.36 -0.26   0.41  1.09 6.2 
##       whz       0     1246 1246  0.62 1.05 -3.48 -0.034  0.55  1.24 6.46
##      hist
##  ▁▁▃▇▇▃▁▁
##  ▁▂▆▇▂▁▁▁
##  ▁▁▆▇▂▁▁▁

childsds

The childsds package by Mandy Vogel has child reference data from many other countries (WHO, UK, Germany, Italy, China, and more).

install.packages("chilsds")

You can obtain information about the reference data by doing:

childsds::cdc.ref
## *** Group of Reference Tables ***
## [1] "CDC containing 8 reference tables"
## 
## *** Table of Reference Values ***
## [1] "bmi fitted with: male BCCG, female BCCG"
##           sex minage   maxage
## male     male      2 20.04167
## female female      2 20.04167
## 
## *** Table of Reference Values ***
## [1] "height fitted with: male BCCG, female BCCG"
##           sex minage   maxage
## male     male      0 2.958333
## female female      0 2.958333
## 
## *** Table of Reference Values ***
## [1] "height2 fitted with: male BCCG, female BCCG"
##           sex minage maxage
## male     male      2     20
## female female      2     20
## 
## *** Table of Reference Values ***
## [1] "weight fitted with: male BCCG, female BCCG"
##           sex minage maxage
## male     male      0      3
## female female      0      3
## 
## *** Table of Reference Values ***
## [1] "hc fitted with: male BCCG, female BCCG"
##           sex minage maxage
## male     male      0      3
## female female      0      3
## 
## *** Table of Reference Values ***
## [1] "wfl fitted with: male BCCG, female BCCG"
##           sex minage maxage
## male     male     45  103.5
## female female     45  103.5
## 
## *** Table of Reference Values ***
## [1] "wfl2 fitted with: male BCCG, female BCCG"
##           sex minage maxage
## male     male     77  121.5
## female female     77  121.5
## 
## *** Table of Reference Values ***
## [1] "weight2_20 fitted with: male BCCG, female BCCG"
##           sex minage maxage
## male     male      2     20
## female female      2     20
## [1] "Flegal, Katherine M., and T. J. Cole. Construction of LMS Parameters for the Centers for Disease Control and Prevention 2000 Growth Charts. Hational health statitics reports 63."
## [1] "bmi - bmi"
## [1] "height0_3 - height 0 - 3 years old"
## [1] "height2_20 - height 2- 20 years old"
## [1] "weight -  - weight 0 - 3 years old"
## [1] "weight2_20 - weight 2- 20 years old"
## [1] "hc - headcircumference"
## [1] "wfl - weight for length"
## [1] "wfl and wfls - age must refer to the length variable; the function gives the sds for a given weight conditional on height"
## [1] "use one of the following keys: bmi - height0_3 - height2_20 - weight - hc - wfl - wfl2 - weight2_20"

We can calculate the BMI z-score and percentile given the reference charts using the sds() function:

library(childsds)

childsds_df <- child_df %>%
  filter(agemos >= 24) %>%
  mutate(bmiz = sds(bmi,
                    age = ageyears,
                    sex = sex, male = "Male", female =  "Female",
                    ref = cdc.ref,
                    item = "bmi",
                    type = "SDS"),
         bmipct = round(pnorm(bmiz)*100,2),
         # or use sds function and type = "perc"
         bmipct2 = sds(bmi,
                    age = ageyears,
                    sex = sex, male = "Male", female =  "Female",
                    ref = cdc.ref,
                    item = "bmi",
                    type = "perc")) %>% 
  glimpse() 
## Observations: 3,426
## Variables: 14
## $ id               <dbl> 83738, 83739, 83743, 83745, 83746, 83748, 837...
## $ agemos           <dbl> 141, 54, 217, 185, 52, 41, 210, 196, 182, 194...
## $ sex_num          <dbl> 2, 1, 1, 2, 2, 1, 2, 2, 1, 1, 2, 2, 2, 1, 2, ...
## $ bmi              <dbl> 18.1, 15.7, 26.2, 25.0, 16.1, 16.1, 29.0, 22....
## $ height_cm        <dbl> 143.5, 102.1, 166.1, 169.2, 105.0, 103.6, 161...
## $ length_cm        <dbl> NA, NA, NA, NA, NA, 104.2, NA, NA, NA, NA, NA...
## $ weight_kg        <dbl> 37.2, 16.4, 72.4, 71.7, 17.7, 17.3, 75.9, 51....
## $ sex_mf           <chr> "F", "M", "M", "F", "F", "M", "F", "F", "M", ...
## $ sex              <chr> "Female", "Male", "Male", "Female", "Female",...
## $ height_length_cm <dbl> 143.5, 102.1, 166.1, 169.2, 105.0, 103.6, 161...
## $ ageyears         <dbl> 11.750000, 4.500000, 18.083333, 15.416667, 4....
## $ bmiz             <dbl> 0.06918427, 0.16255051, 1.14597646, 1.1663773...
## $ bmipct           <dbl> 52.76, 56.46, 87.41, 87.83, 73.79, 58.83, 93....
## $ bmipct2          <dbl> 52.76, 56.46, 87.41, 87.83, 73.79, 58.83, 93....

Plot the values:

sub <- "Using CDC growth charts from childsds package"
vars <- c("bmiz", "bmipct")
  
purrr::walk(.x = vars, .f = density_fun, df = childsds_df, sub = sub)
## Warning: Removed 86 rows containing non-finite values (stat_density).

## Warning: Removed 86 rows containing non-finite values (stat_density).

Other work

There is some other publicly available work in this research area:

  • SITAR: SuperImposition by Translation And Rotation model by Tim Cole and colleagues. The SITAR package provides functions for fitting and plotting growth curve models. More background information can be found here.

  • Canadian Pediatric Endocrine Group: This group has made available SAS, R, an Stata programs for the WHO growth charts, an R script for waist circumference z-scores (NHANES, cycle III), and an R script for the preterm z-scores (Fenton reference). They have also published many shiny apps!


Please comment if you have found something that I’m missing. I will write a second post with thoughts on what to do next with our unconf project, mchtoolbox.


  1. Thanks Jenny Draper, Jennifer Thompson, Kyle Hamilton, and Charles Gray for being on this team!

  2. Thanks Mara for the encouragement!