Changes in version 2020-09-28

New Features

- tidytable variants of functions, i.e. reshape_wide_tt(), renumber_time_id_tt(), pat_status_tt(), vital_status_tt(), calc_futime_tt() ⇒ the _tt variants usually use less memory than the tidyverse and data.table variants. Execution time is usually much faster than tidyverse and comparable to or a little slower than the data.table variant.
- sir_byfutime():
  - is much faster using the tidytable package
  - gained the option race_var to optionally stratify SIR calculations by race.
- summarize_sir_results():
  - new function that increases functionality in summarizing results from the sir_byfutime() function
  - new option to define custom site_var_name
- new package website https://marianschmidt.github.io/msSPChelpR
- new sample datasets included in the package to demonstrate examples (#36)

Breaking Changes

- sir_byfutime():
  - options add_total_row and add_total_fu are replaced by calc_total_row and calc_total_fu. These are logical parameters now. The positioning of total rows and columns is now handled completely by the summarize_sir_results() function. There, total rows can be set to top and bottom and total columns to left and right.
  - option expcount_src, including the related parameters stdpop_df, refpop_df, std_pop, truncate_std_pop and pyar_var, has been removed. sir_byfutime() now only calculates expected counts based on reference rates, not within the cohort of the dataset. To calculate expected counts based on the cohort, a new function create_refrates will be added in the future. (#41)
  - option collapse_ci has been removed and added to summarize_sir_results() instead.
  - option name for tumor site variable changed from icdcat_var to site_var
  - option name for age/age group variable changed from agegroup_var to age_var
  - in total, the parameters expcount_src, futime_src, stdpop_df, refpop_df, std_pop, truncate_std_pop, pyar_var, icdcat_var, collapse_ci have been removed to simplify the function ⇒ make sure you remove these arguments from your sir_byfutime() function calls.
- sir():
  - is superseded by sir_byfutime(). To migrate your former sir() calls, you can simply use sir_byfutime(, futime_breaks = "none"), which will yield the same results.
- summarize_sir_results():
  - option name for tumor site variable changed from summarize_icdcat to summarize_site
- reshape_long_tidyr():
  - option var_selection is deprecated. Please select variables before running the reshape_long_* functions.
- asir():
  - option name for age/age group variable changed from agegroup_var to age_var
  - option name for tumor site variable changed from icdcat_var to site_var
- pat_status(), pat_status_tt(), vital_status(), and vital_status_tt():
  - Capitalized default variable labelling.
  - This might break code that relied on using the labels coming out of these functions in later filter or mutate calls.
- ir_crosstab_byfutime():
  - option futime_breaks now uses breaks in years instead of months as previously.
  - default futime_var is now follow-up time in years
- now requires dplyr version 1.0.0
- now requires tidytable package
- the default option name for tumor site variable changed from icdcat_var to site_var. This needs a manual update of function calls of sir_byfutime() and asir() if the option is specified.
- the default variable name for tumor site in all functions has been changed from t_icdcat to t_site. So the reference data frames used will need to have a t_site column.
- the data.table variants of functions (renumber_time_id_dt(), pat_status_dt(), reshape_long_dt(), reshape_wide_dt(), vital_status_dt()) have been removed for simplicity. Please use the tidytable variants instead, i.e. reshape_wide_tt(), renumber_time_id_tt(), pat_status_tt(), vital_status_tt(), calc_futime_tt(). They will give the same data.table output and the same performance.

Bug Fixes

- implement new reliable routine to split df when reshape_wide() with option chunks is used. Closes #1.
- Sorting of columns in wide datasets by reshape_wide_tidyr() and reshape_wide_tt() is now preserved. Closes #31.
- ensure sorting in renumber_time_id() and make sure that new_time_id_var is returned as integer.
- fix bug in the pat_status_*(., check = TRUE) option
- improve internal tests in sir_byfutime() so that PYARs do not get lost before running summary function
- sir_byfutime() now also gives correct results if the range of futime_breaks is not 0-Inf but smaller

Changes in version 2020-05-21

New Features

- add timevar_max option to renumber_time_id() function; use sorting by date of diagnosis instead of old time_id_var
- various improvements to reshape_wide_tidyr() function
- various improvements to reshape_wide_dt() function, which is much faster now and uses data.table::dcast instead of stats::reshape
- various improvements to pat_status() and pat_status_dt() functions
- option summarize_icdcat in summarize_sir_results() is now functional
- update vignette vignette("introduction")

Bug Fixes

- fix incomplete check for required variables in pat_status() and pat_status_dt() functions
- fix error in check for required variables in renumber_time_id() that broke functions
- fix bug in check for end of FU time in pat_status() and calc_futime()
- implement new tidyselect routine using tidyselect::all_of in summarize_sir_results()

Changes in version 0.9.1.9000

Changes in version 0.9.1 (2024-01-23)

New Features

- new function histgroup_iarc() to create a variable for groups of malignant
  neoplasms considered to be histologically 'different' for the purpose of defining multiple tumors, ICD-O-3 (see #100)
- some functions gain a new quiet argument to suppress rlang::warn() and rlang::inform() messages. You can use this when you have checked your results for correctness and want to reduce message output, but keep the progress bars.
- asir(): add World Standard Population 2000-2025 for use with option std_pop == "WHO2000" as described here: https://seer.cancer.gov/stdpopulations/world.who.html
- sir_byfutime() gains new argument expect_missing_refstrata_df. You can define another dataframe that contains strata expected to be missing from refrates_df (because they are not explicitly coded with incidence = 0). This can be helpful if refrates_df has a lot of strata and 0 incidence strata have been removed to save storage space. Internally, the rows of expect_missing_refstrata_df will be appended to refrates_df. This reduces the number of lines reported in attribute problems_missing_ref_strata. Default setting is expect_missing_refstrata_df = NULL.
- sample data set for data("us_second_cancer") gains new variable t_hist on histology, i.e. ICD-O-3 code on tumor morphology (4 digits)

Breaking Changes

- no breaking changes in this version

Bug fixes

- make calc_refrates() more robust for missing race_var (Closes #89)
- fix bug in calc_refrates() using calc_totals == TRUE (Closes #90)
- fix bug in calc_refrates() using numeric versions of fill_sites (Closes #92)
- fix bug in asir() that throws error for variable not needed (Closes #95)

Internal

- replace progress bars by cli
- deprecate verb.() syntax from tidytable (Closes #94)

Changes in version 0.9.0 (2022-06-10)

New Features

- new function calc_refrates() to calculate age-, sex-, region-, year-specific reference rates from a long format dataframe with cancer cases that are counted for incident cases and then matched with a reference population.
  The resulting reference rates dataframe can be used directly with the sir_byfutime() function.
- functions gain new default dattype = NULL and thus are more flexible to take other source data types (Closes #73)

Breaking Changes

- functions asir, calc_futime*, calc_refrates, ir_crosstab_byfutime, pat_status*, renumber_time_id*, and sir_byfutime are now set to dattype = NULL by default. If you relied on the automatic variable naming feature, you need to add dattype = "seer" or dattype = "zfkd" to your function call.
- fix typo in attribute names: attributes are now correctly named problems_missing_count_strata and problems_missing_fu_strata (Closes #80)

Bug fixes

- sir_byfutime():
  - attributes with notes and problems are now correctly saved to results_df

Internal

- deprecated functions from tidytable package have been replaced (Closes #71 and #74)

Changes in version 0.8.7 (2021-07-01)

New Features

- new function sir_ratio() and related sir_ratio_lci() and sir_ratio_uci() to calculate the ratio of two SIRs/SMRs to get the relative risk and confidence limits for this ratio.
- tidytable variant of reshape_long function, i.e. reshape_long_tt() ⇒ the _tt variants usually use less memory than the tidyverse and data.table variants. Execution time is usually much faster than tidyverse and comparable to or a little slower than the data.table variant.
- summarize_sir_results():
  - add ability to summarize by a different site_var than the one used in sir_byfutime()

Bug fixes

- summarize_sir_results():
  - PYARs are now correctly calculated when using summarize_site == TRUE. Previously the results incorrectly counted each site multiple times.
    (Closes #62)
- pat_status():
  - update default values for dattype = "zfkd"

Internal

- add R-CMD-Check to GitHub Actions

Changes in version 0.8.6 (2020-11-04)

New Features

- new sample data set for standard populations ⇒ data("standard_population")
- new sample data set for US population ⇒ data("population_us") (Closes #58)

Bug fixes

- sir_byfutime(): change output of integer columns to numeric to fix bug in summarize_sir_results() (Closes #59)

Other changes

- add examples to function documentation (Closes #56)
- remove "R" from package title (Closes #57)
- update package description (Closes #54)
- update introduction vignette vignette("introduction")

Changes in version 0.8.3

New Features

- new faster version of reshape_long based on data.table
- start new vignette on workflow from filtered long dataset to follow-up times vignette("patstatus_futime")

Bug Fixes

- implement new tidyselect routine using tidyselect::all_of for vector-based variable selection
- implement correct referencing in vital_status_dt and pat_status_dt
- add exports from data.table
- update documentation for sir and sir_byfutime functions
- make reshape_long function work

Changes in version 0.8.2

Changes in version 0.8.1

New Features

- new faster version of vital_status function using data.table
- new faster version of pat_status function using data.table

Changes in version 0.8.0

New Features

- new faster version of reshape_wide_dt function based on data.table and without problematic slices done by reshape_wide
- new faster version of renumber_time_id function based on data.table

Changes in version 0.7.4

New Features

- new function renumber_time_id

Changes in version 0.7.3

Bug Fixes

- add check to revert status_var to numeric in case it was created with option as_labelled_factor
- fix label bug in life_var_new

Changes in version 0.7.2

- add option as_labelled_factor to vital_status function
- fix newly introduced error in vital_status function

Changes in version 0.7.1

- fix error in vital_status
  function by replacing the sjlabelled::get_label function

Changes in version 0.7.0

- fix error in pat_status and vital_status functions due to change in sjlabelled package

Changes in version 0.6.10

- rebuild description file and manual

Changes in version 0.6.9

- remove nest_legacy functions and use new tidyr syntax, close #19

Changes in version 0.6.8

- make summarize_sir_results function work without break variables

Changes in version 0.6.7

- for function sir_byfutime ⇒ make option add_total_row work, even if option ybreak_vars = "none"

Changes in version 0.6.6

- Make use of time_id_var and case_id_var coherent across reshape functions

Changes in version 0.6.5

- Fixed issue in Namespace

Changes in version 0.6.4

- Added a NEWS.md file to track changes to the package.

Changes in version 0.6.3

- add option futime_breaks = "none" to sir_byfutime function

Changes in version 0.6.0

- includes a new function to calculate crude (absolute) incidence rates and tabulate them by any number of grouping variables; it can be used as a Table 1 for publications ⇒ the function is called msSPChelpR::ir_crosstab
- includes a new function to calculate SIRs (standardized incidence ratios) by whatever strata you desire (unlimited ybreak_vars; one xbreak_var) and additionally customized breaks for follow-up times (default is: up to 6 months, .5-1 year, 1-5 years, 5-10 years, >10 years)
  ⇒ attention, it only makes sense to stratify results (ybreak_vars or xbreak_var) by variables measured at baseline and not by variables that depend on the occurrence of an SPC
  ⇒ function msSPChelpR::sir_byfutime
  ⇒ depending on the number of stratification variables you are using, this function may result in a very long results data.frame. So please use it together with the new function msSPChelpR::summarize_sir_results
- includes a new function to summarize results dataframes from SIR calculations
- New reshape functions that are faster and use less memory