Previous discussion on the German KBV data here and here, including comments by the original Substack author AMidwesternDoctor.
For anons wanting to look at the original data, here are three files (this link expires in 12 days; plz lemme know if it doesn't work). The .xlsx file has the raw data, the .txt file is the German ICD codes, and the .csv file contains the results, sorted in descending order by -log10(p-value).
Analysis steps:
Stack the data and add together the code and nocode numbers for each quarter and ICD-10 code, and then take log2 of this sum to put the counts on a comparable scale.
Join in the German ICD-10 code definitions (they are a bit different from other ICD-10 versions).
Split the data into wide form with one row per quarter, creating 15,650 columns, one for each of the ICD-10-GM code definitions.
Create a binary variable "covid19_vax" with value 0 for all quarters prior to 2021 and 1 for all 2021 and 2022 quarters.
Perform 15,650 t-tests with the binary variable and adjust for multiple testing using false discovery rate (FDR).
Screen for effects that are positive and significant according to FDR.
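For anyone who wants to follow the steps above without JMP, here is a rough Python sketch of the reshaping and the per-code t statistic. It is not the JMP workflow itself: the row layout (quarter, ICD code, code count, nocode count), the "2019Q1" quarter format, the function names, and the toy numbers are all assumptions made up for illustration, and it uses the equal-variance Student t statistic. Turning the statistic into a p-value needs a t distribution with n0 + n1 - 2 degrees of freedom, which is left out here to keep the sketch dependency-free.

```python
import math
from collections import defaultdict
from statistics import mean, variance

# Made-up toy rows, NOT values from the KBV files.
# Assumed layout: (quarter, icd_code, code_count, nocode_count).
rows = [
    ("2019Q1", "R96", 10, 6),
    ("2019Q2", "R96", 12, 4),
    ("2020Q1", "R96", 20, 12),
    ("2021Q1", "R96", 40, 24),
    ("2021Q2", "R96", 100, 28),
]

def to_wide(rows):
    """Sum code + nocode per (quarter, ICD code), take log2, pivot to one row per quarter."""
    totals = defaultdict(float)
    for quarter, icd, code_n, nocode_n in rows:
        totals[(quarter, icd)] += code_n + nocode_n
    wide = defaultdict(dict)
    for (quarter, icd), total in totals.items():
        wide[quarter][icd] = math.log2(total)  # assumes total > 0
    return dict(wide)

def vax_flag(quarter):
    """covid19_vax indicator: 0 before 2021, 1 for 2021 and later quarters."""
    return 1 if int(quarter[:4]) >= 2021 else 0

def pooled_t(group0, group1):
    """Two-sample Student t statistic (equal variances assumed), group1 minus group0."""
    n0, n1 = len(group0), len(group1)
    pooled_var = ((n0 - 1) * variance(group0) + (n1 - 1) * variance(group1)) / (n0 + n1 - 2)
    return (mean(group1) - mean(group0)) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))

def t_by_code(wide):
    """Compute one t statistic per ICD code column of the wide table."""
    groups = defaultdict(lambda: ([], []))
    for quarter, values in wide.items():
        for icd, value in values.items():
            groups[icd][vax_flag(quarter)].append(value)
    return {icd: pooled_t(g0, g1) for icd, (g0, g1) in groups.items()}

wide = to_wide(rows)
t_stats = t_by_code(wide)
# Keep only positive effects; significance is decided later from FDR-adjusted p-values.
positive = {icd: t for icd, t in t_stats.items() if t > 0}
```

A positive t statistic here means the log2 counts are higher in the 2021+ quarters than in the earlier ones, which is the direction the screen in the last step looks for.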
I'm using JMP software (Response Screening platform), which is way more convenient than Excel and great for graphics and stats without any coding.
The signals light up like a Christmas tree with hundreds of significant differences, including the R9* death codes in the OP plot.
Oh my gosh. You are so much more tech savvy than I am. I don't have the skills to do this. Maybe you can take a few screenshots of the data and post them here for the rest of us, along with your interpretation. That is, if it is not too much trouble.
It would be so appreciated.
Thanks. I'm using the t-test from stats 101, but doing 15,650 of them and adjusting for the big number of tests so we control the false discovery rate. The third file has the results in a .csv file sorted by significance, and the steps above are how I got there from the raw data in the first two files.
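To make the "adjusting for the big number of tests" part concrete, here is a minimal sketch of an FDR adjustment, assuming the Benjamini-Hochberg procedure (the most common FDR method; the post does not say which variant was used). The function name and the example p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the same order as the input."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values from four tests.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
# Tests whose adjusted p-value is below the chosen FDR level (0.05 here) are flagged.
flagged = [i for i, p in enumerate(adj) if p < 0.05]
```

With 15,650 tests, each raw p-value gets multiplied by 15,650 divided by its rank, so only effects with very small raw p-values stay below the FDR threshold.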
I look forward to your results in a future posting!