Bayesian Network Analysis: R bnlearn Analysis No. 1

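Factors influencing the treatment choice between HFNC (high-flow nasal cannula) and MV (mechanical ventilation) in COVID-19 patients with severe respiratory failure: causal analysis with a Bayesian network, building a directed acyclic graph (DAG)
Analysis of 16 variables: comparing and estimating mechanical ventilation and mortality rates at ROX thresholds of 6.1 and 4.88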
by R 4.05 Oct 28, 2022

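All 59 records: relationship between the ROX threshold and the rates of mechanical ventilation and death

The observed rates in the 59 records are as follows. The mechanical ventilation rate was 78.3% for ROX <= 6.1 and 90.0% for ROX <= 4.88; the mortality rate was 8.0% for ROX <= 6.1 and 10.0% for ROX <= 4.88. These values serve as a reference for judging the estimates below. The data consist of 16 variables (a mix of 3 discrete and 13 continuous variables):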
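MV: discrete
ROX: continuous
LIV: continuous
Gender: discrete
Age: continuous
BMI: continuous
WBC: continuous
Cr: continuous
CRP: continuous
LDH: continuous
Ddim: continuous
PSI: continuous
CCI: continuous
DtoH: continuous
DtoHF: continuous
Mortality: discrete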

Below, dataset #1 (ROX threshold 6.1) and dataset #2 (ROX threshold 4.88) are analyzed in parallel and the results are compared.

A data.frame: 59 × 16
MV ROX LIV Gender Age BMI WBC Cr CRP LDH Ddim PSI CCI DtoH DtoHF Mortality

off 9.9 29.2 male 53 34.8 12600 1.0 9.4 327 0.6 53 1 4 9 no
off 11.4 25.0 female 74 25.9 5500 0.8 14.7 459 1.4 114 4 9 9 no
off 9.3 22.2 female 43 24.9 7600 0.6 12.3 375 1.1 43 0 8 8 no
off 6.7 22.5 female 72 18.5 19500 0.7 4.9 378 0.7 72 1 11 10 no
………

ROX and LIV are converted into discrete values, High (hi) and Low (lo), by thresholding: the ROX threshold is 6.1 or 4.88, and the LIV threshold is 35.5.
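The R code for this step is not shown. As a hedged illustration only, the Python sketch below binarizes ROX and LIV. It assumes that a value at or below the threshold becomes "lo" (matching the "ROX <= 6.1" wording above); the names `to_hilo` and `discretize` are hypothetical. Applied to the first four records of the table above, it reproduces the hi/lo labels in the converted table.

```python
# Sketch only: assumes value <= threshold -> "lo", value > threshold -> "hi".
ROX_THRESHOLDS = {"dataset #1": 6.1, "dataset #2": 4.88}
LIV_THRESHOLD = 35.5


def to_hilo(value: float, threshold: float) -> str:
    """Binarize one continuous value into 'hi' or 'lo'."""
    return "lo" if value <= threshold else "hi"


def discretize(rows, rox_threshold, liv_threshold=LIV_THRESHOLD):
    """Return new rows with ROX and LIV replaced by 'hi'/'lo' labels."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the continuous input rows stay intact
        new_row["ROX"] = to_hilo(row["ROX"], rox_threshold)
        new_row["LIV"] = to_hilo(row["LIV"], liv_threshold)
        out.append(new_row)
    return out


# ROX and LIV of the first four records in the table above.
rows = [
    {"ROX": 9.9, "LIV": 29.2},
    {"ROX": 11.4, "LIV": 25.0},
    {"ROX": 9.3, "LIV": 22.2},
    {"ROX": 6.7, "LIV": 22.5},
]

for name, threshold in ROX_THRESHOLDS.items():
    print(name, [(r["ROX"], r["LIV"]) for r in discretize(rows, threshold)])
```

A ROX value between the two thresholds (for example 5.5) is "lo" in dataset #1 but "hi" in dataset #2; only such patients differ between the two datasets.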

A data.frame: 59 × 16
MV ROX LIV Gender Age BMI WBC Cr CRP LDH Ddim PSI CCI DtoH DtoHF Mortality

off hi lo male 53 34.8 12600 1.0 9.4 327 0.6 53 1 4 9 no
off hi lo female 74 25.9 5500 0.8 14.7 459 1.4 114 4 9 9 no
off hi lo female 43 24.9 7600 0.6 12.3 375 1.1 43 0 8 8 no
off hi lo female 72 18.5 19500 0.7 4.9 378 0.7 72 1 11 10 no
off lo lo male 59 24.8 9500 0.6 14.2 475 36.0 79 1 13 12 no
…………………

Load the tidyverse library and convert the discrete columns of the data frame, which are stored as strings, into categorical data (factors).

A data.frame: 59 × 16
MV ROX LIV Gender Age BMI WBC Cr CRP LDH Ddim PSI CCI DtoH DtoHF Mortality

off hi lo male 53 34.8 12600 1.0 9.4 327 0.6 53 1 4 9 no
off hi lo female 74 25.9 5500 0.8 14.7 459 1.4 114 4 9 9 no
off hi lo female 43 24.9 7600 0.6 12.3 375 1.1 43 0 8 8 no
off hi lo female 72 18.5 19500 0.7 4.9 378 0.7 72 1 11 10 no
off lo lo male 59 24.8 9500 0.6 14.2 475 36.0 79 1 13 12 no
…………

A data.frame: 59 × 16
MV ROX LIV Gender Age BMI WBC Cr CRP LDH Ddim PSI CCI DtoH DtoHF Mortality

off hi lo male 53 34.8 12600 1.0 9.4 327 0.6 53 1 4 9 no
off hi lo female 74 25.9 5500 0.8 14.7 459 1.4 114 4 9 9 no
off hi lo female 43 24.9 7600 0.6 12.3 375 1.1 43 0 8 8 no
off hi lo female 72 18.5 19500 0.7 4.9 378 0.7 72 1 11 10 no
……….

Creating a heatmap

1. bnlearn analysis, part 1: Hill-Climbing

First, generate an empty DAG structure.

Creating the blacklist

A matrix: 3 × 2 of type chr
from to
Gender Age
Mortality PSI
Mortality Cr

Creating the whitelist

A matrix: 7 × 2 of type chr
from to
ROX MV
LIV MV
LIV LDH
Age BMI
MV Mortality
ROX Mortality
LIV Mortality

The Hill-Climbing function hc() builds the DAG structure from the dataset, with the whitelist and the blacklist specified.
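How this kind of search works can be illustrated with a deliberately simplified Python sketch; it is not bnlearn's implementation. It handles discrete variables only and scores with the discrete BIC (the score reported in part 4 below), whereas this part uses the conditional Gaussian BIC, and it tries arc additions and deletions but omits the arc reversals that bnlearn's hc() also considers. The function names and the toy data are hypothetical. Whitelisted arcs are forced into the starting graph and never deleted; blacklisted arcs are never added.

```python
import itertools
import math
from collections import Counter


def node_bic(data, node, parents):
    """Discrete BIC term of one node: log-likelihood - (log n / 2) * #parameters."""
    n = len(data)
    parent_cfg = Counter(tuple(row[p] for p in parents) for row in data)
    joint = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
    loglik = sum(c * math.log(c / parent_cfg[cfg]) for (cfg, _), c in joint.items())
    n_levels = {v: len({row[v] for row in data}) for v in [node, *parents]}
    q = math.prod(n_levels[p] for p in parents)  # number of parent configurations
    return loglik - math.log(n) / 2 * (n_levels[node] - 1) * q


def creates_cycle(parents, frm, to):
    """Adding frm -> to closes a cycle iff `to` is already an ancestor of `frm`."""
    stack, seen = [frm], set()
    while stack:
        v = stack.pop()
        if v == to:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return False


def hill_climb(data, nodes, whitelist=(), blacklist=()):
    """Greedy search starting from the empty DAG plus the whitelisted arcs."""
    whitelist, blacklist = set(whitelist), set(blacklist)
    parents = {v: set() for v in nodes}
    for frm, to in whitelist:
        parents[to].add(frm)
    score = {v: node_bic(data, v, sorted(parents[v])) for v in nodes}
    while True:
        best_gain, best_move = 1e-9, None
        for frm, to in itertools.permutations(nodes, 2):
            if frm in parents[to]:                      # candidate deletion
                if (frm, to) in whitelist:
                    continue
                candidate = parents[to] - {frm}
            else:                                       # candidate addition
                if (frm, to) in blacklist or creates_cycle(parents, frm, to):
                    continue
                candidate = parents[to] | {frm}
            # BIC decomposes over nodes, so only the term of `to` changes.
            gain = node_bic(data, to, sorted(candidate)) - score[to]
            if gain > best_gain:
                best_gain, best_move = gain, (to, candidate)
        if best_move is None:                           # no improving move left
            return parents
        to, candidate = best_move
        parents[to] = candidate
        score[to] = node_bic(data, to, sorted(candidate))


# Hypothetical toy data: B copies A, C is independent of both.
toy = [{"A": i % 2, "B": i % 2, "C": (i // 2) % 2} for i in range(40)]
print(hill_climb(toy, ["A", "B", "C"]))
print(hill_climb(toy, ["A", "B", "C"], blacklist=[("A", "B")]))
```

Without constraints the search links A and B and leaves C isolated; blacklisting A -> B makes it choose B -> A instead, which has the same score.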

Bayesian network learned via Score-based methods

model:
[ROX][Gender][Age][WBC][CRP][LIV|ROX][BMI|Age][MV|ROX:LIV][Cr|MV:Gender]
[DtoH|MV][Mortality|MV:ROX:LIV][Ddim|ROX:Gender:Mortality][PSI|LIV:Age:Cr]
[DtoHF|DtoH][CCI|ROX:PSI][LDH|MV:LIV:Ddim:CCI]
nodes: 16
arcs: 23
undirected arcs: 0
directed arcs: 23
average markov blanket size: 4.38
average neighbourhood size: 2.88
average branching factor: 1.44

learning algorithm: Hill-Climbing
score: BIC (cond. Gauss.)
penalization coefficient: 2.038769
tests used in the learning procedure: 462
optimized: TRUE

Bayesian network learned via Score-based methods

model:
[ROX][Gender][Age][WBC][CRP][LIV|ROX][BMI|ROX:Age][Cr|ROX:Gender:Age]
[MV|ROX:LIV][PSI|LIV:Age:Cr][CCI|CRP:PSI][DtoH|MV][Mortality|MV:ROX:LIV]
[Ddim|ROX:Gender:BMI:DtoH][DtoHF|ROX:DtoH][LDH|MV:LIV:Ddim:CCI]
nodes: 16
arcs: 27
undirected arcs: 0
directed arcs: 27
average markov blanket size: 5.25
average neighbourhood size: 3.38
average branching factor: 1.69

learning algorithm: Hill-Climbing
score: BIC (cond. Gauss.)
penalization coefficient: 2.038769
tests used in the learning procedure: 521
optimized: TRUE

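Changing only the ROX threshold (6.1 vs 4.88) changes the proportion of "hi" and "lo" ROX values, and this alone alters the learned DAG: with threshold 6.1 the network has 23 arcs, whereas with threshold 4.88 it has 27.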
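Both runs report the same penalization coefficient, 2.038769. bnlearn's BIC score is the log-likelihood minus a coefficient k times the number of parameters, with k = log(n)/2 by default. Both datasets contain n = 59 records, so the penalty is identical and the difference in arc counts comes from the data, not from the score. A quick check:

```python
import math

n = 59                      # records in dataset #1 and in dataset #2
penalty = math.log(n) / 2   # default BIC penalty per parameter
print(f"penalization coefficient: {penalty:.6f}")  # prints 2.038769
```

An arc is therefore added only if it raises the log-likelihood by more than about 2.04 per extra parameter.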

Next, compare the DAG structures learned from the two datasets.
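The comparison can also be done on the printed model strings. The Python sketch below (helper names are hypothetical; within bnlearn itself, compare() serves this purpose) parses the two model strings from the hc() outputs above, recomputes bnlearn's summary statistics, and lists the arcs found in only one of the two networks.

```python
import re

# Model strings copied from the two hc() outputs above.
MODEL_1 = (
    "[ROX][Gender][Age][WBC][CRP][LIV|ROX][BMI|Age][MV|ROX:LIV][Cr|MV:Gender]"
    "[DtoH|MV][Mortality|MV:ROX:LIV][Ddim|ROX:Gender:Mortality][PSI|LIV:Age:Cr]"
    "[DtoHF|DtoH][CCI|ROX:PSI][LDH|MV:LIV:Ddim:CCI]"
)
MODEL_2 = (
    "[ROX][Gender][Age][WBC][CRP][LIV|ROX][BMI|ROX:Age][Cr|ROX:Gender:Age]"
    "[MV|ROX:LIV][PSI|LIV:Age:Cr][CCI|CRP:PSI][DtoH|MV][Mortality|MV:ROX:LIV]"
    "[Ddim|ROX:Gender:BMI:DtoH][DtoHF|ROX:DtoH][LDH|MV:LIV:Ddim:CCI]"
)


def parse_model(model):
    """Map each node to the set of its parents, e.g. '[MV|ROX:LIV]'."""
    return {
        node: set(pa.split(":")) if pa else set()
        for node, pa in re.findall(r"\[([^|\]]+)\|?([^\]]*)\]", model)
    }


def arcs(parents):
    """Directed arcs as (parent, child) pairs."""
    return {(p, child) for child, ps in parents.items() for p in ps}


def summary(parents):
    """Node/arc counts and the averages printed by bnlearn."""
    n, a = len(parents), arcs(parents)
    children = {v: {c for (p, c) in a if p == v} for v in parents}
    mb_total = 0
    for v in parents:
        blanket = parents[v] | children[v]   # parents and children
        for c in children[v]:
            blanket |= parents[c]            # children's other parents
        blanket.discard(v)
        mb_total += len(blanket)
    return {"nodes": n, "arcs": len(a), "markov_blanket": mb_total / n,
            "neighbourhood": 2 * len(a) / n, "branching": len(a) / n}


dag1, dag2 = parse_model(MODEL_1), parse_model(MODEL_2)
print(summary(dag1))
print(summary(dag2))
print("only in #1:", sorted(arcs(dag1) - arcs(dag2)))
print("only in #2:", sorted(arcs(dag2) - arcs(dag1)))
```

The recomputed averages agree with the hc() printouts (4.375 and 5.25 for the Markov blanket size). The two networks share 20 arcs; MV -> Cr, Mortality -> Ddim and ROX -> CCI occur only with threshold 6.1, while seven arcs, including ROX -> BMI and ROX -> Cr, occur only with threshold 4.88.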

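Estimate 1

In each pair of outputs below, the first value comes from dataset #1 (ROX threshold 6.1) and the second from dataset #2 (ROX threshold 4.88). Each label names the event of interest followed by the conditions, so "P(MV=on, ROX=Low)" is compared with the observed ventilation rate among patients with low ROX.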

[1] "P(MV=on, ROX=Low)"
0.718
[1] "P(MV=on, ROX=Low)"
0.9

The observed ventilation rate was 78.3% for ROX <= 6.1 and 90.0% for ROX <= 4.88, so the estimates are roughly in line.
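The observed rates used as the benchmark are plain conditional relative frequencies: the number of ventilated patients with low ROX divided by the number of patients with low ROX. A minimal Python sketch; the helper name `conditional_rate` and the toy records are hypothetical, not the study data.

```python
def conditional_rate(rows, event, evidence):
    """Share of rows satisfying `event` among the rows matching `evidence`."""
    matching = [r for r in rows if all(r[k] == v for k, v in evidence.items())]
    if not matching:
        raise ValueError("no records match the evidence")
    hits = sum(all(r[k] == v for k, v in event.items()) for r in matching)
    return hits / len(matching)


# Hypothetical toy records (not the 59 study records).
toy = [
    {"MV": "on", "ROX": "lo"},
    {"MV": "on", "ROX": "lo"},
    {"MV": "on", "ROX": "lo"},
    {"MV": "off", "ROX": "lo"},
    {"MV": "on", "ROX": "hi"},
    {"MV": "off", "ROX": "hi"},
]
print(conditional_rate(toy, {"MV": "on"}, {"ROX": "lo"}))  # 0.75
print(conditional_rate(toy, {"MV": "on"}, {"ROX": "hi"}))  # 0.5
```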

[1] "P(MV=on, ROX=High)"
0.173
[1] "P(MV=on, ROX=High)"
0.304

[1] "P(Mortality=yes, ROX=Low)"
0.087
[1] "P(Mortality=yes, ROX=Low)"
0.094

The observed mortality was 8.0% for ROX <= 6.1 and 10.0% for ROX <= 4.88, so these estimates are also fairly close.

[1] "P(Mortality=yes, ROX=High)"
0.059
[1] "P(Mortality=yes, ROX=High)"
0.058

[1] "P(MV=on, ROX=Low, LIV=High)"
0.929
[1] "P(MV=on, ROX=Low, LIV=High)"
1

[1] "P(MV=off, ROX=Low, LIV=High)"
0.077
[1] "P(MV=off, ROX=Low, LIV=High)"
0

[1] "P(MV=ON, ROX=Low, LIV=High, LDH=High)"
0.946
[1] "P(MV=ON, ROX=Low, LIV=High, LDH=High)"
1

[1] "P(MV=OFF, ROX=Low, LIV=High, LDH=High)"
0.058
[1] "P(MV=OFF, ROX=Low, LIV=High, LDH=High)"
0

[1] "P(MV=ON, ROX=Low, LIV=High, LDH=High, Age=High)"
0.92
[1] "P(MV=ON, ROX=Low, LIV=High, LDH=High, Age=High)"
1
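If these probabilities were obtained with bnlearn's cpquery(), they are Monte Carlo estimates: by default cpquery() uses logic sampling (method = "ls"), which draws samples from the network and keeps only those matching the evidence. Each query is estimated independently, which is consistent with complementary pairs such as 0.929 and 0.077 not summing exactly to 1. The Python sketch below illustrates logic sampling on a hypothetical three-node network (ROX -> LIV, and ROX, LIV -> MV); every probability in it is made up, not fitted to the study data, and the function names are hypothetical.

```python
import random

# Hypothetical CPTs (made-up numbers, not estimated from the study data).
P_ROX_LO = 0.4                                      # P(ROX = lo)
P_LIV_HI = {"lo": 0.5, "hi": 0.1}                   # P(LIV = hi | ROX)
P_MV_ON = {("lo", "hi"): 0.95, ("lo", "lo"): 0.6,   # P(MV = on | ROX, LIV)
           ("hi", "hi"): 0.5, ("hi", "lo"): 0.1}


def forward_sample(rng):
    """Draw one joint sample in topological order: ROX, LIV, MV."""
    rox = "lo" if rng.random() < P_ROX_LO else "hi"
    liv = "hi" if rng.random() < P_LIV_HI[rox] else "lo"
    mv = "on" if rng.random() < P_MV_ON[(rox, liv)] else "off"
    return {"ROX": rox, "LIV": liv, "MV": mv}


def logic_sampling(event, evidence, n=200_000, seed=1):
    """Estimate P(event | evidence) by discarding samples that contradict the evidence."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n):
        s = forward_sample(rng)
        if all(s[k] == v for k, v in evidence.items()):
            kept += 1
            hits += all(s[k] == v for k, v in event.items())
    return hits / kept


def exact_mv_on(rox):
    """Exact P(MV = on | ROX = rox), summing over LIV."""
    p_hi = P_LIV_HI[rox]
    return p_hi * P_MV_ON[(rox, "hi")] + (1 - p_hi) * P_MV_ON[(rox, "lo")]


est_on = logic_sampling({"MV": "on"}, {"ROX": "lo"}, seed=1)
est_off = logic_sampling({"MV": "off"}, {"ROX": "lo"}, seed=2)
print(est_on, exact_mv_on("lo"))   # sampled estimate vs exact value 0.775
print(est_on + est_off)            # close to, but generally not exactly, 1
```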

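Find the equivalence class and the v-structures of the Bayesian network, and construct its moral graph or a consistent extension of the equivalence class.
Note: in graph theory, the moral graph is used to obtain an undirected counterpart of a directed acyclic graph. Moralization is a key step of the junction tree algorithm, which is used for belief propagation in graphical models.
The cpdag() function returns the equivalence class as a CPDAG: arcs whose direction is shared by every DAG in the class (for example, the arcs forming v-structures) remain directed, and the others become undirected. This is why the graphs produced with cpdag() below contain both directed and undirected arcs.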
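The v-structures and the moral graph can be read directly off the parent sets. The Python sketch below (hypothetical helper names) does this for the dataset #1 network learned by hc() above. It only finds v-structures and moralizes; it does not apply the additional orientation rules that a CPDAG construction also uses, so it is not meant to reproduce the directed/undirected arc counts printed below.

```python
import itertools
import re

# Model string of the dataset #1 network from the hc() output above.
MODEL_1 = (
    "[ROX][Gender][Age][WBC][CRP][LIV|ROX][BMI|Age][MV|ROX:LIV][Cr|MV:Gender]"
    "[DtoH|MV][Mortality|MV:ROX:LIV][Ddim|ROX:Gender:Mortality][PSI|LIV:Age:Cr]"
    "[DtoHF|DtoH][CCI|ROX:PSI][LDH|MV:LIV:Ddim:CCI]"
)


def parse_model(model):
    """Map each node to the set of its parents."""
    return {
        node: set(pa.split(":")) if pa else set()
        for node, pa in re.findall(r"\[([^|\]]+)\|?([^\]]*)\]", model)
    }


def skeleton(parents):
    """Undirected edges of the DAG."""
    return {frozenset((p, c)) for c, ps in parents.items() for p in ps}


def v_structures(parents):
    """Triples (a, child, b) with a -> child <- b and a, b not adjacent."""
    adjacent = skeleton(parents)
    return [(a, child, b)
            for child, ps in parents.items()
            for a, b in itertools.combinations(sorted(ps), 2)
            if frozenset((a, b)) not in adjacent]


def moral_graph(parents):
    """Skeleton plus an edge between every pair of co-parents ("marrying")."""
    edges = set(skeleton(parents))
    for ps in parents.values():
        edges.update(frozenset(pair) for pair in itertools.combinations(ps, 2))
    return edges


dag1 = parse_model(MODEL_1)
vs = v_structures(dag1)
print(len(vs), "v-structures, e.g.", vs[:3])
print(len(skeleton(dag1)), "skeleton edges ->", len(moral_graph(dag1)), "moral edges")
```

For the dataset #1 network it finds 12 v-structures (for example Gender -> Cr <- MV), and moralization adds 12 edges to the 23-edge skeleton.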

Bayesian network learned via Score-based methods

model:
[partially directed graph]
nodes: 16
arcs: 23
undirected arcs: 7
directed arcs: 16
average markov blanket size: 4.38
average neighbourhood size: 2.88
average branching factor: 1.00

learning algorithm: Hill-Climbing
score: BIC (cond. Gauss.)
penalization coefficient: 2.038769
tests used in the learning procedure: 462
optimized: TRUE

Bayesian network learned via Score-based methods

model:
[partially directed graph]
nodes: 16
arcs: 27
undirected arcs: 6
directed arcs: 21
average markov blanket size: 5.25
average neighbourhood size: 3.38
average branching factor: 1.31

learning algorithm: Hill-Climbing
score: BIC (cond. Gauss.)
penalization coefficient: 2.038769
tests used in the learning procedure: 521
optimized: TRUE

2. bnlearn analysis, part 2: continuing with the bootstrap
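In bootstrap structure learning, the records are resampled with replacement, a network is learned on every resample (bnlearn's boot.strength() automates this), and an averaged network keeps the arcs that appear often enough (averaged.network()). The Python sketch below shows only the resampling and the frequency counting; the structure learner itself is left out, and the example arc sets are hypothetical. It counts directed arcs, a simplification of bnlearn, which records arc strength and arc direction separately.

```python
import random
from collections import Counter


def bootstrap_samples(rows, r, seed=0):
    """Yield r resamples, each the same size as `rows`, drawn with replacement."""
    rng = random.Random(seed)
    for _ in range(r):
        yield [rng.choice(rows) for _ in range(len(rows))]


def arc_strength(learned_arc_sets):
    """Fraction of the bootstrap networks that contain each directed arc."""
    counts = Counter(arc for arc_set in learned_arc_sets for arc in set(arc_set))
    return {arc: c / len(learned_arc_sets) for arc, c in counts.items()}


def averaged_network(strength, threshold=0.5):
    """Keep the arcs whose bootstrap frequency exceeds the threshold."""
    return {arc for arc, s in strength.items() if s > threshold}


# Hypothetical arc sets, as if learned from four bootstrap resamples.
learned = [
    {("ROX", "MV"), ("LIV", "MV")},
    {("ROX", "MV")},
    {("ROX", "MV"), ("MV", "LIV")},
    {("ROX", "MV"), ("LIV", "MV")},
]
strength = arc_strength(learned)
print(strength)
print(averaged_network(strength))
```

Here LIV -> MV appears in exactly half of the networks, so whether it survives depends on the threshold and on whether the comparison is strict.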

0.5

0.5

Compare the two DAGs.


Estimate 2

0.727
0.91

The observed ventilation rate was 78.3% for ROX <= 6.1 and 90.0% for ROX <= 4.88, so the estimates are roughly in line.

0.181
0.306

0.82
0.694

0.276
0.1

0.159
0.164

0.077
0.1

The observed mortality was 8.0% for ROX <= 6.1 and 10.0% for ROX <= 4.88, so these estimates are also fairly close.

0.135
0.137

3. bnlearn analysis, part 3: building a model learned from the data with the deal library

## 17 ( 6 discrete+ 11 ) nodes;score= ;relscore=
1 MV discrete(2)
2 ROX discrete(2)
3 LIV discrete(2)
4 Gender discrete(2)
5 Age continuous()
6 BMI continuous()
7 WBC continuous()
8 Cr continuous()
9 CRP continuous()
10 LDH continuous()
11 Ddim continuous()
12 PSI continuous()
13 CCI continuous()
14 DtoH continuous()
15 DtoHF continuous()
16 Mortality discrete(2)
17 XYZ discrete(1)
## 17 ( 6 discrete+ 11 ) nodes;score= ;relscore=
1 MV discrete(2)
2 ROX discrete(2)
3 LIV discrete(2)
4 Gender discrete(2)
5 Age continuous()
6 BMI continuous()
7 WBC continuous()
8 Cr continuous()
9 CRP continuous()
10 LDH continuous()
11 Ddim continuous()
12 PSI continuous()
13 CCI continuous()
14 DtoH continuous()
15 DtoHF continuous()
16 Mortality discrete(2)
17 XYZ discrete(1)

Imaginary sample size: 64
Imaginary sample size: 64

Random/Generated Bayesian network

model:
[Mortality][XYZ][Gender|Mortality][LIV|Mortality][DtoHF|Gender:LIV:Mortality]
[MV|LIV:Mortality][ROX|LIV:MV:Mortality][Cr|Gender:LIV:Mortality:ROX]
[Ddim|Gender:LIV:Mortality:ROX][DtoH|DtoHF:Gender:Mortality:ROX]
[CCI|Cr:Gender:ROX][CRP|Cr:Gender:LIV:Mortality]
[LDH|Ddim:Gender:LIV:Mortality:ROX][PSI|CCI:CRP:Cr:LDH:LIV]
[Age|LIV:Mortality:PSI][BMI|Age:LIV:Mortality][WBC|BMI:CRP:ROX]
nodes: 17
arcs: 48
undirected arcs: 0
directed arcs: 48
average markov blanket size: 7.06
average neighbourhood size: 5.65
average branching factor: 2.82

generation algorithm: Empty

Random/Generated Bayesian network

model:
[Mortality][XYZ][Gender|Mortality][LIV|Mortality][DtoHF|Gender:LIV:Mortality]
[MV|LIV:Mortality][ROX|LIV:MV:Mortality][Cr|Gender:LIV:Mortality:ROX]
[Ddim|Gender:LIV:Mortality:ROX][DtoH|DtoHF:Gender:Mortality:ROX]
[CCI|Cr:Gender:ROX][CRP|Cr:Gender:LIV:Mortality]
[LDH|Ddim:Gender:LIV:Mortality:ROX][PSI|CCI:CRP:Cr:LDH:LIV]
[Age|LIV:Mortality:PSI][BMI|Age:LIV:Mortality][WBC|BMI:CRP:ROX]
nodes: 17
arcs: 48
undirected arcs: 0
directed arcs: 48
average markov blanket size: 7.06
average neighbourhood size: 5.65
average branching factor: 2.82

generation algorithm: Empty

'[MV][Gender][Age][WBC][CRP][XYZ][ROX|MV][LIV|MV][BMI|Age][LDH|MV][DtoH|MV][Mortality|MV][Cr|ROX:Mortality][Ddim|ROX:Gender:Mortality][PSI|Age:CRP:Mortality][DtoHF|DtoH][CCI|ROX:PSI]'
'[MV][Gender][WBC][CRP][XYZ][ROX|MV][LIV|MV][LDH|MV][DtoH|MV][Mortality|MV][BMI|ROX][Cr|MV:Gender:Mortality][DtoHF|ROX:DtoH][Age|BMI][Ddim|ROX:Gender:BMI:DtoH][PSI|Age:CRP:Mortality][CCI|CRP:PSI]'

'MV' 'Gender' 'Age' 'WBC' 'CRP' 'XYZ' 'ROX' 'LIV' 'BMI' 'LDH' 'DtoH' 'Mortality' 'Cr' 'Ddim' 'PSI' 'DtoHF' 'CCI'
'MV' 'Gender' 'WBC' 'CRP' 'XYZ' 'ROX' 'LIV' 'LDH' 'DtoH' 'Mortality' 'BMI' 'Cr' 'DtoHF' 'Age' 'Ddim' 'PSI' 'CCI'

'[MV][Gender][Age][WBC][CRP][XYZ][ROX|MV][LIV|MV][BMI|Age][LDH|MV][DtoH|MV][Mortality|MV][Cr|ROX:Mortality][Ddim|Gender:ROX:Mortality][PSI|Age:CRP:Mortality][DtoHF|DtoH][CCI|ROX:PSI]'
'[MV][Gender][WBC][CRP][XYZ][ROX|MV][LIV|MV][LDH|MV][DtoH|MV][Mortality|MV][BMI|ROX][Cr|MV:Gender:Mortality][DtoHF|ROX:DtoH][Age|BMI][Ddim|Gender:ROX:DtoH:BMI][PSI|CRP:Mortality:Age][CCI|CRP:PSI]'

'[MV][Gender][Age][WBC][CRP][XYZ][ROX|MV][LIV|MV][BMI|Age][LDH|MV][DtoH|MV][Mortality|MV][Cr|ROX:Mortality][Ddim|ROX:Gender:Mortality][PSI|Age:CRP:Mortality][DtoHF|DtoH][CCI|ROX:PSI]'
'[MV][Gender][WBC][CRP][XYZ][ROX|MV][LIV|MV][LDH|MV][DtoH|MV][Mortality|MV][BMI|ROX][Cr|MV:Gender:Mortality][DtoHF|ROX:DtoH][Age|BMI][Ddim|ROX:Gender:BMI:DtoH][PSI|Age:CRP:Mortality][CCI|CRP:PSI]'

Random/Generated Bayesian network

model:
[Age][CRP][Gender][MV][WBC][XYZ][BMI|Age][DtoH|MV][LDH|MV][LIV|MV]
[Mortality|MV][ROX|MV][Cr|Mortality:ROX][Ddim|Gender:Mortality:ROX]
[DtoHF|DtoH][PSI|Age:CRP:Mortality][CCI|PSI:ROX]
nodes: 17
arcs: 17
undirected arcs: 0
directed arcs: 17
average markov blanket size: 2.82
average neighbourhood size: 2.00
average branching factor: 1.00

generation algorithm: Empty

Random/Generated Bayesian network

model:
[CRP][Gender][MV][WBC][XYZ][DtoH|MV][LDH|MV][LIV|MV][Mortality|MV][ROX|MV]
[BMI|ROX][Cr|Gender:MV:Mortality][DtoHF|DtoH:ROX][Age|BMI]
[Ddim|BMI:DtoH:Gender:ROX][PSI|Age:CRP:Mortality][CCI|CRP:PSI]
nodes: 17
arcs: 21
undirected arcs: 0
directed arcs: 21
average markov blanket size: 3.65
average neighbourhood size: 2.47
average branching factor: 1.24

generation algorithm: Empty

Estimate 3

[1] "P(MV=on, ROX=Low)"
0.723
[1] "P(MV=on, ROX=Low)"
0.903

The observed ventilation rate was 78.3% for ROX <= 6.1 and 90.0% for ROX <= 4.88, so the estimates are roughly in line.

[1] "P(MV=on, ROX=High)"
0.171
[1] "P(MV=on, ROX=High)"
0.308

[1] "P(Mortality=yes, ROX=Low)"
0.124
[1] "P(Mortality=yes, ROX=Low)"
0.139

For reference, the observed mortality was 8.0% for ROX <= 6.1 and 10.0% for ROX <= 4.88.

[1] "P(Mortality=yes, ROX=High)"
0.028
[1] "P(Mortality=yes, ROX=High)"
0.049

[1] "P(MV=on, ROX=Low, LIV=High)"
0.938
[1] "P(MV=on, ROX=Low, LIV=High)"
0.979
[1] "P(MV=off, ROX=Low, LIV=High)"
0.054
[1] "P(MV=off, ROX=Low, LIV=High)"
0.021

[1] "P(MV=ON, ROX=Low, LIV=High, LDH=High)"
0.984
[1] "P(MV=ON, ROX=Low, LIV=High, LDH=High)"
0.997
[1] "P(MV=OFF, ROX=Low, LIV=High, LDH=High)"
0.013
[1] "P(MV=OFF, ROX=Low, LIV=High, LDH=High)"
0.004

[1] "P(MV=ON, ROX=Low, LIV=High, LDH=High, Age=High)"
0.984
[1] "P(MV=ON, ROX=Low, LIV=High, LDH=High, Age=High)"
0.984
[1] "P(MV=OFF, ROX=Low, LIV=High, LDH=High, Age=High)"
0.018
[1] "P(MV=OFF, ROX=Low, LIV=High, LDH=High, Age=High)"
0.007

4. bnlearn analysis, part 4: converting all variables to categorical data, then Hill-Climbing

A data.frame: 59 × 16
MV ROX LIV Gender Age BMI WBC Cr CRP LDH Ddim PSI CCI DtoH DtoHF Mortality

1 2 1 1 1 2 2 2 2 1 1 1 1 1 1 1
1 2 1 2 2 2 1 1 2 2 2 2 2 1 1 1
1 2 1 2 1 2 2 1 2 1 2 1 1 1 1 1
1 2 1 2 2 1 2 1 1 1 1 1 1 2 2 1
………..

A data.frame: 59 × 16
MV ROX LIV Gender Age BMI WBC Cr CRP LDH Ddim PSI CCI DtoH DtoHF Mortality

1 2 1 1 1 2 2 2 2 1 1 1 1 1 1 1
1 2 1 2 2 2 1 1 2 2 2 2 2 1 1 1
1 2 1 2 1 2 2 1 2 1 2 1 1 1 1 1
1 2 1 2 2 1 2 1 1 1 1 1 1 2 2 1
1 2 1 1 1 2 2 1 2 2 2 1 1 2 2 1
……………

A data.frame: 59 × 16
MV ROX LIV Gender Age BMI WBC Cr CRP LDH Ddim PSI CCI DtoH DtoHF Mortality

1 2 1 1 1 2 2 2 2 1 1 1 1 1 1 1
1 2 1 2 2 2 1 1 2 2 2 2 2 1 1 1
1 2 1 2 1 2 2 1 2 1 2 1 1 1 1 1
1 2 1 2 2 1 2 1 1 1 1 1 1 2 2 1
1 1 1 1 1 2 2 1 2 2 2 1 1 2 2 1
……………

A data.frame: 59 × 16
MV ROX LIV Gender Age BMI WBC Cr CRP LDH Ddim PSI CCI DtoH DtoHF Mortality

1 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1
1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1
1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1
1 1 2 2 2 1 2 1 1 1 1 1 1 2 2 1
1 1 2 1 1 1 1 1 1 1 2 1 1 2 2 1
……………

Bayesian network learned via Score-based methods

model:
[Gender][Cr][CRP][Ddim][CCI|Cr][ROX|CCI][Age|CCI][LIV|ROX][BMI|Age][WBC|ROX]
[PSI|Age][MV|ROX:LIV][LDH|MV:LIV][DtoH|MV][Mortality|MV:ROX:LIV][DtoHF|DtoH]
nodes: 16
arcs: 16
undirected arcs: 0
directed arcs: 16
average markov blanket size: 2.00
average neighbourhood size: 2.00
average branching factor: 1.00

learning algorithm: Hill-Climbing
score: BIC (disc.)
penalization coefficient: 2.038769
tests used in the learning procedure: 292
optimized: TRUE

Bayesian network learned via Score-based methods

model:
[ROX][Gender][WBC][Cr][CRP][Ddim][LIV|ROX][CCI|Cr][MV|ROX:LIV][Age|CCI]
[BMI|Age][LDH|MV:LIV][PSI|Age][DtoH|MV][Mortality|MV:ROX:LIV][DtoHF|DtoH]
nodes: 16
arcs: 14
undirected arcs: 0
directed arcs: 14
average markov blanket size: 1.75
average neighbourhood size: 1.75
average branching factor: 0.88

learning algorithm: Hill-Climbing
score: BIC (disc.)
penalization coefficient: 2.038769
tests used in the learning procedure: 264
optimized: TRUE

Estimate 4

[1] "P(MV=on, ROX=Low)"
0.723
[1] "P(MV=on, ROX=Low)"
0.715
[1] "P(MV=on, ROX=High)"
0.306
[1] "P(MV=on, ROX=High)"
0.309

[1] "P(Mortality=yes, ROX=Low)"
0.079
[1] "P(Mortality=yes, ROX=Low)"
0.078
[1] "P(Mortality=yes, ROX=High)"
0.063
[1] "P(Mortality=yes, ROX=High)"
0.065

[1] "P(MV=on, ROX=Low, LIV=High)"
0.931
[1] "P(MV=on, ROX=Low, LIV=High)"
0.928
[1] "P(MV=off, ROX=Low, LIV=High)"
0
[1] "P(MV=off, ROX=Low, LIV=High)"
0

5. bnlearn analysis, part 5: bootstrap with the categorical data

0.48

0.495

Estimate 5

0.723
0.893

The observed ventilation rate was 78.3% for ROX <= 6.1 and 90.0% for ROX <= 4.88, so the estimates are roughly in line.

0.178
0.307

0.828
0.692

0.292
0.1

0.17
0.167

0.088
0.086

For reference, the observed mortality was 8.0% for ROX <= 6.1 and 10.0% for ROX <= 4.88.

0.139
0.136