eachanjohnson committed on
Commit 98b8ff5 · verified · 1 Parent(s): 201cc71

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,140 @@
+ ---
+ license: mit
+ pipeline_tag: tabular-regression
+ tags:
+ - chemistry
+ - microbiology
+ - antibiotics
+ library_name: duvida
+ datasets:
+ - scbirlab/thomas-2018-spark-wt
+ ---
+
+ # Predictor of _Francisella tularensis_ MICs
+
+ _Updated:_ Fri 28 Mar 18:11:30 GMT 2025
+
+ Trained on the _Francisella tularensis_, WT accumulator phenotype subset of the [human-curated SPARK dataset](https://doi.org/10.1021/acsinfecdis.8b00193) (9671 rows in total across the training, validation, and test splits for _Francisella tularensis_).
+
+ ## Model details
+
+ This model was trained using [our Duvida framework](https://github.com/scbirlab/duvida).
+ It was selected from a hyperparameter search as the model that performs best on unseen test data
+ (from a scaffold split).
+
+ Duvida also saves the training data in this checkpoint to allow the calculation of uncertainty metrics
+ based on that training data.
+
+ This model is the best regression model from a hyperparameter search, determined
+ by Spearman's $\rho$ on a held-out test set not used in training or early stopping.
+
+ ### Model architecture
+
+ - **Regression**
+
+ ```json
+
+ {
+   "dropout": 0.2,
+   "ensemble_size": 10,
+   "extra_featurizers": null,
+   "learning_rate": 0.0001,
+   "model_class": "FPMLPModelBox",
+   "n_hidden": 5,
+   "n_units": 16,
+   "use_2d": true,
+   "use_fp": true
+ }
+ ```
+
+ ### Model usage
+
+ You can use this model with:
+
+ ```python
+ from duvida.autoclasses import AutoModelBox
+ modelbox = AutoModelBox.from_pretrained("hf://scbirlab/spark-dv-2503-ftul")
+ modelbox.predict(filename=..., inputs=[...], columns=[...]) # make predictions on your own data
+ ```
+
+ ## Training details
+
+ - **Dataset:** [SPARK, WT accumulator, _Francisella tularensis_ subset](https://huggingface.co/datasets/scbirlab/thomas-2018-spark-wt)
+ - **Input column:** smiles
+ - **Output column:** pmic
+ - **Split type:** Murcko scaffold
+ - **Split proportions:**
+   - 70% training (6770 rows)
+   - 15% validation (for early stopping) (1450 rows)
+   - 15% test (for selecting hyperparameters) (1451 rows)
+
+ Here is the training log:
+
+ <img src="training-log.png" width=450>
+
+ And these are the evaluation scores.
+
+ Train (6770 rows):
+
+ ```json
+
+ {
+   "Pearson r": 0.5315396541042405,
+   "RMSE": 0.1128498837351799,
+   "Spearman rho": 0.6696014953216823
+ }
+ ```
+
+ Validation (1450 rows):
+
+ ```json
+
+ {
+   "Pearson r": 0.2988704143091524,
+   "RMSE": 0.12225893139839172,
+   "Spearman rho": 0.5774266044687701
+ }
+ ```
+
+
+ Test (1451 rows):
+
+ ```json
+
+ {
+   "Pearson r": 0.3527353695023497,
+   "RMSE": 0.10296344012022018,
+   "Spearman rho": 0.618750940647444
+ }
+ ```
+
+ ## Training data details
+
+ The training data were collated by the authors of:
+
+ > Joe Thomas, Marc Navre, Aileen Rubio, and Allan Coukell
+ > Shared Platform for Antibiotic Research and Knowledge: A Collaborative Tool to SPARK Antibiotic Discovery
+ > ACS Infectious Diseases 2018 4 (11), 1536-1539
+ > DOI: 10.1021/acsinfecdis.8b00193
+
+ We cleaned the original SPARK dataset to subset the most relevant columns, remove empty values,
+ give succinct column titles, and split by species.
+
+ This particular dataset retains only measurements on bacteria with wild-type accumulation phenotypes.
+
+ ### Dataset Sources
+
+ - **Repository:** https://www.collaborativedrug.com/spark-data-downloads
+ - **Paper:** https://doi.org/10.1021/acsinfecdis.8b00193
+
+ ### Data Collection and Processing
+
+ Data were processed using [schemist](https://github.com/scbirlab/schemist), a tool for processing chemical datasets.
+
+ The SMILES strings have been canonicalized and split into training (70%), validation (15%), and test (15%) sets
+ by Murcko scaffold for each species with more than 1000 entries. Additional features like molecular weight and
+ topological polar surface area have also been calculated.
+
+ ### Who are the source data producers?
+
+ Joe Thomas, Marc Navre, Aileen Rubio, and Allan Coukell
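The 70/15/15 Murcko scaffold split described in the README keeps molecules that share a scaffold in the same split. Below is a minimal sketch of that idea, assuming the scaffold of each row has already been computed elsewhere (for example with a cheminformatics toolkit). The helper `scaffold_split` and its greedy assignment are illustrative assumptions, not the procedure schemist actually uses.

```python
from collections import defaultdict


def scaffold_split(scaffolds, fractions=(0.70, 0.15, 0.15)):
    """Assign row indices to train/validation/test so that rows sharing a
    scaffold always land in the same split (hypothetical helper)."""
    groups = defaultdict(list)
    for index, scaffold in enumerate(scaffolds):
        groups[scaffold].append(index)

    # Place the largest scaffold groups first.
    ordered = sorted(groups.values(), key=len, reverse=True)
    targets = [fraction * len(scaffolds) for fraction in fractions]
    splits = ([], [], [])
    for members in ordered:
        # Pick the split that is currently furthest below its target size.
        deficits = [target - len(split) for target, split in zip(targets, splits)]
        splits[deficits.index(max(deficits))].extend(members)
    return splits


# Toy scaffold labels standing in for Murcko scaffold SMILES.
toy_scaffolds = ["c1ccccc1"] * 7 + ["c1ccncc1"] * 2 + ["C1CCCCC1"] * 1
train, validation, test = scaffold_split(toy_scaffolds)

# The row counts reported in the README match the stated proportions.
rows = {"train": 6770, "validation": 1450, "test": 1451}
proportions = {name: count / sum(rows.values()) for name, count in rows.items()}
```

Because whole scaffold groups are assigned at once, the realized proportions are only approximately 70/15/15, which is consistent with the 1450 versus 1451 rows in the validation and test sets.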
data-config.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "_default_cache": "cache/duvida/data",
+   "_in_key": "inputs",
+   "_input_cols": [
+     "smiles"
+   ],
+   "_label_cols": [
+     "pmic"
+   ],
+   "_out_key": "labels",
+   "input_shape": [
+     2248
+   ],
+   "output_shape": [
+     1
+   ]
+ }
data-load-args.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "cache": "/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Francisella-tularensis/31/cache",
+   "features": [
+     "smiles"
+   ],
+   "filename": "/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Francisella-tularensis/scaffold-split-train.csv.gz",
+   "labels": [
+     "pmic"
+   ]
+ }
eval-metrics_test.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "Pearson r": 0.3527353695023497,
+   "RMSE": 0.10296344012022018,
+   "Spearman rho": 0.618750940647444
+ }
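The three scores in this file can be recomputed from observed and predicted values. Below is a minimal pure-Python sketch, assuming tied values receive average ranks for Spearman's rho. The sample numbers are made up, and the function names are hypothetical rather than part of Duvida.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observations and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


observed = [5.0, 5.5, 6.1, 6.8, 7.4]   # made-up pMIC values
predicted = [5.2, 5.4, 6.5, 6.3, 9.0]  # made-up predictions

scores = {
    "Pearson r": pearson_r(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "Spearman rho": spearman_rho(observed, predicted),
}
```

Spearman's rho depends only on the ordering of the values, so it is less sensitive to a few extreme predictions than Pearson's r.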
eval-metrics_train.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "Pearson r": 0.5315396541042405,
+   "RMSE": 0.1128498837351799,
+   "Spearman rho": 0.6696014953216823
+ }
eval-metrics_validation.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "Pearson r": 0.2988704143091524,
+   "RMSE": 0.12225893139839172,
+   "Spearman rho": 0.5774266044687701
+ }
input-data.hf/data-00000-of-00001.arrow ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6cf123b1b5ea8f059779d5f1785c9db1193db46a4750e6bf7f2e332a4c619b67
+ size 690568
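The three lines above are a Git LFS pointer: the repository stores this small text file, and the binary itself is identified by its SHA-256 digest and byte size. Below is a minimal sketch of reading such a pointer, assuming the simple `key value` line layout shown here. The helper name `parse_lfs_pointer` is hypothetical.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:6cf123b1b5ea8f059779d5f1785c9db1193db46a4750e6bf7f2e332a4c619b67\n"
    "size 690568\n"
)
```

The same layout applies to the other pointer files in this commit, such as `params.pt` and the prediction archives.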
input-data.hf/dataset_info.json ADDED
@@ -0,0 +1,52 @@
+ {
+   "builder_name": "csv",
+   "citation": "",
+   "config_name": "default",
+   "dataset_name": "csv",
+   "dataset_size": 2688094,
+   "description": "",
+   "download_checksums": {
+     "/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Francisella-tularensis/scaffold-split-train.csv.gz": {
+       "num_bytes": 536218,
+       "checksum": null
+     }
+   },
+   "download_size": 536218,
+   "features": {
+     "smiles": {
+       "dtype": "string",
+       "_type": "Value"
+     },
+     "inputs": {
+       "feature": {
+         "dtype": "string",
+         "_type": "Value"
+       },
+       "_type": "Sequence"
+     },
+     "labels": {
+       "feature": {
+         "dtype": "float64",
+         "_type": "Value"
+       },
+       "_type": "Sequence"
+     }
+   },
+   "homepage": "",
+   "license": "",
+   "size_in_bytes": 3224312,
+   "splits": {
+     "train": {
+       "name": "train",
+       "num_bytes": 2688094,
+       "num_examples": 6770,
+       "dataset_name": "csv"
+     }
+   },
+   "version": {
+     "version_str": "0.0.0",
+     "major": 0,
+     "minor": 0,
+     "patch": 0
+   }
+ }
input-data.hf/state.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "_data_files": [
+     {
+       "filename": "data-00000-of-00001.arrow"
+     }
+   ],
+   "_fingerprint": "98c4ff865d26cf29",
+   "_format_columns": null,
+   "_format_kwargs": {},
+   "_format_type": null,
+   "_output_all_columns": false,
+   "_split": "train"
+ }
logs-csv/lightning_logs/version_0/hparams.yaml ADDED
@@ -0,0 +1,13 @@
+ dropout: 0.2
+ ensemble_size: 10
+ extra_featurizers: null
+ learning_rate: 0.0001
+ n_hidden: 5
+ n_input: 2248
+ n_out: 1
+ n_units: 16
+ optimizer: !!python/name:torch.optim.adam.Adam ''
+ reduce_lr_on_plateau: true
+ reduce_lr_patience: 10
+ use_2d: true
+ use_fp: true
logs-csv/lightning_logs/version_0/metrics.csv ADDED
@@ -0,0 +1,155 @@
+ epoch,learning_rate,loss,step,val_loss
+ 0,9.999999747378752e-05,,423,1.035925030708313
+ 0,,9.994428634643555,423,
+ 1,9.999999747378752e-05,,847,0.6954950094223022
+ 1,,3.126251697540283,847,
+ 2,9.999999747378752e-05,,1271,0.6660816073417664
+ 2,,2.558513641357422,1271,
+ 3,9.999999747378752e-05,,1695,0.5705931186676025
+ 3,,2.1037137508392334,1695,
+ 4,9.999999747378752e-05,,2119,0.49348127841949463
+ 4,,1.747497320175171,2119,
+ 5,9.999999747378752e-05,,2543,0.3814232051372528
+ 5,,1.4690725803375244,2543,
+ 6,9.999999747378752e-05,,2967,0.34099045395851135
+ 6,,1.2258453369140625,2967,
+ 7,9.999999747378752e-05,,3391,0.29116082191467285
+ 7,,1.0339215993881226,3391,
+ 8,9.999999747378752e-05,,3815,0.2379973977804184
+ 8,,0.9013144969940186,3815,
+ 9,9.999999747378752e-05,,4239,0.19984234869480133
+ 9,,0.7816758155822754,4239,
+ 10,9.999999747378752e-05,,4663,0.1637083888053894
+ 10,,0.6932660937309265,4663,
+ 11,9.999999747378752e-05,,5087,0.1320888251066208
+ 11,,0.6253244876861572,5087,
+ 12,9.999999747378752e-05,,5511,0.11645940691232681
+ 12,,0.5718656778335571,5511,
+ 13,9.999999747378752e-05,,5935,0.10740785300731659
+ 13,,0.5331162214279175,5935,
+ 14,9.999999747378752e-05,,6359,0.09628769755363464
+ 14,,0.5030606389045715,6359,
+ 15,9.999999747378752e-05,,6783,0.09018626809120178
+ 15,,0.47158119082450867,6783,
+ 16,9.999999747378752e-05,,7207,0.08879426121711731
+ 16,,0.4476066827774048,7207,
+ 17,9.999999747378752e-05,,7631,0.0879310742020607
+ 17,,0.4256654381752014,7631,
+ 18,9.999999747378752e-05,,8055,0.08867588639259338
+ 18,,0.4078732132911682,8055,
+ 19,9.999999747378752e-05,,8479,0.08685865998268127
+ 19,,0.3951185345649719,8479,
+ 20,9.999999747378752e-05,,8903,0.08222781866788864
+ 20,,0.3779689073562622,8903,
+ 21,9.999999747378752e-05,,9327,0.08167942613363266
+ 21,,0.3682374358177185,9327,
+ 22,9.999999747378752e-05,,9751,0.08089277893304825
+ 22,,0.3554483950138092,9751,
+ 23,9.999999747378752e-05,,10175,0.08071091026067734
+ 23,,0.3422313630580902,10175,
+ 24,9.999999747378752e-05,,10599,0.07949075102806091
+ 24,,0.3378938138484955,10599,
+ 25,9.999999747378752e-05,,11023,0.08147379755973816
+ 25,,0.32231298089027405,11023,
+ 26,9.999999747378752e-05,,11447,0.07940167188644409
+ 26,,0.3146795630455017,11447,
+ 27,9.999999747378752e-05,,11871,0.07670588791370392
+ 27,,0.3067043125629425,11871,
+ 28,9.999999747378752e-05,,12295,0.07626417279243469
+ 28,,0.30274152755737305,12295,
+ 29,9.999999747378752e-05,,12719,0.0779544860124588
+ 29,,0.2946133017539978,12719,
+ 30,9.999999747378752e-05,,13143,0.07719951868057251
+ 30,,0.29268163442611694,13143,
+ 31,9.999999747378752e-05,,13567,0.07851152867078781
+ 31,,0.28270256519317627,13567,
+ 32,9.999999747378752e-05,,13991,0.07809314876794815
+ 32,,0.2779752314090729,13991,
+ 33,9.999999747378752e-05,,14415,0.07809542864561081
+ 33,,0.27120184898376465,14415,
+ 34,9.999999747378752e-05,,14839,0.07520253211259842
+ 34,,0.2669171392917633,14839,
+ 35,9.999999747378752e-05,,15263,0.07598806172609329
+ 35,,0.26092231273651123,15263,
+ 36,9.999999747378752e-05,,15687,0.0762818232178688
+ 36,,0.2556494176387787,15687,
+ 37,9.999999747378752e-05,,16111,0.07429581135511398
+ 37,,0.252155065536499,16111,
+ 38,9.999999747378752e-05,,16535,0.07375046610832214
+ 38,,0.24888691306114197,16535,
+ 39,9.999999747378752e-05,,16959,0.07450631260871887
+ 39,,0.24435901641845703,16959,
+ 40,9.999999747378752e-05,,17383,0.0731251984834671
+ 40,,0.2419043779373169,17383,
+ 41,9.999999747378752e-05,,17807,0.07379607111215591
+ 41,,0.23947010934352875,17807,
+ 42,9.999999747378752e-05,,18231,0.0751994252204895
+ 42,,0.23747314512729645,18231,
+ 43,9.999999747378752e-05,,18655,0.07375573366880417
+ 43,,0.23268665373325348,18655,
+ 44,9.999999747378752e-05,,19079,0.07372378557920456
+ 44,,0.23062841594219208,19079,
+ 45,9.999999747378752e-05,,19503,0.07421385496854782
+ 45,,0.22760289907455444,19503,
+ 46,9.999999747378752e-05,,19927,0.07232991605997086
+ 46,,0.2254146933555603,19927,
+ 47,9.999999747378752e-05,,20351,0.07266100496053696
+ 47,,0.22298133373260498,20351,
+ 48,9.999999747378752e-05,,20775,0.07445989549160004
+ 48,,0.22175002098083496,20775,
+ 49,9.999999747378752e-05,,21199,0.07215843349695206
+ 49,,0.21755626797676086,21199,
+ 50,9.999999747378752e-05,,21623,0.07133714854717255
+ 50,,0.21510690450668335,21623,
+ 51,9.999999747378752e-05,,22047,0.0713556781411171
+ 51,,0.21215327084064484,22047,
+ 52,9.999999747378752e-05,,22471,0.07214436680078506
+ 52,,0.20858827233314514,22471,
+ 53,9.999999747378752e-05,,22895,0.07109259814023972
+ 53,,0.2087220847606659,22895,
+ 54,9.999999747378752e-05,,23319,0.07146094739437103
+ 54,,0.20866826176643372,23319,
+ 55,9.999999747378752e-05,,23743,0.07050544768571854
+ 55,,0.2037658840417862,23743,
+ 56,9.999999747378752e-05,,24167,0.06917674839496613
+ 56,,0.20306703448295593,24167,
+ 57,9.999999747378752e-05,,24591,0.07003764808177948
+ 57,,0.1986663043498993,24591,
+ 58,9.999999747378752e-05,,25015,0.07062102109193802
+ 58,,0.19890999794006348,25015,
+ 59,9.999999747378752e-05,,25439,0.07197015732526779
+ 59,,0.19659635424613953,25439,
+ 60,9.999999747378752e-05,,25863,0.07174472510814667
+ 60,,0.19441528618335724,25863,
+ 61,9.999999747378752e-05,,26287,0.06968678534030914
+ 61,,0.1919819861650467,26287,
+ 62,9.999999747378752e-05,,26711,0.07158518582582474
+ 62,,0.1918257474899292,26711,
+ 63,9.999999747378752e-05,,27135,0.07025542110204697
+ 63,,0.18889901041984558,27135,
+ 64,9.999999747378752e-05,,27559,0.07171856611967087
+ 64,,0.1903674602508545,27559,
+ 65,9.999999747378752e-05,,27983,0.0730924904346466
+ 65,,0.18704192340373993,27983,
+ 66,9.999999747378752e-05,,28407,0.07027459144592285
+ 66,,0.18396589159965515,28407,
+ 67,9.999999747378752e-05,,28831,0.07252991199493408
+ 67,,0.18247202038764954,28831,
+ 68,9.999999747378752e-06,,29255,0.07075433433055878
+ 68,,0.18238519132137299,29255,
+ 69,9.999999747378752e-06,,29679,0.07024384289979935
+ 69,,0.18167123198509216,29679,
+ 70,9.999999747378752e-06,,30103,0.07038987427949905
+ 70,,0.18080022931098938,30103,
+ 71,9.999999747378752e-06,,30527,0.07059729099273682
+ 71,,0.18018695712089539,30527,
+ 72,9.999999747378752e-06,,30951,0.07024990022182465
+ 72,,0.17985540628433228,30951,
+ 73,9.999999747378752e-06,,31375,0.07034288346767426
+ 73,,0.18019035458564758,31375,
+ 74,9.999999747378752e-06,,31799,0.07050464302301407
+ 74,,0.18130117654800415,31799,
+ 75,9.999999747378752e-06,,32223,0.07073500752449036
+ 75,,0.18008898198604584,32223,
+ 76,9.999999747378752e-06,,32647,0.07040064036846161
+ 76,,0.1794595569372177,32647,
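In this file each epoch appears twice: one row carries `learning_rate` and `val_loss`, the other carries the training `loss`. Below is a minimal sketch of folding the two rows into a single record per epoch, using only the standard library. The helper name `merge_epoch_rows` is hypothetical, and the sample is the first four data rows of the file.

```python
import csv
import io


def merge_epoch_rows(csv_text):
    """Combine the rows logged for each epoch into a single record per epoch."""
    merged = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = merged.setdefault(int(row["epoch"]), {})
        for column, value in row.items():
            if value != "":  # keep only the cells that were actually logged
                record[column] = float(value)
    return [merged[epoch] for epoch in sorted(merged)]


sample = """epoch,learning_rate,loss,step,val_loss
0,9.999999747378752e-05,,423,1.035925030708313
0,,9.994428634643555,423,
1,9.999999747378752e-05,,847,0.6954950094223022
1,,3.126251697540283,847,
"""
records = merge_epoch_rows(sample)
```

The result has the same one-row-per-epoch shape as `training-log.csv` in this commit.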
logs/lightning_logs/version_0/events.out.tfevents.1743097022.cn021.1601982.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7d84848c58ca2aefccca85ba1a6fde513b752228128875c1b4e296f126e8947a
+ size 15326
logs/lightning_logs/version_0/events.out.tfevents.1743097061.cn004.4053787.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ecc64be71b663e97bea7fa2a3ea1d00587000c44970a903f58789ca3dbe54190
+ size 18851
logs/lightning_logs/version_0/hparams.yaml ADDED
@@ -0,0 +1,13 @@
+ dropout: 0.2
+ ensemble_size: 10
+ extra_featurizers: null
+ learning_rate: 0.0001
+ n_hidden: 5
+ n_input: 2248
+ n_out: 1
+ n_units: 16
+ optimizer: !!python/name:torch.optim.adam.Adam ''
+ reduce_lr_on_plateau: true
+ reduce_lr_patience: 10
+ use_2d: true
+ use_fp: true
metrics.csv ADDED
@@ -0,0 +1,4 @@
+ split,split_filename,config_i,model_class,n_parameters,filename,features,labels,cache,extra_featurizers,use_2d,use_fp,dropout,ensemble_size,learning_rate,n_hidden,n_units,val_filename,epochs,batch_size,RMSE,Pearson r,Spearman rho
+ train,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Francisella-tularensis/scaffold-split-train.csv.gz,31,FPMLPModelBox,370890,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Francisella-tularensis/scaffold-split-train.csv.gz,['smiles'],['pmic'],/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Francisella-tularensis/31/cache,,True,True,0.2,10,0.0001,5,16,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Francisella-tularensis/scaffold-split-validation.csv.gz,2000,16,0.1128498837351799,0.5315396541042405,0.6696014953216823
+ validation,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Francisella-tularensis/scaffold-split-validation.csv.gz,31,FPMLPModelBox,370890,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Francisella-tularensis/scaffold-split-train.csv.gz,['smiles'],['pmic'],/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Francisella-tularensis/31/cache,,True,True,0.2,10,0.0001,5,16,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Francisella-tularensis/scaffold-split-validation.csv.gz,2000,16,0.12225893139839172,0.2988704143091524,0.5774266044687701
+ test,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Francisella-tularensis/scaffold-split-test.csv.gz,31,FPMLPModelBox,370890,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Francisella-tularensis/scaffold-split-train.csv.gz,['smiles'],['pmic'],/nemo/lab/johnsone/home/users/johnsoe/projects/abx-discovery-strategy/models/spark/Francisella-tularensis/31/cache,,True,True,0.2,10,0.0001,5,16,/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Francisella-tularensis/scaffold-split-validation.csv.gz,2000,16,0.10296344012022018,0.3527353695023497,0.618750940647444
modelbox-config.json ADDED
@@ -0,0 +1,11 @@
+ {
+   "dropout": 0.2,
+   "ensemble_size": 10,
+   "extra_featurizers": null,
+   "learning_rate": 0.0001,
+   "model_class": "FPMLPModelBox",
+   "n_hidden": 5,
+   "n_units": 16,
+   "use_2d": true,
+   "use_fp": true
+ }
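These hyperparameters can be checked against the `n_parameters` value of 370890 recorded in `metrics.csv`. The sketch below assumes plain fully connected layers with biases: `n_hidden` layers of `n_units` each, followed by one linear output layer, with `n_input: 2248` and `n_out: 1` taken from `hparams.yaml`. That layout is an assumption which happens to reproduce the recorded count; the actual `FPMLPModelBox` architecture may differ in detail.

```python
def mlp_parameter_count(n_input, n_units, n_hidden, n_out):
    """Weights plus biases for n_hidden fully connected layers of n_units
    followed by a linear output layer (assumed layout)."""
    widths = [n_input] + [n_units] * n_hidden + [n_out]
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in zip(widths, widths[1:]))


per_member = mlp_parameter_count(n_input=2248, n_units=16, n_hidden=5, n_out=1)
ensemble_total = 10 * per_member  # ensemble_size = 10
```

Under this assumption each ensemble member has 37089 parameters, almost all of them (35984) in the first layer that maps the 2248 input features to 16 units.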
params.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0f2ed8906bc0009c6266a5df23d985f883001007c931bd1282a6a1307846af1a
+ size 1549446
predictions_test.csv.gz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c79d4846ed3e3721e1be8073c169740f7ba2ca18dd4a300aef7abb048e8a8ad6
+ size 1577745
predictions_train.csv.gz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:47ef7dd44f4699d8dcab5542da85685f080e8df246bff4a33f5895e3c4537eb9
+ size 6527542
predictions_validation.csv.gz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7a331e2554bdcd466931450af466115f313e63ce472fa0e3ff821c0266fac755
+ size 1514064
repo-name.txt ADDED
@@ -0,0 +1 @@
+ scbirlab/spark-dv-2503-ftul
training-args.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "batch_size": 16,
+   "epochs": 2000,
+   "val_filename": "/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Francisella-tularensis/scaffold-split-validation.csv.gz"
+ }
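The batch size here ties the `step` column of the training logs to the epoch number. Below is a small worked check, assuming the final partial batch of each epoch is kept rather than dropped and that steps are counted from zero. The helper name `last_step_of_epoch` is hypothetical.

```python
import math

train_rows = 6770   # training rows reported in the README
batch_size = 16     # from training-args.json

steps_per_epoch = math.ceil(train_rows / batch_size)


def last_step_of_epoch(epoch):
    """Zero-based global step logged at the end of a zero-based epoch."""
    return (epoch + 1) * steps_per_epoch - 1
```

With 424 steps per epoch this gives steps 423, 847, and 32647 for epochs 0, 1, and 76, matching the logged values. The logs end at epoch 76 although `epochs` is 2000, which is consistent with the early stopping on the validation set mentioned in the README.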
training-data.hf/cache-44d673fcffb8a7e3.arrow ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:31ccaae78ca2a3d642b795c6e5a658f3832d10f97f69f0fb3c14a5a7756fe8f5
+ size 122659696
training-data.hf/cache-a76fe32afdf81d9d.arrow ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f156c8dd5575d670ffd5726bd3c87974337d5febc3a3596ea91d04a6e2b7e515
+ size 122659696
training-data.hf/data-00000-of-00001.arrow ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4028244896cd86ac3cb9686229d7c8d3ac4f0517062a65a0f5833aa8b9853fb0
+ size 122152664
training-data.hf/dataset_info.json ADDED
@@ -0,0 +1,52 @@
+ {
+   "builder_name": "csv",
+   "citation": "",
+   "config_name": "default",
+   "dataset_name": "csv",
+   "dataset_size": 2688094,
+   "description": "",
+   "download_checksums": {
+     "/nemo/lab/johnsone/home/users/johnsoe/data/datasets/thomas-2018-spark-wt/Francisella-tularensis/scaffold-split-train.csv.gz": {
+       "num_bytes": 536218,
+       "checksum": null
+     }
+   },
+   "download_size": 536218,
+   "features": {
+     "smiles": {
+       "dtype": "string",
+       "_type": "Value"
+     },
+     "inputs": {
+       "feature": {
+         "dtype": "float64",
+         "_type": "Value"
+       },
+       "_type": "Sequence"
+     },
+     "labels": {
+       "feature": {
+         "dtype": "float64",
+         "_type": "Value"
+       },
+       "_type": "Sequence"
+     }
+   },
+   "homepage": "",
+   "license": "",
+   "size_in_bytes": 3224312,
+   "splits": {
+     "train": {
+       "name": "train",
+       "num_bytes": 2688094,
+       "num_examples": 6770,
+       "dataset_name": "csv"
+     }
+   },
+   "version": {
+     "version_str": "0.0.0",
+     "major": 0,
+     "minor": 0,
+     "patch": 0
+   }
+ }
training-data.hf/state.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "_data_files": [
+     {
+       "filename": "data-00000-of-00001.arrow"
+     }
+   ],
+   "_fingerprint": "c9efbdfca8a5884e",
+   "_format_columns": null,
+   "_format_kwargs": {
+     "dtype": "float"
+   },
+   "_format_type": "numpy",
+   "_output_all_columns": false,
+   "_split": "train"
+ }
training-log.csv ADDED
@@ -0,0 +1,78 @@
+ epoch,step,learning_rate,loss,val_loss
+ 0,423,9.999999747378752e-05,9.994428634643556,1.035925030708313
+ 1,847,9.999999747378752e-05,3.126251697540283,0.6954950094223022
+ 2,1271,9.999999747378752e-05,2.558513641357422,0.6660816073417664
+ 3,1695,9.999999747378752e-05,2.1037137508392334,0.5705931186676025
+ 4,2119,9.999999747378752e-05,1.747497320175171,0.4934812784194946
+ 5,2543,9.999999747378752e-05,1.4690725803375244,0.3814232051372528
+ 6,2967,9.999999747378752e-05,1.2258453369140625,0.3409904539585113
+ 7,3391,9.999999747378752e-05,1.0339215993881226,0.2911608219146728
+ 8,3815,9.999999747378752e-05,0.9013144969940186,0.2379973977804184
+ 9,4239,9.999999747378752e-05,0.7816758155822754,0.1998423486948013
+ 10,4663,9.999999747378752e-05,0.6932660937309265,0.1637083888053894
+ 11,5087,9.999999747378752e-05,0.6253244876861572,0.1320888251066208
+ 12,5511,9.999999747378752e-05,0.5718656778335571,0.1164594069123268
+ 13,5935,9.999999747378752e-05,0.5331162214279175,0.1074078530073165
+ 14,6359,9.999999747378752e-05,0.5030606389045715,0.0962876975536346
+ 15,6783,9.999999747378752e-05,0.4715811908245086,0.0901862680912017
+ 16,7207,9.999999747378752e-05,0.4476066827774048,0.0887942612171173
+ 17,7631,9.999999747378752e-05,0.4256654381752014,0.0879310742020607
+ 18,8055,9.999999747378752e-05,0.4078732132911682,0.0886758863925933
+ 19,8479,9.999999747378752e-05,0.3951185345649719,0.0868586599826812
+ 20,8903,9.999999747378752e-05,0.3779689073562622,0.0822278186678886
+ 21,9327,9.999999747378752e-05,0.3682374358177185,0.0816794261336326
+ 22,9751,9.999999747378752e-05,0.3554483950138092,0.0808927789330482
+ 23,10175,9.999999747378752e-05,0.3422313630580902,0.0807109102606773
+ 24,10599,9.999999747378752e-05,0.3378938138484955,0.0794907510280609
+ 25,11023,9.999999747378752e-05,0.322312980890274,0.0814737975597381
+ 26,11447,9.999999747378752e-05,0.3146795630455017,0.079401671886444
+ 27,11871,9.999999747378752e-05,0.3067043125629425,0.0767058879137039
+ 28,12295,9.999999747378752e-05,0.302741527557373,0.0762641727924346
+ 29,12719,9.999999747378752e-05,0.2946133017539978,0.0779544860124588
+ 30,13143,9.999999747378752e-05,0.2926816344261169,0.0771995186805725
+ 31,13567,9.999999747378752e-05,0.2827025651931762,0.0785115286707878
+ 32,13991,9.999999747378752e-05,0.2779752314090729,0.0780931487679481
+ 33,14415,9.999999747378752e-05,0.2712018489837646,0.0780954286456108
+ 34,14839,9.999999747378752e-05,0.2669171392917633,0.0752025321125984
+ 35,15263,9.999999747378752e-05,0.2609223127365112,0.0759880617260932
+ 36,15687,9.999999747378752e-05,0.2556494176387787,0.0762818232178688
+ 37,16111,9.999999747378752e-05,0.252155065536499,0.0742958113551139
+ 38,16535,9.999999747378752e-05,0.2488869130611419,0.0737504661083221
+ 39,16959,9.999999747378752e-05,0.244359016418457,0.0745063126087188
+ 40,17383,9.999999747378752e-05,0.2419043779373169,0.0731251984834671
+ 41,17807,9.999999747378752e-05,0.2394701093435287,0.0737960711121559
+ 42,18231,9.999999747378752e-05,0.2374731451272964,0.0751994252204895
+ 43,18655,9.999999747378752e-05,0.2326866537332534,0.0737557336688041
+ 44,19079,9.999999747378752e-05,0.230628415942192,0.0737237855792045
+ 45,19503,9.999999747378752e-05,0.2276028990745544,0.0742138549685478
+ 46,19927,9.999999747378752e-05,0.2254146933555603,0.0723299160599708
+ 47,20351,9.999999747378752e-05,0.2229813337326049,0.0726610049605369
+ 48,20775,9.999999747378752e-05,0.2217500209808349,0.0744598954916
+ 49,21199,9.999999747378752e-05,0.2175562679767608,0.072158433496952
+ 50,21623,9.999999747378752e-05,0.2151069045066833,0.0713371485471725
+ 51,22047,9.999999747378752e-05,0.2121532708406448,0.0713556781411171
+ 52,22471,9.999999747378752e-05,0.2085882723331451,0.072144366800785
+ 53,22895,9.999999747378752e-05,0.2087220847606659,0.0710925981402397
+ 54,23319,9.999999747378752e-05,0.2086682617664337,0.071460947394371
+ 55,23743,9.999999747378752e-05,0.2037658840417862,0.0705054476857185
+ 56,24167,9.999999747378752e-05,0.2030670344829559,0.0691767483949661
+ 57,24591,9.999999747378752e-05,0.1986663043498993,0.0700376480817794
+ 58,25015,9.999999747378752e-05,0.1989099979400634,0.070621021091938
+ 59,25439,9.999999747378752e-05,0.1965963542461395,0.0719701573252677
+ 60,25863,9.999999747378752e-05,0.1944152861833572,0.0717447251081466
+ 61,26287,9.999999747378752e-05,0.1919819861650467,0.0696867853403091
+ 62,26711,9.999999747378752e-05,0.1918257474899292,0.0715851858258247
+ 63,27135,9.999999747378752e-05,0.1888990104198455,0.0702554211020469
+ 64,27559,9.999999747378752e-05,0.1903674602508545,0.0717185661196708
+ 65,27983,9.999999747378752e-05,0.1870419234037399,0.0730924904346466
+ 66,28407,9.999999747378752e-05,0.1839658915996551,0.0702745914459228
+ 67,28831,9.999999747378752e-05,0.1824720203876495,0.072529911994934
+ 68,29255,9.999999747378752e-06,0.1823851913213729,0.0707543343305587
+ 69,29679,9.999999747378752e-06,0.1816712319850921,0.0702438428997993
+ 70,30103,9.999999747378752e-06,0.1808002293109893,0.070389874279499
+ 71,30527,9.999999747378752e-06,0.1801869571208953,0.0705972909927368
+ 72,30951,9.999999747378752e-06,0.1798554062843322,0.0702499002218246
+ 73,31375,9.999999747378752e-06,0.1801903545856475,0.0703428834676742
+ 74,31799,9.999999747378752e-06,0.1813011765480041,0.070504643023014
+ 75,32223,9.999999747378752e-06,0.1800889819860458,0.0707350075244903
+ 76,32647,9.999999747378752e-06,0.1794595569372177,0.0704006403684616
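Two features of this log are easy to extract programmatically: the epoch with the lowest validation loss and the epoch at which the learning rate changes. Below is a minimal sketch using only the standard library. The helper name `summarize_log` is hypothetical, and the excerpt reuses five rows from the log above.

```python
import csv
import io


def summarize_log(csv_text):
    """Return the epoch with the lowest val_loss and the first epoch at
    which the logged learning rate differs from the previous row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    best = min(rows, key=lambda row: float(row["val_loss"]))
    lr_change_epoch = None
    for previous, current in zip(rows, rows[1:]):
        if float(current["learning_rate"]) != float(previous["learning_rate"]):
            lr_change_epoch = int(current["epoch"])
            break
    return int(best["epoch"]), lr_change_epoch


excerpt = """epoch,step,learning_rate,loss,val_loss
55,23743,9.999999747378752e-05,0.2037658840417862,0.0705054476857185
56,24167,9.999999747378752e-05,0.2030670344829559,0.0691767483949661
57,24591,9.999999747378752e-05,0.1986663043498993,0.0700376480817794
67,28831,9.999999747378752e-05,0.1824720203876495,0.072529911994934
68,29255,9.999999747378752e-06,0.1823851913213729,0.0707543343305587
"""
best_epoch, lr_change_epoch = summarize_log(excerpt)
```

In the full log the lowest validation loss is also at epoch 56, and the learning rate falls tenfold at epoch 68, after eleven epochs (57 to 67) without improvement. That pattern is consistent with `reduce_lr_on_plateau: true` and `reduce_lr_patience: 10` in `hparams.yaml`, although the scheduler's exact settings are not recorded in this commit.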
training-log.png ADDED