Research Skills
Bioinformatics & Wet Lab competencies from doctoral research
Research Focus
Bridging computational and experimental cancer research, from AI-driven drug discovery pipelines (SieveAI) and molecular dynamics simulations (coarse-grained MARTINI for membranes, atomistic CHARMM36 for proteins) to cell-based validation assays. Specializing in protein-ligand docking, protein-protein interactions, structure prediction (AlphaFold2, SwissModel, I-TASSER, Rosetta), MMPBSA binding free energy, and exosomal miRNA analysis. Automated workflows from PDB preparation through docking, MD, and result extraction, reducing months of manual work to hours.
Bioinformatics & Computational Biology
Computational drug discovery, molecular dynamics, genomics data analysis, and pipeline development. Built and published multiple open-source tools.
Protein-Ligand Docking
Molecular docking to predict drug-target binding: used AutoDock Vina, SwissDock, and PatchDock to screen thousands of compounds and identify potential therapeutic candidates for cancer targets such as CXCL9/10 and SKP2.
- AutoDock Vina — batch protein-ligand docking (1,950+ complexes)
- Protein-protein docking — HADDOCK, pyDockWeb, PatchDock, ClusPro
- Protein-protein interaction — CD151 cholesterol binding, TWIST1-OGT/OGase, survivin-caspase complexes
- Protein-complex docking — ternary systems (survivin + OBPHA + caspase-3/7/9)
- Peptide-peptide docking — TWIST1-OGT/OGase interactions
- Ligand preparation — AutoDockTools, OpenBabel, PubChemPy 3D SDF, SwissParam topology
- Post-docking analysis — ChimeraX H-bond/contact automation, PLIP, PRODIGY
- Cancer targets — CXCL9/10, SKP2, PPARγ, BRCA1, GRK2, CD63-VEGF, p53, Bcl-2, PDGFR
- DrugBank & FDA drug screening — 8K+ conformers, immune checkpoints (CD28, TIGIT, PD-1)
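Batch docking of this size depends on reading scores back out of each result file. The sketch below shows one way to do that for AutoDock Vina output, which records each pose's affinity on a `REMARK VINA RESULT:` line. The function names and the two-pose example text are illustrative assumptions, not code taken from the tools listed here.

```python
from typing import List, Tuple


def parse_vina_scores(pdbqt_text: str) -> List[Tuple[int, float]]:
    """Return (mode, affinity in kcal/mol) for each pose in a Vina output file."""
    scores = []
    for line in pdbqt_text.splitlines():
        # Vina writes one "REMARK VINA RESULT:" line per pose, holding
        # the affinity, then the RMSD lower and upper bounds.
        if line.startswith("REMARK VINA RESULT:"):
            affinity = float(line.split(":", 1)[1].split()[0])
            scores.append((len(scores) + 1, affinity))
    return scores


def best_pose(pdbqt_text: str) -> Tuple[int, float]:
    """Pick the pose with the most negative (strongest predicted) affinity."""
    return min(parse_vina_scores(pdbqt_text), key=lambda pose: pose[1])


# Hypothetical two-pose output, trimmed to the lines that matter here.
example = """MODEL 1
REMARK VINA RESULT:    -8.4      0.000      0.000
ENDMDL
MODEL 2
REMARK VINA RESULT:    -7.9      1.532      2.104
ENDMDL
"""
print(best_pose(example))
```

Ranking by score alone is only a first filter; pose selection by residue interaction would sit on top of a parser like this.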
Molecular Dynamics Simulation
Molecular dynamics simulations with GROMACS and Desmond: coarse-grained MARTINI for lipid–ligand membrane systems, atomistic CHARMM36/AMBER for protein–ligand complexes, and CHARMM-GUI membrane building. RMSD/RMSF/Rg analysis for validation of docking predictions.
- GROMACS — full MD pipeline (pdb2gmx, editconf, solvate, genion, grompp, mdrun) up to 300 ns
- Coarse-grained MD — MARTINI force field for lipid–ligand membrane systems (lapatinib/DMPC/cholesterol)
- Atomistic MD — CHARMM36 force field for protein–ligand complexes; AMBER/CHARMM36 for membranes
- CHARMM-GUI membrane builder — DMPC, DOPC, DOPS, cholesterol bilayer preparation & embedding
- Desmond MD — local installation & WebGro cloud submission, trajectory analysis
- MMPBSA — gmx_MMPBSA binding free energy calculations (300 ns trajectories)
- RMSD / RMSF / Rg / SASA / DSSP — protein stability, secondary structure, solvent analysis
- SwissParam & PRODRG — ligand topology/ITP generation for GROMACS (SwissParam for CHARMM, PRODRG for GROMOS force fields)
- GMXvg — published GROMACS visualization & plotting tool
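The GROMACS tools named in the pipeline bullet run in a fixed order. The sketch below builds that command sequence in Python without executing anything. All file names, the box settings, and the force field directory name `charmm36` are assumptions; the name passed to `-ff` has to match a force field installed locally.

```python
from typing import List


def gromacs_setup_commands(pdb: str, forcefield: str = "charmm36",
                           water: str = "tip3p") -> List[List[str]]:
    """Build the preparation-to-production command sequence without running it."""
    return [
        # Topology and processed coordinates from the input structure.
        ["gmx", "pdb2gmx", "-f", pdb, "-o", "processed.gro", "-p", "topol.top",
         "-ff", forcefield, "-water", water],
        # Center the solute in a box with 1.0 nm clearance.
        ["gmx", "editconf", "-f", "processed.gro", "-o", "boxed.gro",
         "-c", "-d", "1.0", "-bt", "dodecahedron"],
        ["gmx", "solvate", "-cp", "boxed.gro", "-cs", "spc216.gro",
         "-o", "solvated.gro", "-p", "topol.top"],
        # genion needs a run input file, so grompp runs first.
        ["gmx", "grompp", "-f", "ions.mdp", "-c", "solvated.gro",
         "-p", "topol.top", "-o", "ions.tpr"],
        ["gmx", "genion", "-s", "ions.tpr", "-o", "ionized.gro",
         "-p", "topol.top", "-pname", "NA", "-nname", "CL", "-neutral"],
        # Minimization and equilibration stages are omitted here; each one
        # repeats the same grompp + mdrun pair with its own .mdp file.
        ["gmx", "grompp", "-f", "md.mdp", "-c", "ionized.gro",
         "-p", "topol.top", "-o", "md.tpr"],
        ["gmx", "mdrun", "-deffnm", "md"],
    ]


for command in gromacs_setup_commands("protein.pdb"):
    print(" ".join(command))
```

Note that `genion` asks interactively which group to replace with ions, so a batch run would need that selection piped in.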
Protein Structure Prediction
Protein structure prediction with AlphaFold2, SwissModel, I-TASSER, and Rosetta — including mutant modeling, transmembrane protein prediction, protein comparison (RMSD via ChimeraX Matchmaker), and O-GlcNAcylation site prediction.
- AlphaFold2 — protein structure prediction via Google Colab (IL27 complex, CD151, CD63)
- SwissModel — homology modeling + QA (quality assessment) for mutant proteins (3EQH-ALA76GLY)
- I-TASSER — 3D structure prediction for membrane & transmembrane proteins
- Rosetta — protein structure prediction and design
- Avogadro — energy minimization of small molecules & ligand 3D optimization
- Protein structure comparison — ChimeraX Matchmaker (RMSD, primary/secondary/tertiary structure)
- PDB processing — UniProt mapping, PDBTM for transmembrane, chain cleanup, HETATM separation
- O-GlcNAcylation site prediction — YinOYang, dbPTM, PhosphoSitePlus for GRK2 sites (S20, S121, S370)
- Membrane protein resources — PDBTM, TMDock, MemProtMD, PerMemDB, MBPpred, ProteinTools
Genomics & Transcriptomics
GEO dataset mining, TCGA cancer genomics (GDC portal), differential gene expression (PyDGE), miRNA database consolidation (miRBase, ExoCarta, EVmiRNA), ncRNA analysis, pathway enrichment (KEGG, STRING, FunRich), and cancer gene resources (IntOGen, cBioPortal, COSMIC).
- GEO dataset mining — GEOparse querying, GDS database search, sample/platform/series filtering (GSE15852, GSE73002, GSE77348)
- Differential gene expression — PyDGE framework, MCF7 vs MCF10A, normal vs tumor expression analysis
- TCGA cancer genomics — GDC portal data download (gdc-client), TCGA-BRCA miRNA-seq, TNBC cohort filtering, sample type codes
- miRNA database consolidation — miRBase (2,693 mature), ExoCarta, EVmiRNA, miRCancer cross-referencing & Venn analysis
- miRNA target prediction — TargetScan, miRDB, DIANA, miRWalk, miRTarBase validated targets
- Exosomal miRNA analysis — miR-34a, miR-10b, miR-21, miR-9; ExoLoger prediction database
- ncRNA analysis — lncRNA, circRNA, siRNA, piRNA; RNAComposer for 3D structure, UNAFold and MXfold2 for secondary structure
- Pathway & network analysis — KEGG (KEGGScape), WikiPathways, BioCarta, STRING interaction networks, FunRich enrichment
- NCBI/Entrez queries — PubMed search, GDS metadata extraction, Gene database cross-referencing
- Cancer gene resources — IntOGen (BRCA driver genes), cBioPortal, OncoKB, COSMIC, CancerES (IIITD)
- PTM databases — dbPTM, PhosphoSitePlus, O-GlcNAc (oglcnac.mcw.edu), VerSeDa
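Filtering TCGA samples by sample type code comes down to reading two digits of the barcode. The sketch below assumes the standard barcode layout, where the fourth hyphen-separated field starts with the sample type code (01 to 09 tumor, 10 to 19 normal, 20 to 29 control). The function name and the example barcodes are hypothetical.

```python
def tcga_sample_class(barcode: str) -> str:
    """Classify a TCGA barcode as tumor, normal, or control by its sample code."""
    parts = barcode.split("-")
    if len(parts) < 4 or parts[0] != "TCGA":
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    # The fourth field is the sample code plus a vial letter, e.g. "01A".
    code = int(parts[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "unknown"


# Made-up barcodes in the usual layout, for illustration only.
barcodes = [
    "TCGA-XX-0001-01A-11R-0000-13",  # 01 = primary solid tumor
    "TCGA-XX-0001-11A-11R-0000-13",  # 11 = solid tissue normal
]
for barcode in barcodes:
    print(barcode, tcga_sample_class(barcode))
```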
Drug Discovery Pipeline
Built SieveAI, an automated drug discovery pipeline that orchestrates screening, docking, and scoring in batch — reducing months of manual work to hours. Published and open-sourced with Zenodo DOI.
- SieveAI — end-to-end automated drug discovery pipeline: PDB prep → ligand prep → docking → result extraction → filtering
- Bulk docking automation — 576 cancer genes screened against 5 ligands; 1,950+ complexes docked, 9,127 Vina results parsed & filtered
- DrugBank high-throughput screening — GRK2 (3,894 complexes, ~60 h compute), immune checkpoints (8K+ conformers)
- Automated PDB processing — Python scripts for ATOM/HETATM separation, chain cleanup, grid box calculation, PDBQT conversion
- Bulk ligand preparation — AutoDockTools prepare_ligand4.py, OpenBabel batch PDB→SDF→SMILES, PubChemPy 3D SDF download
- Automated result extraction — Vina score parsing, ChimeraX H-bond/contact command generation, best-pose selection by residue interaction
- SwissADME bulk — automated SMILES submission, ADME radar scraping, property aggregation across 88+ compounds
- MDDAA-Mate — docking analysis assistant for validation & cross-checking against published results
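The ATOM/HETATM separation and grid box calculation mentioned above can be illustrated in a few lines. This is a minimal sketch, not SieveAI's code: it assumes fixed-width PDB columns, skips water (HOH) records, and pads the ligand's bounding box by 4 Å on each side. The padding value and the `pdb_line` helper used to build the example are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def split_records(pdb_text: str) -> Tuple[List[str], List[str]]:
    """Separate receptor ATOM lines from non-water HETATM lines."""
    atoms, hetatms = [], []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            atoms.append(line)
        elif line.startswith("HETATM") and line[17:20] != "HOH":
            hetatms.append(line)
    return atoms, hetatms


def grid_box(lines: List[str], padding: float = 4.0) -> Tuple[Vec3, Vec3]:
    """Center and edge lengths (in angstroms) of a box around the given atoms."""
    # Fixed-width PDB columns: x = 31-38, y = 39-46, z = 47-54 (1-based).
    axes = [[float(line[start:start + 8]) for line in lines]
            for start in (30, 38, 46)]
    center = tuple(round((min(a) + max(a)) / 2, 3) for a in axes)
    size = tuple(round(max(a) - min(a) + 2 * padding, 3) for a in axes)
    return center, size


def pdb_line(record, serial, name, res, x, y, z):
    """Hypothetical helper that writes one column-aligned PDB coordinate line."""
    return (f"{record:<6}{serial:>5} {name:<4} {res:>3} A{1:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


example = "\n".join([
    pdb_line("ATOM", 1, "CA", "ALA", 1.0, 2.0, 3.0),
    pdb_line("HETATM", 2, "C1", "LIG", 10.0, 20.0, 30.0),
    pdb_line("HETATM", 3, "O1", "LIG", 14.0, 22.0, 36.0),
    pdb_line("HETATM", 4, "O", "HOH", 50.0, 50.0, 50.0),
])
receptor, ligand = split_records(example)
center, size = grid_box(ligand)
print(center, size)
```

Centering the box on a co-crystallized ligand is one common choice; a blind-docking run would box the whole receptor instead.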
Bioinformatics Data Mining
Developed custom scraping frameworks for PubMed literature mining, extracting structured data from thousands of abstracts and scientific web sources for meta-analysis and systematic reviews.
- PubMed querying & full-text mining — biopubmed CLI tool
- Scientific literature meta-analysis — systematic review automation
- Web scraping framework (Scrapper) — GEO, DrugBank, UniProt, ZINC, PubChem data extraction
- API integration — PubChemPy 3D SDF download, RCSB ligand fetch, KEGG KGML pathway
- miR literature curation — exosomal miR article classification & filtering
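PubMed querying of this kind usually goes through the NCBI E-utilities `esearch` endpoint. The sketch below only builds the request URL and makes no network call. It is a generic illustration, and the internals of biopubmed may differ; the function name and default values are assumptions.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, retmax: int = 20, db: str = "pubmed") -> str:
    """Build an E-utilities esearch URL that returns matching IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode escapes spaces and field tags such as [Title].
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


print(esearch_url("exosome miRNA", retmax=5))
```

The returned IDs would then be passed to `efetch` to retrieve abstracts for downstream curation.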
Biomedical Text Mining
Biomedical NLP for named entity recognition and keyword co-occurrence analysis from research abstracts, enabling automated literature synthesis and hypothesis generation from large corpora.
- Biomedical NLP — keyword co-occurrence, named entity recognition, spaCy POS/lemma extraction
- Text mining — structured extraction from thousands of PubMed results
- Exosomal miR prediction — NLP + clustering for unvalidated miR identification
- Regex-based data extraction — SMILES conversion, gene ID mapping, UniProt batch queries
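Keyword co-occurrence can be sketched as counting, per abstract, every pair of keywords that appear together. The version below uses plain regular expressions rather than spaCy, so it is a simplified stand-in for the NLP workflow described here; the abstracts and keyword list are made up for illustration.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def keyword_cooccurrence(abstracts: Iterable[str], keywords: List[str]) -> Counter:
    """Count how many abstracts mention each pair of keywords together."""
    pairs = Counter()
    for text in abstracts:
        lowered = text.lower()
        # Whole-word, case-insensitive match so "miR-21" does not hit "miR-210".
        present = sorted(
            k for k in keywords
            if re.search(rf"\b{re.escape(k.lower())}\b", lowered)
        )
        pairs.update(combinations(present, 2))
    return pairs


# Hypothetical abstracts, for illustration only.
abstracts = [
    "Exosomal miR-21 promotes invasion in breast cancer.",
    "miR-21 and miR-10b are enriched in exosomes.",
    "miR-10b drives metastasis in breast cancer.",
    "miR-21 is upregulated in breast cancer exosomes.",
]
counts = keyword_cooccurrence(abstracts, ["miR-21", "miR-10b", "breast cancer"])
for pair, n in counts.most_common():
    print(pair, n)
```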
Machine Learning for Bioinformatics
Deep neural networks for metabolite classification using SMILES/SMARTS fingerprinting, complemented by Random Forest, SVM, and XGBoost models with rigorous evaluation (AUC, F1, MCC).
- DNN for metabolite classification — SMILES/SMARTS fingerprinting
- Random Forest / SVM / XGBoost — cancer gene expression classifiers
- Model evaluation — AUC, F1, MCC metrics
- PCA & one-hot encoding — miRNA sequence analysis & dimensionality reduction
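Of the evaluation metrics listed, F1 and MCC follow directly from the confusion matrix. The sketch below computes both in pure Python for binary labels; in practice a library such as scikit-learn would likely be used, and the example labels are invented.

```python
from math import sqrt
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal is empty."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical predictions: 3 TP, 1 FN, 1 FP, 3 TN.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(f1(y_true, y_pred), mcc(y_true, y_pred))
```

MCC uses all four cells of the confusion matrix, which makes it the more informative of the two on imbalanced expression datasets.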
Research Software & Tools
Authored and published computational software tools: SieveAI (drug discovery), ExoLoger (exosomal miRNA prediction), GMXvg (GROMACS visualization, Zenodo DOI), miRVim (3D miRNA structure database), and UtilityLib (Python utilities, Zenodo DOI).
- SieveAI — automated drug discovery pipeline
- ExoLoger — exosomal miRNA prediction database
- GMXvg — GROMACS visualization & plotting
- miRVim — 3D miRNA structure database
- UtilityLib — Python utility library (PyPI v2.21.4)
- TheBiomics — Drupal education platform (17K+ users)
- biopubmed — PubMed scraping & processing CLI
- Scrapper — scientific web data extraction framework
- MDDAA-Mate — docking analysis assistant & validation tool
- PyDGE — differential gene expression analysis framework
Wet Lab & Experimental Biology
Hands-on experience in cell culture, molecular biology assays, protein work, and in vitro studies across cancer biology and pharmacology.
Cell Culture & Maintenance
Maintained mammalian cell lines (MG63, MCF7, MDA-MB-231, LN229) with strict aseptic technique: cryopreservation, sub-culturing, passaging, transfection optimization, and mycoplasma testing.
- Mammalian cell culture — MG63, MCF7, MDA-MB-231, LN229, A549
- Animal cell culture — aseptic technique, laminar flow hood, CO₂ incubator operation
- Cell line maintenance — sub-culturing, passaging, cell counting (hemocytometer), viability assessment
- Cryopreservation — liquid nitrogen storage, freeze-thaw recovery, DMSO cryoprotectant protocols
- Transfection optimization — lipid-based and electroporation methods
- Mycoplasma testing & contamination control
- Cancer cell to adipocyte differentiation (PPARγ agonist studies)
Protein Estimation & Assays
Protein quantification (Bradford, BCA), separation and detection via SDS-PAGE and Western blotting, and interaction studies with ELISA and co-immunoprecipitation.
- Protein estimation — Bradford & BCA assay (standard curve preparation)
- SDS-PAGE & Western blotting — vimentin and other target protein detection
- Wet blot transfer — tank transfer system for protein membrane immobilization
- Semi-dry blot transfer — rapid protein transfer to membranes
- Gel Doc imaging — documentation & analysis of electrophoresis gels and blots
- ELISA — quantitative protein interaction analysis
- Co-immunoprecipitation — protein-protein interaction validation
- A280 protein quantification
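Standard curve preparation for Bradford or BCA amounts to fitting a line through the standards and reading unknowns off it. The sketch below uses an ordinary least-squares fit and assumes the standards fall within the assay's linear range. The concentrations and absorbances are invented example values.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for absorbance vs concentration."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration(absorbance: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate an unknown's concentration."""
    return (absorbance - intercept) / slope


# Hypothetical BSA standards (ug/mL) and their blank-inclusive absorbances.
standards = [0.0, 250.0, 500.0, 1000.0]
readings = [0.05, 0.30, 0.55, 1.05]
slope, intercept = fit_line(standards, readings)
print(round(concentration(0.65, slope, intercept), 1))
```

Samples reading above the top standard should be diluted and re-measured rather than extrapolated.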
Molecular Biology Techniques
Standard molecular biology workflow — RNA extraction with TRIzol, cDNA synthesis, qPCR/RT-PCR for gene expression, agarose gel electrophoresis, and plasmid isolation with cloning and primer design.
- RNA extraction — TRIzol method
- cDNA synthesis & gene expression analysis
- RT-PCR & qPCR — reverse transcription, quantitative expression profiling
- Agarose gel electrophoresis — nucleic acid separation & Gel Doc visualization
- Plasmid isolation, cloning & primer design
- Competent cell transformation
In Vitro Studies
Cell-based assays for drug screening: MTT/XTT viability, colony formation, wound healing migration, invasion assays, apoptosis detection (Annexin V), and drug combination synergy (Combination Index, CI).
- MTT & XTT viability assays — dose-response drug screening, IC₅₀ determination
- Colony formation assay — clonogenic survival quantification
- Wound healing migration assay — scratch assay for cell motility
- Invasion assays — transwell migration & Matrigel invasion
- Apoptosis detection — Annexin V / PI staining
- Drug combination synergy — Combination Index (CI) method
- Cancer cell vs normal cell comparative studies
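The Combination Index for a two-drug combination is the sum of each drug's dose in the mixture divided by the dose of that drug alone that gives the same effect: values below 1 indicate synergy, 1 additivity, and above 1 antagonism. The sketch below applies that formula; the ±0.1 band treated as "additive" and the example doses are assumptions chosen for illustration.

```python
def combination_index(d1: float, dx1: float, d2: float, dx2: float) -> float:
    """CI = d1/Dx1 + d2/Dx2 for one effect level.

    d1, d2: doses of each drug used together to reach the effect.
    dx1, dx2: doses of each drug alone that reach the same effect.
    """
    if dx1 <= 0 or dx2 <= 0:
        raise ValueError("single-agent doses must be positive")
    return d1 / dx1 + d2 / dx2


def interpret_ci(ci: float, tolerance: float = 0.1) -> str:
    """Label a CI value; the tolerance band around 1 is an assumed cutoff."""
    if ci < 1 - tolerance:
        return "synergy"
    if ci > 1 + tolerance:
        return "antagonism"
    return "additive"


# Hypothetical doses (uM) giving 50% inhibition alone and in combination.
ci = combination_index(d1=2.0, dx1=8.0, d2=5.0, dx2=20.0)
print(ci, interpret_ci(ci))
```

The single-agent doses would come from the dose-response curves fitted in the viability assays.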
Instruments & Equipment
Laboratory instruments and equipment routinely operated for cell culture, molecular biology, protein analysis, and chromatography workflows.
- Gel Doc system — gel documentation, blot imaging & densitometry analysis
- CO₂ incubator — mammalian cell culture environment control
- Laminar flow hood — aseptic technique & sterile workspace
- Microplate reader — absorbance/fluorescence for MTT, Bradford, BCA, ELISA
- Centrifuge — refrigerated & bench-top, cell pelleting, fractionation
- PCR thermal cycler — RT-PCR & qPCR amplification
- Electrophoresis apparatus — vertical SDS-PAGE & horizontal agarose gel
- Wet & dry blot transfer systems — tank & semi-dry protein transfer
- HPLC system — basic operation & familiarization with analytical chromatography devices
Chromatography & HPLC
Basic chromatography techniques, including column and thin-layer chromatography, plus familiarization with HPLC devices for analytical separations.
- Column & thin-layer chromatography
- HPLC — basic familiarization with analytical devices
- Standard curve & peak integration