Information

Is there a way to determine whether a gene is functional in an individual using only the reference sequence and the individual's gene sequence?




I have the gene sequence, derived from cDNA, for a ton of individuals within a subspecies, plus the NCBI-identified sequences as well. Is there any way to tell from the sequence alone that the gene is non-functional, or only partially functional, in an individual?


In general, no. There may be specific cases where you can, for example if there is a premature stop codon or an early frameshift mutation, or if someone else has already seen your exact variant and shown experimentally that it has no function.

But in general, you cannot predict function in silico from the sequence data alone.


It is possible to predict the change caused by a mutation with programs such as SIFT or PolyPhen (human proteins only), but these are only rough estimates.


I don't know your level of knowledge, so this may seem very obvious to you, but…

  • A six-frame translation gives you some hints. If the sequence is degenerate and full of stop codons, it is non-functional.

  • Align the sequences with MEGA, AliView, or anything else, and then translate them. If there is a frameshift mutation in your sequence, it becomes very visible in the alignment, and there is a good chance that the sequence no longer functions.

  • Look up the protein's domain information in Pfam and examine the alignments within the functional domain carefully. If otherwise conserved residues are mutated, that should ring alarm bells. (Since all your sequences will be very similar to one another, you may want to add protein sequences from other species so that you can find the active residues within the domain more easily.)
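The first suggestion in the list above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard genetic code and plain A/C/G/T input; the function names (`translate`, `six_frame_stop_counts`) are made up for this example and do not come from any particular toolkit.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate from position 0; unknown codons become 'X', stops become '*'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_stop_counts(seq):
    """Map frame name (+1..+3, -1..-3) to (protein, number of stop codons)."""
    frames = {}
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            protein = translate(s[offset:])
            frames[f"{strand}{offset + 1}"] = (protein, protein.count("*"))
    return frames
```

A complete coding sequence is expected to show exactly one stop, at the very end of its correct frame. Several stops in every frame would be consistent with a degenerate sequence, although this alone does not prove loss of function.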


GOToolBox: functional analysis of gene datasets based on Gene Ontology

We have developed methods and tools based on the Gene Ontology (GO) resource that allow the identification of statistically over- or under-represented terms in a gene dataset, the clustering of functionally related genes within a set, and the retrieval of genes that share annotations with a query gene. GO annotations can also be constrained to a slim hierarchy or to a given level of the ontology. The source code is available upon request and is distributed under the GPL license.
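To make the idea of an over-represented term concrete, here is a minimal sketch of a hypergeometric tail test, one common choice for this kind of enrichment analysis. It is an illustration under that assumption, not GOToolBox's own code, and the function name `enrichment_p` is hypothetical.

```python
from math import comb


def enrichment_p(k, population, annotated, sample):
    """P(X >= k) for a hypergeometric draw.

    population: genes in the reference set
    annotated:  reference genes carrying the GO term
    sample:     genes in the dataset of interest
    k:          dataset genes carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    hits = sum(comb(annotated, i) * comb(population - annotated, sample - i)
               for i in range(max(k, 0), upper + 1))
    return hits / total
```

A small value suggests the term appears in the dataset more often than random sampling from the reference set would explain. In practice the P values would also need correcting for the many terms tested.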


Background

Genetic variation is not limited to single nucleotide polymorphisms or small insertions and deletions but also extends to (large) structural variants. These structural variants include copy number variations (CNVs) and presence/absence variations (PAVs), which can cause substantial variation in gene content between individual genomes [1, 2]. Comparative analysis of multiple genomes from the same phylogenetic clade allows the identification of PAVs that are associated with phenotypic traits. In crop species, it is feasible to identify PAVs underlying specific agronomic traits that occur in only one or a few varieties [3,4,5]. As more highly contiguous genome sequences become available, pangenomes are suitable for describing and investigating the diversity of the gene set of a biological clade, e.g. a species, a genus, or higher [6, 7].

The genes of a pangenome are considered to be divided into a core gene set and a dispensable gene set, the latter often also called 'accessory' in the literature. Core genes occur in all investigated genomes, whereas dispensable genes occur in only one or a few genomes [8]. In eukaryotic pangenome studies, core and dispensable genes are mostly identified based on sequence similarity, e.g. using GET_HOMOLOGUES-EST Markov clustering [9], OrthoMCL gene family clustering [10], or BLASTN [11]. Sometimes a third category of 'conditionally dispensable' genes is invoked [12], or genes are classified as 'cloud', 'shell', 'soft-core', and 'core' [13], or even as 'core', 'softcore', 'dispensable', and 'private' [14]. However, these distinct classifications are not based on the biological dispensability of genes and rely on one or more arbitrary cutoffs. Some studies consider genes 'core' if they occur in at least 90% of the investigated genomes [11]; in other studies, only genes present in all genomes are part of the core genome [10]. In addition, dependency groups can influence the dispensability of certain genes. The possibility that two genes can be 'replaced' by a certain number of other genes has to be considered. Some genes, e.g. those of a gene family, might be required in a certain proportion and are therefore only conditionally dispensable [12]. Furthermore, genome or transcriptome assemblies can be incomplete, leading to artificially missing genes [15]. One way to circumvent this is to rely only on high-quality reference genome sequences, thereby avoiding additional assemblies that are potential sources of error.
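The dependence on an arbitrary cutoff is easy to see in a small sketch. The function below is a hypothetical illustration (not taken from any of the cited tools) that labels genes from presence/absence calls, using either the strict "all genomes" rule or the 90% rule mentioned above.

```python
def classify_pangenome(presence, n_genomes, core_fraction=1.0):
    """Label each gene 'core' or 'dispensable' from presence/absence calls.

    presence maps a gene ID to the set of genomes in which it was found.
    core_fraction=1.0 requires presence in every genome; 0.9 mirrors the
    90% cutoff used by some studies.
    """
    labels = {}
    for gene, genomes in presence.items():
        fraction = len(genomes) / n_genomes
        labels[gene] = "core" if fraction >= core_fraction else "dispensable"
    return labels
```

A gene found in 9 of 10 genomes is 'dispensable' under the strict rule and 'core' under the 90% rule, which is exactly the kind of inconsistency a continuous score avoids.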

Here, we present QUOD, a bioinformatic tool for quantifying gene dispensability. An A. thaliana dataset of about 1000 accessions was used to calculate a per-gene dispensability score derived from the coverage of all genes in the given genomes. This score was validated by comparison with BUSCO results and by functional investigation of genes with high dispensability scores. Our tool is easy to use for all kinds of plant species. QUOD extends the distinct classification of genes as 'core' and 'dispensable', based on an arbitrary threshold, to a continuous dispensability score.


Genetic disorders

Genetic disorders can arise for a number of reasons. They are often described in terms of the chromosome that contains the gene altered in people who have the disorder. If the gene is on one of the first 22 pairs of chromosomes, called the autosomes, the genetic disorder is called an autosomal condition. If the gene is on the X chromosome, the disorder is called X-linked.

Genetic disorders are also grouped by how they run in families. Disorders can be dominant or recessive, depending on how they cause conditions and how they are inherited within a family.

Dominant

Dominant diseases can be caused by just one copy of a gene carrying a DNA mutation. If one parent has the disease, each child has a 50% chance of inheriting the mutated gene.

Recessive

For recessive diseases, both copies of a gene must carry a DNA mutation for a person to have the disease. If both parents carry one copy of the mutated gene, each child has a 25% chance of having the disease, even though neither parent has it. In such cases, each parent is called a carrier of the disease. Carriers can pass the disease on to their children but do not have it themselves.
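The 50% and 25% figures follow from enumerating the four equally likely allele combinations (a Punnett square). The sketch below assumes a single gene with two alleles and that each parent passes on either allele with equal probability; the allele letters and the function name are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Return genotype -> probability for a one-gene cross.

    Each parent is a two-letter string such as "Aa"; a child receives one
    allele from each parent, and all four combinations are equally likely.
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Dominant: an affected parent "Dd" and an unaffected parent "dd".
dominant_cross = offspring_genotypes("Dd", "dd")
# Recessive: two carriers "Aa", where "aa" is affected.
recessive_cross = offspring_genotypes("Aa", "Aa")
```

Here `dominant_cross["Dd"]` is 1/2 and `recessive_cross["aa"]` is 1/4, matching the chances given in the text.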


Results

Detection of aberrant gene expression across multiple transcriptomic phenotypes

We quantified three transcriptional phenotypes for each gene to capture a broad range of functional effects caused by regulatory genetic variants. In brief, to identify expression outliers (eOutliers), we generated Z scores from corrected expression data per tissue to determine whether a gene in an individual has extremely high or low expression (fig. S1) (15, 16). To identify genes with excessive allelic imbalance [allele-specific expression (ASE) outliers (aseOutliers)], we used ANEVA-DOT (analysis of expression variation-dosage outlier test; figs. S2 and S3) (16, 17). This method uses estimates of genetic variation in the dosage of each gene in a population to identify genes for which an individual has a heterozygous variant with an unusually strong effect on gene regulation (17). Splicing outliers (sOutliers) were detected using SPOT (splicing outlier detection), an approach introduced here that fits a Dirichlet-multinomial distribution directly to counts of reads split across alternatively spliced exon-exon junctions for each gene. SPOT then identifies individuals who deviate significantly from the expectation based on this fitted distribution (figs. S4 to S6) (16). Each of the three methods was applied to all GTEx samples. An individual was called a multitissue outlier for a given gene if that individual's median outlier statistic across all measured tissues exceeded a chosen threshold (Fig. 1A) (16). Using this multitissue approach for each phenotype, we found that each individual had a median of four eOutlier, four aseOutlier, and five sOutlier genes.
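The multitissue eOutlier call described above can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes the expression values have already been corrected, standardizes them per tissue with a plain sample mean and standard deviation, and uses hypothetical names throughout.

```python
from statistics import mean, median, stdev


def z_scores(values):
    """Standardize a list of values using the sample mean and SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def multitissue_outliers(expr_by_tissue, threshold=3.0):
    """Call multitissue expression outliers for one gene.

    expr_by_tissue: {tissue: {individual: corrected expression}}
    Returns {individual: median Z across tissues} where |median Z| > threshold.
    """
    per_individual = {}
    for expr in expr_by_tissue.values():
        ids = list(expr)
        for ind, z in zip(ids, z_scores([expr[i] for i in ids])):
            per_individual.setdefault(ind, []).append(z)
    medians = {ind: median(zs) for ind, zs in per_individual.items()}
    return {ind: m for ind, m in medians.items() if abs(m) > threshold}
```

Taking the median across tissues means a single noisy tissue cannot create an outlier call on its own, which is the main design point of the multitissue approach.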

(A) RNA-seq data from 838 individuals were combined across 49 tissues and used to identify shared tissue expression, ASE, and alternative splicing outliers. (B) Relative risk of novel (not in gnomAD), singleton, doubleton, rare [minor allele frequency (MAF) <1%], and low-frequency (MAF 1 to 5%) variants in a 10-kb window around the outlier genes across all data types, compared with non-outlier individuals for the same genes. Outliers were defined as those with values >3 SDs from the mean (|median Z| > 3) or, equivalently, a median P < 0.0027. Bars represent the 95% confidence interval. (C) Relative risk for different categories of rare variants (RVs) falling within 10 kb of each outlier type, after assigning each outlier its most consequential nearby RV. The inset panel shows enrichments for a subset of variant categories on a log(2)-transformed y-axis scale for better visibility. (D) Proportion of outliers at a given threshold that have a nearby RV in a given category. eOutlier |median Z scores| were converted to P values using the cumulative probability density function of the normal distribution. TE, transposable element; INV, inversion; BND, breakend; DEL, deletion; DUP, duplication. (E) Proportion of RVs in a given category that lead to an outlier at a P-value threshold of 0.0027 across all types.
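The equivalence between |median Z| > 3 and P < 0.0027 in the caption is the two-sided tail of a standard normal distribution. A minimal sketch, assuming that two-sided convention (the function name is illustrative):

```python
from math import erfc, sqrt


def two_sided_p(z):
    """Two-sided tail probability of a standard normal at |z|."""
    return erfc(abs(z) / sqrt(2))
```

With this convention, `two_sided_p(3)` is about 0.0027 and `two_sided_p(4)` is about 0.000063.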

Genes with aberrant expression, ASE, and splicing are enriched for functionally distinct RVs

We observed that multitissue outliers for any of the three transcriptomic phenotypes were significantly more likely than non-outlier individuals to carry a rare variant (RV; MAF <1%) in the gene body or within ±10 kb, as assessed among 714 individuals of European ancestry. These enrichments were progressively more pronounced for rarer variants and were stronger for structural variants (SVs) than for single-nucleotide variants (SNVs) and indels (Fig. 1B). These trends did not depend on the specific choice of threshold used to define outliers (figs. S7 and S8).
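Enrichment here is expressed as a relative risk: the fraction of outliers carrying a nearby RV divided by the fraction of non-outliers carrying one. The sketch below uses the textbook log-scale normal approximation for the 95% confidence interval; this excerpt does not state which estimator the authors used, so treat the interval method and the function name as assumptions.

```python
from math import exp, sqrt


def relative_risk(out_with, out_without, non_with, non_without, z=1.96):
    """Relative risk of carrying an RV in outliers versus non-outliers.

    Returns (rr, lower, upper), with an approximate 95% interval computed
    on the log scale. All four counts must be positive.
    """
    n_out = out_with + out_without
    n_non = non_with + non_without
    rr = (out_with / n_out) / (non_with / n_non)
    se = sqrt(1 / out_with - 1 / n_out + 1 / non_with - 1 / n_non)
    return rr, rr * exp(-z * se), rr * exp(z * se)
```

For example, 30 of 100 outliers versus 10 of 100 non-outliers carrying an RV gives a relative risk of 3, and an interval that excludes 1 would indicate enrichment.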

We found only 35 cases in which a gene was a multitissue outlier for all three transcriptional phenotypes in an individual. All but one of these had a nearby RV, and most were annotated as splice variants. Among genes that were outliers for two transcriptional phenotypes in an individual (n = 465), the largest overlap occurred between aseOutliers and eOutliers (n = 319; fig. S9A). We found that aseOutliers with modest expression changes (1 < |median Z| < 3) showed stronger enrichment for nearby RVs than those without any expression change (fig. S9), highlighting an important advantage of combining these phenotypes to detect diverse RV effects. We found that genes for which no outlier individuals were identified were enriched for Gene Ontology biological process terms related to sensory perception and detection of chemical stimulus for all outlier types (fig. S10) (16), consistent with enrichments observed for genes lacking any cis expression quantitative trait loci (eQTLs) detected in GTEx (18).

We found that different types of genetic variants contribute to outliers for the three molecular phenotypes, although rare splice donor variants were enriched near all outlier types (Fig. 1C). The largest differences in variant-type enrichment among the three outlier types were for copy number variations (CNVs) and duplications, which were almost exclusively associated with eOutliers, and for splice acceptor variants, which were far more enriched within sOutliers (fig. S11).

For all phenotypes, the proportion of outliers with a nearby RV of any category increased with threshold stringency (Fig. 1D). For eOutliers, aseOutliers, and sOutliers, at the most stringent median outlier threshold of P < 1.1 × 10^-7, the majority of individuals carried at least one RV near the outlier gene (82 to 94%). When we further considered RVs with functional annotations (from the annotations listed in Fig. 1C), we found that underexpression eOutliers were the most interpretable, with 88% of outlier-associated RVs having an additional functional annotation, whereas aseOutliers had the lowest proportion at 56% (Fig. 1D). This analysis provides further insight into which causal RV types to expect when an outlier effect of a given magnitude is observed in an individual.

Conversely, a large proportion of genes with nearby rare genetic variants did not appear as outliers, even for the most predictive classes such as loss-of-function variants. The variant classes with the largest proportion leading to any outlier status were rare splice donor and splice acceptor variants, of which only 7.2 and 6.8%, respectively, led to an outlier (Fig. 1E and fig. S11). Overall, although some transcriptomic effects may have been missed, the low frequency with which RVs of these classes led to large transcriptome changes reinforces the utility of incorporating functional data into variant interpretation, even for specific variant classes already used in clinical interpretation.

Genomic position of RVs predicts impact on expression

Although we primarily assessed RVs occurring either within an outlier gene or in the surrounding 10-kb window, gene regulation can occur over larger distances (19, 20). Because we observed the strongest enrichments for the lowest-frequency variants, we intersected singleton variants [SVs that occur only once in GTEx, and SNVs and/or indels that do not appear in the Genome Aggregation Database (gnomAD) (21)] with mutually exclusive 200-kb windows upstream and downstream of outlier genes and compared their frequency in outlier versus non-outlier individuals. SNV enrichments dropped off quickly at greater distances from the gene but remained weakly enriched for eOutliers out to 200 kb. The same was true for rare indels, with enrichment at 200 kb only for sOutliers. SVs remained enriched at much greater distances, with 2.33-fold enrichment out to 800 kb to 1 Mb upstream and out to 600 kb downstream of the gene body (Fig. 2A and fig. S12A).

(A) Relative risk of SNVs and indels (not found in gnomAD) and SVs (singleton in GTEx) at varying distances upstream of outlier genes (in exclusive bins) across data types. (B) Proportion of eOutliers with a TSS RV in promoter motifs within 1000 bp. Under and over bins are defined by a median Z-score threshold of 3, and controls are all individuals with a median Z score <3 for the same set of outlier genes. (C) Graphical summary of positional nomenclature relative to observed donor and acceptor splice sites. (D) Relative risk (y axis) of an sOutlier (median LeafCutter cluster P < 1 × 10^-5) RV being located at a specific position relative to a splice site (x axis), compared with non-outlier RVs. The relative risk calculation was performed separately for donor and acceptor splice sites. (E) Independent position weight matrices showing the mutational spectra of sOutlier (median LeafCutter cluster P < 1 × 10^-5) RVs at positions relative to splice sites with negative junction usage (i.e., splice sites used less in outlier individuals than in non-outliers). (F) Junction usage of a splice site is the natural log of the fraction of reads in a LeafCutter cluster mapping to the splice site of interest in sOutlier (median LeafCutter cluster P < 1 × 10^-5) samples relative to the fraction in non-outlier samples, aggregated across tissues by taking the median (16). Junction usage (y axis) of the closest splice sites to RVs that lie within the polypyrimidine tract (A − 5, A − 35), binned by variant type (x axis).

RVs in promoter regions have previously been associated with outlier expression (5, 15). To extend these observations and assess the types of transcription factor (TF) binding sites that could lead to outliers, we tested for enrichment of rare transcription start site (TSS)-proximal variants in specific TF motifs near under- and overexpression eOutliers. For underexpression eOutliers, we saw enrichment of variants in motifs of GABP, a TF that activates genes controlling the cell cycle, differentiation, and other critical functions (22). For overexpression eOutliers, we saw enrichment of RVs intersecting the motif of E2F4, a TF reported to be a transcriptional repressor (23). In both under- and overexpression eOutliers, we saw RVs in motifs of YY1, which can act as an activator or a repressor depending on context (24) and has been linked to GABP in coregulatory networks (Fig. 2B and fig. S12B) (25). Thus, these natural RV perturbations can provide information on how specific TFs may strongly up- or down-regulate their target genes.

RVs can affect multiple genes and lead to novel gene fusions

We observed that RVs can also affect multiple genes in an individual. We found strong enrichment of multigene effects among eOutliers and, to a lesser extent, aseOutliers (fig. S13). As expected, we did not see enrichment for nearby sOutlier pairs, which are less subject to coregulation (26). Within a 100-kb window, neighboring eOutlier genes were 70 times more frequent than expected by chance if outlier pairs were drawn at random. They were also significantly enriched for rare CNVs, duplications, and TSS variants near one or both genes compared with individuals who had outlier expression for only one of the genes (fig. S13). We also found that rare SV enrichments were present near eOutliers regardless of whether the SV overlapped the gene itself (fig. S14). We observed 27 examples of rare SVs, including deletions, duplications, and breakends, associated with eOutliers in at least two genes in the same individual (fig. S15 and table S1). For one of these, we observed evidence of a fusion transcript resulting from a deletion spanning the end of the gene SPTBN1 and the TSS of EML6. This deletion led to underexpression of SPTBN1 (median Z score = −4.67) and overexpression of EML6 (median Z score = 8.12) compared with all other individuals. Confirming the presence of a novel germline fusion transcript, we found evidence of a specific transcript spanning SPTBN1 and EML6 in multiple tissues for the individual with the deletion (fig. S16). For both of these genes, this individual also showed a weaker sOutlier signal (median SPOT P = 0.0005 for EML6 and 0.0035 for SPTBN1). The identification of fusion transcripts has been of particular interest in cancer diagnosis and prognosis (27–30), and both EML genes and SPTBN1 have previously been implicated in cancer-associated fusions (31, 32).

RVs in the splicing consensus sequence drive splicing outliers

Previous studies have shown that RVs that disrupt splice sites result in outlier alternative splicing patterns (33, 34). We used sOutlier calls for each LeafCutter cluster (16, 35) to assess the enrichment of splicing-associated variants more precisely. We observed extreme enrichment of RVs near splice sites in sOutliers. sOutliers were 333 times more likely than non-outliers to harbor an RV within a 2-bp window around a splice site (fig. S17A) (16), with the signal decaying at greater distances but remaining enriched out to 100 bp (relative risk = 7.43). To obtain enrichments at base-pair resolution, we computed the relative risk of sOutlier RVs being located at specific positions relative to observed donor and acceptor splice sites (16). Ten positions near the splice site showed significant enrichment of RVs in sOutliers compared with controls (Fig. 2, C and D). These positions corresponded exactly to positions that have also been shown to be intolerant to mutation because of their conserved role in splicing (we will refer to these positions as the splicing consensus sequence) (34). Among the most enriched positions within the splicing consensus sequence were the four essential splice site positions (D + 1, D + 2, A − 2, A − 1) (36), which showed an average relative risk of 195.

sOutliers further captured transcriptional consequences both for variants that disrupted the reference splicing consensus sequence and for those that created a new splicing consensus sequence. Outlier individuals with variants in which the rare allele deviated from the splicing consensus sequence showed decreased junction usage of the splice site near the variant, whereas individuals with variants in which the rare allele created a splicing consensus sequence showed increased junction usage of the splice site near the variant relative to non-outliers (Fig. 2E and figs. S17B and S18) (16). We saw a related enrichment pattern after separating annotated and novel (unannotated) splice sites (fig. S19). sOutliers were also enriched for RVs positioned within the polypyrimidine tract (PPT), a highly conserved pyrimidine-rich region 5 to 35 bp upstream of acceptor splice sites (37). An RV was 6.25 times more likely to be located in the PPT near an sOutlier than near a non-outlier. Outliers with RVs that changed a position in the PPT from a pyrimidine to a purine (i.e., disrupted the existing PPT) showed decreased junction usage of the splice site near the variant, whereas the opposite was true for variants that changed a position in the PPT from a purine to a pyrimidine (Fig. 2F and fig. S20).
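The two quantities in this paragraph, junction usage and the direction of a PPT substitution, can be sketched as below. The junction-usage definition follows the figure legend (natural log of the read fraction in outlier samples relative to non-outlier samples); taking the median of per-tissue log ratios is one reading of "aggregated across tissues" and is an assumption here, as are the function names.

```python
from math import log
from statistics import median

PYRIMIDINES = frozenset("CT")
PURINES = frozenset("AG")


def junction_usage(outlier_fractions, non_outlier_fractions):
    """Median over tissues of ln(outlier fraction / non-outlier fraction).

    Each argument lists, per tissue, the fraction of a cluster's reads that
    map to the splice site of interest. Fractions must be positive.
    """
    ratios = [log(o / n) for o, n in zip(outlier_fractions, non_outlier_fractions)]
    return median(ratios)


def ppt_variant_class(ref, alt):
    """Classify a single-base change within the polypyrimidine tract."""
    ref, alt = ref.upper(), alt.upper()
    if ref in PYRIMIDINES and alt in PURINES:
        return "pyrimidine_to_purine"   # disrupts the existing tract
    if ref in PURINES and alt in PYRIMIDINES:
        return "purine_to_pyrimidine"   # makes the tract more pyrimidine-rich
    return "other"
```

A negative junction usage means the splice site is used less in the outlier individual, which is the pattern reported for pyrimidine-to-purine changes.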

RVs in tissue-specific regulatory regions can lead to tissue-specific expression

Although multitissue outliers offer improved power to detect RV effects, we also assessed RVs from outliers detected in individual tissues. Single-tissue measurements are subject to greater variation than repeated measurements across tissues but are representative of most experimental designs. We first performed a replication analysis in all individuals with data available for the three methods to assess the degree to which outlier status detected in one tissue of an individual replicated in other tissues (16). On average, we found that eOutlier, aseOutlier, and sOutlier status in the discovery tissue was detected in the test tissue 5.1, 10.7, and 8.7% of the time, respectively (Fig. 3A and fig. S21). This is consistent with other findings that ASE measurements are more consistent across tissues (18). Considering clinically accessible tissues, namely whole blood, fibroblasts, and lymphoblastoid cells, when we required outliers to be observed for a gene in at least two of these tissues in the same individual, we saw average replication rates across all other tissues of 14.1, 20.9, and 15.0% for eOutliers, aseOutliers, and sOutliers, respectively (fig. S22). Both the higher replication rate for aseOutliers and the increase in outlier replication in inaccessible tissues when more than one accessible measurement is considered are informative for the analysis of functional data from readily accessible tissues to understand disease states that are most relevant to other tissues.

(A) Mean replication of outliers identified per tissue in every other tissue for each outlier type. (B) Relative risk point estimates for nearby rare SNVs for outliers in all tissues individually. (C) Relative risk enrichment for likely gene-disrupting RVs near single-tissue outliers at a threshold of |Z| > 4 (equivalent to SPOT or ANEVA-DOT P < 0.000063), with one point per tissue. (D) Distribution of the number of tissues with aberrant expression based on expression outliers defined by median Z score (eOutliers) or Mahalanobis distance P value (correlation). (E) Relative risk of single-tissue-driven correlation outliers, defined as significant correlation outliers for which an expression change of the degree indicated by the point color is observed in only one tissue (16), carrying an RV in enhancers annotated for that tissue within a 500-kb window of the outlier gene. Unmatched enhancers are defined as all tissue-specific enhancer regions regardless of the outlier tissue.

Next, we assessed the ability of single-tissue outliers from each method to prioritize RVs near outlier genes. Single-tissue aseOutliers were the most enriched for nearby RVs, followed by sOutliers and then eOutliers, across all outlier thresholds (Fig. 3B and figs. S21 and S23A). We also observed enrichment of variants likely to trigger nonsense-mediated decay among single-tissue eOutliers, aseOutliers, and sOutliers (Fig. 3C and fig. S23B). In addition, we found that single-tissue sOutliers still showed strong enrichment for RVs in the splicing consensus sequence and PPT (fig. S24).

Apart from rare SVs, which were enriched comparably to multitissue eOutliers, single-tissue eOutliers showed far weaker enrichments than multitissue outliers for nearby rare SNVs and indels at all thresholds (fig. S25). To improve the detection of tissue-specific outliers, we leveraged the breadth of available tissue data and used observed patterns of correlation between tissues to detect outliers that deviate from the expected expression covariance in a subset of tissues (16). A similar approach has been implemented to identify functional RVs based on expression correlation among genes in a single tissue (5). We found that outliers identified by this approach were often driven by expression changes in one or a few tissues, in contrast to multitissue eOutliers based on median Z scores (Fig. 3D). Correlation-based tissue-specific outliers were also enriched for nearby RVs in a 10-kb window around the gene (fig. S26C). However, these outliers were additionally enriched for RVs in enhancers that were active in the tissue(s) driving the outlier effect (table S2), as determined by single-tissue Z score, within a 500-kb window around the gene (Fig. 3E). Notably, these tissue-specific outliers were depleted for rare variants in enhancers annotated in other, unmatched tissues.

Prioritizing RVs by integrating genomic annotations with diverse personal transcriptomic signals

To incorporate diverse transcriptome signals into a method for prioritizing RVs, we developed Watershed, an unsupervised probabilistic graphical model that integrates information from genomic annotations of a personal genome (table S3) with multiple signals from a matched personal transcriptome. Watershed provides scores that can be used for personal genome interpretation or for cataloging potentially impactful rare alleles, quantifying the posterior probability that a variant has a functional effect on each transcriptomic phenotype based on whole-genome sequencing (WGS) and RNA-sequencing (RNA-seq) signals (Fig. 4A). The Watershed model can be adapted to any available collection of molecular phenotypes, including different assays, different tissues, or different derived signals. Furthermore, Watershed automatically learns Markov random field (MRF) edge weights that reflect the strength of the relationship between the different tissues or phenotypes included, which together allow the model to accurately predict functional effects.

(A) Plate notation graphical summary of the Watershed model when applied to the three median outlier signals (expression, ASE, and splicing). (B) Symmetric heatmap showing the learned edge parameters (weights) between pairs of outlier signals after training Watershed on the three median outlier signals. (C) Proportion of RVs with Watershed posterior probability >0.9 (right) and with GAM probability greater than a threshold set to match the number of Watershed variants for each outlier signal (left) that lead to an outlier at a median P-value threshold of 0.0027 across the three outlier signals (colors). Watershed and GAM models were evaluated on held-out pairs of individuals. (D) Precision-recall curves comparing the performance of Watershed, RIVER, and GAM (colors) using held-out pairs of individuals for the three median outlier signals. (E) Symmetric heatmap showing the learned tissue-Watershed edge parameters (weights) between pairs of tissue outlier signals after training tissue-Watershed on eOutliers across single tissues. The mapping from tissue colors to tissue names can be found in fig. S21D. (F) Area under the precision-recall curve [AUC(PR); y axis] in a single tissue for tissue-GAM, tissue-RIVER, and tissue-Watershed (x axis) when applied to outliers in single tissues for all three outlier signals (colors). Precision-recall curves in each tissue were generated using held-out pairs of individuals.

We first applied Watershed to GTEx v8 data using the three outlier signals examined here, expression, ASE, and splicing (Fig. 4A) (16), each of which was first aggregated by taking the median across tissues for the corresponding individual. Consistent with existing evidence of similarity between outlier signals (fig. S9), the learned Watershed edge parameters were strongest between ASE and expression, followed by ASE and splicing, but were strictly positive for all pairs of outlier signals (i.e., each outlier signal was informative about all other signals; Fig. 4B). To evaluate our model, we used pairs of individuals who shared the same RV, making Watershed predictions for the first individual and evaluating those predictions using the outlier status of the second individual as the label (15, 16). Watershed outperformed methods based on genome sequence alone [our genomic annotation model (GAM) and combined annotation-dependent depletion (CADD); Fig. 4C and fig. S27] (38, 39). We also compared the performance of Watershed with RIVER [RNA-informed variant effect on regulation (15)], a simplification of the Watershed model in which each outlier signal is treated independently. We found that explicitly modeling the relationship between different molecular phenotypes yielded a performance gain for Watershed (Fig. 4D, figs. S28 and S29, and table S4) (16). We observed that even the most predictive genomic annotations resulted in eOutliers, aseOutliers, and sOutliers only 2.8, 7.9, and 14.3% of the time, respectively (Figs. 1E and 4C). However, by integrating transcriptomic signals with genomic annotations, Watershed (at a posterior threshold of 0.9) identified SNVs that resulted in eOutliers, aseOutliers, and sOutliers at a higher frequency: 11.1, 33.3, and 71.4% of the time, respectively (Fig. 4C and fig. S30).
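The held-out-pair evaluation described above can be illustrated with a small precision and recall calculation. This sketch is not the Watershed code: it simply assumes that, for each pair of individuals sharing an RV, a score was predicted for the first individual and the second individual's outlier status serves as the label. All names are hypothetical.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall of 'score >= threshold' against boolean labels.

    scores: posterior-like scores predicted for the first individual of each pair
    labels: outlier status (True/False) of the second individual of each pair
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(predicted, labels))
    fp = sum(p and not l for p, l in zip(predicted, labels))
    fn = sum((not p) and l for p, l in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Sweeping the threshold from high to low traces out a precision-recall curve of the kind compared in Fig. 4D. Because the label comes from a different individual than the prediction, a model cannot score well merely by memorizing one person's transcriptome.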

We further extended the Watershed framework to prioritize variants based on their predicted tissue-specific impact. We trained three "tissue-Watershed" models (one each for expression, ASE, and splicing), in which each model considers effects in all tissues jointly, sharing information through the MRF, and ultimately outputs 49 tissue-specific scores for each RV (figs. S29 and S31) (16). We observed that the parameters learned for each of the three tissue-Watershed models resembled known patterns of tissue similarity (Fig. 4E and fig. S32) (18). Furthermore, using held-out individuals, the tissue-Watershed model outperformed a RIVER model in which each tissue is treated completely independently (P = 2.00 × 10^-5, P = 2.00 × 10^-5, and P = 5.90 × 10^-3 for expression, ASE, and splicing, respectively; one-sided binomial test; Fig. 4F and figs. S33 and S34) and a collapsed RIVER model trained with single median outlier statistics (P = 0.0577, P = 0.251, and P = 0.00128 for expression, ASE, and splicing, respectively; one-sided binomial test; figs. S35 and S36). Critically, integrative models that incorporated transcriptomic signal and genomic annotations from a single tissue still outperformed methods based only on genome sequence annotations (Fig. 4F), supporting the benefit of collecting even a single RNA-seq sample to improve personal genome interpretation.

Replication and experimental validation of predicted RV transcriptome effects

We first assessed the replication of “candidate causal RVs” previously identified by the SardiNIA Project (6), using GTEx Watershed prioritization. Of five SardiNIA candidate causal RVs that were also present in a GTEx individual, four had high (>0.7) GTEx Watershed expression posterior probabilities (table S5). Next, we tested replication of GTEx RVs, prioritized by Watershed, in an independent cohort evaluating 97 whole-genome and matched transcriptome samples from the Amish Study of Major Affective Disorders (ASMAD) (40). We evaluated GTEx RVs also present in this cohort at any frequency, quantifying eOutlier, aseOutlier, and sOutlier signal in each ASMAD individual harboring one of the GTEx variants (16). For all three phenotypes, ASMAD individuals with variants having high (>0.8) Watershed posterior probability based on GTEx data had significantly more extreme outlier signals at nearby genes compared with individuals with variants having low (<0.01) GTEx Watershed posterior probability (expression: P = 2.729 × 10⁻⁶, ASE: P = 2.86 × 10⁻³, and splicing: P = 5.86 × 10⁻¹³; Wilcoxon rank-sum test; fig. S37). Every variant with a high GTEx Watershed splicing posterior probability (>0.8) resulted in an sOutlier (P ≤ 0.01) in the ASMAD cohort. Furthermore, ASMAD individuals with variants having high (>0.8) GTEx Watershed posterior probability had significantly larger outlier signals relative to equal-size sets of variants prioritized by GAM (expression: P = 0.00129, ASE: P = 0.0287, and splicing: P = 0.00058; Wilcoxon rank-sum test; fig. S37). Overall, RVs prioritized by Watershed using GTEx data displayed evidence of functional effects in ASMAD individuals.

We further applied both a massively parallel reporter assay (MPRA) and a CRISPR-Cas9 assay to assess the impact of Watershed-prioritized RVs. We experimentally tested the regulatory effects of 52 variants with moderate Watershed expression posterior (≥0.5) and 98 variants with low Watershed expression posterior (<0.5) using MPRA (16). We observed increased effect sizes for RVs with high Watershed expression posterior relative to variants with low expression posterior (P = 0.025; one-sided Wilcoxon rank-sum test; fig. S38 and table S6). Next, we assessed the functional effects of 20 variants by editing them into inducible-Cas9 293T cell lines. These included 14 rare stop-gained variants and six non-eQTL common variants as negative controls. Of the 14 rare stop-gained variants, 13 had expression or ASE Watershed posterior >0.8, with the remaining variant [previously tested in (41)] having a posterior of 0.22. All control variants had Watershed posteriors <0.03. Of the 13 variants with a Watershed posterior >0.8, 12 showed a significant decrease in expression of the rare allele (P < 0.05, Bonferroni corrected; fig. S39 and table S7) (16).

Aberrant expression informs RV trait associations

We found that each individual had a median of three eOutliers, aseOutliers, and sOutliers (median outlier P < 0.0027) with a nearby RV. When filtering by moderate Watershed posterior probability (>0.5) of affecting expression, ASE, or splicing, individuals had a median of 17 genes with RVs predicted to affect expression, 27 predicted to affect ASE, and nine predicted to affect splicing (Fig. 5A). From the set of outlier calls, we found multiple instances of RVs influencing well-known and well-studied genes, including APOE and FAAH (table S8). In particular, for APOE, which has been associated with numerous neurological diseases and psychiatric disorders (42), we found two aseOutlier individuals both carrying a rare, missense variant, rs563571689, with ASE Watershed posteriors >0.95, not previously reported. For FAAH, which has been linked to pain sensitivity in numerous contexts (43, 44), we found two eOutlier individuals with a rare 5′ untranslated region variant, rs200388505, with ASE and expression Watershed posteriors >0.9.

(A) Distribution of the number of outlier genes, outlier genes with a nearby RV, and genes with a high Watershed posterior variant per data type. We added one to all values so that individuals with 0 are included. (B) Distribution of effect sizes, transformed to a percentile, for the set of GTEx RVs that appear in UKBB and are not outlier variants, those that are outlier variants, and those outlier variants that fall in colocalizing genes for the matched trait across 34 traits. Percentiles were calculated on the set of rare GTEx variants that overlap UKBB. The set of genes was restricted to those with at least one outlier individual in any data type and a nearby variant included in the test set (4787 variants and 1323 genes). P values were calculated from a one-sided Wilcoxon rank-sum test. (C) Proportion of variants filtered by Watershed posterior that fell in the top 25% of effect sizes for a colocalized trait (red) and the proportion of randomly selected variants of an equal number that also fall in these regions over 1000 iterations (black). (D) Manhattan plot (top) across chromosome 9 for asthma in the UKBB, filtered for non–low-confidence variants, with two high-Watershed variants, rs149045797 and rs146597587, shown in pink and the lead colocalized variant, rs3939286, shown in blue. The variants’ effect size ranks were similarly high for both self-reported and diagnosed asthma, but the summary statistics are shown for asthma diagnosis here. The UKBB MAF versus absolute value of the effect size for all variants within 10 kb of the Watershed variant is also shown (bottom). (E) Manhattan plot across chromosome 22 for self-reported high cholesterol in the UKBB, filtered to remove low confidence variants, with the high-Watershed variant rs564796245 shown in pink. The UKBB MAF versus absolute value of the effect size for all variants within 10 kb of the Watershed variant is also shown (bottom).

To assess whether identified rare functional variants from GTEx associate with traits, we intersected this set with variants present in the UKBB (12). We focused on a subset of 34 traits for which GWAS association for a UKBB trait had evidence of colocalization with eQTLs and/or alternative splicing QTLs (sQTLs) in any tissue (table S9) (16, 45). GTEx has demonstrated that genes with RV associations for a trait are strongly enriched for their eQTLs colocalizing with GWAS signals for the same trait (18), indicating that QTL evidence can be used to guide RV analysis. Furthermore, RVs near GTEx outliers had larger trait association effect sizes than background RVs near the same set of genes in the UKBB data (P = 3.51 × 10⁻⁹; one-sided Wilcoxon rank-sum test), with a shift in median effect size percentile from 46 to 53%. Notably, outlier variants that fell in or near genes with an eQTL or sQTL colocalization had even larger effect sizes (median effect size percentile 88%) than nonoutlier variants (P = 1.93 × 10⁻⁵; one-sided Wilcoxon rank-sum test) or outlier variants falling near any gene not matched to a colocalizing trait (P = 4.88 × 10⁻⁵; one-sided Wilcoxon rank-sum test; Fig. 5B).
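The two operations used in this comparison, turning effect sizes into percentiles and comparing two groups with a one-sided Wilcoxon rank-sum test, can be sketched as follows. This is a simplified standard-library illustration, not the analysis code behind the reported numbers: the function names are hypothetical, the test uses a normal approximation without tie or continuity correction, and the effect sizes are invented.

```python
from math import erfc, sqrt

def percentile_rank(value, background):
    """Percentage of background values that are less than or equal to `value`."""
    return 100.0 * sum(b <= value for b in background) / len(background)

def rank_sum_one_sided_p(x, y):
    """Approximate one-sided rank-sum P value for 'values in x tend to exceed y'.

    x and y are lists of numbers. Ties receive midranks; the normal
    approximation is rough for small samples.
    """
    pooled = sorted(x + y)
    midrank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Positions i..j-1 hold the same value and occupy ranks i+1..j.
        midrank[pooled[i]] = (i + 1 + j) / 2
        i = j
    n1, n2 = len(x), len(y)
    u = sum(midrank[v] for v in x) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return 0.5 * erfc(z / sqrt(2))  # upper-tail normal probability

# Invented absolute effect sizes for two groups of variants.
outlier_effects = [0.90, 0.80, 0.70, 0.95]
background_effects = [0.10, 0.20, 0.30, 0.40, 0.50]
p_value = rank_sum_one_sided_p(outlier_effects, background_effects)
```

In practice a library routine with exact small-sample handling and tie correction would be preferable; the hand-written version is kept only so the example has no dependencies.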

Although most variants tested in UKBB had low Watershed posterior probabilities of affecting the transcriptome (fig. S40A), we hypothesized that filtering for those variants that do have high posteriors would yield variants in the upper end of the effect size distribution for a given trait. For each variant tested in UKBB, we took the maximum Watershed posterior per variant and compared this with a genomic annotation-defined metric, CADD (38, 39). We found that Watershed posteriors were a better predictor of variant effect size than CADD scores for the same set of RVs in a linear model (Table 1). Across different Watershed posterior thresholds, we found that the proportion of variants falling in the top 25% of RV effect sizes in colocalized regions exceeded the proportion expected by chance (Fig. 5C). Whereas filtering by CADD score did return some high effect size variants, this proportion declined at the highest thresholds (fig. S40D). Furthermore, there was very little overlap between variants with high Watershed posteriors and high CADD variants (fig. S40D), with CADD variants more likely to occur in coding regions and Watershed variants more frequent in noncoding regions (fig. S40D). Thus, the approaches largely identified distinct and complementary sets of variants for these traits.

Shown are the coefficient estimates and 95% confidence intervals from separate linear models with variant effect size percentile as the response and CADD score or Watershed posterior (scaled to have a mean of 0 and an SD of 1 so that values are of comparable range) as the predictor for all tested variants in colocalized regions (n = 5277).
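The kind of model this caption describes, a single predictor scaled to mean 0 and SD 1 with effect size percentile as the response, can be sketched as below. This is an assumption-laden illustration rather than the authors' code: the function names are hypothetical, the population SD is used for scaling, the confidence interval relies on a normal approximation (reasonable only for large n), and the data are invented.

```python
from math import sqrt
from statistics import mean, pstdev

def standardize(values):
    """Scale values to mean 0 and SD 1 (population SD, an assumption here)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def slope_with_ci(x, y, z_crit=1.96):
    """Least-squares slope of y on x with an approximate 95% confidence interval."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope, (slope - z_crit * se, slope + z_crit * se)

# Invented example: posterior-like scores and effect size percentiles.
scores = [0.05, 0.10, 0.20, 0.55, 0.80, 0.95]
percentiles = [30.0, 42.0, 38.0, 61.0, 70.0, 88.0]
coef, (ci_low, ci_high) = slope_with_ci(standardize(scores), percentiles)
```

Because both predictors would be put on the same scale before fitting, their coefficients can be read as the change in effect size percentile per one SD of the predictor, which is what makes the comparison between scores meaningful.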

We identified 33 rare GTEx variant trait combinations in which the variant had a Watershed posterior >0.5 and fell in the top 25% of variants by effect size for the given trait (table S10). We highlight two such examples, for asthma and high cholesterol (Fig. 5, D and E), showing that although RVs usually do not have the frequency to obtain genome-wide significant P values, when they are prioritized by the probability of affecting expression, we could identify those with greater estimated effect sizes on the trait (table S11). In the case of asthma, the RV effect sizes in UKBB were three times greater than the lead colocalized variant. These variants included rs146597587, which is a high-confidence loss-of-function splice acceptor with an overall gnomAD AF of 0.0019, and rs149045797, an intronic variant with a frequency of 0.0019, both of which were associated with the gene IL33, the expression of which has been implicated in asthma (46, 47). Previous work has identified the protective association between rs146597587 and asthma (48, 49), and we found that this is potentially mediated by outlier allelic expression of IL33 leading to moderate decreases in total expression, with median Z scores ranging from –1.08 to –1.77 in individuals with the variant, and median single-tissue Z scores across the six individuals exceeding –2 in 10 tissues. An asthma association had also been reported recently for the other high Watershed asthma-associated variant rs149045797 and was in perfect linkage disequilibrium with rs146597587 (50). An additional high Watershed variant, rs564796245, an intronic variant in TTC38 with a gnomAD AF of 0.0003, had a high effect size for self-reported high cholesterol in the UKBB but was not previously reported. We were able to test this variant against four related blood lipids traits in the MVP (51). 
We found that for these traits, which included high-density lipoprotein (HDL), low-density lipoprotein, total cholesterol, and triglycerides, among rare (gnomAD AF <0.1%) variants within a 250-kb window of rs564796245, this variant was in the top 5% of variants by effect size; for HDL specifically, it was in the top 1% (fig. S41). We also assessed this variant’s association with the same four traits in the JHS (14), an African American cohort in which four individuals carried the RV. Here, we found that the direction of effect was consistent with MVP and UKBB for all four traits (tables S11 and S12), and the variant fell in the top 28th to 38th percentile of all rare (gnomAD AF <0.1%) variants in this region (fig. S42). Only four of the variants tested in UKBB had Watershed posterior probabilities >0.9 for colocalized genes, but of those, three showed high effect sizes for a relevant trait (table S10).


What do the new ‘gay genes’ tell us about sexual orientation?

Didn’t we already know there were “gay genes”?
We have known for decades that sexual orientation is partly heritable in men, thanks to studies of families in which some people are straight and some people are gay. In 1993, genetic variations in a region on the X chromosome in men were linked to whether they were heterosexual or homosexual, and in 1995, a region on chromosome 8 was identified. Both findings were confirmed in a study of gay and straight brothers in 2014. However, these studies didn’t home in on any specific genes on this chromosome.

What’s new about the latest study?
For the first time, individual genes have been identified that may influence how sexual orientation develops in boys and men, both in the womb and during life. Alan Sanders at North Shore University, Illinois, and his team pinpointed these genes by comparing DNA from 1077 gay and 1231 straight men. They scanned the men’s entire genomes, looking for single-letter differences in their DNA sequences. This enabled them to home in on two genes whose variants seem to be linked to sexual orientation.

What genes did they find and what do they do?
One of the genes, which sits on chromosome 13, is active in a part of the brain called the diencephalon. Interestingly, this brain region contains the hypothalamus, which was identified in 1991 as differing in size between gay and straight men. This was discovered by neuroscientist Simon LeVay, who says he is excited that the gene discovery seems to fit with what he found.


Other research has found that this gene, called SLITRK6, is active in the hypothalamus of male mouse fetuses a few days before they are born. “This is thought to be a crucial time for sexual differentiation in this part of the brain,” says LeVay. “So this particular finding is a potential link between the neuroanatomy and molecular genetics of sexual orientation.”

What is the other gene?
This gene is found on chromosome 14 and is mainly active in the thyroid, but also the brain. Called TSHR, it makes a type of receptor protein that recognises and binds to a hormone that stimulates the thyroid. In this way, the gene plays an important role in controlling thyroid function.

The fact that TSHR seems to be involved in sexual orientation fits with evidence that thyroid function seems to be linked to sexuality. For example, TSHR function is disrupted in a genetic condition called Graves’ disease, which causes the thyroid gland to become over-active, accelerating metabolism and leading to weight loss. Graves’ disease is more common in gay than straight men, and some research suggests that gay men tend to be thinner, which might possibly be a result of thyroid overdrive.

Are all men who have the “gay” variants of these genes gay?
No, says Sanders, because many other factors play a role, including the environment. “There are probably multiple genes involved, each with a fairly low effect,” he says. “There will be men who have the form of gene that increases the chance of being gay, but they won’t be gay.”

Because many genes and other factors seem likely to play a role in sexual orientation, this may explain why some people are bisexual or see sexual orientation as a spectrum.

What about women who are gay? Are there “lesbian genes”?
Our biological understanding of homosexuality in women lags behind. Some researchers say this is partly because women who have sex with women tend to be more fluid in their sexual orientation.

There have been studies suggesting that there is a genetic element to homosexuality in women, but more research has been done in men, says Sanders.

Why should we care about the genetics of being gay?
The latest findings open the prospect to identifying the whole pathway of genes involved in both homosexual and heterosexual orientation, says Dean Hamer at the US National Institutes of Health, who led the study that pinpointed chromosome X back in 1993. “It adds yet more evidence that sexual orientation is not a ‘lifestyle choice’. But the real significance is that it takes us one step closer to understanding the origins of one of the most fascinating and important features of human beings.”

Journal reference: Nature Scientific Reports, DOI: 10.1038/s41598-017-15736-4




The Gene for Big Brains

Scientists, led by Max Planck Institute’s Wieland Huttner, have identified a gene that triggers a human embryo to grow the vast supply of brain cells that largely forms the foundation for our braininess.1 The same gene is found in modern humans, Neanderthals, and Denisovans. Called ARHGAP11B, Huttner says this is “the first human-specific gene where we could show that it contributes to the pool of basal brain stem cells and can trigger a folding of the neocortex. In that way, we managed to take the next step in tracing evolution.”2

Searching for the link between this human gene and the genes of our supposed ape cousins, Marta Florio and colleagues on a team led by Huttner report in Science Advances that the nucleotide sequence in human-specific ARHGAP11B differs from a similar gene in apes by just one nucleotide.3 A nucleotide is the equivalent of a letter in the genetic language. That difference in spelling might well be the genetic basis for one of the greatest physical differences between apes and humans.

Florio’s team genetically engineered a form of ARHGAP11B with a spelling error. They believe this misspelled human gene is the ancestral form of ARHGAP11B because it is spelled like a similar gene in the chimpanzee, which they firmly believe to be the human’s cousin. When tested on mouse embryos,4 this “ancestral” gene was unable to trigger proliferation of basal progenitor cells. (Basal progenitor cells are the cells that differentiate into neurons as embryonic development continues.) This simple spelling error nips any big-brained potential in the bud. Therefore, Florio’s team concludes that the ability of the human ARHGAP11B gene to stimulate stem cell production in a human embryo’s brain evolved “from a change that is tiny on a genomic scale but substantial in its functional and evolutionary consequences.”5


Keep this in mind

Gene sequencing is already contributing to the development of better, more targeted, and potentially safer medicines. Its use to inform treatment decisions, reduce the use of less effective treatments, and possibly reduce the risk of relapse or provide functional cures is revolutionary.

In the future, we may see more blurring of the lines separating gene-sequencing system manufacturers like Illumina, drug developers like Novartis, and genetic services companies like Guardant. We're already seeing collaborations that cut across these individual market segments, such as Grail, a company spun out of Illumina that's using gene sequencing to develop next-generation cancer tests that could catch disease at its earliest stage. Since these companies may wind up competing more aggressively with one another in the future, investors will want to keep close tabs on this market.


Considering interactions between genes, environments, biology, and social context

Kristen Jacobson received her Ph.D. in Human Development and Family Studies from the Pennsylvania State University in 1999. She spent a year as a postdoctoral scholar in psychiatric genetics under the direction of Dr. Kenneth Kendler at the Virginia Institute for Psychiatric and Behavioral Genetics, where she later served as faculty from 2000 to 2005. Dr. Jacobson is currently an Assistant Professor of Psychiatry at the University of Chicago, and serves as the Associate Director for Twin Projects and the Associate Director of the Clinical Neuroscience and Psychopharmacology Research Unit. Dr. Jacobson is a collaborator on a number of twin studies of children, adolescents, and adults, and is currently conducting a multidisciplinary, multi-level study of adolescent development, From Neighborhoods to Neurons and Beyond, funded by an NIH New Innovator Award. She is editor of a special issue of Behavior Genetics entitled Pathways between Genes, Brain, and Behavior (expected publication January, 2010). New areas of research involve pilot studies of epigenetics in both mice and humans.

Bronfenbrenner’s bioecological model (Bronfenbrenner & Ceci, 1994) highlights the need to consider interactions between individual, family, peer, school, and community characteristics in understanding individual differences in human development. In order to obtain a complete understanding of the processes involved in individual differences, multidisciplinary studies that measure risk and protective factors at multiple levels of analysis are required. With recent advances in human molecular genetics, the need to integrate environmental measures into genomic studies is of even greater importance. While the mapping of the human genome and the corresponding availability of genome-wide association analysis (GWAS) techniques has led to a flurry of research activity trying to discover “genes for” particular disorders and traits, a significant body of research, both historic as well as quite recent, cautions that efforts to uncover specific genetic variants that ignore the effects of social and contextual environments in genetic studies of individual differences in human behavior and traits may be futile. This essay briefly reviews some of the most interesting work regarding the interplay of genes and environments on individual differences in human development.

Nature versus Nurture

For years, behavioral genetic studies using twin or adoptive samples have been considered the gold standard for assessing the joint effects of nature and nurture in accounting for individual differences in human behaviors and traits. Decades of behavioral genetic research have demonstrated the importance of genetically-influenced characteristics on individual differences in child, adolescent, and adult behaviors and traits. At the same time, behavioral genetic studies have revealed that generally over half of the variation in individual behaviors and traits is due to environmental factors, typically environmental factors that are unique across people within the same family or that have different effects on behavior (i.e., nonshared environmental influence).
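To make the twin-study logic concrete, the sketch below uses Falconer's formulas, a classic textbook approximation that splits trait variance into genetic, shared-environment, and nonshared-environment shares from the correlations of identical (MZ) and fraternal (DZ) twin pairs. This is an illustration only: published twin studies generally fit more elaborate models, and the function name and the correlations used here are invented.

```python
def falconer_estimates(r_mz, r_dz):
    """Approximate variance shares from twin correlations (Falconer's formulas)."""
    heritability = 2 * (r_mz - r_dz)  # additive genetic share
    shared_env = 2 * r_dz - r_mz      # environment shared within a family
    nonshared_env = 1 - r_mz          # unique environment plus measurement error
    return {
        "heritability": heritability,
        "shared_env": shared_env,
        "nonshared_env": nonshared_env,
    }

# Invented correlations for illustration.
estimates = falconer_estimates(r_mz=0.60, r_dz=0.40)
environmental_share = estimates["shared_env"] + estimates["nonshared_env"]
```

With these made-up inputs the two environmental shares together come to roughly 0.6, which mirrors the pattern described above in which more than half of the variation is environmental.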

Genetic influence has been found on “environmental” measures, suggesting the presence of gene-environment correlations. Gene-environment correlations arise because exposure to certain risk and protective environments is not random, but rather is influenced by inherited characteristics of the individual, and also because children “inherit” both genes and environments from their parents. The role of genes and environments in mediating pathways between risk and behavior is complex, however. For example, recent quasi-longitudinal work using twins to understand the relationship between peer group deviance and adolescent problem behavior found that while genetic factors accounted for most of the relationship between earlier problem behavior and later peer group deviance (consistent with genetic characteristics of an individual relating to peer selection), the relationship between prior peer group deviance and later problem behavior was largely environmentally mediated (consistent with peer influence effects) (Kendler, Jacobson, Myers, & Eaves, 2008).

Nature and Nurture

While the nature versus nurture debate may have attenuated in recent years with consensus from many fields regarding the importance of both genes and environments, other areas of research have further identified interactions between nature and nurture as important components of individual differences. A host of adoption studies in the 1980s and 1990s have shown that genetic liability to antisocial behavior (as indexed through biological parent psychopathology and substance abuse) is only associated with the development of adult criminality and aggression under adverse adoptive environmental conditions, indicating that neither nature nor nurture was sufficient in and of itself to cause pathology (Cadoret, Yates, Troughton, Woodworth, & Stewart, 1995; Cloninger & Gottesman, 1987).

Alternatively, gene X environment (gXe) interactions may be implicated when the relative importance of genetic influence on behaviors and traits as measured through standard twin designs varies across social and ecological context. For example, a study by Rowe, Almeida, and Jacobson (1999) integrated genetically-informative regression models within a hierarchical linear modeling design to show that levels of parental warmth, measured at the aggregate school level, moderated the heritability (i.e., proportion of individual differences due to genetic factors) of adolescent aggression. Heritabilities of delinquent behavior are increased among adolescents living in families with high rates of dysfunction (Button, Scourfield, Martin, Purcell, & McGuffin, 2005), while the heritability of adolescent smoking decreases with higher levels of parental monitoring (Dick et al., 2007). Family and personal religiosity has been shown to decrease the importance of genetic variance on adolescent substance use behaviors (Koopmans, Slutske, Heath, Neale, & Boomsma, 1999; Timberlake et al., 2006), and urban-rural differences in the heritability of adolescent alcohol use were found to be mediated by contextual factors such as alcohol sales and neighborhood migration (Dick, Rose, Viken, Kaprio, & Koskenvuo, 2001). These latter areas of research may be of particular importance in generalizing results from prior twin studies to minority individuals or individuals in socially and economically disadvantaged environments, as most large-scale twin registries are based on primarily middle-class, Caucasian or Asian samples.

More recently, attention has turned to using measured genotypes and measured environments to investigate “classic” gXe interactions for a number of important behaviors. Caspi et al. (2002) have elucidated an important and highly replicated (Kim-Cohen et al., 2006) gXe interaction using measured genotype (MAO-A gene) and environmental risk (child abuse) variables, demonstrating that the relationship between child maltreatment and various indices of aggressive and antisocial behavior is attenuated among individuals with the high MAO-A activity genotype.

Another highly replicated interaction has been found between a serotonin transporter gene polymorphism (5-HTTLPR) and stressful life events in predicting depression (Canli & Lesch, 2007). Further studies have found interactions between the 5-HTTLPR genotype and socioeconomic status (SES) for aggression in preadolescents (Nobile et al., 2007), between the 5-HTTLPR genotype and lab-induced stress for lab measures of aggression in adult males (Verona, Joiner, Johnson, & Bender, 2006), and between life stress and the 5-HTTLPR genotype for individual differences in amygdala activation (Canli et al., 2006). There is also emerging evidence for environmental modification of dopaminergic genes related to impulsivity and aggression, with studies finding significant interactions between the DRD4 7-repeat polymorphism and caregiver quality in predicting higher levels of aggression and impulsive traits in infants and preschoolers (Bakermans-Kranenburg & van Ijzendoorn, 2006; Sheese, Voelker, Rothbart, & Posner, 2007), and interactions between SES and the DRD4 gene for aggression in pre-adolescents (Nobile et al., 2007). Thus, genes implicated in multiple neurotransmitter pathways work in conjunction with a host of social and environmental experiences to alter individual differences across multiple behaviors and traits.
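A statistical gXe interaction of the kind described in this section means that the effect of an environmental exposure differs between genotype groups. For the simplest case, two genotype groups and two exposure levels, this can be sketched as a difference of differences between group means, which matches the product-term coefficient in a regression with genotype, exposure, and their product as predictors. The function name, the 0/1 coding, and the scores below are invented for illustration and do not come from any of the cited studies.

```python
from statistics import mean

def interaction_contrast(outcomes):
    """Difference-in-differences of cell means in a 2x2 genotype-by-exposure design.

    outcomes: dict mapping (genotype_flag, exposure_flag) -> list of scores,
              where each flag is 0 or 1.
    A value near zero means the exposure effect is similar in both genotype groups.
    """
    cell = {key: mean(vals) for key, vals in outcomes.items()}
    effect_in_g1 = cell[(1, 1)] - cell[(1, 0)]
    effect_in_g0 = cell[(0, 1)] - cell[(0, 0)]
    return effect_in_g1 - effect_in_g0

# Invented scores: exposure raises the outcome mainly in the genotype-0 group.
toy = {
    (0, 0): [1.0, 2.0, 3.0],
    (0, 1): [6.0, 7.0, 8.0],
    (1, 0): [1.0, 2.0, 3.0],
    (1, 1): [2.0, 3.0, 4.0],
}
contrast = interaction_contrast(toy)
```

A negative contrast in this toy coding corresponds to the attenuation pattern described for the MAO-A example, where the exposure matters less in one genotype group than in the other.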

Additional Gene-Environment Interplay

While the above section concerns statistical interactions between genes and environments which may represent genetic sensitivity to environmental stressors, or, alternatively, environmental exacerbation of genetic effects, another potentially important avenue for research concerns the dynamic interplay between genes and environments, that is, genetic influence on environments and environmental influences on genes. By now, it is fairly common knowledge that when measures of family environment are treated as ‘phenotypes’ in traditional behavioral genetic models, significant genetic influences on these measures are often detected (Plomin & Bergeman, 1991). Decades of behavioral genetic studies have provided considerable evidence for significant genetic influence for measures such as various dimensions of parenting, indices of SES such as income and educational level, social support, and stressful life events (see Kendler & Baker [2007] for a recent review). What has been slower to develop, however, is the notion that environmental influences and experiences can have profound effects on genetic influence. While the underlying DNA structure and sequence individuals are born with does not change over time, a newer area of research in epigenetics is beginning to identify factors that may alter gene expression and function across the lifespan.

Epigenetics, defined formally as changes in gene expression caused by mechanisms other than changes in the underlying DNA sequence, offers an exciting new frontier in the study of human psychiatric and medical diseases, and psychological behaviors and traits. Epigenetic mechanisms include DNA methylation and chromatin remodeling, the latter via post-translational modifications (e.g. methylation, acetylation, phosphorylation and ubiquitylation) to histone proteins which form the scaffold for the DNA helix. Although some epigenetic processes are essential to organism function (e.g., differentiation of cells in the developing embryo during morphogenesis), other epigenetic processes can have major adverse effects on health and behavioral outcomes. While some epigenetic changes only occur within the course of one individual organism's lifetime, animal models suggest that other epigenetic changes can be inherited from one generation to the next (see Champagne [2008] for a review), contributing, in part, to the heritability of behavioral traits and psychiatric disease.

However, a growing field of research suggests that environmental experiences, particularly those related to stress, have the capacity to alter biological and genetic mechanisms associated with increased risk of problem behavior. Again, the notion that environmental experience can change biological processes has important historical precedent. Harlow’s seminal deprivation studies of non-human primates showed that disruptions in early rearing environments have the capacity to disrupt psychobiological regulatory functions, leading to behavioral changes. Other important animal research has begun to identify the precise mechanisms by which social environmental factors can alter epigenetic programming. Relatively recent research using animal models offers an elegant demonstration of how early environmental stressors can alter neurobiological responsivity to future stressful conditioning (Meaney, 2001). Meaney’s model highlights how individual differences in maternal behaviors can cause regulatory changes in the corticotropin releasing hormone (CRH) system at the level of the central nucleus of the amygdala, and how these changes relate further to changes in adrenocortical and autonomic effects of later stressful events. Importantly, his work suggests that these effects can be altered through intervention (Weaver et al., 2005). Differences in early maternal care have also been associated with differences in methylation of the glucocorticoid receptor gene promoter in the hippocampus (Meaney & Szyf, 2005). Most critically, a recent comparison of post-mortem brain tissue from a sample of patients with a history of child abuse and/or neglect and who died by suicide indicated DNA hypermethylation of the rRNA promoter region in the hippocampus relative to controls who experienced sudden, accidental death (McGowan et al., 2008), supporting the hypothesis that epigenetic changes due to social and environmental experiences are related to behavioral traits.

Other studies of monozygotic twins have identified variations in DNA methylation levels in certain target gene promoter regions. Because identical twins share identical genomes and experience many of the same family environmental factors, this indicates that environmental experiences that are not shared among children in the same family have an important causal role in gene expression, and may further be related to behavioral differences among identical twin pairs. Importantly, within-pair differences in DNA methylation and histone acetylation patterns were increased in older twin pairs, especially those who had different lifestyles and had spent fewer years of their lives together, strongly supporting epigenetic processes as a part of nonshared environmental influence on individual differences (Fraga et al., 2005). This suggests that epigenetic processes represent a fundamental gene-environment interface in the development and ongoing plasticity of the human brain.

Conclusions

While there is no doubt that genetic studies of individual behaviors and traits will increase our understanding of both normal human variation and pathological disorders, there is increasing recognition that the interplay between genes and environments is remarkably complex. Not only are both genes and environments important for both normal and abnormal human development, but genes and environments operate interactively to produce both risk and resilience to specific behavioral and psychiatric disorders. More importantly, emerging lines of research from epigenetics suggest that not only can nature alter nurture, but nurture, in turn, has the power to modify nature. Thus, genomic studies that incorporate a range of social and environmental influences will further our understanding of the complex dance between nature and nurture in human development.

Bakermans-Kranenburg, M. J., & van Ijzendoorn, M. H. (2006). Gene-environment interaction of the dopamine D4 receptor (DRD4) and observed maternal insensitivity predicting externalizing behavior in preschoolers. Dev Psychobiol, 48(5), 406-409.

Bronfenbrenner, U., & Ceci, S. J. (1994). Nature-nurture reconceptualized in developmental perspective: A bioecological model. Psychol Rev, 101(4), 568-586.

Button, T. M., Scourfield, J., Martin, N., Purcell, S., & McGuffin, P. (2005). Family dysfunction interacts with genes in the causation of antisocial symptoms. Behav Genet, 35(2), 115-120.

Cadoret, R. J., Yates, W. R., Troughton, E., Woodworth, G., & Stewart, M. A. (1995). Genetic-environmental interaction in the genesis of aggressivity and conduct disorders. Arch Gen Psychiatry, 52(11), 916-924.

Canli, T., & Lesch, K.-P. (2007). Long story short: The serotonin transporter in emotion regulation and social cognition. Nat Neurosci, 10(9), 1103.

Canli, T., Qiu, M., Omura, K., Congdon, E., Haas, B. W., Amin, Z., Herrmann, M. J., et al. (2006). Neural correlates of epigenesis. Proc Natl Acad Sci, 103, 16033-16038.

Caspi, A., McClay, J., Moffitt, T. E., Mill, J., Martin, J., Craig, I. W., et al. (2002). Role of genotype in the cycle of violence in maltreated children. Science, 297(5582), 851-854.

Champagne, F. A. (2008). Epigenetic mechanisms and the transgenerational effects of maternal care. Front Neuroendocrinol, 29(3), 386-397.

Cloninger, C. R., & Gottesman, I. (1987). Genetic and environmental factors in antisocial behavior disorder. In S. A. Mednick, T. E. Moffitt & S. A. Stack (Eds.), The causes of crime: New biological approaches (pp. 99-102). Cambridge: Cambridge University Press.

Dick, D. M., Rose, R. J., Viken, R. J., Kaprio, J., & Koskenvuo, M. (2001). Exploring gene-environment interactions: Socioregional moderation of alcohol use. J Abnorm Psychol, 110(4), 625-632.

Dick, D. M., Viken, R., Purcell, S., Kaprio, J., Pulkkinen, L., & Rose, R. J. (2007). Parental monitoring moderates the importance of genetic and environmental influences on adolescent smoking. J Abnorm Psychol, 116(1), 213-218.

Fraga, M. F., Ballestar, E., Paz, M. F., Ropero, S., Setien, F., Ballestar, M. L., et al. (2005). Epigenetic differences arise during the lifetime of monozygotic twins. Proc Natl Acad Sci U S A, 102(30), 10604-10609.

Kendler, K. S., & Baker, J. H. (2007). Genetic influences on measures of the environment: A systematic review. Psychol Med, 37(5), 615-626.

Kendler, K. S., Jacobson, K., Myers, J. M., & Eaves, L. J. (2008). A genetically informative developmental study of the relationship between conduct disorder and peer deviance in males. Psychol Med, 38(7), 1001-1011.

Kim-Cohen, J., Caspi, A., Taylor, A., Williams, B., Newcombe, R., Craig, I. W., et al. (2006). MAOA, maltreatment, and gene-environment interaction predicting children's mental health: New evidence and a meta-analysis. Mol Psychiatry, 11(10), 903-913.

Koopmans, J. R., Slutske, W. S., Heath, A. C., Neale, M. C., & Boomsma, D. I. (1999). The genetics of smoking initiation and quantity smoked in Dutch adolescent and young adult twins. Behav Genet, 29(6), 383-393.

McGowan, P. O., Sasaki, A., Huang, T. C., Unterberger, A., Suderman, M., Ernst, C., et al. (2008). Promoter-wide hypermethylation of the ribosomal RNA gene promoter in the suicide brain. PLoS ONE, 3(5), e2085.

Meaney, M. J. (2001). Maternal care, gene expression, and the transmission of individual differences in stress reactivity across generations. Annu Rev Neurosci, 24, 1161-1192.

Meaney, M. J., & Szyf, M. (2005). Maternal care as a model for experience-dependent chromatin plasticity? Trends Neurosci, 28(9), 456-463.

Nobile, M., Giorda, R., Marino, C., Carlet, O., Pastore, V., Vanzin, L., et al. (2007). Socioeconomic status mediates the genetic contribution of the dopamine receptor D4 and serotonin transporter linked promoter region polymorphisms to externalization in preadolescence. Development and Psychopathology, 19(4), 1147-1160.

Plomin, R., & Bergeman, C. S. (1991). The nature of nurture: Genetic influence on "environmental" measures. Behavioral & Brain Sciences, 14, 373-427.

Rowe, D. C., Almeida, D. M., & Jacobson, K. C. (1999). School context and genetic influences on aggression in adolescence. Psychological Science, 10, 277-280.

Sheese, B., Voelker, P., Rothbart, M., & Posner, M. (2007). Parenting quality interacts with genetic variation in dopamine receptor D4 to influence temperament in early childhood. Development and Psychopathology, 19, 1039-1046.

Timberlake, D. S., Rhee, S. H., Haberstick, B. C., Hopfer, C., Ehringer, M., Lessem, J. M., et al. (2006). The moderating effects of religiosity on the genetic and environmental determinants of smoking initiation. Nicotine Tob Res, 8(1), 123-133.

Verona, E., Joiner, T. E., Johnson, F., & Bender, T. W. (2006). Gender specific gene-environment interactions on laboratory-assessed aggression. Biol Psychol, 71(1), 33-41.

Weaver, I. C., Champagne, F. A., Brown, S. E., Dymov, S., Sharma, S., Meaney, M. J., et al. (2005). Reversal of maternal programming of stress responses in adult offspring through methyl supplementation: Altering epigenetic marking later in life. J Neurosci, 25(47), 11045-11054.

