Visualizing a subset of the tree of life

I understand that many curated trees of life already exist (e.g. http://tolweb.org/tree/), but is there any website that lets you enter a list of organisms and then generates a current best guess of their evolutionary relationships?

The best site I have found so far is http://itol.embl.de/itol.cgi. It lets you select specific species from the main tree and plot them as a subtree, but the tree itself is fairly small, so more detailed comparisons are not possible.


http://phylot.biobyte.de/ does what you need: it generates a phylogenetic tree for a given set of organisms, based on the NCBI taxonomy tables.

For example, entering the following tree elements

Trichomonas vaginalis,Trypanosoma brucei,Homo sapiens,Fibroporia radiculosa,Paramecium tetraurelia,Tetrahymena thermophila,Cryptosporidium muris,Cryptosporidium hominis,Blastocystis hominis

generates the corresponding tree.
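
The idea behind a taxonomy-based tool like phyloT can be sketched in a few lines. Take each organism's lineage from the taxonomy, merge the lineages into one tree, and write the result in Newick format. This is a minimal illustration under stated assumptions, not phyloT's implementation. The lineages below are hand-abbreviated and hypothetical: real NCBI lineages have many more ranks and would be looked up rather than hard-coded. Ranks with a single child are collapsed so that only branching points remain.

```python
from typing import Dict, List

# Hypothetical, hand-abbreviated lineages (root first); real NCBI lineages
# contain many more ranks and would be retrieved from the taxonomy dump.
LINEAGES: Dict[str, List[str]] = {
    "Homo_sapiens": ["Eukaryota", "Opisthokonta", "Metazoa"],
    "Cryptosporidium_muris": ["Eukaryota", "Sar", "Alveolata", "Apicomplexa", "Cryptosporidium"],
    "Cryptosporidium_hominis": ["Eukaryota", "Sar", "Alveolata", "Apicomplexa", "Cryptosporidium"],
    "Tetrahymena_thermophila": ["Eukaryota", "Sar", "Alveolata", "Ciliophora"],
    "Paramecium_tetraurelia": ["Eukaryota", "Sar", "Alveolata", "Ciliophora"],
}

Tree = Dict[str, "Tree"]


def build_tree(lineages: Dict[str, List[str]]) -> Tree:
    """Merge root-to-leaf lineages into a nested dict (a prefix tree)."""
    root: Tree = {}
    for species, path in lineages.items():
        node = root
        for taxon in path:
            node = node.setdefault(taxon, {})
        node[species] = {}  # species are the leaves
    return root


def to_newick(name: str, children: Tree) -> str:
    """Render a subtree as Newick, collapsing ranks that have a single child."""
    if not children:
        return name
    if len(children) == 1:
        # A rank with one child adds no branching information, so skip it.
        (child_name, grandchildren), = children.items()
        return to_newick(child_name, grandchildren)
    parts = [to_newick(n, c) for n, c in sorted(children.items())]
    return "(" + ",".join(parts) + ")" + name


newick = to_newick("", build_tree(LINEAGES)) + ";"
print(newick)
```

A tree derived this way encodes the classification, not an inferred phylogeny. It has no branch lengths, and it can contain polytomies wherever the taxonomy leaves relationships unresolved.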


Study of giant viruses shakes up the tree of life

A new study of giant viruses supports the idea that viruses are ancient living organisms and not inanimate molecular remnants run amok, as some scientists have argued. The study could reshape the universal family tree by adding a fourth major branch to the three that most scientists agree represent the fundamental domains of life.

The new findings appear in the journal BMC Evolutionary Biology.

The researchers used a relatively new method to peer into the distant past. Instead of comparing genetic sequences, which are unstable and change rapidly over time, they looked for evidence of past events in the three-dimensional structural domains of proteins. These structural motifs, called folds, are relatively stable molecular fossils. Like fossils of human or animal bones, they offer clues to ancient evolutionary events, said Gustavo Caetano-Anollés, who led the analysis. He is a professor of crop sciences and in the Institute for Genomic Biology at the University of Illinois.

"Just like paleontologists, we look at the parts of a system and how they change over time," Caetano-Anollés said. Some protein folds appear only in one group or in a subgroup of organisms, he said, while others are shared by all organisms studied so far.

"We make a very basic assumption that the structures that appear more frequently and in more groups are the oldest structures," he said.

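The "very basic assumption" quoted above can be illustrated with a toy ranking. Count how many groups each fold occurs in, and treat the most widespread folds as the oldest. The fold names and presence data below are hypothetical. The study's actual method is more elaborate than simple counting, so this sketch only shows the ordering principle.

```python
from typing import Dict, List, Set

# Hypothetical census: which supergroups contain each (made-up) fold.
FOLD_PRESENCE: Dict[str, Set[str]] = {
    "fold_A": {"Archaea", "Bacteria", "Eukarya", "Viruses"},
    "fold_B": {"Archaea", "Bacteria", "Eukarya"},
    "fold_C": {"Eukarya"},
    "fold_D": {"Bacteria", "Eukarya"},
}


def rank_by_age(presence: Dict[str, Set[str]]) -> List[str]:
    """Order folds from putatively oldest to youngest.

    Assumption taken from the text: the more groups a fold occurs in,
    the older it is. Ties are broken alphabetically so the order is stable.
    """
    return sorted(presence, key=lambda fold: (-len(presence[fold]), fold))


print(rank_by_age(FOLD_PRESENCE))  # ['fold_A', 'fold_B', 'fold_D', 'fold_C']
```
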
Most efforts to document the relatedness of all living things have left viruses out of the equation, Caetano-Anollés said.

"We've always looked at the last universal common ancestor by comparing cells," he said. "We never added viruses. So we put viruses into the mix to see where these viruses came from."

The researchers took a census of all the protein folds that occur in more than 1,000 organisms representing bacteria, viruses, the microbes known as archaea, and all other living things. They included giant viruses because these viruses are large and complex. Their genomes rival, and in some cases surpass, the genetic capacity of the simplest bacteria, Caetano-Anollés said.

"Giant viruses have incredible machinery that is very similar to the machinery you have in a cell," he said. "They have complexity, and we have to explain why."

Part of that complexity includes enzymes involved in translating the genetic code into proteins, he said. Scientists were astonished to find these enzymes in viruses. Viruses lack all the other known protein-building machinery and must commandeer their hosts' proteins to do that work for them.

In the new study, the scientists mapped the evolutionary relationships among the proteins of hundreds of organisms and used this information to build a new universal tree of life that included viruses. The resulting tree had four clearly differentiated branches, each representing a distinct "supergroup." The giant viruses formed the fourth branch of the tree, alongside bacteria, archaea and eukarya (plants, animals and all other organisms whose cells have a nucleus).

The researchers found that many of the oldest protein folds, those found in most cellular organisms, were also present in giant viruses. This suggests that these viruses emerged quite early in evolution, near the root of the tree of life, Caetano-Anollés said.

The new analysis adds evidence that giant viruses were originally much more complex than they are today and that their genomes shrank dramatically over time, Caetano-Anollés said. This reduction probably explains their eventual adoption of a parasitic lifestyle, he said. He and his colleagues suggest that giant viruses resemble their original ancestors more closely than smaller viruses with reduced genomes do.

The researchers also found that viruses appear to be key "spreaders of information," Caetano-Anollés said.

"Protein structures that other organisms share with viruses have a special quality: they are more widely distributed than other structures," he said. "Each and every one of these structures is an amazing discovery in evolution. And viruses distribute this novelty."

Most studies of giant viruses "point in the same direction," Caetano-Anollés said. "And this study offers more evidence that viruses are embedded in the fabric of life."

The research team included graduate student Arshan Nasir and Kyung Mo Kim of the Korea Research Institute of Bioscience and Biotechnology.


The Tree of Life grows in Texas

Tandy Warnow, David Bruton, Jr. Centennial Professor of Computer Sciences.

Scientists continue to refine, and sometimes radically change, our understanding of the "Tree of Life," the way species are related to one another. They are using the computing power of the Texas Advanced Computing Center (TACC) at The University of Texas at Austin to better understand the origin of species and, ultimately, to help fight disease and develop better crops.

Evolutionary history was once based on the relationships of bones, skeletons and other morphological clues. Today, DNA is the main informant in the story of how Earth became such a diverse place.

Phylogenetics is the branch of the life sciences that studies the evolutionary relationships among organisms based on genetic evidence. By aligning the molecular sequences of different species, scientists can see how organisms differ at the genetic level, determine where they diverged, and map branching trees of relationships based on the alignments.
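
The minimal global pairwise alignment below, using the classic Needleman-Wunsch dynamic program, makes "aligning molecular sequences" concrete. The scores (match +1, mismatch -1, gap -1) are illustrative assumptions. Real tools use substitution matrices and affine gap penalties. Phylogenomic studies also need multiple sequence alignment, which is much harder than this pairwise case.

```python
from typing import List, Tuple

MATCH, MISMATCH, GAP = 1, -1, -1  # illustrative scores, not a standard matrix


def pair_score(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def needleman_wunsch(a: str, b: str) -> Tuple[int, str, str]:
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + GAP,  # gap in b
                score[i][j - 1] + GAP,  # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("AC", "A"))  # (0, 'AC', 'A-')
```
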

As the cost of gene sequencing falls, researchers are conducting more phylogenetic studies. Even so, aligning tens of thousands of sequences from hundreds or thousands of species is incredibly complicated, even for a computer.

"The most accurate trees are estimated using methods that try to solve hard optimization problems," said Tandy Warnow, a professor of computer science at The University of Texas at Austin and a Guggenheim Fellow.

"While these problems can be solved on small or moderately sized datasets, on large datasets the solutions can take a very long time: weeks, months or even years of computing time. The Texas Advanced Computing Center is ultimately key to these problems."

TACC, at the J.J. Pickle Research Campus in north Austin, operates some of the largest and most powerful systems in the world. Even so, its supercomputers can barely keep pace with genetic research. According to Moore's law, computer performance doubles roughly every two years. However, the ability of gene sequencers to generate data has grown even faster.

"It's a different kind of challenge," Warnow said. "It's not just about how we run analyses on large datasets, but also about how to approach the data in a sensible way."

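Taking the two-year doubling period above at face value, computing capacity after $t$ years grows as

$$P(t) = P_0 \cdot 2^{t/2}, \qquad \frac{P(10)}{P_0} = 2^{5} = 32,$$

so a decade buys roughly a 32-fold increase. Because sequencing output has grown faster than this, the amount of data per unit of available computing power keeps rising.
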
Divide and conquer

Warnow works with Kevin Liu, a postdoctoral researcher at Rice University, and Siavash Mirarab, a Ph.D. student in computer science at The University of Texas at Austin. Together they are creating smarter, faster and more accurate algorithms for some of the largest datasets ever assembled.

This phylogenetic tree, created by David Hillis, Derrick Zwickl and Robin Gutell, shows the evolutionary relationships of about 3,000 species across the Tree of Life. Fewer than 1 percent of known species are shown.

With a $1.5 million grant from the National Science Foundation (through the Assembling the Tree of Life project), the scientists developed software that lets computers draw better evolutionary trees faster.

It is called SATé (Simultaneous Alignment and Tree Estimation), and it uses a novel divide-and-conquer approach.

"By dividing a really large dataset that is hard to align into small datasets of closely related sequences, you can get good estimates for each subset and then obtain an alignment for the whole dataset," Warnow explained.

Massive supercomputers, such as Ranger at TACC, align the sequences in each subset and combine these alignments into an alignment of the full set of sequences.
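
The subset step can be sketched as recursive centroid-edge decomposition. The idea is to repeatedly cut the guide-tree edge that splits the taxa most evenly, until every subset is no larger than a chosen size. Treat this as an assumption about how a SATé-style pipeline performs the split, written as a self-contained toy. The guide tree, taxon names and maximum subset size are hypothetical. Aligning each subset and merging the alignments are not shown.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def make_tree(edges: List[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map for an unrooted guide tree."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)


def side_of(graph: Graph, start: str, cut: Set[str]) -> Set[str]:
    """Nodes reachable from start without crossing the cut edge."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nb in graph[node]:
            if {node, nb} != cut and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen


def decompose(graph: Graph, taxa: Set[str], max_size: int) -> List[List[str]]:
    """Split taxa into subsets of at most max_size by cutting centroid edges."""
    leaves = [n for n in graph if n in taxa]
    if len(leaves) <= max_size:
        return [sorted(leaves)]
    best: Optional[Tuple[int, Set[str]]] = None  # (larger side size, one side's nodes)
    for u in graph:
        for v in graph[u]:
            if u < v:  # visit each edge once
                side = side_of(graph, v, {u, v})
                k = sum(1 for n in side if n in taxa)
                larger = max(k, len(leaves) - k)
                if best is None or larger < best[0]:
                    best = (larger, side)
    assert best is not None
    _, side = best
    subsets: List[List[str]] = []
    for nodes in (side, set(graph) - side):
        sub = {n: {m for m in graph[n] if m in nodes} for n in nodes}
        subsets.extend(decompose(sub, taxa, max_size))
    return subsets


# Hypothetical balanced guide tree ((A,B),(C,D)) | ((E,F),(G,H)); x* are internal nodes.
EDGES = [("A", "x1"), ("B", "x1"), ("C", "x2"), ("D", "x2"), ("x1", "x3"), ("x2", "x3"),
         ("E", "x4"), ("F", "x4"), ("G", "x5"), ("H", "x5"), ("x4", "x6"), ("x5", "x6"),
         ("x3", "x6")]
TAXA = set("ABCDEFGH")
print(sorted(decompose(make_tree(EDGES), TAXA, max_size=4)))
```

Cutting near the centroid keeps the subsets balanced. Each subset is also a connected piece of the guide tree, so its sequences tend to be closely related and easier to align.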

There is no way to know whether the tree produced by these analyses is absolutely correct. Some trees are clearly wrong, for example ones that place humans and crocodiles on the same branch, separate from chimpanzees. Most, however, are plausible.

For this reason, SATé uses a statistical method to compute a maximum likelihood score, which measures how well a tree explains the data relative to other candidate answers. SATé repeats the alignment and tree-building process many times and keeps the tree with the highest likelihood.
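
The iterate-and-keep-the-best logic described above can be written as a short loop. Here `realign`, `estimate_tree` and `log_likelihood` are hypothetical stand-ins passed in as functions. In a real pipeline they would wrap an aligner, a maximum likelihood tree estimator and its scoring. Only the control flow is meant to be illustrative, and SATé's actual stopping rules are not reproduced. The toy usage replaces alignments and trees with integers.

```python
from typing import Callable, Tuple, TypeVar

A = TypeVar("A")  # alignment type (placeholder)
T = TypeVar("T")  # tree type (placeholder)


def iterate_alignment_and_tree(
    alignment: A,
    tree: T,
    realign: Callable[[A, T], A],             # hypothetical: realign guided by the current tree
    estimate_tree: Callable[[A], T],          # hypothetical: ML tree search on an alignment
    log_likelihood: Callable[[A, T], float],  # hypothetical: score of an (alignment, tree) pair
    iterations: int,
) -> Tuple[A, T, float]:
    """Alternate realignment and tree estimation, keeping the best-scoring pair seen."""
    best = (alignment, tree, log_likelihood(alignment, tree))
    for _ in range(iterations):
        alignment = realign(alignment, tree)
        tree = estimate_tree(alignment)
        score = log_likelihood(alignment, tree)
        if score > best[2]:  # a higher log-likelihood is better
            best = (alignment, tree, score)
    return best


# Toy usage with integers standing in for alignments and trees.
best = iterate_alignment_and_tree(
    alignment=0,
    tree=0,
    realign=lambda a, t: a + 1,
    estimate_tree=lambda a: a * 2,
    log_likelihood=lambda a, t: -abs(a - 3) - abs(t - 6),
    iterations=5,
)
print(best)  # (3, 6, 0)
```

The loop keeps the best pair seen so far instead of the last one. A later iteration can score worse than an earlier one, as happens in the toy run after the third step.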

In software development, the best products are not simply the newest but those shown to outperform the alternatives. To that end, Warnow and her team acted as quality-assurance and reliability testers. They solved hard evolutionary-tree problems repeatedly, with different methods and parameters, to make sure that SATé delivers the highest-quality results.

The work was first published in Science and later explored further in PLoS Currents and Systematic Biology. Across these studies, the researchers repeatedly showed that SATé performs as well as the commonly used alignment and tree-estimation methods that analyze trees as single units. However, SATé is either much faster or achieves greater accuracy in the same amount of time.

For the Birds

Warnow and her team also collaborate with evolutionary biologists on projects where their expertise can lead to new insights.

Since the days of Charles Darwin, scientists have debated the evolutionary history of the flightless birds known as ratites. How did so many similar species end up in such distant corners of the Earth?

"The theory of continental drift provided a convenient answer," said Michael Braun, a curator in the Department of Systematic Biology at the Smithsonian Institution. "These birds evolved from a common flightless ancestor and were then carried to their current distributions. For 40 years, that remained the textbook explanation for how the species dispersed."

That held until Braun found through DNA analysis that the tinamous, an ancient (but still living) family of birds from South America, were among the closest relatives of emus and ostriches. But tinamous can fly. The discovery was first published in 2009.

This fact, combined with the lack of skeletal evidence for flightless birds before the continents broke apart, led to a reconceptualization of this branch of the bird tree. The ratites actually descended from flying birds that traveled to places where flight was no longer an evolutionary advantage, and they consequently lost the ability to fly.

"It is hard to recognize relationships among species using morphology alone. But when we can use molecules and appropriate analytical methods to find the relationships, it helps us better understand how that adaptive evolution came about," Braun said.

Recently, Warnow worked with Braun, using SATé to reanalyze his controversial findings. Their study confirmed the evolutionary relationship Braun had discovered.

Emergency phylogenetics

Better, faster and more accurate phylogenetic methods can make a life-or-death difference for people.

When a new virus emerges, the Centers for Disease Control and Prevention use sequence-alignment and evolutionary tree-building tools. These tools help determine where the virus may have come from and how it differs from previous viruses.

Plant scientists also use tree-building tools to determine which genes are associated with desirable traits such as resistance and drought tolerance. This knowledge allows scientists to breed more productive crops, helping to feed the world.

But none of these problems is easily solved.

"Many research groups are estimating trees containing several thousand to hundreds of thousands of species, toward the ultimate goal of estimating the Tree of Life, which contains perhaps as many as several million leaves," Warnow wrote in a recent article in Systematic Biology. "These phylogenetic estimations present enormous computational challenges, and current computational methods are likely to fail even on datasets at the low end of this range."

In other words, small problems may be within reach, but the big ones remain.

"It's not getting easier, but it is getting more fun," Warnow said.

By Aaron Dubrow; originally published on the Texas Advanced Computing Center website.


Tree of Life exercise

The image below is an example of what the tree of life exercise looks like once finished. I completed this rough draft in about an hour. The instructions below describe how you can make your own.

The first step, of course, is drawing the tree. I have included a video below that should help if you feel lost. However, at least for your first draft, it may help to keep the drawing rough. You can always come back later and redraw or polish it for aesthetics. This round is all about gathering information.

Next, follow the labeling instructions below. If you can only think of one or two things per section at first, don't worry about it. In this exercise, completing each step tends to unlock more memories and ideas for the other parts. You can skip around and fill things in at any time. The most useful thing at the start is simply to write things down and see where it takes you. You might be surprised!

Compost heap (optional, but highly recommended!)

In your compost heap, write down everything that would otherwise belong in the sections described below but that you no longer want to be defined by.

These are often sources of trauma, abuse, cultural standards of normality, beauty and so on, or anything else that shapes negative thoughts about yourself. You can write down places, people, issues and experiences, whatever you need.

I have blurred mine out above, but you can see that it has several items. In general, they all relate to past traumas and harmful relationships that I am trying to let go of. I have found the idea of a compost heap an extremely helpful way to think about these things, especially since many of them don't fit neatly into the "all bad" category.

In fact, I have learned quite a few life lessons through things that ended up in my compost heap. And just as a compost heap is supposed to work, I will eventually break these things down and sow their rich parts back into my life.

You can do the same with yours.

Roots

On the roots, write down where you come from. This can be your hometown, state, country and so on. You can also write down the culture you grew up in, a club or organization that shaped your youth, or a parent or guardian.

Ground

On the ground, write down the things you choose to do every week. These should not be things you are forced to do, but things you have chosen to do for yourself.

Trunk

Write your skills and values on the trunk. I chose to write my values starting at the base of the trunk and working upward, and then moved on to listing my skills. To me this felt like a natural progression from roots to values to skills.

Branches

Write your hopes, dreams and wishes on the branches. They can be personal, communal, or general to all of humanity. Think both long term and short term, and spread them across the various branches.

Leaves

Write the names of those who matter to you in a positive way: your friends, family, pets, heroes and so on.

Fruit

Write down the legacies that have been passed on to you. You can start with the names you just wrote on the leaves: think about the impact these people had on you and what they have given you over the years. A legacy can be material, such as an inheritance, but most often it will be a quality such as courage, generosity or kindness.

(Tip: if your tree is already quite crowded, try drawing some fruit baskets at its base and labeling them instead.)

Flowers and seeds

On the flowers and seeds, write down the legacy you want to leave to others.

(Tip: again, you may want to declutter your drawing by adding saplings, flower baskets and so on, and writing these items on them.)


Conceptual frameworks: reinterpreting the TOL

Phylogenetic methods have advanced significantly. Advances include sophisticated evolutionary models, tree-building techniques (among them faster tools suitable for genome-wide datasets), ways of assessing the reliability of tree inferences, and databases and other computational tools. In this section we are primarily concerned with the concepts that underpin these methods and their results. Our focus is on how the TOL has been reconceptualized because the more molecular data are analyzed, the harder it becomes to interpret the evolutionary history of those molecules directly. Many evolutionary biologists have not abandoned the universal tree. Instead, they have restructured their understanding of the TOL in relation to bodies of data and what can be done with them. We outline the different positions, which span an ever-wider range of modifications to the basic TOL concept (Figure 1). At one end is "business as usual," based on finding clear signals of the one true TOL. At the other end is a perspective in which local trees are merely occasional structures in a "real" web of life. All of these positions draw on the metaphor of Darwin's tree, and they overlap and feed into one another in various ways. Each, however, commands its own distinct conceptual space.

Conceptual frameworks of the TOL in relation to the Darwinian tree analogy.

1. Gene trees as species trees

Trees of gene and protein sequences are usually considered most valuable when they can be justified as representing species trees. To earn this representative status, a gene or set of genes must satisfy certain criteria for genealogical markers. The first two, related criteria are the most obvious. (i) The gene must be (nearly) universal, that is, represented by easily recognizable orthologs (preferably single-copy) in all cellular life forms. (ii) The sequence of the gene must be conserved enough to allow an unambiguous alignment and an informative tree. The third criterion is more controversial and harder to apply: the gene used to construct the reference tree must be minimally prone to horizontal gene transfer (HGT). Genes favored by these criteria include those for ribosomal RNA, ribosomal proteins, elongation factors, RNA polymerases and a few other (nearly) universal, highly conserved genes [28, 29]. Some of these markers are considered so evolutionarily "special" that they have become the basis of reference trees for the entire TOL [30, 15]. The problems with the best-known reference trees, the 16S and 18S rRNA gene trees, are often discussed (e.g. [31, 32]). Nevertheless, for many evolutionary biologists the concept of a reference tree can still be justified, provided its limitations are understood (e.g. [33]).
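
Criterion (i) can be expressed as a simple filter over a gene-family copy-number table. The filter keeps families found in (nearly) every genome, optionally requiring a single copy. The table, the family names and the 95% cutoff below are hypothetical. Criterion (ii), sequence conservation, and criterion (iii), low susceptibility to HGT, cannot be read off copy numbers and would need separate alignment and phylogenetic tests.

```python
from typing import Dict, List

# Hypothetical copy-number table: family -> {genome: copies}; absent genomes are omitted.
COPY_NUMBERS: Dict[str, Dict[str, int]] = {
    "fam_1": {"g1": 1, "g2": 1, "g3": 1, "g4": 1},  # universal, single-copy
    "fam_2": {"g1": 1, "g2": 1, "g3": 1, "g4": 2},  # universal, but two copies in g4
    "fam_3": {"g1": 1, "g3": 1},                    # patchy distribution
    "fam_4": {"g1": 1, "g2": 1, "g3": 1},           # missing from one genome
}
GENOMES: List[str] = ["g1", "g2", "g3", "g4"]


def candidate_markers(
    copies: Dict[str, Dict[str, int]],
    genomes: List[str],
    min_presence: float = 0.95,  # hypothetical cutoff for "(nearly) universal"
    single_copy: bool = True,
) -> List[str]:
    """Return families meeting criterion (i), optionally requiring a single copy."""
    selected = []
    for family, counts in copies.items():
        present = [g for g in genomes if counts.get(g, 0) > 0]
        if len(present) / len(genomes) < min_presence:
            continue  # not (nearly) universal
        if single_copy and any(counts[g] > 1 for g in present):
            continue  # extra copies complicate identifying orthologs
        selected.append(family)
    return sorted(selected)


print(candidate_markers(COPY_NUMBERS, GENOMES))  # ['fam_1']
```
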

However, researchers increasingly recognize that single-gene trees, and even composite multigene trees, can obscure more than they reveal, even if they continue to use reference trees. These trees cannot account for non-bifurcating patterns arising from major evolutionary events such as endosymbiosis, coevolving symbioses, hybridization and any other kind of lineage fusion [34-37]. More generally, HGT is now recognized as a major factor in prokaryotic evolution. Treating all these non-treelike processes as problems that obscure the "true" TOL greatly distorts and limits our understanding of evolutionary history. That understanding is one of the central goals of evolutionary biology, along with understanding the processes and patterns of evolution [38].

Another way to relate gene trees and species trees is to think of gene trees as contained "within" the species tree. This approach is especially attractive for the systematics of organisms that already have a widely accepted phylogenetic placement in the TOL (primarily multicellular eukaryotes), but it has also appealed to prokaryote phylogenetics. The obvious problem is that the species tree must be "predetermined" in order to select and construct the right gene trees (e.g. [39, 40]). This makes the phylogeny circular, because it assumes its own conclusion. Discordance among trees for individual genes also raises a deeper problem, as it did for the previous conceptual relationship between gene and species trees. Such discordance is not limited to prokaryotes and is not due only to HGT. It has led to fundamental questions about whether gene trees can simply be understood as tracking histories "within" a known species tree [41–43]. "In considering these issues," wrote the phylogeneticist Wayne Maddison,

"one is provoked to think precisely about what a phylogeny is. Perhaps it is wrong to view some gene trees as agreeing and other gene trees as disagreeing with the species tree; rather, all the gene trees are part of the species tree, which can be visualized as a fuzzy statistical distribution, a cloud of gene histories" [41].

As comparative genomic data have become more available, new concepts of the TOL and of evolutionary history in general have begun to be articulated. In these concepts, gene trees are no longer simply contained in species trees or used as stand-ins for them. The species tree is generally perceived as the true goal of phylogeny (or was until recently). New modeling techniques and broader treatments of data have therefore been developed to represent it less problematically. Given the abundance of molecular data, great effort has gone into reconstructing genome trees. In the process, the very concept of the "species tree" (and thus of the TOL) has been revised.

2. Genome trees as trees of cells

Under the broad banner of phylogenomics, researchers have developed and advocated efforts to reconcile inconsistent data and resolve the branching order of all lineages of life [44]. Phylogeneticists are bound to believe that traces of vertical signal can be detected amid evolutionary noise, although these very categories imply certain expectations. They are therefore divided on how to interpret such a signal. Some treat it as the central truth of evolutionary history. Others treat it as an indicator of limited genetic relatedness that is not necessarily central to our understanding of evolution. A major outcome of attempts to understand the relationship between putative signal and noise in genomic data has been the creation of new concepts of the TOL. Several methodological routes are involved [45, 46]. Two streams of genome-tree construction illustrate this tension particularly well, because they rest on fundamentally different ways of thinking about the TOL.

The core-genome approach works with an evolutionarily stable core of genes that can be considered to represent the lineage of an organism. That lineage is seen as a process of binary genome replication and cell division, which justifies the bifurcation principle. In line with the criteria for choosing reference genes mentioned above, this approach seeks genes that are widely represented across genomes and, most importantly, that produce congruent phylogenetic signals (e.g. [47–52]). Some success has been achieved under this conceptual framework, using various methods, with the identification of universal genes that appear to tell the same evolutionary story. However, there are many questions about whether the trees generated for this purpose, particularly concatenated sequence trees, are methodological artifacts [53]. There are also questions about whether such analyses say much about the TOL or simply produce a partially distorted history of a few genes.

Perhaps the biggest problem with such an approach is how well the identified cores represent the evolutionary history of the organisms and genomes that contain them. The (nearly) universal gene core of cellular life is extremely small and functionally biased. One carefully conducted core analysis examined the genomes of 191 species from all three domains of life but identified only 31 universal genes, mostly those for ribosomal proteins [54]. Prokaryotic genomes typically contain between 1,000 and 4,000 genes. Any tree built from 31 genes is therefore a highly reduced representation of the intended TOL, a "tree of 1%" in the words of a well-known incisive critique [36]. More generally, all genes in prokaryotic genomes have probably undergone at least one HGT event in 3.5 billion years of cellular genome history, so there is no pristine, untransferred core [55]. The core approach might therefore be better interpreted as dealing with the "least transferred" subset of genes. In that case, the core would be a "fuzzy" set of genes showing a certain statistical trend rather than a precisely defined set. This is the conceptual space occupied by another version of the genome-based TOL.
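
The "tree of 1%" label follows from simple arithmetic on the figures quoted above. Thirty-one universal genes, out of a typical 1,000 to 4,000 genes per prokaryotic genome, amount to

$$\frac{31}{4000} \approx 0.8\% \quad\text{to}\quad \frac{31}{1000} = 3.1\%,$$

that is, on the order of 1% of a genome.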

Central-trend approaches are built on quantifying how much more or less each gene has been transferred. They combine individual gene trees to bring vertical, treelike patterns to the foreground against the much more complicated background of the "forest" of life [56–60]. Such conceptualizations take the pervasiveness of HGT into account, but they still look for an indicative message of vertical descent in complex data. This trend consists of the most universal signal. At deep phylogenetic levels it can usually be discerned only weakly, except for the bifurcation between archaea and bacteria [57]. Ultimately, it may not be possible to recover any other details of deep branching, and even the tips of the tree may remain uncertain for some lineages [55, 61, 62]. Nevertheless, some of these supertree constructions yield a TOL of "modal information" strong enough to form a "backbone" tree overlaid only by a fine "cobweb" of HGT [60].
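
One simple way to foreground a vertical signal across many gene trees is to count how often each bipartition (split) of the taxa recurs among them. Splits supported by many gene trees form a candidate backbone, while rare ones may reflect HGT or noise. This is a plain split-frequency tally in the spirit of a majority-rule consensus. It is not the tree-net trend score of [58]. The gene trees below are hypothetical and are written directly as lists of splits, each given by the taxa on one side.

```python
from collections import Counter
from typing import FrozenSet, List, Set

TAXA: FrozenSet[str] = frozenset("ABCDE")
REFERENCE = "A"  # each split is stored as the side that does NOT contain this taxon


def normalize(side: Set[str]) -> FrozenSet[str]:
    """Represent the split {side | rest} canonically, whichever side was given."""
    fs = frozenset(side)
    return TAXA - fs if REFERENCE in fs else fs


# Hypothetical gene trees, each listed by its non-trivial splits.
GENE_TREES: List[List[Set[str]]] = [
    [{"A", "B"}, {"D", "E"}],       # ((A,B),C,(D,E))
    [{"A", "B"}, {"D", "E"}],
    [{"A", "C"}, {"D", "E"}],       # a discordant tree, e.g. after a transfer
    [{"C", "D", "E"}, {"D", "E"}],  # same topology as the first, other side of the split
]


def split_frequencies(trees: List[List[Set[str]]]) -> Counter:
    """Count how many gene trees contain each canonical split."""
    counts: Counter = Counter()
    for splits in trees:
        counts.update({normalize(s) for s in splits})  # count each split once per tree
    return counts


freqs = split_frequencies(GENE_TREES)
majority = {s for s, c in freqs.items() if c / len(GENE_TREES) > 0.5}
print(sorted(sorted(s) for s in majority))  # [['C', 'D', 'E'], ['D', 'E']]
```
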

None of these analyses sees the central trend as the majority signal in the forest, but all of them recognize it as extremely important. In one case, using a specially designed "tree-net trend" score, the central treelike trend accounts for approximately 40% of the total information about prokaryotic evolution [58]. But is such a "statistical tree" what is traditionally meant by the TOL? It was certainly not how the TOL was conceived in the first era of molecular phylogeny, before it was acknowledged that different genes can have different evolutionary histories. The statistical TOL approach also acknowledges that averaging the signals of different gene trees can produce artifactual trees and obscure relevant aspects of evolution [63]. The willingness to make this transition may have more to do with the perceived epistemological function of the TOL (which we explore below) than with a commitment to the ontology of the tree (e.g. its "reality").


RESULTS

To construct the AnnoTree database, we re-annotated all 28 941 prokaryotic genomes in GTDB (Release 03-RS86) using a consistent annotation pipeline. After gene prediction, we assigned functional annotations [Pfam protein families (10), TIGRFAM protein families (18) and KEGG Orthology (KO) identifiers (28)] to the protein sequences using standard confidence thresholds. This yielded 106 856 093 Pfam, 27 624 080 TIGRFAM and 67 878 984 KEGG annotations. All taxonomic information, protein sequences and functional annotations are stored in a back-end MySQL database for rapid retrieval by the AnnoTree front-end application (Figure 1). To enable phylogenetic visualization of all 28 941 prokaryotic genomes, AnnoTree splits the bacterial and archaeal trees of life into separate views for each major taxonomic level. Users can explore the phylogenetic distribution of a trait at any level from phylum to genome in either taxonomic domain. AnnoTree can also be used to explore custom trees and datasets (see Data Availability).

Data flow in the AnnoTree application. Raw values and computed features derived from GTDB data are stored in a MySQL database, which will be updated to match revisions made to GTDB. Users can access the data relevant to their queries as figures and tables displayed in their browser. The figures, and the data used to generate them, can be downloaded in various file formats from the AnnoTree interface.

AnnoTree can be queried in several ways: by Pfam protein family, TIGRFAM protein family, KO term, or taxonomic name/ID. Annotation queries can be filtered by their respective confidence scores, such as E-value and percent alignment. In addition, the species that appear in a BLAST result can be visualized by uploading a BLAST XML2 output file directly. AnnoTree then generates a "painted" phylogeny, using root-to-tip coloring for all lineages that contain matches to the query (Figure 2). The visualizations are accompanied by basic taxonomic information and summary statistics of the distribution based on GTDB nomenclature (Figure 2). Publication-quality SVG images, Newick-formatted phylogenies for any selected subset of the tree, and taxonomic distribution tables for all queries can be downloaded for offline analysis or editing. When a colored node is selected on the tree, a pop-up window displays confidence scores (E-values) and options for downloading the protein sequences for each annotation in a genome or lineage.
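
Root-to-tip "painting" amounts to marking every node on the path from the root to each genome that has a hit. The minimal sketch below assumes the tree is stored as a child-to-parent map. The lineage names and hits are hypothetical, and this is not AnnoTree's code.

```python
from typing import Dict, Iterable, Set

# Hypothetical tree stored as child -> parent; the root has no entry.
PARENT: Dict[str, str] = {
    "p__Nitrospirota": "root",
    "g__Nitrospira": "p__Nitrospirota",
    "genome_1": "g__Nitrospira",
    "genome_2": "g__Nitrospira",
    "p__Proteobacteria": "root",
    "genome_3": "p__Proteobacteria",
}


def painted_nodes(parent: Dict[str, str], hits: Iterable[str]) -> Set[str]:
    """Return every node on a root-to-tip path that leads to a genome with a hit."""
    painted: Set[str] = set()
    for node in hits:
        # Walk upward until reaching the root, or a node that an earlier walk
        # already painted (its ancestors are then painted too).
        while node not in painted:
            painted.add(node)
            if node not in parent:
                break  # reached the root
            node = parent[node]
    return painted


print(sorted(painted_nodes(PARENT, ["genome_1", "genome_2"])))
```

Stopping at the first already-painted node means each edge is visited at most once, however many hits share a lineage.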

AnnoTree interface overview. AnnoTree can be queried with any number of KO identifiers, Pfam families, TIGRFAM families, or NCBI taxon identification numbers to display a mapping of those traits on the GTDB tree at any resolution. Lineages containing at least one genome with the query annotation(s) are highlighted in red. A circle chart displays a taxonomic summary of the genomes containing the flagellin gene (KO identifier: K02406) at a chosen taxonomic level. Smaller trees below show the interactive view when different taxonomic levels are selected by the user. When a highlighted node is clicked, a window appears (not shown in figure) displaying basic taxonomic information, zooming options, and annotation confidence scores.

Because all data are precomputed, users can explore the phylogenomic distribution of any combination of gene families within seconds. For example, the recent metagenomics-driven discovery of comammox (complete ammonia oxidation) bacteria (29, 30) can be reproduced with a simple AnnoTree query for genomes possessing all three key genes that serve as a signature of comammox activity: KO terms K00371 (nxrB), K10944 (amoA), and K10535 (hao). Highlighted in the tree are the known comammox species (i.e., organisms within the genus Nitrospira), along with several additional taxa implicated as having potential comammox-like activity (e.g., Crenothrix) (Supplementary Figure S1).
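
Conceptually, this query is a set-containment test: keep the genomes whose annotation set includes every signature KO term. The sketch below is illustrative only; the genome-to-KO mapping is hypothetical toy data, and `genomes_with_all` is a helper name introduced here, not part of AnnoTree.

```python
from typing import Dict, Iterable, List, Set

# The three KO terms named in the text as a comammox signature.
COMAMMOX_KOS = {"K00371", "K10944", "K10535"}  # nxrB, amoA, hao


def genomes_with_all(annotations: Dict[str, Set[str]], required: Iterable[str]) -> List[str]:
    """Return the genomes (sorted) whose annotation set contains every required KO term."""
    req = set(required)
    return sorted(genome for genome, kos in annotations.items() if req <= kos)


# Hypothetical genome -> KO annotation sets (illustrative only, not real data).
annotations = {
    "genome_1": {"K00371", "K10944", "K10535", "K02406"},
    "genome_2": {"K10944", "K10535"},  # lacks nxrB, so it is excluded
    "genome_3": {"K00371", "K10944", "K10535"},
}
print(genomes_with_all(annotations, COMAMMOX_KOS))  # ['genome_1', 'genome_3']
```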

As a second example, recent discoveries of homologs of important bacterial toxins outside their respective bacterial lineages can be reproduced and visualized phylogenetically with simple AnnoTree queries. A query with Pfam PF01742 (botulinum neurotoxin protease) reveals a taxonomic distribution outside Clostridium that includes the lineages Weissella and Chryseobacterium, consistent with earlier analyses (31, 32) (Supplementary Figure S2). Similarly, a search with the diphtheria toxin domains (PF02763 or PF02764) reveals homologs in the related genera Streptomyces and Austwickia, again reproducing recent analyses (33) almost instantaneously (Supplementary Figure S3). These examples illustrate how AnnoTree can serve as a hypothesis-generating tool by revealing distributions of gene families that may be new or unexpected to users.

Lineage-specific gene families

As an initial exploration of the data within AnnoTree, we examined the distributions of all 77 004 395 bacterial Pfam and KO annotations when mapped onto the bacterial GTDB tree of life (Release 02-RS83). Based on the phylogenetic conservation score (τD) (22), 68.1% of KO identifiers and 60.0% of Pfam protein families had significantly non-random phylogenomic distributions (P < 0.05), indicating greater phylogenetic congruency for KO predictions than for Pfam predictions. Next, we analyzed the distributions of Pfam and KO annotations and used standard binary classification metrics to identify those with strong lineage specificity (see Methods) (Supplementary Data File S1). Extremely lineage-specific families were defined as those with both very high (≥95%) precision (percentage of genomes in the clade that contain the trait) and very high (≥95%) sensitivity (percentage of trait-containing genomes that occur in the clade). Based on these criteria, we identified 358 (3.2%) Pfam protein families and 152 (0.9%) KO identifiers with lineage-specific distributions in Bacteria. We observed a trend in which lineage-specific KO identifiers and Pfam protein families increase in frequency from higher (e.g., phylum) to lower (e.g., species) taxonomic levels (Supplementary Figure S4), consistent with the idea that the taxonomic distributions of gene families tend to diversify over time and that HGT affects evolution over short evolutionary timescales (34). Although lineage-specific families are relatively rare at high taxonomic levels, these cases often represent ancient, clade-defining bacterial innovations. Examples include K18955 (WhiB family transcriptional regulator) in the Actinobacteria, PF07542 (ATP12 chaperone) in the Alphaproteobacteria, and numerous photosynthesis-related genes within the Cyanobacteria (class Oxyphotobacteria).
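
The two metrics in the lineage-specificity criterion can be computed directly from two genome sets. The sketch below assumes only the definitions given above (precision relative to the clade, sensitivity relative to all trait carriers) and a single candidate clade. The genome names and helper functions are hypothetical, and the actual pipeline (see Methods) may differ in how it chooses which clades to test.

```python
from typing import Set, Tuple


def precision_sensitivity(clade: Set[str], with_trait: Set[str]) -> Tuple[float, float]:
    """Precision: share of the clade's genomes that carry the trait.
    Sensitivity: share of the trait-carrying genomes that fall inside the clade."""
    inside = len(clade & with_trait)
    precision = inside / len(clade) if clade else 0.0
    sensitivity = inside / len(with_trait) if with_trait else 0.0
    return precision, sensitivity


def is_lineage_specific(clade: Set[str], with_trait: Set[str], threshold: float = 0.95) -> bool:
    """Both metrics must reach the threshold (0.95 mirrors the >=95% cutoff in the text)."""
    precision, sensitivity = precision_sensitivity(clade, with_trait)
    return precision >= threshold and sensitivity >= threshold


# Hypothetical clade of 40 genomes; the trait occurs in 39 of them plus 1 outside.
clade = {f"g{i}" for i in range(40)}
trait = {f"g{i}" for i in range(1, 40)} | {"outside_1"}
print(is_lineage_specific(clade, trait))  # True (precision = sensitivity = 39/40)

# Five trait carriers outside the clade drop sensitivity to 39/44 (about 0.89).
trait_spread = trait | {f"outside_{i}" for i in range(2, 6)}
print(is_lineage_specific(clade, trait_spread))  # False
```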

Lineage-specific gene families can provide insights into the unique biology of their respective organisms. For example, eight lineage-specific Pfam and KO annotations were detected within the Endozoicomonas subtree, a clade of endosymbiotic bacteria that inhabit numerous marine eukaryotic hosts (35). Consistent with possible utilization of host processes, the lineage-specific genes detected within this clade appear to be of eukaryotic origin and include genes involved in cytoskeletal organization (PF01302), eukaryotic cell–cell signaling (PF00812), apoptosis inhibition (K010343, K010344, K04725, PF07525), and eukaryotic proteolysis (K01378). Given the occurrence of numerous lineage-specific gene families in Endozoicomonas, we asked whether lineage-specific gene families may be overrepresented in certain taxa or branches of the bacterial tree. Indeed, lineage-specific genes were significantly enriched in specific taxonomic groups. Notable examples include 37 Pfam protein families within the genus Bacillus_A and 19 Pfam protein families within the Actinobacteria that are largely composed of proteins of unknown function. We also observed an overrepresentation of lineage-specific gene families in numerous well-studied pathogens (e.g., Bordetella, Helicobacter, Legionella, and Vibrio) (Supplementary Figures S5–S7; Supplementary Data File S1). This is partly due to the presence of lineage-specific virulence factors and toxins, but is also likely influenced by annotation bias towards organisms of biomedical interest (36).

Gene families with patchy distributions

Although 60–68% of functional annotations show a significant phylogenetic signal when mapped onto the tree, the remaining 30–40%, which show more random phylogenetic distributions, are more surprising; they potentially reflect the widespread horizontal transfer and/or frequent gene gain and loss known to occur in bacterial genomes (37, 38). To investigate this further, we ranked all Pfam and KEGG annotations by their phylogenetic patchiness, determined by a homoplasy score (the total number of gains and losses inferred by parsimony) normalized by gene family size, after filtering out traits with a family size <50 (Supplementary Data File S2; see Materials and Methods). Next, we grouped KO terms into their higher-level functional categories to compare broader trends visually (Figure 3, Supplementary Data File S3). Not surprisingly, ‘viral’ (bacteriophage) genes ranked highest in homoplasy among both Pfam and KEGG annotations and are therefore the single most phylogenetically scattered class of genes in bacteria. In contrast, gene functions with extremely low homoplasy include sporulation, photosynthesis, and core processes such as transcription, replication, and protein synthesis (Figure 3). Highly scattered genes were significantly overrepresented in specific taxonomic groups, such as the genera Pseudomonas_E, Streptomyces, and Mycobacterium (Supplementary Data Files S4 and S5), suggesting that these taxa may be taxonomic ‘hotspots’ of HGT.
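
The "total number of gains and losses by parsimony" for a presence/absence trait can be obtained with Fitch's algorithm. The sketch below is one standard way to do this, not necessarily the authors' exact procedure. It assumes a rooted, strictly bifurcating tree in a hypothetical dictionary format (real GTDB trees may contain polytomies, which this version rejects) and applies the family-size filter described above. The normalization step is omitted.

```python
from typing import Dict, List, Set


def fitch_changes(children: Dict[str, List[str]], root: str, present: Set[str]) -> int:
    """Minimum number of presence/absence changes (gains plus losses) on a rooted,
    strictly bifurcating tree, computed with Fitch parsimony.
    Tips listed in `present` have state 1; all other tips have state 0."""
    changes = 0

    def state_set(node: str) -> Set[int]:
        nonlocal changes
        kids = children.get(node, [])
        if not kids:
            return {1} if node in present else {0}
        if len(kids) != 2:
            raise ValueError(f"node {node!r} is not bifurcating")
        left, right = state_set(kids[0]), state_set(kids[1])
        common = left & right
        if common:
            return common
        changes += 1  # children disagree: one gain or loss is needed
        return left | right

    state_set(root)
    return changes


def homoplasy_counts(children: Dict[str, List[str]], root: str,
                     families: Dict[str, Set[str]], min_size: int = 50) -> Dict[str, int]:
    """Parsimony change counts for families present in at least `min_size` genomes."""
    return {name: fitch_changes(children, root, tips)
            for name, tips in families.items() if len(tips) >= min_size}


# Hypothetical toy tree ((A,B),(C,D)) and families; min_size lowered for the toy example.
tree = {"root": ["n1", "n2"], "n1": ["A", "B"], "n2": ["C", "D"]}
families = {"fam_x": {"A", "C"}, "fam_y": {"A", "B"}, "fam_z": {"D"}}
print(homoplasy_counts(tree, "root", families, min_size=2))  # {'fam_x': 2, 'fam_y': 1}
```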

Phylogenetic patchiness of annotations inferred using AnnoTree. Phylogenetic patchiness was computed for each KEGG KO identifier and Pfam protein family using the consistency index (CI), a common homoplasy metric representing the inverse of the minimum possible number of state changes (trait gain or loss) given the tree topology. The final phylogenetic patchiness score is equal to -log(CI)/log(family size), where family size is the total number of genomes containing the trait. (A) Density plot showing the distribution of phylogenetic patchiness scores of Pfam protein families and KO identifiers, with visual examples of varying patchiness (red = present; gray = absent). The phylogenetic distribution plots are, from left to right: K10922 (transmembrane regulatory protein ToxS), K18955 (WhiB transcriptional regulator), PF07542 (ATP12 chaperone), PF01848 (Hok/Sok antitoxin system), and K07495 (putative transposase). (B) Mean-sorted box plots of the phylogenetic patchiness scores of KO identifiers in their respective KEGG pathways and KEGG BRITE categories. The mean patchiness score of the set of KO identifiers in each KEGG pathway or KEGG BRITE category is indicated by a black line.
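
Written out, the caption's score depends only on the parsimony change count and the family size. The worked form below introduces its own symbols for illustration: $s$ is the minimum number of changes on the given topology, $n$ is the family size, and $m = 1$ is assumed as the minimum conceivable number of changes for a binary presence/absence trait. Under that assumption the usual $\mathrm{CI} = m/s$ reduces to $1/s$:

$$
\mathrm{CI} = \frac{m}{s} = \frac{1}{s},
\qquad
\text{patchiness} = \frac{-\log \mathrm{CI}}{\log n} = \frac{\log s}{\log n}.
$$

For example, a family present in $n = 100$ genomes that needs $s = 10$ changes scores $\log 10 / \log 100 = 1/2$ in any logarithm base, whereas a trait gained once ($s = 1$, $\mathrm{CI} = 1$) scores $0$. Because families smaller than 50 are filtered out, $\log n > 0$ and the ratio is always defined.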

We then examined in more detail the top 100 gene families with the most scattered distributions across the bacterial tree. Not surprisingly, this list is dominated by transposases and by CRISPR- and bacteriophage-associated gene families (Supplementary Data File S2). Numerous gene families of unknown function were among the patchiest gene families, but further examination revealed that most of these genes are likely bacteriophage-derived. The extreme phylogenetic patchiness of bacteriophage and CRISPR genes is not only consistent with their known evolutionary dynamics but could also reflect the ongoing ‘arms race’ between these two opposing biological forces (phage infection versus phage defense). Other biologically relevant members of the 1% most highly scattered KO genes include K19057–K19059 (merC, merD, and merR of the mer operon) for mercury resistance; K19155 and K19156, components of a toxin-antitoxin system characterized in E. coli; K15943, K15945, and K16411 for polyketide antibiotic biosynthesis; and K19173–K19175 for DNA backbone S-modification (phosphorothioation) (Supplementary Data File S2).

Reductive dehalogenases

As a case study of AnnoTree's strengths for hypothesis generation and data mining, we selected a gene family of significant biological interest that ranked in the top percentile of homoplasy scores: pcpC tetrachloro-p-hydroquinone reductive dehalogenase (K15241; Supplementary Data File S2). Because reductive dehalogenases (Rdhs) are key enzymes in the bioremediation of chlorinated solvents, their diversity and phylogenomic distribution, and those of organohalide-respiring organisms, have been extensively characterized (39). Using AnnoTree, we compiled a dataset of Rdh genes and associated taxa with the Pfam query PF13486. Our analysis produced a comprehensive dataset of 1,299 putative Rdh genes from 385 genera and 38 phyla (Supplementary Table S1, Figures S8, S9), which not only recapitulates the known diversity of Rdh-associated phyla but significantly expands it. By comparison, a manually curated Rdh-specific database contains 264 Rdh genes from only 19 genera and 6 phyla (39), less than 15% of the total diversity identified by AnnoTree (Supplementary Table S1). The AnnoTree-derived dataset includes several newly predicted rdh-encoding taxa discovered from metagenome-assembled genomes (Supplementary Table S2), including the candidate phyla KSB1 (4 of 6 genomes, rdh copy number = 1) and UBP10 (7 of 14 genomes, rdh copy number = 1), as well as Rhodospirillales UBA2165 (rdh copy number = 13) and Acidobacterium UBA2161 (rdh copy number = 8) (Supplementary Figure S9, Table S2). The novel organisms with high rdh copy numbers are potential obligate organohalide respirers and may be valuable for remediation efforts. By revealing both known and potentially novel groups of organohalide-respiring bacteria, the Rdh case study highlights AnnoTree's ability to capture the broad and complete taxonomic diversity of a gene family and to generate hypotheses about the evolution and ecology of a function of interest.
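
Summary figures like "genes from N genera and M phyla" and per-genome copy numbers are simple tallies over a table of query hits. The sketch below assumes a hypothetical hit record with gene, genome, genus, and phylum fields. The records are placeholders rather than real Rdh data, and the field layout is not AnnoTree's actual download format.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    """One annotated gene from a hypothetical export of query hits."""
    gene_id: str
    genome: str
    genus: str
    phylum: str


def diversity_summary(hits: List[Hit]) -> Dict[str, int]:
    """Count distinct genes, genomes, genera, and phyla among the hits."""
    return {
        "genes": len({h.gene_id for h in hits}),
        "genomes": len({h.genome for h in hits}),
        "genera": len({h.genus for h in hits}),
        "phyla": len({h.phylum for h in hits}),
    }


def copy_numbers(hits: List[Hit]) -> Counter:
    """Number of hit genes per genome (e.g., an rdh copy number)."""
    return Counter(h.genome for h in hits)


# Placeholder records, not real Rdh data.
hits = [
    Hit("gene_1", "genome_a", "genus_1", "phylum_1"),
    Hit("gene_2", "genome_a", "genus_1", "phylum_1"),
    Hit("gene_3", "genome_b", "genus_2", "phylum_1"),
    Hit("gene_4", "genome_c", "genus_3", "phylum_2"),
]
print(diversity_summary(hits))  # {'genes': 4, 'genomes': 3, 'genera': 3, 'phyla': 2}
print(copy_numbers(hits).most_common(1))  # [('genome_a', 2)]
```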


Conclusion: synthesizing tree-reading frameworks

Our review shows that there are some well-elaborated works on tree-reading skills that thus far have not explicitly referred to each other. The two major systems show different approaches: Halverson and Friedrichsen (2013) consider the total spectrum of learners’ progress in handling evolutionary trees, from absolute novices to longtime experts, in a hierarchical structure. Novick and Catley (2016) use a smaller-scale approach, describing task-oriented skills needed for fully understanding tree-reading. Novick and Catley’s task-oriented system seems suitable for easily generating learning assignments, while Halverson and Friedrichsen’s system seems to constitute a good basis for structuring a complete process of learning by starting to dismantle common misconceptions and then improving skills with increasing difficulty in ordered sequence. The skills proposed by other authors substantiate several skills or skill levels in the skill systems.

In general, our literature overview shows that multiple groups have worked on modeling tree-reading skills, and some major advancements have been made. At the same time, however, it has become clear that there has been no attempt to unify and combine the insights already gained. Publications contain only a few cross-references to works on tree-reading skills by other authors, which has led to mainly isolated approaches that are not explicitly interlinked. Furthermore, research on tree-thinking skills has so far focused on deducing skills or systems from theory, observation, or experience, and there has been no major attempt to empirically verify the proposed models.

Based on the works published on tree-thinking skills (Halverson and Friedrichsen 2013; Novick and Catley 2016) and on skills published by other authors (Blacquiere and Hoese 2016; Meir et al. 2007), we propose a synthetic hierarchical system of tree-reading skills consisting of six skill levels. At this point, the system should be seen as an example of what such a synthesis might look like, because it is the result of a theoretical approach that draws together the previous works of different authors.

The hierarchical nature of this system largely follows the hierarchy of Halverson and Friedrichsen’s system (2013), although one minor adjustment of the order has been made, as explained below. The structure of the proposed system, along with the allocation of the proposed skill levels to published skill systems, is also explained below, as well as presented in Table 1 in the form of major ideas.

The hierarchy starts at skill level zero (“naïve handling”). Students at this level cannot analyze a tree correctly, nor do they know the symbolic meaning of the tree's different components. Their interpretations of a given tree are largely based on one or more misconceptions and tend to over-interpret uninformative features of the tree diagram at the expense of informative ones. This level corresponds to the first three skill levels of Halverson and Friedrichsen, which are all characterized by fragmented knowledge of evolutionary trees (Halverson and Friedrichsen 2013).

Skill level one (“identifying structures”) represents the ability to identify and interpret the meaning of the diagrammatic elements of the representation. This includes knowledge of the meaning of nodes, branches, labels, and the direction of time, but also slightly more elaborate knowledge, such as the positions of most recent common ancestors (MRCAs) in the tree. This level corresponds to Halverson and Friedrichsen’s level four (“symbolic use of the representation”), at which students know the meaning and importance of diagrammatic features but cannot interpret the diagram any further (Halverson and Friedrichsen 2013).

The second skill level (“handling apomorphies”) encompasses the ability to interpret traits labeled in a tree. This includes tasks in both directions: naming all traits that a taxon shows and listing all taxa that show a certain trait. This skill can only be applied if the given tree shows traits or apomorphies by some representational means (e.g., pictorially or textually, along the lines, with reference markings, etc.). The basis for this skill level is the combination of several skills proposed by Novick and Catley (2016), all of which focus on identifying and interpreting labeled apomorphies [(A) “identify characters,” (B) “identify taxa,” (H) “evolutionary sequence,” and (I) “convergent evolution”]. In Halverson and Friedrichsen’s model, handling apomorphies is part of the broad skill level six. We separated it into a distinct skill level because many evolutionary trees do not show apomorphies; handling apomorphies is therefore not a skill needed to understand every tree, but it can greatly improve the handling of a tree when apomorphies are present (Catley et al. 2010; Novick et al. 2010).

The third skill level (“identifying relationships”) describes the core tasks of tree-reading. This skill covers all tasks that answer questions about the relative relationships of different species and the formation of clades in a given tree. Typical questions at this level are “Which group is the closest relative to group X?”, “Is group X more closely related to group Y than to group Z?”, and “Which groups form a clade with groups X, Y, and Z?” This level corresponds to four of the skills of Novick and Catley (2016) [(C) “identify/evaluate clades,” (D) “identify nested clades,” (E) “evolutionary relationship: resolved structure,” and (F) “evolutionary relationship: polytomy”] and to skill level six of Halverson and Friedrichsen. It consists of a set of skills pertaining to evaluating monophyletic groups and relative evolutionary relationships.
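
The core question at this level, “Is group X more closely related to group Y than to group Z?”, has a precise answer: X is closer to Y exactly when the MRCA of X and Z is a strict ancestor of the MRCA of X and Y. The sketch below illustrates that rule on a small example cladogram with birds, alligators, lizards, manatees, and elephants. It is an illustration only and not part of the proposed skill system. The internal node names and the parent-dictionary encoding are made up here.

```python
from typing import Dict, List

# Child -> parent links for the cladogram (((birds, alligators), lizards), (manatees, elephants)).
# Internal node names (n_ba, n_bal, n_mam, root) are made up for this sketch.
PARENT: Dict[str, str] = {
    "birds": "n_ba", "alligators": "n_ba",
    "n_ba": "n_bal", "lizards": "n_bal",
    "manatees": "n_mam", "elephants": "n_mam",
    "n_bal": "root", "n_mam": "root",
}


def path_to_root(node: str, parent: Dict[str, str]) -> List[str]:
    """The node followed by all of its ancestors, ending at the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def mrca(a: str, b: str, parent: Dict[str, str]) -> str:
    """Most recent common ancestor: the first ancestor of b that is also an ancestor of a."""
    ancestors_of_a = set(path_to_root(a, parent))
    for node in path_to_root(b, parent):
        if node in ancestors_of_a:
            return node
    raise ValueError("taxa are not in the same tree")


def more_closely_related(x: str, y: str, z: str, parent: Dict[str, str]) -> bool:
    """True if x shares a more recent common ancestor with y than with z."""
    m_xy, m_xz = mrca(x, y, parent), mrca(x, z, parent)
    # x is closer to y only if MRCA(x, z) is a strict ancestor of MRCA(x, y).
    return m_xy != m_xz and m_xz in path_to_root(m_xy, parent)


print(more_closely_related("alligators", "birds", "lizards", PARENT))  # True
print(more_closely_related("lizards", "alligators", "birds", PARENT))  # False
```

The second call returns False because lizards share the same MRCA with alligators and with birds. Neither is a closer relative, which is the kind of equal-relationship case learners often misread.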

The fourth skill level (“comparing trees”) incorporates the ability to mentally rotate branches in a tree, to analyze subtrees, and to decide whether given trees show the same or different relationships. The same applies to comparing different representational styles (e.g., rectangular, circular, and diagonal trees). This level corresponds to two skills identified by Novick and Catley [(K) “rotation” and (J) “subset of the ToL”] and to Halverson and Friedrichsen’s skill level five (“conceptual use of representation”). At this point, we diverged from Halverson and Friedrichsen’s skill hierarchy, because this skill refers not merely to the knowledge that trees can be rotated around nodes but to the more complex task of reasoning about relationships when a tree shows a different subset of taxa or has a different appearance. Furthermore, analyzing and comparing multiple evolutionary trees requires the formation of multiple complex mental models (Hochpöchler et al. 2013). Comparing two trees requires the learner to process many more graphical elements at the same time than evaluating the relative relationships of a number of species does (Kim et al. 2000). Thus, this skill requires the ability to evaluate evolutionary relationships in a very complex and demanding way and has to follow skill level three. The understanding that trees can come in different formats but are informationally equivalent is part of skill level six of Halverson and Friedrichsen’s system, and it is also an aspect of our fourth skill level. Therefore, we deviated from the hierarchy of Halverson and Friedrichsen in this respect.
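
The rotation idea at this level has a compact computational counterpart, offered here as an illustrative sketch rather than as part of the proposed framework. A rooted tree is fully described by its set of clades, and rotating branches around a node never changes that set. Two drawings therefore show the same relationships exactly when their clade sets match. The nested-tuple encoding and the function names are assumptions made for this example.

```python
from typing import FrozenSet, Set, Tuple, Union

# A tree is a tip name or a tuple of subtrees; child order is arbitrary.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Tuple[FrozenSet[str], Set[FrozenSet[str]]]:
    """Return (tips under `tree`, set of clades), each clade being the frozenset
    of tips below one internal node."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    tips: FrozenSet[str] = frozenset()
    found: Set[FrozenSet[str]] = set()
    for child in tree:
        child_tips, child_clades = clades(child)
        tips |= child_tips
        found |= child_clades
    found.add(tips)
    return tips, found


def same_relationships(t1: Tree, t2: Tree) -> bool:
    """Rooted trees depict the same relationships iff they have the same clades."""
    return clades(t1)[1] == clades(t2)[1]


original = ((("birds", "alligators"), "lizards"), ("manatees", "elephants"))
rotated = (("elephants", "manatees"), ("lizards", ("alligators", "birds")))
regrouped = ((("birds", "lizards"), "alligators"), ("manatees", "elephants"))
print(same_relationships(original, rotated))    # True
print(same_relationships(original, regrouped))  # False
```

The `rotated` tree looks quite different on paper but contains the same clades. The `regrouped` tree keeps a similar left-to-right order yet pairs birds with lizards, so it depicts different relationships.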

The fifth and final level (“arguing and inferring”) aims at going beyond the given information in the representation. It covers the ability to form conclusions and predictions based on the phylogeny, which may extend to taxa or traits not presented. It is based on Halverson and Friedrichsen’s level seven (“expert use of representation”) and represents the ability to interpret evolutionary trees in a deeper way than students are normally able to. Depicted information is used to form inferences and arguments that go beyond the presented information. This includes forming new mental models of composite trees, solving complex phylogenetic problems, and deciding which tree formats are best suited to different means of representation. The resulting skill levels, together with an explanation of the levels and the corresponding skills by other authors, can be seen in Table 3.


Tree Thinking

Abstract diagrams are critically important in most, if not all, science disciplines (Novick, 2006). In biology, hierarchical diagrams are especially common. Since 2004, I have been investigating college and high school students’ understanding of cladograms, the most important tool that contemporary scientists use to reason about evolutionary relationships. Most of this research has been conducted in collaboration with Kefyn Catley, an evolutionary biologist and science educator at Western Carolina University.

A cladogram is a type of hierarchical diagram that depicts hypotheses about nested sets of taxa that are supported by shared, evolutionarily novel characters called synapomorphies. For example, the cladogram shown at the top of the page indicates that one synapomorphy for birds and alligators is that they both possess a gizzard. That is, birds and alligators share a most recent common ancestor (MRCA) that evolved the novel character of possessing a gizzard. A group of taxa consisting of the MRCA and all descendants of that ancestor is called a clade or monophyletic group. Thus, birds and alligators comprise a clade (in the cladogram shown above). Because of the nesting inherent in hierarchical diagrams, birds, alligators, and lizards also comprise a clade. And those three taxa plus mammals (represented by manatees and elephants in the cladogram above) constitute another clade, etc. The synapomorphy supporting the bird/alligator clade distinguishes the MRCA of birds and alligators from the earlier ancestor common to birds, alligators, and lizards. And the synapomorphy supporting the bird/alligator/lizard clade (see UV light) distinguishes the MRCA of those three taxa from the earlier ancestor common to birds, alligators, lizards, and mammals. The latter ancestor evolved the novel character of having an amniotic egg, a critical development in the history of life on Earth that enabled vertebrates possessing this character to complete their life cycles on land.

Biologists use the tool of phylogenetics along with its product, the cladogram, to study macroevolution, the subdiscipline of biology that synthesizes events of Earth history and deep time (the well-established theory that Earth is billions of years old) with mechanisms that generate and maintain the biodiversity of our planet. Macroevolutionary processes operate at the level of species and above, resulting in the formation, radiation, and extinction of higher groups of taxa. Macroevolution explains, for example, both the origin and radiation of mammalian taxa. In contrast, microevolution concerns processes that occur at the level of the organism (i.e., genome, individual, and population). Microevolution explains, for example, the appearance of antibiotic-resistant strains of bacteria.

Cladograms are the most important tool used by evolutionary biologists because they document and organize existing knowledge about the properties of species and higher-order taxa. Tree thinking is the ability to understand and reason with evolutionary relationships depicted in cladograms (phylogenetic trees). The power of tree thinking is that the resulting classification scheme (for example, that alligators are more closely related to birds than to lizards because of their shared MRCA) reflects current understanding of the history of life on Earth (i.e., the evolutionary relationships among taxa). Thus, inferences based on this classification scheme are likely to be more informative and to have greater practical value than inferences based on other criteria. For example, inferring which antivenin to use to counteract the bite of a venomous king brown snake based on its close evolutionary relationship to the red-bellied black snake is more likely to lead to a successful outcome (namely, survival!) than is basing the choice of antivenin on the king brown snake’s similar appearance to the western brown snake.

Summary of My Research

Overview. My research on tree thinking falls into three broad categories: (a) influences of diagram design on interpretations of evolutionary relationships, (b) assessing and improving students’ tree-thinking skills, and (c) effects of prior knowledge about taxonomic relationships on tree thinking. The studies of diagram design are based primarily in cognitive and perceptual psychology, with strong implications for education. The instructional studies are rooted in science education while being informed by cognitive psychology. The studies of prior knowledge reflect a more even mix of psychological and educational foundations. All studies are informed by expert knowledge of evolutionary biology. This research has used a variety of different kinds of tasks, including those that require diagram comprehension, translation from one diagram format to another, and inference. Measures of performance include accuracy, types of errors made, written explanations (evidence cited) in support of one’s responses, and patterns of eye movements.

Influences of diagram design on interpretations of evolutionary relationships. Consistent with a large cognitive psychological literature on diagram comprehension, we would expect students’ interpretations of Tree-of-Life diagrams to be influenced by how those diagrams are designed. Thus, one major focus of my research program has been to discover how diagram design affects students’ interpretations of a variety of different types of Tree-of-Life representations.

One exciting project compared students’ ability to extract the hierarchical structure from cladograms depicted in different ways. Cladograms are typically drawn in one of two formats: rectangular trees (left diagram in the figure below) and diagonal ladders (right diagram in the figure below). In an analysis of the cladograms printed in a professional journal, Novick and Catley (2007) found that rectangular trees are by far the preferred format among evolutionary biologists: 83% vs. 17%. In high school and college biology textbooks, however, the diagonal format was found to occur slightly more often than the rectangular format: 59% vs. 41% for high school biology texts and 54% vs. 46% for college texts (Catley & Novick, 2008).

Rectangular tree (left) and diagonal ladder (right) cladogram formats.

In several studies (Novick & Catley, 2007, 2013), we found that students had difficulty understanding and reasoning from the diagonal cladogram format and that this difficulty stems from the Gestalt principle of good continuation, which works to conceal the critical information about hierarchical levels in this format. One implication of these results is that if some method can be found to break good continuation at the appropriate points along the continuous lines, students’ ability to correctly extract the hierarchical structure of diagonal cladograms should improve. Consistent with this prediction, we found that adding a synapomorphy to mark each branching point in diagonal cladograms greatly improved students’ ability to translate those cladograms to the rectangular format (Novick, Catley, & Funk, 2010). In a final study in this line of research, we found that biology students preferentially scan diagonal cladograms from left to right, following their highly practiced directional pattern for reading written text, and that they prefer to scan along the main diagonal line at the base of the cladogram (Novick, Stull, & Catley, 2012). This impairs their ability to uncover the correct pattern of nesting in diagonal cladograms as those cladograms are typically drawn in textbooks and the biology literature (see above figure).

I am excited to report that based on our research, many textbooks for introductory biology, evolution, and zoology classes have changed from depicting cladograms in the diagonal to the rectangular format to improve student comprehension and learning. Introductory biology textbooks alone reach approximately 800,000 students every year.

My current research is examining the importance of another Gestalt grouping principle in influencing students’ interpretations of the evolutionary relationships depicted in cladograms. I have recently come to believe that the fundamental difficulty students need to overcome to acquire expertise in tree thinking is to understand that any specific evolutionary tree is a subset of the complete, unimaginably large Tree of Life. My prior research with Kefyn Catley suggests that students instead reify the particular groupings they see and fail to appreciate that these groupings are largely an artifact of the specific taxa that happen to be included in the particular tree under consideration. This reification of particular groupings occurs, I believe, because of the Gestalt principles of grouping, which are part of the foundation of human perception. I am pursuing this new line of research in collaboration with Linda Fuselier, an evolutionary biologist at the University of Louisville. We are examining the role of the Gestalt principle of connectedness in determining students’ interpretations of the relationships depicted in rectangular format cladograms. By testing students enrolled in biology classes at different levels (e.g., introductory biology for majors and nonmajors vs. more advanced classes), we will be able to discern the extent to which reliance on Gestalt grouping versus most recent common ancestry changes as a function of biological expertise.

Assessing and improving students’ tree-thinking skills. As documented in three recent publications (Novick & Catley, 2016, 2017; Novick, Catley, & Schreiber, 2014), using the knowledge we gained from our extensive research on tree thinking, Kefyn Catley and I set out to create, implement, and test a research-based tree-thinking curriculum and assessment instrument. Our efforts were very successful with students from a wide variety of biology backgrounds, ranging from little or no biology coursework in college to extensive biology coursework consistent with being a senior biology major. Over three connected and iterative studies, we were able to show that direct instruction produced skills that transferred to regular classroom practices and lab settings and appeared to enhance student understanding of macroevolutionary patterns and processes. Some of the instructional materials we developed are available for download here and from the lessons and resources for teachers section of the Understanding Evolution web site maintained by the University of California Museum of Paleontology.

Effects of prior knowledge about taxonomic relationships on tree thinking. A third focus of my research program concerns students’ folkbiological knowledge about taxonomic relationships among living things and the impact of such knowledge on their ability to engage in tree thinking. Students’ folkbiological knowledge often conflicts with well-established scientific taxonomy. For example, although students (even after an introductory biology course for majors) group lizards together with frogs in the folkbiological category of reptiles and amphibians, lizards are in fact more closely related to mammals because those taxa share a MRCA that evolved the novel character of possessing an amniotic egg (see the cladogram at the top of this page).

In one project (Novick & Catley, 2014), I examined how college and high school students responded when their prior knowledge conflicted with the evolutionary information provided in rectangular format cladograms. In two studies, college and high school students received matched pairs of cladograms that depicted an identical pattern of relationships among either familiar or unfamiliar taxa. When the taxa were familiar, the cladograms showed (correct) relationships that conflicted with students’ prior knowledge. For example, one such cladogram showed that mushrooms are more closely related to animals than to plants, contradicting the folkbiological belief that mushrooms are plants. Students answered evolutionary relationship questions about both cladograms in each matched pair. For both student groups, accuracy was higher when the cladograms depicted relationships among unfamiliar rather than familiar taxa (i.e., when folkbiological knowledge was not available to contradict the scientific information presented).

An additional study reported in Novick and Catley (2014) examined college students’ willingness to include birds in the reptile category, where they belong, as a function of the strength of the supporting evidence. Even with salient visual evidence in the cladogram supporting this grouping, approximately half the students resisted this classification. On the positive side, students did at least choose a coherent definition of reptiles. For example, when they excluded birds from the category, they also excluded crocodiles, to which birds are most closely related. Evidently, the strength of many students’ prior belief that birds are not reptiles is greater than their prior belief that crocodiles are reptiles.
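Why birds "belong" in the reptile category can be checked mechanically: a group is a clade only if it is exactly the set of all taxa descended from some single node. The minimal sketch below assumes a hypothetical, simplified tree with one representative per lineage (not the cladogram used in the study); `leaves` and `is_clade` are illustrative names. It shows that lizard, crocodile, and bird together form a clade, whereas dropping birds leaves a group that is not one.

```python
from typing import FrozenSet, Set, Union

# Hypothetical representation: a leaf is a taxon name, an internal node is a tuple of children.
Tree = Union[str, tuple]

# Simplified, illustrative vertebrate tree (one representative per lineage).
VERTEBRATES: Tree = ("salmon", ("frog", ("mouse", ("lizard", ("crocodile", "bird")))))


def leaves(tree: Tree) -> FrozenSet[str]:
    """All taxon names at or below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def is_clade(tree: Tree, group: Set[str]) -> bool:
    """True if `group` is exactly the set of all taxa descended from some node."""
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_clade(child, group) for child in tree)


print(is_clade(VERTEBRATES, {"lizard", "crocodile", "bird"}))  # True: reptiles including birds
print(is_clade(VERTEBRATES, {"lizard", "crocodile"}))          # False: reptiles minus birds
```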

The difficulty of persuading students of the inaccuracy of their prior knowledge may relate in part to the length of time over which their misconceptions have been reinforced. Brenda Phillips, a former postdoctoral fellow in my laboratory, collected some preliminary data on pre-K through 6th grade children’s and college students’ knowledge about the relationships among sets of three familiar taxa (e.g., camels, elephants, and zebras; beavers, snakes, and frogs). In several respects, the responses of K-1st grade, 4th-6th grade, and college students were remarkably similar. For example, given the set of beavers, snakes, and frogs, most students in all age groups responded, incorrectly, that snakes and frogs are most closely related. See if you can figure out the age group of the student providing each of the following three explanations for this response: (a) “Both live near/in water and are reptile family members”; (b) “They are both not mammals”; (c) “They’re both amphibians and can go underwater and stay underwater, and can both go on land. They both like bugs.” [**Answers are at the bottom of this page.]

Research Support

Much of the research described here was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305A080621 to Vanderbilt University (Laura R. Novick, PI; Kefyn M. Catley, Co-I). The opinions expressed are those of the authors and do not represent the views of the Institute or the U.S. Department of Education. My current research is being supported by a small grant from Peabody College of Vanderbilt University.

Instructional Materials Available for Download

As part of the above-mentioned IES grant, Kefyn Catley and I developed a variety of instructional materials for teaching tree thinking to undergraduates. Some of these materials are available for download here, as well as from the lessons and resources for teachers section of the Understanding Evolution web site maintained by the University of California Museum of Paleontology.

** (a) Vanderbilt student, (b) kindergarten or first grade student, (c) 4th-6th grade student.


Conclusions

Munzner and colleagues have demonstrated that hierarchical data viewers enhanced with a 3D hyperbolic view offer an efficiency advantage over conventional 2D viewers for deciphering tree-based information [18]. While 3D hyperbolic visualization of phylogenetic trees will not fully supplant 2D viewers, it can serve as an additional module alongside other visualization components. In the future, a phylogenetic tree visualization tool that integrates several visualization components, in the way the XML3D tool used by Risden et al. [18] does, would be desirable. The Walrus viewer and the conversion tool are a step towards this goal.