ÿØÿàJFIFÿþ ÿÛC       ÿÛC ÿÀÿÄÿÄ"#QrÿÄÿÄ&1!A"2qQaáÿÚ ?Øy,æ/3JæÝ¹È߲؋5êXw²±ÉyˆR”¾I0ó2—PI¾IÌÚiMö¯–þrìN&"KgX:Šíµ•nTJnLK„…@!‰-ý ùúmë;ºgµŒ&ó±hw’¯Õ@”Ü— 9ñ-ë.²1<yà‚¹ïQÐU„ہ?.’¦èûbß±©Ö«Âw*VŒ) `$‰bØÔŸ’ëXÖ-ËTÜíGÚ3ð«g Ÿ§¯—Jx„–’U/ÂÅv_s(Hÿ@TñJÑãõçn­‚!ÈgfbÓc­:él[ðQe 9ÀPLbÃãCµm[5¿ç'ªjglå‡Ûí_§Úõl-;"PkÞÞÁQâ¼_Ñ^¢SŸx?"¸¦ùY騐ÒOÈ q’`~~ÚtËU¹CڒêV  I1Áß_ÿÙ o\[c@sOdZdZddlZddlmZddlZddlZddlZdZ yddl Z dZ WnGe k ryddl Z dZ Wqe k rdZ qXnXyddlZWne k rnXejdjejZejd jejZd efd YZd dd YZdddYZdS(sBBeautiful Soup bonus library: Unicode, Dammit This library converts a bytestream to Unicode through any means necessary. It is heavily based on code from Mark Pilgrim's Universal Feed Parser. It works best on XML and HTML, but it does not rewrite the XML or HTML to reflect a new encoding; that's the tree builder's job. tMITiN(tcodepoint2namecCstj|dS(Ntencoding(tcchardettdetect(ts((s./usr/lib/python2.7/site-packages/bs4/dammit.pytchardet_dammitscCstj|dS(NR(tchardetR(R((s./usr/lib/python2.7/site-packages/bs4/dammit.pyR!scCsdS(N(tNone(R((s./usr/lib/python2.7/site-packages/bs4/dammit.pyR'ss!^<\?.*encoding=['"](.*?)['"].*\?>s0<\s*meta[^>]+charset\s*=\s*["']?([^>]*?)[ /;'">]tEntitySubstitutioncBseZdZdZe\ZZZidd6dd6dd6dd 6d d 6Zej d Z ej d Z e dZ e dZe dZe edZe edZe dZRS(sASubstitute XML or HTML entities for the corresponding characters.cCsi}i}g}x\ttjD]H\}}t|}|dkrc|j||||s&([<>]|&(?!#\d+;|#x[0-9a-fA-F]+;|\w+;))s([<>&])cCs#|jj|jd}d|S(Nis&%s;(tCHARACTER_TO_HTML_ENTITYtgettgroup(tclstmatchobjtentity((s./usr/lib/python2.7/site-packages/bs4/dammit.pyt_substitute_html_entityZscCs|j|jd}d|S(smUsed with a regular expression to substitute the appropriate XML entity for an XML special character.is&%s;(tCHARACTER_TO_XML_ENTITYR&(R'R(R)((s./usr/lib/python2.7/site-packages/bs4/dammit.pyt_substitute_xml_entity_scCsNd}d|krBd|kr9d}|jd|}qBd}n|||S(s*Make a value into a quoted XML attribute, possibly escaping it. Most strings will be quoted using double quotes. Bob's Bar -> "Bob's Bar" If a string contains double quotes, it will be quoted using single quotes. Welcome to "my bar" -> 'Welcome to "my bar"' If a string contains both single and double quotes, the double quotes will be escaped, and the string will be quoted using double quotes. Welcome to "Bob's Bar" -> "Welcome to "Bob's bar" RRs"(treplace(tselftvaluet quote_witht replace_with((s./usr/lib/python2.7/site-packages/bs4/dammit.pytquoted_attribute_valuefs   cCs4|jj|j|}|r0|j|}n|S(s Substitute XML entities for special XML characters. :param value: A string to be substituted. The less-than sign will become <, the greater-than sign will become >, and any ampersands will become &. If you want ampersands that appear to be part of an entity definition to be left alone, use substitute_xml_containing_entities() instead. :param make_quoted_attribute: If True, then the string will be quoted, as befits an attribute value. (tAMPERSAND_OR_BRACKETtsubR,R2(R'R/tmake_quoted_attribute((s./usr/lib/python2.7/site-packages/bs4/dammit.pytsubstitute_xmls  cCs4|jj|j|}|r0|j|}n|S(sSubstitute XML entities for special XML characters. :param value: A string to be substituted. The less-than sign will become <, the greater-than sign will become >, and any ampersands that are not part of an entity defition will become &. :param make_quoted_attribute: If True, then the string will be quoted, as befits an attribute value. (tBARE_AMPERSAND_OR_BRACKETR4R,R2(R'R/R5((s./usr/lib/python2.7/site-packages/bs4/dammit.pyt"substitute_xml_containing_entitiess  cCs|jj|j|S(sReplace certain Unicode characters with named HTML entities. This differs from data.encode(encoding, 'xmlcharrefreplace') in that the goal is to make the result more readable (to those with ASCII displays) rather than to recover from errors. There's absolutely nothing wrong with a UTF-8 string containg a LATIN SMALL LETTER E WITH ACUTE, but replacing that character with "é" will make it more readable to some people. (tCHARACTER_TO_HTML_ENTITY_RER4R*(R'R((s./usr/lib/python2.7/site-packages/bs4/dammit.pytsubstitute_htmls (t__name__t __module__t__doc__RR$tHTML_ENTITY_TO_CHARACTERR9R+RRR7R3t classmethodR*R,R2tFalseR6R8R:(((s./usr/lib/python2.7/site-packages/bs4/dammit.pyR 5s&  %tEncodingDetectorcBs\eZdZdeddZdZedZe dZ e eedZ RS(s^Suggests a number of possible encodings for a bytestring. Order of precedence: 1. Encodings you specifically tell EncodingDetector to try first (the override_encodings argument to the constructor). 2. An encoding declared within the bytestring itself, either in an XML declaration (if the bytestring is to be interpreted as an XML document), or in a tag (if the bytestring is to be interpreted as an HTML document.) 3. An encoding detected through textual analysis by chardet, cchardet, or a similar external library. 4. UTF-8. 5. Windows-1252. cCs}|p g|_|pg}tg|D]}|j^q%|_d|_||_d|_|j|\|_ |_ dS(N( toverride_encodingstsettlowertexclude_encodingsRtchardet_encodingtis_htmltdeclared_encodingtstrip_byte_order_marktmarkuptsniffed_encoding(R.RJRBRGREtx((s./usr/lib/python2.7/site-packages/bs4/dammit.pyt__init__s (   cCsO|dk rK|j}||jkr+tS||krK|j|tSntS(N(RRDRER@taddtTrue(R.Rttried((s./usr/lib/python2.7/site-packages/bs4/dammit.pyt_usables    ccst}x+|jD] }|j||r|VqqW|j|j|rW|jVn|jdkr|j|j|j|_n|j|j|r|jVn|j dkrt |j|_ n|j|j |r|j Vnx(dD] }|j||r|VqqWdS(s<Yield a number of encodings that might work for this markup.sutf-8s windows-1252N(sutf-8s windows-1252( RCRBRQRKRHRtfind_declared_encodingRJRGRFR(R.RPte((s./usr/lib/python2.7/site-packages/bs4/dammit.pyt encodingss$      cCs"d}t|tr||fSt|dkrg|d dkrg|dd!dkrgd}|d}nt|dkr|d dkr|dd!dkrd}|d}ni|d d krd }|d}nF|d d krd }|d}n#|d d krd}|d}n||fS(sMIf a byte-order mark is present, strip it and return the encoding it implies.iisssutf-16bessutf-16leissutf-8ssutf-32bessutf-32leN(Rt isinstancetunicodetlen(R'tdataR((s./usr/lib/python2.7/site-packages/bs4/dammit.pyRIs* " "    cCs|rt|}}n%d}tdtt|d}d}tj|d|}| r~|r~tj|d|}n|dk r|jdjdd}n|r|j SdS( sGiven a document, tries to find its declared encoding. An XML encoding is declared at the beginning of the document. An HTML encoding is declared in a tag, hopefully near the beginning of the document. iig?tendpositasciiR-N( RWtmaxtintRtxml_encoding_retsearcht html_meta_retgroupstdecodeRD(R'RJRGtsearch_entire_documentt xml_endpost html_endposRHtdeclared_encoding_match((s./usr/lib/python2.7/site-packages/bs4/dammit.pyRR+s    N( R;R<R=RR@RMRQtpropertyRTR?RIRR(((s./usr/lib/python2.7/site-packages/bs4/dammit.pyRAs !t UnicodeDammitcBseZdZidd6dd6ZdddgZgdegdZd Zd d Z d d Z e d Z dZ dZi dd6dd6dd6dd6dd6d d 6d d#6d d&6d d)6d d,6dd/6dd26dd56d6d76dd:6d6d;6d6d<6dd?6ddB6ddE6ddH6ddK6ddN6ddQ6ddT6ddW6ddZ6dd]6dd`6d6da6ddd6ddg6Zidhd6dd6did6djd6dkd6dld 6dmd#6dnd&6dod)6dpd,6dqd/6drd26dsd56d6d76dtd:6d6d;6d6d<6dud?6dudB6dvdE6dvdH6dwdK6dxdN6dydQ6dzdT6d{dW6d|dZ6d}d]6d~d`6d6da6ddd6ddg6dd6dd6dd6dd6dd6dd6dd6dqd6dd6dfd6dd6dd6dd6dd6dd6dxd6dd6dd6dd6dd6d d6dd6dd6dwd6did6dd6dd6dd6dd6dd6dd6d6d6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dwd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6Zizd d 6d d 6d d6dd6dd6dd6dd6dd6dd6dd6dd6dd 6d!d"6d#d$6d%d&6d'd(6d)d*6d+d,6d-d.6d/d06d1d26d3d46d5d66d7d86d9d:6d;d<6d=d>6d?d@6dAdB6dCdD6dEdF6dGdH6dIdJ6dKdL6dMdN6dOdP6dQdR6dSdT6dUdV6dWdX6dYdZ6d[d\6d]d^6d_d`6dadb6dcdd6dedf6dgdh6didj6dkdl6dmdn6dodp6dqdr6dsdt6dudv6dwdx6dydz6d{d|6d}d~6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6dd6Zd!d"d#gZeddZeddZedddZRS($sA class for detecting the encoding of a *ML document and converting it to a Unicode string. If the source encoding is windows-1252, can replace MS smart quotes with their HTML or XML equivalents.s mac-romant macintoshs shift-jissx-sjiss windows-1252s iso-8859-1s iso-8859-2cCsn||_g|_t|_||_tjt|_t |||||_ t |t si|dkr||_ t ||_d|_dS|j j |_ d}x?|j jD]1}|j j }|j|}|dk rPqqW|sOxa|j jD]P}|dkr|j|d}n|dk r|jjdt|_PqqWn||_|sjd|_ndS(NR RZR-sSSome characters could not be decoded, and were replaced with REPLACEMENT CHARACTER.(tsmart_quotes_tottried_encodingsR@tcontains_replacement_charactersRGtloggingt getLoggerR;tlogRAtdetectorRURVRJtunicode_markupRtoriginal_encodingRTt _convert_fromtwarningRO(R.RJRBRiRGREtuR((s./usr/lib/python2.7/site-packages/bs4/dammit.pyRMXs>              cCs|jd}|jdkr9|jj|j}n|jj|}t|tkr|jdkrdj|djdj}qdj|djdj}n |j}|S(s[Changes a MS smart quote character to an XML or HTML entity, or an ASCII character.iRZtxmls&#xt;Ri(R&RitMS_CHARS_TO_ASCIIR%tencodetMS_CHARSttypettuple(R.tmatchtorigR4((s./usr/lib/python2.7/site-packages/bs4/dammit.pyt _sub_ms_chars'' tstrictcCs|j|}| s+||f|jkr/dS|jj||f|j}|jdk r||jkrd}tj|}|j |j |}ny+|j |||}||_||_ Wnt k r}dSX|jS(Ns([-])(t find_codecRjRRRJRitENCODINGS_WITH_SMART_QUOTESRRR4R~t _to_unicodeRqt Exception(R.tproposedterrorsRJtsmart_quotes_retsmart_quotes_compiledRtRS((s./usr/lib/python2.7/site-packages/bs4/dammit.pyRrs"   cCst|||S(sGiven a string and its encoding, decodes the string into Unicode. %encoding is a string recognized by encodings.aliases(RV(R.RXRR((s./usr/lib/python2.7/site-packages/bs4/dammit.pyRscCs|js dS|jjS(N(RGRRoRH(R.((s./usr/lib/python2.7/site-packages/bs4/dammit.pytdeclared_html_encodings cCs|j|jj||pu|r?|j|jddpu|r`|j|jddpu|rr|jpu|}|r|jSdS(Nt-R t_(t_codectCHARSET_ALIASESR%R-RDR(R.tcharsetR/((s./usr/lib/python2.7/site-packages/bs4/dammit.pyRs!! cCsE|s |Sd}ytj||}Wnttfk r@nX|S(N(RtcodecsRt LookupErrort ValueError(R.Rtcodec((s./usr/lib/python2.7/site-packages/bs4/dammit.pyRs  teurot20ACst stsbquot201Astfnoft192stbdquot201Esthellipt2026stdaggert2020stDaggert2021stcirct2C6stpermilt2030stScaront160stlsaquot2039stOEligt152st?ss#x17Dt17Dssstlsquot2018strsquot2019stldquot201Cstrdquot201Dstbullt2022stndasht2013stmdasht2014sttildet2DCsttradet2122stscaront161strsaquot203Astoeligt153sss#x17Et17EstYumlR stEURt,tfs,,s...t+s++t^t%tSR!tOEtZRRt*Rs--t~s(TM)RR#toetztYst!stcstGBPst$stYENst|sss..sss(th)ss<>ss1/4ss1/2ss3/4sstAsssssstAEstCstEsssstIsssstDstNstOssssssstUssssstbstBstasssssstaessRSsssstissssstnsssssst/sssssstyssss€is‚isƒis„is…is†is‡isˆis‰isŠis‹isŒisŽis‘is’is“is”is•is–is—is˜is™isšis›isœisžisŸis is¡is¢is£is¤is¥is¦is§is¨is©isªis«is¬is­is®is¯is°is±is²is³is´isµis¶is·is¸is¹isºis»is¼is½is¾is¿isÀisÁisÂisÃisÄisÅisÆisÇisÈisÉisÊisËisÌisÍisÎisÏisÐisÑisÒisÓisÔisÕisÖis×isØisÙisÚisÛisÜisÝisÞisßisàiisâisãisäisåisæisçisèiséisêisëisìisíisîisïisðisñisòisóisôisõisöis÷isøisùisúisûisüisýisþiiiiiiitutf8c Cs|jddjd kr-tdn|jdkrNtdng}d }d }x|t|krd||}t|tst|}n||jkr||jkrx|j D]5\}} } ||kr|| kr|| 7}PqqWqc|d krW||j krW|j |||!|j |j ||d 7}|}qc|d 7}qcW|d kru|S|j ||d j |S(sFix characters from one encoding embedded in some other encoding. Currently the only situation supported is Windows-1252 (or its subset ISO-8859-1), embedded in UTF-8. The input must be a bytestring. If you've already converted the document to Unicode, you're too late. The output is a bytestring in which `embedded_encoding` characters have been converted to their `main_encoding` equivalents. RRs windows-1252t windows_1252sPWindows-1252 and ISO-8859-1 are the only currently supported embedded encodings.Rsutf-8s4UTF-8 is the only currently supported main encoding.iiiR (s windows-1252R(Rsutf-8( R-RDtNotImplementedErrorRWRUR\tordtFIRST_MULTIBYTE_MARKERtLAST_MULTIBYTE_MARKERtMULTIBYTE_MARKERS_AND_SIZEStWINDOWS_1252_TO_UTF8RR( R'tin_bytest main_encodingtembedded_encodingt byte_chunkst chunk_starttpostbytetstarttendtsize((s./usr/lib/python2.7/site-packages/bs4/dammit.pyt detwingle s<         N(RR(RR(RR(RR(RR(RR(RR(RR(RR(RR(RR(RR(s#x17DR(RR(RR(RR(RR(RR(RR(RR(RR(RR(RR(RR(RR(s#x17ER(RR (RR(iii(iii(iii(R;R<R=RRRR@RMR~RrRRfRRRRyRwRRRRR?R(((s./usr/lib/python2.7/site-packages/bs4/dammit.pyRgEsd   1        (((R=t __license__RthtmlentitydefsRRRltstringRt chardet_typeRRt ImportErrorRt iconv_codecRRxRR]R_tobjectR RARg(((s./usr/lib/python2.7/site-packages/bs4/dammit.pyts6