U .e(!@sddlmZmZmZddlmZmZddlmZm Z ddl Z ddl Z ddl m Z ddlmZmZmZmZddlmZdd lmZdd lmZzdd lmZWnek reZYnXed d eDZedd eDZedd eDZeeddgBZdZej rJeddkr&e!ddks*t"e #edde$ddZ%n e #eZ%e&ddddddddddd d!d"d#d$d%d&d'd(d)d*d+d,d-d.d/d0d1d2d3d4d5g Z'e #d6Z(iZ)Gd7d8d8e*Z+d9d:Z,Gd;d<dd>e-Z.Gd?d@d@e/Z0GdAdBdBe*Z1GdCdDdDe*Z2dEdFZ3dS)G)absolute_importdivisionunicode_literals) text_type binary_type) http_clienturllibN) webencodings)EOFspaceCharacters asciiLettersasciiUppercase)_ReparseException)_utils)StringIO)BytesIOcCsg|]}|dqSasciiencode.0itemrE/usr/lib/python3.8/site-packages/pip/_vendor/html5lib/_inputstream.py srcCsg|]}|dqSrrrrrrrscCsg|]}|dqSrrrrrrrs>.)sumr#r%rrrr,^szBufferedStream._bufferedBytescCs<|j|}|j||jdd7<t||jd<|Sr')r"r4r#appendr$r()r%r3datarrrr1as   zBufferedStream._readStreamcCs|}g}|jd}|jd}|t|jkr|dkr|dks>t|j|}|t||krl|}|||g|_n"t||}|t|g|_|d7}|||||||8}d}q|r|||d|S)Nrr )r$r(r#r-r7r1join)r%r3ZremainingBytesrvZ bufferIndexZ bufferOffsetZ bufferedDataZ bytesToReadrrrr2hs&     zBufferedStream._readFromBufferN) __name__ __module__ __qualname____doc__r&r+r0r4r,r1r2rrrrr!9s  r!cKst|tjs(t|tjjr.t|jtjr.d}n&t|drJt|dt }n t|t }|rdd|D}|rvt d|t |f|St |f|SdS)NFr4rcSsg|]}|dr|qS)Z _encoding)endswith)rxrrrrs z#HTMLInputStream..z3Cannot set an encoding with a unicode input, set %r) isinstancerZ HTTPResponserZresponseZaddbasefphasattrr4r TypeErrorHTMLUnicodeInputStreamHTMLBinaryInputStream)sourcekwargsZ isUnicodeZ encodingsrrrHTMLInputStreams       rJc@speZdZdZdZddZddZddZd d Zd d Z d dZ dddZ ddZ ddZ dddZddZdS)rFProvides a unicode stream of characters to the HTMLTokenizer. This class takes care of character encoding and removing or replacing incorrect byte-sequences and also provides column and line tracking. i(cCsZtjsd|_ntddkr$|j|_n|j|_dg|_tddf|_| ||_ | dS)Initialises the HTMLInputStream. HTMLInputStream(source, [encoding]) -> Normalized stream from source for use by html5lib. source can be either a file-object, local filename or a string. The optional encoding parameter must be a string that indicates the encoding. If specified, that encoding will be used, regardless of any BOM or later declaration (such as in a meta element) Nu􏿿r rutf-8certain) rsupports_lone_surrogatesreportCharacterErrorsr(characterErrorsUCS4characterErrorsUCS2ZnewLineslookupEncoding charEncoding openStream dataStreamreset)r%rHrrrr&s   zHTMLUnicodeInputStream.__init__cCs.d|_d|_d|_g|_d|_d|_d|_dS)Nr)r* chunkSize chunkOffseterrors prevNumLines prevNumCols_bufferedCharacterr6rrrrWszHTMLUnicodeInputStream.resetcCst|dr|}nt|}|SzvProduces a file object from source. source can be either a file object, local filename or a string. r4)rDrr%rHr"rrrrUs z!HTMLUnicodeInputStream.openStreamcCsT|j}|dd|}|j|}|dd|}|dkr@|j|}n ||d}||fS)N rrr )r*countr\rfindr])r%r.r*ZnLinesZ positionLineZ lastLinePosZpositionColumnrrr _positions   z HTMLUnicodeInputStream._positioncCs||j\}}|d|fS)z:Returns (line, col) of the current position in the stream.r )rdrZ)r%linecolrrrr$szHTMLUnicodeInputStream.positioncCs6|j|jkr|stS|j}|j|}|d|_|S)zo Read one character from the stream or queue if available. Return EOF when EOF is reached. r )rZrY readChunkr r*)r%rZcharrrrrhs   zHTMLUnicodeInputStream.charNcCs|dkr|j}||j\|_|_d|_d|_d|_|j|}|j rX|j |}d|_ n|s`dSt |dkrt |d}|dksd|krdkrnn|d|_ |dd}|j r| || d d }| d d }||_t ||_d S) NrXrFr r iz ra T)_defaultChunkSizerdrYr\r]r*rZrVr4r^r(ordrPreplace)r%rYr8Zlastvrrrrgs0           z HTMLUnicodeInputStream.readChunkcCs(ttt|D]}|jdqdS)Ninvalid-codepoint)ranger(invalid_unicode_refindallr[r7)r%r8_rrrrQ%sz*HTMLUnicodeInputStream.characterErrorsUCS4cCsd}t|D]}|rqt|}|}t|||drrt|||d}|tkrl|j dd}q|dkr|dkr|t |dkr|j dqd}|j dqdS)NFroTrjir ) rqfinditerrmgroupstartrZisSurrogatePairZsurrogatePairToCodepointnon_bmp_invalid_codepointsr[r7r()r%r8skipmatchZ codepointr)Zchar_valrrrrR)s"  z*HTMLUnicodeInputStream.characterErrorsUCS2Fc Cszt||f}Wnhtk rx|D]}t|dks$tq$ddd|D}|sZd|}td|}t||f<YnXg}||j|j }|dkr|j |j krqn0| }||j kr| |j|j |||_ q| |j|j d| s~qq~d|} | S)z Returns a string of characters from the stream up to but not including any character in 'characters' or EOF. 'characters' must be a container that supports the 'in' method and iteration over its characters. rXcSsg|]}dt|qS)z\x%02x)rm)rcrrrrNsz5HTMLUnicodeInputStream.charsUntil..z^%sz[%s]+N)charsUntilRegExKeyErrorrmr-r:recompilerzr*rZrYendr7rg) r%Z charactersZoppositecharsr|Zregexr;mrrrrr charsUntil@s0    z!HTMLUnicodeInputStream.charsUntilcCsT|dk rP|jdkr.||j|_|jd7_n"|jd8_|j|j|ksPtdSr')rZr*rYr-)r%rhrrrungetos   zHTMLUnicodeInputStream.unget)N)F)r<r=r>r?rlr&rWrUrdr$rhrgrQrRrrrrrrrFs   & /rFc@sLeZdZdZdddZddZd d Zdd d Zd dZddZ ddZ dS)rGrKN windows-1252TcCsn|||_t||jd|_d|_||_||_||_||_ ||_ | ||_ |j ddk sbt |dS)rLidrN)rU rawStreamrFr& numBytesMetanumBytesChardetoverride_encodingtransport_encodingsame_origin_parent_encodinglikely_encodingdefault_encodingdetermineEncodingrTr-rW)r%rHrrrrrZ useChardetrrrr&s  zHTMLBinaryInputStream.__init__cCs&|jdj|jd|_t|dS)Nrrn)rTZ codec_info streamreaderrrVrFrWr6rrrrWszHTMLBinaryInputStream.resetcCsDt|dr|}nt|}z||Wnt|}YnX|Sr_)rDrr0r+r!r`rrrrUs z HTMLBinaryInputStream.openStreamcCs|df}|ddk r|St|jdf}|ddk r:|St|jdf}|ddk rX|S|df}|ddk rt|St|jdf}|ddk r|djds|St|jdf}|ddk r|S|rpzddl m }Wnt k rYnXg}|}|j s<|j |j}t|tst|s&q<||||q|t|jd}|j d|dk rp|dfSt|jdf}|ddk r|StddfS)NrNrZ tentativezutf-16)UniversalDetectorencodingr) detectBOMrSrrdetectEncodingMetarname startswithrZ%pip._vendor.chardet.universaldetectorr ImportErrorZdonerr4rrBr3r-r7Zfeedcloseresultr0r)r%ZchardetrTrZbuffersZdetectorr#rrrrrsR           z'HTMLBinaryInputStream.determineEncodingcCs|jddkstt|}|dkr&dS|jdkrFtd}|dk stnT||jdkrf|jddf|_n4|jd|df|_|td|jd|fdS)Nr rNutf-16beutf-16lerMrzEncoding changed from %s to %s)rTr-rSrrr0rWr)r%Z newEncodingrrrchangeEncodings   z$HTMLBinaryInputStream.changeEncodingc Cstjdtjdtjdtjdtjdi}|jd}t|t sr?r&rWrUrrrrrrrrrGs * >"rGc@seZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ e e e Z ddZe eZefddZddZddZddZdS) EncodingByteszString-like object with an associated position and various extra methods If the position is ever greater than the string length then an exception is raisedcCst|tstt||SN)rBr3r-__new__lowerr%valuerrrrLszEncodingBytes.__new__cCs d|_dS)Nr)rdrrrrr&PszEncodingBytes.__init__cCs|Srrr6rrr__iter__TszEncodingBytes.__iter__cCs>|jd}|_|t|kr"tn |dkr.t|||dS)Nr rrdr( StopIterationrEr%prrr__next__Ws  zEncodingBytes.__next__cCs|Sr)rr6rrrnext_szEncodingBytes.nextcCsB|j}|t|krtn |dkr$t|d|_}|||dSr'rrrrrpreviouscs zEncodingBytes.previouscCs|jt|krt||_dSrrdr(r)r%r$rrr setPositionlszEncodingBytes.setPositioncCs*|jt|krt|jdkr"|jSdSdS)Nrrr6rrr getPositionqs  zEncodingBytes.getPositioncCs||j|jdSNr )r$r6rrrgetCurrentByte{szEncodingBytes.getCurrentBytecCsH|j}|t|kr>|||d}||kr4||_|S|d7}q||_dS)zSkip past a list of charactersr Nr$r(rdr%rrr|rrrrys  zEncodingBytes.skipcCsH|j}|t|kr>|||d}||kr4||_|S|d7}q||_dSrrrrrr skipUntils  zEncodingBytes.skipUntilcCs>|j}|||t|}||}|r:|jt|7_|S)zLook for a sequence of bytes at the start of a string. If the bytes are found return True and advance the position to the byte after the match. Otherwise return False and leave the position alone)r$r(r)r%r3rr8r;rrr matchBytess  zEncodingBytes.matchBytescCsR||jd|}|dkrJ|jdkr,d|_|j|t|d7_dStdS)zLook for the next sequence of bytes matching a given sequence. If a match is found advance the position to the last byte of the matchNrrr T)r$findrdr(r)r%r3Z newPositionrrrjumpTos zEncodingBytes.jumpToN)r<r=r>r?rr&rrrrrrpropertyr$r currentBytespaceCharactersBytesryrrrrrrrrHs      rc@sXeZdZdZddZddZddZdd Zd d Zd d Z ddZ ddZ ddZ dS)rz?Mini parser for detecting character encoding from meta elementscCst||_d|_dS)z3string - the data to work on for encoding detectionN)rr8rr%r8rrrr&s zEncodingParser.__init__c Csd|jfd|jfd|jfd|jfd|jfd|jff}|jD]Z}d}|D]D\}}|j|rFz|}WqWqFtk rd}YqYqFXqF|s:qq:|jS) Nsr8rr6rrrrszEncodingParser.handleCommentcCs|jjtkrdSd}d}|}|dkr,dS|ddkr\|ddk}|r|dk r||_dSq|ddkr|d}t|}|dk r||_dSq|ddkrtt|d}|}|dk rt|}|dk r|r||_dS|}qdS) NTFrs http-equivr s content-typecharsetscontent) r8rr getAttributerrSContentAttrParserrparse)r%Z hasPragmaZpendingEncodingattrZtentativeEncodingcodecZ contentParserrrrrs8      zEncodingParser.handleMetacCs |dS)NF)handlePossibleTagr6rrrrsz%EncodingParser.handlePossibleStartTagcCst|j|dS)NT)rr8rr6rrrrs z#EncodingParser.handlePossibleEndTagcCsb|j}|jtkr(|r$||dS|t}|dkrD|n|}|dk r^|}qLdS)NTr)r8rasciiLettersBytesrrrspacesAngleBracketsr)r%ZendTagr8r|rrrrrs    z EncodingParser.handlePossibleTagcCs |jdS)Nrrr6rrrrszEncodingParser.handleOthercCs|j}|ttdgB}|dks2t|dks2t|dkr>dSg}g}|dkrV|rVqnX|tkrj|}qnD|dkrd|dfS|tkr|| n|dkrdS||t |}qF|dkr| d|dfSt ||}|dkrJ|}t |}||kr"t |d|d|fS|tkr<|| q||qnJ|d krbd|dfS|tkr||| n|dkrdS||t |}|t krd|d|fS|tkr|| n|dkrdS||qdS) z_Return a name,value pair for the next attribute in the stream, if one is found, or None/Nr )rN=)rrr9)'"r) r8ryr frozensetr(r-r:asciiUppercaseBytesr7rrrr)r%r8r|ZattrNameZ attrValueZ quoteCharrrrrsb             zEncodingParser.getAttributeN) r<r=r>r?r&rrrrrrrrrrrrrs$rc@seZdZddZddZdS)rcCst|tst||_dSr)rBr3r-r8rrrrr&fszContentAttrParser.__init__cCsz|jd|jjd7_|j|jjdksr&rrrrrresrcCsft|tr0z|d}Wntk r.YdSX|dk r^z t|WStk rZYdSXndSdS)z{Return the python codec name corresponding to an encoding or None if the string doesn't correspond to a valid encoding.rN)rBrdecodeUnicodeDecodeErrorr lookupAttributeError)rrrrrSs   rS)4Z __future__rrrZpip._vendor.sixrrZpip._vendor.six.movesrrrrZ pip._vendorr Z constantsr r r rrrXriorrrrrrrrZinvalid_unicode_no_surrogaterOrbr-revalrqsetrxZascii_punctuation_rer}objectr!rJrFrGr3rrrrSrrrrs     "   JgIh6'