Solved: sumif in Python on a column and creating a new column

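The main problem with sumif in Python is that there is no built-in function by that name. In pandas, a conditional sum is written by filtering rows with a boolean condition and then calling sum() on the result. The same filter works with other aggregations such as max or min.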

I have a dataframe that looks like this:
<code>import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 5], 'C': [3, 4, 5, 6]})

   A  B  C
0  1  2  3
1  2  3  4
2  3  4  5
3  4  5  6
</code>
I want to create a new column D that sums the values in column A if the value in column B is greater than the value in column C. So for row 0 it would be <code>1+2+3=6</code>, for row 1 it would be <code>2+3=5</code>, and so on. The expected output is:
<code>   A  B  C  D
0  1  2  3  6    # (1+2+3) since B &gt; C for row 0 only
1  2  3  4  5    # (2+3) since B &gt; C for row 1 only
2  3  4  5  0    # no values added since B &lt;= C
3  4  5  6  0    # no values added since B &lt;= C
</code>


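The task, then, is Python code that creates a new column D in a pandas DataFrame. Column D holds the sum of the values in column A, but only for rows where the value in column B is greater than the value in column C.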
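One possible reading of this task can be sketched with a boolean mask. The sketch below assumes a row-wise interpretation (D takes the value of A where B > C, otherwise 0) and also computes a single SUMIF-style total; the variable names `mask`, `total`, and `total_reversed` are illustrative and not from the original question. In the sample data B is smaller than C in every row, so the B > C condition matches nothing; the reversed comparison is included only for contrast.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 5], 'C': [3, 4, 5, 6]})

# Boolean mask: True on rows where B is greater than C.
mask = df['B'] > df['C']

# Row-wise reading (an assumption): keep A where the condition holds, else 0.
df['D'] = df['A'].where(mask, 0)

# Scalar SUMIF-style total of A over the matching rows.
total = df.loc[mask, 'A'].sum()

# Same pattern with the comparison reversed (B < C), for contrast.
total_reversed = df.loc[df['B'] < df['C'], 'A'].sum()

print(df)
print(total, total_reversed)
```

If the intended condition differs, only the line that builds `mask` needs to change.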

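Sumif

SUMIF is a spreadsheet function, and neither Python nor pandas has a built-in function called sumif. In pandas, the same kind of data summary is computed by selecting the rows that meet a condition and then aggregating them. The selection can be combined with a sum, mean, minimum, maximum, or percentile of the values.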
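As a rough illustration of conditional summaries in pandas, the sketch below reuses the question's DataFrame with an arbitrary example condition (A >= 3, chosen here only for demonstration) and applies several aggregations to column B over the matching rows. The names `selected` and `summary` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 5], 'C': [3, 4, 5, 6]})

# Values of B on the rows where A is at least 3 (example condition only).
selected = df.loc[df['A'] >= 3, 'B']

# The same selection feeds any aggregation, not just a sum.
summary = {
    'sum': selected.sum(),
    'mean': selected.mean(),
    'min': selected.min(),
    'max': selected.max(),
    'median': selected.quantile(0.5),  # 50th percentile
}

print(summary)
```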

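Creating a column

In pandas, you create a column in a DataFrame by assigning to a new column name; there is no column() function. The syntax is:

df[name] = data

where name is the name of the column and data is the data to put in it: a scalar, or a list or Series with one value per row.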
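A minimal sketch of two common ways to add a column in pandas, using the question's DataFrame. The column names `total` and `double_a` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 5], 'C': [3, 4, 5, 6]})

# Bracket assignment adds (or overwrites) a column in place.
df['total'] = df['A'] + df['B'] + df['C']

# assign() returns a new DataFrame and leaves df unchanged.
df2 = df.assign(double_a=df['A'] * 2)

print(df)
print(df2)
```

Bracket assignment is the shorter form; `assign()` may be preferable when the original DataFrame should stay untouched or when chaining several steps.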

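Working with data and columns

In pandas, you can convert column data to a dictionary with the DataFrame's to_dict() method. Called on a selection of columns with orient='list', it returns a dictionary object in which each key is a column name and each value is the list of that column's values.

For example, to create a dictionary object from the "name" and "age" columns of a DataFrame called "data", you can use the following code:

<code>data[['name', 'age']].to_dict(orient='list')</code>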
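The following sketch shows what such a conversion might look like end to end. The DataFrame contents (the names, ages, and the extra `city` column) are invented sample values, and `as_lists` and `as_records` are hypothetical variable names.

```python
import pandas as pd

# Invented sample data for illustration only.
data = pd.DataFrame({
    'name': ['Ann', 'Bob'],
    'age': [31, 45],
    'city': ['Oslo', 'Lima'],
})

# One key per selected column, each mapped to the list of that column's values.
as_lists = data[['name', 'age']].to_dict(orient='list')

# Alternative layout: one dictionary per row.
as_records = data[['name', 'age']].to_dict(orient='records')

print(as_lists)
print(as_records)
```

The `orient` argument controls the shape of the result; without it, `to_dict()` nests the values by row index under each column name.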
