```python
>>> import pandas as pd
>>> obj1 = pd.Series(['a', 'b'])
>>> obj2 = pd.Series(['c', 'd'])
>>> pd.concat([obj1, obj2])
0    a
1    b
0    c
1    d
dtype: object
```
Set ignore_index=True to discard the original index values and renumber the result:

```python
>>> import pandas as pd
>>> obj1 = pd.Series(['a', 'b'])
>>> obj2 = pd.Series(['c', 'd'])
>>> pd.concat([obj1, obj2], ignore_index=True)
0    a
1    b
2    c
3    d
dtype: object
```
Set the keys parameter to add an outermost index level:

```python
>>> import pandas as pd
>>> obj1 = pd.Series(['a', 'b'])
>>> obj2 = pd.Series(['c', 'd'])
>>> pd.concat([obj1, obj2], keys=['s1', 's2'])
s1  0    a
    1    b
s2  0    c
    1    d
dtype: object
```
Set the names parameter to label the index levels:

```python
>>> import pandas as pd
>>> obj1 = pd.Series(['a', 'b'])
>>> obj2 = pd.Series(['c', 'd'])
>>> pd.concat([obj1, obj2], keys=['s1', 's2'], names=['Series name', 'Row ID'])
Series name  Row ID
s1           0         a
             1         b
s2           0         c
             1         d
dtype: object
```
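One reason to pass keys is that the pieces can be pulled back out of the combined object afterwards. The following is a minimal sketch of that idea; the variable names `combined` and `from_s1` are illustrative and not part of the original example.

```python
import pandas as pd

obj1 = pd.Series(['a', 'b'])
obj2 = pd.Series(['c', 'd'])

combined = pd.concat(
    [obj1, obj2],
    keys=['s1', 's2'],
    names=['Series name', 'Row ID'],
)

# Selecting on the outermost level returns the rows that came from obj1.
from_s1 = combined.loc['s1']
print(from_s1.tolist())            # ['a', 'b']

# The labels passed via names are stored on the resulting MultiIndex.
print(list(combined.index.names))  # ['Series name', 'Row ID']
```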
Concatenate DataFrame objects:

```python
>>> import pandas as pd
>>> obj1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['letter', 'number'])
>>> obj2 = pd.DataFrame([['c', 3], ['d', 4]], columns=['letter', 'number'])
>>> obj1
  letter  number
0      a       1
1      b       2
>>>
>>> obj2
  letter  number
0      c       3
1      d       4
>>>
>>> pd.concat([obj1, obj2])
  letter  number
0      a       1
1      b       2
0      c       3
1      d       4
```
Concatenate DataFrame objects with different columns; values that do not exist are filled with NaN:

```python
>>> import pandas as pd
>>> obj1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['letter', 'number'])
>>> obj2 = pd.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']], columns=['letter', 'number', 'animal'])
>>> obj1
  letter  number
0      a       1
1      b       2
>>>
>>> obj2
  letter  number animal
0      c       3    cat
1      d       4    dog
>>>
>>> pd.concat([obj1, obj2])
  letter  number animal
0      a       1    NaN
1      b       2    NaN
0      c       3    cat
1      d       4    dog
```
Concatenate DataFrame objects with join="inner"; columns that are not present in every object are dropped:

```python
>>> import pandas as pd
>>> obj1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['letter', 'number'])
>>> obj2 = pd.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']], columns=['letter', 'number', 'animal'])
>>> obj1
  letter  number
0      a       1
1      b       2
>>>
>>> obj2
  letter  number animal
0      c       3    cat
1      d       4    dog
>>>
>>> pd.concat([obj1, obj2], join="inner")
  letter  number
0      a       1
1      b       2
0      c       3
1      d       4
```
Concatenate DataFrame objects with axis=1 to combine them along the column axis (side by side, adding columns):

```python
>>> import pandas as pd
>>> obj1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['letter', 'number'])
>>> obj2 = pd.DataFrame([['bird', 'polly'], ['monkey', 'george']], columns=['animal', 'name'])
>>> obj1
  letter  number
0      a       1
1      b       2
>>>
>>> obj2
   animal    name
0    bird   polly
1  monkey  george
>>>
>>> pd.concat([obj1, obj2], axis=1)
  letter  number  animal    name
0      a       1    bird   polly
1      b       2  monkey  george
```
```python
>>> import pandas as pd
>>> obj1 = pd.DataFrame([['a', 3], ['d', 2]], columns=['letter', 'number'])
>>> obj2 = pd.DataFrame([['c', 1, 'cat'], ['b', 4, 'dog']], columns=['letter', 'number', 'animal'])
>>> obj1
  letter  number
0      a       3
1      d       2
>>>
>>> obj2
  letter  number animal
0      c       1    cat
1      b       4    dog
>>>
>>> pd.concat([obj1, obj2], sort=True)
  animal letter  number
0    NaN      a       3
1    NaN      d       2
0    cat      c       1
1    dog      b       4
```
【02x00】append
The append method appends another Series / DataFrame object to the end of a Series / DataFrame object and returns a new object; the original object is left unchanged.
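Series.append and DataFrame.append were deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the same behavior is usually written with pd.concat. The sketch below assumes two small frames, reused from the concat examples above, and shows that a new object is returned while the original stays as it was.

```python
import pandas as pd

obj1 = pd.DataFrame([['a', 1], ['b', 2]], columns=['letter', 'number'])
obj2 = pd.DataFrame([['c', 3], ['d', 4]], columns=['letter', 'number'])

# Roughly what the older obj1.append(obj2, ignore_index=True) did:
# stack obj2 under obj1 and renumber the rows.
result = pd.concat([obj1, obj2], ignore_index=True)

print(result)
print(len(obj1))  # obj1 itself is not modified and still has 2 rows
```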
```python
>>> import pandas as pd
>>> obj1 = pd.DataFrame({'lkey': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
>>> obj2 = pd.DataFrame({'rkey': ['a', 'b', 'd'], 'data2': range(3)})
>>> obj1
  lkey  data1
0    b      0
1    b      1
2    a      2
3    c      3
4    a      4
5    a      5
6    b      6
>>>
>>> obj2
  rkey  data2
0    a      0
1    b      1
2    d      2
>>>
>>> pd.merge(obj1, obj2, left_on='lkey', right_on='rkey')
  lkey  data1 rkey  data2
0    b      0    b      1
1    b      1    b      1
2    b      6    b      1
3    a      2    a      0
4    a      4    a      0
5    a      5    a      0
```
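【03x05】The how parameter
In the previous example, c and d and the data associated with them disappeared from the result. By default, merge performs an inner join ('inner'), so the keys in the result are the intersection of the keys on both sides. The other options are 'left', 'right', and 'outer': 'left' keeps every key from the left object, 'right' keeps every key from the right object, and 'outer' keeps the union of the keys from both.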
```python
>>> import pandas as pd
>>> obj1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
>>> obj2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})
>>> obj1
  key  data1
0   b      0
1   b      1
2   a      2
3   c      3
4   a      4
5   a      5
6   b      6
>>>
>>> obj2
  key  data2
0   a      0
1   b      1
2   d      2
>>>
>>> pd.merge(obj1, obj2, on='key', how='inner')
  key  data1  data2
0   b      0      1
1   b      1      1
2   b      6      1
3   a      2      0
4   a      4      0
5   a      5      0
>>>
>>> pd.merge(obj1, obj2, on='key', how='outer')
  key  data1  data2
0   b    0.0    1.0
1   b    1.0    1.0
2   b    6.0    1.0
3   a    2.0    0.0
4   a    4.0    0.0
5   a    5.0    0.0
6   c    3.0    NaN
7   d    NaN    2.0
>>>
>>> pd.merge(obj1, obj2, on='key', how='left')
  key  data1  data2
0   b      0    1.0
1   b      1    1.0
2   a      2    0.0
3   c      3    NaN
4   a      4    0.0
5   a      5    0.0
6   b      6    1.0
>>>
>>> pd.merge(obj1, obj2, on='key', how='right')
  key  data1  data2
0   b    0.0      1
1   b    1.0      1
2   b    6.0      1
3   a    2.0      0
4   a    4.0      0
5   a    5.0      0
6   d    NaN      2
```
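To see which side each key came from under the different how settings, merge also accepts indicator=True, which adds a `_merge` column with the values 'left_only', 'right_only', or 'both'. This is an additional illustration rather than part of the original walkthrough; the names `merged`, `only_left`, and `only_right` are chosen here for the sketch.

```python
import pandas as pd

obj1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
obj2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})

# An outer join keeps the union of keys; indicator=True records their origin.
merged = pd.merge(obj1, obj2, on='key', how='outer', indicator=True)

# Keys that exist on only one side are the ones an inner join would drop.
only_left = merged.loc[merged['_merge'] == 'left_only', 'key'].tolist()
only_right = merged.loc[merged['_merge'] == 'right_only', 'key'].tolist()

print(only_left)   # ['c']
print(only_right)  # ['d']
```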
```python
>>> import pandas as pd
>>> left = pd.DataFrame({'key1': ['foo', 'foo', 'bar'], 'key2': ['one', 'two', 'one'], 'lval': [1, 2, 3]})
>>> right = pd.DataFrame({'key1': ['foo', 'foo', 'bar', 'bar'], 'key2': ['one', 'one', 'one', 'two'], 'rval': [4, 5, 6, 7]})
>>> left
  key1 key2  lval
0  foo  one     1
1  foo  two     2
2  bar  one     3
>>>
>>> right
  key1 key2  rval
0  foo  one     4
1  foo  one     5
2  bar  one     6
3  bar  two     7
>>>
>>> pd.merge(left, right, on='key1')
  key1 key2_x  lval key2_y  rval
0  foo    one     1    one     4
1  foo    one     1    one     5
2  foo    two     2    one     4
3  foo    two     2    one     5
4  bar    one     3    one     6
5  bar    one     3    two     7
>>>
>>> pd.merge(left, right, on='key1', suffixes=('_left', '_right'))
  key1 key2_left  lval key2_right  rval
0  foo       one     1        one     4
1  foo       one     1        one     5
2  foo       two     2        one     4
3  foo       two     2        one     5
4  bar       one     3        one     6
5  bar       one     3        two     7
```
```python
>>> import pandas as pd
>>> left = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'], 'value': range(6)})
>>> right = pd.DataFrame({'group_val': [3.5, 7]}, index=['a', 'b'])
>>> left
  key  value
0   a      0
1   b      1
2   a      2
3   a      3
4   b      4
5   c      5
>>>
>>> right
   group_val
a        3.5
b        7.0
>>>
>>> pd.merge(left, right, left_on='key', right_index=True)
  key  value  group_val
0   a      0        3.5
2   a      2        3.5
3   a      3        3.5
1   b      1        7.0
4   b      4        7.0
```