파이썬 그룹
인덱스 0 이 값이고 인덱스 1 이 유형 인 데이터 쌍 세트가 있다고 가정하십시오 .
input = [
('11013331', 'KAT'),
('9085267', 'NOT'),
('5238761', 'ETH'),
('5349618', 'ETH'),
('11788544', 'NOT'),
('962142', 'ETH'),
('7795297', 'ETH'),
('7341464', 'ETH'),
('9843236', 'KAT'),
('5594916', 'ETH'),
('1550003', 'ETH')
]
다음과 같이 유형별로 (첫 번째 색인 문자열로) 그룹화하고 싶습니다.
result = [
{
type:'KAT',
items: ['11013331', '9843236']
},
{
type:'NOT',
items: ['9085267', '11788544']
},
{
type:'ETH',
items: ['5238761', '962142', '7795297', '7341464', '5594916', '1550003']
}
]
효율적인 방법으로이를 달성하려면 어떻게해야합니까?
2 단계로 수행하십시오. 먼저 사전을 만듭니다.
>>> input = [('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'), ('5349618', 'ETH'), ('11788544', 'NOT'), ('962142', 'ETH'), ('7795297', 'ETH'), ('7341464', 'ETH'), ('9843236', 'KAT'), ('5594916', 'ETH'), ('1550003', 'ETH')]
>>> from collections import defaultdict
>>> res = defaultdict(list)
>>> for v, k in input: res[k].append(v)
...
그런 다음 해당 사전을 예상 형식으로 변환하십시오.
>>> [{'type':k, 'items':v} for k,v in res.items()]
[{'items': ['9085267', '11788544'], 'type': 'NOT'}, {'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}, {'items': ['11013331', '9843236'], 'type': 'KAT'}]
itertools.groupby에서도 가능하지만 입력을 먼저 정렬해야합니다.
>>> sorted_input = sorted(input, key=itemgetter(1))
>>> groups = groupby(sorted_input, key=itemgetter(1))
>>> [{'type':k, 'items':[x[0] for x in v]} for k, v in groups]
[{'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}, {'items': ['11013331', '9843236'], 'type': 'KAT'}, {'items': ['9085267', '11788544'], 'type': 'NOT'}]
이 두 가지 모두 원래 키 순서를 따르지 않습니다. 주문을 유지하려면 OrderedDict가 필요합니다.
>>> from collections import OrderedDict
>>> res = OrderedDict()
>>> for v, k in input:
... if k in res: res[k].append(v)
... else: res[k] = [v]
...
>>> [{'type':k, 'items':v} for k,v in res.items()]
[{'items': ['11013331', '9843236'], 'type': 'KAT'}, {'items': ['9085267', '11788544'], 'type': 'NOT'}, {'items': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}]
파이썬의 내장 itertools
모듈에는 실제로 groupby
함수가 있지만 그룹화 할 요소는 목록에서 연속되도록 그룹화 할 요소를 먼저 정렬해야합니다.
from operator import itemgetter
sortkeyfn = itemgetter(1)
input = [('11013331', 'KAT'), ('9085267', 'NOT'), ('5238761', 'ETH'),
('5349618', 'ETH'), ('11788544', 'NOT'), ('962142', 'ETH'), ('7795297', 'ETH'),
('7341464', 'ETH'), ('9843236', 'KAT'), ('5594916', 'ETH'), ('1550003', 'ETH')]
input.sort(key=sortkeyfn)
이제 입력은 다음과 같습니다.
[('5238761', 'ETH'), ('5349618', 'ETH'), ('962142', 'ETH'), ('7795297', 'ETH'),
('7341464', 'ETH'), ('5594916', 'ETH'), ('1550003', 'ETH'), ('11013331', 'KAT'),
('9843236', 'KAT'), ('9085267', 'NOT'), ('11788544', 'NOT')]
groupby
returns a sequence of 2-tuples, of the form (key, values_iterator)
. What we want is to turn this into a list of dicts where the 'type' is the key, and 'items' is a list of the 0'th elements of the tuples returned by the values_iterator. Like this:
from itertools import groupby
result = []
for key,valuesiter in groupby(input, key=sortkeyfn):
result.append(dict(type=key, items=list(v[0] for v in valuesiter)))
Now result
contains your desired dict, as stated in your question.
You might consider, though, just making a single dict out of this, keyed by type, and each value containing the list of values. In your current form, to find the values for a particular type, you'll have to iterate over the list to find the dict containing the matching 'type' key, and then get the 'items' element from it. If you use a single dict instead of a list of 1-item dicts, you can find the items for a particular type with a single keyed lookup into the master dict. Using groupby
, this would look like:
result = {}
for key,valuesiter in groupby(input, key=sortkeyfn):
result[key] = list(v[0] for v in valuesiter)
result
now contains this dict (this is similar to the intermediate res
defaultdict in @KennyTM's answer):
{'NOT': ['9085267', '11788544'],
'ETH': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'],
'KAT': ['11013331', '9843236']}
(If you want to reduce this to a one-liner, you can:
result = dict((key,list(v[0] for v in valuesiter)
for key,valuesiter in groupby(input, key=sortkeyfn))
or using the newfangled dict-comprehension form:
result = {key:list(v[0] for v in valuesiter)
for key,valuesiter in groupby(input, key=sortkeyfn)}
The following function will quickly (no sorting required) group tuples of any length by a key having any index:
# given a sequence of tuples like [(3,'c',6),(7,'a',2),(88,'c',4),(45,'a',0)],
# returns a dict grouping tuples by idx-th element - with idx=1 we have:
# if merge is True {'c':(3,6,88,4), 'a':(7,2,45,0)}
# if merge is False {'c':((3,6),(88,4)), 'a':((7,2),(45,0))}
def group_by(seqs,idx=0,merge=True):
d = dict()
for seq in seqs:
k = seq[idx]
v = d.get(k,tuple()) + (seq[:idx]+seq[idx+1:] if merge else (seq[:idx]+seq[idx+1:],))
d.update({k:v})
return d
In the case of your question, the index of key you want to group by is 1, therefore:
group_by(input,1)
gives
{'ETH': ('5238761','5349618','962142','7795297','7341464','5594916','1550003'),
'KAT': ('11013331', '9843236'),
'NOT': ('9085267', '11788544')}
which is not exactly the output you asked for, but might as well suit your needs.
I also liked pandas simple grouping. it's powerful, simple and most adequate for large data set
result = pandas.DataFrame(input).groupby(1).groups
result = []
# Make a set of your "types":
input_set = set([tpl[1] for tpl in input])
>>> set(['ETH', 'KAT', 'NOT'])
# Iterate over the input_set
for type_ in input_set:
# a dict to gather things:
D = {}
# filter all tuples from your input with the same type as type_
tuples = filter(lambda tpl: tpl[1] == type_, input)
# write them in the D:
D["type"] = type_
D["itmes"] = [tpl[0] for tpl in tuples]
# append D to results:
result.append(D)
result
>>> [{'itmes': ['9085267', '11788544'], 'type': 'NOT'}, {'itmes': ['5238761', '5349618', '962142', '7795297', '7341464', '5594916', '1550003'], 'type': 'ETH'}, {'itmes': ['11013331', '9843236'], 'type': 'KAT'}]
참고URL : https://stackoverflow.com/questions/3749512/python-group-by
'Programming' 카테고리의 다른 글
Firefox에서 개발 된 Javascript가 IE에서 실패하는 일반적인 이유는 무엇입니까? (0) | 2020.08.05 |
---|---|
ASP.Net 오류 : " 'foo'형식이"temp1.dll "과"temp2.dll에 모두 있습니다 " (0) | 2020.08.05 |
테스트 목적으로 브라우저에서 CSS를 비활성화하는 방법 (0) | 2020.08.05 |
배쉬 평등 연산자 (==, -eq) (0) | 2020.08.05 |
ES6 구문을 사용하여 jquery를 가져 오는 방법은 무엇입니까? (0) | 2020.08.05 |